The Empire Strikes Back - Alder Lake details revealed at Intel Architecture Day 2021

1 minute ago, leadeater said:

Apple is doing what is best for their architecture and ecosystem, and Intel, and AMD for that matter, have to do what is best for theirs. Something that works well for one may not work as well for the other. Bigger, wider cores with more execution units do fundamentally make sense, but you have to actually be able to utilize them, and that is already a problem on all x86 CPUs right now, so you're potentially just going to make that worse.

Absolutely, however there is a performance/W cost here because perf/W is non-linear with frequency. By being forced to have a less wide decoder and core, Intel (and AMD) have to target higher clock speeds to get the same performance, which makes it much harder for them to hit the perf/W that Apple have. The increased frequency requirement also likely results in many more CPUs that fail the binning target and are unable to run at that frequency goal.
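To illustrate that non-linearity, here is a rough first-order sketch (illustrative constants only, not measured values from any real part): dynamic CMOS power goes roughly as C·V²·f, and the voltage needed tends to rise with frequency, so power grows much faster than performance as clocks go up.

```c
#include <stdio.h>

/* Rough sketch: first-order dynamic power model, P ~ C * V^2 * f, with the
 * simplifying assumption that required voltage rises linearly with frequency.
 * All constants are illustrative, not measurements of any actual CPU. */
int main(void) {
    const double base_f = 3.0, base_v = 0.9;        /* assumed: 3.0 GHz at 0.90 V */
    for (double f = 3.0; f <= 5.0; f += 0.5) {
        double v = base_v * (f / base_f);           /* assumed linear V-f scaling */
        double rel_power = (v * v * f) / (base_v * base_v * base_f);
        double rel_perf  = f / base_f;              /* assume perf scales with clock */
        printf("%.1f GHz: %.2fx power for %.2fx perf -> %.2fx perf/W\n",
               f, rel_power, rel_perf, rel_perf / rel_power);
    }
    return 0;
}
```

Under these assumptions perf/W falls roughly with the square of the clock increase, which is the binning/efficiency squeeze described above.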

14 hours ago, leadeater said:

I think you're a little too trusting of Intel's performance claims about those little cores. They should be more than adequate, however they are not cache rich. And no, they will not contribute; the only types of applications that won't have problems are ones like Cinebench or other non-thread-dependent applications that do not need to pass data or synchronize. As soon as you have dependency, the little cores become near enough completely useless to the big cores.

This is why they are perfect for running multiple low (or even single) thread count background tasks. Like the system re-indexing files for better search. Background tabs in a web browser (the limitations of JS multithreading under the background task APIs mean the little cores not having a large shared cache does not really matter). Also threads that are handling video playback, where the video decode is happening on dedicated decoders but you need some CPU time to hand-hold it, e.g. that Twitch stream you're watching on the side.
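As a concrete example of how that kind of work gets tagged on Apple's platforms, a minimal sketch using the public pthread QoS API (the scheduler, not the app, decides where the thread actually runs; E-core placement is the likely outcome, not a guarantee):

```c
#include <pthread.h>
#include <pthread/qos.h>
#include <stdio.h>

/* Background worker, e.g. a file re-indexing pass. Marking it with the
 * QOS_CLASS_BACKGROUND class tells the macOS scheduler this work is not
 * latency sensitive, so it can be kept on the efficiency cores. */
static void *reindex_worker(void *arg) {
    (void)arg;
    pthread_set_qos_class_self_np(QOS_CLASS_BACKGROUND, 0);
    /* ... walk the filesystem and rebuild the search index here ... */
    puts("background indexing done");
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, reindex_worker, NULL);
    pthread_join(t, NULL);
    return 0;
}
```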

Just now, leadeater said:

burn-the-witch-monty-python.gif

 

haha sorry, this just struck my awful sense of humor.

😂 my dyslexia strikes again 🙂 

6 minutes ago, hishnash said:

e.g. that Twitch stream you're watching on the side.

Nope 😉

 

But anyway, overall yes, like I mentioned for my gaming PC hybrid makes more sense, as 16 big fat beefy cores aren't really all that useful, and you'd also likely be able to achieve higher clocks on those higher powered cores too. The overall flow-on benefits are probably quite a few.

 

I'd be interested to see gaming + streaming tests, because that was about the biggest thing that got banged on about in the Ryzen 1000 and 2000 era. If the stream side of things actually stays on the E cores and the P cores are left for the game, you've got a very nice combination there. My worry however is that stream encoding isn't a light task and may get shoved up to the P cores, or back and forward, which is not at all ideal. I hope there is a way to hint to the scheduler that "I want these to stay on E cores".
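For what it's worth, the bluntest way to express that hint today is plain CPU affinity. A minimal Linux sketch; the E-core CPU indices used here are an assumption for a hypothetical 8P+8E part, not something Intel has published:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Pin the calling thread (e.g. an encoder worker) to a set of logical CPUs.
 * On a hypothetical 8P+8E Alder Lake the E cores might be logical CPUs 16-23
 * (8 P cores exposing 2 threads each) -- that numbering is an assumption. */
static int pin_to_cpus(int first, int last) {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu = first; cpu <= last; cpu++)
        CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);  /* 0 = calling thread */
}

int main(void) {
    if (pin_to_cpus(16, 23) != 0)
        perror("sched_setaffinity");
    /* ... run the stream encoding loop here ... */
    return 0;
}
```

Affinity is a hard pin rather than a hint, so it forgoes any smarter decisions the hybrid-aware scheduler might make.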

2 minutes ago, leadeater said:

If the stream side of things actually stays on the E cores and the P cores are left for the game, you've got a very nice combination there. My worry however is that stream encoding isn't a light task and may get shoved up to the P cores, or back and forward, which is not at all ideal. I hope there is a way to hint to the scheduler that "I want these to stay on E cores".

So this will depend a lot on the codec you opt to use. On Apple's chips, if you use the correct codecs and APIs (streaming apps typically do not), there is almost zero CPU time needed for the stream: you effectively point the H.265 encoder at the GPU frame buffer (as there is no need for a memory copy), and an E core is more than able to take that output buffer, point it at the SSL crypto system to encrypt it, and then point that at the network interface to send it. But with a dedicated GPU, keeping up with the raw number of copy commands from the GPU to the CPU (assuming the GPU is not set up to push onto a ring buffer) might need a P core, yes.
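For reference, the hardware path on macOS is requested through VideoToolbox. A minimal sketch of creating an HEVC session that insists on the fixed-function encoder (frame feeding and the callback body are omitted; this is an illustration of the API, not a claim about how any particular streaming app is built):

```c
#include <VideoToolbox/VideoToolbox.h>

/* Called with each compressed HEVC sample; hand it to the TLS/network layer. */
static void encoded_frame(void *refcon, void *frameRefcon, OSStatus status,
                          VTEncodeInfoFlags flags, CMSampleBufferRef sample) {
    /* ... packetize and send ... */
}

VTCompressionSessionRef make_hw_hevc_session(int32_t width, int32_t height) {
    /* Require the hardware encoder rather than allowing a software fallback. */
    const void *keys[]   = { kVTVideoEncoderSpecification_RequireHardwareAcceleratedVideoEncoder };
    const void *values[] = { kCFBooleanTrue };
    CFDictionaryRef spec = CFDictionaryCreate(kCFAllocatorDefault, keys, values, 1,
        &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);

    VTCompressionSessionRef session = NULL;
    OSStatus err = VTCompressionSessionCreate(kCFAllocatorDefault, width, height,
        kCMVideoCodecType_HEVC, spec, NULL, NULL, encoded_frame, NULL, &session);
    CFRelease(spec);
    return err == noErr ? session : NULL;
    /* Feed CVPixelBuffers from the capture/GPU side with
     * VTCompressionSessionEncodeFrame; the CPU mostly shuffles buffer handles. */
}
```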

7 minutes ago, hishnash said:

So this will depend a lot on the codec you opt to use. On Apple's chips, if you use the correct codecs and APIs (streaming apps typically do not), there is almost zero CPU time needed for the stream: you effectively point the H.265 encoder at the GPU frame buffer (as there is no need for a memory copy), and an E core is more than able to take that output buffer, point it at the SSL crypto system to encrypt it, and then point that at the network interface to send it. But with a dedicated GPU, keeping up with the raw number of copy commands from the GPU to the CPU (assuming the GPU is not set up to push onto a ring buffer) might need a P core, yes.

You can use fixed function encoders now, yes; the problem comes from their inflexibility, which is why some choose to do it on CPU instead. That's where all the buzz came from when AMD brought 8 actually decent cores to market, though to be fair a lot of that was trying to find a use for them and justify their existence lol.

 

Things like NVENC are actually really good and fine, however you can achieve better results on CPU as you can optimize better for the target streaming platform, with their defined limitations on bit rates, resolutions, frame rates etc.

 

If you can run OBS entirely on E cores and get 100% game performance then expect to see Intel trumpet this far and wide, as part of the "real world benchmarks" drum they have been beating recently. Never mind that this use case is tiny, but that didn't stop AMD and reviewers at the time either heh.

5 minutes ago, leadeater said:

You can use fixed function encoders now, yes; the problem comes from their inflexibility, which is why some choose to do it on CPU instead. That's where all the buzz came from when AMD brought 8 actually decent cores to market, though to be fair a lot of that was trying to find a use for them and justify their existence lol.

 

I don't really understand this at all; professional video applications on macOS use the fixed function encoding pathways all the time, and these meet the requirements of that industry (when they have access to Apple's H.265 encoder). Is it that the internal (Intel or AMD) encoders just are not flexible enough, or is it that gamers want to have more knobs to turn than are good for them?

 

 

8 minutes ago, leadeater said:

If you can run OBS entirely on E cores and get 100% game performance then expect to see Intel trumpet this far and wide, as part of the "real world benchmarks" drum they have been beating recently. Never mind that this use case is tiny, but that didn't stop AMD and reviewers at the time either

For sure! Though there will be a use case (not in streaming) for gamers and other pros to have these lower power cores. I see the real benefit being that the L1 and L2 caches of the P cores will be flushed less often by context switches to random background system tasks (assuming the OS is any good at figuring out what a random background task is... something I am extremely skeptical about).

1 minute ago, hishnash said:

I don't really understand this at all; professional video applications on macOS use the fixed function encoding pathways all the time, and these meet the requirements of that industry (when they have access to Apple's H.265 encoder). Is it that the internal (Intel or AMD) encoders just are not flexible enough, or is it that gamers want to have more knobs to turn than are good for them?

It's really only a problem when you have to scale down the bitrate; when you only have like 6Mbps to work with, being able to make tweaks to the encoder can help a lot with overall output quality. If you're not limited to 6Mbps then sure, go ham, no problem!

 

It's not something I actually do, but it's a widely talked about thing for Twitch streaming etc.

17 minutes ago, leadeater said:

It's really only a problem when you have to scale down the bitrate; when you only have like 6Mbps to work with, being able to make tweaks to the encoder can help a lot with overall output quality. If you're not limited to 6Mbps then sure, go ham, no problem!

 

Is that due to them not having good enough internet, or due to Twitch limiting the upload bandwidth? Or are Twitch (and other services) unable to accept HEVC video and need you to use H.264 (looking at their website it looks like they only mention NVENC and H.264, so yes)...

Sad that these streaming platforms lag so far behind the industry; we will soon get SoCs with embedded H.266 encode/decode pathways and these will end up being reserved for professional users of high end cameras (which will also soon start encoding into this format).

1 hour ago, hishnash said:

Is that due to them not having good enough internet, or due to Twitch limiting the upload bandwidth?

It's Twitch limiting upload bandwidth into their RTMP ingest servers. They do it to ensure quality of service, to limit how many resources they need server side, and also their aggregate bandwidth. They don't allow variable bit rate either, only constant bitrate.

 

https://stream.twitch.tv/encoding/

On 8/20/2021 at 1:33 AM, porina said:

One thing I didn't like is they compared 4t against 4t, but Skylake was using HT. I see it as 4c vs 2c4t. The "extra" threads from HT or SMT do help a bit, but not nearly as much as many think they do. A good case like Cinebench R15/R20/R23 is only around a 30% benefit. So assuming it applies to Cinebench, it is now 4c vs 2.6c effective.

 

I've long speculated, more for AMD than Intel, if SMT might go away. It is much more predictable in performance to run code on a core without SMT, and it also takes out one of the many potential security weaknesses.

I understand that, but then consider they say they can achieve 80% more performance than the “2.6” Skylake cores (2.6 x 1.8 = 4.68). Are they really saying that 4 efficiency cores are equal to more than 4 Skylake cores? It doesn’t sound right to me…
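Spelling out the arithmetic behind that reading, using only the numbers already quoted in this thread (the ~30% HT uplift and Intel's +80% claim):

```latex
2\,\text{c}4\,\text{t (Skylake)} \approx 2 \times 1.3 = 2.6 \text{ effective cores}, \qquad
2.6 \times 1.8 = 4.68 \text{ Skylake-core equivalents for 4 E cores}
```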

8 hours ago, leadeater said:

I think you are greatly underestimating the usage of AVX2; all the current game engines can compile with AVX2 code paths and will do so unless you specify not to. Most of the physics engines benefit from and utilize AVX2, and AVX2 itself is way old now, Haswell on the Intel side.

Well, I know Intel is pushing for better SIMD utilization in the major game engines (UE comes to mind), but besides their marketing materials, is there evidence of how much this is actually implemented for tangible results?

Outside of games, I know the Topaz AI suite implements SIMD optimized routines through Intel's Embree toolchain, but this also applies to the GPU path for inference, which is much faster and used as the default option anyway.

The latest Cinema4D renderer is still compiled with mostly 128-bit SSE/AVX ops, and a large portion of the code is still scalar.

 

Quote

Additionally, AVX silicon requirements are fractionally tiny; even the biggest AVX-512 units hardly use any die space, like a very low single digit percentage.

Well, the second 512-bit FMA unit in SKL-X is not that small, but it's not the transistor budget that's the issue here -- it's the peak power load required for activating so many ALU lanes at the same time and the inevitable clock throttling as a consequence, reducing the performance benefit.

Actually, I don't mind AVX-512 being "economically" implemented over the existing 256-bit FMA pipes (just like AVX was for Sandy Bridge). While the raw throughput will be the same (1x512 = 2x256), there are still some benefits to be drawn from the dedicated mask register set and the doubled number of vector registers in general.
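To make the mask-register point concrete, a minimal illustrative sketch of the same conditional accumulate written with an AVX-512 mask versus the AVX2 compare-and-blend idiom (target attributes are used so it builds without global ISA flags):

```c
#include <immintrin.h>

/* Accumulate only the lanes of x that are above a threshold. */

/* AVX-512: the comparison produces a k mask register and the add is predicated
 * directly -- no extra blend/AND instructions or spare vector registers needed. */
__attribute__((target("avx512f")))
__m512 acc_above_avx512(__m512 acc, __m512 x, float thresh) {
    __mmask16 m = _mm512_cmp_ps_mask(x, _mm512_set1_ps(thresh), _CMP_GT_OS);
    return _mm512_mask_add_ps(acc, m, acc, x);
}

/* AVX2: the "mask" lives in an ordinary vector register, so inactive lanes
 * have to be zeroed out with an AND before the add. */
__attribute__((target("avx2")))
__m256 acc_above_avx2(__m256 acc, __m256 x, float thresh) {
    __m256 m = _mm256_cmp_ps(x, _mm256_set1_ps(thresh), _CMP_GT_OS);
    return _mm256_add_ps(acc, _mm256_and_ps(x, m));
}
```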

1 hour ago, schwellmo92 said:

I understand that, but then consider they say they can achieve 80% more performance than the “2.6” Skylake cores (2.6 x 1.8 = 4.68). Are they really saying that 4 efficiency cores are equal to more than 4 Skylake cores? It doesn’t sound right to me…

No, the claim was never about 4 Skylake cores, but 2c4t.

 

The claim was that 4 E cores can give 80% more performance than 2c4t Skylake cores. It does not imply the same clock was used, or the same power was used. The 80% reduction in power at the same performance was at a different operating point, so it won't be considered for this part.

 

So let's work it out, effectively what is an E core worth in Skylake cores? 4 E cores = 180% of 2 Skylake HT cores. 2 E cores are 180% of one Skylake HT core, or one E core is 90% of a Skylake HT core.

 

In this comparison, the Skylake core may use HT, so there is an unknown, workload-dependent scaling factor there. If we take the 30% improvement from Cinebench and use that to estimate what a Skylake core does without HT, one E core would be equivalent to 69% of a Skylake core without HT. The number will of course vary depending on the actual workload. Still, for multi-thread workloads, one E core is not more performant than one Skylake core with or without HT. A part of the performance increase claimed was from counting threads, not cores.


On 8/20/2021 at 2:42 AM, leadeater said:

I think what you are inadvertently pointing out is that most people do not need a 5950X to play PC games yet do it anyway.

I do need it, as it is future-proof for at least 4 years (I think).

 


4 hours ago, DuckDodgers said:

Well, I know Intel is pushing for better SIMD utilization in the major game engines (UE comes to mind), but besides their marketing materials, is there evidence of how much this is actually implemented for tangible results?

Well, the problem there is tangible results; for physics I would say absolutely yes. Actually testing that is exceedingly hard, namely because CPUs without AVX2 are much slower in other ways, and unless a game has an execution flag to disable a certain instruction set, which most do not, how are you going to test with and without AVX2 on the same CPU to get proper A vs B data?

 

It's a near impossible thing to test properly, namely because CPUs have had AVX2 for 8 years now.

 

However, just because there is a code path and compiler support for AVX2 doesn't mean it has been optimally implemented; then again, that doesn't mean the AVX path has been either. So the AVX2 path is still more likely to be higher performance than the AVX path or the SSE4 path etc.
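As an aside, the "same CPU, A vs B" test described above is only possible when the engine exposes a runtime switch over its SIMD code paths. A minimal sketch of what such a dispatch looks like (the kernel and the launch flag are hypothetical; __builtin_cpu_supports and the target attribute are real GCC/Clang facilities):

```c
#include <immintrin.h>
#include <stddef.h>

/* AVX2+FMA version of a toy physics-style kernel: y[i] += a * x[i].
 * The target attribute lets this compile without enabling AVX2 globally. */
__attribute__((target("avx2,fma")))
static void saxpy_avx2(float *y, const float *x, float a, size_t n) {
    __m256 va = _mm256_set1_ps(a);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vy = _mm256_loadu_ps(y + i);
        _mm256_storeu_ps(y + i, _mm256_fmadd_ps(va, _mm256_loadu_ps(x + i), vy));
    }
    for (; i < n; i++) y[i] += a * x[i];            /* scalar tail */
}

static void saxpy_scalar(float *y, const float *x, float a, size_t n) {
    for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}

/* force_scalar would be wired to a hypothetical launch flag (e.g. -noavx2),
 * which is exactly the toggle most games never ship. */
void saxpy(float *y, const float *x, float a, size_t n, int force_scalar) {
    if (!force_scalar && __builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))
        saxpy_avx2(y, x, a, n);
    else
        saxpy_scalar(y, x, a, n);
}
```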

 

One can complain for days about application optimization, but the reality is it's not going to happen, not now, not next year, not ever, unless the tools themselves do a better job. Even the days of exceedingly well optimized, bespoke console games, hardware and development tools are gone. The path of least resistance will always win out eventually; Nvidia and Intel know this well, which is why they bankroll certain things.

 

4 hours ago, DuckDodgers said:

Well, the second 512-bit FMA unit in SKL-X is not that small

Yes it is. It's a tiny part of the Skylake core, and the Skylake cores only take up a proportion of the total die area. You can go look at the picture I posted of the Skylake-SP core with the second AVX-512 unit; it's not even 10% of the core.

 

Here is a representation of the point

[attached image: die area comparison]

 

And this is still twice the area, for both, of [Architecture Name]-S cores that only have a single AVX-512 unit. It's completely insignificant when talking about die area; all the second AVX-512 units on this very high core count CPU combined would struggle to total a single Skylake core.

 

4 hours ago, DuckDodgers said:

it's the peak power load required for activating so many ALU lanes at the same time and the inevitable clock throttling as a consequence, reducing the performance benefit.

Well, now you are changing what you said. And sure, AVX-512 units use more power while achieving significantly more performance, so what is the issue? If the application does not use AVX-512 then this is a non-existent problem in the first place. If an application does use AVX-512 then it's far better to use it and use that extra power than not to. If you cannot cool it then you've either removed the turbo limits or need to adjust the AVX offset.

 

No matter which way you spin this, AVX-512 when used is only a benefit. But none of these are reasons to keep it out of consumer platform cores either, nor are they why Intel disabled it; it's still in the silicon die on every single CPU, so we can entirely put this aspect to bed. If Intel at all thought the area usage was a problem in any way then the Golden Cove cores in Alder Lake would not have it present, but it's still there in the core.

16 minutes ago, The Unknown Voice said:

I do need it, as it is future-proof for at least 4 years (I think).

That's not how it works. Games in 4 years are not going to substantively benefit from 16 cores like that, and in 4 years' time the CPU cores of the then-current generation are going to be significantly more performant than Zen 3. You should only buy what you need now and then replace more often, rather than stump up large amounts of money for something which will become outdated and slower than modern mid range options. All you're going to see is the same thing as with Zen 2 vs Zen 3: a 5600X completely outclasses a 3950X in games. If this did not hold true for Zen 2 then it's not going to hold true for Zen 3 either.

3 minutes ago, leadeater said:

That's not how it works. Games in 4 years are not going to substantively benefit from 16 cores like that, and in 4 years' time the CPU cores of the then-current generation are going to be significantly more performant than Zen 3. You should only buy what you need now and then replace more often, rather than stump up large amounts of money for something which will become outdated and slower than modern mid range options. All you're going to see is the same thing as with Zen 2 vs Zen 3: a 5600X completely outclasses a 3950X in games.

This is the model I'm looking into:

https://www.newegg.com/black-asus-rog-strix-scar-17-g733qr-ds98-gaming-entertainment/p/2WC-000N-03HJ8?Item=9SIAFXNDV66132

 

More than I possibly need, but for what I do, I would rather buy it filled with high performance parts than keep buying lower end computers more often.

 

My Dell is 11 years old, and runs just fine, but the onboard video stinks.

Dell said that it would not support Windows 10, but it works just fine, and it will get a good cleaning and an SSD upgrade hopefully in the next few months.

 


4 hours ago, porina said:

Still, for multi-thread workloads, one E core is not more performant than one Skylake core with or without HT. A part of the performance increase claimed was from counting threads, not cores.

And for anything that likes cache this will be even more likely to be the case. Still, my opinion is that we really aren't talking about the E cores in the right way; we know they are going to have adequate performance, so exactly how much I don't think matters all that much. It's more about how they can be utilized, what work can be taken off the P cores so they are not burdened with those tasks. Or power efficiency when on battery in a laptop, or idle power usage.

 

I think the performance of the E cores matters most in the 2 + 8 configuration CPU models, or anything with a low number of P cores. For these CPUs the E cores are going to contribute a significant amount of the total performance, and the P cores will be doing anything that is frequency or latency sensitive and not highly threaded. If an application can utilize 8 threads then it should put those on the 8 E cores rather than limit itself to 2 P cores, or place 2 or 4 threads on the P cores and then probably have weird execution time and latency differences between the threads.
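Whether an application can even make that placement decision depends on it being able to tell which logical CPUs are E cores. A minimal Windows sketch using the documented CPU-sets API (illustrative only; the thread-count policy built on top of it is up to the app):

```c
#define _WIN32_WINNT 0x0A00
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

/* Bucket logical processors by EfficiencyClass so a worker pool could decide,
 * for example, to put its 8 compute threads on the E cores of a 2P+8E part. */
int main(void) {
    ULONG len = 0;
    GetSystemCpuSetInformation(NULL, 0, &len, GetCurrentProcess(), 0);
    PSYSTEM_CPU_SET_INFORMATION info = malloc(len);
    if (!info || !GetSystemCpuSetInformation(info, len, &len, GetCurrentProcess(), 0))
        return 1;

    ULONG counts[256] = { 0 };
    for (ULONG off = 0; off < len; ) {
        PSYSTEM_CPU_SET_INFORMATION e = (PSYSTEM_CPU_SET_INFORMATION)((BYTE *)info + off);
        if (e->Type == CpuSetInformation)
            counts[e->CpuSet.EfficiencyClass]++;
        off += e->Size;
    }
    /* Higher EfficiencyClass values mark the more capable (P) cores,
     * lower values the efficient (E) cores. */
    for (int c = 0; c < 256; c++)
        if (counts[c])
            printf("EfficiencyClass %d: %lu logical CPUs\n", c, counts[c]);

    free(info);
    return 0;
}
```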

 

When I'm looking at Alder Lake and the 8 + 8 configurations I just don't actually see those 8 E cores as being all that useful; on desktop all we're going to see is the package power limit being hit by the P cores, and then the E cores will do nothing at all. Intel PL1 limits and real world power usage are going to be really interesting to look at; 125W isn't a lot to work with for 8 P cores and 8 E cores.

13 minutes ago, The Unknown Voice said:

This is the model I'm looking into:

https://www.newegg.com/black-asus-rog-strix-scar-17-g733qr-ds98-gaming-entertainment/p/2WC-000N-03HJ8?Item=9SIAFXNDV66132

 

More than I possibly need, but for what I do, I would rather buy it filled with high performance parts than keep buying lower end computers more often.

 

My Dell is 11 years old, and runs just fine, but the onboard video stinks.

Dell said that it would not support Windows 10, but it works just fine, and it will get a good cleaning and an SSD upgrade hopefully in the next few months.

 

Oh ok, you're looking at laptops. Well, I look at them a little less, but I suspect you'll have to go with those higher model mobile CPUs anyway, as that's what the better GPU options are paired with, among other things. Btw the mobile CPUs have a maximum of 8 cores (and 16 threads), so they aren't filled with cores you don't really need.

 

The 5900HX is the best CPU option you can pick in a gaming focused laptop that has good cooling; with the extra CPU power allowed on the HX models you really are getting portable desktop performance, since it's nearly on par with the desktop 5800X.

5 minutes ago, leadeater said:

Oh ok, you're looking at laptops. Well, I look at them a little less, but I suspect you'll have to go with those higher model mobile CPUs anyway, as that's what the better GPU options are paired with, among other things. Btw the mobile CPUs have a maximum of 8 cores (and 16 threads), so they aren't filled with cores you don't really need.

 

The 5900HX is the best CPU option you can pick in a gaming focused laptop that has good cooling; with the extra CPU power allowed on the HX models you really are getting portable desktop performance, since it's nearly on par with the desktop 5800X.

I'm also spoiled with a 17 inch screen on the Dell.

That is why I'm going with that size screen (not seeing very well lately), and the 300Hz refresh display helps a little.

It is a little overboard, but with running virtual machines, it should do just fine.

Just like the server I built for my former job: it called for a 486, and it got a 6 core, because of the VMs we ran. It still does very well, even after I upgraded the heatsink/fan with one from an FX-8370 and dropped the temps 20°C.

 


2 hours ago, leadeater said:

Yes it is. It's a tiny part of the Skylake core, and the Skylake cores only take up a proportion of the total die area. You can go look at the picture I posted of the Skylake-SP core with the second AVX-512 unit; it's not even 10% of the core.

A small cost is when Intel added a second FADD unit to the SIMD pipeline in Skylake. Chip yield from a wafer scales non-linearly with defect density -- every area saving directly reflects on the economy of scale. Hyperscaler clients can absorb the HW costs -- they worry more about software licenses, but the consumer market is much too sensitive to pay for deadweight silicon.

3 hours ago, leadeater said:

When I'm looking at Alder Lake and the 8 + 8 configurations I just don't actually see those 8 E cores as being all that useful; on desktop all we're going to see is the package power limit being hit by the P cores, and then the E cores will do nothing at all. Intel PL1 limits and real world power usage are going to be really interesting to look at; 125W isn't a lot to work with for 8 P cores and 8 E cores.

8 core Zen 2/3 parts have 88W PPT. If Intel 7 brings perf/W in line with that, 125W is plenty for all the cores.

 

I also wonder how the E cores will be used. Our earlier estimate of the performance assumes they'll be run at max performance, but we shouldn't forget the 80% lower power consumption (than 2c4t Skylake) at a different, lower performance point. It will be a balance of perf/W, not just looking for max perf as we traditionally have in the enthusiast space.

 

 

While here, I missed a detail that has since been pointed out. I wrongly thought each cluster of 4 E cores has 4MB of L3 cache. It turns out it was shared L2 cache! There is no L3 on those.


8 hours ago, porina said:

8 core Zen 2/3 parts have 88W PPT. If Intel 7 brings perf/W in line with that, 125W is plenty for all the cores.

True, but they are also a little bit slower than the higher PPT parts. I still think we are going to be power limited on the 8 + 8 models, as you're only getting about 15W per P core without allowing any power at all for the E cores, ring bus or IMC.

 

If we take a look at Intel's current 10SF mobile parts, they are already heavily power and thus performance limited, much more so than Intel 7/10ESF is going to make up for.

[attached image: mobile CPU benchmark comparison]

A couple of things here: the 65W TDP 11980HK is only competitive with AMD's 35W HS parts (give or take which is faster at what). The 11980HK is not much faster than the 1185G7, which is 28W; that should be a serious warning sign as to power scaling with performance, even for Intel 7/10ESF.

 

Quote

Intel 7 offers a 10-15 percent performance/Watt gain over 10 nm SuperFin

Both of us know that these gains are quoted for more ideal situations, and power runaway at the higher end is still going to be a thing. Golden Cove cores will still easily chug all the power they want, especially if the goal is to attain peak performance, so 125W isn't actually a lot to work with. AMD also has this same problem, even with the 5800X.

 

8 hours ago, porina said:

While here, I missed a detail that has since been pointed out. I wrongly thought each cluster of 4 E cores has 4MB of L3 cache. It turns out it was shared L2 cache! There is no L3 on those.

They do have L3 cache, 3MB per 4-core module. 8 P cores give a total of 24MB of L3 cache, and two 4-core E core modules give 6MB of L3 cache, for a total of 30MB of L3 cache. Have a look at the diagrams again and you'll see LLC paired with the E core modules. The technical specifications of Gracemont also state 3MB of L3 cache.

8 hours ago, DuckDodgers said:

A small cost is when Intel added a second FADD unit to the SIMD pipeline in Skylake. Chip yield from a wafer scales non-linearly with defect density -- every area saving directly reflects on the economy of scale. Hyperscaler clients can absorb the HW costs -- they worry more about software licenses, but the consumer market is much too sensitive to pay for deadweight silicon.

I don't think you read everything I wrote, because Golden Cove cores have AVX-512 in silicon, and on desktop and mobile it is only feature-disabled in microcode. There is zero die area saving from restricting AVX-512 support because it's just that: a restriction, not a removal.

 

It doesn't matter how much it benefits the consumer, which is very little because it's a tiny saving, as it would cost Intel hundreds of millions to develop a new core without in-silicon AVX-512 when they literally do not need to.

 

1%-3% more yield per wafer isn't going to have any effect on product cost anyway; that's specifically only an Intel consideration. The product will be priced at the market rate and it's on Intel as to what they find an acceptable gross margin. If they don't price it within reason of performance then nobody is going to buy it. I'm not even going to speculate on cost savings if die area were reduced, because it's simply more complicated than that when talking final product cost.

