
Intel Powers on Meteor Lake

porina

[Image: Intel Meteor Lake]

 

Quotes


Alongside Intel’s regular earnings report yesterday, the company also delivered a brief update on the state of one of their most important upcoming products, Meteor Lake. Intel’s first chiplet/tile-based SoC, which completed initial development last year, has now completed power-on testing and more. The news is not unexpected, but for Intel it still marks a notable milestone, and is important proof that both Meteor Lake and the Intel 4 process remain on track.

 

Summary

Intel has stated that the Meteor Lake CPU has been powered on, with suggestions it is already capable of running an OS. This will be Intel's 14th gen, arriving next year, with Raptor Lake expected to arrive before it this year. Meteor Lake is Intel's first mainstream client "chiplet" offering, as well as the first to use the Intel 4 process node.

 

My thoughts

AMD's consumer offerings have included chiplet designs since Zen 2 in 2019, and the question has been asked when Intel would move in that direction too. From the image shown, Intel's approach doesn't exactly mirror AMD's, and a single CPU may be broken down into more parts. If the manufacturing side can be tamed, this may offer more flexibility in product offerings: not all parts need to be manufactured on the same process, or even only from within Intel. It will also be interesting to see how Intel's implementation goes, since AMD's implementation trades off core count scaling against limited internal connectivity, which is not so much a problem with optimised monolithic designs.

 

It is also no surprise Intel is pushing forward on its fabrication recovery. The Intel 4 node, as its name implies, should be comparable to a 4nm-class node, sitting between the 3nm and 5nm nodes from other foundries such as TSMC. TSMC 5nm is available today, with 3nm expected to reach volume production in 2023.

 

Sources

https://www.anandtech.com/show/17366/intel-meteor-lake-client-soc-up-and-running

 

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, RTX 4070, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, random 1080p + 720p displays.
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


Wasn't Ponte Vecchio tiled? Were they using Intel tech or TSMC tech?


What is this meant for? Phones, or basic laptops/Chromebooks?

"If a Lobster is a fish because it moves by jumping, then a kangaroo is a bird" - Admiral Paulo de Castro Moreira da Silva

"There is nothing more difficult than fixing something that isn't all the way broken yet." - Author Unknown


Intel Core i7-3960X @ 4.6 GHz - Asus P9X79WS/IPMI - 12GB DDR3-1600 quad-channel - EVGA GTX 1080ti SC - Fractal Design Define R5 - 500GB Crucial MX200 - NH-D15 - Logitech G710+ - Mionix Naos 7000 - Sennheiser PC350 w/Topping VX-1


2 hours ago, WolframaticAlpha said:

Wasn't ponte vecchio tiled? Were they using Intel tech or tsmc tech?

Ponte Vecchio uses Intel's own EMIB to combine Intel- and TSMC-made chiplets and HBM. The difference is that Meteor Lake is for the mainstream, so we will actually see this in products, probably towards the end of next year.

 

15 minutes ago, bcredeur97 said:

what is this meant for? Phones or like basic laptops/chromebooks? 

Meteor Lake is the successor to Raptor Lake, which in turn is the successor to Alder Lake. Variations should cover most of the mainstream, and possibly entry-level workstation/server. Low-power, low-cost devices such as Chromebooks might remain on something like Atom (or 1P+nE?).


I guess this will have the 8P + 24E cores as the highest SKU?

12900K = 8P + 8E

13900K? = 8P + 16E

14900K? = 8P + 24E?

15900K? = 8P + 32E (This one was confirmed to be the target by Intel)

 

TBH I'm not sold on the core count increase being so focused on the E-cores. I personally would rather have something like 12P+16E. Ideally, having the option would be best, so people that benefit from the core count can go that way (8+24) while others go with a more balanced setup, but that probably wouldn't work or make sense for Intel.


On 5/1/2022 at 3:17 AM, KaitouX said:

I guess this will have the 8P + 24E cores as the highest SKU?

12900K = 8P + 8E

13900K? = 8P + 16E

14900K? = 8P + 24E?

15900K? = 8P + 32E (This one was confirmed to be the target by Intel)

 

TBH I'm not sold on the core count increase being so focused on the E-cores. I personally would rather have something like 12P+16E. Ideally, having the option would be best, so people that benefit from the core count can go that way (8+24) while others go with a more balanced setup, but that probably wouldn't work or make sense for Intel.

The issue is they'll have clock problems if they use more P-cores. Their CPUs are already furnaces because they run them at such high clocks. They'd have to do something big with IPC to achieve that, and I don't think they can with the current P-core design.


8 hours ago, RejZoR said:

The issue is they'll have clock problems if they use more P-cores. Their CPUs are already furnaces because they run them at such high clocks. They'd have to do something big with IPC to achieve that, and I don't think they can with the current P-core design.

I mean, they could just make the parts with higher core counts all-core boost a bit lower. The 12900K limited to 180W is only 5% or so slower than at 250W, and at 165W it's less than 10% slower. So if Intel is willing to ship a 241W all-core boost CPU, they could raise the P-core count without issues on the power side; they just need to sacrifice a bit of clock.

https://i.ibb.co/BjQBkMx/All-MT-Benchmarks.png
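For a rough sense of why trading a little clock buys so much power headroom: dynamic power scales roughly with f·V², and voltage tends to track frequency near the top of the V/f curve, so power goes roughly with f³. A minimal back-of-the-envelope sketch assuming that cube law (the 250W figure is from above; the model itself is an assumption, not measured 12900K behaviour):

```python
# Cube-law estimate: performance tracks frequency, power ~ frequency^3.
# This is an assumed model, not measured 12900K behaviour.

def power_at_perf(p_max_watts: float, perf_fraction: float) -> float:
    """Estimated power needed to hit a fraction of peak all-core performance."""
    return p_max_watts * perf_fraction ** 3

for perf in (1.00, 0.95, 0.90):
    print(f"{perf:.0%} of peak -> ~{power_at_perf(250, perf):.0f} W")
# 100% -> ~250 W, 95% -> ~214 W, 90% -> ~182 W
```

The measured numbers above (180W for a ~5% loss) are even better than the cube-law estimate, because real V/f curves steepen further near Vmax.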


1 hour ago, KaitouX said:

I mean, they could just make the parts with higher core counts all-core boost a bit lower. The 12900K limited to 180W is only 5% or so slower than at 250W, and at 165W it's less than 10% slower. So if Intel is willing to ship a 241W all-core boost CPU, they could raise the P-core count without issues on the power side; they just need to sacrifice a bit of clock.

Doesn't Intel already do this?

I mean, AMD does it. The 5950X is lower clocked than the 5900X because it has more cores. The 5900X is also lower clocked than the 5800X.

The turbo is higher, but the all-core clocks are lower to keep power and heat in check. It only reaches the advertised 5GHz frequency if you have just 1 core loaded. If you go up to 12 cores loaded then you'll be down to ~4GHz, and if you have all 16 cores loaded then they will "only" run at ~3.7GHz, possibly even as low as 3.4GHz depending on the binning and the cooler.

 

 

I'd also prefer a 12P+16E config.

Seems a bit more balanced, although people buying those super-high-end processors are probably not looking for balance. I guess there is an argument to be made that if you need more than 8 cores, then chances are you will benefit more from the additional E-cores than you would from a couple more P-cores.


RIP AMD; they're really going to have to push some wild launches, especially after the recent bad CPU launches.


On 4/30/2022 at 4:18 PM, porina said:

Meteor Lake is the successor to Raptor Lake, which in turn is the successor to Alder Lake. Variations should cover most of the mainstream, and possibly entry-level workstation/server. Low-power, low-cost devices such as Chromebooks might remain on something like Atom (or 1P+nE?).

Is Raptor Lake on the same socket as Alder Lake? Or is the plan from Intel to have Meteor Lake and Raptor Lake on the next socket together? I don't remember exactly what Intel's roadmap was for socket releases.


11 minutes ago, thechinchinsong said:

Is Raptor Lake on the same socket as Alder Lake? Or is the plan from Intel to have Meteor Lake and Raptor Lake on the next socket together? I don't remember exactly what Intel's roadmap was for socket releases.

I haven't paid close attention, but I think Raptor is the same socket as Alder. The two-gens-per-socket thing has been going on for a while.

 

Sandy/Ivy

Haswell/Haswell refresh/Broadwell

Skylake/Kaby

Coffee/Coffee refresh

Comet/Rocket

Alder/Raptor?

Meteor?

 

Given the changes in Meteor, a change to the platform might not be a bad idea.

 


5 hours ago, LAwLz said:

Doesn't Intel already do this?

I mean, AMD does it. The 5950X is lower clocked than the 5900X because it has more cores. The 5900X is also lower clocked than the 5800X.

The turbo is higher, but the all-core clocks are lower to keep power and heat in check. It only reaches the advertised 5GHz frequency if you have just 1 core loaded. If you go up to 12 cores loaded then you'll be down to ~4GHz, and if you have all 16 cores loaded then they will "only" run at ~3.7GHz, possibly even as low as 3.4GHz depending on the binning and the cooler.

Yes, they do. The base frequency of the 12900K is only 3.2GHz, while the 12700K's is 3.6GHz and the 12600K's is 3.7GHz. It's all about power limits: the 5950X, 5900X and 5800X all use approximately the same amount of power under a 100% load (according to GN). The only way to achieve this is to send less power to each core. This is one of the benefits of hybrid core architectures - the efficiency cores can make better use of the limited power budget in these scenarios, thereby improving performance. It hardly matters if an E-core can't hit the same single-core performance highs as a P-core if you're stuck running the P-cores at 3.2GHz anyway due to power constraints.

 

5 hours ago, LAwLz said:

I guess there is an argument to be made that if you need more than 8 cores, then chances are you will benefit more from the additional E-cores than you would from a couple more P-cores.

Pretty much this. I can't think of any workloads that would break this principle other than gaming (which as of today doesn't need more than 8 P-cores) or server stuff like databases (which can probably be improved over time to work better on hybrid architectures; they just don't do particularly well on them today). So for desktop use it's probably the better choice, even if it does feel super odd.

 

CPU: i7 4790k, RAM: 16GB DDR3, GPU: GTX 1060 6GB


"Integrated AI Acceleration" ?
Who do they think they are? Google?

10 hours ago, Senzelian said:

"Integrated AI Acceleration" ?
Who do they think they are? Google?

Hmm... I didn't notice that, since I'm not so interested in that area. Thinking about it some more, Intel has offered or will offer such functionality in two ways I can think of: the VNNI instructions included in some versions of AVX-512 (from a quick search, included in Ice Lake and Rocket Lake), and/or the XMX instructions included in Arc graphics.

 

Providing it via AVX-512 would imply they bring back that functionality, as it was disabled in Alder Lake. If they only offered it through Arc, then it would not be present in any F SKUs.


On 5/3/2022 at 8:42 AM, tim0901 said:

This is one of the benefits of hybrid core architectures - the efficiency cores can make better use of the limited power budget in these scenarios, thereby improving performance. It hardly matters if an E-core can't hit the same single-core performance highs as a P-core if you're stuck running the P-cores at 3.2GHz anyway due to power constraints.

I would have to take a much closer look into it, but "just have more P-cores and clock them lower" still has a bit of merit to it.

 

[Image: Intel Hot Chips slide comparing P-core and E-core performance against power]

https://semiengineering.com/the-next-generation-of-general-purpose-compute-at-hot-chips/

 

As you can see in Intel's graph above (left side), even a very power-limited P-core still has higher performance than an E-core. However, the right side shows a different story. There are other factors, like die size required: you can get a lot more E-cores in the same space, so it's much cheaper to put in E-cores to service multi-threaded workloads in that respect. Or you can operate the E-cores below the minimum power point of the P-cores.

 

What I would like to see on the right-hand graph is an "8 P-Cores" or "10 P-Cores" line; that's a detail I suspect was left off because it isn't as flattering as Intel wants, and it wouldn't fully communicate the benefits of the E-cores and the hybrid design. I'm pretty sure if you draw a line up from the P in "Power" on the left side of the graph, take that power, and then deploy 8 or 10 P-cores at that power level, it'll be faster than 2 P-cores and 8 E-cores while drawing the same power overall.

 

Maybe Intel just needs to go all-in and have a maximum of 2 or 4 P-cores and then massively scale E-cores. Having parts with 8 P-cores makes the hybrid approach a little weird, because you end up falling into the "why not just more P-cores then" debate.


8 minutes ago, leadeater said:

There are other factors, like die size required: you can get a lot more E-cores in the same space, so it's much cheaper to put in E-cores to service multi-threaded workloads in that respect. Or you can operate the E-cores below the minimum power point of the P-cores.

One P-core requires roughly as much space as 4 E-cores (Note: some things are mislabelled in this picture, but the cores are correct).

 

Intel could probably make a really massive chip with a ton of P-cores clocked really low to keep power in check, but my guess is that cost and yields are what prevent that from being a good idea. It would just be a very inefficient use of die space.
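To put that in rough numbers, here's a minimal sketch assuming the 1P ≈ 4E area figure above, plus my own assumed ballpark that one E-core delivers about half a P-core's multi-threaded throughput (not an Intel number):

```python
# Throughput-per-area comparison under assumed ratios:
#   area: 1 P-core = 4 E-cores (per the die shot figure above)
#   MT throughput: 1 E-core ~= 0.5 P-core (assumed ballpark)
P_AREA, E_AREA = 4.0, 1.0
P_PERF, E_PERF = 1.0, 0.5

def layout(p_cores: int, e_cores: int) -> tuple[float, float]:
    area = p_cores * P_AREA + e_cores * E_AREA
    perf = p_cores * P_PERF + e_cores * E_PERF
    return area, perf

for name, (p, e) in {"8P+8E (12900K-like)": (8, 8),
                     "10P+0E (same die area)": (10, 0),
                     "8P+24E (Raptor-like?)": (8, 24)}.items():
    area, perf = layout(p, e)
    print(f"{name}: area={area:.0f}, MT throughput={perf:.1f}")
# 8P+8E: area=40, MT=12.0 | 10P+0E: area=40, MT=10.0 | 8P+24E: area=56, MT=20.0
```

Under those assumptions the hybrid part wins MT throughput at the same die area while keeping 8 big cores for lightly threaded work, which is the whole pitch.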

 

 

13 minutes ago, leadeater said:

Maybe Intel just needs to go all-in and have a maximum of 2 or 4 P-cores and then massively scale E-cores. Having parts with 8 P-cores makes the hybrid approach a little weird, because you end up falling into the "why not just more P-cores then" debate.

I think their strategy is fine. Raptor Lake will double the number of E-cores, and I think 8 high performance cores is a good place to be right now. It allows them to compete in low-to-medium-threaded workloads up to 8 cores, and then be really good at multi-threading.

Chances are that if they reduced the number of P-cores, people would be worried about performance in, for example, games or "medium"-threaded applications.

 

Hopefully the i5 will also get some E-cores.

6 P-cores and 4 E-cores in a 200 dollar processor would be amazing.


13 minutes ago, LAwLz said:

I think 8 high performance cores is a good place to be right now.

True, right now I'd agree with that; maybe at some much later time that might change. Not really sure on that one, there are a lot of other things it depends on for it to be a good approach.

 

13 minutes ago, LAwLz said:

Chances are that if they reduced the number of P-cores, people would be worried about performance in, for example, games or "medium"-threaded applications.

Heh, very true. Though I'm not sure how many of "those" actually exist. Probably games right now, I would suspect, but then game engines understanding hybrid CPUs and doing intelligent thread placement may solve that, reducing the need for that many P-cores.

 

13 minutes ago, LAwLz said:

Hopefully the i5 will also get some E-cores.

6 P-cores and 4 E-cores in a 200 dollar processor would be amazing.

The zero E-cores in the i5 is actually the strangest thing about Alder Lake for me. I would have assumed this was the original target product for the hybrid design rather than the i7/i9, as is apparent from all the mobile CPUs. I guess the implementation cost was too high, so better to get that settled on higher-margin products.


58 minutes ago, leadeater said:

Heh, very true. Though I'm not sure how many of "those" actually exist. Probably games right now, I would suspect, but then game engines understanding hybrid CPUs and doing intelligent thread placement may solve that, reducing the need for that many P-cores.

Very likely, in the future.

And then the next step after that is to ratchet up the disparity between cores (sizes, design strengths, etc.) even more.

Think 4 P-cores that are 2x as large as current P-cores and get 20-40% better ST performance... and a truly massive number of E-cores.

Ideally you want to get to the point where there's a dash of hardware acceleration wherever you can and then specialized large and small cores that are complementary to the overall design (this is more or less what M1 is).

3900x | 32GB RAM | RTX 2080

1.5TB Optane P4800X | 2TB Micron 1100 SSD | 16TB NAS w/ 10Gbe
QN90A | Polk R200, ELAC OW4.2, PB12-NSD, SB1000, HD800
 


37 minutes ago, leadeater said:

Maybe Intel just needs to go all-in and have a maximum of 2 or 4 P-cores and then massively scale E-cores. Having parts with 8 P-cores makes the hybrid approach a little weird, because you end up falling into the "why not just more P-cores then" debate.

Do people really know what they want? It may be a problem to offer the market what people think they want versus what they really need.

 

For the purposes of how many P-cores we need, we'd have to characterise actual software, perhaps with some weighting for how much impact it has and how often it is used. UserBenchmark had a go and was crucified for being anti-AMD because it wasn't Cinebench. 3DMark CPU Profile had a go for gaming, but I don't know if anyone actually uses it seriously.

 

Before Alder Lake, I'd say I "needed" 8 cores to have a leading-edge gaming system, as it would at least match current-gen high-end consoles. I don't know if that is the case now. Maybe fewer P-cores and more E-cores are better? I don't have an Alder Lake system and don't intend to buy one while it is relevant, but I would love to do this testing. I'm sure someone else has done it, but it is always a challenge to see which variables were changed in tests. So the best I can do is fall back on having at least the performance you think you need.

 

A side thought does occur though. We know much software doesn't scale well with more cores (or threads). Even the software that we currently think of as scaling well, does that hold indefinitely? Back to Cinebench: I've only had systems up to 16 cores/32 threads, and at least up to that point it seems to scale near perfectly. Would that still hold at 64 threads? 128 threads? 256 threads? Adding a load more E-cores may be good for now, but it may not hold indefinitely and may require a change of plan in future. Maybe even get rid of HT/SMT on P-cores? It works against single-thread performance, so save the P-cores for ultimate single-thread perf. I don't recall the exact numbers, but in the best case HT gives a decent uplift in MT performance relative to the extra die space it takes up, though it is closer to break-even on average.
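For a feel of why I doubt it holds forever, Amdahl's law puts a ceiling on it. A minimal sketch with illustrative serial fractions (the values are assumptions, not measurements of Cinebench):

```python
# Amdahl's law: speedup(n) = 1 / (serial + (1 - serial) / n).
# Even a tiny serial fraction caps scaling hard at high thread counts.

def speedup(n_threads: int, serial_fraction: float) -> float:
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_threads)

for s in (0.001, 0.01, 0.05):
    row = ", ".join(f"{n}T: {speedup(n, s):5.1f}x" for n in (16, 64, 256))
    print(f"serial {s:.1%} -> {row}")
# serial 0.1% -> 16T:  15.8x, 64T:  60.2x, 256T: 204.0x
# serial 1.0% -> 16T:  13.9x, 64T:  39.3x, 256T:  72.1x
# serial 5.0% -> 16T:   9.1x, 64T:  15.4x, 256T:  18.6x
```

Near-perfect scaling at 32 threads is compatible with a serial fraction well under 1%, but the same workload could still fall off a cliff by 256 threads.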


Just now, porina said:

Do people really know what they want? It may be a problem to offer the market what people think they want versus what they really need.

Too true, even for myself lol. I have a 4930K for that very reason, not because I actually needed it now, then, or ever.

 

I know what I need most of the time, I just choose to ignore that and go with what I want.

 

3 minutes ago, porina said:

A side thought does occur though. We know much software doesn't scale well with more cores (or threads). Even the software that we currently think of as scaling well, does that hold indefinitely?

The group of software that is like CB and POV-Ray, and the group that isn't - yes, that is a problem. On the general desktop consumer side I do wonder how much software fits into the non-independent multi-thread category, aka not CB-like:

  • Games
  • Excel & Access
  • Ahh?

Actually, I'm drawing blanks on example software that isn't higher-end "pro"/actually-professional or hobby programmer/dev territory.


52 minutes ago, porina said:

A side thought does occur though. We know much software doesn't scale well with more cores (or threads). Even the software that we currently think of as scaling well, does that hold indefinitely? Back to Cinebench, I've only had systems up to 16 cores/32 threads, and at least up to that point it seems to scale near perfectly. Would that still hold at 64 threaed? 128 threads? 256 threads? Adding a load more E cores may be good for now, but may not hold indefinitely and it may require a change of plan in future. Even get rid of HT/SMT on P cores? They work against single thread performance. Save the P cores for ultimate single thread perf. I don't recall the exact numbers but in the best case HT does give a decent uplift in MT performance compared to the extra die space it takes up, but it is closer to break even on average. 

Cinebench is embarrassingly parallel; you should be able to subdivide it in perpetuity. Statistically speaking, as long as one thread doesn't end up stalling out (at which point it'd be a bottleneck and you might want to move that thread to a stronger core), you should be able to scale the bulk of the work with thread count. The risk of one thread stalling does increase as thread count goes up, but this might not be an issue if there isn't much variability (think a max of 100 processes with a mean time of 1000 and a standard deviation of 2 - running a simulation, this example has about half a percent slowdown, but YMMV for things with erratic memory access patterns).
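A quick Monte Carlo sanity check of that max-of-100 example; a minimal sketch, assuming per-task times drawn from Normal(1000, 2) with wall time set by the slowest task:

```python
# 100 perfectly parallel tasks; the run finishes when the slowest task does,
# so "slowdown" is (max task time / mean task time) - 1.
import random

random.seed(42)                     # deterministic for the example
trials, n_tasks, mean, sd = 1000, 100, 1000.0, 2.0
slowdowns = []
for _ in range(trials):
    times = [random.gauss(mean, sd) for _ in range(n_tasks)]
    slowdowns.append(max(times) / mean - 1.0)
print(f"avg slowdown: {sum(slowdowns) / trials:.2%}")   # ~0.50%
```

The expected maximum of 100 draws sits around 2.5 standard deviations above the mean, so 2.5 × 2 / 1000 ≈ 0.5%, matching the half-a-percent figure.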

Usually the main issue comes when you need to recombine all of the parts; this is likely to be less embarrassingly parallel and more sensitive to the performance of the fastest handful of threads.

 

A hypothetical design for Cinebench could have a profound number of weak cores and a small number of very powerful cores. Depending on the scheduler there might even be a case for mixing relatively wide "brainiac" cores with a few relatively narrow "speed demons", depending on the amount of instruction-level parallelism within the code being run. No idea how easy it is to figure that out at either the scheduler or the compiler level (probably hard - EPIC/Itanium failed for a reason).

 

 


10 hours ago, porina said:

A side thought does occur though. We know much software doesn't scale well with more cores (or threads). Even the software that we currently think of as scaling well, does that hold indefinitely? Back to Cinebench: I've only had systems up to 16 cores/32 threads, and at least up to that point it seems to scale near perfectly. Would that still hold at 64 threads? 128 threads? 256 threads?

From my experience with tile-based renderers - pretty much, yeah. The renderer creates a list of tiles to be rendered, and they're then distributed to the CPU cores to render in order. More CPU cores just means more cores reaching to take items from that list. All you have to do to maintain scaling is ensure there are enough items on the list to feed your cores, which you can do simply by reducing the size of each "tile" in your renderer. The more threads you have, the smaller you want to make your tiles, thereby preventing situations where you're waiting on a single slow tile to complete the render. This is why these sorts of workloads are such good targets for GPU compute - you can treat it as if you have thousands of threads available and feed each thread a single pixel to render.
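As a rough illustration of that pattern, here's a minimal sketch of a tile queue with worker threads; render_tile is a hypothetical stand-in for the real per-tile work, not code from any actual renderer:

```python
# Tile-queue pattern: a shared, thread-safe list of tiles, and N workers
# pulling tiles until the list is empty. (Illustrative only: a real renderer
# does this in native code; CPU-bound Python threads serialize on the GIL.)
import queue
import threading

def render_tile(tile):
    x, y = tile
    return sum(i * i for i in range(1000))    # pretend to shade some pixels

def worker(tiles: queue.Queue):
    while True:
        try:
            tile = tiles.get_nowait()   # one thread at a time takes a tile
        except queue.Empty:
            return                      # list drained: this worker is done
        render_tile(tile)

def render(width: int, height: int, tile_size: int, n_threads: int):
    tiles = queue.Queue()
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            tiles.put((x, y))           # smaller tiles -> more items to feed cores
    workers = [threading.Thread(target=worker, args=(tiles,))
               for _ in range(n_threads)]
    for t in workers: t.start()
    for t in workers: t.join()

render(1920, 1080, tile_size=64, n_threads=8)
```

The tile_size knob is the point: more threads want smaller tiles, so nobody sits idle waiting on one slow straggler at the end.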

 

Another issue you might encounter is when your cores render each tile too fast and start to cause a queue for the list - only one thread is allowed to read from the list at a time to prevent race conditions, since you don't want two threads accidentally rendering the same tile. This can create a quite literal bottleneck: only a fraction of your cores are rendering at any one time because the rest are stuck waiting to be given a job to do. Chances are in this case the image wouldn't take very long to render anyway, so in real life it wouldn't be an issue (or you would take it as a reason to render at higher quality), but that doesn't really work for benchmarks. This was one of the problems with Cinebench R15 towards the end of its lifespan - the renders were taking too little time to be truly representative of the CPU's compute capability.
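If the list itself becomes the bottleneck like that, one common mitigation (a sketch of a general technique, not what Cinebench actually does as far as I know) is to hand out tiles in batches so each thread takes the lock less often, reusing render_tile and the queue from the sketch above:

```python
import queue

def worker_batched(tiles: queue.Queue, batch_size: int = 8):
    # Grab up to batch_size tiles per queue access, then render them all.
    # Fewer queue operations means fewer threads stuck waiting in line.
    while True:
        batch = []
        try:
            for _ in range(batch_size):
                batch.append(tiles.get_nowait())
        except queue.Empty:
            pass                        # a partial batch is fine
        if not batch:                   # queue fully drained: we're done
            return
        for tile in batch:
            render_tile(tile)           # same per-tile work as above
```

The tradeoff is load balance: bigger batches mean less contention, but a higher chance one worker is left finishing a big batch while the others idle.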

 

One place I could see an issue occurring with really high core count CPUs is with IO - rendering is a very memory-intensive process, especially when you're talking production-grade scenes with 200+GB of 3D models and textures per frame. I imagine that cache sizes will be increased enough to at least partially mitigate this problem (there are rumors of such increases for 13th gen) but also the move to DDR5 should help this as well. This is also where the large number of memory channels you get on workstation hardware would be of great benefit.


8 hours ago, tim0901 said:

One place I could see an issue occurring with really high core count CPUs is with IO - rendering is a very memory-intensive process, especially when you're talking production-grade scenes with 200+GB of 3D models and textures per frame.

That was why I asked the question: what isn't a choke point now may not hold forever. I've tested Cinebench (as opposed to testing with Cinebench), and at least within consumer-attainable hardware it scales near ideally. It really doesn't seem impacted by practical RAM performance outside of competitive overclocking, where 10 points makes all the difference; single-channel DDR4 makes a margin-of-error difference in scores compared to dual-channel. Also, the code doesn't feel very dense to me. Even when they added AVX support with R20, it doesn't seem to make much use of it, as the performance doesn't scale with the known relative AVX performance of CPU generations. They even said they didn't gain any benefit from AVX-512, so they didn't implement it. It also gets relatively high scaling from HT/SMT, implying individual threads aren't able to load a core heavily. For those reasons it remains an interesting edge case, but for sure it does not represent most other use cases.


I'm using an i7 8700, which is getting slow for CPU intensive games like Cyberpunk 2077.

 

I find the progress of CPUs after Intel 9th gen to be brilliant! I skipped 12th gen to avoid the early-adopter bugs of Intel's hybrid big/little architecture, but I'll definitely upgrade to 13th gen and likely DDR5 once it comes out later this year.

 

The CPU and GPU arms race is only getting hotter, and is only soured by the supply/demand issues.


16 hours ago, porina said:

That was why I asked the question: what isn't a choke point now may not hold forever. I've tested Cinebench (as opposed to testing with Cinebench), and at least within consumer-attainable hardware it scales near ideally. It really doesn't seem impacted by practical RAM performance outside of competitive overclocking, where 10 points makes all the difference; single-channel DDR4 makes a margin-of-error difference in scores compared to dual-channel. Also, the code doesn't feel very dense to me.

You can't make Cinebench render scenes big enough to encounter the IO issues I mentioned - it renders a rather small, fixed scene, and that's it.

 

This is one of those scenarios where Cinebench becomes a bad indicator of real-world performance. Cinebench will scale pretty much perfectly with as many cores as you throw at it. A real-world tile-based renderer (e.g. Cinema4D, which Cinebench is based on) will find issues and bottlenecks during real-world scenarios that - due to it being a restricted benchmark suite - will not be encountered by Cinebench.

 

16 hours ago, porina said:

Even when they added AVX support with R20, it doesn't seem to make much use of it, as the performance doesn't scale with the known relative AVX performance of CPU generations. They even said they didn't gain any benefit from AVX-512, so they didn't implement it. It also gets relatively high scaling from HT/SMT, implying individual threads aren't able to load a core heavily. For those reasons it remains an interesting edge case, but for sure it does not represent most other use cases.

Cinema4D does support AVX-512 though (through the use of Embree). So it's not that renderers can't benefit from it - they very much still can - it's that Cinebench does not.

 

The reason AVX-512 didn't provide an uplift in Cinebench was attributed to the significant core clock reduction that older CPUs invoked when executing AVX-512 instructions in order to deal with the additional heat output - in the days of Skylake-X this was ~500MHz. These days this isn't the case: the 11900K can execute an 8-core AVX-512 workload without dropping its core clock. (Source) As such, I would expect AVX-512 to increase performance in Cinebench if such an option were available.

