Raptor Lake P-Core only SKUs

alex75871
On 12/20/2023 at 5:10 AM, Kisai said:

No, no, the reasoning behind this has always been that a higher clock speed matters more, and that has always been the case. If you have a CPU with 2 cores at 4 GHz (like most CPUs produced by Intel until they finally dropped the 2-core i3's with the 7th gen) or 4 cores at 2 GHz (like a Xeon), the better CPU is always the one with the higher clock. No DirectX 9/OpenGL or DX10 game can utilize more than 2 cores in the graphics pipeline because the graphics pipeline doesn't do that; any multithreading done by a game in DX9 or DX10 is done at the driver level in software, or because the developer went out of their way to find parallelization opportunities elsewhere. I have never seen a single game gain any level of performance from having more cores that wasn't attributable to the higher clock speed/boost of the better CPU.

This has literally nothing to do with whether a game uses more than 2 threads, at all.

 

Refer to

On 12/20/2023 at 2:22 AM, leadeater said:

Games using more than 2 cores has been long-standing now, and that includes DX11 games. The whole "doesn't use XYZ number of cores" claim has been well investigated by many reviewers; it stems from the faulty reasoning that if 6 threads are only at 30%-40% utilization on average then the game doesn't need 6 or isn't "using" 6. Background OS load is inconsequential: you can baseline your system doing nothing and it's legitimately not worth graphing when compared to the active game running on whatever number of threads it is using. You can disable 2 cores/threads and force the game to use only 4 on the same CPU, and many times the average utilization of the threads does not change but the average FPS and 1%/0.1% lows decrease, so the reduction in threads is obviously having a performance impact.

 

Threads do not have to be 100% or even 50% utilized on average to state they are being used. If you collect performance data and ensure repeatability and data accuracy, you can determine whether something is having an impact or not. If a game positively reacts and increases frame rate from having more threads available, then it is using those threads, regardless of how little on average.

I've made it all bold since if you don't read it all you miss the point. Just to shorten it to the critical point: if more threads increase actual FPS in the game, then it's using more threads. It is NOT OS background tasks; that is materially insignificant compared to, for example, 20% more FPS.

 

A game benefiting from higher clocks has nothing to do with how many threads it can and will use.

 

And your own final statement at the end of what I have quoted from you just shows you have not at all investigated this and it's 100% just your own misconceptions and assumptions. This is not hard information to find, literally on the same CPU with cores being disabled to prove the theory and clocks fixed. It's covered, more threads help and have done for AGES.

 

AKA, you have never looked.

 

Same CPUs used, cores disabled to match (where possible) from 4 cores/8 threads to 10 cores/20 threads

 

image.thumb.png.4fc7e477f2954c320096d36bd834c2ff.png

 

On 12/20/2023 at 5:10 AM, Kisai said:

Disabling cores on those higher core count CPU's allows the CPU to boost higher under the same TDP envelope. That's where that performance comes from.

Again false. Firstly, the difference in clocks, if any (I will cover this next), is not the difference in performance increase percentage-wise, so this is impossible to be the cause. Secondly, Intel boost is by design not uniform across cores, and 1 or 2 cores still boost to the same higher frequency in games if some were disabled. Since games are not heavy all-core workloads you aren't getting reduced clocks due to thermal or power limits, and Intel Turbo Boost 2.0 and 3.0 as well as TVB apply. Maximum possible allowed clocks on individual cores are achieved in gaming workloads.

 

Any reduction in maximum frequency, if any, on a per-core basis is minor and does not mathematically explain the performance difference, and therefore cannot be attributed as the reason/cause. Are you seriously saying that a 10900K with 4 cores enabled is boosting 19% higher than the same CPU with 6 cores enabled?

 

The 10900K's official Intel all-core specification is 4.8 GHz; 19% higher than that is 5.7 GHz, and the Max Turbo is 5.3 GHz. 5.7 GHz != 5.3 GHz.
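Spelled out, using only the figures above:

$$4.8\ \text{GHz} \times 1.19 \approx 5.71\ \text{GHz} > 5.3\ \text{GHz (Max Turbo)}$$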

 

Only 10 cores shows a regression from 8 cores, ~3.5%, which is the difference between Turbo Boost 2.0's 5.1 GHz and TVB's 5.3 GHz. But that isn't actually why, since like I said the clocks are fixed. On 10th Gen, having 2 more cores active in this game could use enough extra power and heat for TVB to not be active, but this is not a factor here. This is of course 10th Gen and 14nm, not today, and would not be observed across every game tested.

 

  

On 12/20/2023 at 5:10 AM, Kisai said:

Again, until DX12u/Vulkan, there was no ability to scale render threads without some proprietary extension that only worked on one GPU driver

Incorrect.

 

Ultimately if you are going to state anything like this

On 12/20/2023 at 12:40 AM, Kisai said:

Rarely does a game use more than 2 cores, let alone 1. Only very recent stuff using DX12u or Vulkan does, and even then most CPU use on other cores is by other services of the OS

Then you must be very sure you are correct and have done the due diligence to check, and you are neither correct nor have you done that diligence. Such a statement is critically important for many reasons, primarily that it misleads anyone who happens to believe it. Just take a moment to think about what feedback you'd be getting if you were a reasonably large review publication and put that information out on the internet.

 

And yes time relevancy does matter, a DX9 game from 2006 is irrelevant to buying advice for today or even 5-8 years ago.


7 hours ago, starsmine said:

Parallel threads are not defined as dependent on each other unless they are racing. There is no way to say the task will run at the speed of the slowest core's thread without assuming the fast core is waiting on a result from the slow core and is creating a massive pipeline bubble.

I can think of two examples where this happens, but neither are day-to-day use cases for most people. Both Prime95 and Y-cruncher work best when all threads working on the same task finish at the same time. They are working on the same core data set, so there are those dependencies. This is unlike Cinebench, where each thread is largely independent of the output of other threads, so it scales more ideally. It is my understanding that much modern gaming code is closer to Cinebench than Prime95.
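To illustrate the distinction, a minimal C++20 sketch of my own (not anything from the benchmarks above): the first worker is the Cinebench-like pattern where every thread runs to completion on its own, the second is the Prime95/Y-cruncher-like pattern where a barrier forces every pass to finish at the pace of the slowest thread.

```cpp
#include <atomic>
#include <barrier>
#include <cstdint>
#include <thread>
#include <vector>

std::atomic<std::uint64_t> sink{0}; // keeps the busywork from being optimised away

// Cinebench-like: each worker grinds through its own chunk and finishes whenever it finishes.
void independent_worker(std::uint64_t iters) {
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < iters; ++i) acc += i * i;
    sink += acc;
}

// Prime95/Y-cruncher-like: every pass ends at a barrier, so each pass runs at the
// speed of the slowest participating thread.
void synchronized_worker(std::uint64_t passes, std::uint64_t iters, std::barrier<>& sync) {
    std::uint64_t acc = 0;
    for (std::uint64_t p = 0; p < passes; ++p) {
        for (std::uint64_t i = 0; i < iters; ++i) acc += i * (p + 1);
        sync.arrive_and_wait(); // all threads must agree before the next pass starts
    }
    sink += acc;
}

int main() {
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 4; // hardware_concurrency may report 0
    std::barrier<> sync(static_cast<std::ptrdiff_t>(n));
    std::vector<std::thread> pool;

    for (unsigned t = 0; t < n; ++t) pool.emplace_back(independent_worker, 50'000'000ULL);
    for (auto& th : pool) th.join();
    pool.clear();

    for (unsigned t = 0; t < n; ++t)
        pool.emplace_back(synchronized_worker, 100ULL, 500'000ULL, std::ref(sync));
    for (auto& th : pool) th.join();
}
```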

 

5 hours ago, leadeater said:

Same CPUs used, cores disabled to match (where possible) from 4 cores/8 threads to 10 cores/20 threads

image.thumb.png.278e2886b9340fac5fef46fa27293df0.png

I did a similar test in the past. Can't find my original chart so had to grab from video.

Watch Dogs Legion, 1080p low preset. 7920X with cores/threads disabled. 3070. I was surprised to see scaling beyond 8c16t, however small. This was also done to get some data on the cores vs threads argument, where I have repeatedly argued thread count is far less important than cores.

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, RTX 4070, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, random 1080p + 720p displays.
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


13 hours ago, Kisai said:

Again, for a GAME, the highest single-thread performance is the single most important performance metric, particularly for anything that isn't DirectX 12u/Vulkan.

 

For all other uses, if something can min-max the CPU cores (3D rendering or video editing), then equally performing cores are better than not. Asymmetric cache across cores and cores that don't boost evenly will cause every task to run at the speed of the slowest core on the system. This is why the scheduler has to be intelligent about how it deals with those cores, otherwise your 16-core CPU with 3D cache will perform the same as one without it.

 

Certain loads also do not benefit from parallelization at all, such as compression. Compression is linear. Encryption is linear.

 

Not for a long time now. Yes, it helps, but AMD proved that tons of cache does wonders too.

 

Also no, compression isn't linear. Nearly all compression is HEAVILY parallelized, with the exception of images and audio. 7z (LZMA2) can work on as many cores as you can throw at it and eat as much RAM as you can give it. I've also done enough video encoding to know H.264 and H.265 can encode on all the cores you have, as I used that as a stability benchmark for a long while.
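For anyone curious what that block-parallel approach looks like in practice, here is a rough sketch of my own (zlib's one-shot compress() API; error handling and the container format are omitted, link with -lz). The same split-into-independent-blocks idea is what LZMA2-style multithreading relies on, and it is also why very small blocks compress slightly worse than one big stream: each block starts with an empty dictionary.

```cpp
#include <zlib.h>
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Compress one block on its own; blocks share nothing, so they can run on any core.
std::vector<unsigned char> compress_chunk(const unsigned char* data, std::size_t len) {
    uLongf out_len = compressBound(static_cast<uLong>(len));
    std::vector<unsigned char> out(out_len);
    compress(out.data(), &out_len, data, static_cast<uLong>(len)); // zlib one-shot API
    out.resize(out_len);
    return out;
}

std::vector<std::vector<unsigned char>>
parallel_compress(const std::vector<unsigned char>& input, std::size_t chunk_size) {
    std::vector<std::future<std::vector<unsigned char>>> jobs;
    for (std::size_t off = 0; off < input.size(); off += chunk_size) {
        std::size_t len = std::min(chunk_size, input.size() - off);
        jobs.push_back(std::async(std::launch::async, compress_chunk,
                                  input.data() + off, len));
    }
    std::vector<std::vector<unsigned char>> blocks;
    for (auto& j : jobs) blocks.push_back(j.get());
    return blocks; // blocks are written out in order, but the compression itself ran in parallel
}
```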

 

Most modern games use more than 1 or 2 cores. Basically any modern game you try will max out pretty much all cores if you're not GPU bottlenecked, in which case the CPU will be standing around doing less. Old games run so fast on modern single cores that it just doesn't matter anymore.


14 hours ago, igormp said:

Modern compression and encryption algorithms do scale with more cores. All of your assumptions seem to be related to software from 10 years ago.

Tile-based lossy video compression can scale across cores, but the cores have to be the same clock/cache, otherwise it will be waiting on disk I/O. Disk I/O within the same file does not parallelize, because data has to be written in order.

 

Lossless compression can NOT benefit from more cores, because it depends on the data that came before it; otherwise you trade efficiency for speed. So if you split an image in half to compress the top and the bottom of the image, one half cannot make use of any compression done by the other core. Divide that by 16, like you would on an 8-core/16-thread part, and you should probably just give up trying to losslessly compress that. If it's a lossy image, then it doesn't rely on the compression of the other parts of the image, because it's throwing data away based only on the data in that tile. So if the quality is set to X-1, then it's going to -1 every tile. That's how JPEG, MPEG, etc. all work. You can parallelize it only because it doesn't rely on anything else. However, have you ever seen what happens when you play back a video on a multi-core system when the cores aren't in sync? You get tearing.

 

You know where else you get screen tearing? Games. This is not even something you can benchmark for. But it's very distracting.

 


13 hours ago, leadeater said:

 

image.thumb.png.4fc7e477f2954c320096d36bd834c2ff.png

 

Again false. Firstly, the difference in clocks, if any...

This is nothing more than a clock speed argument. If a game truly used more threads, then the frame rate would be exactly the same when the same number of cores are available. Which is not what that chart shows. That chart shows the faster the CPU, the faster the frame rate, regardless of the cores. That difference between the 10 core and the 8 core shows that, because that's the SAME chip.

 

 

 

13 hours ago, leadeater said:

And yes time relevancy does matter, a DX9 game from 2006 is irrelevant to buying advice for today or even 5-8 years ago.

So you're supposed to base your purchase decision on games and software you do not yet own, over software you have regularly used for the last decade? Sorry, I thought we were talking about computers and not televisions. Do I feel sorry for those people who bought 3DTVs and then no content materialized? No. You could still use them to watch the content you already have. But good luck finding a new 3DTV to watch any 3D content once that TV dies.

 

Look, you seem to have swallowed some kind of koolaid about more cores improving all workloads, which is not the case, and has never been the case. If a piece of software has not been designed to use hardware threads, then you are relying on the OS to schedule those threads. Compare running a game on Windows 95, where the OS barely even used multiple threads, versus XP, when it first became possible to run with multiple cores/threads on a consumer device. Do you think developers thought about multiple CPUs at the time? No. Games and applications were designed to spinwait; there was only one CPU. Games didn't stop doing spinwait because they were developed for consoles. Only when a game was developed for the PC first was this behavior avoided. A properly threaded program gives time back to the thread scheduler. Games don't do this because they are constantly drawing.

 

No Unity game ever benefits from having more cores if you stick to only managed code. None. It's not even possible to utilize threads that way. You're told to use coroutines, which aren't threads, or to reach into unmanaged C++ native libraries that do use them. And when you start doing that you again run into problems where thread-safe code tends to be rare, except on Windows where it has to be thread-safe. WebGL exports do NOT support threads because JavaScript does not support threads.

 

So you know what that means? It means nothing developed on Unity, or anything else that exports to WebGL (e.g. Unreal, Godot, etc.), is going to encourage you to use threads. Use the native game engine coroutines and quit trying to optimize the game. It's impossible.

 

That means the highest single-thread clock speed matters more, in ALL client-facing workloads. When you start talking about servers and non-user-facing tasks, the answer is reversed: more cores of the same performance matter more, e.g. video compression, AI training, web servers, game servers, anything where the user doesn't need to care about user-facing performance. It doesn't matter if video compression or AI training takes a few seconds less if you update the screen or log file less frequently to save on unnecessary disk or pipe I/O. But again, web servers are often deployed in non-thread-safe manners because it's been the habit to do that for 30 years. If you actually spend a few hours and configure everything to run thread-safe and not fork every new access to the server, you gain a 100-fold increase in performance because the threads aren't being torn down and tasks aren't being forked. But that only works for static content. This is why PHP has such a terrible reputation: most PHP software is designed in a vacuum to not be thread safe. Sure, you could gain more performance by properly making your PHP program *cough*wordpress*cough* thread safe, but maybe it would be better to just put that effort into making the website less dynamically generated by the server so it can cache things, and rely on the client (the web browser, a la "web 2.0") to populate the data from JSON.

 

The point I'm making here is that you have to consider what you are using it for. Gamers are better off picking whatever the current "i7" or "i9" Intel CPU or R7/R9 from AMD is and just ignoring everything else. They have no voice in what Intel or AMD do, so unless you want to sit on an 11th gen CPU for 7 years to wait until Intel's big.LITTLE experiment gains better compatibility, if you need something now, single-thread performance is the only thing that tells you if one CPU is better than another.

 

(2-1) 3D Particle Movement v2.1 (non-AVX)

(2-2) 3D Particle Movement v2.1 (Peak AVX)

https://www.anandtech.com/show/18693/the-amd-ryzen-9-7900-ryzen-7-7700-and-ryzen-5-5-7600-review-ryzen-7000-at-65-w-zen-4-efficiency/3

 

Then you have this:

SPECint2017 Rate-1 Estimated Scores

https://www.anandtech.com/show/17601/intel-core-i9-13900k-and-i5-13600k-review/6

Do any of those individual numbers matter? No. But it shows you where single thread performance matters more.

 

Quote

Opening things up with SPECint2017 single-threaded performance, it's clear that Intel has improved ST performance for Raptor Lake on generation-upon-generation basis. Because the Raptor Cove P-cores used here don't deliver significant IPC gains, these performance gains are primarily being driven by the chip's higher frequency. In particular, Intel has made notable progress in improving their v/f curve, which allows Intel to squeeze out more raw frequency.

 

And this is something Intel's own data backs up, with one of Intel's performance breakdown slides showing that the bulk of the gains are due to frequency, while improved memory speeds and the larger caches only making small contributions.

Raptor%20Lake%20Slides_31.png

Some of Anandtech's charts from the 12th gen also show the 11th gen outperforming the 12th gen when AVX is involved.

 

(2-2) 3D Particle Movement v2.1 (Peak AVX)

https://www.anandtech.com/Show/Index/17267?cPage=8&all=False&sort=0&page=2&slug=the-intel-core-i7-12700k-and-core-i5-12600k-review-high-performance-for-the-mid-range

 

So that 11th gen Intel part, still outperforms the 13th gen, but not the Ryzen 7xxx parts in AVX. 

 

So if you are quite literally looking to do workloads that depend on AVX, Intel threw that away with the 12th gen. If you were doing software AV1/HEVC/AVC compression, now the Intel CPU is the worse part. Yet, why would you be doing that work on anything other than the highest-end consumer parts anyway?

 

The smart thing would be to return AVX512 to the desktop chips, and figure out a way for AVX512 loads to be paused if they are shoved to e-cores rather than not supported and breaking workloads.

 


1 hour ago, Kisai said:

but the cores have to be the same clock/cache

No, they do not.

1 hour ago, Kisai said:

otherwise it will be waiting on the disk i/o

Not a thing, disk i/o is async in most places.

1 hour ago, Kisai said:

Disk i/o within the same file does not parallelize, because data has to be written in order.

You can have a buffer and then reorder stuff; that's a non-issue.

 

1 hour ago, Kisai said:

Lossless compression can NOT benefit from more cores

Wrong, see FFV1: it is totally capable of multithreading and does scale with more threads.

1 hour ago, Kisai said:

otherwise you trade efficiency for speed.

First you say that it cannot benefit, then say that it can but with tradeoffs. Make up your mind fam.

1 hour ago, Kisai said:

and you've probably should just give up trying to losslessly compress that

It's still pretty much relevant for videos. For images and audio the file sizes are usually so small that it's not worth the speedup; you're better off just doing 1 job per thread with images/audio.

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga


8 minutes ago, Kisai said:

This is nothing more than a clock speed argument. If a game truly used more threads, then the frame rate would be exactly the same when the same number of cores are available. Which is not what that chart shows. That chart shows the faster the CPU, the faster the frame rate, regardless of the cores. That difference between the 10 core and the 8 core shows that, because that's the SAME chip.

You're taking the piss right? 
Same chip, same clock speed, went up in frame rate from 4/8, to 6/12, to 8/16 cores/threads and hits a ceiling, and you say that it's the clock speed making it go up?
There are two different bottlenecks happening here. Yes, one is clock speed, but you have a (not random) distribution of what a specific frame is actually being bottlenecked by. Here you can distill this to a simple bivariate statistical problem: F_X,Y(x, y) = some Gaussian distribution, with X being how much it is bottlenecked by clock speed and Y being how much it is bottlenecked by active thread count. The result of that function is your frame rate. (Normalizing Z, the GPU, out of the equation and normalizing W, the memory bandwidth, out of the equation by turning them to infinity is standard statistical modeling.) To read that chart and come to the conclusion that the Y variable doesn't have a significant effect blows me away.
 

Also, to point out, benchmarking reviews always use PCs in unrealistic ways to eliminate variables. Who in the modern era is playing video games without Discord in the background, Spotify running, and a Chrome browser open with at minimum a dozen tabs?

15 minutes ago, Kisai said:

So you're supposed to base your purchase decision on games and software you do not yet own, over software you have regularly used for the last decade? Sorry, I thought we were talking about computers and not televisions. Do I feel sorry for those people who bought 3DTVs and then no content materialized? No. You could still use them to watch the content you already have. But good luck finding a new 3DTV to watch any 3D content once that TV dies.

 

Look, you seem to have swallowed some kind of koolaid about more cores improving all workloads, which is not the case, and has never been the case. If a piece of software has not been designed to use hardware threads, then you are relying on the OS to schedule those threads. Compare running a game on Windows 95, where the OS barely even used multiple threads, versus XP, when it first became possible to run with multiple cores/threads on a consumer device. Do you think developers thought about multiple CPUs at the time? No. Games and applications were designed to spinwait; there was only one CPU. Games didn't stop doing spinwait because they were developed for consoles. Only when a game was developed for the PC first was this behavior avoided. A properly threaded program gives time back to the thread scheduler. Games don't do this because they are constantly drawing.

What? I am so lost as to the point being made here. Even if all your software is using single-threaded code, the OS can put different instances on different threads.

23 minutes ago, Kisai said:

The point I'm making here is that you have to consider what you are using it for. Gamers are better off picking whatever the current "i7" or "i9" Intel CPU or R7/R9 from AMD and just ignoring everything else. They have no voice in what Intel or AMD do, so unless you want to sit on a 11th gen CPU for 7 years to wait until Intel's big.little experiment gains better compatibility, if you need something now, the single-thread performance is the only thing that tells you if one CPU is better than another.

If this is the point being made, the arguments do not track. The better compatibility is now, it's today. 

 

26 minutes ago, Kisai said:

Some of Anandtech's charts from the 12th gen also show the 11th gen outperforming the 12th gen when AVX is involved.

 

(2-2) 3D Particle Movement v2.1 (Peak AVX)

https://www.anandtech.com/Show/Index/17267?cPage=8&all=False&sort=0&page=2&slug=the-intel-core-i7-12700k-and-core-i5-12600k-review-high-performance-for-the-mid-range

 

So that 11th gen Intel part, still outperforms the 13th gen, but not the Ryzen 7xxx parts in AVX. 

 

So if you are quite literally looking to do workloads that depend on AVX, Intel threw that away with the 12th gen. If you were doing software AV1/HEVC/AVC compression, now the Intel CPU is the worse part. Yet, why would you be doing that work on anything other than the highest-end consumer parts anyway?

 

The smart thing would be to return AVX512 to the desktop chips, and figure out a way for AVX512 loads to be paused if they are shoved to e-cores rather than not supported and breaking workloads.

 

Again, I'm not sure of the point being argued here. You are correct: if you do a workload that can utilize AVX, the desktop variants of Intel are not a smart purchase. I'm not sure what that has to do with the topic at hand.


6 hours ago, starsmine said:

You're taking the piss right? 
Same chip, same clock speed, went up in frame rate from 4/8, to 6/12, to 8/16 cores/threads and hits a ceiling, and you say that it's the clock speed making it go up?

 

 

Yes. You turn the cores off, and now there's higher headroom. That is what the 8 to 10 core explains. A game doesn't magically spread itself across more threads because it doesn't have more threads to spread out. The difference between one game sticking all its physics on one thread, and one sticking every destructible object on its own thread, is going to scale differently, but that STILL REQUIRES IT TO HAVE BEEN DESIGNED TO DO SO.

 

Here, here's one game developer explaining multithreading in a game

image.thumb.png.e3a11f389fcbeb610a79bff0cae3700d.png

 

Emphasis: cores. If the game has no way to poll the number of cores in the system, like games developed inside another game engine or inside the web browser, it is going to be tuned to whatever the lowest common denominator is. Which is 2.

image.thumb.png.f1f6da57174ce95a6d795ea34a6e92e0.png

 

image.thumb.png.61432d6d5d96edda7930925e3f12bde8.png

 

I'm not saying anything that isn't being said by game developers.
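(For reference, polling the core count in native code is trivial; below is a minimal sketch of my own, with an assumed reserve-one-thread-for-the-main-loop heuristic rather than anything from the quoted developers.)

```cpp
#include <algorithm>
#include <cstdio>
#include <thread>

int main() {
    // std::thread::hardware_concurrency() reports the number of hardware threads,
    // or 0 when the value is not computable, so a fallback is needed.
    unsigned hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 2; // conservative lowest common denominator

    // Hypothetical heuristic: keep one thread for the main/render loop and
    // hand the rest to a worker/job pool.
    unsigned workers = std::max(1u, hw - 1);
    std::printf("hardware threads: %u, worker pool size: %u\n", hw, workers);
}
```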

 

6 hours ago, starsmine said:

 

Also, to point out, benchmarking reviews always use PCs in unrealistic ways to eliminate variables. Who in the modern era is playing video games without Discord in the background, Spotify running, and a Chrome browser open with at minimum a dozen tabs?

Chrome is just making things worse; CEF in general is making things worse, because you don't need an entire damn full-featured web browser to animate a GIF.

 

6 hours ago, starsmine said:

 

Again, I'm not sure of the point being argued here. You are correct: if you do a workload that can utilize AVX, the desktop variants of Intel are not a smart purchase. I'm not sure what that has to do with the topic at hand.

Intel's P-cores have nerfed AVX, so if you were buying or developing software that requires AVX-512, suddenly you're being shortchanged. Now a game, or an emulator, or a new video codec, or an AI inference process, or whatever else that might use it, can't, and it now has to be assumed that nobody has AVX-512. Thanks Intel, I hate it. I'd rather have AVX-512 working on the P-cores and find a way to make sure AVX-512 work does not hand off to E-cores, or have the E-cores keep all the registers for AVX but no AVX unit, so the work can be handed back.

 

 

 


1 hour ago, Kisai said:

Yes. You turn the cores off, and now there's higher headroom. That is what the 8 to 10 core explains. A game doesn't magically spread itself across more threads because it doesn't have more threads to spread out.

This makes no sense whatsoever.

1 hour ago, Kisai said:

Here, here's one game developer explaining multithreading in a game

 

Bad example of a studio that only deals with RTS games, which are famously known to not scale well with many cores.

 

1 hour ago, Kisai said:

like games developed inside another game engine or inside the web browser

False, that has been possible even in browsers for over 10 years, you really need to update your knowledge on this topic instead of spreading misinformation.

1 hour ago, Kisai said:

and it now has to be assumed that nobody has AVX-512.

AMD's adoption actually made it fairly widespread; many programs were updated throughout this year to add support for AVX-512, and even check for it at runtime instead of compile time.
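A minimal sketch of what that runtime check can look like (my example, using the GCC/Clang builtins; the function names are placeholders, not from any particular program):

```cpp
#include <cstdio>

// The real implementations would live in separately compiled, target-specific
// translation units; stubs stand in for them here.
void sum_fallback() { std::puts("scalar/AVX2 fallback path"); }
void sum_avx512()   { std::puts("AVX-512 path"); }

int main() {
    __builtin_cpu_init(); // required before __builtin_cpu_supports on some GCC versions
    // Dispatch at runtime instead of compile time, so one binary runs everywhere
    // and simply uses AVX-512 when the CPU (e.g. Zen 4) reports it.
    if (__builtin_cpu_supports("avx512f"))
        sum_avx512();
    else
        sum_fallback();
}
```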

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga


On 12/21/2023 at 6:22 AM, Kisai said:

This is nothing more than a clock speed argument. If a game truly used more threads, then the frame rate would be exactly the same when the same number of cores are available. Which is not what that chart shows. That chart shows the faster the CPU, the faster the frame rate, regardless of the cores.

Do you have a comprehension problem? Look at the chart and ONLY look at the 10900K, for example. Is the framerate higher when only 4 cores are enabled in the BIOS and the system then booted and benchmarked, or is the framerate higher with 6 cores enabled?

 

The boost clocks are exactly the same. And no, they are not different, and even IF they were (they are not), the CPU is completely unable to boost to the frequency required to give that performance difference. Worse yet, if what you were originally saying were true, the 4-core-enabled 10900K would be faster than the 6-core-enabled 10900K, because the 6-core-enabled 10900K's frequency would be lower.

 

No matter which way you spin it, the 6-core configuration is faster than the 4-core one; no amount of frequency difference is or could be making up the gap.

 

Quote

That difference between the 10 core and the 8 core shows that, because that's the SAME chip.

Shows what exactly? How does this support your original statement? Please do explain, because I legitimately have no idea what you think it's showing or how it supports what you are saying.

 

You claimed games don't use more than 2 threads, often only 1, and I've given you direct evidence with isolated variables that this is not true. Are you still arguing against this evidence? I would actually like to stick to the point.

 

On 12/21/2023 at 6:22 AM, Kisai said:

So you're supposed to base your purchase decision on games and software you do not yet own, over software you have regularly used for the last decade? Sorry, I thought we were talking about computers and not televisions. Do I feel sorry for those people who bought 3DTVs and then no content materialized? No. You could still use them to watch the content you already have. But good luck finding a new 3DTV to watch any 3D content once that TV dies.

Firstly, what you use is for you; do not proclaim a universal statement of truth that is actually only your perception of what is true. Games have been using more than 2 cores and 2 threads for more than 10 years; this is not new. It may not matter to you and what you use regularly, but that doesn't make it right for you to go around saying what you did, because it's not correct and is just spreading misinformation.

 

This is not an argument about the future: games use and have used more than 2 threads for 10 years, and the closer to the present, the more true that is and the fewer exceptions to this general norm exist. It was correct 10 years ago to say games use more than 2 threads.

 

Preferring to purchase a CPU with higher per-thread performance, which can still today result in higher game performance even in the most optimally threaded game, does not negate that the game can and does use more threads and will benefit from more. These are not mutually exclusive. In literally every situation the 4-core variant of the 10900K performed worse than the 6-core variant, both operating at 4.5 GHz, and the 10-core configuration operating at 4.5 GHz was still faster than both the 4-core and the 6-core. You are more than welcome to ask Hardware Unboxed to do it again on a 14900K, but you'd only be wasting their time since the result will not fundamentally change, and worse yet for your position the 10-core vs 8-core 10900K situation won't happen.

 

There are elements of what you say that are true, like single-thread performance mattering, but the way you apply and treat that, along with what you say and the way you say it, is problematic. This is not a discussion about more cores always being better; it's about the issue of you saying games don't use more than 2 threads. So again, let's isolate it to just this, because that is the specific misinformation that concerns me, why I chose to address it, and what my evidence seeks to address and show you (and anyone else).


14 hours ago, igormp said:

This makes no sense whatsoever.

Bad example of a studio that only deals with RTS games, which are famously known to not scale well with many cores.

Then you've missed the point. The game has to be designed to use threads, and if you over-subscribe the cores you actually lower the performance. So that difference between 8 and 10 cores, likely reveals that the game has been designed to not use 10 cores, only 8. That's why it went DOWN with more cores.

 

RTS games and simulator games are notorious for not scaling because they need to run on everything. If a game is released that requires an 8-core CPU, then it's going to not run at ALL, or the performance is going to cut itself directly in half if you run it on a 4-core.

 

Leadeater comes to the wrong conclusion from the chart.

 

 

 

14 hours ago, igormp said:

False, that has been possible even in browsers for over 10 years, you really need to update your knowledge on this topic instead of spreading misinformation.

No browser supports threads. Period. If you export a game from Unreal, Unity, Godot, etc to run in the browser, you only have 1 thread.

https://caniuse.com/sharedarraybuffer and even if you tried, support doesn't exist on Android.

https://forum.unity.com/threads/webgl-roadmap.334408/

Quote

We posted some benchmarks on WebGL performance last year, comparing browsers and native runtimes, which showed decent performance across the board in some areas, and larger gaps both between browser, and between WebGL and native in others. We want to make sure that those gaps will become as small as possible in the future. Most of the benchmarks where we have been seeing large gaps between native and WebGL are in areas which are heavily optimized to use SIMD and/or multithreading, neither of which are available on WebGL right now, but that will change:

 

  • SIMD.js: SIMD.js is a specification to add SIMD support to the JavaScript language. Mozilla, Google and Microsoft are all planning to support this. We will be able to use this to get the same SIMD performance improvements we get on other platforms right now on WebGL as well.
     
  • Shared Array Buffers: Shared Array Buffers will let WebWorkers (JavaScript’s equivalent to threads) share the same memory, which makes it possible to make existing multithreaded code compile to JavaScript. Mozilla has a working implementation of this spec, and they have successfully been running Unity WebGL content with multithreading enabled on this. They ran our benchmark project, with very good results on some of the benchmarks, resulting in several times higher scores when running multi-threaded. Google has also announced plans to add support for Shared Array Buffers

 

And then

https://forum.unity.com/threads/multithreading-and-webgl.817986/#post-6796475

Quote

To get to C# multithreading with enabling multithreading of user C# code, a complete redesign and reimplementation of how garbage collection works in C# on the web will be needed. This is because one cannot scan the native stack (due to security restrictions) on the web, which is the way that our existing Boehm library based garbage collection design works.

 

So, no Unity game is using multithreading if it's using managed code. Exactly what I said.

 

14 hours ago, igormp said:

AMD's adoption actually made it fairly widespread; many programs were updated throughout this year to add support for AVX-512, and even check for it at runtime instead of compile time.

Uh huh, and are the system requirements for these applications going to come with "Only works on AMD Ryzen 7xxx CPUs"? Because until Intel brings it back, people are going to be making their purchase decisions based on whether they really need AVX-512.


7 minutes ago, Kisai said:

Then you've missed the point. The game has to be designed to use threads, and if you over-subscribe the cores you actually lower the performance. So that difference between 8 and 10 cores, likely reveals that the game has been designed to not use 10 cores, only 8. That's why it went DOWN with more cores.

You are moving the goalposts at this point.

Let's refresh what you said before that started this discussion:

On 12/19/2023 at 8:40 AM, Kisai said:

The problem is simply that people don't know what they need. Like does any Game benefit from more cores? Rarely does a game use more than 2 cores, let alone 1.

So now all of a sudden games can make use of 8 cores? Ok then.

9 minutes ago, Kisai said:

RTS games and simulator games are notorious for not scaling because they need to run on everything.

No, that's not it. Many simulators scale well with core count (see racing sims, flight sims and stuff like Cities: Skylines).

RTSes are hard because the main bulk of compute is on the AI, and since it's turn based, everything has to happen sequentially.

12 minutes ago, Kisai said:

If a game is released that requires an 8-core CPU, then it's going to not run at ALL, or the performance is going to cut itself directly in half if you run it on a 4-core.

That's blatantly false.

13 minutes ago, Kisai said:

No browser supports threads

All browsers do, the link you gave even shows support for this kind of array in almost all browsers.

13 minutes ago, Kisai said:

If you export a game from Unreal, Unity, Godot, etc to run in the browser, you only have 1 thread.

Again, false.

14 minutes ago, Kisai said:

We posted some benchmarks on WebGL performance last year

WebGPU is a thing and it's miles better, fwiw, and Unity already has it in preview mode.

15 minutes ago, Kisai said:

So, no Unity game is using multithreading if it's using managed code. Exactly what I said.

 

Who cares about Unity? You said before that no browser supports this kind of thing, which I said is false.

Anyhow, this quote of yours is only related to C# on WebGL; Unity on other platforms does support proper multithreading with its job system abstraction.

17 minutes ago, Kisai said:

Uh huh, and are the system requirements for these applications going to come with "Only works on AMD Ryzen 7xxx CPUs"? Because until Intel brings it back, people are going to be making their purchase decisions based on whether they really need AVX-512.

Are you really that dense?

14 hours ago, igormp said:

and even check for it at runtime instead of compile time.

 

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga


25 minutes ago, Kisai said:

Then you've missed the point. The game has to be designed to use threads, and if you over-subscribe the cores you actually lower the performance. So that difference between 8 and 10 cores, likely reveals that the game has been designed to not use 10 cores, only 8. That's why it went DOWN with more cores.

 

RTS games and simulator games are notorious for not scaling because they need to run on everything. If a game is released that requires an 8-core CPU, then it's going to not run at ALL, or the performance is going to cut itself directly in half if you run it on a 4-core.

 

Leadeater comes to the wrong conclusion from the chart.

So, the game is designed to be multithreaded, but not n-threaded. That is the thing we all said has been happening for well over 10 years, which you kept saying wasn't. You kept claiming it's not multithreaded, that it's ONLY single-core performance that matters. You are the one who keeps saying that is the conclusion to draw... your statement is so clearly and obviously faulty here that none of us are able to understand what point you are trying to make.

In general, additional cores do help gaming performance up until a constant value that varies per game.
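(A standard way to put a number on that ceiling, my framing rather than anything posted above, is Amdahl's law: if a fraction p of the per-frame CPU work parallelizes across n cores, then

$$S(n) = \frac{1}{(1-p) + p/n}, \qquad \lim_{n\to\infty} S(n) = \frac{1}{1-p},$$

so e.g. p = 0.75 caps the speedup at 4x no matter how many cores are added.)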

I want to point out that the performance from 8 to 10 didn't really decrease; the 1% lows stayed exactly the same. It simply stagnated in growth because it's not n-threaded, which no one here has claimed to be the case.

 

Multi-threading != n-threading; n-threading is just one type of multi-threading.

None of us has argued single-core performance does not matter, if that is the strawman you are arguing against. I told you already it's a bivariate problem.


3 minutes ago, igormp said:

 

Are you really that dense?

 

Again, are you going to purchase a software product that says "Requires an AMD 7000 CPU" when you have an Intel chip? No. You're not.

 

If a software product requires AVX-512, and its fallback mechanism runs 100x slower, you're going to either buy the AMD chip if you need that software, or you're going to find another software product that maybe uses the GPU to do the same thing.

 

To go back to the 3DTV analogy: a technology gets stifled in adoption because the parts do not come together at the same time. So the most likely thing that is going to happen is that software will not be designed or optimized to use AVX because it can't be assumed to exist. Nice if it does, but you as the software developer are not going to give your product a black eye if you don't make it work on all CPUs, and you are going to get review bombed if it seems to only support one vendor's CPU.

 

Or have we learned nothing from the DLSS/FSR/XeSS support on GPU's when a game seems to only support one of them?


16 minutes ago, Kisai said:

Again, are you going to purchase a software product that says "Requires an AMD 7000 CPU" when you have an Intel chip? No. You're not.

 

If a software product requires AVX-512, and its fallback mechanism runs 100x slower, you're going to either buy the AMD chip if you need that software, or you're going to find another software product that maybe uses the GPU to do the same thing.

 

To go back to the 3DTV analogy: a technology gets stifled in adoption because the parts do not come together at the same time. So the most likely thing that is going to happen is that software will not be designed or optimized to use AVX because it can't be assumed to exist. Nice if it does, but you as the software developer are not going to give your product a black eye if you don't make it work on all CPUs, and you are going to get review bombed if it seems to only support one vendor's CPU.

 

Or have we learned nothing from the DLSS/FSR/XeSS support on GPU's when a game seems to only support one of them?

Software with AVX-512 fallbacks is not 100x slower.
Gow3Comparison.png
SSE2/SSE4.1/AVX2/AVX512 from left to right
Raptor lake has AVX2

But I also don't understand the point you are arguing. Because Intel consumer parts don't have AVX-512, nothing will use it in the future, so it's not a selling point?
If software can be accelerated with AVX-512, there is no reason not to have a runtime check to see if it's there. I'm not even sure what that point has to do with the discussion you have been having.

Also, the same thing applies to games. Multithreading does not mean you are locking out CPUs with 4 or fewer cores just because your game is capable of scaling to 6 or 8 cores (or more with many DX12/Vulkan titles; Battlefield V was shown as a DX11 title to dispute your point way back when).


54 minutes ago, starsmine said:

So, the game is designed to be multithreaded, but not n-threaded. That is the thing we all said has been happening for well over 10 years, which you kept saying wasn't. You kept claiming it's not multithreaded, that it's ONLY single-core performance that matters. You are the one who keeps saying that is the conclusion to draw... your statement is so clearly and obviously faulty here that none of us are able to understand what point you are trying to make.

In general, additional cores do help gaming performance up until a constant value that varies per game.

I want to point out that the performance from 8 to 10 didn't really decrease; the 1% lows stayed exactly the same. It simply stagnated in growth because it's not n-threaded, which no one here has claimed to be the case.

 

Multi-threading != n-threading; n-threading is just one type of multi-threading.

Good grief, it feels like this thread keeps pulling the "no, I'm right" and then looks at a chart showing something completely different from the last one.

 

I'll repeat for the audience who isn't on this page of the thread.

 

If you are making a purchase decision, the base clock speed matters more than the core count, all other things being equal.

 

A 4-core 4 GHz CPU does not perform the same as an 8-core 2 GHz CPU; what you see as the core count increases is that single-thread performance goes down. That's why you see Intel's Xeon product stack lower its clock speeds as the core counts increase, because that's how it actually works: you have a thermal budget, and that's how you fit so many cores on a chip.

 

Do you see where the Xeon chips are on the single thread chart?

image.thumb.png.dfe8be24a4b7d972c681fac81c52ba72.png

 

Yes, that's right, the 32-core and 24-core Xeon chips are down there with the i5 laptop chips.

image.thumb.png.ab5dfa48a163cdb42d91a2cff433c4c7.png

That w7-2495X has a 2.5 GHz base clock; the i9-14900K has a base clock of 3.2 GHz. The i9 part outperforms the Xeon part despite being an 8P+16E configuration, while the w7 is a 24P configuration.

 

 

Would it be better if Intel released a 16-core CPU if you had to cut the clock speed in half? That's the question being asked here. If you are a gamer, you don't want that. If you have the choice between the 8-core at 5 GHz or the 16-core at 2.5 GHz, you are going to pick the 8-core, knowing that the higher clock speed will give higher performance if the game doesn't scale to all the cores, and most of those games made with Unity will not scale anyway. If you want a game that actually uses threads, the game has to be engineered to do so, and that is not done automatically for you regardless of whether you use Unity or Unreal. Threading is hard, and results in blocking other things if you try to create too many threads or access data from one thread in another. E-cores make things even harder.

 

More cores and more threads benefit highly parallel loads. Games often have little to parallelize in the first place, and developers are often encouraged to use non-threading features like coroutines to avoid creating threads, and thus avoid the overhead of creating them, at the cost of not having threads to scale with.
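For context, the usual way engines square that circle is to pay the thread-creation cost once and then reuse the workers; a rough sketch of my own (hypothetical names, no particular engine's API implied) of that pattern:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical minimal job system: threads are created once at startup and
// reused every frame, so per-frame work pays only for queueing, not thread creation.
class JobPool {
public:
    explicit JobPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~JobPool() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }
    void submit(std::function<void()> job) {
        { std::lock_guard<std::mutex> lk(m_); jobs_.push(std::move(job)); }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !jobs_.empty(); });
                if (done_ && jobs_.empty()) return;
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job(); // e.g. a particle update, a pathfinding batch, asset decompression
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> jobs_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
};
```

This is roughly the shape of what engine job systems (Unity's C# Job System, Unreal's task graph) expose: the workers already exist, and the game only describes the jobs.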

 

Every time you look up "how do I do multithreading" in any context, you always get the same, out-of-date answer.

"Don't."

 

 

I've been fighting this bone-headed argument that you should not multi-thread for years. Developers just want to take the easy way out and fork things, and hence we see how Firefox and Chrome behave today. No threads. Forking trades performance for not having to think about the security of one browser process accessing the data of another. It trades that for a crashing tab not taking all your tabs with it (yet if you crash the main process, all the forked processes die anyway, so what did that solve?).

 

A game cannot fork its process, especially on mobile platforms. A rock and a hard place. The lazy thing to do is let the engine handle the threading, but the game engines don't actually know how to thread your program; only you do.

 


39 minutes ago, Kisai said:

If a software product requires AVX-512, and its fallback mechanism runs 100x slower, you're going to either buy the AMD chip if you need that software, or you're going to find another software product that maybe uses the GPU to do the same thing.

AVX-512 isn't a single thing, so it is complicated, but I'm not aware of any software anywhere near a 100x difference between AVX-512 and AVX2.

 

In peak FP execution terms, AVX-512 is at best 2x IPC relative to AVX2 on dual-unit Intel implementations. That's Skylake-X and the better server stuff. Consumer Intel doesn't have any extra FP execution units over AVX2, which is also the path AMD went with Zen 4. Actually, just by saying AVX2 I'm implying Intel Haswell or newer, because AMD's AVX2 implementation in Zen and Zen+ really sucked. They had half the FP execution units of Haswell and only reached parity with Zen 2.
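To make the width difference concrete, a toy sketch of my own (compile with the appropriate -mavx2 / -mavx512f flags; array tails and runtime dispatch omitted): the AVX2 loop handles 8 floats per instruction, the AVX-512 loop 16, which is where the at-best-2x peak figure comes from, while real gains depend on how many execution units actually sit behind those instructions.

```cpp
#include <immintrin.h>
#include <cstddef>

// Sum two float arrays, 8 lanes at a time with AVX2 and 16 lanes at a time with AVX-512.
// Tail elements (n not a multiple of the vector width) are left out for brevity.
void add_avx2(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
}

void add_avx512(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(out + i, _mm512_add_ps(va, vb));
    }
}
```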

 

Execution aside, AVX-512 has a bunch of other instructions that speed up code, and this is where much of the gain happens. For Prime95-like compute workloads, other factors like shuffles can help too. Relative to Skylake AVX2, Rocket Lake AVX-512 gives about +40% IPC, and Skylake-X AVX-512 gives about +80% IPC, not quite the theoretical doubling. Zen 4 AVX-512 is somewhere between the two Intel implementations, but as I don't own such a system to run controlled tests, my indirect data is rather noisy.

 

The biggest improvement in performance I'm aware of for AVX-512 is below:

In this example, we see the AVX-512 implementation give about 11x the performance of the baseline code. But this is about the most extreme best case. The baseline code was written by someone who isn't a professional programmer and just wanted to get things working. The AVX-512 code was optimised "by a former Intel AVX-512 engineer who now works elsewhere. According to Jim Keller, there are only a couple dozen or so people who understand how to extract the best performance out of a CPU, and this guy is one of them." While it is not proven it was actually used by the compiler, the highest level of vector optimisation that might be present in the baseline version is SSE2.

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, RTX 4070, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, random 1080p + 720p displays.
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


8 hours ago, starsmine said:

SSE2/SSE4.1/AVX2/AVX512 from left to right
Raptor lake has AVX2

Ouch, that SSE2 performance is brutal compared to the rest; fortunately nobody should be limited to that by now. I remember when PCSX2 got support for greater than SSE2 way back: actually playable PS2 emulation. My PC is too old and crap to give PS3 emulation a try, sadly (Ivy Bridge-E, SSE4.1/AVX1).


23 hours ago, Kisai said:

Leadeater comes to the wrong conclusion from the chart.

No I do not, you do. I already explained to you the correct way to look at it, but you are so dogmatic in your opinion that you refuse to actually look at the evidence or answer a basic question.

 

Is the 4 core 10900K faster or slower than the 6 core 10900K?

 

Ok, now that is settled. Any answer other than "the 6 core is faster" is a straight-up lie, so the case is closed on whether or not games can and have been using more than 2 threads: they do, and they are able to benefit from it.

 

Are the 4 people trying to tell you the same thing wrong, or is it more likely that it is you, the one outlier?

 

So let's go over the 10 core vs 8 core situation. On that CPU and that specific game, the extra power draw of having the 2 additional cores could bring the operating state of the CPU out of the TVB threshold, like I originally said and explained, however the clocks are fixed to 4.5 GHz. This does not mean the game wasn't designed for 10 cores/20 threads and could never benefit from them; this is a specific trait of the 10900K CPU and its operating parameters, here not enough per-core L3 cache with all 10 cores enabled.

 

Not a single time, in any game tested, is the 10900K with all cores and threads active slower than itself with only 4 or 6 cores enabled. If you had actually looked more at the source you'd have seen this situation does not carry over to every tested game.

 

Remember, I originally explained this to you; I gave you that chart for a reason, multiple reasons. I literally went out of my way to find a situation that gave any amount of credence to what you said (because I knew you were going to go down the single-thread performance track even though you never said anything about that, and you have): a DX11 game and a regression in performance at a higher number of available threads. I've given you as much as possible so you had something to talk about and could explain your point of view well enough.

 

How is me telling you this in the first place interpreting the graph wrong? You are the one solely focusing on the 8 core vs 10 core situation for a single game and telling me I'm interpreting it wrong when I explained this exact thing to you first in the original post with the data and information. How can I possibly be interpreting it wrong when I explained and showed it to you first? At what point was I not aware that the 10 core enabled 10900K regressed in performance to the 8 core enabled 10900K?

 

So again, the only issue I had with what you said is the claim that games don't use more than 2 threads, which is just false. So can you please at least say that it was not correct to make that exact statement in that way, because it is actually problematic to go around saying it and for even one person to believe it, and that includes yourself as that one person.

 

We should not be letting our opinions get in the way of facts and evidence and we should not, knowingly, go around spreading misinformation. Saying something in error for whatever reason is fine so long as you can actually take onboard feedback and recognize a mistake.


9 hours ago, Kisai said:

Would it be better if Intel released a 16-core CPU if you had to cut the clock speed in half? That's the question being asked here.

No it isn't, it was never asked at all or mentioned in any capacity in your original post and is not even relevant in any way to what you actually said at the start. This has so little to do with anything actually being discussed with you that this is the most I will address it, to tell you to stick to the point and the problem statement you made.

 

None of this 4 cores at X frequency vs 8 cores at Y frequency has anything at all to do with what you said and is just a distraction from the issue.

 

Yes, it's a more complicated topic to discuss properly, which is why your statement was a problem to start with: it took absolutely nothing into account and was frankly an egregiously wrong thing to say.

 

Can we please stay on point. Do games use more than 2 threads? Yes or no? Then and only then, after the first question has been answered: can games benefit in performance from more threads? Yes or no?

 

9 hours ago, Kisai said:

I'll repeat for the audience who isn't on this page of the thread.

 

If you are making a purchase decision, the base clock speed matters more than the core count, all other things being equal.

I would say this is a back-pedal, but those pedals were never on the bike in the first place to pedal backwards.

 

Exactly where in this quote from what you said does this come from or even apply?

On 12/20/2023 at 12:40 AM, Kisai said:

The problem is simply that people don't know what they need. Like does any Game benefit from more cores? Rarely does a game use more than 2 cores, let alone 1. Only very recent stuff using DX12u or Vulkan does, and even then most CPU use on other cores is by other services of the OS; they rarely use full CPU cores, hence "e-cores" for the OS did make sense.

Because I see literally nothing about base clock speeds or anything of the sort.

 

This quote of what you said is the origin point of the discussion and this is what is being discussed.

 

Just say it was incorrect to say that, or to say it in that way; it's fine, nothing bad will happen. Nobody will even care that you were wrong; people care that you are currently wrong and could continue to spread that incorrect information. Whether or not your intention was actually to say that a couple of faster threads are better for gaming, that isn't what you said, and it's the specific words and the way you said it that are the problem, because anyone actually reading it can only conclude what you explicitly said. People are not mind readers; words matter. I just happened to know you well enough to figure that you might have been trying to say something else.

 

P.S. Base clocks are not operating clocks, not even in an all-core Blender workload. Don't keep adding additional incorrect things into the mix; it's not helping you or us.

9 hours ago, Kisai said:

Good grief, it feels like this thread keeps pulling the "no, I'm right" card and then looking at a chart showing something completely different from the last one.

 

I'll repeat for the audience who isn't on this page of the thread.

 

If you are making a purchase decision, the base clock speed matters more than the core count, all other things being equal.

 

A 4-core 4GHz CPU is not the same performance as an 8-core 2GHz CPU; what you see as core counts increase is single-thread performance going down. That's why you see Intel's Xeon product stack lower clock speeds as the cores increase. Because that's how it actually works: you have a thermal budget, and that's how you fit so many cores on a chip.

I want to stress that I said twice that it's bivariate: "bi" meaning two, "variate" meaning variable.

F_X,Y(X, Y) = frame time distribution, a roughly Gaussian distribution that is unique to each game,
where X is performance per core and Y is core count.
Nowhere does that formula say these are independent variables; X can be influenced by Y, but is not necessarily.

The influence of Y is bounded, sure, and you can make Y > some z force X to drop off a cliff. That doesn't mean the game isn't multithreaded, just that it isn't n-threaded, so pulling out a 24-P-core, 48-thread chip and using it to argue "see, it's not multithreaded" blows my mind.
2 threads is multi, 4 threads is multi, 6 threads is multi. It does not have to be n-threaded to be multithreaded. This formula requires you to define bounds.

[attached image]
I have also said that in reality there are other variables (GPU, memory), but for the sake of simplicity set them to infinity; see Definition 5.1c/5.1d. You can use this formula to identify the bottleneck for a piece of software by integrating each individual contribution out of the equation with convolutions or Fourier transforms.
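To make that bivariate dependence concrete, here is a minimal toy sketch in Python. Every constant is invented purely for illustration; this is not the referenced Definition 5.1c/5.1d and not measured data. It just shows X (per-core performance) being allowed to depend on Y (active core count):

```python
# Toy sketch of the bivariate idea: per-core performance (X) and core
# count (Y) are separate variables, but X is allowed to depend on Y.
# Every number here is invented purely for illustration.

def effective_clock_ghz(active_cores: int) -> float:
    """X as a function of Y: clocks droop slightly as more cores are active."""
    return max(5.0 - 0.1 * (active_cores - 1), 1.0)

def toy_frame_time_ms(active_cores: int, work_units: float = 100.0,
                      parallel_share: float = 0.7) -> float:
    """Frame time from a serial portion plus a portion split across cores."""
    x = effective_clock_ghz(active_cores)
    serial = work_units * (1.0 - parallel_share) / x
    parallel = work_units * parallel_share / (x * active_cores)
    return serial + parallel

for y in (2, 4, 6, 8, 10, 24):
    print(f"Y={y:2d} cores, X={effective_clock_ghz(y):.2f} GHz -> "
          f"{toy_frame_time_ms(y):5.1f} ms per frame (toy numbers)")
```

The benefit of Y is bounded (the parallel term shrinks toward zero) while X slowly degrades as Y grows, which is the "bounded influence" being described above.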

None of us have denied that core performance is important; you keep arguing core count is not important. But a hypothetical of 4 cores at 4GHz vs 8 cores at 2GHz doesn't make sense. No one has said they are equal, and the equation never implies they are equal in weight. When you compare a hypothetical 4-core 4GHz chip to an 8-core 3.9GHz chip, it's impossible to claim which one would be faster across a suite of games from the last decade.

Ain't nobody got the time to build stochastic data sets per game to isolate the variables when you can just run the benchmarks and take the final data points: improvement from 4 to 6 to 8 cores, then roughly steady at 10 for the 1% lows with a minor drop in the average.

Yes, putting the world thread on an E-core is beyond detrimental, which is why it's not done. Async threads most likely don't belong there either, though some can probably get away with it and still help performance. Put another way, anything non-game-related from the OS sitting on the E-cores frees up the P-cores, meaning fewer interruptions and thus lower latency. To argue otherwise you have to argue that the E-cores' presence takes away so much thermal/power headroom that it lowers P-core performance, and that just does not pan out in most tests; nor does turning them off give the P-cores enough extra power/thermal headroom to make up for losing those threads for the OS.

You are correct that a thermal limit exists, but it is largely decoupled from the per-core power limit, other than acting as a bound when ALL cores are running at max power. You can make the thermal limit infinite, but that doesn't make the per-core power limit infinite.



 

On 12/21/2023 at 6:22 AM, Kisai said:

This is nothing more than a clock speed argument. If a game truly used more threads, then the frame rate would be exactly the same when the same number of cores are available. Which is not what that chart shows. That chart shows the faster the CPU, the faster the frame rate, regardless of the cores. The difference between the 10 core and the 8 core shows that, because that's the SAME chip.

Sorry to beat the dead horse, but reading this again it truly does need addressing. I really have no idea how you have gotten so confused here.

 

When you say the frame rate would be the same when the same number of cores are available, that is absolutely not correct. The purpose of this video/testing is to look at the effect of L3 cache and the number of active cores at a fixed 4.5GHz frequency. Which also means your other assertion above is wrong for the reasoning you have given: technically the faster CPU is faster, but it is not faster due to frequency.

 

Quote

To put this test together we used the Gigabyte Z590 Aorus Xtreme motherboard, clocking the three Intel K-SKU CPUs at 4.5 GHz with a 45x multiplier for the ring bus and used DDR4-3200 CL14 dual-rank, dual-channel memory with all primary, secondary and tertiary timings manually configured. 

https://www.techspot.com/article/2315-pc-gaming-quad-core-cpu/

 

This is how I know you have not watched the video, read the linked article version of it, or been reading what I said, because I told you at the very start that the clocks were fixed. That is why I was so confident and repeatedly said the frequency/clocks were the same. Every time I said "if they were different and the math doesn't explain the difference", that was to entertain you and your idea that frequency alone was the difference, but you were too busy arguing your point and trying to reinforce that you were correct that you ignored crucial information.

 

If we isolate all 3 tested CPUs to 6 cores, they would indeed perform exactly the same IF they were operating at the same frequency and had the same L3 cache, which they do not. This should have been obvious, and it should have been the first assumption for any performance differences between CPUs of the same generation using the same microarchitecture with no core/cache modifications. It's literally what the video is about, which is how I know you have not actually watched it properly.

 

If they were ALL operating at 4.5GHz and all had 20MB of L3 cache, then yes, the 3 CPUs would deliver the same performance in the game, but they don't all have 20MB of L3 cache, so they do not. Therefore your statement and reasoning are incorrect. I'd say you got there because you are trying to find anything to support your view rather than objectively looking at the information and checking the product specifications.

 

This is literally why I told you to ignore every CPU other than the 10900K. But I'm more than happy for us all to ignore the 10900K and only look at either the 10700K or the 10600K, so long as we are evaluating one and only one CPU model, so that as many variables as possible are isolated. If you are not applying the scientific method then your evaluations and conclusions have no measure of accuracy or consistency; you simply have an unclean data set leading to erroneous outcomes.

 

The central issue with what you said is the claim that games only use 2 threads, often only 1. If that were actually true then the 10900K would deliver the same performance across ALL configured active-core combinations, which is not the case. The same goes for the 10700K and the 10600K. Zero out of the 3 CPUs tested deliver the same performance across the range of active core counts tested. If your assertion were true, each CPU would show the same FPS across that entire range.
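To make that testable claim concrete, here is a minimal sketch in Python with invented numbers and a fixed clock assumed for every configuration; it is not the TechSpot data. It shows why flat FPS across 4, 6, 8 and 10 active cores is exactly what a 2-thread-only game would look like, and why any measured scaling implies more threads are in use:

```python
# Sketch of the argument: at a fixed clock, a game that only ever keeps
# 2 threads busy cannot gain FPS from enabling more cores. The numbers
# are invented to show the shape of the result, not real benchmark data.

def modeled_fps(active_cores: int, busy_threads: int,
                work_per_frame_ms: float = 20.0) -> float:
    usable = min(active_cores, busy_threads)      # cores the game can occupy
    return 1000.0 / (work_per_frame_ms / usable)  # idealised linear split

for cores in (4, 6, 8, 10):
    only_two = modeled_fps(cores, busy_threads=2)
    up_to_eight = modeled_fps(cores, busy_threads=8)
    print(f"{cores:2d} cores: {only_two:4.0f} fps if only 2 threads busy, "
          f"{up_to_eight:4.0f} fps if up to 8 threads busy")
```

In this toy model the 2-thread game produces the same FPS at every core count, while the game that keeps more threads busy scales with the enabled cores, which is the pattern the actual charts show.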

 

I cannot believe we are going through the exercise of benchmarking different CPUs to figure out which is on average the fastest, which isn't the discussion at all. We are trying to ascertain whether the game benefits from having more threads available, and you do that by keeping the tested CPU consistent so that frequency and cache differences, or worse, different microarchitectures, are not a factor (or as little of one as possible).

 

I really am lost as to how you can look at 10 cores vs 8 cores and at the same time ignore 8 cores vs 6 cores and 6 cores vs 4 cores. At what point do I conclude you are doing this intentionally? Because this is a very big issue right now. None of this can progress until you demonstrate, to me and everyone else, that you have genuinely evaluated the information provided to you in full and can explain each and every case, not just the one that happens to slightly align with your opinion, a case that has a very simple and already explained reason behind it, and one that does not hold true across every game in existence anyway, which is beside the point.

 

Unless the above is satisfied, it is very unlikely I will directly respond to you again on this in this topic. I am satisfied with the information I have given you, the explanations and reasoning I have provided, and that it has been demonstrated that games do indeed use, and benefit from, more than 2 threads.

13 hours ago, starsmine said:



None of us have denied that core performance is important; you keep arguing core count is not important. But a hypothetical of 4 cores at 4GHz vs 8 cores at 2GHz doesn't make sense. No one has said they are equal, and the equation never implies they are equal in weight. When you compare a hypothetical 4-core 4GHz chip to an 8-core 3.9GHz chip, it's impossible to claim which one would be faster across a suite of games from the last decade.

 

 

Look, for the sake of stopping this circular argument: I'm not saying "no, don't buy that CPU with more cores". This has been a thing ever since Intel introduced Hyper-Threading. On the P4s it was introduced on, it cut performance in half if the software you were using didn't actually use more than one thread, because it cut the ALUs per hyperthread to half of the CPU core. So one 3GHz core became effectively two 3GHz cores, but using both the same way would be closer to a 2GHz and a 0.5GHz CPU. The overhead from the parts that didn't make it a full core is still there. But that's just first-gen HT, not AMD's version.

 

Introducing the E-cores, or nerfing AVX-512, introduces the same kind of variable that software has not been designed to anticipate. So for the sake of argument, you now have physical full-featured P-cores, logical hyperthreaded P-cores, or E-cores with no AVX-512 and no hyperthreading, plus the possibility of turning AVX-512 off, turning the E-cores off, or turning hyperthreading off.

 

So because not all threads are equal, creating a parallel task is now a substantially more difficult thing to do. So what developers do is ignore all of that. Hence we get back to "no game expects more than 2 cores", and you rarely saw a game before DX12u that actually min-maxed all available resources, because that would make the game not run on anything but bleeding-edge hardware. That's why the game console spec ends up being the PC base spec. So we didn't have any multi-core games designed to use more than 2 cores until the PS4/Xbox One, which had

 

*drumroll*

 

8 cores.

 

 

13 hours ago, starsmine said:


Ain't nobody got the time to build stochastic data sets per game to isolate the variables when you can just run the benchmarks and take the final data points: improvement from 4 to 6 to 8 cores, then roughly steady at 10 for the 1% lows with a minor drop in the average.

Yes, putting the world thread on an E-core is beyond detrimental, which is why it's not done. Async threads most likely don't belong there either, though some can probably get away with it and still help performance. Put another way, anything non-game-related from the OS sitting on the E-cores frees up the P-cores, meaning fewer interruptions and thus lower latency. To argue otherwise you have to argue that the E-cores' presence takes away so much thermal/power headroom that it lowers P-core performance, and that just does not pan out in most tests; nor does turning them off give the P-cores enough extra power/thermal headroom to make up for losing those threads for the OS.

You are correct that a thermal limit exists, but it is largely decoupled from the per-core power limit, other than acting as a bound when ALL cores are running at max power. You can make the thermal limit infinite, but that doesn't make the per-core power limit infinite.
 

We are not yet at a point where any game is aware of E-cores or how to use them, but if a game expects all cores to be the same performance and the OS mistakenly schedules the main game loop, or the input handling, on an E-core, then users are going to complain that the game runs substantially slower on a 12th/13th/14th gen Intel CPU.

 

Case in point:

https://arstechnica.com/gaming/2021/11/faulty-drm-breaks-dozens-of-games-on-intels-alder-lake-cpus/

 

If you search for problems explicitly involving E-cores, you will find Unreal mentioned a lot, and Ubisoft mentioned a lot. The solution? Turn the E-cores off. Echoes of "turn hyperthreading off" on Pentium 4s and early dual-core systems. Or use something like Process Lasso and explicitly forbid the game from touching the E-cores.
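For what it's worth, the Process Lasso trick can be approximated in a few lines. Here is a minimal sketch using Python's psutil; the process name and the assumption that the first 16 logical CPUs map to the P-cores and their hyperthreads are both hypothetical, so check your own CPU's topology before trying anything like this:

```python
# Sketch: restrict a running process to a chosen set of logical CPUs,
# e.g. to keep a game off the E-cores. Requires: pip install psutil
# GAME_EXE and the P-core index range are assumptions, not real values.
import psutil

GAME_EXE = "game.exe"                  # hypothetical process name
P_CORE_LOGICAL_CPUS = list(range(16))  # assumption: P-core threads enumerate first

for proc in psutil.process_iter(["name"]):
    if proc.info["name"] == GAME_EXE:
        proc.cpu_affinity(P_CORE_LOGICAL_CPUS)  # pin to the chosen CPUs only
        print(f"Pinned PID {proc.pid} to logical CPUs {P_CORE_LOGICAL_CPUS}")
```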

 

But that's not the only time this happened. When I played a certain game that was originally designed as a mobile game, its SSL library would try to use a CPU feature that prevented the game from launching. There was no way to figure this out yourself; you had to find obscure posts on Reddit saying to set an environment variable forcing OpenSSL not to use certain CPU features. Once the game was updated, you could turn it off.
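For anyone curious, OpenSSL does expose a documented environment variable, OPENSSL_ia32cap, for masking CPUID capability bits. A minimal sketch of launching a program with it set follows; the mask value and the executable path are placeholders, not the actual fix for any particular game:

```python
# Sketch: launch a program with OpenSSL's CPU-capability mask overridden.
# OPENSSL_ia32cap is OpenSSL's documented knob for masking CPUID feature
# bits; the mask value and executable below are placeholders only.
import os
import subprocess

env = os.environ.copy()
env["OPENSSL_ia32cap"] = "~0x20000000"        # illustrative mask, not a real fix

subprocess.run(["./game_launcher"], env=env)  # hypothetical executable
```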

 

[attached image: Steam Hardware Survey CPU core-count distribution]

Most PC gamers have 4-, 6- or 8-core systems. 2-core is still above 5%, so it's still something you have to consider. The Steam Deck is a 4-core toy. If you want people to be able to play on the Steam Deck, you aren't going to optimize for a 6- or 8-core system, so 52% of PC players will not have their hardware fully used. So again, you would still be better off picking the higher-clocked CPU, regardless of its core count, if this is the case.
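This is also roughly why engines size their worker pools from whatever the machine reports rather than assuming 6 or 8 cores. A minimal sketch follows; the decision to reserve one logical CPU is an arbitrary illustration, not engine guidance:

```python
# Sketch: size a worker pool from the detected CPU count instead of
# hard-coding a core count. Reserving one logical CPU is arbitrary here.
import os
from concurrent.futures import ThreadPoolExecutor

logical_cpus = os.cpu_count() or 4     # fall back to 4 if undetectable
workers = max(1, logical_cpus - 1)     # leave one logical CPU free

with ThreadPoolExecutor(max_workers=workers) as pool:
    results = list(pool.map(lambda n: n * n, range(32)))  # stand-in workload

print(f"{logical_cpus} logical CPUs detected, ran {len(results)} tasks on {workers} workers")
```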

 

If you are buying a new PC and have no idea what is going to perform best, go here: https://www.cpubenchmark.net/singleThread.html

[attached image: cpubenchmark.net single-thread performance ranking]

That's how everything lines up. Unless you need AVX-512, nothing else matters, because that's out of your control anyway.

 

If you get a list by maximum possible performance, it of course looks different:

[attached image: cpubenchmark.net ranking by overall multi-thread performance]

 

But nobody is seriously going to say "buy a Threadripper PRO 7995WX because it has more cores" if all you're going to do is play games.

 

[attached image: CPU comparison chart]

The Intel 14900K is 8P+16E, the Ryzen 9 7950X and 7950X3D are 16 cores, and the 7995WX has 96 cores.

 

I don't think you're going to find a game that can use 96 cores, never mind a game that is aware of E-cores not being equal.

 

 

1 hour ago, Kisai said:

On the P4s it was introduced on, it cut performance in half if the software you were using didn't actually use more than one thread, because it cut the ALUs per hyperthread to half of the CPU core. So one 3GHz core became effectively two 3GHz cores, but using both the same way would be closer to a 2GHz and a 0.5GHz CPU. The overhead from the parts that didn't make it a full core is still there. But that's just first-gen HT, not AMD's version.

Honestly I don't think I have the energy or patience to cover this, but it is also not correct. Simply have a read of the AnandTech review from the time and look at the microarchitecture diagram for NetBurst, then compare it to any later microarchitecture. Execution units were not split or isolated per "hyperthread"; hardware threads are strictly a front-end concept. Instructions from the threads go into queues, the decoder takes them out and decodes them into micro-ops, those go into more queues, and so on. One of the reasons NetBurst HT was not that great is that it could only decode 1 instruction at a time; later architectures could do more.

 

A lot of applications did see performance regressions with HT active, but performance was never cut in half, whether or not the software was thread-aware. Think about it: based on what you are saying, a game written well before HT existed would run at half speed on a P4 with HT? Do you actually believe that?

 

I owned many P4 CPUs back then, non-HT and HT ones: 2.5GHz, 2.8GHz, 2.8GHz with 1MB cache, 2.8GHz HT and 3.06GHz HT. I say owned, but technically they were free, taken from old work computers being thrown out. One thing was for sure: my old games were not half the performance on the 2.8GHz HT compared to the 2.8GHz non-HT.

 

I preferred the P4 HTs because in Rome: Total War, battles could get too large, slow to a crawl, and render the game unplayable; with a P4 HT I could tab out, open Task Manager and kill it, but not with a non-HT P4 (well, I could, but it took extremely long with the mouse barely usable).

13 minutes ago, leadeater said:

Honestly I don't think I have the energy or patience to cover this, but it is also not correct. Simply have a read of the AnandTech review from the time and look at the microarchitecture diagram for NetBurst, then compare it to any later microarchitecture. Execution units were not split or isolated per "hyperthread"; hardware threads are strictly a front-end concept. Instructions from the threads go into queues, the decoder takes them out and decodes them into micro-ops, those go into more queues, and so on. One of the reasons NetBurst HT was not that great is that it could only decode 1 instruction at a time; later architectures could do more.

 

A lot of applications did see performance regressions with HT active, but performance was never cut in half, whether or not the software was thread-aware. Think about it: based on what you are saying, a game written well before HT existed would run at half speed on a P4 with HT? Do you actually believe that?

 

I owned many P4 CPUs back then, non-HT and HT ones: 2.5GHz, 2.8GHz, 2.8GHz with 1MB cache, 2.8GHz HT and 3.06GHz HT. I say owned, but technically they were free, taken from old work computers being thrown out. One thing was for sure: my old games were not half the performance on the 2.8GHz HT compared to the 2.8GHz non-HT.

Even if that were true, it's irrelevant in a modern setting anyway; software has caught up since then and is aware of SMT/HT. The same applies to any other hardware advancement that gains traction in the market (such as software making use of AVX-512, and Windows' scheduler learning to work with heterogeneous cores).
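As a small illustration of software being SMT-aware, here is a minimal sketch that distinguishes physical cores from logical CPUs with psutil; which logical CPUs are P-cores vs E-cores is OS-specific and not shown here:

```python
# Sketch: tell SMT/HT siblings apart from physical cores.
# Requires: pip install psutil
import psutil

physical = psutil.cpu_count(logical=False)
logical = psutil.cpu_count(logical=True)
smt_on = bool(physical and logical and logical > physical)
print(f"{physical} physical cores, {logical} logical CPUs, "
      f"SMT/HT {'on' if smt_on else 'off/unknown'}")
```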

