AMD, threads on threads on threads?

ShawnTD
23 hours ago, KarathKasun said:

I am the citation.  SMT hurts lots of consumer tasks like gaming because it adds latency to the pipeline for each thread.  This is why you can get better/more stable FPS with SMT off.

 

It helps server workloads, which are less sensitive to the extra latency because they are generally not doing real-time tasks.

[Image: heart-attack.jpg]


14 hours ago, floofer said:

[Image: heart-attack.jpg]

Sorry guy, I actually qualify as a citable source (in the academic world) or expert witness (in the legal world).  Holding degrees in the field and having ~20 years of performance analysis and implementation experience in the professional field does that for you.

Not every analyst runs a tech news site; most of us do private consultation work.


6 minutes ago, KarathKasun said:

Sorry guy, I actually qualify as a citable source (in the academic world) or expert witness (in the legal world).  Holding degrees in the field and having ~20 years of performance analysis and implementation experience in the professional field does that for you.

Not every analyst runs a tech news site; most of us do private consultation work.

We both know full well that writing something without citing isn't quite appropriate. No matter how much we'd like, citing yourself isn't quite what we do. 


13 minutes ago, KarathKasun said:

Sorry guy, I actually qualify as a citable source (in the academic world) or expert witness (in the legal world).  Holding degrees in the field and having ~20 years of performance analysis and implementation experience in the professional field does that for you.

Not every analyst runs a tech news site; most of us do private consultation work.

That may well be the case but I'd love to see you stand up in a court and tell a judge "this thing is definitely true because I say it is and I'm the expert". Let's see how that would end for you.

 

Until we hear it directly from AMD (and ftr I'm not disagreeing with anything you said) anything you say is conjecture, nothing more.

Main Rig:-

Ryzen 7 3800X | Asus ROG Strix X570-F Gaming | 16GB Team Group Dark Pro 3600Mhz | Corsair MP600 1TB PCIe Gen 4 | Sapphire 5700 XT Pulse | Corsair H115i Platinum | WD Black 1TB | WD Green 4TB | EVGA SuperNOVA G3 650W | Asus TUF GT501 | Samsung C27HG70 1440p 144hz HDR FreeSync 2 | Ubuntu 20.04.2 LTS |

 

Server:-

Intel NUC running Server 2019 + Synology DSM218+ with 2 x 4TB Toshiba NAS Ready HDDs (RAID0)


2 minutes ago, Master Disaster said:

That may well be the case but I'd love to see you stand up in a court and tell a judge "this thing is definitely true because I say it is and I'm the expert". Let's see how that would end for you.

I'm not saying the dude isn't being obnoxious, but that is actually what an expert witness does in court. They are considered expert enough in their field that their testimony is treated as objectively true, i.e. only giving facts. Yes, there have been cases where the expert witness isn't really an expert or isn't being objective.

[Out-of-date] Want to learn how to make your own custom Windows 10 image?

 

Desktop: AMD R9 3900X | ASUS ROG Strix X570-F | Radeon RX 5700 XT | EVGA GTX 1080 SC | 32GB Trident Z Neo 3600MHz | 1TB 970 EVO | 256GB 840 EVO | 960GB Corsair Force LE | EVGA G2 850W | Phanteks P400S

Laptop: Intel M-5Y10c | Intel HD Graphics | 8GB RAM | 250GB Micron SSD | Asus UX305FA

Server 01: Intel Xeon D 1541 | ASRock Rack D1541D4I-2L2T | 32GB Hynix ECC DDR4 | 4x8TB Western Digital HDDs | 32TB Raw 16TB Usable

Server 02: Intel i7 7700K | Gigabyte Z170N Gaming5 | 16GB Trident Z 3200MHz


17 minutes ago, 2FA said:

I'm not saying the dude isn't being obnoxious, but that is actually what an expert witness does in court. They are considered expert enough in their field that their testimony is treated as objectively true, i.e. only giving facts. Yes, there have been cases where the expert witness isn't really an expert or isn't being objective.

And if you color your input to the court as an expert witness, you effectively destroy your career and reputation.  The same goes for the academic and professional side, which has been seen in a semi-public way with firms that have signed off on shady performance reports used as marketing material.


1 hour ago, KarathKasun said:

And if you color your input to the court as an expert witness, you effectively destroy your career and reputation.  The same goes for the academic and professional side, which has been seen in a semi-public way with firms that have signed off on shady performance reports used as marketing material.

*Cough* Principled Technologies *Cough*

My Folding Stats - Join the fight against COVID-19 with FOLDING! - If someone has helped you out on the forum don't forget to give them a reaction to say thank you!

 

The only true wisdom is in knowing you know nothing. - Socrates
 

Please put as much effort into your question as you expect me to put into answering it. 

 

  • CPU
    Ryzen 9 5950X
  • Motherboard
    Gigabyte Aorus GA-AX370-GAMING 5
  • RAM
    32GB DDR4 3200
  • GPU
    Inno3D 4070 Ti
  • Case
    Cooler Master - MasterCase H500P
  • Storage
    Western Digital Black 250GB, Seagate BarraCuda 1TB x2
  • PSU
    EVGA Supernova 1000w 
  • Display(s)
    Lenovo L29w-30 29 Inch UltraWide Full HD, BenQ - XL2430(portrait), Dell P2311Hb(portrait)
  • Cooling
    MasterLiquid Lite 240

21 hours ago, williamcll said:

Too bad most games are still not using more than 4 threads.

This is no longer the case. The consoles with 8 cores have been out for years and basically any game post 2017 can take more than 4 threads, assuming the game is complex enough to need more than 4 threads.

MOAR COARS: 5GHz "Confirmed" Black Edition™ The Build
AMD 5950X 4.7/4.6GHz All Core Dynamic OC + 1900MHz FCLK | 5GHz+ PBO | ASUS X570 Dark Hero | 32 GB 3800MHz 14-15-15-30-48-1T GDM 8GBx4 |  PowerColor AMD Radeon 6900 XT Liquid Devil @ 2700MHz Core + 2130MHz Mem | 2x 480mm Rad | 8x Blacknoise Noiseblocker NB-eLoop B12-PS Black Edition 120mm PWM | Thermaltake Core P5 TG Ti + Additional 3D Printed Rad Mount

 


I have to think there is some kinda diminishing returns, or drawback like a massive increase in heat. I do think KarathKasun has a point though: at some point, as you increase threads per core, latency just HAS to be an issue, or at least thread latency consistency. Eventually some workload is going to push too many tasks through a certain part of the CPU and cause a "traffic jam" of sorts, and certain "SMT cores" will sit idle.

It'd be crazy to think of a Zen 3 Threadripper literally having 64 cores, 256 threads though.


On 9/27/2019 at 2:08 AM, KarathKasun said:

I am the citation.  SMT hurts lots of consumer tasks like gaming because it adds latency to the pipeline for each thread.  This is why you can get better/more stable FPS with SMT off.

 

It helps server workloads, which are less sensitive to the extra latency because they are generally not doing real-time tasks.

 

On 9/27/2019 at 2:11 AM, Master Disaster said:

Then you'll forgive me for calling bollocks. Until I hear it from AMD I'm remaining open to either possibility.

 

can confirm poorly optimized games like BDO can get as much as a 35 fps boost from hyperthreading being disabled


2 hours ago, Sypran said:

I have to think there is some kinda diminishing returns, or drawback like a massive increase in heat.

Fortunately heat wouldn't increase like that, the execution cores don't change and use as much power as they could before. Heat would slightly increase if you are able to keep the cores at maximum power state doing more work more often though but the increase shouldn't be large.

 

The main drawback is that you can't actually make the CPU cores do more by adding threads. If you're already fully optimized, using all the data movement and operations per cycle that the core can do, adding more threads has only two possible outcomes: zero improvement or a reduction. Ideally the application would be properly optimized for the CPU architecture and only assign work to threads that it knows can complete the operations as requested. Applications can understand SMT and thread layout, whether it's SMT2, SMT4 or SMT8, but that understanding is by no means the default.

 

SMT has gotten a lot better though, so it is now rarer for performance to degrade with SMT enabled, or if it does, it's difficult to tell due to margin of error. Where I've found it does actually matter, it's usually one of two cases. The first is an older application, or one that was architected long ago and whose core has not changed in 15+ years even though it is being maintained and updated. The other is high performance workloads that are optimized to use all the resources of a core as best they can but aren't as SMT aware as they need to be, or as NUMA aware as I'd like, where an OS thread being allocated work it shouldn't results in an entire core's worth of performance reduction.

 

It's better to fix application problems, but the application may not be yours, it may be a paid vendor product, or the effort required may be a lot. It's a lot easier to jump into the BIOS and disable SMT than anything else. Honestly though, I haven't done that since around the Pentium 4 era.
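
As a rough sketch of what "understanding SMT and thread layout" can look like in practice: on Linux, each logical CPU exposes a `thread_siblings_list` file under `/sys/devices/system/cpu/cpuN/topology/`, listing the logical CPUs that share its physical core. The helper names below (`parse_cpu_list`, `group_by_core`, `one_thread_per_core`) are made up for illustration, and the sample layout is hypothetical rather than read from a real machine.

```python
from typing import Dict, List, Tuple


def parse_cpu_list(text: str) -> Tuple[int, ...]:
    """Parse a sysfs-style CPU list such as "0,8" or "0-3" into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return tuple(sorted(cpus))


def group_by_core(siblings: Dict[int, str]) -> List[Tuple[int, ...]]:
    """Group logical CPUs into physical cores.

    `siblings` maps a logical CPU id to the text of its
    thread_siblings_list file. Nothing here assumes two threads
    per core, so SMT2, SMT4 and SMT8 layouts are handled alike.
    """
    cores = {parse_cpu_list(text) for text in siblings.values()}
    return sorted(cores)


def one_thread_per_core(cores: List[Tuple[int, ...]]) -> List[int]:
    """Pick the first logical CPU of each core, e.g. as pinning targets."""
    return [core[0] for core in cores]


# Hypothetical 4-core SMT2 layout (assumption, not real hardware data).
sample = {0: "0,4", 1: "1,5", 2: "2,6", 3: "3,7",
          4: "0,4", 5: "1,5", 6: "2,6", 7: "3,7"}
cores = group_by_core(sample)
print(cores)
print(one_thread_per_core(cores))
```

On Linux, the chosen ids could then be handed to `os.sched_setaffinity` to keep a worker on one hardware thread per core; that call is not available on every platform.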


2 minutes ago, leadeater said:

if you are able to keep the cores at maximum power state doing more work more often though but the increase shouldn't be large.

I should have clarified, but yeah, that's what I meant: a CPU in a sustained high workload.
I wasn't sure how much of an increase in heat it would truly be; I would have thought something like a Blender render vs Prime95 on a regular CPU.


On 9/27/2019 at 1:58 AM, Dabombinable said:

Whenever I've seen testing, comparisons and reviews done, that hasn't been the case. The biggest difference that I have seen, BTW, was back in the days of the Pentium 4 HT through 600 series. And even then, only a few examples had a severe performance drop-off. Some of those also had issues later on with multiple cores or CPUs (in the case of my dual Pentium III rig).

I can possibly attest to what he's saying, that it hurts performance across the board.  In my signature you'll find a link to some "info about Intel CPUs".  In that thread, I explore/reveal an interesting effect in which the scaling with multiple cores of chips with HT is noticeably and consistently worse than chips without HT.  I postulate that this could be due to the test not working properly (multiple threads being stuck onto a single core when it could have been spread out, etc.), but it also could have been due to a performance hit resulting from HT, and tbh that's feeling more likely to me at this point.

 

 

Solve your own audio issues  |  First Steps with RPi 3  |  Humidity & Condensation  |  Sleep & Hibernation  |  Overclocking RAM  |  Making Backups  |  Displays  |  4K / 8K / 16K / etc.  |  Do I need 80+ Platinum?

If you can read this you're using the wrong theme.  You can change it at the bottom.


4 hours ago, spartaman64 said:

 

can confirm poorly optimized games like BDO can get as much as a 35 fps boost from hyperthreading being disabled

I'm not disagreeing, I'm simply saying until it's confirmed by AMD then it's a guess. Granted an educated guess but still a guess.



58 minutes ago, Ryan_Vickers said:

I can possibly attest to what he's saying, that it hurts performance across the board.  In my signature you'll find a link to some "info about Intel CPUs".  In that thread, I explore/reveal an interesting effect in which the scaling with multiple cores of chips with HT is noticeably and consistently worse than chips without HT.  I postulate that this could be due to the test not working properly (multiple threads being stuck onto a single core when it could have been spread out, etc.), but it also could have been due to a performance hit resulting from HT, and tbh that's feeling more likely to me at this point.

 

 

HT always has a performance hit because other tasks can land on the SMT threads.  That and there are hard penalties for keeping two sets of registers active for each core and checking them both.  Added complexity to instruction fetch and microcode issue necessitates a performance hit.

 

The reason nobody "notices" the difference from SMT to no SMT (in modern systems) is that performance is "good enough" with it on so they don't test with it off.  On i7 consumer chips of the past you get more performance benefit from enabling SMT than it costs you in the end.  This is because the OS or game doing housekeeping tasks in the background can end up pre-empting the main game thread.  With SMT off this would cause a very noticeable hitch, whereas with SMT on you only get a minor FPS dip.  Having more cores simply bypasses this situation in the first place.

 

The old SMT-8 CPUs were built specifically to always execute instructions in 8 pipeline stages. This was so that each thread always got an equal slice of the resources, with the penalty being that all instructions had a minimum 8 cycle latency.  Great for maximizing throughput, but bad for "real time" or interactive tasks.
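
A toy model of the fixed-slot scheme described above may make the throughput vs latency tradeoff concrete. This is only a sketch under the stated assumption that the core rotates through 8 thread slots, one per cycle; it does not model any particular CPU, and the function names are invented for illustration.

```python
def barrel_finish_cycle(num_slots: int, instructions: int) -> int:
    """Cycle count for one thread to retire `instructions` instructions.

    Toy model: the core rotates through `num_slots` fixed thread slots,
    one slot per cycle, whether or not a slot holds a runnable thread.
    A thread therefore issues one instruction every `num_slots` cycles.
    """
    return num_slots * instructions


def core_throughput(num_slots: int, busy_threads: int) -> float:
    """Instructions per cycle for the whole core in the same toy model."""
    return min(busy_threads, num_slots) / num_slots


slots = 8
print(barrel_finish_cycle(slots, 100))  # one thread running 100 instructions
print(core_throughput(slots, 8))        # every slot has a runnable thread
print(core_throughput(slots, 1))        # a lone interactive thread
```

In this model a fully loaded core sustains one instruction per cycle, while a lone thread only ever sees one eighth of that, which is the "bad for interactive tasks" side of the tradeoff.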


11 minutes ago, Master Disaster said:

I'm not disagreeing, I'm simply saying until it's confirmed by AMD then it's a guess. Granted an educated guess but still a guess.

The Gamers Nexus 3900x review showed that with SMT off, many games ran better.  Granted, they were also able to get a higher overclock, so it could have been due to that, but even so that's still meaningful

6 minutes ago, KarathKasun said:

[...]The reason nobody "notices" the difference from SMT to no SMT (in modern systems) is that performance is "good enough" with it on so they don't test with it off.  On i7 consumer chips of the past you get more performance benefit from enabling SMT than it costs you in the end. [...]

Yeah, clearly it causes total performance to improve.  Based on my meta analysis of the userbenchmark scores though, it suggests as much as a 10% hit to single-threaded performance (though often coming in at less than that)



5 minutes ago, Ryan_Vickers said:

The Gamers Nexus 3900x review showed that with SMT off, many games ran better.  Granted, they were also able to get a higher overclock, so it could have been due to that, but even so that's still meaningful

Yeah, clearly it causes total performance to improve.  Based on my meta analysis of the userbenchmark scores though, it suggests as much as a 10% hit to single-threaded performance (though often coming in at less than that)

Hilariously, this is the same performance drop-off seen when going from 1 core per module to 2 cores per module in Bulldozer.  In that architecture, the CPU is set up EXACTLY like an SMT core but it has dedicated resources for the second integer thread.

 

Again, hard limits on the maximum performance when tracking two threads with one fetch unit.


2 minutes ago, Ryan_Vickers said:

The Gamers Nexus 3900x review showed that with SMT off, many games ran better.  Granted, they were also able to get a higher overclock, so it could have been due to that, but even so that's still meaningful

It would be interesting to pick a game that also has Linux support and run the test on both operating systems, to see whether it is Windows just being useless or SMT itself. With it being a game and not all cores being utilized, I'd throw my hat towards Windows being useless lol.

 

You "shouldn't" get negative scaling for SMT when there are underutilized cores, but we all know the difference between shouldn't and reality.


2 minutes ago, KarathKasun said:

Hilariously, this is the same performance drop-off seen when going from 1 core per module to 2 cores per module in Bulldozer.  In that architecture, the CPU is set up EXACTLY like an SMT core but it has dedicated resources for the second integer thread.

 

Again, hard limits on the maximum performance when tracking two threads with one fetch unit.

Interesting.  Although that arch had other issues, like abysmally low IPC xD



Just now, leadeater said:

It would be interesting to pick a game that also has Linux support and run the test on both operating systems, to see whether it is Windows just being useless or SMT itself. With it being a game and not all cores being utilized, I'd throw my hat towards Windows being useless lol.

 

You "shouldn't" get negative scaling for SMT when there are underutilized cores, but we all know the difference between shouldn't and reality.

It wouldn't be the first time the Windows scheduler has lacked awareness of the structure of the CPU it's working with and caused poorer than necessary performance as a result, but I can't guess as to what the outcome of this would be.  Certainly would be something worth testing though!



2 minutes ago, Ryan_Vickers said:

It wouldn't be the first time the Windows scheduler has lacked awareness of the structure of the CPU it's working with and caused poorer than necessary performance as a result

[Image: s008j9ibbfpx.jpg]


17 minutes ago, Ryan_Vickers said:

It wouldn't be the first time the Windows scheduler has lacked awareness of the structure of the CPU it's working with and caused poorer than necessary performance as a result, but I can't guess as to what the outcome of this would be.  Certainly would be something worth testing though!

https://www.agner.org/optimize/blog/read.php?i=6&v=t

 

Best case, greater than 50% but less than 100% speedup running two threads on one core.

 

Worst case, negative scaling due to resource contention.

 

Regardless of the case, SMT reduces the resources available to a single thread on the "front end" of the core.  Things like micro-op cache, register space, and decode unit resources are cut in half.  This likely accounts for the ~10% performance loss.  Those resources ARE overprovisioned for each core, but they are not doubled in comparison to what the core is capable of, because you can't shove double the work into the same pipeline in 99.99% of cases.

 

In essence, the front end is tuned to provide ~180% of the maximum instruction load of the core.  When you use SMT you get between ~90% performance at worst and ~180% performance at best.  It's a transistor and complexity vs performance tradeoff that has been tweaked over the past 20-some-odd years.
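
One way to read those figures, as a back-of-the-envelope sketch: if two threads sharing a core together reach about 1.8x the throughput of a single thread running alone, then each of them averages about 0.9x, which lines up with the ~10% per-thread hit mentioned earlier in the thread. The function names are made up for illustration and the 1.8 input is simply the estimate from the post, not a measurement.

```python
def per_thread_speed(total_scaling: float, threads: int = 2) -> float:
    """Average speed of each thread relative to a thread running alone."""
    return total_scaling / threads


def smt_gain(total_scaling: float) -> float:
    """Extra core throughput from enabling SMT, as a fraction of baseline."""
    return total_scaling - 1.0


best_case = 1.8  # ~180% combined throughput, the estimate quoted above
print(round(per_thread_speed(best_case), 3))
print(round(smt_gain(best_case), 3))
```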


* Programmers everywhere scream in horror as their code that assumes hyperthreading = 1c2t is broken *

PLEASE QUOTE ME IF YOU ARE REPLYING TO ME

Desktop Build: Ryzen 7 2700X @ 4.0GHz, AsRock Fatal1ty X370 Professional Gaming, 48GB Corsair DDR4 @ 3000MHz, RX5700 XT 8GB Sapphire Nitro+, Benq XL2730 1440p 144Hz FS

Retro Build: Intel Pentium III @ 500 MHz, Dell Optiplex G1 Full AT Tower, 768MB SDRAM @ 133MHz, Integrated Graphics, Generic 1024x768 60Hz Monitor


 


On 9/27/2019 at 1:03 AM, ShawnTD said:

First time news poster here, so sorry in advance if I get this wrong.

Just ran across this on wccftech (I know, I know), so take it with large amounts of salt and whatnot.

Apparently the new Zen 3 architecture from AMD is rumored to have SMT4 (4 threads per core), as stated by the article here,

https://wccftech.com/rumor-amd-zen-3-architecture-to-support-up-to-4-threads-per-core-with-smt4-feature/

 

After digging into the wccftech article, it turns out they are getting their info from a German website, Hardwareluxx-

https://www.hardwareluxx.de/index.php/news/hardware/prozessoren/50914-geruechtekueche-zen-3-mit-smt4-und-neue-navi-und-turing-einsteigerkarten.html

 

The article says-

 

 

 

While I can't say I know very much about microarchitecture or whether this has any plausibility or possibility anytime soon, to me it seems like this might be the beginning of the next "core wars"?

 

 

https://www.digitaltrends.com/computing/amd-zen-3-sm4-four-threads-per-core/


14 hours ago, rcmaehl said:

* Programmers everywhere scream in horror as their code that assumes hyperthreading = 1c2t is broken *

That would already have been an issue, because basic assumptions like this also likely assume a monolithic design without CCDs, CCXs, NUMA nodes, etc., and in fact Windows itself was guilty of this for some time, hence my earlier comment.  I think with the popularity of these CPUs, the topic of making sure the OS (and applications) actually understand what they're running on, rather than going in blind based on an assumption guided by nothing but a core number, is getting a lot more attention, which will help resolve these issues.
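
To illustrate how the "1 core = 2 threads" assumption goes wrong, here is a small sketch comparing it against a count derived from a topology mapping. The layouts are entirely made up (the 64-core figure is borrowed from the hypothetical Threadripper mentioned earlier in the thread); real code would get the logical-CPU-to-core mapping from the OS rather than from `hypothetical_layout`.

```python
from typing import Iterable, List


def naive_core_count(logical_cpus: int) -> int:
    """The fragile assumption: every core has exactly two threads."""
    return logical_cpus // 2


def core_count_from_topology(core_ids: Iterable[int]) -> int:
    """Count distinct physical cores, given one core id per logical CPU."""
    return len(set(core_ids))


def hypothetical_layout(cores: int, threads_per_core: int) -> List[int]:
    """Build a made-up mapping: index = logical CPU, value = physical core id."""
    return [cpu % cores for cpu in range(cores * threads_per_core)]


for smt in (1, 2, 4):
    layout = hypothetical_layout(64, smt)
    print(smt, naive_core_count(len(layout)), core_count_from_topology(layout))
```

Only the SMT2 row agrees; with SMT off the naive count is half the real core count, and with SMT4 it is double.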


