Intel HT vs AMD SMT scaling

I thought up a little project after Ian Cutress of AnandTech mentioned seeing an 80% speedup from SMT. This got my attention, as my own experience with Intel CPUs up to that point had only shown improvements of up to 50%. The software was something he wrote himself in the past, and I got a copy to try out. I've only had limited time to run it, but on two different Intel systems I got under 50% speedup with HT, and on a 1st gen Ryzen system it was around 70%. I kinda parked it at that point since it was during the heatwave where I am, and benching isn't fun under those conditions.

 

Now the weather has cooled off a bit, I'm looking into doing a wider test of HT/SMT performance/gains between Intel and AMD. I've got a good variation of hardware, although I probably should get a Ryzen 2000 representative also, maybe the 2200G. So hardware wise, I think I'm ok.

 

The software is what I'm left wondering about... what are good multi-threaded CPU benchmarks or easily timed workloads, preferably NOT significantly limited by other factors such as RAM bandwidth?

 

For now I'm thinking:

3DPM

Cinebench R15 (and maybe older ones?)

Blender (may depend on version and content, TBD)

Y-cruncher Pi - @Mysticial in another thread you said Skylake-X was ram limited in this due to AVX-512. If I were not to use that feature, what's the ram situation then? To be safe I could simply force using fewer cores and/or lower clocks...

Prime95 - historically this doesn't gain from HT, but I like to recheck now and then

PrimeGrid GCW-sieve

 

 

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, RTX 4070, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, random 1080p + 720p displays.
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible

Ah, so you're mackerel from mersenneforum!

 

Not sure if you're a programmer, but if you want to artificially construct a benchmark that would benefit as much as possible from HT, try something that iterates over a very large linked-list such that every hop is a cache miss.

 

I don't know of any particular real-life application that does this. But I can't imagine it being too uncommon. Something like this could possibly get 4x speed-up on Knights Landing with 4-way SMT.
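
For anyone who wants to try this, here is a minimal sketch of the access pattern in Python. It is an illustration only, not code from this thread: the names `build_cycle` and `chase` are made up for the example, and Python's interpreter overhead would swamp the cache-miss latency, so a real measurement would need the same loop in a compiled language over a table far larger than L3. The idea is to link the nodes into one random cycle (Sattolo's algorithm) so that each hop depends on the previous one and lands somewhere unpredictable.

```python
import random


def build_cycle(n, seed=0):
    """Return a list nxt where nxt[i] is the node after i.

    Sattolo's algorithm yields a permutation made of a single cycle of
    length n, so a walk from any node visits every node before repeating.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, hops, start=0):
    """Follow the links for `hops` steps; each load depends on the last one."""
    i = start
    for _ in range(hops):
        i = nxt[i]
    return i


if __name__ == "__main__":
    table = build_cycle(1_000_000)
    # After exactly len(table) hops the walk is back at its starting node.
    print(chase(table, len(table)))
```

To look at HT/SMT scaling with something like this, one copy of the walk would run per hardware thread, each on its own table, and the total hops per second would be compared with HT on and off.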

 

For y-cruncher, even AVX2 is somewhat memory-bound if you have enough cores. So you may have to drop all the way down to SSE4 (the "08-NHM" binary). But you'll have to experiment to see for sure.

 

I'll note that y-cruncher's memory-bandwidth usage is very different from Prime95.

  • Prime95's is very smooth and steady. So it's either 0% memory-bound or 100% memory-bound.
  • In y-cruncher, the usage is bursty - so there's more of a distribution. Memory and CPU speeds will always have an effect and neither can completely bottleneck the other even at the extremes like scalar code on one end and AVX512 on the other.

3 hours ago, Mysticial said:

Not sure if you're a programmer, but if you want to artificially construct a benchmark that would benefit as much as possible from HT, try something that iterates over a very large linked-list such that every hop is a cache miss.

Definitely not a programmer to any useful level. Wish I understood even half of the stuff you talk about on the other forum. Also, I'm not after constructing the best possible case for HT/SMT benefit; I want to use existing software and see how Intel's and AMD's architectures differ from each other.

 

3 hours ago, Mysticial said:

For y-cruncher, even AVX2 is somewhat memory-bound if you have enough cores. So you may have to drop all the way down to SSE4 (the "08-NHM" binary). But you'll have to experiment to see for sure.

I'm not sure I want to reduce the software capability, apart from Skylake-X where AVX2 vs AVX512 might be interesting to allow comparisons with the rest of Intel CPUs. So in that sense, I want the software to do whatever it normally does, and if needed I'd rather limit the cores to take the load off RAM. It could even make comparisons a little bit easier, for example if I fixed everything to 2 cores 4 threads at a certain clock regardless of its original configuration.

 

3 hours ago, Mysticial said:

I'll note that y-cruncher's memory-bandwidth usage is very different from Prime95.

  • Prime95's is very smooth and steady. So it's either 0% memory-bound or 100% memory-bound.
  • In y-cruncher, the usage is bursty - so there's more of a distribution. Memory and CPU speeds will always have an effect and neither can completely bottleneck the other even at the extremes like scalar code on one end and AVX512 on the other.

Based on the indicated RAM usage, I take it even the smaller sizes won't fit in the L3 cache of most CPUs, so RAM will have some impact?

17 minutes ago, porina said:

I'm not sure I want to reduce the software capability, apart from Skylake-X where AVX2 vs AVX512 might be interesting to allow comparisons with the rest of Intel CPUs. So in that sense, I want the software to do whatever it normally does, and if needed I'd rather limit the cores to take the load off RAM. It could even make comparisons a little bit easier, for example if I fixed everything to 2 cores 4 threads at a certain clock regardless of its original configuration.

Ah ok. Yeah, reducing the core count will achieve that. 2 cores 4 threads is a good starting point. Just make sure that you pin the threads properly so that the 4 threads are actually running on 2 physical cores.
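
As a rough sketch of what pinning means in practice: on Windows, a process can be restricted to specific logical processors with an affinity mask, for example through cmd's `start /affinity <hex mask>` or Task Manager's "Set affinity". The Python helpers below only compute such masks. They assume the hardware threads of core N are enumerated as adjacent logical processors (2N and 2N+1 with 2-way SMT), which is common on Windows but should be checked on each system, since other operating systems often number siblings differently. The function names are hypothetical.

```python
def sibling_mask(physical_cores, threads_per_core=2):
    """Mask covering every hardware thread of the first `physical_cores` cores.

    ASSUMPTION: logical CPUs are numbered core by core, so core c owns
    logical CPUs c*threads_per_core .. c*threads_per_core + threads_per_core - 1.
    """
    mask = 0
    for core in range(physical_cores):
        for thread in range(threads_per_core):
            mask |= 1 << (core * threads_per_core + thread)
    return mask


def one_thread_per_core_mask(physical_cores, threads_per_core=2):
    """Mask selecting only the first hardware thread of each core.

    Useful for a "no HT" reference run on the same physical cores,
    under the same numbering assumption as sibling_mask().
    """
    mask = 0
    for core in range(physical_cores):
        mask |= 1 << (core * threads_per_core)
    return mask


if __name__ == "__main__":
    print(format(sibling_mask(2), "X"))              # 4 threads on 2 cores
    print(format(one_thread_per_core_mask(2), "X"))  # 2 threads on 2 cores
```

Under these assumptions the "2 cores 4 threads" case gets mask F, and the matching one-thread-per-core run gets mask 5.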

 

18 minutes ago, porina said:

Based on the indicated RAM usage, I take it even the smaller sizes won't fit in the L3 cache of most CPUs, so RAM will have some impact?

Yes and no. The program doesn't always use all the memory that it says it needs. The number it shows is just some provable upper-bound. (Yeah more math, but hey we're from MersenneForum!) The actual usage is usually a bit lower.

 

But for the most part, you're correct. You will not be able to fit most computations in L3 cache. And if you can, it'll probably be so small that it ends in a blink.

 

If you want to completely throw memory out of the equation, you can try the BBP benchmark. That one touches no memory and is raw CPU. The HT speedup seems to be just under 10%. But one of the purposes of HT is to hide memory access latencies. So a pure-CPU benchmark won't be able to measure that.
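
To show why a BBP-style calculation is pure CPU work, here is a small Python sketch of hex-digit extraction with the Bailey-Borwein-Plouffe formula. This is not y-cruncher's code, and its benchmark may use a different formula or arithmetic. The function names are invented for the example, and double-precision floats limit this version to modest digit positions. The whole state is a few integers and one running sum, so nothing here depends on RAM speed.

```python
def _bbp_series(j, n):
    """Fractional part of sum over k >= 0 of 16**(n - k) / (8*k + j)."""
    s = 0.0
    # Terms with k <= n: reduce 16**(n - k) modulo the denominator first,
    # so only the fractional contribution is kept and numbers stay small.
    for k in range(n + 1):
        denom = 8 * k + j
        s = (s + pow(16, n - k, denom) / denom) % 1.0
    # Terms with k > n shrink by a factor of 16 each step; stop once negligible.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        s += term
        k += 1
    return s % 1.0


def pi_hex_digit(n):
    """Hex digit of pi at position n after the point (n = 0 is the first)."""
    x = (4 * _bbp_series(1, n) - 2 * _bbp_series(4, n)
         - _bbp_series(5, n) - _bbp_series(6, n)) % 1.0
    return int(x * 16)


if __name__ == "__main__":
    # pi in hexadecimal begins 3.243F6A88...
    print("".join(format(pi_hex_digit(i), "X") for i in range(8)))
```

Each digit position is independent of the others, so one position per thread is an easy way to load every hardware thread with this kind of work.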

8 hours ago, Mysticial said:

Ah ok. Yeah, reducing the core count will achieve that. 2 cores 4 threads is a good starting point. Just make sure that you pin the threads properly so that the 4 threads are actually running on 2 physical cores.

I was simply going to disable cores in bios so there were only two visible, for example.

 

8 hours ago, Mysticial said:

If you want to completely throw memory out of the equation, you can try the BBP benchmark. That one touches no memory and is raw CPU. The HT speedup seems to be just under 10%. But one of the purposes of HT is to hide memory access latencies. So a pure-CPU benchmark won't be able to measure that.

Interesting thought. If I go back to my goal, I wanted to see what benefits HT/SMT can bring. I'm not avoiding RAM access as such, I just didn't want it to be the limiting factor. Maybe I'm looking at it too simplistically, from the other direction. A core (without HT/SMT) has a certain amount of execution resource that can be extracted. With the addition of HT/SMT, more of that can be used. I didn't really give consideration to how it is used.

  • 3 weeks later...

 

Some preliminary results here...

 

[Image: 3dpm2.png]

 

3DPM has subscores, and I thought it interesting to show those as well as the overall score. It might not be clear, but in general the 8086k and 7800X are slightly faster per core per clock with HT off, compared to the 1600. This reverses when HT/SMT is enabled, and the 1600 takes the lead.

 

[Image: 3dpm1.png]

 

It is clearer to see when shown as HT/SMT improvement. The interesting thing here is, one of the subtests shows over 60% improvement on Intel CPUs from having HT on compared to off. This breaks my assumed 50% limit! It is also clear that Ryzen shows better SMT scaling than Intel HT, almost reaching 80% on one subtest.
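
For reference, the two figures discussed here can be computed as below. This is a small Python sketch with hypothetical function names, and the numbers in the usage note are placeholders rather than results from these runs. It assumes a higher-is-better score, as in 3DPM.

```python
def per_core_per_clock(score, cores, clock_ghz):
    """Normalise a benchmark score by core count and clock speed."""
    if cores <= 0 or clock_ghz <= 0:
        raise ValueError("cores and clock_ghz must be positive")
    return score / (cores * clock_ghz)


def smt_gain_percent(score_smt_off, score_smt_on):
    """Percentage improvement from enabling HT/SMT on the same cores.

    0 means no change, 50 means the SMT-on score is 1.5x the SMT-off
    score, and a negative value means enabling SMT made things slower.
    """
    if score_smt_off <= 0:
        raise ValueError("score_smt_off must be positive")
    return (score_smt_on / score_smt_off - 1.0) * 100.0
```

For example, a placeholder score of 100 with HT off and 150 with HT on gives `smt_gain_percent(100, 150)`, which is a 50% gain.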

 

It was also interesting to see the two Intel CPUs, 8086k and 7800X, were almost identical. There are differences in the cache structure and ram, but presumably these tests do not rely too much on those and are primarily affected by core performance. This is to be followed up later to check for scaling.

 

Systems were all running Win10, with all updates applied as of 27 July 2018. Motherboard BIOSes were also all up to date, with at least the original set of Spectre/Meltdown protections if not subsequent ones. CPU cores were stock, but the 7800X did have its cache OC'd to 3000. RAM on the Intel systems was 3000 on all channels, and 2666 dual channel on the Ryzen system.

 

  • 10 months later...

I realize this thread is pretty old, but I was searching for an Intel HT vs AMD Ryzen SMT comparison now that the Intel Zombieload mitigation fixes are in the process of being delivered. The reality is that Intel's HT will be taking a reasonable performance hit when these are final. My guess is that right now it might be reasonable to expect 40-50% improved performance of Ryzen SMT vs Intel HT + ZL mitigation fixes/BIOS updates.

5 minutes ago, spkay31 said:

I realize this thread is pretty old, but I was searching for an Intel HT vs AMD Ryzen SMT comparison now that the Intel Zombieload mitigation fixes are in the process of being delivered. The reality is that Intel's HT will be taking a reasonable performance hit when these are final. My guess is that right now it might be reasonable to expect 40-50% improved performance of Ryzen SMT vs Intel HT + ZL mitigation fixes/BIOS updates.

I doubt it'll be anywhere near that much outside of niche cases. If you wait a month or so, I do intend to get Zen2, and will retest Zen, Zen+ and Skylake with current bios/Windows patches at the time.
