
Windows Doesn't Like 64 Threads; TR 2990WX Threading Tests

https://www.phoronix.com/scan.php?page=article&item=2990wx-linwin-scale&num=1

 

Brought this over here because it's more of a discussion than a news piece, but it's interesting nonetheless.

 

Bh99G.png

 

Phoronix did a lot of testing, and these types of results are pretty common. @leadeater, looks like we have some answers about what is going on. Some of the Windows libraries are likely the culprit. It also explains why there were similar problems when Threadripper 1 dropped. They just moved those libraries from 24 threads to 32 threads and the problem went away. Then AMD drops a 64-thread part and, suddenly, those SMT problems crop back up.

 

So, if you have an AMD 2990WX (or 2970WX) and you're wondering why certain tasks seem really slow, here's the main answer until Microsoft gets around to fixing things. 

 

Edit: Fixed the image.

2990wx.jpg


Use Windows Server?



1 minute ago, leadeater said:

@Taf the Ghost

Dead image link, had similar problems trying to link pictures off Phoronix as well. Gave up and just used snip/screenshot last time lol.

Yeah, he's using SVG, which is a vector format for web display. I didn't realize it, so hard image is go!


Microsoft - Micro performance



2 minutes ago, leadeater said:

@Taf the Ghost

Dead image link, had similar problems trying to link pictures off Phoronix as well. Gave up and just used snip/screenshot last time lol.

As for the topic, it's good we finally have an explanation. Some of it is the scheduler, I imagine especially if you're dealing with games, but the results were too weird when some people tested with SMT off. It definitely is some set of libraries with dependencies that just don't accept 64 threads.

 

Most of the time, the performance at 64 threads is around the 24-thread result, which probably means it is falling back to some sort of thread limit. Funniest result, though, was FFmpeg.

 

5b7a27b76c865_2990wx2.thumb.jpg.bc93d4a407010f7dded7c5a6d11eb49f.jpg

 

I imagine this would repeat itself even on X299 CPUs as well. And I do find it funny that 64 threads is slower than 1 thread.

 

However, I think we need to start asking questions about how Adobe Premiere performance scales with thread count.


8 minutes ago, Taf the Ghost said:

Yeah, he's using SVG, which is a vector format for web display. I didn't realize it, so hard image is go!

Here, high-res SVG to PNG conversion.

Bh99G.png

 

A bit more on-topic: hoo-wee. Linux is the way to go for these chips?

Would make for a hella encoding machine.



Isn't this just a repeat of the existing news thread?

 



Just now, 2FA said:

Isn't this just a repeat of the existing news thread?

 

Not necessarily; this one is more about Windows not doing well with 64 threads than about Linux performance on the 2990WX being much better than Windows performance.



2 minutes ago, Taf the Ghost said:

As for the topic, it's good we finally have an explanation. Some of it is the scheduler, I imagine especially if you're dealing with games, but the results were too weird when some people tested with SMT off. It definitely is some set of libraries with dependencies that just don't accept 64 threads.

A lot of it looks like the software itself as well; a decent number of programs worked fine.


1 minute ago, Dan Castellaneta said:

Here, high-res SVG to PNG conversion.

 

 

A bit more on-topic: hoo-wee. Linux is the way to go for these chips?

Would make for a hella encoding machine.

Thanks, I edited the image in. 

 

As for the 2990WX, with a properly working OS, the memory bandwidth issues seem to be fairly minimal, so, yes, they are monster encoding machines.

Just now, 2FA said:

Isn't this just a repeat of the existing news thread?

 

We knew the Linux performance was a lot better, but this thread is about "why". There are clearly problems with the way Windows handles certain things, and it's not just the scheduler.


1 minute ago, leadeater said:

A lot of it looks like the software itself as well; a decent number of programs worked fine.

Yup. It's more than likely some set of standard libraries that has the issues. Maybe a set of compiler flags?

 

Interesting thing is that this should hit the 28-core Intel A-series as well.


7 minutes ago, Taf the Ghost said:

Yup. It's more than likely some set of standard libraries that has the issues. Maybe a set of compiler flags?

 

Interesting thing is that this should hit the 28-core Intel A-series as well.

Probably a lot of stuff using .NET and/or older stuff with fixed thread pool sizes or no thread pooling at all. From my limited time doing C# programming, multi-threading and thread scaling really is not an "it just works" thing. Been a long time since I've written any sort of proper code though.

 

Edit:

https://docs.microsoft.com/en-us/dotnet/standard/parallel-programming/task-parallel-library-tpl
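
To make the "fixed thread pool size" idea concrete, here is a minimal sketch in Python (not C#, and not taken from the Phoronix article or from any Windows library). It only shows how a hard-coded cap keeps a pool at the same size no matter how many hardware threads the machine reports. The names `FIXED_POOL_SIZE`, `pool_size`, `square`, and `run`, and the cap of 24, are made up for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical hard-coded cap, standing in for a "fixed thread pool size".
FIXED_POOL_SIZE = 24


def pool_size(fixed=None):
    """Return the number of workers a pool would be created with.

    fixed=None sizes the pool from the logical CPU count the OS reports.
    Any other value acts as a hard cap that ignores extra hardware threads.
    """
    detected = os.cpu_count() or 1  # os.cpu_count() may return None
    if fixed is None:
        return detected
    return min(fixed, detected)


def square(n):
    """Trivial stand-in for a real unit of work."""
    return n * n


def run(jobs, fixed=None):
    """Run the jobs on a pool sized by pool_size(); return (workers, results)."""
    workers = pool_size(fixed)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(square, jobs))
    return workers, results
```

On a 64-thread machine, `pool_size(FIXED_POOL_SIZE)` would stay at 24 while `pool_size()` follows whatever the OS reports, which is the kind of gap being guessed at here. The sketch is about pool sizing only; it makes no claim about actual speedups.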


8 minutes ago, leadeater said:

Probably a lot of stuff using .NET and/or older stuff with fixed thread pool sizes or no thread pooling at all. From my limited time doing C# programming, multi-threading and thread scaling really is not an "it just works" thing. Been a long time since I've written any sort of proper code though.

 

Edit:

https://docs.microsoft.com/en-us/dotnet/standard/parallel-programming/task-parallel-library-tpl

 

And just before this timestamp as well. I think a lot of those libraries break at 32 threads. We haven't really looked at a whole bunch of HEDT benchmarks since last year, and there's been a lot more going on here than we realized. Mostly because so many extra threads were added in such a short time span that I don't think anyone realized there were problems, since the results were so much higher in raw numbers.

