
Prime95 29.5 now supports AVX-512 (Windows ver available)

porina

https://mersenneforum.org/showthread.php?t=23723

 

This could be interesting for Skylake-X owners. AVX-512 support has been added so you can really give your CPU a workout. Linux only for now, but I'll run benchmarks on it as soon as a Windows version is also available.

 

Edit: a Windows version is now available.

Main system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, Corsair Vengeance Pro 3200 3x 16GB 2R, RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


So does this mean even more heat than it already caused?  Is it even safe to use?

Solve your own audio issues  |  First Steps with RPi 3  |  Humidity & Condensation  |  Sleep & Hibernation  |  Overclocking RAM  |  Making Backups  |  Displays  |  4K / 8K / 16K / etc.  |  Do I need 80+ Platinum?

If you can read this you're using the wrong theme.  You can change it at the bottom.


10 minutes ago, Ryan_Vickers said:

So does this mean even more heat than it already caused?  Is it even safe to use?

This software was developed to do real work, finding large prime numbers. People do run this 24/7. It is as safe as any other software as long as you don't go crazy on overclocking with unsafe voltages and inadequate cooling. AVX-512 promises more throughput, although I suspect for large FFTs it'll be RAM bandwidth limited, so probably not much different from AVX2. For small FFTs, that limitation won't apply and we may see a bigger difference.

 

Not exactly the same I know, but I can only run my 7800X at 4.3 GHz when running y-cruncher with AVX-512, compared to 4.9-ish for non-AVX.


6 minutes ago, porina said:

This software was developed to do real work, finding large prime numbers. People do run this 24/7. It is as safe as any other software as long as you don't go crazy on overclocking with unsafe voltages and inadequate cooling.

Yeah, I sometimes forget about that xD But it is (or was) often used as a stress test and/or heat test. Like Furmark, last I checked it was considered to be in a class all its own: above what any normal stress test would do (and those are already borderline as realistic scenarios) and actually in the dangerous realm.

6 minutes ago, porina said:

AVX-512 promises more throughput, although I suspect for large FFTs it'll be RAM bandwidth limited, so probably not much different from AVX2. For small FFTs, that limitation won't apply and we may see a bigger difference.

 

Not exactly the same I know, but I can only run my 7800X at 4.3 GHz when running y-cruncher with AVX-512, compared to 4.9-ish for non-AVX.

Interesting...


6 minutes ago, Ryan_Vickers said:

Yeah, I sometimes forget about that xD But it is (or was) often used as a stress test and/or heat test. Like Furmark, last I checked it was considered to be in a class all its own: above what any normal stress test would do (and those are already borderline as realistic scenarios) and actually in the dangerous realm.

It is doing a lot of work, which is what AVX was created for. Over time, the software is continuously optimised for performance. This is what we want for any software, right?

 

I'm aware of claims of problems with specific CPUs in the past, but I missed that era and didn't look further into it. It wasn't a problem with Sandy Bridge, and I didn't have any problems with Haswell or newer.

 

Intel saw the power gap between AVX and non-AVX growing, and from Kaby Lake mainstream / Broadwell HEDT they introduced AVX offset to allow non-AVX to hit higher clocks. Even at a lower clock AVX offers a lot of potential.
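To make the offset idea concrete, here is a minimal sketch of the arithmetic, assuming a 100 MHz base clock so that one multiplier bin equals 100 MHz. The function names are hypothetical, and the clocks are the 7800X figures mentioned earlier in the thread (about 4.9 GHz non-AVX, 4.3 GHz under AVX-512); this is not how any particular BIOS exposes the setting.

```python
BCLK_MHZ = 100  # assumption: stock base clock, so one multiplier bin = 100 MHz


def effective_mhz(core_multiplier, offset_bins=0):
    """Clock after lowering the core multiplier by `offset_bins`."""
    return (core_multiplier - offset_bins) * BCLK_MHZ


def offset_bins_needed(non_avx_mhz, avx_mhz):
    """Whole multiplier bins separating the two clocks."""
    return round((non_avx_mhz - avx_mhz) / BCLK_MHZ)


# Example using the 7800X figures mentioned earlier in the thread:
# about 4.9 GHz for non-AVX work and 4.3 GHz under AVX-512 in y-cruncher.
bins = offset_bins_needed(4900, 4300)
print(bins, effective_mhz(49), effective_mhz(49, bins))
```

Under these assumptions, the gap between those two clocks corresponds to an offset of 6 bins on a 49x multiplier.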


Is it Halloween yet? This is scary stuff! Windows test version just got posted. Small FFT stress test = >100C with 7800X at stock clocks (4.0 for AVX-512). This is with delid, Kryonaut and a watercooling setup. Note I don't have liquid metal on it as I'm preparing it for low temperature testing where it can't be used.

 

I redid the TIM to make sure, but I'm seeing cores 0, 2, 5 ~20C cooler than the remaining cores under this load. The highest temps are reached with HT in use. If I don't load the HT threads in software, the hottest cores are only in the 80s.

 

7800X system power draw at the wall, tests at 64k FFT:

380W AVX-512, 12 threads
320W AVX-512, 6 threads
320W AVX2, 12 threads
280W AVX2, 6 threads
115W system idle

 

I didn't take 12-thread benchmark results, but at 6 threads, AVX-512 did 56273 iter/s and AVX2 did 33540. So roughly 68% more throughput for 14% more system power, or 24% more CPU power (load minus idle). Performance per watt improvement :) I should add, 64k FFT isn't really used much within the groups I participate in. With hindsight I should have done something in the 256k ballpark, where RAM limitations haven't kicked in hard yet but which is still in a used range.
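For anyone who wants to check the percentages, here is a small sketch that redoes the arithmetic from the 6-thread figures above. The variable and function names are just for illustration, and "CPU power" is approximated as wall power minus idle, the same way as in the post.

```python
# Figures quoted in the post above (7800X, 64k FFT, wall power in watts).
IDLE_W = 115
avx2 = {"iter_per_s": 33540, "wall_w": 280}    # AVX2, 6 threads
avx512 = {"iter_per_s": 56273, "wall_w": 320}  # AVX-512, 6 threads


def pct_gain(new, old):
    """Percentage increase of `new` over `old`."""
    return (new / old - 1) * 100


throughput_gain = pct_gain(avx512["iter_per_s"], avx2["iter_per_s"])
system_power_gain = pct_gain(avx512["wall_w"], avx2["wall_w"])
# "CPU power" is approximated as load minus idle, as in the post.
cpu_power_gain = pct_gain(avx512["wall_w"] - IDLE_W, avx2["wall_w"] - IDLE_W)

# Iterations per second per wall watt, and the relative efficiency gain.
eff_avx2 = avx2["iter_per_s"] / avx2["wall_w"]
eff_avx512 = avx512["iter_per_s"] / avx512["wall_w"]
efficiency_gain = pct_gain(eff_avx512, eff_avx2)

print(f"throughput:   +{throughput_gain:.0f}%")
print(f"system power: +{system_power_gain:.0f}%")
print(f"CPU power:    +{cpu_power_gain:.0f}%")
print(f"iter/s per W: +{efficiency_gain:.0f}%")
```

Measured at the wall, that works out to roughly 47% more iterations per watt for AVX-512 at this FFT size.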

 

On that note... for small FFTs running one task per core, we are looking at a ballpark 70% throughput improvement up to 256k, where it tails off towards 512k and we're thoroughly RAM bandwidth limited. Even in this RAM-limited region, we're still ahead of AVX2 by around 10%.
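A rough back-of-the-envelope sketch of why the tail-off could land between 256k and 512k. Everything here is an assumption for illustration: one 8-byte double per FFT element, "k" meaning 1024 elements, and 7800X cache sizes of 1 MiB L2 per core plus 8.25 MiB shared L3 simply added together. The real working set also includes tables and scratch data, so treat it as an estimate only.

```python
BYTES_PER_ELEMENT = 8   # assumption: one float64 per FFT element
K = 1024                # "256k" is taken to mean 256 * 1024 elements

# Assumed 7800X cache sizes: 1 MiB L2 per core plus 8.25 MiB shared L3.
CORES = 6
L2_PER_CORE_MIB = 1.0
L3_SHARED_MIB = 8.25


def working_set_mib(fft_k, tasks):
    """Approximate FFT data size in MiB for `tasks` independent tasks."""
    return fft_k * K * BYTES_PER_ELEMENT * tasks / (1024 * 1024)


# Simplification: treat L2 and L3 capacity as additive.
cache_mib = CORES * L2_PER_CORE_MIB + L3_SHARED_MIB

for fft_k in (64, 256, 512, 1024):
    size = working_set_mib(fft_k, tasks=CORES)
    verdict = "fits in cache" if size <= cache_mib else "spills to RAM"
    print(f"{fft_k:>5}k x {CORES} tasks: {size:5.1f} MiB -> {verdict}")
```

With one task per core, 256k comes to about 12 MiB against roughly 14 MiB of combined cache, while 512k needs about 24 MiB, which would be consistent with the tail-off described above.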

 

For a single task using multiple threads, it tracks AVX2 until 1024k, where it is in the ballpark of 20% ahead. Something happens around 2560k, where it drops to around 5% improvement and stays there for bigger FFTs. Note multi-thread scaling has never been good for small FFTs, as it seems to need a big enough task size before it can efficiently spread the work out.


5 hours ago, porina said:

Is it Halloween yet? This is scary stuff! Windows test version just got posted. Small FFT stress test = >100C with 7800X at stock clocks (4.0 for AVX-512). This is with delid, Kryonaut and a watercooling setup. Note I don't have liquid metal on it as I'm preparing it for low temperature testing where it can't be used.

That sounds like the Prime95 I know


In the post-Kaby Lake era, I'd argue that it's pointless to say that AVX and AVX512 are "more stressful" than normal code.

 

The reason I say that is simply that AVX(512) aren't supposed to run at the same speed as normal code anyway. For the same reason, nobody runs their GPU at the same speed as the CPU. AVX and AVX512 fall somewhere in between.

 

To oversimplify a bit, a Skylake X chip has 5 different sets of silicon that run at different speeds:

  1. Main core
  2. AVX units
  3. AVX512 units
  4. Uncore (L3/mesh)
  5. DRAM

Most people are familiar with 1, 4, and 5. But very few people are aware of 2 and 3.

 

Most of the people who get "burned" (pun intended) by AVX or AVX512 either don't know that 2 and 3 exist, and are therefore likely to set them wrong, or are aware of them but don't realize their importance and thus ignore them. And from what I've been reading online, it seems to be a good mix of both.

 

Back in the Haswell/Skylake (non-X) days, 3 didn't exist and 1 was tied to 2. So you were forced to drop down to the lowest common denominator. This is what gave AVX its bad rep among the OC community in the first place. But this bad rep seems to have stuck even after Intel fixed AVX by separating 1 and 2 into different classes (which it really should've done from the very beginning in Sandy Bridge).
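As a toy illustration of the "lowest common denominator" point, here is a sketch comparing a tied clock with separate clocks. The stable limits are made-up numbers, not measurements, and the function names are hypothetical; this only models the idea and says nothing about how the hardware actually switches speeds.

```python
def tied_clocks(stable_mhz):
    """One clock for everything: the slowest class sets the speed for all."""
    floor = min(stable_mhz.values())
    return {kind: floor for kind in stable_mhz}


def separate_clocks(stable_mhz):
    """Each instruction class runs at its own highest stable clock."""
    return dict(stable_mhz)


# Hypothetical stable limits in MHz for one chip (illustrative values only).
stable = {"non_avx": 4900, "avx": 4600}

print(tied_clocks(stable))      # non-AVX code is dragged down to the AVX limit
print(separate_clocks(stable))  # non-AVX code keeps its higher clock
```

With the tied setup, normal code loses 300 MHz in this made-up case even if it never touches AVX, which is the penalty that separate classes avoid.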

 

The problem now is that the chip has become so complicated that it's increasingly difficult to stabilize it against all workloads. So it's easy to screw up. And each time someone crashes on AVX or AVX512, it just adds fuel to the FUD that "AVX is bad".

 

So I'm interested to see what the rest of the OC community does with the new Prime95 with AVX512.


22 minutes ago, Mysticial said:

So I'm interested to see what the rest of the OC community does with the new Prime95 with AVX512.

Up to this point, I think about the only AVX-512 load the OC community had was y-cruncher... where I still hold the 6 core record on hwbot. Almost wish I had more Skylake-X CPUs to get a few more :)

 

Given the limited CPU support for AVX-512 this may remain a niche of a niche for some time.
