Speculation on Folding Performance of RDNA3

Comparing the theoretical performance of AMD and NVIDIA GPUs is difficult these days. Since Ampere, NVIDIA has been reporting FP32 performance including Tensor Cores, which currently cannot be used for Folding. AMD, on the other hand, is reporting multiple GPUs under the same Device ID, which makes statistics gathering less than ideal.

 

So what kind of theoretical performance might we see with the new RX 7900 XT and XTX RDNA3 cards?

 

Their FP32 performance is listed as 30.78 TFLOPS for the XTX and 25.8 TFLOPS for the XT. The 6900 XT, with 23.04 TFLOPS of FP32 performance, yields 4,520,522 PPD on a medium atom-sized Influenza WU (p18450), so assuming performance scales linearly we should see 6,039,135 and 5,062,043 PPD respectively. So not even close to the 16,438,230 PPD seen from the 4090 with its 82.58 TFLOPS of combined Tensor + CUDA core performance.
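For anyone who wants to check the arithmetic, here is a minimal sketch of the linear scaling used above. It assumes PPD is strictly proportional to listed FP32 TFLOPS, which is the post's working assumption rather than a measured relationship; the function and constant names are made up for illustration.

```python
# Linear TFLOPS-to-PPD scaling from the 6900 XT baseline quoted above.
# Assumption: PPD is proportional to listed FP32 TFLOPS (not measured).
BASELINE_TFLOPS = 23.04     # RX 6900 XT FP32, as listed
BASELINE_PPD = 4_520_522    # RX 6900 XT on the p18450 Influenza WU


def scale_ppd(tflops: float) -> float:
    """Estimate PPD for a card with the given FP32 TFLOPS."""
    return BASELINE_PPD * tflops / BASELINE_TFLOPS


if __name__ == "__main__":
    for tflops in (30.78, 25.8):
        print(f"{tflops:>6.2f} TFLOPS -> {scale_ppd(tflops):,.0f} PPD")
```

Swapping in a different baseline card or WU only means changing the two constants.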

 

To get an idea of the performance penalty AMD is under with Compute due to being limited to OpenCL, we can look at the RX 6650 XT at 10.79 TFLOPS, which sits somewhere between an RTX 2080 at 10.07 TFLOPS and an RTX 2080 Super at 11.15 TFLOPS. On the same Influenza WU the 6650 XT yields 1,599,809 PPD versus 2,209,943 and 2,810,358 PPD for the 2080 and 2080 Super. Interpolating linearly between those two, a hypothetical 2080 running at the same TFLOPS as the RX 6650 XT would yield 2,610,220 PPD.

 

To put it another way, AMD GPUs running OpenCL would see about a 38.7% decrease in yield compared to an NVIDIA GPU of the same TFLOPS running CUDA.
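The interpolation and the resulting penalty can be reproduced with a short sketch. It assumes a straight line between the two NVIDIA data points quoted above, which is only a rough model; the helper name `interpolate` is hypothetical.

```python
# Estimate the OpenCL penalty by interpolating between two NVIDIA cards.
# Assumption: PPD varies linearly with TFLOPS between the 2080 and 2080 Super.
def interpolate(x: float, x0: float, y0: float, x1: float, y1: float) -> float:
    """Linear interpolation of y at x between (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


RX_6650_XT_TFLOPS = 10.79
RX_6650_XT_PPD = 1_599_809

# PPD of a hypothetical NVIDIA card at the 6650 XT's TFLOPS.
nvidia_equiv_ppd = interpolate(
    RX_6650_XT_TFLOPS,
    10.07, 2_209_943,   # RTX 2080
    11.15, 2_810_358,   # RTX 2080 Super
)

# Fractional drop in yield for the AMD card relative to that estimate.
penalty = 1 - RX_6650_XT_PPD / nvidia_equiv_ppd

if __name__ == "__main__":
    print(f"NVIDIA-equivalent: {nvidia_equiv_ppd:,.0f} PPD")
    print(f"OpenCL penalty:    {penalty:.1%}")
```

Two data points from one generation of NVIDIA cards on one WU is a thin basis, so treat the percentage as a ballpark figure.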

 

So while the new AMD GPUs may have the potential to be a much better bang for the buck for gaming, with the state of OpenCL performance, likely the best we can hope for is that the competition from AMD forces NVIDIA to be more aggressive in pricing.

FaH BOINC HfM

Bifrost - 6 GPU Folding Rig  Linux Folding HOWTO Folding Remote Access Folding GPU Profiling ToU Scheduling UPS

Systems:

desktop: Lian-Li O11 Air Mini; Asus ProArt x670 WiFi; Ryzen 9 7950x; EVGA 240 CLC; 4 x 32GB DDR5-5600; 2 x Samsung 980 Pro 500GB PCIe3 NVMe; 2 x 8TB NAS; AMD FirePro W4100; MSI 4070 Ti Super Ventus 2; Corsair SF750

nas1: Fractal Node 804; SuperMicro X10sl7-f; Xeon e3-1231v3; 4 x 8GB DDR3-1666 ECC; 2 x 250GB Samsung EVO Pro SSD; 7 x 4TB Seagate NAS; Corsair HX650i

nas2: Synology DS-123j; 2 x 6TB WD Red Plus NAS

nas3: Synology DS-224+; 2 x 12TB Seagate NAS

dcn01: Fractal Meshify S2; Gigabyte Aorus ax570 Master; Ryzen 9 5900x; Noctua NH-D15; 4 x 16GB DDR4-3200; 512GB NVMe; 2 x Zotac AMP 4070ti; Corsair RM750Mx

dcn02: Fractal Meshify S2; Gigabyte ax570 Pro WiFi; Ryzen 9 3950x; Noctua NH-D15; 2 x 16GB DDR4-3200; 128GB NVMe; 2 x Zotac AMP 4070ti; Corsair RM750x

dcn03: Fractal Meshify C; Gigabyte Aorus z370 Gaming 5; i9-9900k; BeQuiet! PureRock 2 Black; 2 x 8GB DDR4-2400; 128GB SATA m.2; MSI 4070 Ti Super Gaming X; MSI 4070 Ti Super Ventus 2; Corsair TX650m

dcn05: Fractal Define S; Gigabyte Aorus b450m; Ryzen 7 2700; AMD Wraith; 2 x 8GB DDR 4-3200; 128GB SATA NVMe; Gigabyte Gaming RTX 4080 Super; Corsair TX750m

dcn06: Fractal Focus G Mini; Gigabyte Aorus b450m; Ryzen 7 2700; AMD Wraith; 2 x 8GB DDR 4-3200; 128GB SSD; Gigabyte Gaming RTX 4080 Super; Corsair CX650m

Shouldn't you be using 61 TFLOPS for FP32, since that is the official spec? I'm guessing you are using TechPowerUp for all your numbers, but they have had incorrect TFLOPS information besides this one, like with Arc double precision. Looks like they are probably wrong here as well.

Hmm. Yes, I'm using the TechPowerUp value, but they do note that the FP16 is 123 at 1:4, so that might be their issue. Doubling the FP32 performance would put them at about 12M PPD, which is closer but still only roughly three quarters of the way to the 4090. Still, it would make the value proposition better.
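A quick check of that doubled figure, as a sketch only: it takes the 6,039,135 PPD linear estimate from the first post, doubles it on the assumption that the higher official FP32 number translates directly into Folding yield, and compares it with the 4090 result quoted there.

```python
# Doubling the linear estimate and comparing against the 4090.
# Assumption: the doubled FP32 figure carries over fully to PPD.
ESTIMATE_AT_30_78_TFLOPS = 6_039_135   # linear estimate from the first post
RTX_4090_PPD = 16_438_230              # quoted 4090 result on p18450

doubled_ppd = 2 * ESTIMATE_AT_30_78_TFLOPS
fraction_of_4090 = doubled_ppd / RTX_4090_PPD

if __name__ == "__main__":
    print(f"Doubled estimate: {doubled_ppd:,} PPD")
    print(f"Share of 4090:    {fraction_of_4090:.1%}")
```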

 

I've never been one to buy the top tier cards as they're usually the worst value proposition, but if the numbers for the 4090 and 4080 are to be believed then it actually looks like the 4090 might be a better value than the 4080, a mistake NVIDIA hasn't made since Pascal with the 1080 Ti.
