
HOWTO: Profiling Folding GPUs

Let's take a look at one method we can use to measure the efficiency of a Graphics Card (GPU) at various Power Limits.

 

With electricity costs soaring globally and the need to reduce heat output, running Folding@Home can be a delicate balancing act between contributing to a worthwhile cause and keeping your Electricity Bill low and the Temperature in your home at a reasonable level.

 

Modern GPUs, like CPUs, have a power-efficiency curve that is exponential: at the upper end of the curve you get diminishing increases in Yield for each additional Watt. So our goal is to find the most efficient Power-Level to run a GPU at. We can define Efficiency as the Yield (PPD) divided by the Power-Level (W) it was produced at. For convenience we will use kPPD/W as the measurement of Efficiency.
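For example (using round numbers purely for illustration), a GPU producing 2,500,000 PPD while drawing 125 W works out to 2,500,000 ÷ 125 ÷ 1000 = 20 kPPD/W, while the same card producing 3,000,000 PPD at 215 W works out to only about 14 kPPD/W.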

 

What you will need:

  • Folding at Home Advanced Control
  • Harlam's Folding Monitor (HfM.net) (Windows only or using Wine in Linux)
  • nvidia-smi (Bundled with NVidia drivers on Windows and Linux)
  • Excel or Google Sheets
  • An hour or two per GPU.

The best way of measuring efficiency in Folding@Home, given the variable yields of differing Work Units (WUs), is to run a GPU at a target Power-Level over a period of several days, record the Aggregate Yield of the GPU and divide it by the Power-Level to obtain the Efficiency at that Power-Level, then adjust the Power-Level and repeat the measurements.

 

However, a quick indication of a GPU's efficiency can be obtained by observing the changes in Yield (PPD) during a single WU as the Power-Limit is adjusted. Frame Time (TPF) is the time required to complete 1/100th of a WU.

 

In this example we will look at an EVGA RTX 2070 Super XC Hybrid (08G-P4-3178-KR) running project 18202 as the WU.

 

First we need to configure HfM.net to calculate its estimate of Yield (PPD) using the last 3 Frames as the Sampling Window. A larger Sampling Window might provide more accuracy but will take more time to measure.

 

Select Preferences from the Edit menu in HfM, set "Calculate PPD based on" to "Last 3 Frames" and click OK.

[Image: HfmConfig.jpg]


Note that TPF appears to be calculated across all Frames so PPD will be a better measurement.

 

Select a GPU to profile, taking note of which Slot on which Host it is running in.

 

First we need to determine the Minimum and Maximum Power-Levels supported by the GPU. Open a Command Prompt (Windows) or a Terminal Window (Linux) and enter

nvidia-smi -q

to query the capabilities of the GPUs installed in the system:

    Power Readings
        Power Management                  : Supported
        Power Draw                        : 126.81 W
        Power Limit                       : 125.00 W
        Default Power Limit               : 215.00 W
        Enforced Power Limit              : 125.00 W
        Min Power Limit                   : 125.00 W
        Max Power Limit                   : 240.00 W

where:

  • Power Limit: the Power-Limit currently in effect
  • Power Draw: the Power currently being consumed by the GPU
  • Default Power Limit: the Power-Limit applied by default at driver load
  • Min Power Limit: the lowest Power-Limit the GPU will accept
  • Max Power Limit: the highest Power-Limit the GPU will accept

Here we see this GPU has a minimum Power-Limit of 125W and a maximum of 240W, so we will want to measure the Yields between these two Limits. We will use 25W as the step size and record Yields at 125, 150, 175, 200, 225 and 240 Watts.

 

Next open the Folding@Home Advanced Control application from the Task Bar. Select the system with the GPU under test, click on the "Log" tab to view the log, check the "Filter" option and select the appropriate "Slot" from the drop-down list:

[Image: AdvCtl_WUprog.jpg]

Here we can see that this WU Checkpoints every two frames. We want a consistent sampling window with the same number of Checkpoints in each window, as the Checkpoint process adds a slight delay that reduces the Yield. In this case we choose to record the Yield after an odd percentage has completed, every 6th percentage: we want a sampling interval (6 frames) wider than that used for the Yield estimate (3 Frames) but with a consistent number of Checkpoints (3).
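For example, if the first sampling window runs from 23% to 29%, the subsequent set-point changes fall at 35%, 41%, 47% and so on, so each 6-frame window ends on an odd percentage and spans exactly three Checkpoints.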

 

It is important that we measure the actual Power Draw rather than the set Power-Limit, as at the lower and upper bounds the GPU may have trouble enforcing the Power-Limit.

 

Wait until the WU is 5-10% complete before starting measurements.

 

In our Command Prompt (Windows) or Terminal (Linux) enter:

nvidia-smi -i 0 -l 1 --format=csv,noheader --query-gpu=temperature.gpu,power.draw,clocks.gr,fan.speed

which will query GPU 0 (-i 0) on this system and display the GPU temperature, Power Draw, Graphics Clock Speed and Fan Speed once a second.

[Image: MeasurePowerDraw_Linux.jpg]

While the sampling window for the currently set Power-Limit is in progress we will use this output to estimate the Power Draw during that window. In the above example, with a 125W Power-Limit, we see that the GPU appears to be averaging around the set value of 125W.

 

Next we create a spreadsheet to record our values:

[Image: Spreadsheet.jpg]

The first Column is our "Set" Power-Limit; the second our Observed Power-Draw; the third the Percentage measurement point; the fourth the TPF in Seconds from HfM; the fifth the Yield from HfM; and the sixth the calculated Efficiency in kPPD/W (the Yield divided by the Observed Power-Draw divided by 1000, i.e. column E divided by column B divided by 1000).
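As a concrete illustration (assuming the columns are laid out A through F as described, with the first data row in row 2), the Efficiency cell would hold a formula along the lines of =E2/B2/1000.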

 

In a second Administrator Command Prompt (Windows) or Terminal (Linux), set the GPU Power-Limit, starting with the lowest, at the end of a Frame:

nvidia-smi -i <GPU#> -pl <Min. Power>
In this instance I used:
nvidia-smi -i 0 -pl 125

Watch the nvidia-smi window during the sampling interval and record the estimate of the Power-Draw. Populate the Command Prompt or Terminal with the next set-point in preparation for when the current sampling window ends.

 

As soon as the current sampling period finishes (watch the Log in Advanced Control) change to the next set-point (nvidia-smi -i <X> -pl <Y>) and record the TPF and PPD estimate from HfM for the previous sampling window.

 

[Image: HfM_PPD.jpg]

 

It helps to record the TPF and PPD values a couple of times later in the sampling interval, as they should be fairly stable after 3-5 frames have completed, and it will give you a good estimate of the final values. As HfM calculates the Yield (PPD) based on the last 3 Frames and our sampling window is 6 Frames, you do not have to be super accurate about how soon after the Frame completes you change to the next Set-Point.

 

Here are the final values. The values seemed inconsistent after the 175W Set-Point (completed 15:02) so I took measurements adjusting the Power-Limit down from the Maximum for comparison. Perhaps the calculations performed on the WU around this point got more complicated?

 

Here is the smoothed (5-minute average values for PPD and Power) efficiency for this GPU over the initial test run from my Zabbix server for comparison.

[Image: ZabbixEfficiency.png]

 

I then calculated the Average Efficiency over the two measurements for each of the Set-Points:

[Image: Spreadsheet2.jpg]

 

We can then create a scatter graph of the data including a Trend line and display the Confidence or "Fit" of the Trend line (R^2 value):

[Image: ResultGraph.jpg]

 

For this WU on this GPU we see the Efficiency is highest at the lowest Power-Limit and gets exponentially worse as the Power-Limit is increased.

 

To put it another way, dropping from 225W, which is close to the 217W Default, to the Minimum 125W Limit we see a 7.53% decrease in PPD for a 44.4% decrease in Power.
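Put in efficiency terms, that is roughly (1 − 0.0753) ÷ (1 − 0.444) ≈ 1.66, i.e. about 66% more kPPD/W at the 125W Limit than at 225W.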


All Done - @Shlouski Make Sense?


8 hours ago, Gorgon said:

The best way of measuring efficiency in Folding@Home, given the variable yields in differing Work Units (WUs), is to run a GPU at a target Power-Level over a period of several days recording the Aggregate Yield of the GPU and dividing it by the Power-Level to obtain the Efficiency at that Power-Level then adjusting the Power-Level and repeating the measurements.

 

Absolutely, this was exactly what I wanted to do.

 

8 hours ago, Gorgon said:

However, a quick indication of a GPUs efficiency can be measured by observing the changes in Yield (PPD) during a single WU as the Power-Limit is adjusted. Frame Time (TPF) is the time required to complete 1/100th of a WU.

 

I will give it a go, but this is where I have a problem.

 

For me, altering the characteristics of a GPU during a running WU can cause PPD fluctuations far beyond the change in performance; it can take several minutes for the client to start adjusting after a change, if at all.

 

For example

The GPU on this PC was reporting 8.2 mil PPD at default settings in its current WU. Changing the power limit to 90% resulted in a 6 mil PPD decrease reported in the client, down to around 2 mil PPD; 10 minutes later the client is reporting 2.5 mil PPD, and putting the card back to default settings hasn't altered the reported PPD after another 10 minutes of waiting.

 

Unfortunately this is a common issue on all my systems and is why I don't trust the reported PPD after changing GPU settings. The testing method described in the first quote wouldn't just be more accurate, it also means I wouldn't have to alter GPU settings during a WU, avoiding the issues I'm experiencing.

 

This hasn't stopped me from noticing a pattern by watching how the 30 series cards respond to power limits while running FAH.

30 series cards experience a sudden crash of clock speeds upon reaching a certain power limit, often with as little as a 1-5% change. 30 series cards boost to the best part of 2000MHz at default settings; reducing the PL can get these clock speeds down to around the 1300-1400MHz range, and it is at this point the cards have a huge drop in clock speeds when the PL is reduced a little further. Just a few percent lower (less than 5%) and they can drop down to around the 600-800MHz range, if not lower.

 

These are some of my GPUs' power limits. At these power limits they are running around the 1300-1400MHz range, just a few percent above the clock crash, sometimes as little as 2% PL away from losing 500-800MHz on the core. These cards are also running their maximum stable memory overclocks, which range from 500MHz to 1200MHz.

 

Gigabyte 3070 - PL 50%

EVGA FTW3 3070 - PL 48%

PNY 3070 - PL 58%

MSI SUPRIM 3070 - PL 46%

Zotac 3070 - PL 52%

Gigabyte 3080 - PL 65%

EVGA FTW3 3080 - PL 58%

Gigabyte 3080 ti - PL 74%

MSI 3090 - PL 78%

 

Of course I do not have accurate performance data, but from the looks of it these GPUs achieve around 1 mil PPD less than the averages stated in the LAR Systems PPD database.

 

 


@Gorgon

 

Yes I understand, the only thing I'm unsure on is when to take the ppd value.

 

I see that the wu in your example checkpoints every two frames and I understand the processing of the checkpoints adds delay reducing the Yield. HFM is using the last 3 frames to calculate the ppd value, I'm guessing HFM updates that value with every newly completed frame averaged with the 2 previous frames?

 

If you have a wu that checkpoints every 2 frames and HFM is using the last 3 frames, no matter where you start won't you always have a checkpoint in between your frames?

 

Perhaps when possible it is preferable to use a wu with more than 3 frames between checkpoints?


1 hour ago, Shlouski said:

I will give it a go, but this is where I have a problem.

 

For me altering the characteristics of a GPU during a running wu can cause ppd fluctuations far beyond the change in performance, it can take several minutes for the client to start adjusting after a change, if at all. 

The Advanced Control PPD reporting can be tricky.  HfM, on the other hand, can be set to report the PPD estimate based on calculating it from the average Frame Time and the Base Points.

 

The technique I outline does work on 3000-series GPUs. With my 3070ti I observed the same behaviour you describe at lower power limits where the Graphics Clock drops dramatically and with it the PPD.


See this post:

I’d suggest starting at the Maximum Power Limit and moving down. You can also use a smaller step size around where the clocks start to fall off to find the exact “knee” of the cliff, but this will likely be specific to the WU, so I would suggest running 10-25W above the knee to be safe.


15 minutes ago, Shlouski said:

@Gorgon

 

Yes I understand, the only thing I'm unsure on is when to take the ppd value.

 

I see that the wu in your example checkpoints every two frames and I understand the processing of the checkpoints adds delay reducing the Yield. HFM is using the last 3 frames to calculate the ppd value, I'm guessing HFM updates that value with every newly completed frame averaged with the 2 previous frames?

 

If you have a wu that checkpoints every 2 frames and HFM is using the last 3 frames, no matter where you start won't you always have a checkpoint in between your frames?

 

Perhaps when possible it is preferable to use a wu with more than 3 frames between checkpoints?

Having a checkpoint in the sampling window is OK, in fact it’s desirable, but what I was trying to get across was you want the same number of checkpoints in each sampling window.

 

In the example given if we used a 3 or 5 frame sampling window then some samples would have 2 or 3 checkpoints and others only 1 or 2.


Perhaps a better way of putting it would be to say to set the Sampling interval to a multiple of the checkpointing interval (2 Frames in this case) that is greater than the 3 Frame window HfM uses. So the minimum sampling window in this case would be 4 frames but a 6 frame sampling period gives a bit more time for the values to settle.

 

In the example given I started the lowest sampling window at 23% by setting the Power Limit to 125W and then at 29% raised the PL to 150W and recorded the 125W PPD reported by HfM over the last 3 Frames.


1 hour ago, Gorgon said:

Having a checkpoint in the sampling window is OK, in fact it’s desirable, but what I was trying to get across was you want the same number of checkpoints in each sampling window.

 

Ok I understand.

 

I would be interested in comparing a variety of models of the same card, I suspect some models may perform a fair bit better than others as I have seen around a 20W differential between my 3070s at similar clock speeds.

 

I will upload some screen caps showing power limits and HFM.


Evga 3070 ftw3.

 

Card won't go over 200w when folding.

 

Default: 

 

[Image: evga3070default.png]

[Image: evga3070175.png]

[Image: evga3070150.png]

[Image: evga3070125.png]

[Image: evga3070100.png]

[Image: evga3070300.png]

 

Useful at all?


5 hours ago, Shlouski said:

Evga 3070 ftw3.

 

Card won't go over 200w when folding.

 

Default: 

 

[Image: evga3070default.png]

...

Useful at all?

Some WUs are too small (too few atoms) or too complex (too many Double Precision calculations) to make larger Turing & Ampere cards work hard (boost to high clocks and use more power). Are you adding a Graphics Clock offset? I usually use +50MHz for Pascal & Turing as that seems to be a safe compromise. Haven't tried anything larger with Ampere but the one 3070ti I have seems to like +50 (haven't seen any failed WUs). p18202 is "largish" at 302,700 atoms but maybe see what it does with a p16701 with 446,955 atoms.
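For reference, on Linux a Graphics Clock offset like the +50MHz mentioned above is typically applied with nvidia-settings rather than nvidia-smi. A rough sketch only, assuming an X session with Coolbits enabled; the attribute and the performance-level index ([3] here) vary by GPU and driver version, and on Windows a tool such as MSI Afterburner is the usual route:

nvidia-settings -a "[gpu:0]/GPUGraphicsClockOffset[3]=50"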

 

My 3070ti is running p18601 at PL 150/gClock ~151MHz. I raised it to 300W and am seeing 270-290W actual power draw with Gclock ~1965MHz. I can't leave it there too long as the 700VA/450W UPS it's connected to goes into bypass mode with the 2070 Super (125W) and the 3900x CPU (45W) also in that system.

 

The EVGA RTX 3070 FTW3 Ultra should be able to use close to all of its power on a big WU. Its VBIOS allows:

Board power limit
 Target: 270.0 W
 Limit:  300.0 W
 Adj. Range: -63%, +11%

so a Min: 100W; Def: 270W; Max: 300W
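(Those figures follow directly from the adjustment range: 270W × (1 − 0.63) ≈ 100W and 270W × (1 + 0.11) ≈ 300W.)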


7 hours ago, Shlouski said:

 

Ok I understand.

 

I would be interested in comparing a variety of models of the same card, I suspect some models may perform a fair bit better than others as I have see around a 20w differential between my 3070's at similar clock speeds.

 

I will upload some screen caps showing power limits and HFM.

Once you figure out what PL you want to run the cards at, a much more interesting comparison would be to view the same WU results at the same PL across the same GPU model using HfM.

Here is a query I used to view just p16701 on just my 2070 Supers:

[Image: p16701_query.jpg]

and the results:

[Image: p16701_results.jpg]

So there is variation within the cards (these are all EVGA RTX 2070 Super XC Hybrids bought at the same time) but is it due to silicon quality variations or variations within the WUs? I suspect a bit of both.


  • 2 weeks later...

For Turing and later GPUs, including Ampere, we can use another method to examine efficiency.

 

These models support locking the GPU clocks (Graphics or Shader Clock) at a range of frequencies using the System Management Interface command:

nvidia-smi -i <GPU ID> -lgc <lower>,<upper>

Both Turing and Ampere GPUs support a range of Graphics Clocks in 15MHz steps, from a lower limit of 300MHz to an upper limit of 2130-2160MHz for Turing, and from 405MHz to 2115MHz for Ampere, which can be viewed using:

nvidia-smi -i <GPU ID> --query-supported-clocks=gr --format=csv
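Note that a locked clock range persists until it is cleared or the system is rebooted; on drivers recent enough to support -lgc the lock can be removed with:

nvidia-smi -i <GPU ID> -rgc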

 

We will use a better method for recording the Running or Simple Moving Average (SMA) of the GPU Power Consumption in these tests. First we need to create an AWK script to calculate the SMA:

sudo nano ~/run-avg.awk

adding the code:

#!/usr/bin/awk -f
# Running (simple moving) average of one column of the input.
# Pass the column and window size as arguments, e.g.: run-avg.awk size=30 col=1
{
  if (!col)  col  = 1            # default: average column 1
  if (!size) size = 5            # default: 5-sample window
  mod = NR % size
  if (NR <= size) { count++ }            # still filling the window
  else            { sum -= array[mod] }  # window full: drop the oldest sample
  sum += $(col); array[mod] = $(col)     # add the newest sample
  print sum / count                      # running average so far
}

and make the script executable

sudo chmod +x ~/run-avg.awk
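As a quick sanity check of the script, feeding it a simple sequence should print a running 3-sample average:

seq 1 10 | ~/run-avg.awk size=3 col=1

which should output 1, 1.5, 2, 3, 4, 5, 6, 7, 8, 9.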

Next we need to collect the GPU data over a large enough interval for analysis. I use a 1-3 minute period to collect data at a 1 second interval. Start the data collection using:

nvidia-smi -i <GPU ID> -l 1 --format=csv,noheader,nounits --query-gpu=power.draw,clocks.gr > gpu_data

and stop it by pressing <CTRL>+C after the desired period.

 

We can then analyze the collected data to see the SMA of the GPU Power Consumption (Column 1) over a defined window size of 30 samples (size=30):

root@dcn02:~# cat gpu_data | ~/run-avg.awk size=30 col=1
186.59
181.985
183.297
181.95
183.054
183.592
182.953
183.359
182.689
183.178
183.25
183.528
183.084
183.346
179.883
180.264
180.121
180.494
180.394
180.686
180.542
180.822
180.685
180.379
180.466
180.458
180.701
180.614
180.841
180.767
180.76
180.802
180.555
180.859
180.559
180.567
180.537
180.259
180.534
180.216
180.34
180.345
180.348
180.377
182.268
182.014
182.195
182.246
182.243
181.948
182.252
181.972
182.241
181.093
180.929
181.117
181.098
181.23
180.959
181.227
179.665
179.641
179.911
179.636
179.913
179.897
179.893

Here we look at the mid-range of the values and see the moving average is about 180.5W.

 

This same approach can be used with our previous method of adjusting Power-Limits, to display the SMA of the Graphics Clock (Column 2), by executing:

cat gpu_data | ~/run-avg.awk size=30 col=2

We next select a range of Graphics Clocks at which to test the GPU's Efficiency. 450-1980MHz with 90MHz steps (6 x 15MHz "ticks") was chosen to give a set of values that enables analysis over the range of typical Clocks within the typical length of a WU.

 

Again we start collecting data after a WU has started and progressed at least 5-10% to allow for settling of initial values.

 

Set the GPU to the lowest Graphics Clock:

nvidia-smi -i <GPU ID> -lgc 450,450

Start the SMA Data collection:

nvidia-smi -i <GPU ID> -l 1 --format=csv,noheader,nounits --query-gpu=power.draw,clocks.gr > gpu_data

and record the SMA of the Power after 1 to 3 minutes:

cat gpu_data | ~/run-avg.awk size=30 col=1

 

and the Yield (PPD) at the end of the WU sampling window in a spreadsheet:

[Image: 2060s_Clks_Efficencies.jpg]
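For those who prefer not to babysit the terminal, the clock and power part of this procedure could be scripted. A minimal sketch only, assuming GPU 0, the 450-1980MHz / 90MHz sweep described above and the run-avg.awk script from earlier; the Yield still has to be read from HfM by hand at the end of each sampling window:

#!/bin/bash
# Sketch: sweep locked Graphics Clocks and print the 30-sample SMA of Power Draw
# at each step. GPU index, sweep range and timings are assumptions - adjust to suit.
GPU=0
for CLK in $(seq 450 90 1980); do
    nvidia-smi -i "$GPU" -lgc "$CLK","$CLK"
    nvidia-smi -i "$GPU" -l 1 --format=csv,noheader,nounits \
        --query-gpu=power.draw,clocks.gr > gpu_data &
    SMI=$!
    sleep 180                                    # collect ~3 minutes of 1-second samples
    kill "$SMI"                                  # stop the background nvidia-smi logger
    echo "${CLK} MHz: $(cat gpu_data | ~/run-avg.awk size=30 col=1 | tail -n 1) W"
done
nvidia-smi -i "$GPU" -rgc                        # return the clocks to normal management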

 


Analysis

 

Using the Graphics Clock adjustments we can run the GPU under test at lower effective Powers than could be achieved using the range of available Power-Limits. Analyzing the data for the RTX 2060 Super tested, we can see an Efficiency vs. Graphics Clock curve:

[Image: 2060s_Gclock_Eff.jpg]

similar to that observed for the RTX 3070ti Ampere GPU:

[Image: 3070ti_ClksEfficiencies.jpg]

Both these charts exhibit two distinct regions: a gradual increase in Efficiency from the lower bound of the GPU Clock until the Peak Efficiency is reached between 1250 and 1450MHz, followed by a sharp decrease in Efficiency at Clock Speeds above the Peak.

 

Examining the Power vs. Graphics Clock curve for the Turing GPU tested (RTX 2060 Super), we see an exponential increase in the Power needed to achieve higher Graphics Clocks and Yields after the Peak Efficiency is reached at 1250MHz:

[Image: 2060s_Clks_Clocks_vs_Power.jpg]

However, NVIDIA or the Add-in-Board partner (AIB, EVGA in this case) set the minimum Power-Limit of this model to 125W, which sits on the upper range of the Efficiency curve at around an 1800MHz Graphics Clock, preventing the card from operating at peak efficiency using just a Power-Limit.

 

Setting this model to a range of Graphics Clocks with an upper limit closer to the Peak will result in improved efficiency:

nvidia-smi -i <GPU ID> -lgc 0,1440

 

Examining the Power-Graphics Clock curve for the Ampere GPU tested (RTX 3070ti) we see a similar exponential curve:

[Image: 3070ti_Clks_Clocks_vs_Power.jpg]

But here NVIDIA or the AIB has set the lower Power-Limit well below the Peak Efficiency at 1450MHz so either a forced Graphics clock at the Peak:

nvidia-smi -i <GPU ID> -lgc 0,1440

or a Power-Limit:

nvidia-smi -i <GPU ID> -pl 150

could be used to operate the GPU more efficiently.


  • 2 months later...

Just picked up an EVGA RTX 3080 FTW Ultra Gaming to add to the fold. I'm running it in my Windows daily driver so I used HWiNFO64 to record the average GPU Power Consumption at a set Graphics Clock.

 

Similar to the Asus TUF RTX 3070ti it has a peak efficiency around 1400MHz.

 

Here's the efficiency running p18601:

[Image: 3080_Gclock_Eff.jpg]


  • 7 months later...

I picked up a Zotac Trinity 4070ti on sale here for $100Cdn off retail and am impressed so far with the efficiency of Ada, if not the value.

 

This GPU, when running at the Graphics Clock where it's most efficient (~2205MHz), barely spins the fans at all, running them under 35%. At lower clocks it cycles between off and 30% as the temperatures are around the 60°C zero-RPM cut-off threshold.

 

Running it at Stock it's about 40-50% more efficient than my 3080.

 

Again, we see that the peak efficiency occurs very close to the GPU's rated Base Clock:

GPU        Peak Eff. (MHz)   Base Clock (MHz)
1660 Ti    1350              1500
2060       1350              1365
2060 S     1350              1470
2070 S     1260              1605
3070 Ti    1450              1375
3080       1400              1440
4070 Ti    2250              2310

This suggests a good rule of thumb:

 

A Turing, Ampere or Ada GPU will run most efficiently when the Graphics Clock is limited to the Base Clock Value.
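In practice that just means locking the Graphics Clock range so its upper limit sits at the Base Clock (rounded to a 15MHz step), e.g. for the 3080 in the table above:

nvidia-smi -i <GPU ID> -lgc 0,1440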

 

This also tells me that NVIDIA, as with Ampere, is pushing these GPUs close to their limits, and that the new Process Node and Architecture have made a significant increase in Clock Speed possible.

 

4070ti running p16576:

[Image: 4070ti_p16576_Eff_vs_Gclk.jpg]

Observed Peak Efficiency: 2250MHz; Base Clock: 2310MHz

 

3080 running p18601:

[Image: 3080_GclkEff_p18601.jpg]

Observed Peak Efficiency: 1400MHz; Base Clock: 1440MHz

 

3070ti running p18202:

[Image: 3070ti_ClksEfficiencies.jpg]

Observed Peak Efficiency: 1450MHz; Base Clock: 1375MHz

 

2070 Super running p18202:

[Image: 2070s_p18202_Eff_vs_Gclk.jpg]

Observed Peak Efficiency: 1260MHz; Base Clock: 1605MHz

 

2060 Super running p18202:

[Image: 2060s_Gclock_Eff_p18202.jpg]

Observed Peak Efficiency: 1350MHz; Base Clock: 1470MHz

 

2060 running p18202:

[Image: 2060_Gclock_Eff_p18202.jpg]

Observed Peak Efficiency: 1350MHz; Base Clock: 1365MHz

 

1660ti running p18202:

[Image: 1660ti_Eff_vs_Clock.jpg]

Observed Peak Efficiency: 1350MHz; Base Clock: 1500MHz


  • 1 month later...

I added a 2nd Zotac Trinity 4070ti to the Fold, which behaves similarly to the first. I don't know if this card would be well suited for gaming, as the cooler seems undersized at the maximum TDP, but for efficient folding it works well, and at just over 2 slots in width it fits nicely in dual GPU systems.

 

I currently have the 4070tis in Fractal Meshify 2 cases, each alongside a 3080, and clock-limited they run just fine sharing a Corsair 750W power supply, with the total system draw of each being 375-425W while folding.

 

Intrigued by the 4000-series (Ada) efficiencies, if not their price, I've sold off most of my 1000-series (Pascal) and 2000-series (Turing) GPUs and bought an Asus TUF 4070 and a Gigabyte Eagle 4060ti.

 

The 4060Ti behaves like the 4070Tis and shows an efficiency peak at about 2205MHz (p16571):

[Image: 4060ti_p16571.jpg]

 

But the 4070 seems to be a bit of an odd-ball and shows an efficiency peak at about 2400MHz over two separate projects.

 

p18449:

[Image: 4070_Eff_p18449.jpg]

 

and p18917:

[Image: 4070_Eff_p18917.jpg]

 

While the efficiency difference between this and the 2205MHz the rest of the 4000-series cards operate best at is slight, it is noticeable.

 

And again we see in these plots that the efficiency increases almost linearly at clocks below the peak point and decreases exponentially at clocks above it, which makes perfect sense looking at the Power vs. Frequency curve:

[Image: 4070_Pwr_vs_GClk_p18917.jpg]

 

Considered on price and gaming performance the 4070ti, 4070 and 4060ti may not be good values for gaming, but the efficiency gains NVIDIA has achieved moving from the Samsung 8nm Node to the TSMC 5nm Node are impressive. We see a 50-100% increase in efficiency compared to the 3000-series due to the combination of higher base clocks and lower power consumption.

 

Assuming NVIDIA doesn't "blink", bowing to lackluster demand and stiff competition and lowering their pricing on Ada, it looks like the soon-to-be-released 4060 may be the best value in the lineup so far for efficient Folding.

 

Fortunately the shortcomings of mid-range Ada GPUs that reviewers have noted may impact gaming, namely low amounts of VRAM, narrow memory bus widths and gimped PCIe bus connectivity, appear to have little to no effect on how well suited to and efficient at Folding these GPUs are.


  • 8 months later...

With the refresh to the 4000-series I picked up a couple of MSI 4070ti Super GPUs, one a tri-axial Gaming X Slim and the other a dual-axial fan Ventus 2x. Both of these cards are slightly under 2.5 slots in width and thus are suitable for running in a dual GPU rig with 3-slot spacing if one applies clock or power limits.

 

Now these two "4070s," unlike the older 4070ti which sat at the top of the AD104 stack, sit at the bottom of the AD103 Stack (4070 ti Super; 4080, 4080 Super).

 

The implication of this is that, unlike the 4070ti, which would have been the best-binned silicon in the AD104 stack, the 4070ti Supers are likely the worst-binned silicon in the AD103 stack, so we should expect them to be less efficient than the 4080 or 4080 Super, just as we observed with the 4070, where it and the lower models in the AD104 stack showed less efficiency compared to the 4070ti.

 

Here's the Efficiency vs. Clock Speed plots for both GPUs:

[Image: 4070ts_0.jpg]

[Image: 4070ts_1.jpg]

 

Both these appear to have a peak efficiency around a clock speed of 2400MHz.


  • 2 weeks later...

Next up is the 4080 Super. I picked up a couple of Gigabyte RTX 4080 Super Gaming OC GPUs. As expected, these being at the top of the AD103 stack, they are most efficient at 2205MHz, close to their 2295MHz Base Clock.

[Image: 4080s_Eff_p12219.jpg]

Running at 2730MHz, close to their Default Power-Limit setting, they consume about 275W and yield about 21MPPD. Moving to the efficiency peak at 2205MHz they consume about 167W and produce 17.6MPPD, a decrease in Yield of 16.5% but a 39% decrease in Power.
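Put in kPPD/W terms using the figures above: roughly 21,000 ÷ 275 ≈ 76 kPPD/W at 2730MHz versus 17,600 ÷ 167 ≈ 105 kPPD/W at 2205MHz, i.e. around 38% better efficiency at the peak.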
