
Using Old Hardware for Folding

After watching a shout-out on the WAN Show a few months ago, I became interested in using some spare parts for Folding@Home.

I haven't gamed since Team Fortress was released, so I initially used the kids' gaming system (an i5 with a Gigabyte GTX 1060 on Windows 10) to get a feel for how the software works. My primary desktop, an E3-1231 v3 Xeon with 16GB ECC RAM and an AMD FirePro WX4100, only produced about 50,000 Points per Day (PPD), so I decided to see what I could do to get a dedicated Folding system up and running at minimal cost.

I had the kids' old retired Acer desktop lying around (a Future Shop open-box special purchased in 2010 that ran Win7 Home) and acquired an EVGA short-board GTX 1060 3GB to try things out.

This worked well, but I wanted to find the sweet spot in terms of power efficiency and PPD.

Objective:

Determine the folding performance and efficiency of a system at various power limits. The Power Limit of the GPU was adjusted from its minimum to its maximum value in 5W increments; at each step the total system power was measured with a power meter and the work done was recorded while running a synthetic load for 10 minutes.

In regular folding operation it was observed that the GPU temperature reached over 90% of its final long-term temperature within the first 10 minutes.
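
The supported Power Limit range for a card can be read from the driver before starting a sweep; nvidia-smi reports the minimum, maximum, default, and currently enforced limits:

nvidia-smi -i 0 -q -d POWER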

The hardware was the retired Acer desktop and the EVGA short-board GTX 1060 3GB described above.

Ubuntu Desktop 18.04 LTS ("Bionic") was installed on the system with NVIDIA Linux driver version 390.59, FAHClient 7.4.16 (AMD64), and FAHBench 2.3.2.

With no video load the system drew 30W at the wall while the GPU reported 5.5W at idle, so the combined load of the motherboard, CPU, RAM, SSD, CPU and case fans, plus losses in the power supply, was approximately 25W.

Persistence Mode was enabled on the Video Card:

nvidia-smi -i 0 -pm 1
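
The setting can be confirmed with a query, which should report "Enabled":

nvidia-smi -i 0 --query-gpu=persistence_mode --format=csv,noheader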

A typical current Work Unit (Project 11713) was copied to ./share/fahbench/workunits/wu-11713 under the FAHBench install directory, and a wu.json file was created in the same directory:

{
    "codename": "wu-11713",
    "projnum": 0,
    "protein": {
        "name": "histone methyltransferase SETD8",
        "description": "in the context of its cancer mutations"
    },
    "step_chunk": 40
}
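
The Work Unit folder then looks like this (the three XML files are the serialized OpenMM system; the file names here assume the standard FAHBench Work Unit layout):

share/fahbench/workunits/wu-11713/
├── wu.json
├── system.xml
├── state.xml
└── integrator.xml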

The Power Limit was initially set to 60W:

nvidia-smi -i 0 --power-limit=60

In another Terminal Window we ran:

nvidia-smi -i 0 -l 1 --format=csv,noheader --query-gpu=temperature.gpu,power.draw,clocks.current.sm,fan.speed

to measure the GPU temperature, power draw, graphics clock speed, and fan speed as a percentage of maximum.
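
To keep a record for each run, the same query can be redirected to a file with a timestamp column added (the file name is arbitrary):

nvidia-smi -i 0 -l 1 --format=csv --query-gpu=timestamp,temperature.gpu,power.draw,clocks.current.sm,fan.speed > gpu-60W.csv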

From the ./bin directory where FAHBench was installed, we ran the benchmark against the sample Work Unit for 10 minutes (600 seconds):

./FAHBench-cmd -w wu-11713 --run-length 600

Between 75% and 80% of the way through each run, the total power was recorded from the wattmeter, along with the most common minimum and maximum GPU graphics clock speeds, the GPU temperature, and the fan percentage.

After a run completed, the Power Limit was raised by the next 5W increment and the next run started; the scores from the previous run were then recorded.
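
Each run here was started by hand, but the whole sweep could be scripted; a minimal bash sketch (the install path and log file names are illustrative):

#!/bin/bash
# Sweep the GPU Power Limit from 60W to 140W in 5W steps and run the
# sample Work Unit for 10 minutes at each step, logging the scores.
cd /path/to/fahbench/bin
for limit in $(seq 60 5 140); do
    sudo nvidia-smi -i 0 --power-limit=${limit}
    ./FAHBench-cmd -w wu-11713 --run-length 600 > "run-${limit}W.log"
done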

Results:

In the tables below, Min and Max are the most common GPU graphics clock range, Pwr is total system power measured at the wall, Sys is Pwr minus the GPU Power Limit (the system overhead), Eff. is the Scaled score divided by Pwr, and Change is the percent change in Eff. from the previous run.

Limit  Min    Max    Temp  Fan  Score    Scaled    Atoms  Pwr  Sys  Eff.     Change
(W)    (MHz)  (MHz)  (°C)  (%)  (ns/d)   (ns/d)           (W)  (W)  (ns/d/W) (%)
60     1493   1506   62    39   54.9088   85.3326  35206  113  53   0.7552   -0.68
65     1556   1569   63    42   57.7318   89.7197  35206  118  53   0.7603    0.69
70     1594   1607   65    48   59.6595   92.7155  35206  125  55   0.7417   -2.45
75     1657   1670   66    52   61.1164   94.9796  35206  131  56   0.7250   -2.25
80     1695   1708   67    56   62.6183   97.3136  35206  135  55   0.7208   -0.58
85     1733   1746   68    59   63.6303   98.8864  35206  141  56   0.7013   -2.71
90     1771   1784   69    61   64.7777  100.6696  35206  147  57   0.6848   -2.35
95     1809   1822   70    64   65.9755  102.5311  35206  154  59   0.6658   -2.78
100    1822   1835   71    67   66.7737  103.7716  35206  160  60   0.6486   -2.59
105    1847   1860   72    70   67.5552  104.9860  35206  166  61   0.6324   -2.49
110    1885   1898   73    74   68.1087  105.8463  35206  172  62   0.6154   -2.70
115    1885   1898   74    77   68.5056  106.4630  35206  179  64   0.5948   -3.35
120    1898   1911   75    80   69.3210  107.7313  35206  186  66   0.5792   -2.62
125    1923   1936   76    83   69.4180  107.8810  35206  191  66   0.5648   -2.48
130    1936   1936   76    84   70.4918  109.5498  35206  197  67   0.5561   -1.55
135    1936   1936   76    84   70.4759  109.5250  35206  197  62   0.5560   -0.02
140    1936   1936   77    84   70.4509  109.4862  35206  197  57   0.5558   -0.04


Observations:

The peak efficiency in this configuration occurred at a 65W Power Limit and decreased at an increasing rate up to the 130W Power Limit. At and above the 130W Power Limit the GPU clock stayed at 1936MHz and the GPU temperature, fan speed, and total system power draw remained constant, indicating the GPU was at its power capacity. The change in slope between the 70W and 80W limits may be within the margin of error, or may be a function of the Nvidia GPU Boost 3.0 algorithm.

Performance likewise increased at a decreasing rate as the Power Limit was raised, with a maximum of 109.55 ns/day observed at the 130W Power Limit.

The Nvidia GPU Boost 3.0 algorithm is reputed to start mildly down-clocking the boost clock at 60°C and more aggressively as the temperature exceeds 70°C. The GPU under test is a single-fan model and is noticeably loud above 60% fan speed, which was reached at the 90W Power Limit. Removing the side of the case at a 126W Power Limit reduced the GPU temperature from 81°C to 67°C, a clear indication that the chassis had insufficient airflow.

The chassis was replaced with a Fractal Design Define R4 with two Fractal Design Silent R2 140mm front intake fans, one Noctua NF-A14 PWM bottom intake fan, and a single Noctua NF-S12A PWM 120mm rear exhaust fan, with the top fan mount positions left uncovered. The power supply was replaced with a Corsair CS650M Gold-efficiency unit. The tests were re-run in this configuration.

Limit  Min    Max    Temp  Fan  Score    Scaled    Atoms  Pwr  Sys  Eff.     Change
(W)    (MHz)  (MHz)  (°C)  (%)  (ns/d)   (ns/d)           (W)  (W)  (ns/d/W) (%)
60     1493   1506   58    27   55.7028   86.5664  35206  108  48   0.8015    0.56
65     1556   1569   59    30   57.9560   90.0682  35206  113  48   0.7971   -0.56
70     1620   1632   60    32   60.1952   93.5479  35206  120  50   0.7796   -2.20
75     1657   1670   60    34   61.5631   95.6739  35206  125  50   0.7654   -1.82
80     1695   1708   61    37   62.9723   97.8639  35206  131  51   0.7471   -2.40
85     1746   1759   62    39   64.3032   99.9322  35206  137  52   0.7294   -2.36
90     1784   1797   62    41   65.5446  101.8615  35206  142  52   0.7173   -1.66
95     1822   1835   63    44   66.3094  103.0500  35206  146  51   0.7058   -1.60
100    1835   1847   64    46   67.0713  104.2340  35206  152  52   0.6858   -2.84
105    1860   1873   65    49   67.9183  105.5502  35206  156  51   0.6766   -1.33
110    1885   1898   65    49   68.6574  106.6990  35206  161  51   0.6627   -2.05
115    1911   1923   66    51   69.4163  107.8783  35206  166  51   0.6499   -1.94
120    1923   1936   66    53   69.5664  108.1115  35206  171  51   0.6322   -2.71
125    1936   1949   67    55   70.3950  109.3993  35206  176  51   0.6216   -1.68
130    1949   1961   68    57   71.0101  110.3552  35206  181  51   0.6097   -1.91
135    1949   1949   67    57   70.9053  110.1924  35206  178  43   0.6191    1.54
140    1949   1949   67    57   70.8737  110.1432  35206  178  38   0.6188   -0.04

Observations:

Peak performance again occurred at the 130W Power Limit, but the observed clock speed was one step higher and the Scaled FAHBench score increased to 110.36 ns/day.

The GPU temperature dropped about 10°C at high Power Limits, with a corresponding decrease in GPU fan speed and perceived noise.

The benefits of moving from a Bronze (80% efficiency) to a Gold (92% efficiency) power supply are also apparent: the calculated system overhead load (total power minus the GPU Power Limit) is both lower and much more consistent.
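
As a worked example, the 130W row of the second table can be reproduced from the raw measurements (values copied from above):

awk 'BEGIN { limit=130; pwr=181; scaled=110.3552;
             printf "Sys = %d W, Eff = %.4f\n", pwr-limit, scaled/pwr }'

This prints Sys = 51 W, Eff = 0.6097, matching the table.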

Further Work:

Three Noctua 140mm iPPC 3000 PWM fans, one Noctua 120mm iPPC 3000 PWM fan, and a Noctua NA-FC1 4-pin PWM fan controller replaced the existing fans.

I ended up setting the Power Limit to 90W and applied a 125MHz overclock to the GPU. After trying many different adjustments to the GPU fan speed, I settled on just using the automatic fan control.
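
For reference, this configuration can be re-applied at boot with something like the following (the nvidia-settings clock-offset attribute assumes Coolbits has been enabled in xorg.conf, and the performance-level index [3] is what I'd expect for a Pascal card):

sudo nvidia-smi -i 0 -pm 1
sudo nvidia-smi -i 0 --power-limit=90
nvidia-settings -a "[gpu:0]/GPUGraphicsClockOffset[3]=125"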

The system runs at about 60% CPU utilization, showing that even an 8-year-old low-end CPU is capable of driving a current mid-range GPU. The GPU runs at 94% utilization, 39% fan speed, and 62°C.
