
Folding@Home occasionally owning all ram (linux)

I've got a headless Ubuntu server with 7 NVIDIA GPUs. It's normally used for mining, but I switched it over to folding last week. Twice now it has gotten to the point where I can't SSH in. One time I managed to get in and found a load average of 24+, and it got to the point where every command I ran failed with an "unable to allocate memory" error.

 

Any idea what could be causing this, or how to diagnose it? It's a very reliable rig; it's been mining for years without a hiccup. It just keeps tripping up on folding for some reason.
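One way to diagnose this is to log how much memory is still available and how much swap is in use, so you can see whether RAM runs out right before each lockup (free -m shows the same numbers interactively). Below is a minimal Python sketch, assuming a Linux kernel recent enough to report MemAvailable in /proc/meminfo; the function names are hypothetical, and running it periodically (for example from cron, appending to a log file) is an assumption about how you would use it.

```python
import os


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])
    return info


def summarize(info: dict) -> str:
    """Report available RAM and used swap in MB (1 MB = 1024 kB here)."""
    avail_mb = info.get("MemAvailable", 0) // 1024
    swap_used_mb = (info.get("SwapTotal", 0) - info.get("SwapFree", 0)) // 1024
    return f"available={avail_mb} MB swap_used={swap_used_mb} MB"


if __name__ == "__main__":
    # Linux only: /proc/meminfo does not exist on other platforms.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print(summarize(parse_meminfo(f.read())))
```

If available memory trends toward zero while swap used climbs just before the box stops responding, memory exhaustion is the likely culprit.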


Well, folding is a completely different workload from mining. Maybe your graphics cards are bandwidth-limited (I guess you're running PCIe x1?) and then they shovel everything into system memory until it overloads.

Just a guess tho. 

Gaming HTPC:

R5 5600X - Cryorig C7 - Asus ROG B350-i - EVGA RTX2060KO - 16gb G.Skill Ripjaws V 3333mhz - Corsair SF450 - 500gb 960 EVO - LianLi TU100B


Desktop PC:
R9 3900X - Peerless Assassin 120 SE - Asus Prime X570 Pro - Powercolor 7900XT - 32gb LPX 3200mhz - Corsair SF750 Platinum - 1TB WD SN850X - CoolerMaster NR200 White - Gigabyte M27Q-SA - Corsair K70 Rapidfire - Logitech MX518 Legendary - HyperXCloud Alpha wireless


Boss-NAS [Build Log]:
R5 2400G - Noctua NH-D14 - Asus Prime X370-Pro - 16gb G.Skill Aegis 3000mhz - Seasonic Focus Platinum 550W - Fractal Design R5 - 
250gb 970 Evo (OS) - 2x500gb 860 Evo (Raid0) - 6x4TB WD Red (RaidZ2)

Synology-NAS:
DS920+
2x4TB Ironwolf - 1x18TB Seagate Exos X20

 

Audio Gear:

Hifiman HE-400i - Kennerton Magister - Beyerdynamic DT880 250Ohm - AKG K7XX - Fostex TH-X00 - O2 Amp/DAC Combo - 
Klipsch RP280F - Klipsch RP160M - Klipsch RP440C - Yamaha RX-V479

 

Reviews and Stuff:

GTX 780 DCU2 // 8600GTS // Hifiman HE-400i // Kennerton Magister
Folding all the Proteins! // Boincerino

Useful Links:
Do you need an AMP/DAC? // Recommended Audio Gear // PSU Tier List 


What are the specs of the machine?


I have a 6 GPU rig also running headless on Ubuntu 18.04.4.

 

I had to up the RAM to 32GB as some of the newer OpenMM22 Work Units are consuming 1.5GB each and I've recently seen some people reporting some units north of 2GB.
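The per-WU figures above turn into a quick RAM budget: multiply by the number of GPU slots and add some headroom for the OS and FAHClient. With 6 GPUs that is 9 GB for 1.5 GB units and 12 GB for 2 GB units before the OS is counted. A minimal sketch, where the 1 GB OS allowance is an assumption rather than an official Folding@home number:

```python
def ram_needed_gb(gpus: int, per_wu_gb: float, os_reserve_gb: float = 1.0) -> float:
    """Rough RAM estimate: one work unit resident per GPU slot, plus OS headroom.

    os_reserve_gb is an assumed allowance, not an official Folding@home figure.
    """
    return gpus * per_wu_gb + os_reserve_gb


if __name__ == "__main__":
    for per_wu in (1.5, 2.0):
        print(f"6 GPUs at {per_wu} GB per WU -> about {ram_needed_gb(6, per_wu):.1f} GB")
```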

 

Make sure you have 1 thread free per GPU to keep it fed and 1 thread for the OS and you should be able to allocate the remaining threads for CPU folding.
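The thread rule of thumb above is simple arithmetic. A minimal sketch; the 12-thread and 2-thread CPUs in the usage lines are hypothetical examples:

```python
def cpu_folding_threads(cpu_threads: int, gpus: int, os_threads: int = 1) -> int:
    """Threads left for CPU folding after one per GPU plus the OS reservation.

    Zero means nothing is left for a CPU slot; a negative result means the
    GPU slots alone already want more threads than the CPU has.
    """
    return cpu_threads - gpus - os_threads


if __name__ == "__main__":
    for threads, gpus in ((12, 6), (2, 6)):
        left = cpu_folding_threads(threads, gpus)
        print(f"{threads} threads, {gpus} GPUs -> {left} left for CPU folding")
```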

 

If you're using PCIe3 x1 risers, I'd be interested in seeing your PPD per card, to see what degradation in performance, if any, you're seeing.

 

I'm running M.2-to-PCIe risers, so I've got x4 to all the slots, and in testing I only saw performance degradation under Windows on an RTX 2080 Super. Under Linux it seemed OK.

FaH BOINC HfM

Bifrost - 6 GPU Folding Rig  Linux Folding HOWTO Folding Remote Access Folding GPU Profiling ToU Scheduling UPS

Systems:

desktop: Lian-Li O11 Air Mini; Asus ProArt x670 WiFi; Ryzen 9 7950x; EVGA 240 CLC; 4 x 32GB DDR5-5600; 2 x Samsung 980 Pro 500GB PCIe3 NVMe; 2 x 8TB NAS; AMD FirePro W4100; MSI 4070 Ti Super Ventus 2; Corsair SF750

nas1: Fractal Node 804; SuperMicro X10sl7-f; Xeon e3-1231v3; 4 x 8GB DDR3-1666 ECC; 2 x 250GB Samsung EVO Pro SSD; 7 x 4TB Seagate NAS; Corsair HX650i

nas2: Synology DS-123j; 2 x 6TB WD Red Plus NAS

nas3: Synology DS-224+; 2 x 12TB Seagate NAS

dcn01: Fractal Meshify S2; Gigabyte Aorus ax570 Master; Ryzen 9 5900x; Noctua NH-D15; 4 x 16GB DDR4-3200; 512GB NVMe; 2 x Zotac AMP 4070ti; Corsair RM750Mx

dcn02: Fractal Meshify S2; Gigabyte ax570 Pro WiFi; Ryzen 9 3950x; Noctua NH-D15; 2 x 16GB DDR4-3200; 128GB NVMe; 2 x Zotac AMP 4070ti; Corsair RM750x

dcn03: Fractal Meshify C; Gigabyte Aorus z370 Gaming 5; i9-9900k; BeQuiet! PureRock 2 Black; 2 x 8GB DDR4-2400; 128GB SATA m.2; MSI 4070 Ti Super Gaming X; MSI 4070 Ti Super Ventus 2; Corsair TX650m

dcn05: Fractal Define S; Gigabyte Aorus b450m; Ryzen 7 2700; AMD Wraith; 2 x 8GB DDR 4-3200; 128GB SATA NVMe; Gigabyte Gaming RTX 4080 Super; Corsair TX750m

dcn06: Fractal Focus G Mini; Gigabyte Aorus b450m; Ryzen 7 2700; AMD Wraith; 2 x 8GB DDR 4-3200; 128GB SSD; Gigabyte Gaming RTX 4080 Super; Corsair CX650m


4GB of RAM. Some Celeron processor. Six PCIe risers plus one M.2 riser. Six 1070s, one 1070 Ti.

Mining takes next to no CPU/RAM. That's why the specs are so minimal. I guess folding is much heavier on the CPU/RAM.

The machine gets owned, but if I leave it alone, it comes out of it. I guess worst case, I just let it be.

 

Here's the top output when it's "ok". No work for the 7th GPU here.

 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 4188 fahclie+  39  19 33.290g 432212 110384 R  33.7 11.0  25:27.12 FahCore_22
 3622 fahclie+  39  19 33.291g 363784 105752 R  33.3  9.3  29:51.17 FahCore_22
 3629 fahclie+  39  19 33.290g 361252 106436 R  33.3  9.2  29:34.52 FahCore_22
 4609 fahclie+  39  19 33.620g 661472 118964 R  33.3 16.8  19:33.65 FahCore_22
 3636 fahclie+  39  19 33.290g 360892 105028 R  33.0  9.2  30:17.78 FahCore_22
 3649 fahclie+  39  19 33.305g 293240 105080 R  33.0  7.5  29:46.66 FahCore_22
 1538 root      20   0       0      0      0 S   0.3  0.0   0:27.85 nv_queue
 1574 root     -51   0       0      0      0 S   0.3  0.0   0:15.98 irq/130-nvidia
 3601 fahclie+  20   0 58.166g 434956   2324 S   0.3 11.1   0:40.10 FAHClient
    1 root      20   0  119660   3404   2212 S   0.0  0.1   0:04.74 systemd

 

That's 6 GPUs eating 33% CPU each. hahah. 
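Totalling that snapshot shows why the box is struggling: six FahCore_22 processes at about 33% each add up to roughly 200%, which is what top reports when two CPU threads are fully busy (it counts 100% per thread). RES is resident memory in KiB, and the FahCore rows alone come to about 2.4 GiB; VIRT (33 GB per process here) is virtual address space and does not mean 33 GB of RAM is in use. A minimal sketch that totals the FahCore rows, assuming top's default column order:

```python
# Rows copied from the top output above (header omitted).
TOP_ROWS = """\
 4188 fahclie+  39  19 33.290g 432212 110384 R  33.7 11.0  25:27.12 FahCore_22
 3622 fahclie+  39  19 33.291g 363784 105752 R  33.3  9.3  29:51.17 FahCore_22
 3629 fahclie+  39  19 33.290g 361252 106436 R  33.3  9.2  29:34.52 FahCore_22
 4609 fahclie+  39  19 33.620g 661472 118964 R  33.3 16.8  19:33.65 FahCore_22
 3636 fahclie+  39  19 33.290g 360892 105028 R  33.0  9.2  30:17.78 FahCore_22
 3649 fahclie+  39  19 33.305g 293240 105080 R  33.0  7.5  29:46.66 FahCore_22
 3601 fahclie+  20   0 58.166g 434956   2324 S   0.3 11.1   0:40.10 FAHClient
"""


def fahcore_totals(top_text: str):
    """Return (process count, summed RES in KiB, summed %CPU) for FahCore rows.

    Assumes top's default column order and plain KiB values in RES (top
    switches to suffixed values such as 1.2g when a number gets too wide).
    """
    count, res_kib, cpu_pct = 0, 0, 0.0
    for line in top_text.splitlines():
        fields = line.split()
        if len(fields) >= 12 and fields[11].startswith("FahCore"):
            count += 1
            res_kib += int(fields[5])    # RES column
            cpu_pct += float(fields[8])  # %CPU column
    return count, res_kib, cpu_pct


if __name__ == "__main__":
    n, res, cpu = fahcore_totals(TOP_ROWS)
    print(f"{n} FahCore processes, {res / 1024**2:.2f} GiB resident, {cpu:.0f}% CPU")
```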

I'm brand new to folding. How do I find out my PPD (points per day)?


With such low CPU specs, would it be better to cut down how many GPUs I use? I think I only got a few hundred thousand points yesterday from 5 WUs.

FAHControl estimates 2M points per day.


Most definitely not enough RAM. For that machine I would use a minimum of 16GB, but more likely 32 or 64GB.
Most likely there's also not enough CPU to keep the GPUs fed with work.

 

Do you have any ram you can add?


Can you provide a screenshot of htop?


4 hours ago, jo___l said:

Can you provide a screenshot of htop?

TIL htop. 

Definitely don't have that kind of ram laying around. Think CPU would still be too big of a bottleneck anyways?

 

How many GPUs do you think that's enough RAM for? I should at least cut it down to that. The problem is I don't know how to gauge whether folding is working well. Is there a metric for folding performance, or a utility I should be looking at?

 

[Attachment: Capture.PNG, htop screenshot]


Count on 1GB RAM per GPU.

 

Start with 1 GPU. If that works fine add another one. Repeat until failure.
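The 1GB-per-GPU rule, combined with roughly one CPU thread per GPU slot, gives a ceiling to start from before adding GPUs one at a time. A minimal sketch; the one-thread-per-slot assumption, the parameter names, and the 2-thread CPU in the usage lines are illustrative assumptions, not FAHClient settings:

```python
def max_gpu_slots(ram_gb: float, cpu_threads: int,
                  ram_per_gpu_gb: float = 1.0, reserve_threads: int = 0) -> int:
    """Starting ceiling for GPU slots: the tighter of the RAM and CPU limits.

    ram_per_gpu_gb follows the 1GB-per-GPU rule of thumb; one CPU thread per
    GPU slot is assumed, minus any threads reserved for the OS.
    """
    by_ram = int(ram_gb // ram_per_gpu_gb)
    by_cpu = cpu_threads - reserve_threads
    return max(0, min(by_ram, by_cpu))


if __name__ == "__main__":
    # Hypothetical box: 4 GB of RAM and a 2-thread CPU.
    print(max_gpu_slots(4, 2))
    print(max_gpu_slots(4, 2, reserve_threads=1))
```

Whatever number this gives, adding one GPU at a time and watching for failure is still the real test.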

 

To get information about your CPU, run cat /proc/cpuinfo.
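The raw /proc/cpuinfo output lists one block per logical CPU, so the thread and core counts have to be tallied (nproc and lscpu report them directly as well). A minimal Python sketch, assuming the x86-style "physical id" and "core id" fields are present; it falls back to the logical count when they are not:

```python
import os


def cpu_summary(cpuinfo_text: str):
    """Return (model name, logical CPUs, physical cores) from /proc/cpuinfo text.

    Physical cores are counted as unique (physical id, core id) pairs; if those
    fields are missing (some VMs and non-x86 kernels), fall back to logical CPUs.
    """
    model, logical, cores = "unknown", 0, set()
    phys = core = None
    # A trailing empty line flushes the last processor block.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            if phys is not None and core is not None:
                cores.add((phys, core))
            phys = core = None
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "model name":
            model = value
        elif key == "physical id":
            phys = value
        elif key == "core id":
            core = value
    return model, logical, len(cores) or logical


if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):  # Linux only
        with open("/proc/cpuinfo") as f:
            print(cpu_summary(f.read()))
```

If logical equals physical, there is no hyperthreading, and each GPU slot competes for a full core.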


5 hours ago, slick613 said:

TIL htop. 

Definitely don't have that kind of ram laying around. Think CPU would still be too big of a bottleneck anyways?

 

How many GPUs do you think that's enough RAM for? I should at least cut it down to that. The problem is I don't know how to gauge whether folding is working well. Is there a metric for folding performance, or a utility I should be looking at?

 

[Attachment: Capture.PNG, htop screenshot]

Yeah, if that's only a 2c/2t processor then it is going to be massively CPU-bottlenecked, as each NVIDIA GPU tries to exclusively lock one thread. Also, I see all your RAM is consumed and all your swap too, so your disk is thrashing as the system tries to task switch, and I see your system load at 6.76! On an interactive system you normally want this under 1 per CPU thread, i.e. under 2 on this machine.
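On Linux the load average counts tasks stuck in uninterruptible sleep (usually waiting on disk I/O) as well as runnable ones, which is why swap thrashing can push it far above the CPU count, as with the 24+ in the first post. Dividing by the thread count makes it comparable across machines: 6.76 on 2 threads is 3.38 per thread. A minimal sketch using Python's standard os.getloadavg and os.cpu_count:

```python
import os


def load_per_thread(load: float, threads: int) -> float:
    """Load average divided by CPU threads.

    Above 1.0 means more tasks are runnable or blocked than there are threads.
    """
    return load / threads


if __name__ == "__main__":
    if hasattr(os, "getloadavg"):  # not available on Windows
        one_min, _, _ = os.getloadavg()
        threads = os.cpu_count() or 1
        print(f"1-min load {one_min:.2f} / {threads} threads = "
              f"{load_per_thread(one_min, threads):.2f} per thread")
```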

 

It's pretty impressive that it is still running.

 

With a 2c/2t processor I'd restrict it to just 2 GPUs and you'll likely generate more points. Give that a try and get some stable results then try adding an additional GPU slot and see what happens.

 

FAHClient --send-command ppd

should show you the PPD

FAHClient --send-command slot-info

should show you the slots configured

FAHClient --send-command queue-info

should show you the PPD per slot
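If you'd rather poll these from a script, a thin Python wrapper around the same three commands works. A minimal sketch, assuming FAHClient is on PATH and the client is running on the same machine; the output is printed as-is, since its exact format isn't shown here, and the helper names are hypothetical:

```python
import shutil
import subprocess


def fah_command_argv(command: str) -> list:
    """Build the same invocation shown above, e.g. FAHClient --send-command ppd."""
    return ["FAHClient", "--send-command", command]


def send_fah_command(command: str) -> str:
    """Run one FAHClient command and return its raw stdout."""
    if shutil.which("FAHClient") is None:
        raise FileNotFoundError("FAHClient not found on PATH")
    result = subprocess.run(fah_command_argv(command),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    for cmd in ("ppd", "slot-info", "queue-info"):
        try:
            print(cmd, "->", send_fah_command(cmd))
        except (FileNotFoundError, subprocess.CalledProcessError) as exc:
            print(cmd, "failed:", exc)
```

Logging the ppd output after each GPU you add gives a simple way to tell whether the extra card actually raised total points.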

 

 


Excellent info here. Thanks!

Folding is very different than mining. I guess there's a lot of work for the CPU to move data in and out of the GPU. 

Seems to be surviving 3 GPUs without maxing out ram, but the CPU is pinned. Going to bring it down to 2 and put the rest back to their mining work. 

Too bad. I was hoping to point all 7 at this but a CPU upgrade isn't in the cards. 

 

Thanks for all the help. I learned a bunch.

 
