
LTT Official Folding Month VI


 

Message added by TVwazhere,

Daily point updates are posted here:

5 minutes ago, Alex Atkin UK said:

Not quite, the problem earlier is the WUs were scoring the same as the 4070 Ti

 

OK, I thought it was a continuation of the shortage of larger WUs from months ago. I didn't know we'd started getting more of them, as I stopped using my 4090 months ago when it was only yielding 8-14 million PPD.

 

As time goes on and more powerful GPUs make their way into folding machines, I wonder if we will get more shortages in the future.

 


@Shlouski @Alex Atkin UK

Running two WUs per GPU may actually result in lower PPD, even if occupancy/utilization is better computationally.

 

Quote

The Folding@home software on your computer calculates Total Points as follows:

final_points = base_points * max(1, sqrt( k * deadline_length / elapsed_time))

 

Quote

PPD = 14.4 * base_points * max(1, sqrt( 14.4 * k * Expiration / TPF)) / TPF

https://foldingathome.org/support/faq/points/?lng=en

 

As you can see, the number of points awarded is a function of how fast you can complete the WU, so if running two WUs per GPU results in each WU taking longer, you have to make sure the PPD isn't actually lower.

 

1 WU per hour may be better than 2 WUs per 2 hours if the run time for each WU becomes 2 hours: same WUs per 24 hours, but different PPD (not that I've calculated or know which of the two is better).
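To make that concrete, here's a rough back-of-the-envelope sketch in Python using the FAQ formula quoted above. The base points, k and deadline values are made-up placeholders rather than real project numbers; the point is just how the bonus (the sqrt term) trades off against WU throughput.

```python
# Rough sketch of the FAQ points formula above, with made-up placeholder numbers
# (base_points, k, deadline) rather than any real project's values.
from math import sqrt

def ppd(base_points, k, deadline_hours, elapsed_hours, concurrent=1):
    """Points per day for `concurrent` identical WUs, each taking elapsed_hours."""
    final_points = base_points * max(1, sqrt(k * deadline_hours / elapsed_hours))
    wus_per_day = concurrent * 24 / elapsed_hours
    return final_points * wus_per_day

base, k, deadline = 50_000, 0.75, 24  # placeholders for one hypothetical project

print("1 WU at a time, 1.0 h each:", round(ppd(base, k, deadline, 1.0)))
print("2 WUs at once, 2.0 h each: ", round(ppd(base, k, deadline, 2.0, concurrent=2)))
print("2 WUs at once, 1.2 h each: ", round(ppd(base, k, deadline, 1.2, concurrent=2)))
```

With those placeholder numbers, two concurrent WUs at 2 hours each scores noticeably worse than 1 WU per hour despite identical WUs per day, while two concurrent at 1.2 hours each comes out ahead, so it really does hinge on how much the per-WU time stretches.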


26 minutes ago, leadeater said:

@Shlouski @Alex Atkin UK

Running two WUs per GPU may actually result in lower PPD, even if occupancy/utilization is better computationally.

 

 

https://foldingathome.org/support/faq/points/?lng=en

 

As you can see, the number of points awarded is a function of how fast you can complete the WU, so if running two WUs per GPU results in each WU taking longer, you have to make sure the PPD isn't actually lower.

 

1 WU per hour may be better than 2 WUs per 2 hours if the run time for each WU becomes 2 hours: same WUs per 24 hours, but different PPD (not that I've calculated or know which of the two is better).

As I understand it you can't "technically" run two WUs on the same GPU (on NVIDIA anyway); CUDA jobs run as FIFO.

 

The issue of running two concurrent WUs scoring worse than one applies only to CPU jobs, where often only a few cores are used but you'll sacrifice credits if you try to split the CPU up into more slots.

Router:  Intel N100 (pfSense) WiFi6: Zyxel NWA210AX (1.7Gbit peak at 160Mhz)
WiFi5: Ubiquiti NanoHD OpenWRT (~500Mbit at 80Mhz) Switches: Netgear MS510TXUP, MS510TXPP, GS110EMX
ISPs: Zen Full Fibre 900 (~930Mbit down, 115Mbit up) + Three 5G (~800Mbit down, 115Mbit up)
Upgrading Laptop/Desktop CNVIo WiFi 5 cards to PCIe WiFi6e/7


6 minutes ago, Alex Atkin UK said:

As I understand it you can't "technically" run two WUs on the same GPU; CUDA jobs run as FIFO.

I mean with vGPU (GRID), since that was the thought process. F@H doesn't allow multiple jobs per GPU, while BOINC does and so does CUDA; it comes down to how your CUDA application is written, etc. vGPU gets around that, but then introduces the potential issue above: more WUs per 24 hours doesn't always mean more points.

 

I could chop an A40 into 4 vGPUs and run F@H on each vGPU instance, but could get less PPD with that configuration compared to 2 vGPUs or just using the full GPU.


6 minutes ago, leadeater said:

I mean with vGPU (GRID), since that was the thought process. F@H doesn't allow multiple jobs per GPU, while BOINC does and so does CUDA; it comes down to how your CUDA application is written, etc.

There's quite a lot of discussion on this on the NVIDIA forums suggesting it does not, so I'm not sure what BOINC is doing.  I wonder if there is some sort of cheat where you can submit two different jobs as if they are one?  Presumably you could have multiple different data sets managed by a single job?

 

It's way beyond my level of understanding.



12 minutes ago, Alex Atkin UK said:

There's quite a lot of discussion on this on the NVIDIA forums suggesting it does not, so I'm not sure what BOINC is doing.  I wonder if there is some sort of cheat where you can submit two different jobs as if they are one?  Presumably you could have multiple different data sets managed by a single job?

 

It's way beyond my level of understanding.

https://docs.nvidia.com/deploy/mps/index.html

 

It's been possible since Kepler, but how it works and the technicalities behind it have improved through successive architectures. As long as you have an RTX 20 series or newer there are essentially no real caveats to getting multiple CUDA applications running on a single GPU.

 

Even without MPS the restrictions simply don't matter in reality.

Quote

Both applications can run at the same time, however the kernels will be serialized. This assumes that the 2 applications memory and resource usage combined will fit on the same GPU.

 

Unless you use MPS, CUDA will not run kernels concurrently from different applications.

 

Even within a single process/application, however, kernel concurrency is rare/hard to witness. So this “serialization” might not make much difference.

https://forums.developer.nvidia.com/t/gpu-sharing-among-different-application-with-different-cuda-context/53057

 

But this doesn't apply to vGPU; vGPUs are real hardware instances with their own hardware queues.
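If anyone wants to check what that serialization actually costs on their card, a crude test is to time one copy of a fixed GPU workload against two copies launched at once, with and without MPS running. A minimal sketch, where "./gpu_workload" is just a placeholder for whatever CUDA benchmark or test binary you point it at:

```python
# Time n concurrent copies of a fixed GPU workload. "./gpu_workload" is a
# placeholder for any CUDA app that does a fixed amount of work and then exits.
import subprocess
import time

WORKLOAD = ["./gpu_workload"]

def run_concurrent(n):
    start = time.monotonic()
    procs = [subprocess.Popen(WORKLOAD) for _ in range(n)]
    for p in procs:
        p.wait()
    return time.monotonic() - start

solo = run_concurrent(1)
pair = run_concurrent(2)
print(f"1 copy:   {solo:.1f}s")
print(f"2 copies: {pair:.1f}s")
```

If two copies take roughly double the solo time, the kernels are effectively being time-sliced/serialized; if it's close to the solo time, they're genuinely overlapping (e.g. under MPS with spare SM capacity).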

Edited by leadeater

46 minutes ago, leadeater said:

1 WU per hour may be better than 2 WUs per 2 hours if the run time for each WU becomes 2 hours: same WUs per 24 hours, but different PPD (not that I've calculated or know which of the two is better).

 

For the vast majority of the time my 4090 was stuck at less than 50% CUDA utilisation, so I hoped two WUs might bring that closer to 100%.

 

Using your example, the hope was to do 2 WUs in 1 hour instead of just 1.

 

I'm not looking to run more WUs on a GPU that is already saturated; I'm looking to saturate a GPU that was using less than half of its potential.


3 minutes ago, Shlouski said:

Using your example, the hope was to do 2 WUs in 1 hour instead of just 1.

Yep, but what may happen is you get 100% utilization and the per-task time changes from 1 hour to 1.2 hours. Just a note to do the math and make sure it's actually better 🙂

 

It probably is, with such a small time difference while doing more WUs in a similar period of time. It would be really interesting to actually try it and see. I'm pretty sure you can get it working on the RTX 4090, ask @Windows7ge


22 minutes ago, leadeater said:

I could chop an A40 into 4 vGPUs and run F@H on each vGPU instance

 

If I could do this on my 4090 but just chop it into two, that would be great 😃.


3 minutes ago, leadeater said:

Yep, but what may happen is you get 100% utilization and the per-task time changes from 1 hour to 1.2 hours. Just a note to do the math and make sure it's actually better 🙂

 

This had crossed my mind.

 

It's also possible that it ends up working on two larger WUs, each able to saturate the card on its own, or one large and one small; in both circumstances they might end up taking much longer than usual. This might be mitigated to an extent by working on diseases which generally have smaller WUs.

 

I would also like to make it clear that my main objective is to get more work done, not to generate more points, though I would happily take them 😄.


10 minutes ago, Shlouski said:

I would also like to make it clear that my main objective is to get more work done, not to generate more points, though I would happily take them 😄.

But the competition 🙃

 

Outside of something like a 'Folding Month' I'd far rather see 100% utilization all the time even if lower PPD.


59 minutes ago, Schnoz said:

Also, for performance and stability reasons, I absolutely recommend LTSC

🤫 😉 😉 *cough*

 

1 hour ago, Schnoz said:

My uptime record is 49 days, and I only had to shut it down cuz I had to move my PC lol.

I don't know how long mine is but I never turn my PC off ever and wasn't bothering with allowing updates so probably a good 180+ days at least lol


20 hours ago, Alex Atkin UK said:

Watch it be bright sunshine come 1st November!

Nov 1 is literally the day that the unseasonably warm weather breaks here, and I've got a roommate doing the "it's so cold I'm shivering" for a 65°F (18C) room while my furnace of a body is finally comfortable. Time to get folding.


11 hours ago, Alex Atkin UK said:

The issue of running two concurrent WUs scoring worse than one applies only to CPU jobs, where often only a few cores are used but you'll sacrifice credits if you try to split the CPU up into more slots.

Actually... I find that if you use only the physical core count, rather than the virtual core count (hyperthreading), and run multiple instances that are either per chip or per chiplet, you tend to get MUCH better CPU results than if you just tell it to "use it all". For example, on an old Xeon system the difference between the physical core count and hyperthreading is almost 50%: better points by not utilizing hyperthreading. It's similar even on the AMD 5800G I run in my main Linux machine. Keeping it down to just the core count is about a 20% increase over using the hyperthreaded core count, and splitting it into 4x2 instead of 8x1 nets me about another 5% on average.
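For anyone on Linux wanting to try the per-chiplet split, the underlying trick is just CPU affinity. A minimal sketch of the idea; the core numbering is an assumption (physical cores 0-3 on one chiplet, SMT siblings excluded), so check lscpu -e for your own topology, and note this is only illustrating the pinning mechanism, not a built-in F@H option:

```python
# Pin this process (and anything it launches) to the physical cores of one
# chiplet, avoiding SMT siblings. Linux-only; the core IDs below are assumptions.
import os

CHIPLET0_PHYSICAL_CORES = {0, 1, 2, 3}  # assumed: cores 0-3 = one chiplet, no HT siblings

os.sched_setaffinity(0, CHIPLET0_PHYSICAL_CORES)  # 0 = the current process
print("affinity now:", sorted(os.sched_getaffinity(0)))

# Any worker started from here (e.g. a folding client) inherits this affinity,
# so its threads stay on that chiplet instead of wandering across the CPU.
```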

 

So, back to the GPU part, which I have no experience with: I would guess that things like the multi-die or multi-chiplet pro cards would likely function better split up in a similar way, if the vGPUs can be set to use only a specific block of them, rather than everything just randomly sharing it... which would be worse than just letting it try to use it all.


10 hours ago, Shlouski said:

 

This had crossed my mind.

 

It's also possible that it ends up working on two larger WUs, each able to saturate the card on its own, or one large and one small; in both circumstances they might end up taking much longer than usual. This might be mitigated to an extent by working on diseases which generally have smaller WUs.

 

I would also like to make it clear that my main objective is to get more work done, not to generate more points, though I would happily take them 😄.

How are you reading CUDA usage? Because overall my card is showing 85-96% GPU usage. The biggest loss I think is from CPU overhead every time it moves data to/from the GPU; any amount of extra CPU usage seems to slow WU completion even when there are plenty of idle CPU cores.

 

Linux seems to be consistently more efficient at feeding the GPU than Windows; unfortunately both my 4090s are on Windows, as one is my gaming rig and the other my AI rig.

 

I also just discovered nvtop is a thing to monitor usage.

 

Just now, justpoet said:

Actually... I find that if you use only the physical core count, rather than the virtual core count (hyperthreading), and run multiple instances that are either per chip or per chiplet, you tend to get MUCH better CPU results than if you just tell it to "use it all".

Yes, running a job across chiplets is always hindered by the latency penalty but I still find that using just one chiplet performs better than trying to utilise both for different jobs, and any CPU job slows down the GPU job.



3 minutes ago, Alex Atkin UK said:

Yes, running a job across chiplets is always hindered by the latency penalty but I still find that using just one chiplet performs better than trying to utilise both for different jobs, and any CPU job slows down the GPU job.

I run Linux, so it may just be that the scheduler is way better at keeping things to chiplets when the thread count matches. I also don't have GPUs to fold on, so I don't have to worry about that side of things.


2 hours ago, justpoet said:

I run Linux, so it may just be that the scheduler is way better at keeping things to chiplets when the thread count matches. I also don't have GPUs to fold on, so I don't have to worry about that side of things.

Sounds likely; things have improved a lot with Linux in general.

 

For example, I never used to be able to fold on the GPU on my Linux desktop as it would cause the UI to become laggy; this no longer happens.

 

I've never run Linux on the 5950X so it's conceivable it would perform better. Just checking on the 5600, it seems like running a CPU WU may have far less impact on the GPU ones than it does on Intel CPUs, though it could also be because I'm not running a desktop on that machine.

 

It's both fascinating and perplexing trying to figure out where the bottlenecks are with folding.



15 hours ago, Shlouski said:

If I could do this on my 4090 but just chop it into two, that would be great 😃.

It might be possible, and there's more than one approach you could take to make it work. The big question is whether this GPU is part of and needed in your desktop, or whether it's in a dedicated box where you can run a software stack that isn't Windows.



 

Foldbox updated. Now if I could just control those darn GPU LEDs in Linux; OpenRGB doesn't see them. The cooler's RGB is also disconnected as I misplaced the cable.

Had to limit the 4070 Ti to 170W to keep the fan slow enough not to resonate at an annoying pitch, but it doesn't seem to have much impact on WU completion time, or the R5 5600 compensates over the 9900K due to its faster single-thread performance.

 


 

CPU cooling is massively overkill given the AS500 was previously on a 5950X, so it's not even hitting 50°C despite the heat from those GPUs.



1 minute ago, Alex Atkin UK said:

Now if I could just control those darn GPU LEDs in Linux

May I introduce you to electrical tape? With this ingenious invention there are no more LEDs. Or, if you're one of the operators here at work, there's no more CEL in your truck, no warning lights, no high-beam light, or even a gauge cluster in the dashboard.

I'm not actually trying to be as grumpy as it seems.

I will find your mentions of Ikea or Gnome and I will /s post. 

Project Hot Box

CPU 13900k, Motherboard Gigabyte Aorus Elite AX, RAM CORSAIR Vengeance 4x16GB 5200MHz, GPU Zotac RTX 4090 Trinity OC, Case Fractal Pop Air XL, Storage Sabrent Rocket Q4 2TB, CORSAIR Force Series MP510 1920GB NVMe, CORSAIR Force Series MP510 960GB NVMe, PSU CORSAIR HX1000i, Cooling Corsair XC8 CPU block, Bykski GPU block, 360mm and 280mm radiator, Displays Odyssey G9, LG 34UC98-W 34-Inch, Keyboard Mountain Everest Max, Mouse Mountain Makalu 67, Sound AT2035, Massdrop 6xx headphones, Go XLR

Oppbevaring

CPU i9-9900k, Motherboard, ASUS Rog Maximus Code XI, RAM, 48GB Corsair Vengeance LPX 32GB 3200 mhz (2x16)+(2x8) GPUs Asus ROG Strix 2070 8gb, PNY 1080, Nvidia 1080, Case Mining Frame, 2x Storage Samsung 860 Evo 500 GB, PSU Corsair RM1000x and RM850x, Cooling Asus Rog Ryuo 240 with Noctua NF-12 fans

 

Why is the 5800x so hot?

 

 


Just got my first 10 WUs done! I'm ready for the folding month!

-- When you have more than what you need, build a bigger table, not a bigger fence --
 


13 minutes ago, IkeaGnome said:

May I introduce you to electrical tape? With this ingenious invention there are no more LEDs. Or, if you're one of the operators here at work, there's no more CEL in your truck, no warning lights, no high-beam light, or even a gauge cluster in the dashboard.

I hate electrical tape, given the number of places I have sticky goop left over from old electrical tape.

On the 3080 the LEDs appear to be plugged in at the top of the card, so I could unplug them; I just don't know which connector is the fans and which is the LEDs, so I didn't, as it would be a pig to plug back in if I got the wrong one.

 

I'm surprised the 3080 doesn't run hotter. It may have easier access to air, but I also have the PSU pulling against its first fan, as being on the floor its filter got dirty too quickly and it's easier to just clean the front filter.



14 minutes ago, Alex Atkin UK said:

On the 3080 the LEDs appear to be plugged in at the top of the card, so I could unplug them; I just don't know which connector is the fans and which is the LEDs, so I didn't, as it would be a pig to plug back in if I got the wrong one.

Food for thought: there's a third connector. They use two 4-pin headers for the fans, which are the top ones you can see, then on the right side of the PCB, just under the 8-pin power connectors, there's a little white 5-pin connector. It takes 2 wires and turns that into your LED power and control.


 

Here's how they look with the cooler off. Fans are red, and the LED is yellow.


That's at least how it looks to me.


47 minutes ago, IkeaGnome said:

Food for thought: there's a third connector. They use two 4-pin headers for the fans, which are the top ones you can see, then on the right side of the PCB, just under the 8-pin power connectors, there's a little white 5-pin connector. It takes 2 wires and turns that into your LED power and control.

Figures it wouldn't be so easy.



This topic is now closed to further replies.

