
LTT Official Folding Month VI


 

Message added by TVwazhere,

Daily point updates are posted here:

5 minutes ago, Alex Atkin UK said:

Not quite, the problem earlier is the WUs were scoring the same as the 4070 Ti

 

OK, I thought it was a continuation of the shortage of larger WUs from months ago. I didn't know we'd started getting more of them, as I stopped using my 4090 months ago when it was only yielding 8-14 million PPD.

 

As time goes on and more powerful GPUs make their way into folding machines, I wonder if we will get more shortages in the future.

 


@Shlouski @Alex Atkin UK

Running two WUs per GPU may actually result in lower PPD, even if occupancy/utilization is better computationally.

 

Quote

The Folding@home software on your computer calculates Total Points as follows:

final_points = base_points * max(1, sqrt( k * deadline_length / elapsed_time))

 

Quote

PPD = 14.4 * base_points * max(1, sqrt( 14.4 * k * Expiration / TPF)) / TPF

https://foldingathome.org/support/faq/points/?lng=en

 

As you can see, the number of points awarded is a function of how fast you can complete the WU, so if running two WUs per GPU results in each WU taking longer, you have to make sure the PPD isn't actually lower.

 

1 WU per hour may be better than 2 WUs per 2 hours if the run time for each WU becomes 2 hours: same WUs per 24 hours, but different PPD (not that I've calculated or know which of the two is better).
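To make that concrete, here's a rough back-of-the-envelope sketch in Python using the FAQ formula quoted above. The base points, k and deadline values are made-up placeholders rather than real project numbers; the point is just how the bonus (the sqrt term) trades off against WU throughput.

```python
# Rough sketch of the FAQ points formula above, with made-up placeholder numbers
# (base_points, k, deadline) rather than any real project's values.
from math import sqrt

def ppd(base_points, k, deadline_hours, elapsed_hours, concurrent=1):
    """Points per day for `concurrent` identical WUs, each taking elapsed_hours."""
    final_points = base_points * max(1, sqrt(k * deadline_hours / elapsed_hours))
    wus_per_day = concurrent * 24 / elapsed_hours
    return final_points * wus_per_day

base, k, deadline = 50_000, 0.75, 24  # placeholders for one hypothetical project

print("1 WU at a time, 1.0 h each:", round(ppd(base, k, deadline, 1.0)))
print("2 WUs at once, 2.0 h each: ", round(ppd(base, k, deadline, 2.0, concurrent=2)))
print("2 WUs at once, 1.2 h each: ", round(ppd(base, k, deadline, 1.2, concurrent=2)))
```

With those placeholder numbers, two concurrent WUs at 2 hours each scores noticeably worse than 1 WU per hour despite identical WUs per day, while two concurrent at 1.2 hours each comes out ahead, so it really does hinge on how much the per-WU time stretches.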


26 minutes ago, leadeater said:

@Shlouski @Alex Atkin UK

Running two WUs per GPU may actually result in lower PPD, even if occupancy/utilization is better computationally.

 

 

https://foldingathome.org/support/faq/points/?lng=en

 

As you can see, the number of points awarded is a function of how fast you can complete the WU, so if running two WUs per GPU results in each WU taking longer, you have to make sure the PPD isn't actually lower.

 

1 WU per hour may be better than 2 WUs per 2 hours if the run time for each WU becomes 2 hours: same WUs per 24 hours, but different PPD (not that I've calculated or know which of the two is better).

As I understand it you can't "technically" run two WUs on the same GPU (on NVIDIA anyway); CUDA jobs run as FIFO.

 

The issue of running two concurrent WUs scoring worse than one applies only to CPU jobs, where often only a few cores are used but you'll sacrifice credits if you try to split the CPU up into more slots.

Router:  Intel N100 (pfSense) WiFi6: Zyxel NWA210AX (1.7Gbit peak at 160Mhz)
WiFi5: Ubiquiti NanoHD OpenWRT (~500Mbit at 80Mhz) Switches: Netgear MS510TXUP, MS510TXPP, GS110EMX
ISPs: Zen Full Fibre 900 (~930Mbit down, 115Mbit up) + Three 5G (~800Mbit down, 115Mbit up)
Upgrading Laptop/Desktop CNVIo WiFi 5 cards to PCIe WiFi6e/7


6 minutes ago, Alex Atkin UK said:

As I understand it you can't "technically" run two WUs on the same GPU; CUDA jobs run as FIFO.

I mean with vGPU (GRID), since that was the thought process. F@H doesn't allow multiple jobs per GPU, while BOINC does and so does CUDA; it comes down to how your CUDA application is written, etc. vGPU gets around that, but then introduces the potential issue above: more WUs per 24 hours doesn't always mean more points.

 

I could chop an A40 into 4 vGPUs and run F@H on each vGPU instance, but could get less PPD with that configuration compared to 2 vGPUs or just using the full GPU.


6 minutes ago, leadeater said:

I mean with vGPU (GRID), since that was the thought process. F@H doesn't allow multiple jobs per GPU, while BOINC does and so does CUDA; it comes down to how your CUDA application is written, etc.

There's quite a lot of discussion on this on the NVIDIA forums suggesting it does not, so I'm not sure what BOINC is doing.  I wonder if there is some sort of cheat where you can submit two different jobs as if they are one?  Presumably you could have multiple different data sets managed by a single job?

 

It's way beyond my level of understanding.



12 minutes ago, Alex Atkin UK said:

There's quite a lot of discussion on this on the NVIDIA forums suggesting it does not, so I'm not sure what BOINC is doing.  I wonder if there is some sort of cheat where you can submit two different jobs as if they are one?  Presumably you could have multiple different data sets managed by a single job?

 

It's way beyond my level of understanding.

https://docs.nvidia.com/deploy/mps/index.html

 

It's been possible since Kepler, but how it works and the technicalities behind it have improved through successive architectures. As long as you have an RTX 20 series or newer there are essentially no real caveats to getting multiple CUDA applications running on a single GPU.

 

Even without MPS the restrictions simply don't matter in reality.

Quote

Both applications can run at the same time, however the kernels will be serialized. This assumes that the 2 applications memory and resource usage combined will fit on the same GPU.

 

Unless you use MPS, CUDA will not run kernels concurrently from different applications.

 

Even within a single process/application, however, kernel concurrency is rare/hard to witness. So this “serialization” might not make much difference.

https://forums.developer.nvidia.com/t/gpu-sharing-among-different-application-with-different-cuda-context/53057

 

But this doesn't apply to vGPU; vGPUs are real hardware instances with their own hardware queues.
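If anyone wants to check what that serialization actually costs on their card, a crude test is to time one copy of a fixed GPU workload against two copies launched at once, with and without MPS running. A minimal sketch, where "./gpu_workload" is just a placeholder for whatever CUDA benchmark or test binary you point it at:

```python
# Time n concurrent copies of a fixed GPU workload. "./gpu_workload" is a
# placeholder for any CUDA app that does a fixed amount of work and then exits.
import subprocess
import time

WORKLOAD = ["./gpu_workload"]

def run_concurrent(n):
    start = time.monotonic()
    procs = [subprocess.Popen(WORKLOAD) for _ in range(n)]
    for p in procs:
        p.wait()
    return time.monotonic() - start

solo = run_concurrent(1)
pair = run_concurrent(2)
print(f"1 copy:   {solo:.1f}s")
print(f"2 copies: {pair:.1f}s")
```

If two copies take roughly double the solo time, the kernels are effectively being time-sliced/serialized; if it's close to the solo time, they're genuinely overlapping (e.g. under MPS with spare SM capacity).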

Edited by leadeater

46 minutes ago, leadeater said:

1 WU per hour may be better than 2 WUs per 2 hours if the run time for each WU becomes 2 hours: same WUs per 24 hours, but different PPD (not that I've calculated or know which of the two is better).

 

For the vast majority of the time my 4090 was stuck at less than 50% CUDA utilisation, so I hoped two WUs might bring that closer to 100%.

 

Using your example, the hope was to do 2 WUs in 1 hour instead of just 1.

 

I'm not looking to run more WUs on a GPU that is already saturated; I'm looking to saturate a GPU that was using less than half of its potential.


3 minutes ago, Shlouski said:

Using your example, the hope was to do 2 WUs in 1 hour instead of just 1.

Yep, but what may happen is you get 100% utilization and the per-task time changes from 1 hour to 1.2 hours. Just a note to do the math and make sure it's actually better 🙂

 

It probably is, with such a small time difference while doing more WUs in a similar period of time. It would be really interesting to actually try it and see. I'm pretty sure you can get it working on the RTX 4090, ask @Windows7ge


22 minutes ago, leadeater said:

I could chop an A40 into 4 vGPUs and run F@H on each vGPU instance

 

If I could do this on my 4090 but just chop it into two, that would be great 😃.


3 minutes ago, leadeater said:

Yep, but what may happen is you get 100% utilization and the per-task time changes from 1 hour to 1.2 hours. Just a note to do the math and make sure it's actually better 🙂

 

This had crossed my mind.

 

It's also possible that it ends up working on two larger WUs, each able to saturate the card on its own, or one large and one small; in both circumstances they might end up taking much longer than usual. This might be mitigated to an extent by working on diseases which generally have smaller WUs.

 

I would also like to make it clear that my main objective is to get more work done, not to generate more points, though I would happily take them 😄.


10 minutes ago, Shlouski said:

I would also like to make it clear that my main objective is to get more work done, not to generate more points, though I would happily take them 😄.

But the competition 🙃

 

Outside of something like a 'Folding Month' I'd far rather see 100% utilization all the time even if lower PPD.


59 minutes ago, Schnoz said:

Also, for performance and stability reasons, I absolutely recommend LTSC

🤫 😉 😉 *cough*

 

1 hour ago, Schnoz said:

My uptime record is 49 days, and I only had to shut it down cuz I had to move my PC lol.

I don't know how long mine is but I never turn my PC off ever and wasn't bothering with allowing updates so probably a good 180+ days at least lol


20 hours ago, Alex Atkin UK said:

Watch it be bright sunshine come 1st November!

Nov 1 is literally the day that the unseasonably warm weather breaks here, and I've got a roommate doing the "it's so cold I'm shivering" for a 65°F (18C) room while my furnace of a body is finally comfortable. Time to get folding.


11 hours ago, Alex Atkin UK said:

The issue of running two concurrent WUs scoring worse than one applies only to CPU jobs, where often only a few cores are used but you'll sacrifice credits if you try to split the CPU up into more slots.

Actually... I find that if you use only the physical core count, rather than the virtual core count (hyperthreading), and run multiple instances that are either per chip or per chiplet, you tend to get MUCH better CPU results than if you just tell it to "use it all". For example, on an old Xeon system the difference between the physical core count and hyperthreading is almost 50%: better points by not utilizing hyperthreading. It's similar even on the AMD 5800G I run in my main Linux machine. Keeping it down to just the core count is about a 20% increase over using the hyperthreaded core count, and splitting it into 4x2 instead of 8x1 nets me about another 5% on average.
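For anyone on Linux wanting to try the per-chiplet split, the underlying trick is just CPU affinity. A minimal sketch of the idea; the core numbering is an assumption (physical cores 0-3 on one chiplet, SMT siblings excluded), so check lscpu -e for your own topology, and note this is only illustrating the pinning mechanism, not a built-in F@H option:

```python
# Pin this process (and anything it launches) to the physical cores of one
# chiplet, avoiding SMT siblings. Linux-only; the core IDs below are assumptions.
import os

CHIPLET0_PHYSICAL_CORES = {0, 1, 2, 3}  # assumed: cores 0-3 = one chiplet, no HT siblings

os.sched_setaffinity(0, CHIPLET0_PHYSICAL_CORES)  # 0 = the current process
print("affinity now:", sorted(os.sched_getaffinity(0)))

# Any worker started from here (e.g. a folding client) inherits this affinity,
# so its threads stay on that chiplet instead of wandering across the CPU.
```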

 

So, back to the GPU part, which I have no experience with: I would guess that things like the multi-die or multi-chiplet pro cards would likely function better split up in a similar way, if the vGPUs can be set to use only a specific block of them, rather than everything just randomly sharing it... which would be worse than just letting it try to use it all.


10 hours ago, Shlouski said:

 

This had crossed my mind.

 

It's also possible that it ends up working on two larger WUs, each able to saturate the card on its own, or one large and one small; in both circumstances they might end up taking much longer than usual. This might be mitigated to an extent by working on diseases which generally have smaller WUs.

 

I would also like to make it clear that my main objective is to get more work done, not to generate more points, though I would happily take them 😄.

How are you reading CUDA usage? Because overall my card is showing 85-96% GPU usage. The biggest loss I think is from CPU overhead every time it moves data to/from the GPU; any amount of extra CPU usage seems to slow WU completion even when there are plenty of idle CPU cores.

 

Linux seems to be consistently more efficient at feeding the GPU than Windows; unfortunately both my 4090s are on Windows, as one is my gaming rig and the other my AI rig.

 

I also just discovered nvtop is a thing to monitor usage.

 

Just now, justpoet said:

Actually... I find that if you use only the physical core count, rather than the virtual core count (hyperthreading), and run multiple instances that are either per chip or per chiplet, you tend to get MUCH better CPU results than if you just tell it to "use it all".

Yes, running a job across chiplets is always hindered by the latency penalty but I still find that using just one chiplet performs better than trying to utilise both for different jobs, and any CPU job slows down the GPU job.



3 minutes ago, Alex Atkin UK said:

Yes, running a job across chiplets is always hindered by the latency penalty but I still find that using just one chiplet performs better than trying to utilise both for different jobs, and any CPU job slows down the GPU job.

I run Linux, so it may just be that the scheduler is way better at keeping things to chiplets when the thread count matches. I also don't have GPUs to fold on, so I don't have to worry about that side of things.


2 hours ago, justpoet said:

I run Linux, so it may just be that the scheduler is way better at keeping things to chiplets when the thread count matches. I also don't have GPUs to fold on, so I don't have to worry about that side of things.

Sounds likely; things have improved a lot with Linux in general.

 

For example, I never used to be able to fold on the GPU on my Linux desktop as it would cause the UI to become laggy; this no longer happens.

 

I've never run Linux on the 5950X so it's conceivable it would perform better. Just checking on the 5600, it seems like running a CPU WU may have far less impact on the GPU ones than it does on Intel CPUs, though it could also be because I'm not running a desktop on that machine.

 

It's both fascinating and perplexing trying to figure out where the bottlenecks are with folding.



15 hours ago, Shlouski said:

If I could do this on my 4090 but just chop it into two, that would be great 😃.

It might be possible, and there's more than one approach you could take to make it work. The big question is whether this GPU is part of and needed in your desktop, or whether it's in a dedicated box where you can run a software stack that isn't Windows.



 

Foldbox updated. Now if I could just control those darn GPU LEDs in Linux; OpenRGB doesn't see them. The cooler's RGB is also disconnected as I misplaced the cable.

Had to limit the 4070 Ti to 170W to keep the fan slow enough not to resonate at an annoying pitch, but it doesn't seem to have much impact on WU completion time, or the R5 5600 compensates over the 9900K due to its faster single-thread performance.

 


 

CPU cooling is massively overkill given the AS500 was previously on a 5950X, so it's not even hitting 50°C despite the heat from those GPUs.



1 minute ago, Alex Atkin UK said:

Now if I could just control those darn GPU LEDs in Linux

May I introduce you to electrical tape? With this ingenious invention there are no more LEDs. Or, if you're one of the operators here at work, there's no more CEL in your truck, no warning lights, no high-beam light, or even a gauge cluster in the dashboard.

I'm not actually trying to be as grumpy as it seems.

I will find your mentions of Ikea or Gnome and I will /s post. 

Project Hot Box

CPU 13900k, Motherboard Gigabyte Aorus Elite AX, RAM CORSAIR Vengeance 4x16GB 5200MHz, GPU Zotac RTX 4090 Trinity OC, Case Fractal Pop Air XL, Storage Sabrent Rocket Q4 2TB, CORSAIR Force Series MP510 1920GB NVMe, CORSAIR Force Series MP510 960GB NVMe, PSU CORSAIR HX1000i, Cooling Corsair XC8 CPU block, Bykski GPU block, 360mm and 280mm radiator, Displays Odyssey G9, LG 34UC98-W 34-Inch, Keyboard Mountain Everest Max, Mouse Mountain Makalu 67, Sound AT2035, Massdrop 6xx headphones, Go XLR

Oppbevaring

CPU i9-9900k, Motherboard, ASUS Rog Maximus Code XI, RAM, 48GB Corsair Vengeance LPX 32GB 3200 mhz (2x16)+(2x8) GPUs Asus ROG Strix 2070 8gb, PNY 1080, Nvidia 1080, Case Mining Frame, 2x Storage Samsung 860 Evo 500 GB, PSU Corsair RM1000x and RM850x, Cooling Asus Rog Ryuo 240 with Noctua NF-12 fans

 

Why is the 5800x so hot?

 

 


Just got my first 10 WUs done! I'm ready for the folding month!

-- When you have more than what you need, build a bigger table, not a bigger fence --
 


13 minutes ago, IkeaGnome said:

May I introduce you to electrical tape? With this ingenious invention there are no more LEDs. Or, if you're one of the operators here at work, there's no more CEL in your truck, no warning lights, no high-beam light, or even a gauge cluster in the dashboard.

I hate electrical tape, given the number of places I have sticky goop left over from old electrical tape.

On the 3080 the LEDs appear to be plugged in at the top of the card, so I could unplug them; I just don't know which connector is the fans and which is the LEDs, so I didn't, as it would be a pig to plug back in if I got the wrong one.

 

I'm surprised the 3080 doesn't run hotter. It may have easier access to air, but I also have the PSU pulling against its first fan, as being on the floor its filter got dirty too quickly and it's easier to just clean the front filter.



14 minutes ago, Alex Atkin UK said:

On the 3080 the LEDs appear to be plugged in at the top of the card, so I could unplug them; I just don't know which connector is the fans and which is the LEDs, so I didn't, as it would be a pig to plug back in if I got the wrong one.

Food for thought: there's a third connector. They use two 4-pin headers for the fans, which are the top ones you can see, then on the right side of the PCB, just under the 8-pin power connectors, there's a little white 5-pin connector. It takes 2 wires and turns that into your LED power and control.


 

Here's how they look with the cooler off. Fans are red, and the LED is yellow.


That's at least how it looks to me.


47 minutes ago, IkeaGnome said:

Food for thought: there's a third connector. They use two 4-pin headers for the fans, which are the top ones you can see, then on the right side of the PCB, just under the 8-pin power connectors, there's a little white 5-pin connector. It takes 2 wires and turns that into your LED power and control.

Figures it wouldn't be so easy.



This topic is now closed to further replies.

