
F@H is running and WUs are being sent/received, yet my PPD is 0?

I need help! I've got a pretty weird situation going on.

I installed some new GPUs in my NAS to stop wasting CPU cycles during idle hours and get some useful work done instead.

Relevant specs:

    - AMD EPYC 7552 (48 cores)

    - Gigabyte MZ32-AR0

    - 1 TB ECC DDR4 (16x 64 GB)

    - Supermicro 26-bay 2U server chassis

    - 26x 2 TB Samsung PM863a SSDs

       - 5 vdevs of 5 disks in RAIDZ1, plus a single hot spare

    - 2x Corsair MP400 boot drives in a mirror

    - Dual 920 W SQ PSUs

    - Alphacool watercooling

    - LSI HBA

    - 5x NVIDIA Tesla P4

This box is running TrueNAS SCALE 22.12.4.2.

I have an Ubuntu Server 22.04 VM handling the F@H duties, with all GPUs passed through to it.

*For troubleshooting reasons, only 3 for now.*

All GPUs are showing up in the VM just fine:

jeroen@teslavm:~$ lspci | grep NVIDIA
00:07.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)
00:08.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)
00:09.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)

And F@H starts folding on them too:

07:54:25:WU01:FS01:0x22:Completed 1980000 out of 3000000 steps (66%)
07:54:37:WU02:FS02:0x22:Completed 2790000 out of 3000000 steps (93%)
07:55:19:WU03:FS03:0x22:Completed 2280000 out of 3000000 steps (76%)
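To confirm that every slot is actually advancing, the progress lines can be pulled out of the log with a small script. This is a sketch that assumes the exact line format shown above (time:WUxx:FSxx:core:Completed N out of M steps); any other log lines are ignored, and only the most recent progress line per folding slot is kept.

```python
import re
from typing import NamedTuple

# Matches progress lines like:
# 07:54:25:WU01:FS01:0x22:Completed 1980000 out of 3000000 steps (66%)
PROGRESS_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}):WU(?P<wu>\d+):FS(?P<slot>\d+):"
    r"(?P<core>0x[0-9a-fA-F]+):Completed (?P<done>\d+) out of (?P<total>\d+) steps"
)


class Progress(NamedTuple):
    time: str
    wu: int
    slot: int
    core: str
    done: int
    total: int

    @property
    def percent(self) -> float:
        return 100.0 * self.done / self.total


def parse_progress(lines):
    """Return {slot_number: Progress} holding the latest progress line per slot."""
    latest = {}
    for line in lines:
        m = PROGRESS_RE.match(line.strip())
        if m:
            p = Progress(m["time"], int(m["wu"]), int(m["slot"]), m["core"],
                         int(m["done"]), int(m["total"]))
            latest[p.slot] = p  # later lines overwrite earlier ones
    return latest


if __name__ == "__main__":
    log = """\
07:54:25:WU01:FS01:0x22:Completed 1980000 out of 3000000 steps (66%)
07:54:37:WU02:FS02:0x22:Completed 2790000 out of 3000000 steps (93%)
07:55:19:WU03:FS03:0x22:Completed 2280000 out of 3000000 steps (76%)"""
    for slot, p in sorted(parse_progress(log.splitlines()).items()):
        print(f"FS{slot:02d} WU{p.wu:02d}: {p.percent:.0f}% of {p.total} steps")
```

Running it on two snapshots of the log taken a few minutes apart shows whether each slot is still moving.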

My config file:

<config>
  <!-- Folding Slot Configuration -->
  <cause v='CANCER'/>

  <!-- Slot Control -->
  <power v='full'/>

  <!-- User Information -->
  <passkey v='xxxxxxxxxxxxxxxxxxxx'/>
  <team v='223518'/>
  <user v='RollinLower'/>

  <!-- Work Unit Control -->
  <next-unit-percentage v='90'/>

  <!-- Folding Slots -->
  <slot id='0' type='CPU'/>
  <cpus v='8'/>
  <slot id='1' type='GPU'>
    <pci-bus v='0'/>
    <pci-slot v='7'/>
  </slot>
  <slot id='2' type='GPU'>
    <pci-bus v='0'/>
    <pci-slot v='8'/>
  </slot>
  <slot id='3' type='GPU'>
    <pci-bus v='0'/>
    <pci-slot v='9'/>
  </slot>
</config>
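The GPU slots pin each card by its PCI address: the 00:07.0 that lspci reports becomes pci-bus 0 and pci-slot 7. When more cards get passed through, a helper along these lines (hypothetical, not part of FAHClient) can generate matching slot entries from the lspci output. It assumes pci-bus and pci-slot take the bus and device numbers in decimal, while lspci prints them in hex; for the addresses in this post the two are identical.

```python
import re

# lspci prints "[domain:]bus:device.function" in hex, e.g.
# "00:07.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)"
LSPCI_RE = re.compile(
    r"^(?:[0-9a-fA-F]{4}:)?(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<dev>[0-9a-fA-F]{2})\.[0-7](?=\s|$)"
)


def gpu_slots(lspci_lines, first_id=1):
    """Return (slot_id, pci_bus, pci_slot) tuples, one per lspci line.

    Slot ids start at first_id (1 here, since slot 0 is the CPU slot).
    Bus and device are converted from hex to decimal, which is an
    assumption about what FAHClient's pci-bus / pci-slot options expect.
    """
    slots = []
    for line in lspci_lines:
        m = LSPCI_RE.match(line.strip())
        if m:
            slots.append((first_id + len(slots), int(m["bus"], 16), int(m["dev"], 16)))
    return slots


def render_slots(slots):
    """Render the tuples as <slot> elements in the same style as the config."""
    return "\n".join(
        f"<slot id='{sid}' type='GPU'>\n<pci-bus v='{bus}'/>\n<pci-slot v='{dev}'/>\n</slot>"
        for sid, bus, dev in slots
    )


if __name__ == "__main__":
    sample = [
        "00:07.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)",
        "00:08.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)",
        "00:09.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)",
    ]
    print(render_slots(gpu_slots(sample)))
```

Fed the three lspci lines from above, it produces slots 1 to 3 with pci-bus 0 and pci-slot 7, 8 and 9, matching the hand-written config.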

So far, so good, right?

Here's the problem:

jeroen@teslavm:~$ FAHClient --send-command ppd
08:01:48:Connecting to 127.0.0.1:36330

PyON 1 ppd
0
---
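The reply above is in FAHClient's PyON format: a "PyON <version> <name>" header, a body that reads as a Python literal, and a "---" terminator. To log PPD over time from a script (for example, to see whether it stays at 0 after a WU is returned), a parser like this sketch can extract the value. The framing is inferred from the output in this post, so treat it as an assumption; the body is parsed with ast.literal_eval, which handles numbers, strings, lists, dicts, True/False and None.

```python
import ast


def parse_pyon(text):
    """Return a list of (name, value) pairs from FAHClient PyON replies.

    Assumes each reply is framed as
        PyON <version> <name>
        <body that is a Python literal>
        ---
    as in the "--send-command ppd" output. Other lines, such as the
    "Connecting to ..." log line, are skipped.
    """
    results = []
    name, body = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if name is None:
            if line.startswith("PyON "):
                parts = line.split()
                name = parts[2] if len(parts) > 2 else ""
                body = []
        elif line == "---":
            value = ast.literal_eval("\n".join(body)) if body else None
            results.append((name, value))
            name = None
        elif line:
            # Lines are stripped; indentation does not matter inside brackets.
            body.append(line)
    return results


if __name__ == "__main__":
    reply = "08:01:48:Connecting to 127.0.0.1:36330\n\nPyON 1 ppd\n0\n---\n"
    print(parse_pyon(reply))
```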

Now I have no clue why I'm getting no points for this. Work units are being completed, and when one finishes, the log does show some points, but only the base points, without the QRB (quick return bonus) or anything else.

F@H also seems to receive the work without issue. So why am I not getting any points for it?
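For context on the "base points only" symptom: as I understand the F@H points FAQ, the quick return bonus works out to final = base * max(1, sqrt(k * deadline / elapsed)), and it only applies when a passkey is set and that passkey meets the eligibility requirements (such as a minimum number of successfully returned WUs). This sketch just evaluates that formula plus a naive PPD estimate; k, the deadline and the eligibility flag are inputs you would have to look up or supply, and the example numbers are made up.

```python
import math


def wu_credit(base_points, k, deadline_s, elapsed_s, qrb_eligible=True):
    """Credit for one WU under the QRB formula
        final = base * max(1, sqrt(k * deadline / elapsed)).

    k and the deadline are per-project values that must be supplied;
    eligibility (passkey set, requirements met) is a flag, not modelled.
    """
    if not qrb_eligible:
        return float(base_points)
    bonus = max(1.0, math.sqrt(k * deadline_s / elapsed_s))
    return base_points * bonus


def estimate_ppd(credit_per_wu, seconds_per_wu):
    """Naive points per day if identical WUs finish back to back."""
    return credit_per_wu * 86400 / seconds_per_wu


if __name__ == "__main__":
    # Made-up numbers: 1000 base points, k = 0.75, 1-day deadline, 6 h to finish.
    with_bonus = wu_credit(1000, 0.75, 86400, 6 * 3600)
    base_only = wu_credit(1000, 0.75, 86400, 6 * 3600, qrb_eligible=False)
    print(round(with_bonus), round(estimate_ppd(with_bonus, 6 * 3600)))
    print(round(base_only), round(estimate_ppd(base_only, 6 * 3600)))
```

With these made-up inputs the bonus multiplies the credit by sqrt(3), roughly 1.73x, which shows how much a missing QRB alone would depress PPD (though not down to 0).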

I verified on EOC and the F@H stats page that my new clients are showing up, but I still get no points for the new work I'm doing.

Any help figuring this one out is greatly appreciated!


Well, that's weird. I've got some time to research, so I'll look around. The only thing I can think of is that your configuration is missing the web server settings, which would keep it from reaching the FAHClient server on 127.0.0.1.


On 1/7/2024 at 9:10 PM, HoldSquat said:

Well, that's weird. I've got some time to research, so I'll look around. The only thing I can think of is that your configuration is missing the web server settings, which would keep it from reaching the FAHClient server on 127.0.0.1.

I tried adding it. I also tried adding the subnets for my site-to-site VPN and connecting with the browser on my main PC at home. No dice, sadly.
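When adding VPN subnets, it is easy to get a prefix slightly wrong. Python's ipaddress module can check whether the address the browser connects from is covered. The space-separated list of addresses and CIDR subnets is an assumption about how FAHClient's allow and web-allow values are written; other forms the client may accept (ranges, for instance) are not handled, and the addresses in the example are hypothetical.

```python
import ipaddress


def is_allowed(client_ip, allow_value):
    """True if client_ip falls within any entry of a space-separated list
    of addresses / CIDR subnets (assumed allow / web-allow format)."""
    ip = ipaddress.ip_address(client_ip)
    for entry in allow_value.split():
        # strict=False accepts entries like 192.168.1.5/24 (host bits set).
        if ip in ipaddress.ip_network(entry, strict=False):
            return True
    return False


if __name__ == "__main__":
    allow = "127.0.0.1 192.168.1.0/24 10.8.0.0/24"  # hypothetical LAN + VPN subnets
    for addr in ("127.0.0.1", "192.168.1.20", "10.8.0.6", "10.9.0.6"):
        print(addr, is_allowed(addr, allow))
```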

I just went for the nuclear option and built a new VM from the ground up, which somehow seems to have worked!

But now I run into the following error when I try to add more GPUs:

middlewared.service_exception.CallError: [EFAULT] internal error: qemu unexpectedly closed the monitor: 2024-01-09T15:54:27.024448Z qemu-system-x86_64: -device vfio-pci,host=0000:01:00.0,id=hostdev0,bus=pci.0,addr=0x7: vfio 0000:01:00.0: group 2 is not viable
Please ensure all devices within the iommu_group are bound to their vfio bus driver.

That seems to be an error with PCIe passthrough to the VM 😞

Guess I'll be folding on just 3 cards for the event this week.
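"group 2 is not viable" means some other device sharing IOMMU group 2 with 0000:01:00.0 is still bound to a host driver. On the TrueNAS host, group membership and bound drivers can be read from sysfs (/sys/kernel/iommu_groups/<n>/devices/<address>, where each device's driver symlink names its driver). This sketch lists each group and flags devices bound to anything other than vfio-pci. It is a simplification: the kernel may also tolerate other cases (unbound devices are treated as fine here, and stubs such as pci-stub may be accepted), so read its output as a list of suspects rather than a verdict.

```python
import os

# Drivers treated as acceptable for passthrough in this sketch.
OK_DRIVERS = {"vfio-pci"}


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group: [(pci_address, driver or None), ...]} from sysfs."""
    groups = {}
    for group in sorted(os.listdir(root), key=int):
        dev_dir = os.path.join(root, group, "devices")
        devices = []
        for addr in sorted(os.listdir(dev_dir)):
            link = os.path.join(dev_dir, addr, "driver")
            # The driver symlink is absent when no driver is bound.
            driver = os.path.basename(os.path.realpath(link)) if os.path.islink(link) else None
            devices.append((addr, driver))
        groups[int(group)] = devices
    return groups


def suspects(groups, group):
    """Devices in `group` that are bound to a driver other than vfio-pci."""
    return [(a, d) for a, d in groups.get(group, []) if d is not None and d not in OK_DRIVERS]
```

On the host, suspects(iommu_groups(), 2) would list what else in group 2 is still attached to a host driver.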


On 1/7/2024 at 3:05 AM, RollinLower said:

I need help! I've got a pretty weird situation going on.

My guess would be to check your passkey and/or user ID; one of them may have been mangled.
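A scripted sanity check of the identity fields in config.xml could look like this sketch. It assumes passkeys are 32 hexadecimal characters (the one in the post is redacted, so checking it here is meaningless) and only catches formatting problems such as missing, padded or malformed values; it cannot tell whether a well-formed passkey actually belongs to that username.

```python
import re
import sys
import xml.etree.ElementTree as ET

# Assumption: an F@H passkey is 32 hexadecimal characters.
PASSKEY_RE = re.compile(r"[0-9a-fA-F]{32}")


def check_identity(config_xml):
    """Return a list of formatting problems in user/team/passkey."""
    root = ET.fromstring(config_xml)

    def value(tag):
        el = root.find(tag)
        return None if el is None else el.get("v")

    user, team, passkey = value("user"), value("team"), value("passkey")
    problems = []
    if not user:
        problems.append("user is missing or empty")
    elif user != user.strip():
        problems.append("user has leading/trailing whitespace")
    if team is None or not team.isdigit():
        problems.append("team is missing or not a number")
    if passkey is None:
        problems.append("passkey is missing (no QRB without one)")
    elif not PASSKEY_RE.fullmatch(passkey):
        problems.append("passkey is not 32 hex characters")
    return problems


if __name__ == "__main__":
    # Usage: python check_identity.py /path/to/config.xml
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            print(check_identity(f.read()) or "no formatting problems found")
```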

