LTT Official Folding Month VI

Go to solution Solved by GOTSpectrum,

 

Message added by TVwazhere,

Daily point updates are posted here:

1 minute ago, leadeater said:

lol thanks, you made me spot I had one system not configured correctly, a big one too

 

<team v='0223518'/>

 

image.thumb.png.3eda48e6631c03c1e074e226511fd6ba.png

 

All those team 0's AAAAaaaaaa

In fairness, F@H should probably be converting the team number to an integer to avoid that kind of thing.
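As a rough illustration of that suggestion, here is a minimal Python sketch that parses the team value as an integer, so `0223518` and `223518` resolve to the same team. The function name `normalize_team` is hypothetical, and nothing here reflects how F@H's stats back-end actually handles team numbers.

```python
def normalize_team(raw: str) -> int:
    """Parse a team value as a base-10 integer so leading zeros are ignored."""
    value = int(raw.strip(), 10)
    if value < 0:
        raise ValueError(f"team number cannot be negative: {raw!r}")
    return value


# The misconfigured value and the intended one compare equal once parsed.
print(normalize_team("0223518") == normalize_team("223518"))
```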

Router:  Intel N100 (pfSense) + GL.iNet GL-X3000/ Spitz AX WiFi6: Zyxel NWA210AX (1.7Gbit peak at 160Mhz)
WiFi5: Ubiquiti NanoHD OpenWRT (~500Mbit at 80Mhz) Switches: Netgear MS510TXUP, MS510TXPP, GS110EMX
ISPs: Zen Full Fibre 900 (~930Mbit down, 115Mbit up) + Three 5G (~800Mbit down, 115Mbit up)
Upgrading Laptop/Desktop CNVIo WiFi 5 cards to PCIe WiFi6e/7


20 minutes ago, RollinLower said:

are you seeing any downtime in your production? maybe time it sits idle waiting on a download or a server work ack?

I haven't kept a record, but as far as I can tell from the log, one job finishes and within a minute or so the next job has downloaded and started crunching. So I don't think that's the problem.

 

21 minutes ago, Alex Atkin UK said:

Generally I find F@H often over-estimates WU credit.

Over-estimating by ~300% sounds either horribly inaccurate on the software's part or something else is the problem here.

 

17 minutes ago, leadeater said:

@Windows7ge I would suggest stopping all the CPU slots, then using GPU-Z or some other tool to make sure the GPU is running at the 75W it should be and that the GPU core and memory clocks are running at spec. Then check if output is around what is expected. After that look at the CPU, same thing: check package power and per-core operating clocks.

Well for starters nvidia-smi immediately says we're not ticking half those boxes even with CPU disabled:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.06             Driver Version: 535.104.06   CUDA Version: N/A      |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P4                       On  | 00000000:22:00.0 Off |                    0 |
| N/A   63C    P0              50W /  75W |   7519MiB /  7680MiB |     85%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

I'll have to get back to you on clock speeds...

 

18 minutes ago, Alex Atkin UK said:

Seems you aren't getting the bonus.

https://apps.foldingathome.org/cpu?q=Windows7ge

That's for finishing the work early yeah? I guess it's an indication that the work is getting completed a lot slower than it should?


9 minutes ago, Windows7ge said:

That's for finishing the work early yeah? I guess it's an indication that the work is getting completed a lot slower than it should?

Bonus should always be 1. The only time it won't be is if it's a new client install on a new system and you need to complete 10 WUs. So if your container or virtualization method is causing your CPUID to always change then you'll never get the bonus. This shouldn't be a problem though.

 

I know you've been changing around your setup a bit, so how long has it been static and running as is without any changes?

 

9 minutes ago, Windows7ge said:

Well for starters nvidia-smi immediately says we're not ticking half those boxes even with CPU disabled:

50W and 85% isn't so bad, FAH WU's probably won't fully max it out. Seems fairly within what I'd expect. Your problem is you aren't getting the bonus.

 

Edit:

Check your config.xml actually has your passkey set too.


lol fahclient is high, no way CPU only PPD is 6mil on this

 

FAHClient --send-command ppd
18:29:13:Connecting to 127.0.0.1:36330
PyON 1 ppd
6072074.418393
---
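If you want to log that figure over time rather than eyeball it, a small parser is enough. This is only a sketch that works on output shaped like the text pasted above (a `PyON 1 ppd` header followed by the number); the function name `parse_ppd` is made up for illustration, and the exact output format of other client versions is an assumption.

```python
def parse_ppd(output: str) -> float:
    """Return the PPD value that follows the 'PyON ... ppd' header line."""
    lines = [line.strip() for line in output.splitlines()]
    for i, line in enumerate(lines):
        # The header line looks like "PyON 1 ppd"; the value is on the next line.
        if line.startswith("PyON") and line.endswith("ppd") and i + 1 < len(lines):
            return float(lines[i + 1])
    raise ValueError("no PyON ppd message found in output")


# Sample text copied from the post above.
sample = """18:29:13:Connecting to 127.0.0.1:36330
PyON 1 ppd
6072074.418393
---
"""
print(f"{parse_ppd(sample) / 1e6:.2f}M PPD")
```

Since the reported PPD swings a lot right after a WU starts or a slot change, averaging several readings is probably more useful than any single one.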

 


4 minutes ago, leadeater said:

lol fahclient is high, no way CPU only PPD is 6mil on this

 

FAHClient --send-command ppd
18:29:13:Connecting to 127.0.0.1:36330
PyON 1 ppd
6072074.418393
---

 

just started a new WU or something i guess?
my humble little radeon 6900xt once did over 20M at the start of a WU, before dropping down to 5...


1 minute ago, RollinLower said:

just started a new WU or something i guess?
my humble little radeon 6900xt once did over 20M at the start of a WU, before dropping down to 5...

Probably, it's reporting 9mil now. I did stop and start fah because I actually had one too many cpu slots configured. Changing the number of CPU slots must have made it confused heh.


6 minutes ago, leadeater said:

I know you've been changing around your setup a bit, so how long has it been static and running as is without any changes?

One server has been running native Linux/no VM for almost 7 straight days. (CPU + GPU)

Second server has an LXC container that's been online for almost 7 straight days. (CPU)

Second server but in a VM running Windows 10 Enterprise LTSC (GPU) almost 3 days.

 

No changes to configs during this time.

 

10 minutes ago, leadeater said:

Check your config.xml actually has your passkey set too.

IDK if the passkey is supposed to stay private so:

 

Native Linux server:

<!-- User Information -->
  <passkey v='********************************'/>
  <team v='223518'/>
  <user v='Windows7ge'/>

 

LXC Container:

<!-- User Information -->
  <passkey v='********************************'/>
  <team v='223518'/>
  <user v='Windows7ge'/>

 

Windows 10 Enterprise LTSC VM:

 

Screenshotfrom2023-11-0414-35-43.png.036617a495cfc477e5dac0277d8ed494.png

 

IDK what's going on here. I'm not used to folding. BOINC wasn't even this complicated to get working right IMO...


6 minutes ago, Windows7ge said:

IDK if the passkey is supposed to stay private so:

yes

 

6 minutes ago, Windows7ge said:

IDK what's going on here. I'm not used to folding. BOINC wasn't even this complicated to get working right IMO...

Restart all your clients so they read in the config file again and apply the passkey, just in case.


10 minutes ago, leadeater said:

Restart all your clients so they read in the config file again and apply the passkey, just in case.

Question. The e-mail F@H sends you when you register for a passkey...is that supposed to show your folding name as the username or your real name?

 

Cause double-checking the e-mail they sent me it says my username is my real name...could that be...part of the problem?


8 minutes ago, Windows7ge said:

Question. The e-mail F@H sends you when you register for a passkey...is that supposed to show your folding name as the username or your real name?

 

Cause double-checking the e-mail they sent me it says my username is my real name...could that be...part of the problem?

Folding name and passkey need to match and be the same across all clients.

 

  <!-- User Information -->
  <passkey v='****************************'/>
  <team v='223518'/>
  <user v='leadeater'/>
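For anyone juggling several clients, a quick script can compare the user information across config files. The following is a minimal sketch that assumes each file holds `<user>`, `<team>` and `<passkey>` elements with a `v` attribute, as in the snippets in this thread; the `<config>` wrapper, the function names and the placeholder passkey are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

FIELDS = ("user", "team", "passkey")


def read_user_info(config_xml: str) -> dict:
    """Extract the v attribute of the user, team and passkey elements."""
    root = ET.fromstring(config_xml)
    info = {}
    for tag in FIELDS:
        node = next(root.iter(tag), None)  # first matching element, if any
        info[tag] = node.get("v") if node is not None else None
    return info


def mismatched_fields(configs: dict) -> list:
    """Return the fields that are missing, empty, or differ between clients."""
    infos = [read_user_info(text) for text in configs.values()]
    bad = []
    for tag in FIELDS:
        values = {info[tag] for info in infos}
        if len(values) != 1 or None in values or "" in values:
            bad.append(tag)
    return bad


# Hypothetical configs: the second one has the leading-zero team value.
native = "<config><passkey v='PLACEHOLDER'/><team v='223518'/><user v='Windows7ge'/></config>"
vm = "<config><passkey v='PLACEHOLDER'/><team v='0223518'/><user v='Windows7ge'/></config>"
print(mismatched_fields({"native": native, "vm": vm}))
```

A check like this only catches differences between clients. It cannot tell whether the passkey was registered against that username in the first place, which still has to be confirmed from the passkey e-mail.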

 

image.png.84382e3d649bfd48c6abffd99444fcef.png

 

https://folding.extremeoverclocking.com/user_summary.php?s=&u=812290

https://stats.foldingathome.org/donor/id/1398229


2 minutes ago, leadeater said:

Folding name and passkey need to match and be the same across all clients.

Not what I meant. I mean I apparently created my passkey using my IRL name. IRL name does not match my folding username.


1 minute ago, Windows7ge said:

Not what I meant. I mean I apparently created my passkey using my IRL name. IRL name does not match my folding username.

Oh the username in the email is not 'Windows7ge'?


1 minute ago, leadeater said:

Oh the username in the email is not 'Windows7ge'?

Correct:

Hello IRL-Name,

    Username: IRL-Name
     Passkey: whatever

Is what I was using.


18 minutes ago, leadeater said:

haha nope, get a new key generated

 

https://apps.foldingathome.org/getpasskey

Way ahead of you. Already swapped them out and rebooted all three clients. Just gonna watch them for a while and make sure they're all online.

 

Re-reading the getpasskey page, I can see from the text what they wanted, but FFS why do this to inept people like me?

 

Screenshotfrom2023-11-0415-16-30.png.1d0c78757c9185add874c0fc349f6aa7.png

 

Why not, IDFK this?

 

Screenshotfrom2023-11-0415-17-18.png.ab63ced1af1f8ba6c3d31eb6dcc099cf.png

 

I'm also looking to do a performance experiment. Tesla P4 native Linux vs vGPU P4-8Q VM Windows 10. Basically the two extreme ends of what theoretically would offer the best and worst performance. I want to see how big the gap is. Ideally the gap should be negligible.

 

I suppose I will run this for the next couple of days and hope to see an improvement.


2 minutes ago, Windows7ge said:

Way ahead of you. Already swapped them out and rebooted all three clients. Just gonna watch them for a while and make sure they're all online.

 

Re-reading the getpasskey page, I can see from the text what they wanted, but FFS why do this to inept people like me?

 

Screenshotfrom2023-11-0415-16-30.png.1d0c78757c9185add874c0fc349f6aa7.png

 

Why not, IDFK this?

 

Screenshotfrom2023-11-0415-17-18.png.ab63ced1af1f8ba6c3d31eb6dcc099cf.png

 

I'm also looking to do a performance experiment. Tesla P4 native Linux vs vGPU P4-8Q VM Windows 10. Basically the two extreme ends of what theoretically would offer the best and worst performance. I want to see how big the gap is. Ideally the gap should be negligible.

 

I suppose I will run this for the next couple of days and hope to see an improvement.

The improvement should be visible from the next WU you upload with a valid passkey, so pretty quick.


6 minutes ago, RollinLower said:

The improvement should be visible from the next WU you upload with a valid passkey, so pretty quick.

One server just got a new CPU job and we jumped from 400K to 1.8M. GPU's are gonna be a couple hours before their next jobs and other CPU is going to be almost midnight before it's done.

 

Basically I should be able to come back to this tomorrow morning and have a better estimate of what I should be getting.


4 hours ago, Gorgon said:

 

Yup, woke up to another rig with both slots set at "Failed" and stuck there for the last 6 hours. What I noticed last night was the Rigs that failed then were all running driver 535.129.03 whereas the Rigs that were still up were on 535.113.01 or 525.125.06 and one Rig on 535.129.03.

 

Guess What? The Rig that didn't fail yesterday evening that has 535.129.03 was the one that failed overnight. So all my Rigs running 535.129.03 have all failed within 12 hours of each other.

 

All started working again after just a reboot with no "apt update" required.

 

I'm going to keep running as is and see what happens but it looks like 535.129.03 might have issues.

I have 535.129.03 and 525.147.05  both down.  They have been rock solid ever since. 


2 hours ago, Windows7ge said:

Way ahead of you. Already swapped them out and rebooted all three clients. Just gonna watch them for a while and make sure they're all online.

 

Re-reading the getpasskey page, I can see from the text what they wanted, but FFS why do this to inept people like me?

 

Screenshotfrom2023-11-0415-16-30.png.1d0c78757c9185add874c0fc349f6aa7.png

 

Why not, IDFK this?

 

Screenshotfrom2023-11-0415-17-18.png.ab63ced1af1f8ba6c3d31eb6dcc099cf.png

 

I'm also looking to do a performance experiment. Tesla P4 native Linux vs vGPU P4-8Q VM Windows 10. Basically the two extreme ends of what theoretically would offer the best and worst performance. I want to see how big the gap is. Ideally the gap should be negligible.

 

I suppose I will run this for the next couple of days and hope to see an improvement.

I get the feeling F@H as a whole was really hacked together out of necessity. The passkey in particular seems like an afterthought, added when they realised that making it into a competition might increase participation and they needed a way to make sure users were unique.

 

The website is still rather clunky and, as we found out, they don't seem to do a good job of sanity checking things. The hourly stats files are an absolute mess, with some seemingly invalid usernames that require hacks to make any sense of them. It's why the F@H ranks, lar.systems and my own are not the same, as we choose different methods of weeding out the mess.

 

I ignore really bad usernames entirely as that's mostly legacy users long since abandoned, and my focus is only on the users close in rank and above me.  Lar.systems must have a heck of a time having to parse the whole list and historical data.

 

This is why the decision was made for FAHClient v8 to be open source, as they don't have a lot of resources to fix things when the priority is the actual research. It's a shame there isn't equal focus on tidying up the back-end and website, but I guess that's tricky as they don't want to break anything when the research is the priority.


My setup has pretty much settled in:

 

About 10M PPD

800 Watts on average

Too many decibels

+2 Freedom degrees for my unfinished basement

 

Spoiler

image.png.42f26b8c1bb6946cef2cbed6cc63e639.png

How closely do I need to monitor the hot spot temperatures? If I set the fans down to 40% on the warmer server and 30% on the cooler one (so I don't hear them as much upstairs), the hot spots on the hotter cards stabilize at around 96 science degrees.

I sold my soul for ProSupport.


34 minutes ago, Alex Atkin UK said:

I get the feeling F@H as a whole was really hacked together out of necessity. 

When you need to rely on a web browser because the installed app on the machine doesn't function, you know things are off the rails...

Want to help researchers improve the lives of millions of people with just your computer? Then join World Community Grid distributed computing, and start helping the world to solve its most difficult problems!

 


4 hours ago, leadeater said:

Bonus should always be 1. The only time it won't be is if it's a new client install on a new system and you need to complete 10 WUs.

Do each of my clients need to finish 10 WU's now before the bonus kicks in? I think that's gonna be at least a couple of days then before I start to see if this fix does anything.


Well I am dead in the water. The rig holding the 4x4090's has had a PSU failure and it can't be replaced till next weekend at the earliest as the machine isn't local to me.

 

Guess I will throw a 5700XT and 6750XT into the competition for now. It will be quite the drop in production.

My Folding Stats

My BOINC Stats

 

 

VelosterN:

AMD Ryzen 9 5950X - Asus ROG Strix X570-E Gaming - Corsair Vengeance RGB Pro 3600Mhz 32GB - Asus ROG Strix Gaming 6750 XT OC

Corsair Crystal Series 680x RGB - Samsung 970 Evo Plus 250GB NVMe - Samsung 970 Pro 512GB NVMe - Samsung 860 Pro 256 GB 2.5" SSD X2

EVGA P2 80+ Platinum 850Watt PSU - BenQ XL2730Z 27.0" 2560x1440 144 Hz - be quiet! Dark Rock 4

Corsair K70 LUX - Logitech G502 Proteus Spectrum - Sennheiser HD599 - Blue Yeti Mic

Windows 11 Professional Version 22H2

BettyBoop:

AMD Ryzen 5 2600X - Asus ROG Strix B450-I Gaming - Corsair Vengeance LPX 3000Mhz 16GB - Asus ROG Strix Gaming 5500 XT - Fractal Design Core 500

Samsung 860 Pro 512GB Sata - EVGA 550GM 80+ Gold 550Watt SFX PSU - be quiet! Dark Rock 4

Windows 11 Professional Version 22H2

HTPC:

Intel i7 6700K - Asus Maximus Hero VIII - Corsair Vengeance LPX 3000MHz 16GB - MSI RX480 Gaming-X 8GB - Cooler Master 932 HAF

Seagate 250GB HDD - EVGA G2 80+ Gold 650Watt PSU - Corsair H100i

Windows 10 Professional Version 21H2

 


9 minutes ago, Windows7ge said:

Do each of my clients need to finish 10 WU's now before the bonus kicks in? I think that's gonna be at least a couple of days then before I start to see if this fix does anything.

I believe it's 10 WUs total.


This topic is now closed to further replies.

