
Windows7ge

Posts posted by Windows7ge

  1. Interesting things I'm learning. Linux hands out process IDs in increasing order and won't reuse one until the counter wraps around at pid_max.

    +---------------------------------------------------------------------------------------+
    | Processes:                                                                            |
    |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
    |        ID   ID                                                             Usage      |
    |=======================================================================================|
    |    0   N/A  N/A   1293421      C   ...it/22-0.0.20/Core_22.fah/FahCore_22      428MiB |
    |    1   N/A  N/A   1299072      C   ...it/22-0.0.20/Core_22.fah/FahCore_22      178MiB |
    +---------------------------------------------------------------------------------------+

    When I started the box six days ago I was only in the 1,000s; now we're in the 1,000,000s and rising. o_O
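PIDs on Linux are handed out in increasing order up to a ceiling, `pid_max`, after which the kernel wraps around and starts reusing free ones. A minimal sketch for checking how far a box is from that wrap, assuming the standard `/proc/sys/kernel/pid_max` file is readable; the helper names are made up for illustration:

```python
from pathlib import Path

PID_MAX_PATH = "/proc/sys/kernel/pid_max"  # standard procfs location on Linux


def parse_pid_max(text):
    """Turn the contents of pid_max (a single integer plus newline) into an int."""
    return int(text.strip())


def read_pid_max(path=PID_MAX_PATH):
    """Read the kernel's PID ceiling; PIDs wrap and get reused once it is reached."""
    return parse_pid_max(Path(path).read_text())


def pids_until_wrap(current_pid, pid_max):
    """How many more PIDs can be handed out before the counter wraps."""
    return max(pid_max - current_pid, 0)
```

With the highest PID from the table above, `pids_until_wrap(1299072, read_pid_max())` gives the remaining headroom. Since the PIDs here are already past 1.29 million, `pid_max` on this box has to be set well above the old 32768 default.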

  2. Alright, ditched Windows. We're going full Linux nodes.

     

    I decided that since we verified the passkey is working on one Linux node with a GPU, I'm just going to throw the other one in and run two nodes instead of three. Turns out every other slot is a little too close for my super basic airflow guides.

     

    20231108_195502.thumb.jpg.275afbfdcf8eb4ee98a65b44157a27e6.jpg

     

    But temps still look ok so whatever.

     

    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
    |-----------------------------------------+----------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
    |                                         |                      |               MIG M. |
    |=========================================+======================+======================|
    |   0  Tesla P4                       Off | 00000000:02:00.0 Off |                  Off |
    | N/A   70C    P0              60W /  75W |    180MiB /  8192MiB |     99%      Default |
    |                                         |                      |                  N/A |
    +-----------------------------------------+----------------------+----------------------+
    |   1  Tesla P4                       Off | 00000000:81:00.0 Off |                    0 |
    | N/A   64C    P0              54W /  75W |    122MiB /  7680MiB |     99%      Default |
    |                                         |                      |                  N/A |
    +-----------------------------------------+----------------------+----------------------+
                                                                                             
    +---------------------------------------------------------------------------------------+
    | Processes:                                                                            |
    |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
    |        ID   ID                                                             Usage      |
    |=======================================================================================|
    |    0   N/A  N/A      1593      C   ...it/22-0.0.20/Core_22.fah/FahCore_22      178MiB |
    |    1   N/A  N/A      2292      C   ...it/22-0.0.20/Core_22.fah/FahCore_22      120MiB |
    +---------------------------------------------------------------------------------------+

     

    For the moment the log says everything's working A-OK. We'll see if I get another PPD bump in the next 24 to 48 hrs. I'd like to see >2M.

  3. 3 hours ago, leadeater said:

    I think it's fixed, can't quite tell 🤔

    Okay, NOW we're starting to get somewhere. 😆

     

    2 hours ago, Alex Atkin UK said:

    If you look on the Recent CPUs page at the Last returned, it seems to be your Windows client that is not returning a bonus.  The Linux boxes are working fine now.

    Appears that way. I might swap that Windows VM for Linux. Even if the PPD in FAH Control isn't accurate, Linux is consistently higher than Windows anyway. I might get the bonus that way, because in the 4 days the passkey has been correct the Windows client should have gotten one by now.

  4. 28 minutes ago, Alex Atkin UK said:

    If all recent WUs are getting the bonus we can assume all is well, assuming all clients are actually showing up in the recent WU completed list.

     

    If your passkey is wrong I'm not sure if they show up on there?

    My passkey was wrong but clients were still showing up there.

     

    Screenshotfrom2023-11-0718-15-48.png.809f5150c97799765193ed07c2a31c62.png

     

    I'm actually more confused though, cause that means bonuses were going... somewhere, but I didn't fix the passkey until Nov 4th. How do I verify that I'm actually getting the bonus? I definitely wasn't before the 4th.

     

    It looks like both of my Linux instances should be getting the bonus. Still waiting to see a bonus on the Windows VM.

  5. 3 hours ago, leadeater said:

    FAH is saying you are getting the bonus now and your past 24hr is going up

    Hmm, 1,113,340 at this moment. I'd still like to see it quite a bit higher, but at least it's a step in the right direction.

     

    3 hours ago, leadeater said:

    I always stop FAH, then modify config.xml, then start it again. Always works if you do it that way.

    Ugh, so many extra steps. We were looking at 0s, 1s, and 3s, yeah? 0 means no bonus, 1 means bonus, and 3 is my active instances? (The count is right.) Can we tell from that information whether all three clients are now getting the bonus? Something did change; we just want to verify that all ends took the change.

     

    I don't want to start fiddling around with the running service and the config file if it all ends up being unnecessary.

     

    2 hours ago, Gorgon said:

    You need to patch it to get it to work properly

    The bug report is appreciated.

  6. 1 hour ago, Alex Atkin UK said:

    Like they said, FAHClient writes the config file when you terminate the process so if you modify it while still running it will restore the old configuration when you restart the service.

    Wait, but if that's the case is my new passkey even working? Do I have to stop the service/start the service to read-in the config file? A system restart isn't sufficient?
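The stop, edit, start order described in the quote above can be sketched as a small script. This is only an illustration under assumptions: the config path `/etc/fahclient/config.xml` and the `<config>` root element are assumed rather than taken from this thread, and `set_option` and `update_passkey` are hypothetical helper names. The service name `fahclient.service` is the one mentioned elsewhere in the thread.

```python
import subprocess
import xml.etree.ElementTree as ET

CONFIG_PATH = "/etc/fahclient/config.xml"  # assumed location, adjust for your install
SERVICE = "fahclient.service"


def set_option(config_xml, name, value):
    """Return config XML text with <name v='value'/> set, adding the element if missing.

    Note: ElementTree's default parser drops XML comments, so comments in the
    original file will not survive the round trip.
    """
    root = ET.fromstring(config_xml)
    node = root.find(name)
    if node is None:
        node = ET.SubElement(root, name)
    node.set("v", value)
    return ET.tostring(root, encoding="unicode")


def update_passkey(passkey, path=CONFIG_PATH, service=SERVICE):
    """Stop the client, rewrite the passkey, then start the client again."""
    # Stop first, so the client cannot write its in-memory config over the edit on exit.
    subprocess.run(["systemctl", "stop", service], check=True)
    try:
        with open(path, encoding="utf-8") as fh:
            updated = set_option(fh.read(), "passkey", passkey)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(updated)
    finally:
        # Start the service again even if the edit failed.
        subprocess.run(["systemctl", "start", service], check=True)
```

Run as root, `update_passkey("<your 32-character passkey>")` would do all three steps in the safe order. Editing the file while the client is running is exactly the case the quote warns about.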

  7. 8 hours ago, Justaphf said:

    The service has to be completely stopped if you are directly editing the config files.  I fought with this issue for several rounds the first time I set up a Linux box until I figured that out.

    For Linux I modified the config files directly. It should not matter if the file was modified hot; it's only read in once. So long as I restarted the fahclient.service or rebooted the whole VM, the new passkey should be read in either way. If that weren't the case, then remote management shouldn't have worked when I added it.

     

    @leadeater I'll give it a few days but I'm still at <50% of what I'd expect to see. That very first big peak on my chart? 900K something? That was one Tesla P4 and my 7551P. Since then I added a second P4 & a pair of 2698v3's and I'm averaging less with double the hardware so...

  8. Like BOINC, does F@H support transferring files only during certain times of day?

     

    Since I started folding I've received complaints from family that programs like Microsoft Teams and Google Meet aren't holding a solid connection on Wi-Fi.

     

    The real problem could be some deep-rooted unrelated thing but apparently folding is making the existing problem worse.

     

    Ideas? Some of what my family does is for work-related purposes, so I either need to stop folding when they need to be online or see if stopping file transfers during their work hours fixes the problem.

  9. 8 hours ago, Alex Atkin UK said:

    The numbers won't go up until the bonus kicks in, question is why it isn't doing so.

     

    You're 100% sure the new passkey and usernames in the clients match exactly?

    Yep. For all three instances I know the team number matches and my username matches. I copy/pasted the new passkey, this time making sure the folding username matched the one in the e-mail, and I restarted all three servers so the client would load the new passkey from the config file.

     

    If this doesn't fix it, my only guess is to try disabling CPU and see what that does to GPU. GPU is better on points per watt by a wide margin anyway.

  10. 2 hours ago, leadeater said:

    Your client configs still aren't right, still at ~600k ppd

    Even if there is still something wrong I have no idea what to change or what I can change.

     

    My native client is putting out barely any more than my virtualized client so it doesn't seem like that's at fault.

    Native Linux is putting out more on GPU on average than virtualized Windows, but not the 900K advertised. So that sounds like a config issue, but what config? 🤷‍♂️

     

    There was a small bump after we hopefully corrected the passkey but the numbers still don't add up correctly on my end. I'm lost.

  11. 6 minutes ago, RollinLower said:

    the improvement should be visible from the next WU you upload with a valid passkey, so pretty quick.

    One server just got a new CPU job and we jumped from 400K to 1.8M. The GPUs are gonna be a couple of hours before their next jobs, and the other CPU is going to be almost midnight before it's done.

     

    Basically I should be able to come back to this tomorrow morning and have a better estimate of what I should be getting.

  12. 18 minutes ago, leadeater said:

    haha nope, get a new key generated

     

    https://apps.foldingathome.org/getpasskey

    Way ahead of you. Already swapped them out and rebooted all three clients. Just gonna watch them for a while and make sure they're all online.

     

    Re-reading the getpasskey page, I see what they wanted now, but FFS why do this to inept people like me?

     

    Screenshotfrom2023-11-0415-16-30.png.1d0c78757c9185add874c0fc349f6aa7.png

     

    Why not, IDFK this?

     

    Screenshotfrom2023-11-0415-17-18.png.ab63ced1af1f8ba6c3d31eb6dcc099cf.png

     

    I'm also looking to do a performance experiment. Tesla P4 native Linux vs vGPU P4-8Q VM Windows 10. Basically the two extreme ends of what theoretically would offer the best and worst performance. I want to see how big the gap is. Ideally the gap should be negligible.

     

    I suppose I will run this for the next couple of days and hope to see an improvement.

  13. 10 minutes ago, leadeater said:

    Restart all your clients so they read in the config file again and apply the passkey, just in case.

    Question. The e-mail F@H sends you when you register for a passkey... is that supposed to show your folding name as the username or your real name?

     

    Cause double-checking the e-mail they sent me it says my username is my real name...could that be...part of the problem?

  14. 6 minutes ago, leadeater said:

    I know you've been changing around your setup a bit, so how long has it been static and running as is without any changes?

    One server is running native Linux/no VM for almost 7 straight days. (CPU + GPU)

    Second server has an LXC container that's been online for almost 7 straight days. (CPU)

    Second server but in a VM running Windows 10 Enterprise LTSC (GPU) almost 3 days.

     

    No changes to configs during this time.

     

    10 minutes ago, leadeater said:

    Check your config.xml actually has your passkey set too.

    IDK if the passkey is supposed to stay private so:

     

    Native Linux server:

    <!-- User Information -->
      <passkey v='********************************'/>
      <team v='223518'/>
      <user v='Windows7ge'/>

     

    LXC Container:

    <!-- User Information -->
      <passkey v='********************************'/>
      <team v='223518'/>
      <user v='Windows7ge'/>

     

    Windows 10 Enterprise LTSC VM:

     

    Screenshotfrom2023-11-0414-35-43.png.036617a495cfc477e5dac0277d8ed494.png

     

    IDK what's going on here. I'm not used to folding. BOINC wasn't even this complicated to get working right IMO...
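A quick local sanity check on the identity block in `config.xml` could look like the sketch below. It assumes the `<config>` root element and that a passkey is 32 hexadecimal characters (the masked values above are 32 characters long); `identity_problems` is a hypothetical helper, not part of any F@H tool.

```python
import re
import xml.etree.ElementTree as ET


def identity_problems(config_xml):
    """Return a list of human-readable problems with user, team and passkey."""
    root = ET.fromstring(config_xml)
    problems = []
    values = {}
    for name in ("user", "team", "passkey"):
        node = root.find(name)
        # Options are stored as <name v='value'/>, as in the snippets above.
        values[name] = (node.get("v") or "").strip() if node is not None else ""
        if not values[name]:
            problems.append(f"{name} is missing or empty")
    # Assumption: a valid passkey is exactly 32 hex characters.
    if values["passkey"] and not re.fullmatch(r"[0-9a-fA-F]{32}", values["passkey"]):
        problems.append("passkey is not 32 hex characters")
    if values["team"] and not values["team"].isdigit():
        problems.append("team is not a number")
    return problems
```

An empty list only means the file is well formed. It cannot tell whether the passkey was registered under the same username, which has to be checked against the e-mail from the passkey page.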

  15. 20 minutes ago, RollinLower said:

    are you seeing any downtime in your production? maybe time it sits idle waiting on a download or a server work ack?

    I don't keep a record, but as far as I'm aware, when I watch one job finish in the log, within a minute or so it has the next job downloaded and starts crunching again. So I don't think that's the problem.

     

    21 minutes ago, Alex Atkin UK said:

    Generally I find F@H often over-estimates WU credit.

    Over-estimating by ~300% sounds either horribly inaccurate on the software's part or something else is the problem here.

     

    17 minutes ago, leadeater said:

    @Windows7ge I would suggest stopping all the CPU slots, use GPU-Z or some other tool and make sure the GPU is running at the 75W it should be and the GPU core and memory clocks are running at spec. Then check if output is around what is expected. After that look at CPU, same thing: check package power and per-core operating clocks.

    Well for starters nvidia-smi immediately says we're not ticking half those boxes even with CPU disabled:

    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 535.104.06             Driver Version: 535.104.06   CUDA Version: N/A      |
    |-----------------------------------------+----------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
    |                                         |                      |               MIG M. |
    |=========================================+======================+======================|
    |   0  Tesla P4                       On  | 00000000:22:00.0 Off |                    0 |
    | N/A   63C    P0              50W /  75W |   7519MiB /  7680MiB |     85%      Default |
    |                                         |                      |                  N/A |
    +-----------------------------------------+----------------------+----------------------+

    I'll have to get back to you on clock speeds...
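On Linux the power and clock check suggested above can be scripted with `nvidia-smi`'s CSV query mode instead of GPU-Z. A rough sketch, assuming the query fields listed in `FIELDS` are supported by the installed driver (on a vGPU guest some may come back as `[N/A]`); the 90% threshold and the function names are arbitrary choices for illustration:

```python
import csv
import io
import subprocess

# Field names as accepted by `nvidia-smi --query-gpu=...`.
FIELDS = ["index", "name", "power.draw", "power.limit",
          "utilization.gpu", "clocks.sm", "clocks.mem"]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for raw in csv.reader(io.StringIO(text), skipinitialspace=True):
        if raw:
            rows.append(dict(zip(FIELDS, (cell.strip() for cell in raw))))
    return rows


def below_power_cap(rows, threshold=0.9):
    """Return the indexes of GPUs drawing less than threshold * power limit."""
    flagged = []
    for row in rows:
        try:
            draw, limit = float(row["power.draw"]), float(row["power.limit"])
        except ValueError:
            continue  # value was "[N/A]" or similar, nothing to compare
        if draw < threshold * limit:
            flagged.append(row["index"])
    return flagged


def query_gpus():
    """Run nvidia-smi once and return the parsed rows."""
    cmd = ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
           "--format=csv,noheader,nounits"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

`below_power_cap(query_gpus())` would flag a card sitting at 50W of a 75W limit, like the one in the output above, while leaving a card near its cap alone.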

     

    18 minutes ago, Alex Atkin UK said:

    Seems you aren't getting the bonus.

    https://apps.foldingathome.org/cpu?q=Windows7ge

    That's for finishing the work early, yeah? I guess it's an indication that the work is getting completed a lot slower than it should be?
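As a rough illustration of why a missing bonus hurts so much: as far as I recall from the Folding@home points FAQ, the quick return bonus scales base credit by `max(1, sqrt(k * deadline / elapsed))` and is only granted to work returned with a valid passkey. The sketch below assumes that formula; the function name and every number in the usage note are made up, not taken from a real work unit.

```python
import math


def qrb_points(base_points, k, deadline_days, elapsed_days, has_valid_passkey=True):
    """Estimate final credit for one work unit under the assumed bonus formula.

    k and deadline_days are per-project constants; elapsed_days is the time
    from download to upload.
    """
    if not has_valid_passkey:
        return float(base_points)  # no passkey, no bonus: base credit only
    multiplier = max(1.0, math.sqrt(k * deadline_days / elapsed_days))
    return base_points * multiplier
```

With hypothetical values `k=0.75` and a 3-day deadline, a 1000-point unit returned in one day would score `qrb_points(1000, 0.75, 3, 1)`, which is 1500, while the same unit without a valid passkey stays at 1000. Faster returns raise the multiplier further, so a missing bonus can plausibly explain a PPD several times lower than the client's estimate.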
