
Windows7ge

Member
  • Posts

    12,134
  • Joined

  • Last visited

Everything posted by Windows7ge

  1. Windows7ge

    Status Update

    If I wanted to shake hands with danger and wouldn't get in trouble with my boss for doing it, a couple of these would actually be handy for "repairing" stuff some contractors fudged up.
  2. Windows7ge

    YouTube still thinks I have my ad blocker still…

    Does your browser have a blocker built-in? Are you running a network based ad-blocker?
  4. Interesting things I'm learning: Linux won't reuse process IDs while the server is running.

     +---------------------------------------------------------------------------------------+
     | Processes:                                                                            |
     |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
     |        ID   ID                                                             Usage      |
     |=======================================================================================|
     |    0   N/A  N/A    1293421      C  ...it/22-0.0.20/Core_22.fah/FahCore_22      428MiB |
     |    1   N/A  N/A    1299072      C  ...it/22-0.0.20/Core_22.fah/FahCore_22      178MiB |
     +---------------------------------------------------------------------------------------+

     When I started the box six days ago the PIDs were only in the 1,000s; now we're in the 1,000,000s and rising. o_O
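A quick way to see why those numbers keep climbing: Linux hands out PIDs sequentially up to a kernel ceiling and only wraps around and reuses free numbers once it hits it. A minimal check (the sysctl value in the comment is just an example):

```shell
# PIDs are allocated in increasing order until they reach the
# kernel's ceiling, then allocation wraps and reuses free numbers.
cat /proc/sys/kernel/pid_max

# If a box churns through processes quickly, the ceiling can be
# raised (example value; 64-bit kernels accept up to 4194304):
# sudo sysctl -w kernel.pid_max=4194304
```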
  5. Mor networking!

     

    [photo attachment]

     

    Something I love about old network cable is the ability to just chop it up, re-terminate, and repurpose it all. We're making ten ~20 ft Cat6 cables.

     

    I can finally get some use out of this unused network switch (#2). 40Gig uplink is great and all but 50Gig would be better.

     

    [photo attachment]

    1. Lurick

      Lurick

      50gig is cool, I'm still hoping in a couple years we'll finally have SFP-128 options on the market 🙂

      SFP-56 is supposed to be coming out this year or early next iirc.

    2. Windows7ge

      Windows7ge

      I know SFP28 is 25Gig. I'll have to look up SFP56.

       

      I kind of prefer QSFP+ for 40Gig just because of the size of my hands. The cables & modules are easier to hold. Less finicky. We don't have that here though; these are direct-to-server quad SFP+ connections.

    3. Lurick

      Lurick

      Yah SFP56 is 50G but in SFP form factor. What I really enjoy about 25G and 50G SFPs is that they have the pull tab like the QSFP modules do 😄

  8. I only had to fight tooth and nail to get it but my replacement SSD turned into an upgraded SSD.

     

    [photo attachment]

     

    Gen3 -> Gen4

    7300 PRO -> 7400 PRO

     

    Just an FYI: be careful buying from 3rd-party marketplace sellers. I just went through Hell getting this replacement. Amazon couldn't do anything for me, and the seller ghosted me. I had to talk to Micron, who said they aren't the distributor. I gave them the S/N, which directed me to Crucial. Crucial processed my RMA, told me they didn't have 7300s anymore, and offered me a 7400. They sent me a confirmation for the replacement, then another e-mail saying they were out of stock. Then a couple days ago I got another email that my replacement had finally shipped.

     

    So that sucked.

     

    Worse, I'm starting to suspect the motherboard is killing the SSDs, but I don't know why. This is the second SSD I've replaced in this slot.

     

    [photo attachment]

     

    Each time I replace the SSD I start getting this error in the log:

     

    Nov 11 08:46:42 intel kernel: [1346300.041503] {62}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
    Nov 11 08:46:42 intel kernel: [1346300.042451] {62}[Hardware Error]: It has been corrected by h/w and requires no further action
    Nov 11 08:46:42 intel kernel: [1346300.042893] {62}[Hardware Error]: event severity: corrected
    Nov 11 08:46:42 intel kernel: [1346300.043288] {62}[Hardware Error]:  Error 0, type: corrected
    Nov 11 08:46:42 intel kernel: [1346300.043683] {62}[Hardware Error]:   section_type: PCIe error
    Nov 11 08:46:42 intel kernel: [1346300.044068] {62}[Hardware Error]:   port_type: 4, root port
    Nov 11 08:46:42 intel kernel: [1346300.044452] {62}[Hardware Error]:   version: 3.0
    Nov 11 08:46:42 intel kernel: [1346300.044842] {62}[Hardware Error]:   command: 0x0547, status: 0x0010
    Nov 11 08:46:42 intel kernel: [1346300.045228] {62}[Hardware Error]:   device_id: 0000:17:00.0
    Nov 11 08:46:42 intel kernel: [1346300.045620] {62}[Hardware Error]:   slot: 21
    Nov 11 08:46:42 intel kernel: [1346300.045998] {62}[Hardware Error]:   secondary_bus: 0x18
    Nov 11 08:46:42 intel kernel: [1346300.046369] {62}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x2030
    Nov 11 08:46:42 intel kernel: [1346300.046747] {62}[Hardware Error]:   class_code: 060400
    Nov 11 08:46:42 intel kernel: [1346300.047113] {62}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0013
    Nov 11 08:46:42 intel kernel: [1346300.055863] pcieport 0000:17:00.0: AER: aer_status: 0x00001000, aer_mask: 0x00002000
    Nov 11 08:46:42 intel kernel: [1346300.056233] pcieport 0000:17:00.0:    [12] Timeout               
    Nov 11 08:46:42 intel kernel: [1346300.056606] pcieport 0000:17:00.0: AER: aer_layer=Data Link Layer, aer_agent=Transmitter ID

     

    When I look up what component is at address 17:00.0:

    17:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 04)

    So this is either part of the CPU or the C621 PCH, and it might be killing the SSDs. I'm not sure. If I lose a 3rd SSD I'm just going to stop using that slot and assume the motherboard has a problem.
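For digging into which device sits behind those AER reports, `lspci` can dump the root port's link status and list whatever hangs off its secondary bus (the addresses below are the ones from the log above):

```shell
# Show the root port's link capability/status and AER details
sudo lspci -s 17:00.0 -vv | grep -iE 'LnkCap|LnkSta|AER'

# List the devices behind it (the log says secondary bus 0x18)
lspci -s 18:
```

A degraded `LnkSta` (lower speed or width than `LnkCap`) on the root port would point at the slot/board rather than the drives.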

    1. TopHatProductions115

      TopHatProductions115

      Why not test a replacement motherboard? Just curious...

    2. Windows7ge

      Windows7ge

      22 minutes ago, TopHatProductions115 said:

      Why not test a replacement motherboard? Just curious...

      Dual socket LGA3647 w/ 7 active PCIe slots as my primary hypervisor server with multiple things passed through.

       

      I don't really have it in me to find a temporary replacement board without taking it offline for a long period of time.

       

      Reading the motherboard schematic, the PCH only gets one whopping PCIe lane, so the current SSD in that slot (a Kingston DC1000B) is getting a quarter of the bandwidth and sharing it with other devices. I'm better off putting it on a riser card in a CPU2 slot; the Intel QPI link has much higher bandwidth. It'd be worth it. I might just put them on the Supermicro 4x22110 riser card I have if that happens.

    3. Windows7ge

      Windows7ge

      Actually, now that you've really brought it to my attention, I think I have to swap my SSDs with my HDDs. There's no way all of this is sharing a single PCIe Gen3 lane. That caps performance at roughly 1 GB/s.
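The ~1 GB/s figure checks out as a back-of-envelope number. A quick sketch of the arithmetic, using the published PCIe Gen3 signaling rate and encoding overhead:

```python
# Back-of-envelope: what a single PCIe Gen3 lane can move.
# Gen3 signals at 8 GT/s and uses 128b/130b encoding.
GT_PER_S = 8e9
ENCODING = 128 / 130   # usable bits per transferred bit
BITS_PER_BYTE = 8

lane_gbs = GT_PER_S * ENCODING / BITS_PER_BYTE / 1e9
print(f"Gen3 x1: {lane_gbs:.2f} GB/s")      # ~0.98 GB/s - the ~1 GB/s cap
print(f"Gen3 x4: {lane_gbs * 4:.2f} GB/s")  # what an x4 NVMe drive expects
```

So an x4 NVMe drive squeezed onto a single lane really does lose about three quarters of its bandwidth before any sharing is counted.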

  9. Windows7ge

    Windows thinks my recycle bin is empty and won'…

    Not a fix, but as a temporary workaround SHIFT+DELETE will skip the Recycle Bin and make the file disappear, ready to be overwritten by new data.
  10. I like how you started at 10M, dropped to 0 then just flew past 80M.
  11. There we go, NOW we're getting somewhere. This is about what I was expecting, more or less. So, how many multitudes more are others getting? Am I in last place yet?
  12. Alright, ditched Windows; we're going full Linux nodes. Since we verified the passkey is working on one Linux node with a GPU, I decided I'm just going to throw the other one in and run two nodes instead of three. Turns out every other slot is a little too close for my super basic airflow guides, but temps still look OK, so whatever.

     +---------------------------------------------------------------------------------------+
     | NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |
     |-----------------------------------------+----------------------+----------------------+
     | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
     |                                         |                      |               MIG M. |
     |=========================================+======================+======================|
     |   0  Tesla P4                       Off | 00000000:02:00.0 Off |                  Off |
     | N/A   70C    P0             60W /   75W |    180MiB /  8192MiB |     99%      Default |
     |                                         |                      |                  N/A |
     +-----------------------------------------+----------------------+----------------------+
     |   1  Tesla P4                       Off | 00000000:81:00.0 Off |                    0 |
     | N/A   64C    P0             54W /   75W |    122MiB /  7680MiB |     99%      Default |
     |                                         |                      |                  N/A |
     +-----------------------------------------+----------------------+----------------------+

     +---------------------------------------------------------------------------------------+
     | Processes:                                                                            |
     |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
     |        ID   ID                                                             Usage      |
     |=======================================================================================|
     |    0   N/A  N/A       1593      C  ...it/22-0.0.20/Core_22.fah/FahCore_22      178MiB |
     |    1   N/A  N/A       2292      C  ...it/22-0.0.20/Core_22.fah/FahCore_22      120MiB |
     +---------------------------------------------------------------------------------------+

     For the moment the log says everything's working A-OK. We'll see if I get another PPD bump in the next 24~48 hrs. I'd like to see >2M.
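With the cards packed that tightly, a lightweight poll beats the full table for keeping an eye on temps; `nvidia-smi`'s query interface emits CSV that's easy to filter (the 80 °C alert threshold below is just an assumed example, not a Tesla P4 spec):

```shell
# Poll temperature/power/utilization as CSV and flag hot cards;
# the 80C threshold is an arbitrary example, not a hardware limit.
nvidia-smi --query-gpu=index,temperature.gpu,power.draw,utilization.gpu \
           --format=csv,noheader |
  awk -F', ' '$2+0 > 80 {print "GPU " $1 " running hot: " $2 "C"}'
```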
  13. Okay, NOW we're starting to get somewhere. Appears that way. I might swap that Windows VM for Linux. Even if the PPD in FAHControl aren't accurate, Linux is consistently higher than Windows anyway. I might get the bonus that way; if the passkey has been correct for the past 4 days, it should have gotten one by now.
  14. Crap, there are supposed to be three here. Edit: Oh wait, this just means the passkey is working. So at least one of three clients is using it... how helpful.
  15. My passkey was wrong but clients were still showing up there. I'm actually more confused, though, because that means bonuses were going... somewhere, but I didn't fix the passkey until Nov 4th. How do I verify I'm actually getting the bonus? I definitely wasn't before the 4th. It looks like both of my Linux instances should be getting the bonus. Still waiting to see a bonus on the Windows VM.
  16. Hmn, 1,113,340 at this moment. I'd still like to see it quite a bit higher, but at least it's a step in the right direction. Ugh, so many extra steps. We were looking at 0s, 1s, and 3s, yeah? 0 means no bonus, 1 means bonus, and 3 is my active instance count? (The count is right.) Can we discern from that information whether all three clients are now getting the bonus? Something did change; we just want to verify all ends took the change. I don't want to start fiddling with the running service and the config file if it all ends up being unnecessary. The bug report is appreciated.
  17. Wait, but if that's the case, is my new passkey even working? Do I have to stop and start the service for it to read in the config file? A system restart isn't sufficient?
  18. For Linux I modified the config files directly. It shouldn't matter that the file was modified hot; it's only read in once. As long as I restarted fahclient.service or rebooted the whole VM, the new passkey should be read in either way. If that weren't the case, then adding remote management shouldn't have worked either. @leadeater I'll give it a few days, but I'm still under 50% of what I'd expect to see. That very first big peak on my chart? 900K-something? That was one Tesla P4 and my 7551P. Since then I've added a second P4 and a pair of 2698v3s, and I'm averaging less with double the hardware, so...
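For reference, the restart-and-verify step above can be sketched like this (the service name and config path are the Debian/Ubuntu package defaults; adjust if yours differ):

```shell
# Restart the Folding@home client so it re-reads its config
sudo systemctl restart fahclient.service

# Confirm the passkey line that's on disk for the client to load
# (path is the Debian/Ubuntu package default)
grep -i 'passkey' /etc/fahclient/config.xml
```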
  19. Like BOINC, does F@H support only transferring files during certain times of day? Since I started folding, I've received complaints from family that programs like Microsoft Teams and Google Meet aren't holding a solid connection on Wi-Fi. The real problem could be some deep-rooted unrelated thing, but apparently folding is making the existing problem worse. Ideas? Some of what my family does is work-related, so I either need to stop folding when they need to be online or see if stopping file transfers during their work hours fixes the problem.
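As far as I know the F@H client has no BOINC-style transfer schedule, but the folding box's bandwidth can be capped at the OS level instead. A minimal sketch with `tc`'s token bucket filter (the interface name `eth0` and the 10 Mbit/s rate are assumptions to adjust):

```shell
# Cap this machine's egress to 10 Mbit/s so WU uploads can't
# saturate the link (eth0 and the rate are assumptions - adjust):
sudo tc qdisc add dev eth0 root tbf rate 10mbit burst 32kbit latency 400ms

# Undo it once the family is off the clock:
# sudo tc qdisc del dev eth0 root
```

This only shapes what the folding box sends; if the video calls suffer from downloads too, the shaping would have to live on the router instead.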
  20. Yep. For all three instances I know the team number matches, my username matches, and I copy/pasted the new passkey. This time I made sure the folding username in the e-mail matched, and I restarted all three servers so the clients would load the new passkey from the config file. If this doesn't fix it, my only remaining guess is to disable CPU folding and see what that does to GPU. Points per watt, GPU is better anyway by a multitude.
  21. Even if there is still something wrong, I have no idea what to change or what I can change. My native client is putting out barely any more than my virtualized client, so it doesn't seem like that's at fault. Native Linux is putting out more GPU PPD on average than virtualized Windows, but not the 900K advertised. So that sounds like a config issue, but what config? There was a small bump after we hopefully corrected the passkey, but the numbers still don't add up on my end. I'm lost.
  22. Does each of my clients need to finish 10 WUs now before the bonus kicks in? I think it's gonna take at least a couple of days, then, before I start to see if this fix does anything.
  23. One server just got a new CPU job and we jumped from 400K to 1.8M. The GPUs are gonna be a couple hours before their next jobs, and the other CPU is going to be almost midnight before it's done. Basically, I should be able to come back to this tomorrow morning and have a better estimate of what I should be getting.
  24. Way ahead of you. Already swapped them out and rebooted all three clients. Just gonna watch them for a while and make sure they're all online. Re-reading the getpasskey page, I see what they wanted now, but FFS, why do this to inept people like me? I'm also planning a performance experiment: Tesla P4 on native Linux vs. a vGPU P4-8Q Windows 10 VM. Basically the two extreme ends of what theoretically would offer the best and worst performance. I want to see how big the gap is; ideally it should be negligible. I suppose I'll run this for the next couple of days and hope to see an improvement.
  25. Correct: Hello IRL-Name, Username: IRL-Name Passkey: whatever Is what I was using.