
Windows7ge

Member
  • Posts: 12,134

Reputation Activity

  1. Like
    Windows7ge got a reaction from Gorgon in LTT Official Folding Month VI   
    Well, word through the grapevine says there's a second P4 headed my way for no reason in particular... :3
  2. Like
    Windows7ge reacted to Gorgon in LTT Official Folding Month VI   
    Typically running a single GPU will yield more PPD than 2 vGPUs because of the Quick Return Bonus, so as long as the GPU is well utilized you're better off just running bare metal and, as a bonus, you free up one CPU thread.
     
    Yes, you can just delete the CPU slot. You can run CPU, GPU, or CPU+GPU. Just edit the config.xml or use the GUI.
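     
    For anyone who'd rather edit the file directly, here's a minimal sketch of a GPU-only config.xml. The path /etc/fahclient/config.xml, the service name, and the team number are my assumptions - substitute your own values, and keep your passkey since the Quick Return Bonus needs it.
    sudo nano /etc/fahclient/config.xml
    <config>
      <user value='YourName'/>               <!-- your F@H username -->
      <team value='223518'/>                 <!-- team number (assumed; check yours) -->
      <passkey value='PASTE_YOURS_HERE'/>    <!-- required for the Quick Return Bonus -->
      <!-- GPU slot only: the CPU slot has simply been deleted -->
      <slot id='0' type='GPU'/>
    </config>
    sudo systemctl restart FAHClient         # service name varies by install (assumption)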
  3. Like
    Windows7ge reacted to TylerD321 in LTT Official Folding Month VI   
    My folding at home stats fluctuate just like that all the time without adjusting any settings at all.
  4. Funny
    Windows7ge got a reaction from HeroRareheart in Ubuntu wifi   
    The best problems are the ones that mysteriously fix themselves.
  5. Like
    Windows7ge got a reaction from Gorgon in LTT Official Folding Month VI   
    @leadeater We are exploring territories I genuinely never once thought imaginable. 😁
     
    Windows 10 22H2 virtual machine:

     
    vGPU Software Licensed Product
        License Status : Licensed (Expiry: 2024-1-24 2:41:39 GMT)
    Ubuntu Server 22.04 virtual machine:

     
    vGPU Software Licensed Product
        License Status : Licensed (Expiry: 2024-1-24 2:52:8 GMT)
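     
    (If you want to check the same thing on your own guest, those lines are the kind of output NVIDIA's query mode prints - a sketch; the exact section wording can differ by driver version:)
    # Query full driver state and pull out the licensing section
    nvidia-smi -q | grep -i -A2 'vGPU Software Licensed Product'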
    Both of them using one GPU:
    Wed Oct 25 23:18:26 2023
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 535.104.06             Driver Version: 535.104.06     CUDA Version: N/A    |
    |-----------------------------------------+----------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
    |                                         |                      |               MIG M. |
    |=========================================+======================+======================|
    |   0  Tesla P4                       On  | 00000000:22:00.0 Off |                    0 |
    | N/A   68C    P0             53W /  75W  |   7519MiB /  7680MiB |     98%      Default |
    |                                         |                      |                  N/A |
    +-----------------------------------------+----------------------+----------------------+
    +---------------------------------------------------------------------------------------+
    | Processes:                                                                            |
    |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
    |        ID   ID                                                             Usage      |
    |=======================================================================================|
    |    0   N/A  N/A    3124239    C+G   vgpu                                     3744MiB  |
    |    0   N/A  N/A    3157034    C+G   vgpu                                     3744MiB  |
    +---------------------------------------------------------------------------------------+
    I'm gonna keep an eye on this new Linux VM, make sure jobs finish, and see if the average credit for both CPU & GPU goes above that of its Windows counterpart. If it does I'll ditch Windows and spin up another Linux server VM.

     
    I will keep this in mind. Thanks.
  6. Like
    Windows7ge got a reaction from IkeaGnome in LTT Official Folding Month VI   
    (Same post as item 5 above.)
  7. Like
    Windows7ge got a reaction from leadeater in LTT Official Folding Month VI   
    (Same post as item 5 above.)
  8. Like
    Windows7ge reacted to justpoet in LTT Official Folding Month VI   
    Found it.  
     
  9. Like
    Windows7ge reacted to da na in How do I update BIOS using MS-DOS?   
    Excellent. I might first try XP and see if the BAT file will run within the built-in MS-DOS emulator. 
    Thanks for the FreeDOS suggestion though - the only copy of DOS I have on hand is on six 5.25" disks. Not the most practical. 
  10. Like
    Windows7ge got a reaction from da na in How do I update BIOS using MS-DOS?   
    You might be able to use FreeDOS here if you don't already have a means to boot from MS-DOS.
     
    What are the contents of the F.bat file?
     
    It looks like AFUDOS.EXE is the firmware update program.
    C99Q3B23.ROM is the firmware image itself.
    I'm assuming F.bat is the script that runs the update.
     
    So, assuming all of these are in the same directory, you'd just run the command: f.bat
     
    My guess, without more context, is that it will run a command like: afudos.exe -h c99q3b23.rom
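     
    If you want to see for sure before flashing anything, DOS can print the batch file's contents; the afudos line is just my guess from above, not confirmed AFUDOS syntax:
    REM Show what F.bat actually does before executing it
    TYPE F.BAT
    REM Expecting something along the lines of (speculative):
    REM   afudos.exe -h c99q3b23.rom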
  11. Like
    Windows7ge got a reaction from IkeaGnome in LTT Official Folding Month VI   
    The number fluctuates wildly day to day but the evidence suggests ~900k is what I'm averaging.
     
    I want to split it into two GRID P4-4Q vGPUs since I'm seeing the % utilization sway back and forth. I think running two project tasks at the same time would be better - and on Linux at that, if I can get it working.
     
    @leadeater Based on the numbers I've been seeing I'm getting the feeling my ~1.2mil PPD is "cute".
  12. Like
    Windows7ge got a reaction from IkeaGnome in LTT Official Folding Month VI   
    Since I have 64 threads to work with on one of my servers I think I'm going to opt to create two virtual machines and give each of them 32 threads w/ a GRID P4-4Q. Should cut the Tesla right down the middle and let me play with multiple platforms while utilizing more of the CPU.
  13. Agree
    Windows7ge got a reaction from Gorgon in LTT Official Folding Month VI   
    Chances are, if you don't mess up any settings, you'll see a small but noticeable performance gain running compute applications on Linux instead of Windows. I found this to be true for both CPU and GPU on BOINC, with the WCG project.
     
    That being said, it's not as user-friendly or foolproof as Windows. With the right permissions you can do things in Linux that instantly brick your install, kind of like deleting System32. So think about familiarizing yourself with Linux in VMs and the like before swapping out your bare-metal OS.
     
    As someone new to GNU/Linux, the *buntus are a safe bet: Ubuntu, Lubuntu, Kubuntu, plus Ubuntu-based distros like PopOS and Linux Mint. Some are built around being more plug'n'play than the more obscure distros. I remember Peppermint was popular and I think it still is, among many other variants. Although each has its own under-the-hood differences, your engagement with the OS will be mostly through the GUI, so pick whatever appeals to you most - the desktop environments vary wildly. For CUDA/OpenGL applications you'll want to make sure NVIDIA's proprietary drivers will install and run on whichever you pick, though. For gaming on Linux PopOS is popular and should work for compute like F@H, though I've yet to verify that, so take it with a grain of salt.
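     
    As a rough sketch of what that driver check looks like on an Ubuntu-family distro (the 535 version below is just an example - use whatever ubuntu-drivers recommends):
    # List detected GPUs and the recommended proprietary driver (Ubuntu tooling)
    ubuntu-drivers devices
    # Install the recommendation automatically...
    sudo ubuntu-drivers autoinstall
    # ...or pin a specific packaged version (example version number)
    sudo apt install nvidia-driver-535
    # After a reboot this should list the GPU if the driver loaded
    nvidia-smi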
     
    You can do a lot of crazy things with Linux, like installing the desktop environment of a different distro onto your system and running that instead of what came with yours. Pretty cool - like moving from an environment that resembles a Windows desktop to one that resembles macOS.
  14. Like
    Windows7ge got a reaction from justpoet in LTT Official Folding Month VI   
    Actually, while I'm here, since leadeater so graciously dragged me into the conversation: is this a folding month competition?
     
    I'm running my new (to me) Tesla P4 through its paces. I need to validate the long-term reliability of the licensing server, so I'm folding for team LTT right now. Can I still get in on the event or am I too late? I can compete for last place. 😆
  15. Funny
    Windows7ge got a reaction from leadeater in LTT Official Folding Month VI   
    @leadeater Alright I'll admit I have no idea what I'm talking about when it comes to NVIDIA graphics. I grew up with AMD. This Tesla P4 is only the 3rd NVIDIA card I've owned in over 15 years.
  16. Like
    Windows7ge reacted to leadeater in LTT Official Folding Month VI   
    Yep, but what may happen is you will get 100% utilization and the per-task time might change from 1 hour to 1.2 hours. Just a note to do the math and make sure it's actually better 🙂
     
    It probably is, with such a small time difference while doing more WUs in a similar period of time. It would be really interesting to actually try it and see. I'm pretty sure you can get it working on the RTX 4090, ask @Windows7ge
  17. Like
    Windows7ge reacted to Diddlydennis in Safe to sell a BIOS locked laptop?   
    Tried, but sadly no luck. Thanks anyway.
  18. Like
    Windows7ge got a reaction from Diddlydennis in Safe to sell a BIOS locked laptop?   
    You get three tries, then it shuts off and you have to power it on to try again, but there's no absolute maximum number of tries. It's just a deterrent and a time-waster if you aren't meant to have access.
     
    The X380 looks like it's probably too new for the resistor hack.
     
    It's still not impossible to fix this issue, but it would require some tools and knowledge. I have the tools but I don't have the knowledge. I recovered a Lenovo X270 with the help of someone else and an EEPROM reader/writer. We basically read the BIOS off, modified it a little so it would glitch at the password screen, then wrote it back onto the EEPROM. It just let us in, and I reset the password by creating a new one and then removing it. The laptop's been fine since.
  19. Like
    Windows7ge got a reaction from Diddlydennis in Safe to sell a BIOS locked laptop?   
    Yes. More than anything that would make it more appealing to Linux users, who may need Secure Boot disabled anyhow.
  20. Informative
    Windows7ge got a reaction from Levent in PROXMOX - Rebuilding ZFS RAID rpool After Disk Failure   
    PROXMOX is a fantastic free and open-source Linux KVM hypervisor (with the option of a subscription - not required) but it's not without its caveats. If, when you installed PROXMOX, you opted to create a ZFS rpool for the OS - be that a mirror (RAID1), striped mirrors (RAID10), or any combination of parity (RAID50, 51, 60, 61) - you will find the installer creates more than a ZFS partition on each disk. It creates two additional partitions: a 1MB BIOS boot partition and a 512MB EFI boot partition.
    nvme0n1     259:6    0 447.1G  0 disk
    ├─nvme0n1p1 259:7    0  1007K  0 part
    ├─nvme0n1p2 259:8    0   512M  0 part
    └─nvme0n1p3 259:9    0 118.7G  0 part
    These p1 & p2 partitions play an important role in allowing the hypervisor to boot: they hold the bootloader. Without it the BIOS has no way of knowing where the OS is, even though it's there in its entirety on p3.
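     
    (For reference, the device listings in this post are trimmed lsblk output; to reproduce the same columns on your own node:)
    # Show the block-device tree with the columns used throughout this post
    lsblk -o NAME,MAJ:MIN,RM,SIZE,RO,TYPE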
     
    This is where ZFS has its caveat. If you're like me and have suffered a few disk failures, you'll be familiar with this output:
      pool: rpool
     state: DEGRADED
    status: One or more devices could not be used because the label is missing or
            invalid. Sufficient replicas exist for the pool to continue
            functioning in a degraded state.
    action: Replace the device using 'zpool replace'.
       see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
      scan: scrub repaired 0B in 00:00:44 with 0 errors on Sun Jun 11 00:24:48 2023
    config:
            NAME                     STATE     READ WRITE CKSUM
            rpool                    DEGRADED     0     0     0
              mirror-0               DEGRADED     0     0     0
                nvme0n1p3            ONLINE       0     0     0
                4849641676990992824  UNAVAIL      0     0     0  was /dev/nvme3n1p3
    errors: No known data errors
    In this scenario one SSD in a mirror used to boot the server has failed.
     
    Now ZFS makes it pretty easy to recover from this - just replace the SSD and resilver with the appropriate command - but it will only restore partition #3.
    nvme0n1     259:6    0 447.1G  0 disk
    ├─nvme0n1p1 259:7    0  1007K  0 part
    ├─nvme0n1p2 259:8    0   512M  0 part
    └─nvme0n1p3 259:9    0 118.7G  0 part
    nvme3n1     259:6    0 447.1G  0 disk
    └─nvme3n1p3 259:9    0 118.7G  0 part
    At this stage ZFS is happy, it's hunky-dory, it's walking on sunshine... but what would happen if we were to lose nvme0n1? Well, everything would keep running fine - until you shut the server down or rebooted. At that point it doesn't matter that you had a ZFS mirror of your boot drive; ZFS didn't copy the bootloader. Your data is still there, it exists, but you'd have to recover the bootloader partitions.
     
    To avoid having to do that, I want to demonstrate the steps required to re-create and initialize the bootloader partitions, with a real example.
     
    Before you resilver the array you'll want to run the commands:
    sgdisk /dev/nvme0n1 -R /dev/nvme3n1
    sgdisk -G /dev/nvme3n1
    The first command replicates the healthy disk's partition table onto the new disk; the second randomizes the new disk's GUIDs so the two don't collide. Be careful to replicate any capitalized letters such as -R and -G, as capitals can issue different operations than their lowercase counterparts.
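     
    (It's cheap to confirm the copy took before moving on; -p is sgdisk's print option:)
    # Print both partition tables and eyeball that they now match
    sgdisk -p /dev/nvme0n1
    sgdisk -p /dev/nvme3n1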
     
    What this has done is taken a new disk from this:
    nvme3n1 259:0 0 447.1G 0 disk  
    To this:
    nvme3n1     259:0    0 447.1G  0 disk
    ├─nvme3n1p1 259:1    0  1007K  0 part
    ├─nvme3n1p2 259:2    0   512M  0 part
    └─nvme3n1p3 259:13   0 118.7G  0 part
    Note that all this has done is copy the partition layout. We still need to rewrite the partitions' data onto the new disk.
     
    From here we can start resilvering our ZFS array:
    zpool replace -f rpool 4849641676990992824 /dev/nvme3n1p3
    And check in on the progress:
      pool: rpool
     state: DEGRADED
    status: One or more devices is currently being resilvered. The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
      scan: resilver in progress since Sat Oct 14 17:31:37 2023
            37.3G scanned at 764M/s, 24.5G issued at 502M/s, 37.3G total
            24.7G resilvered, 65.73% done, 00:00:26 to go
    config:
            NAME                       STATE     READ WRITE CKSUM
            rpool                      DEGRADED     0     0     0
              mirror-0                 DEGRADED     0     0     0
                nvme0n1p3              ONLINE       0     0     0
                replacing-1            DEGRADED     0     0     0
                  4849641676990992824  UNAVAIL      0     0     0  was /dev/nvme3n1p3/old
                  nvme3n1p3            ONLINE       0     0     0  (resilvering)
    errors: No known data errors
    Once this completes, p3 is taken care of, but p1 & p2 still aren't configured.
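     
    (Side note: that progress output is plain zpool status; if you'd rather not re-run it by hand, something like this will poll it until replacing-1 disappears:)
    # Re-run zpool status every two seconds while the resilver runs
    watch -n 2 zpool status rpool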
     
    To do that we first need to know whether PROXMOX is being booted with GRUB or SYSTEMD-BOOT. The easiest way to find out for sure is to recognize which boot menu you see when the server is turned on.
     
    One of these two screens should look familiar to you:
     
    [Screenshots: the blue GRUB boot menu and the black SYSTEMD-BOOT menu]
     
    If you see the blue screen then your server boots using GRUB.
     
    If you see a black screen then your server boots using SYSTEMD-BOOT.
     
    You may also check from the OS without rebooting using:
    efibootmgr -v  
    This will return a lot of information but you should see a line resembling one of the following:
    Boot0005* proxmox               [...] File(\EFI\proxmox\grubx64.efi)
    Boot0006* Linux Boot Manager    [...] File(\EFI\systemd\systemd-bootx64.efi)
    These should be self-explanatory: if you see grubx64.efi you are using GRUB; if you see systemd-bootx64.efi you're using SYSTEMD-BOOT.
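     
    (Newer PROXMOX releases also ship a purpose-built check; to my knowledge it reports the ESPs the tool knows about and which bootloader they were set up for:)
    # List registered ESPs and whether they were initialized for GRUB or systemd-boot
    proxmox-boot-tool status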
     
    From here you will use the corresponding command to configure your bootloader:
     
    GRUB:
    grub-install /dev/nvme3n1  
    SYSTEMD:
    proxmox-boot-tool format /dev/nvme3n1p2
    proxmox-boot-tool init /dev/nvme3n1p2
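     
    (Optionally, you can have the tool re-copy the current kernels onto every registered ESP afterwards - a belt-and-braces step, since init should already have done it:)
    # Sync the current kernel/initrd set onto all ESPs registered with the tool
    proxmox-boot-tool refresh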
    And you're done.
     
    If you are so inclined to test whether this actually worked, you can shut down the server and remove the healthy drive you copied the boot data from. Turn the server on and see if it boots from the new drive you just formatted and copied all the system data to. If everything went well, the server should boot as normal.
     
    I hope this helped.
  21. Like
    Windows7ge got a reaction from AbydosOne in PROXMOX - Rebuilding ZFS RAID rpool After Disk Failure   
    (Same post as item 20 above.)
  22. Like
    Windows7ge got a reaction from da na in PROXMOX - Rebuilding ZFS RAID rpool After Disk Failure   
    (Same post as item 20 above.)
  23. Like
    Windows7ge got a reaction from da na in PROXMOX - Rebuilding ZFS RAID rpool After Disk Failure   
    I've needed this information more than once now, so I'm just spreading the good word that there's a solution and it's not too painful - just annoying to memorize.
  24. Like
    Windows7ge got a reaction from Crunchy Dragon in PROXMOX - Rebuilding ZFS RAID rpool After Disk Failure   
    (Same post as item 23 above.)
  25. Like
    Windows7ge got a reaction from Crunchy Dragon in PROXMOX - Rebuilding ZFS RAID rpool After Disk Failure   
    (Same post as item 20 above.)