Posted April 16

The problem: F@H disabled both GPU slots, and the GPUs aren't recognized by several programs.

How I caused it: I tried to enable Coolbits following this guide: https://www.techticity.com/howto/how-to-control-nvidia-graphics-card-fan-speed-in-linux/
After creating the 20-nvidia.conf file (with the contents from the guide) in /etc/X11/xorg.conf.d, I rebooted, but the OS wouldn't load properly anymore (it got stuck at the wall of text that shows all the services and drivers starting) and eventually crashed. Afterwards the PC wouldn't power on at all and showed no signs of life. After resetting the CMOS it powers on again, but the GPUs don't give any output; only the iGPU works (the BIOS still tried to output through GPU 1, so I had to pull the GPUs and set the iGPU as the primary display to get any output at all).

What I tried: I reverted all the changes I made (deleted the 20-nvidia.conf file) and reinstalled F@H completely. I also tried switching the GPUs around into every configuration imaginable, but the GPUs are still disabled, although F@H does recognize them correctly. NVIDIA X Server Settings doesn't see the GPUs at all, and neither does PSensor.

Additional info: I am using the proprietary Nvidia driver, which worked fine until now. I noticed that when the system powers off, a message saying that Nvidia-Persistenced-Service failed pops up for a short time before the machine shuts off completely.

Specs:
CPU: i7 4790K
RAM: 2x4GB DDR3-1600
MB: ASUS H87M-Pro
GPU 1: RTX 2070 Super
GPU 2: GTX 1060 3GB
PSU: EVGA 750BR (750W)
SSD: LITEON 256GB SATA SSD
OS: Debian 12
F@H Client version: 7.6.21

Any ideas on how to fix this are greatly appreciated!

English is not my first language, so please excuse any confusion or misunderstandings on my end. I like to edit my posts a lot.
F@H-Stats The Folding rig: CPU: Core i7 4790K RAM: 16 8GB (2x4GB) DDR3-1600 GPU 1: RTX 2070 Super GPU 2: GTX 1060 3GB PSU: Gigabyte P450B EVGA 600BR EVGA 750BR OS: Windows 11 Home Linux let me down. .- -- --- --. ..- ... Hello!
Posted April 16

Did you try completely removing and reinstalling the Nvidia driver package?

I sold my soul for ProSupport.
Posted April 16 (Author)

Just now, Needfuldoer said: "Did you try completely removing and reinstalling the Nvidia driver package?"

Not yet, I'll try that once I'm back home.
Posted April 16

The not powering on is a bit odd, though. Even with borked Xorg/drivers it should still have booted you into text mode without crashing.

If it's any use, the settings on my desktop right now are:

cat /etc/X11/xorg.conf.d/01-nvidia.conf

    Section "Monitor"
        # HorizSync source: edid, VertRefresh source: edid
        Identifier     "Monitor0"
        VendorName     "Unknown"
        ModelName      "GBT M28U"
        HorizSync       246.0 - 246.0
        VertRefresh     48.0 - 144.0
        Option         "DPMS"
    EndSection

    Section "Device"
        Identifier     "Device0"
        Driver         "nvidia"
        VendorName     "NVIDIA Corporation"
        Option         "NoLogo" "1"
        Option         "Coolbits" "12"
    EndSection

    Section "Screen"
        Identifier     "Screen0"
        Device         "Device0"
        Monitor        "Monitor0"
        DefaultDepth    24
        Option         "Stereo" "0"
        Option         "nvidiaXineramaInfoOrder" "DFP-3"
        Option         "metamodes" "3840x2160_120 +0+0 {ForceCompositionPipeline=On, AllowGSYNCCompatible=On}"
        Option         "SLI" "Off"
        Option         "MultiGPU" "Off"
        Option         "BaseMosaic" "off"
        SubSection     "Display"
            Depth       24
        EndSubSection
    EndSection

I'm curious how we're supposed to enable Coolbits on Wayland, as this will be an issue very soon.

Someone claims this works without Coolbits: https://github.com/nan0s7/nfancurve
I must say I'm dubious, given that I always used the nvidia-settings CLI and am fairly sure Coolbits was required.

Router: Intel N100 (pfSense) WiFi6: Zyxel NWA210AX (1.7Gbit peak at 160Mhz) WiFi5: Ubiquiti NanoHD OpenWRT (~500Mbit at 80Mhz) Switches: Netgear MS510TXUP, MS510TXPP, GS110EMX ISPs: Zen Full Fibre 900 (~930Mbit down, 115Mbit up) + Three 5G (~800Mbit down, 115Mbit up) Upgrading Laptop/Desktop CNVIo WiFi 5 cards to PCIe WiFi6e/7
Posted April 16 (Author)

12 minutes ago, Alex Atkin UK said: "The not powering on is a bit odd though. Even with borked Xorg/drivers it should still have booted you into text mode without crashing."

I have no idea what caused that. It did absolutely nothing when I pressed the power button until I reset the CMOS with the jumper. After that, it rebooted a few times until it stayed on without giving any display output.
Posted April 16

5 minutes ago, Average Nerd said: "I have no idea what caused that, it just did absolutely nothing when pressing the power button, until I reset the CMOS with the jumper."

Could the CMOS battery be failing?
Posted April 16 (Author)

5 minutes ago, Alex Atkin UK said: "Could the CMOS battery be failing?"

I replaced it last year because it was flat, and it retains settings with no problems, even when unplugged for longer periods of time (the computer boots just fine now, and it has never done that before). I'll check if it's still good once I'm home.
Posted April 16 (edited) (Author)

5 hours ago, Needfuldoer said: "Did you try completely removing and reinstalling the Nvidia driver package?"

I did now, and it's still dead. It still says "[Failed] Nvidia-Persistenced-Service" during startup. Trying to start the service after boot doesn't work and just points me towards "syslog", which I can't find.

I used the following command for removal:

    apt purge "*nvidia*"

And this one for reinstallation:

    apt install nvidia-driver firmware-misc-nonfree

Edited April 16 by Average Nerd
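On the missing "syslog": Debian 12 no longer installs rsyslog by default, so /var/log/syslog may simply not exist and the same messages live in the systemd journal instead. Below is a minimal sketch of the checks that usually narrow this kind of problem down. It assumes systemd and pciutils are present, and that the service unit is called nvidia-persistenced (an assumption based on the Debian package name). `have` and `gpu_diag` are hypothetical helper names, not part of any tool.

```shell
#!/bin/sh
# Sketch only: collect evidence for "driver installed but GPUs not seen".
# Assumes journalctl (systemd) and lspci (pciutils); the unit name is an assumption.

have() { command -v "$1" >/dev/null 2>&1; }

gpu_diag() {
    # Are the cards visible on the PCIe bus, and which kernel driver is bound?
    if have lspci; then
        lspci -nnk | grep -iE -A3 'vga|3d controller' || true
    fi
    # Is the nvidia kernel module loaded, or is nouveau in the way?
    if have lsmod; then
        lsmod | grep -E '^(nvidia|nouveau)' || true
    fi
    # Messages from the current boot for the failing service and the kernel driver.
    if have journalctl; then
        journalctl -b -u nvidia-persistenced --no-pager | tail -n 30 || true
        journalctl -b -k --no-pager | grep -iE 'nvidia|nvrm|nouveau' | tail -n 30 || true
    fi
    return 0
}

# Run as root (or with sudo) so the full journal is readable:
# gpu_diag
```

If lspci lists both cards but no "Kernel driver in use: nvidia" line appears, the problem is on the driver side rather than the hardware side, which would fit the cards still working elsewhere.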
Posted April 16 (Author)

3 hours ago, Alex Atkin UK said: "I'm curious how were supposed to enable coolbits on Wayland, as this will be an issue very soon."

According to the info section in the Settings, my system is on Wayland already. Might that have contributed?

3 hours ago, Alex Atkin UK said: "Someone claims this works without coolbits: https://github.com/nan0s7/nfancurve"

I don't really care about custom fan control. My original plan was to lower the power limits of the GPUs because they made the whole story of the house I'm in insufferably hot.

EDIT: The CMOS battery is fine, it's at 3.29 V.
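Since the goal here is a lower power limit rather than fan control: as far as I know, the driver's own power cap is set through nvidia-smi and does not depend on Coolbits or on X at all. A minimal sketch, assuming the driver is loaded and the card reports its power limits. `clamp_watts` and `set_power_cap` are hypothetical helpers, and 150 W is only an example value.

```shell
#!/bin/sh
# Sketch: set a power cap without Coolbits.
# clamp_watts is a hypothetical helper that keeps a requested value
# inside the range the card itself reports.

clamp_watts() {  # usage: clamp_watts REQUESTED MIN MAX
    awk -v r="$1" -v lo="$2" -v hi="$3" \
        'BEGIN { if (r + 0 < lo + 0) r = lo; if (r + 0 > hi + 0) r = hi; print r }'
}

set_power_cap() {  # usage: set_power_cap GPU_INDEX WATTS   (needs root)
    # Ask the card for its allowed range, e.g. "125.00, 215.00".
    limits=$(nvidia-smi -i "$1" \
        --query-gpu=power.min_limit,power.max_limit \
        --format=csv,noheader,nounits) || return 1
    min=${limits%%,*}
    max=${limits##*, }
    nvidia-smi -i "$1" -pl "$(clamp_watts "$2" "$min" "$max")"
}

# Example, run as root (150 W is arbitrary, not a recommendation):
# set_power_cap 0 150
```

`nvidia-smi -q -d POWER` prints the current, default, minimum and maximum limits if you prefer to pick the value by hand.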
Posted April 16

5 hours ago, Average Nerd said: "According to the info section in the Settings, my system is on Wayland already, so might that have contributed? I don't really care about custom fan control, my original plan was to lower the power limits of the GPUs because they made the whole story of the house I'm in insufferably hot. EDIT: The CMOS Battery is fine, it's at 3,29V."

You can change the clock speeds from the CLI, which kinda works better: rather than limiting the TDP, you reduce the clock speed until it's at a more efficient point on the voltage curve.

    # this is likely what nvidia-persistenced is already doing
    sudo nvidia-smi -pm 1
    # -i is the card number starting from 0, -lgc is min_clock,max_clock in MHz
    # 2400 is around optimal efficiency on 4000 series in my testing
    sudo nvidia-smi -i 0 -lgc 0,2400

You can then check power consumption by just issuing nvidia-smi without any parameters, or get custom output like:

    nvidia-smi --query-gpu=gpu_bus_id,timestamp,driver_version,pcie.link.gen.current,pcie.link.width.current,temperature.gpu,fan.speed,utilization.gpu,utilization.memory,memory.used,memory.free,clocks.current.graphics,clocks.current.sm,clocks.current.memory,power.draw --format=csv,noheader,nounits

I utilise this on my Folding@Home page.

It's not as good as on Windows, where you can actually change the voltage curve (OC Scanner in Afterburner), but useful nonetheless.
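To make the cap-and-measure loop a bit easier to read, here is a small sketch built around the commands above. `format_gpu_line` and `show_gpus` are hypothetical helper names, and the query fields are a subset of the ones in the post. I believe `-lgc` is only supported on Volta and newer GPUs, so it may be refused on a Pascal card such as the GTX 1060; treat that as an assumption and check `nvidia-smi -h` for your driver.

```shell
#!/bin/sh
# Sketch: turn nvidia-smi CSV output into one readable line per GPU.

format_gpu_line() {
    # Input lines look like "0, 1500, 120.35" (index, graphics clock, power draw).
    awk -F', *' '{ printf "GPU %s: %s MHz, %s W\n", $1, $2, $3 }'
}

show_gpus() {
    nvidia-smi --query-gpu=index,clocks.current.graphics,power.draw \
               --format=csv,noheader,nounits | format_gpu_line
}

# Typical session (1500 is just an example cap, not a recommendation):
# sudo nvidia-smi -i 0 -lgc 0,1500   # cap GPU 0
# show_gpus                          # check clock and power draw
# sudo nvidia-smi -i 0 -rgc          # reset the clock cap
```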
Posted April 16 (edited) (Author)

9 minutes ago, Alex Atkin UK said: "You can change the clock speeds from the CLI which kinda works better as rather than limiting the TDP you reduce the clock speed until its at a more efficient point on the voltage curve. You can then check power consumption by just issuing nvidia-smi without any parameters, or custom output like: nvidia-smi --query-gpu=gpu_bus_id,timestamp,driver_version,pcie.link.gen.current,pcie.link.width.current,temperature.gpu,fan.speed,utilization.gpu,utilization.memory,memory.used,memory.free,clocks.current.graphics,clocks.current.sm,clocks.current.memory,power.draw I utilise this on my Folding@Home page. Its not as good as on Windows where you can actually change the voltage curve (OC Scanner in Afterburner), but useful none the less."

Okay, that's good to know for when the machine works again, thanks!
(Question: how do I determine a good clock speed? Are there some resources, or is it just trial and error?)

Edited April 16 by Average Nerd
Posted April 16

1 hour ago, Average Nerd said: "Okay, that's good to know for when the machine works again, Thanks! (Question: how do I determine a good clockspeed, are there some resources or is it just trial and error?)"

Trial and error.

On Windows I was able to view the voltage curve and see where the voltage starts to really climb; it also shows where on the curve the card is currently running. On Linux I just kept tweaking it based on watts vs PPD until it hit somewhere I was happy with. Of course, it's best to do this with a higher-scoring WU.

The setting is lost between reboots, and as it needs to run as root, I didn't automate it. I should probably get round to adding nvidia-smi to the sudoers file so I can do that.
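The watts-vs-PPD comparison can be kept honest with a few lines of shell. This is a sketch under stated assumptions: you note the clock cap, the PPD that F@H reports, and the power draw from nvidia-smi for each trial, one trial per line. `best_clock` is a hypothetical helper, and the numbers in the example are made up purely to show the format.

```shell
#!/bin/sh
# Sketch: pick the trial with the best PPD per watt.
# Input: one line per trial, "CLOCK_MHZ PPD WATTS", whitespace separated.

best_clock() {
    awk '$3 > 0 {
             e = $2 / $3                      # efficiency in PPD per watt
             if (e > best) { best = e; clock = $1 }
         }
         END { printf "%s MHz (%.0f PPD/W)\n", clock, best }'
}

# Example with invented numbers (not measurements):
# printf '1500 1500000 120\n1800 1900000 170\n2000 2000000 215\n' | best_clock
```

Comparing trials on the same work unit, or at least the same project, matters here, because PPD varies a lot between WUs.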
Posted April 17 (Author)

On 4/16/2024 at 1:22 PM, Needfuldoer said: "Did you try completely removing and reinstalling the Nvidia driver package?"

I have completely reinstalled the OS, and the issue persists. I don't get it: both of the GPUs used to work in Linux, and they still work just fine in Windows (tested both with FurMark).