BAD_WORK_UNIT on fresh PopOS! install w/ OpenCL Drivers

Go to solution Solved by MSMSMSM,

Courtesy of @Metallus97

 

This post was initially about my install on Ubuntu 18.04 LTS. I've since switched to PopOS, and the post has been modified to reflect that.

 

Here's the configuration script that I've written to set up my Folding@Home instance:

# Step 0: Add prerequisite repositories (some installs have them disabled by default)
sudo apt-get install software-properties-common -y
sudo add-apt-repository universe -y
sudo add-apt-repository multiverse -y 

# Step 1: Download all the things we need
wget http://archive.ubuntu.com/ubuntu/pool/universe/p/python-support/python-support_1.0.15_all.deb
wget https://download.foldingathome.org/releases/public/release/fahclient/debian-stable-64bit/v7.5/fahclient_7.5.1_amd64.deb
wget https://download.foldingathome.org/releases/public/release/fahcontrol/debian-stable-64bit/v7.5/fahcontrol_7.5.1-1_all.deb
wget https://download.foldingathome.org/releases/public/release/fahviewer/debian-stable-64bit/v7.5/fahviewer_7.5.1_amd64.deb

# Step 2: Remove existing OpenCL libraries
sudo apt-get update && sudo apt-get upgrade -y
# Quote the glob so the shell hands it to apt-get unexpanded
sudo apt-get remove --purge ocl-icd-opencl-dev clinfo 'ocl-icd-*' opencl-headers -y
sudo apt autoremove -y

# This step is a bit too dangerous, let's not do this
# sudo rm /usr/lib/libOpenCL*

# Step 3: Install Dependencies
sudo apt-get install python-minimal python-gnome2 freeglut3 libnuma-dev -y
sudo dpkg -i python-support_1.0.15_all.deb

# Step 4: Install Folding@Home
sudo dpkg -i fahcontrol_7.5.1-1_all.deb fahviewer_7.5.1_amd64.deb fahclient_7.5.1_amd64.deb
sudo update-rc.d FAHClient defaults
sudo apt-get install -f

# Step 5: Install ROCm
# The trailing "-" tells apt-key to read the key from stdin
wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
echo "deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main" | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt-get update
sudo apt install rocm-dkms -y
sudo usermod -a -G video "$LOGNAME"
echo "ADD_EXTRA_GROUPS=1" | sudo tee -a /etc/adduser.conf
echo "EXTRA_GROUPS=video" | sudo tee -a /etc/adduser.conf
# Single quotes keep $PATH literal, so it is expanded at login rather than when this line is written
echo 'export PATH=$PATH:/opt/rocm/bin:/opt/rocm/profiler/bin:/opt/rocm/opencl/bin/x86_64' | sudo tee -a /etc/profile.d/rocm.sh

# Step 6: Ensure that ROCm libraries are accessible to Folding@Home
# -f replaces any link left over from an earlier install instead of failing
sudo ln -sf /opt/rocm/opencl/lib/x86_64/libOpenCL.so /usr/lib/libOpenCL.so
sudo ln -sf /opt/rocm/opencl/lib/x86_64/libOpenCL.so.1 /usr/lib/libOpenCL.so.1

# Step 7: Say goodbye
echo "Goodbye! :)"
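
For anyone repeating these steps, a quick sanity check after the script finishes might look like the sketch below. It is not part of the original script: the function names are made up for illustration, and it only checks the two things Steps 5 and 6 are meant to set up (the libOpenCL links and membership of the video group).

```shell
# Report whether each libOpenCL link in a directory resolves to a real file.
# Usage: check_opencl_links [dir]   (dir defaults to /usr/lib, as in Step 6)
# Hypothetical helper, not part of Folding@Home or ROCm.
check_opencl_links() {
    dir="${1:-/usr/lib}"
    bad=0
    for name in libOpenCL.so libOpenCL.so.1; do
        link="$dir/$name"
        if [ -e "$link" ]; then
            # -e follows symlinks, so a dangling link does not pass this test
            echo "ok: $link"
        elif [ -L "$link" ]; then
            echo "dangling: $link"
            bad=1
        else
            echo "missing: $link"
            bad=1
        fi
    done
    return "$bad"
}

# Succeed if the given user (default: the current user) is listed in the video group.
in_video_group() {
    id -nG "${1:-$(id -un)}" | tr ' ' '\n' | grep -qx video
}

# Example:
#   check_opencl_links /usr/lib
#   in_video_group "$LOGNAME" && echo "in video group"
```

Note that a group added with usermod only applies to sessions started after the change, so a re-login may still be needed even when the second check passes.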

Here's the /opt/rocm/opencl/bin/x86_64/clinfo output after running my script

Number of platforms:				 1
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 2.1 AMD-APP (3084.0)
  Platform Name:				 AMD Accelerated Parallel Processing
  Platform Vendor:				 Advanced Micro Devices, Inc.
  Platform Extensions:				 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 


  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 1
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Board name:					 Ellesmere [Radeon RX 470/480/570/570X/580/580X]
  Device Topology:				 PCI[ B#41, D#0, F#0 ]
  Max compute units:				 36
  Max work items dimensions:			 3
    Max work items[0]:				 1024
    Max work items[1]:				 1024
    Max work items[2]:				 1024
  Max work group size:				 256
  Preferred vector width char:			 4
  Preferred vector width short:			 2
  Preferred vector width int:			 1
  Preferred vector width long:			 1
  Preferred vector width float:			 1
  Preferred vector width double:		 1
  Native vector width char:			 4
  Native vector width short:			 2
  Native vector width int:			 1
  Native vector width long:			 1
  Native vector width float:			 1
  Native vector width double:			 1
  Max clock frequency:				 1306Mhz
  Address bits:					 64
  Max memory allocation:			 7301444403
  Image support:				 Yes
  Max number of images read arguments:		 128
  Max number of images write arguments:		 8
  Max image 2D width:				 16384
  Max image 2D height:				 16384
  Max image 3D width:				 2048
  Max image 3D height:				 2048
  Max image 3D depth:				 2048
  Max samplers within kernel:			 26591
  Max size of kernel argument:			 1024
  Alignment (bits) of base address:		 1024
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 No
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 Yes
  Cache type:					 Read/Write
  Cache line size:				 64
  Cache size:					 16384
  Global memory size:				 8589934592
  Constant buffer size:				 7301444403
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 65536
  Max pipe arguments:				 16
  Max pipe active reservations:			 16
  Max pipe packet size:				 3006477107
  Max global variable size:			 7301444403
  Max global variable preferred total size:	 8589934592
  Max read/write image args:			 64
  Max on device events:				 1024
  Queue on device max size:			 8388608
  Max on device queues:				 1
  Queue on device preferred size:		 262144
  SVM capabilities:				 
    Coarse grain buffer:			 Yes
    Fine grain buffer:				 Yes
    Fine grain system:				 No
    Atomics:					 No
  Preferred platform atomic alignment:		 0
  Preferred global atomic alignment:		 0
  Preferred local atomic alignment:		 0
  Kernel Preferred work group size multiple:	 64
  Error correction support:			 0
  Unified memory for Host and Device:		 0
  Profiling timer resolution:			 1
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 No
  Queue on Host properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Queue on Device properties:				 
    Out-of-Order:				 Yes
    Profiling :					 Yes
  Platform ID:					 0x7fc4e50c5d30
  Name:						 gfx803
  Vendor:					 Advanced Micro Devices, Inc.
  Device OpenCL C version:			 OpenCL C 2.0 
  Driver version:				 3084.0 (HSA1.1,LC)
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 1.2 
  Extensions:					 cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program 
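
Most of that dump is noise for troubleshooting. A small filter along these lines keeps only the board name, device name, driver version and device OpenCL version. This is a sketch: `clinfo_summary` is a made-up name, and it assumes the field labels match the output above, which other clinfo builds may not.

```shell
# Reduce clinfo output to a few device fields that matter for troubleshooting.
# Reads clinfo output on stdin and prints label=value lines.
# The labels are taken from the dump above (assumption: your clinfo uses the same ones).
clinfo_summary() {
    sed -n -E 's/^[[:space:]]*(Board name|Name|Driver version|Version):[[:space:]]*(.*[^[:space:]])[[:space:]]*$/\1=\2/p'
}

# Example:
#   /opt/rocm/opencl/bin/x86_64/clinfo | clinfo_summary
```

Lines such as "Platform Name" and "Platform Version" are skipped on purpose, since the match is anchored to the start of the label.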

And here is what sudo FAHClient has to say

05:50:19:WU01:FS01:0x22:Project: 14533 (Run 0, Clone 2369, Gen 0)
05:50:19:WU01:FS01:0x22:Unit: 0x0000000180fccb025e72f2048b14ac2f
05:50:19:WU01:FS01:0x22:Reading tar file core.xml
05:50:19:WU01:FS01:0x22:Reading tar file integrator.xml
05:50:19:WU01:FS01:0x22:Reading tar file state.xml
05:50:19:WU01:FS01:0x22:Reading tar file system.xml
05:50:19:WU01:FS01:0x22:Digital signatures verified
05:50:19:WU01:FS01:0x22:Folding@home GPU Core22 Folding@home Core
05:50:19:WU01:FS01:0x22:Version 0.0.2
05:50:40:WU01:FS01:0x22:Caught signal SIGABRT(6) on PID 4257
05:50:40:WU01:FS01:0x22:WARNING:Unexpected exit from science code
05:50:40:WU01:FS01:0x22:Saving result file ../logfile_01.txt
05:50:40:WU01:FS01:0x22:Saving result file science.log
05:50:40:WU01:FS01:0x22:Folding@home Core Shutdown: BAD_WORK_UNIT
05:54:00:WU01:FS01:0x22:Caught signal SIGINT(2) on PID 4257
05:54:00:WU01:FS01:0x22:Exiting, please wait. . .
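
To see whether every unit is failing the same way, one option is to tally the core shutdown statuses in the client log. A sketch, assuming the log lines look like the ones above; the function name is made up and the log path in the example is an assumption (FAHClient run by hand usually writes log.txt into its working directory).

```shell
# Print how often each "Core Shutdown" status appears in a FAHClient log.
# Usage: core_shutdown_summary /path/to/log.txt
# Hypothetical helper; it only pattern-matches lines like the ones shown above.
core_shutdown_summary() {
    sed -n 's/.*Folding@home Core Shutdown: \([A-Z_]*\).*/\1/p' "$1" |
        sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Example (path is an assumption):
#   core_shutdown_summary ./log.txt
```

If the only status that ever shows up is BAD_WORK_UNIT, across different projects, that points at the local setup rather than at a single bad unit.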

EDIT: Switched from ROCm to bleeding edge OpenCL drivers

 


I'm not a Linux expert so I can't comment on that, but there are a lot of new users for the COVID-19 projects. Regardless of whether you're contributing to those or not, I think you'd still be affected. A lot of people are reporting F@H not working at all, regardless of their project selection.



Not getting a work unit will be due to the number of people folding exceeding the limitations of the current system (they are working on adding more capacity). I don't know anything about Linux, but I have gotten some bad work units in the past. The client will start a WU, sometimes manage to get a few % in, then stop and go back to downloading. Probably best to let it run for a while and see if the next work unit does the same thing or if it folds correctly.

Intel 7600K Over Clocked to 4.8 jibahurtz, GTX 1080 Founders Edition space heater, Cooler Master 212 Evo jet engine, 8GB DDR4 Ballistic Ram, 250GB hyperX ssd, Fractel Design Define S, Blue snowball on an arm mount, mismatched god monitors, dual keyboards, LG mouse (no vital signs, Think it died.)


@unholyprfection I've been assigned multiple unique work units and all of them have been discarded as bad work units. I'm starting to think it's a configuration problem.


4 hours ago, SpicyMustard said:

@unholyprfection I've been assigned multiple unique work units and all of them have been discarded as bad work units. I'm starting to think it's a configuration problem.

Yeah, if they are all failing then you are probably correct. Unfortunately I don't know enough about Linux to offer any solutions, but a number of other people on the forum run Linux for folding. Hopefully one of them has seen the issue before and will have an idea of what could fix it.


2 hours ago, SpicyMustard said:

@Whaler_99 @Metallus97 @NodCommander @Gorgon, could you please take a look? Any clue? Got AMD drivers to behave, but also - not actually do anything

YESS!

 

So I basically have the same problem on Ubuntu 19.10 and everything else Debian-based. There are reports of it working on Ubuntu 18.04.1, but I never got it to work, neither with the AMD drivers nor with ROCm. Well, ROCm installed OK, but the OpenCL implementation in the official and AMD drivers is kind of broken...

AMD only does releases for Ubuntu LTS versions, so I hope that 20.04 support is coming soon. Then I'll give it another try.

Some further reading:

https://github.com/RadeonOpenCompute/ROCm/issues/575

 

Maybe @Windows7ge can help here. He gave some nice tips in the mentioned thread, which helped me but didn't fix the problem. The drivers installed, but OpenCL is still broken, leading to the exact same errors you got there. OpenMM won't run without OpenCL.

 

FOLDING MONTH 2021! GOGOGO and save on some heating costs 🙂

 


@Metallus97, since the problem seems to be AMD's OpenCL implementation, here's the cleanup script I wrote

 

# Step 1: Cleanup ROCm drivers
sudo rm /etc/profile.d/rocm.sh /etc/apt/sources.list.d/rocm.list
sudo apt remove --purge rocm-dkms -y
sudo rm /usr/lib/libOpenCL.so*
sudo apt autoremove -y
sudo rm -rf /opt/rocm-3.1.0
sudo apt update && sudo apt upgrade -y && sudo apt autoremove -y && sudo apt autoclean

# Step 2: Install bleeding edge open source drivers
sudo add-apt-repository ppa:oibaf/graphics-drivers -y
sudo apt update && sudo apt upgrade -y
sudo apt install --reinstall xserver-xorg-video-amdgpu -y
sudo dpkg --configure -a
sudo dpkg-reconfigure gdm3 xserver-xorg-video-amdgpu
# Quote the glob so the shell hands it to apt-get unexpanded
sudo apt-get install mesa-vdpau-drivers ocl-icd-opencl-dev opencl-headers 'ocl-icd-*' clinfo -y

# Step 3: Link them for Folding@Home
sudo ln -s /usr/lib/x86_64-linux-gnu/libOpenCL.so /usr/lib/libOpenCL.so
sudo ln -s /usr/lib/x86_64-linux-gnu/libOpenCL.so.1 /usr/lib/libOpenCL.so.1
sudo ln -s /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0 /usr/lib/libOpenCL.so.1.0.0
sudo ln -s /usr/lib/x86_64-linux-gnu/libOpenGL.so /usr/lib/libOpenGL.so
sudo ln -s /usr/lib/x86_64-linux-gnu/libOpenGL.so.0 /usr/lib/libOpenGL.so.0
sudo ln -s /usr/lib/x86_64-linux-gnu/libOpenGL.so.0.0.0 /usr/lib/libOpenGL.so.0.0.0

# Step 4: Say Goodbye
echo "Goodbye! :)"

Here's what I got

12:25:30:******************************* System ********************************
12:25:30:        CPU: AMD Ryzen 7 1800X Eight-Core Processor
12:25:30:     CPU ID: AuthenticAMD Family 23 Model 1 Stepping 1
12:25:30:       CPUs: 16
12:25:30:     Memory: 31.37GiB
12:25:30:Free Memory: 29.44GiB
12:25:30:    Threads: POSIX_THREADS
12:25:30: OS Version: 5.3
12:25:30:Has Battery: false
12:25:30: On Battery: false
12:25:30: UTC Offset: 5
12:25:30:        PID: 3125
12:25:30:        CWD: /home/username
12:25:30:         OS: Linux 5.3.0-7642-generic x86_64
12:25:30:    OS Arch: AMD64
12:25:30:       GPUs: 1
12:25:30:      GPU 0: Bus:41 Slot:0 Func:0 AMD:5 Ellesmere XT [Radeon RX 470/480/570/580]
12:25:30:       CUDA: Not detected: Failed to open dynamic library 'libcuda.so':
12:25:30:             libcuda.so: cannot open shared object file: No such file or
12:25:30:             directory
12:25:30:     OpenCL: Not detected: clGetPlatformIDs() returned -1001
12:25:30:***********************************************************************

:(
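
For context, -1001 is CL_PLATFORM_NOT_FOUND_KHR: the ICD loader (libOpenCL.so) itself loaded, but it found no vendor OpenCL implementation registered. On Linux the loader normally discovers implementations through .icd files in /etc/OpenCL/vendors, so listing that directory is a reasonable first check. The sketch below does that; the function name is made up, and the default directory is the usual location rather than something confirmed on this install.

```shell
# List the vendor ICD files the OpenCL loader would read, and the library each one names.
# Usage: list_opencl_icds [dir]   (dir defaults to /etc/OpenCL/vendors)
# Hypothetical helper; returns 1 when no .icd file is present.
list_opencl_icds() {
    dir="${1:-/etc/OpenCL/vendors}"
    found=0
    for icd in "$dir"/*.icd; do
        [ -f "$icd" ] || continue   # the glob matched nothing
        found=1
        # The first line of an .icd file is the vendor library name or path
        echo "$(basename "$icd"): $(head -n 1 "$icd")"
    done
    if [ "$found" -eq 0 ]; then
        echo "no .icd files in $dir"
        return 1
    fi
}

# Example:
#   list_opencl_icds
```

If the directory turns out to be empty, the loader packages from Step 2 are installed without any implementation behind them. On Ubuntu-based systems Mesa's implementation is packaged separately (mesa-opencl-icd), which the script above does not install explicitly. Whether adding it is enough for Folding@Home on this card is a guess, not something tested in this thread.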

 


I personally never got the ROCm drivers working. BOINC wasn't having it.

 

I got the AMD drivers w/ OpenCL working on both Ubuntu 18.04.3 & 19.04 for BOINC:

 

[Attached image: opencl.thumb.png]

 

It was a PITA though. I believe it should be possible on PopOS, but I wouldn't gamble on installing any AMD drivers unless you're sure you don't have anything to lose on your install. While testing driver installs I have bricked my install multiple times.

If this isn't a daily driver of yours where you want PopOS, I can tell you 18.04.3 will work with the current 18.04.3 AMD driver if you downgrade the kernel to 4.15.0-88-generic, as the latest kernel is currently incompatible with the driver.

 

I'm going to work on testing ROCm once more in something like Ubuntu Server in a VM with GPU pass-through. See where it takes me.


8 hours ago, SpicyMustard said:

@Whaler_99 @Metallus97 @NodCommander @Gorgon, could you please take a look? Any clue? Got AMD drivers to behave, but also - not actually do anything

Sorry - work is actually really busy for me with this COVID thing so I'm not actually folding now as I'm too busy to babysit my rigs with all the server-side issues.

 

I also don't have an AMD GPU but might be buying a 5700XT with all the overtime I'm working.



10 hours ago, Windows7ge said:

If this isn't a daily driver of yours where you want PopOS, I can tell you 18.04.3 will work with the current 18.04.3 AMD driver if you downgrade the kernel to 4.15.0-88-generic, as the latest kernel is currently incompatible with the driver.

 

Wellll then... how to downgrade a kernel ?:D

I am tempted to just hold tight until 20.04, install the AMD driver then, and see how that goes.


35 minutes ago, Metallus97 said:

Wellll then... how to downgrade a kernel ?:D

I am tempted to just hold tight until 20.04, install the AMD driver then, and see how that goes.

Programs like UKUU. I haven't figured out yet how to make the system actually boot the older kernel, though, because installing it doesn't seem to replace the running kernel after a reboot.
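
That behaviour is what you would expect on stock Ubuntu with GRUB: installing an older kernel adds a menu entry, but the default entry stays on the newest kernel. One common way around it is to point GRUB_DEFAULT in /etc/default/grub at the older entry. The sketch below only builds the value and checks the running kernel; the helper names are made up, the menu titles are an assumption based on a default Ubuntu install (check /boot/grub/grub.cfg for the exact strings), and none of it applies to systems that boot without GRUB, which includes Pop!_OS on UEFI.

```shell
# Build the GRUB_DEFAULT value that selects a specific kernel from the
# "Advanced options" submenu. The titles are an assumption (stock Ubuntu).
grub_default_for() {
    echo "Advanced options for Ubuntu>Ubuntu, with Linux $1"
}

# Succeed if the running kernel is the wanted one.
running_kernel_is() {
    [ "$(uname -r)" = "$1" ]
}

# Example (left commented out because it edits the boot configuration):
#   sudo sed -i "s|^GRUB_DEFAULT=.*|GRUB_DEFAULT=\"$(grub_default_for 4.15.0-88-generic)\"|" /etc/default/grub
#   sudo update-grub
#   # after a reboot:
#   running_kernel_is 4.15.0-88-generic && echo "booted the older kernel"
```

Keeping the newer kernel installed leaves a fallback entry in the menu if the older one fails to boot.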


12 hours ago, Windows7ge said:

If this isn't a daily driver of yours where you want PopOS, I can tell you 18.04.3 will work with the current 18.04.3 AMD driver if you downgrade the kernel to 4.15.0-88-generic, as the latest kernel is currently incompatible with the driver.

 

Would this sidestep the issue that @Metallus97 had highlighted (https://github.com/RadeonOpenCompute/ROCm/issues/575)?

I'm willing to give it a second shot, but I've spent more time fixing this than actually giving CPU time to FAH, so I've gone back to my macOS install (Hackintosh Ryzen is surprisingly good). It's just that Apple's OpenCL drivers don't behave with OpenMM, so it's just CPU for me from here on...


9 hours ago, SpicyMustard said:

Would this sidestep the issue that @Metallus97 had highlighted (https://github.com/RadeonOpenCompute/ROCm/issues/575)

 

I cannot confirm that it will be the workaround you need, as I've never folded before, but for BOINC it was the fix I required when ROCm didn't work. There's a chance it'll be the fix for folding as well.


GUIIIIIZ I GOT IT! Manjaro + amdgpu drivers!

Will write more when I have time later today!
