
Folding@home GPU Mining Rig - 18 Tesla K20Ms - a little help needed.

DF_CA

>>> ISSUE: WILL NOT POST WITH MORE THAN TWO NVIDIA TESLA K20Ms ON THE ASUS B250 MINING EXPERT MOBO <<<

I saw the LTT FAH video last year and immediately jumped on the bandwagon. I set up the classic 18-slot PCI-E riser rig with the ASUS B250 MINING EXPERT MOBO.

Please refrain from sniping at the K20s; I paid NOTHING for them. That would be off-topic, unproductive and a rabbit hole. Besides, I can't buy much in the current climate. My goal is to get this running solidly and rotate in "end of life" mining GPUs from the free market whenever those transitions occur.

Here's the issue:

In short, I got it up and running flawlessly with two K20s. I can move the two K20s to any slots and it continues to work just fine. (So MOBO, risers, power, cooling: all validated.)

However, when I add a third K20 it will not POST at all: no screen, no beeps. (Remove K20 #3 and it boots again; reinstall #3 and it fails. Swapping the third card with the second and repeating gives the same result, so it is not a bad third K20.) The ASUS Q-Code shown in three-card mode is 0x96, which decodes to "PCI Bus Assign Resources".

My thinking is that PCI resource allocation comes down to memory, I/O, interrupts or power. There is plenty of power. Memory is set to >4G and two cards work, so that does not seem to be the issue. I/O and interrupts should never be an issue with PCI-E.

On a lark I grabbed four crappy, even more ancient NVS 300 GPUs. (My possibly flawed logic here was to claim four boards' worth of resources OTHER than memory. The NVS cards have 512MB each, whereas the K20s have 5GB.) Further, these NVS cards are pre-UEFI, so that eliminates the UEFI/CSM BIOS question, I believe. The four NVS cards POSTed, booted and worked just fine. This should indicate that, PCI-E resource-wise, it is only a MEMORY issue. Yes/No?
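
One detail worth separating out: what the BIOS has to hand out at POST is address space for each card's BARs (its MMIO apertures), not its VRAM, so 512MB vs 5GB of onboard memory is not directly the number that matters. Each BAR must also start at an address aligned to its own size, so the firmware has to pack all of them into a limited window. The Python sketch below models that packing and reports how many identical cards fit. The per-card BAR sizes and the window base/size are purely hypothetical illustration values, not measured from a K20m or read from the B250 BIOS, and the model deliberately ignores bridge-window granularity, the split between 32-bit and 64-bit windows, and I/O space.

```python
MiB = 1 << 20

def align_up(addr: int, align: int) -> int:
    """Round addr up to the next multiple of align (BARs are naturally aligned)."""
    return (addr + align - 1) // align * align

def pack_bars(bar_sizes, window_base, window_size):
    """Place each BAR at the next address aligned to its own size.

    Returns a list of (size, address) placements, or None if the window overflows.
    """
    window_end = window_base + window_size
    cursor = window_base
    placements = []
    # Largest first: with power-of-two sizes this avoids alignment gaps.
    for size in sorted(bar_sizes, reverse=True):
        addr = align_up(cursor, size)
        if addr + size > window_end:
            return None
        placements.append((size, addr))
        cursor = addr + size
    return placements

def max_cards(per_card_bars, window_base, window_size, slots=19):
    """Largest number of identical cards whose BARs all fit in the window."""
    fitted = 0
    for n in range(1, slots + 1):
        if pack_bars(per_card_bars * n, window_base, window_size) is None:
            break
        fitted = n
    return fitted

# HYPOTHETICAL numbers for illustration only (not K20m or B250 values).
CARD_BARS = [16 * MiB, 256 * MiB, 32 * MiB]
WINDOW_BASE = 3328 * MiB  # a window starting at 3.25 GiB ...
WINDOW_SIZE = 768 * MiB   # ... and ending at the 4 GiB boundary

if __name__ == "__main__":
    print("cards that fit:", max_cards(CARD_BARS, WINDOW_BASE, WINDOW_SIZE))
```

With these made-up numbers two cards fit and a third overflows the window. Plugging in the real BAR sizes from a working two-card boot would show whether the same arithmetic applies to this board.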

 

Anyway, I feel I am either very close or this is hopeless/impossible. Or an option ROM (OPROM)/VBIOS mod is needed. It chugs along happily with FAH and two K20s, but if I can get it fully running, it's a good thing for the cause.

Folding@home's progress is just stunning, BTW. When CureCoin kicked in as crypto declined, the numbers went stratospheric. Now, of course, with crypto returning to profitability, all those rigs have left the "fold". LTT is topping the charts! I overheard a researcher say that with that kind of modelling/compute power plus AI, it is conceivable to let an algorithm "trial and error" its way to high-probability vectors that can be fed back to the scientists for analysis and augmentation. It's a new world. And of course they jumped on COVID and saw incredible results there too, and on Parkinson's.

I digress. Any constructive suggestions are appreciated.

Config:

No overclocking of any kind.

ASUS B250 MINING EXPERT MOBO with latest / final BIOS image (1208)

CELERON G3900 (I'm told it does not matter, as it is just an I/O and packet scheduler for FAH)

16GB SDRAM 2X8G (2333)

NVIDIA TESLA K20M (2 work; with 3 or more installed it will not POST; 18 slots available)

1TB SATA 6Gbps SSD - 10TB SAN available when / if needed

1200W CORSAIR HX1200 MAIN SUPPLY / 2-2400W 12V GPU supplies in reserve

18x PCI-E x1 risers with USB cables rated for PCI-E Gen3 speed (the Teslas are PCI-E Gen2; I tried all link speeds: Gen1, Gen2 and Gen3)

Three-tier aluminum rack, blowers, etc., all pedestrian and not important at the moment. They keep the GPUs at around 58-60 °C at 100% utilization. FAH does pin them at 100%.

USB Keyboard/Mouse and 1080P monitor (60hz) - irrelevant

WINDOWS 10 - Latest CUDA and NVIDIA drivers. All working as noted.

 

 


I vaguely recall one manufacturer or another removing multi-GPU capability entirely on some systems and limiting others to two GPUs. I don't know whether you're running into that problem or not. If that is the problem, you might do better with older parts that don't have that limitation. Of course, then you could run into OTHER problems. There are reasons things get updated.

Not a pro, not even very good.  I’m just old and have time currently.  Assuming I know a lot about computers can be a mistake.

 

Life is like a bowl of chocolates: there are all these little crinkly paper cups everywhere.


Thanks Bombastinator. I would expect that limit after the system POSTs and the OS and that manufacturer's driver load, but I'm not even getting a single BEEP when three are installed. PCI enumeration does not complete successfully. I did not know that the PCI enumeration part of the BIOS did anything but deal with one card at a time, sequentially and monolithically. Meaning card #3 does not know cards #1 and #2 even exist, what they are, or what resources they are using. The BIOS certainly knows what it has allocated from the resource pool. It may even know it has more than one card of the same type, but given this is a board built for running 18 GPUs, I'd hope it cares little about more than 2 of the same type. I've seen miners run a mix of NVIDIA and AMD to get to 18, and that was because the OS driver maxed out at, I think, 9 cards of any given type. That was fixed about a year ago.
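
For context on the "one card at a time" assumption: in the conventional PCI scheme, firmware first sizes every BAR on every device (write all 1s to the BAR, read it back, and decode the size from which address bits stuck) and only then assigns addresses, so it typically knows the combined request before it runs out of room. How a particular BIOS orders and packs those requests is implementation-specific. The Python sketch below shows just the sizing arithmetic for 32-bit and 64-bit memory BARs plus a "size everything first" pass; the read-back values in EXAMPLE_CARD are made-up examples, not captured from a K20m.

```python
MEM_BAR_FLAG_MASK = 0xF  # low 4 bits of a memory BAR hold type/prefetch flags

def bar32_size(readback: int) -> int:
    """Size of a 32-bit memory BAR from the value read after writing 0xFFFFFFFF."""
    base_bits = readback & ~MEM_BAR_FLAG_MASK & 0xFFFFFFFF
    if base_bits == 0:
        return 0  # BAR not implemented
    return ((~base_bits) & 0xFFFFFFFF) + 1

def bar64_size(readback_lo: int, readback_hi: int) -> int:
    """Size of a 64-bit memory BAR from its two read-back halves."""
    base_bits = (readback_hi << 32) | (readback_lo & ~MEM_BAR_FLAG_MASK & 0xFFFFFFFF)
    if base_bits == 0:
        return 0
    return ((~base_bits) & 0xFFFFFFFFFFFFFFFF) + 1

def total_request(cards):
    """Sizing pass: add up every BAR of every card before any address is assigned.

    Each card is a list of BARs; an int is a 32-bit BAR read-back,
    a (lo, hi) tuple is a 64-bit BAR read-back.
    """
    total = 0
    for bars in cards:
        for bar in bars:
            total += bar64_size(*bar) if isinstance(bar, tuple) else bar32_size(bar)
    return total

# HYPOTHETICAL read-back values: one 16 MiB 32-bit BAR and two 64-bit
# prefetchable BARs of 256 MiB and 32 MiB.
EXAMPLE_CARD = [0xFF000000, (0xF000000C, 0xFFFFFFFF), (0xFE00000C, 0xFFFFFFFF)]

if __name__ == "__main__":
    for n in (2, 3):
        print(n, "cards request", total_request([EXAMPLE_CARD] * n) >> 20, "MiB")
```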


Thanks ShrimpBrime, but I'm trying to get over the hump on this setup, not transition to something new.

I noted the 0x96 (PCI BUS ASSIGN RESOURCES) ASUS BIOS Q-Code. That is the real nut to crack here, I believe.
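
Since 0x96 points at resource assignment, it may help to see exactly what the two working cards are given. On Windows, Device Manager's Resources tab lists each card's memory ranges. Under a Linux live USB, `lspci -vv` prints each region with its size, and the same data is in `/sys/bus/pci/devices/*/resource`, where each line is "start end flags" in hex (BARs first, then the ROM and any bridge windows; unused entries are all zeros). The Python sketch below is a minimal reader for that file. It assumes booting Linux with the two-card configuration and filters on NVIDIA's PCI vendor ID 0x10de; the function names are illustrative, not part of any tool.

```python
from pathlib import Path

NVIDIA_VENDOR_ID = "0x10de"
SYSFS_PCI = Path("/sys/bus/pci/devices")

def parse_resource(text: str):
    """Return (index, start, size) for each populated line of a sysfs 'resource' file."""
    regions = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end, _flags = (int(f, 16) for f in fields)
        if start == 0 and end == 0:
            continue  # region not populated
        regions.append((index, start, end - start + 1))
    return regions

def nvidia_regions():
    """Yield (pci_address, regions) for every NVIDIA device the kernel sees."""
    for dev in sorted(SYSFS_PCI.iterdir()):
        if (dev / "vendor").read_text().strip() != NVIDIA_VENDOR_ID:
            continue
        yield dev.name, parse_resource((dev / "resource").read_text())

if __name__ == "__main__" and SYSFS_PCI.is_dir():
    for addr, regions in nvidia_regions():
        for index, start, size in regions:
            print(f"{addr} region {index}: start {start:#x}, {size / (1 << 20):.3f} MiB")
```

Summing the sizes for one card and multiplying by the number of cards gives a rough idea of how much address space the BIOS would need for three or more.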


16 hours ago, DF_CA said:

Thanks Bombastinator. I would expect that limit after the system posts and the OS and that specific manufacturers driver loads but I'm not even getting a single BEEP when three are installed. [...]

If it's built into the BIOS, though, it could hit before POST.


On 4/17/2021 at 1:02 PM, Bombastinator said:

If it’s built into the bios though it could hit before post.

Thanks, you make a good point. I tend to doubt it, as this MOBO is specifically designed for multi-GPU mining. At least that was my hope. If this IS the situation, I wonder if the BIOS can be modded to remove that limit.

I do not assume you know this; if you do, please do not take offense. With this particular MOBO, people were able to fill it with a full 19 GPUs and POST without issues. (Thus the attraction for folding.) BUT the GPU drivers for Linux and Win10 had maximum card counts, i.e. AMD would enable only 11 and NVIDIA only 8 (P106), if memory serves. So they were forced to do that split.

B250_config.JPG
