
The Pure Solid State Server Build Log

43 minutes ago, Windows7ge said:

It would be stranded based on its flexibility. At my house here though this install won't be in the wall or include a patch panel. It just drapes down the wall leading up to a drop ceiling which next to it is the effective network closet where there is no patch panel and it just plugs directly into the 10Gbit switch. Not professional at all. The Cat5e cable will be solid core though if that means anything to you. So reliability here isn't a concern as the wires aren't going to be constantly unplugged and re-plugged or moved around very much. I'm pretty sure if I wanted solid core I would have had to buy it in bulk and terminate it myself. UTP Cat5e is easy enough but STP Cat6a would be a bit challenging to do right so I just bought a pre-made cable.

OK, just thought I'd say, in case you were running it through a wall to a faceplate or similar. I'm actually thinking about building/getting a server myself at some point if I happen to spot a good deal... Planning to maybe run network cables (Cat5e solid core) through the house at some point, though.

Please quote or tag @Ben17 if you want to see a reply.

If I don't reply, it's probably because I'm in a different time zone or haven't seen your message yet, but I will reply when I see it.

 


3 hours ago, Ben17 said:

OK, just thought I'd say, in case you were running it through a wall to a faceplate or similar. I'm actually thinking about building/getting a server myself at some point if I happen to spot a good deal... Planning to maybe run network cables (Cat5e solid core) through the house at some point, though.

Some day in the future I'd like to do that in my own home, but that's not the case at my current residence. Outside the wall is my only option. I've run wires in walls before and terminated wall jacks, and I've only ever used solid core for that, but I don't have shielded Cat6a keystone jacks and for this short ~16ft run it's not worth it. Especially since the whole setup is quite temporary. It isn't intended to be a permanent installation.


I got the cables run yesterday and got it online. I have to start looking into drivers, as it seems I'm not immediately able to use the X540 ports. In fact, while fiddling around with that, this happened.

[Image: Screenshot_1.png]

Yay. Not even up and running yet and I already have my first BSOD. When I got back in I saw my 10Gbit NICs had disappeared from Network Connections. I found them in Device Manager and it said it disabled them because something went wrong (error 43). So all of that is on the to-do list so I can give you guys some performance metrics.

 

I do however have the IPMI up and running so I can start doing a little screen capture instead of pictures with my phone (where applicable anyways).

 

With the IPMI up and running I was able to get a view of the system temps and everything appeared acceptable except for:

[Image: Screenshot_2.png]

Oof, I saw it peak at 93°C. MB_10G is the motherboard's built-in 10Gbit NIC, which is physically right underneath that aluminum heatsink under the SAS/SATA HBA (left of the RAM stick near the back). It's a tiny heatsink, so obviously it can't passively cool itself very well.

[Image: IMAG0423.thumb.jpg]

@leadeater In your opinion how hot is too hot for a NIC like this?

 

The 140mm fan I have mounted there gets quite suffocated by the lid when I put it on causing insufficient airflow down onto the board.

 

This puts me in a bind as there isn't really any good place to put another fan.

[Image: IMAG0424.thumb.jpg]

There is a gap in-between the two sockets where a fan could just sit but I don't know if it would move enough air in that direction.

 

Suggestions?


18 minutes ago, Windows7ge said:

@leadeater In your opinion how hot is too hot for a NIC like this?

Argh, was going to say I'd check mine but I gave my X540s away. Had them working fine though under Windows and ESXi. Technically, below thermal max is 'fine', but I like to keep things below 80°C if possible.
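For anyone who wants to apply that rule of thumb to a dump of sensor readings, here is a minimal sketch. It assumes pipe-separated lines in the style that `ipmitool sensor` prints; the sample rows are hypothetical apart from the 93°C MB_10G peak reported above, and `hot_sensors` is just an illustrative helper name.

```python
# Hypothetical sample in the pipe-separated layout that `ipmitool sensor` prints.
# Only the MB_10G peak (93 C) comes from the thread; the other rows are made up.
SAMPLE = """\
CPU1 Temp        | 52.000     | degrees C  | ok
CPU2 Temp        | 49.000     | degrees C  | ok
MB_10G           | 93.000     | degrees C  | ok
FAN1             | 1200.000   | RPM        | ok
"""

COMFORT_LIMIT_C = 80.0  # the "keep things below 80" rule of thumb, not a datasheet limit


def hot_sensors(text, limit=COMFORT_LIMIT_C):
    """Return (name, reading) pairs for temperature sensors above `limit`."""
    hot = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 3 or fields[2] != "degrees C":
            continue  # skip fans, voltages, and malformed lines
        try:
            reading = float(fields[1])
        except ValueError:
            continue  # e.g. "na" for a sensor with no reading
        if reading > limit:
            hot.append((fields[0], reading))
    return hot


if __name__ == "__main__":
    for name, reading in hot_sensors(SAMPLE):
        print(f"{name}: {reading:.0f} C is above {COMFORT_LIMIT_C:.0f} C")
```

The limit is a comfort margin rather than the chip's rated thermal maximum, so treat anything it flags as "worth more airflow", not "about to fail".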


20 minutes ago, leadeater said:

Argh, was going to say I'd check mine but I gave my X540s away. Had them working fine though under Windows and ESXi. Technically, below thermal max is 'fine', but I like to keep things below 80°C if possible.

That's what I thought. Digging through my fan collection, these are the most likely candidates to do something:

[Image: IMAG0425.thumb.jpg]

The little guy in the middle is a 60x60x10mm Delta fan. I have no idea where I got it, but it moves some serious air for its size. I'll probably rig it up and see if it helps. I also have some basic Corsair SP fans. They're loud at 12V but move a lot of air. I'll have to test them. They might overcome the ~15mm air gap I'm working with.


So I grabbed an old Corsair 120mm SP fan and tested that out with the cover on the server and it did A LOT better than the Noctua 140mm. I needed a way to secure it though. I figured simple problems require simple solutions.

 

I own a 3D printer for the exact use of making useful tools & adapters for my projects. So I spent 5~10mins designing these.

[Image: Screenshot_3.png]

 

Printing them took 20mins.

[Image: IMAG0427.thumb.jpg]

Cleaned them up. Tweaked the hole size a bit with an art knife.

 

Attached them to the fan.

[Image: IMAG0429.thumb.jpg]

 

Then mounted the fan back in place. You can see here how it actually connects.

[Image: IMAG0430.thumb.jpg]

 

The final result of the labor.

[Image: Screenshot_4.png]

A decrease of 40°C, @leadeater. I can say this will also benefit the HBAs greatly.

 

Next up is diagnosing the issues going on in Windows Server with the X540. It used to give me error 43. Now it's giving me something different.

[Image: Screenshot_5.png]

...not enough free resources...64 threads & 128GB of RAM isn't enough free resources?

 

Hopefully installing the driver is the solution. Wish me luck.


17 minutes ago, Windows7ge said:

Hopefully installing the driver is the solution. Wish me luck.

Usually it's drivers, or the number of hardware devices and their required reserved address space has exceeded the default maximum allowed. You can increase the amount of address space that hardware devices can use.

 

https://appuals.com/fix-this-device-cannot-find-enough-free-resources-that-it-can-use-code-12-error-on-windows-7-8-and-10/

 

Try fixing it with drivers etc. first, then the BIOS tweak if it has it (servers should always have it), then lastly the reg tweak.


1 hour ago, leadeater said:

Usually it's drivers, or the number of hardware devices and their required reserved address space has exceeded the default maximum allowed. You can increase the amount of address space that hardware devices can use.

 

https://appuals.com/fix-this-device-cannot-find-enough-free-resources-that-it-can-use-code-12-error-on-windows-7-8-and-10/

 

Try fixing it with drivers etc. first, then the BIOS tweak if it has it (servers should always have it), then lastly the reg tweak.

Well, I tried installing the driver. Didn't help.

Tried enabling Above 4G Decoding (could not locate Top Of Lower Usable DRAM). Didn't help.

Tried the regedit hack. Didn't help.

 

The system keeps disabling the interfaces, I keep seeing the error "Error getting the adapter info", Device Manager keeps going Not Responding, and attempts to restart the server result in hangs. I'm uncertain if the issue has to do with lack of resources since I've only seen that error once. Every error since then has been "Windows has stopped this device because it has reported problems. (Code 43)". Doesn't say what the problem actually is. Defective NIC? Do I have to RMA the motherboard?


3 minutes ago, Windows7ge said:

The system keeps disabling the interfaces, I keep seeing the error "Error getting the adapter info", Device Manager keeps going Not Responding, and attempts to restart the server result in hangs. I'm uncertain if the issue has to do with lack of resources since I've only seen that error once. Every error since then has been "Windows has stopped this device because it has reported problems. (Code 43)". Doesn't say what the problem actually is. Defective NIC? Do I have to RMA the motherboard?

I had almost the exact same thing recently with a Huawei server, and never got it sorted out since it's a loan server. I just ended up disabling unused NICs and other hardware that I didn't need, and then the 10G NICs I needed worked.

Could be anything: BIOS firmware, UEFI vs legacy BIOS, NIC firmware, an OS version bug. Update every firmware you possibly can, and try a different version of Windows just to rule the OS out.


56 minutes ago, leadeater said:

I had almost the exact same thing recently with a Huawei server, and never got it sorted out since it's a loan server. I just ended up disabling unused NICs and other hardware that I didn't need, and then the 10G NICs I needed worked.

Could be anything: BIOS firmware, UEFI vs legacy BIOS, NIC firmware, an OS version bug. Update every firmware you possibly can, and try a different version of Windows just to rule the OS out.

This is one of the reasons I didn't want to put Windows on the server to begin with: dealing with this kind of crap. It's possible, though, that the NIC is faulty, meaning Windows is fine. Tomorrow I'll load up Linux on a thumb drive and set up the interfaces. If they work as desired, we'll know it's just some giant software conflict in Windows. If they don't, I may have to RMA the motherboard.


1 minute ago, Windows7ge said:

This is one of the reasons I didn't want to put Windows on the server to begin with: dealing with this kind of crap. It's possible, though, that the NIC is faulty, meaning Windows is fine. Tomorrow I'll load up Linux on a thumb drive and set up the interfaces. If they work as desired, we'll know it's just some giant software conflict in Windows. If they don't, I may have to RMA the motherboard.

I've never had issues like this on HPE/IBM/Lenovo/Dell servers though, running Windows with 2x dual 10G/25G NICs and multiple RAID cards and HBAs. That Huawei server was actually the first where I've had problems like that.


2 minutes ago, leadeater said:

I've never had issues like this on HPE/IBM/Lenovo/Dell servers though, running Windows with 2x dual 10G/25G NICs and multiple RAID cards and HBAs. That Huawei server was actually the first where I've had problems like that.

This is why I'm fearing defective NIC.

 

This is a server grade Supermicro board (reputable)

Using an Intel X540 (reputable brand & NIC in general)

Using an up-to-date version of Windows Server (reputable/stable)

Brand new hardware & fresh OS install.

 

There is virtually nothing that could go wrong here except failed hardware, but we'll see. I'll test it tomorrow.


4 minutes ago, Windows7ge said:

This is a server grade Supermicro board (reputable)

Supermicro is known for having some buggy firmware though. This has more to do with just how many different motherboards and revisions they make compared to the much smaller, more controlled set someone like HPE has, and a lot of HPE's firmware is common across models as well.


13 hours ago, Windows7ge said:

Well, I tried installing the driver. Didn't help.

Tried enabling Above 4G Decoding (could not locate Top Of Lower Usable DRAM). Didn't help.

Tried the regedit hack. Didn't help.

 

The system keeps disabling the interfaces, I keep seeing the error "Error getting the adapter info", Device Manager keeps going Not Responding, and attempts to restart the server result in hangs. I'm uncertain if the issue has to do with lack of resources since I've only seen that error once. Every error since then has been "Windows has stopped this device because it has reported problems. (Code 43)". Doesn't say what the problem actually is. Defective NIC? Do I have to RMA the motherboard?

Hope you get it solved and working properly. Let us know how it goes. Hoping for the best...


On 3/21/2019 at 1:44 PM, Windows7ge said:

That's a lot of threads.

[Image: IMAG0421.thumb.jpg]

With the server apparently stable I tried to run a stress test with AIDA64, but AIDA64 couldn't handle it and kept going (Not Responding). I'll have to find a different tool.

I think Prime95 can handle a bunch of threads, but the GUI gets nasty cluttered if you don't have it merge all the workers into a single output window. I saw a video where Linus fired it up on a CPU with 16 or more threads; you've got a few more than that, but it should scale. It will, however, be an unrealistically intense workload. I typically call a system stable after 24hrs, but lately I've been dedicating spare time to the GIMPS project and using that longer-term, lower-intensity work to suss out stability issues. It works great for that, and you can help make math better at the same time.


In all honesty I did not expect 9 people to follow this project. 2 or 3 maybe. Jeez.

 

So I have a not so happy update. I got in contact with Supermicro Support and after explaining everything I did to troubleshoot they pretty much immediately said submit an RMA.

 

And let me say their RMA service is a mother of a heap of hoops to jump through. Their only option for replacement is a cross-shipment where they require you to put down a security deposit, and even if everything goes smoothly you may only receive a refurbished board in exchange for your non-functional one. What kind of business model is that?

If the security deposit means they ship the replacement immediately and I just have to get around to returning the defective one, then that's fine; I kind of like that. But the possibility of receiving a refurb is kind of bogus.

 

So while we wait on that, the server will just be sitting here with its guts hanging out.

[Image: IMAG0431.thumb.jpg]

I wasn't going to waste your time posting updates about disassembling what I just built.

 

In the meantime I'd like to start up an unrelated project, but I'll give it its own build-log thread once it's ready (when I have the necessary parts). I'll tease it here.

Spoiler

So this is the NORCO RPC-431. It's a 4U rack-mountable chassis, and physically it's in between an ATX and a Micro-ATX sized chassis.

[Image: IMAG0432.thumb.jpg]

 

There's nothing really special about the chassis itself, but considering its size, it can hold 9 hard drives, and I'm convinced I could modify it to hold 12.

[Image: IMAG0433.thumb.jpg]

[Image: IMAG0434.thumb.jpg]

 

[Image: IMAG0435.thumb.jpg]

 

I plan to put a mini-ITX motherboard in it with a 10Gbit NIC and install FreeNAS. It's going to be a very high density, low power draw backup server. Imagine a server that could hold 90 to 120TB but drew less than 60W most of the time?
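The 90 to 120TB range lines up with the 9-bay and 12-bay figures if each drive is 10TB. That drive size is an inference from the numbers, not something stated above, so treat it as an assumption in this quick sketch (`raw_capacity_tb` is a made-up helper name):

```python
DRIVE_TB = 10  # assumption: inferred from 90 TB / 9 bays, not stated in the post


def raw_capacity_tb(bays, drive_tb=DRIVE_TB):
    """Raw capacity in TB, before any redundancy or filesystem overhead."""
    return bays * drive_tb


if __name__ == "__main__":
    for bays in (9, 12):  # stock chassis vs. the hoped-for modification
        print(f"{bays} bays x {DRIVE_TB} TB = {raw_capacity_tb(bays)} TB raw")
```

These are raw figures; any parity or mirroring configured in FreeNAS would lower the usable space.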

 

So this is on the project list and I might start it sooner rather than later depending on what the wait time will be on getting the replacement board in.

Back on the main topic: I wonder, is it possible for a device that operates off of PCIe to stop functioning if the PCIe controller itself is malfunctioning? Would the device plugged in stop functioning, or would Windows just crash? I mean, the CPUs are second-hand, and I have to assume the built-in NIC is going through the CPU's PCIe bus...hmm...

 

If the replacement board has the exact same issue I'll have to swap the CPUs in their sockets and see if anything changes.


They neglected to inform you that refurb boards only carry a 7 day warranty which starts at time of shipment, good luck!

 

Just kidding, but do follow up on whether the refurb board they may send you continues to carry the warranty of the original part or carries a shorter/lesser warranty. I have been burned by that in the past with other electronics. Buy new, RMA, get a refurb as replacement, it fails, find out the refurb only has a 15/30/60 day warranty whereas the new item had a 1/2/3 year warranty. Customer service simply asked for a serial number: oh sorry, that's a refurb and the warranty is expired now. Most mind-numbingly frustrating thing to deal with ever.

I sincerely hope you don't run into that issue, but what I've read about SM historically is enough to put me off them when looking for deals on used stuff through eBay. They're known for being buggy going back like 2 decades, with many tales on IT guys' blogs about troubleshooting problems that 'can't exist' according to their tech support and never finding a resolution other than to use a different brand of hardware altogether!


2 hours ago, Bitter said:

They neglected to inform you that refurb boards only carry a 7 day warranty which starts at time of shipment, good luck!

 

Just kidding, but do follow up on whether the refurb board they may send you continues to carry the warranty of the original part or carries a shorter/lesser warranty. I have been burned by that in the past with other electronics. Buy new, RMA, get a refurb as replacement, it fails, find out the refurb only has a 15/30/60 day warranty whereas the new item had a 1/2/3 year warranty. Customer service simply asked for a serial number: oh sorry, that's a refurb and the warranty is expired now. Most mind-numbingly frustrating thing to deal with ever.

I sincerely hope you don't run into that issue, but what I've read about SM historically is enough to put me off them when looking for deals on used stuff through eBay. They're known for being buggy going back like 2 decades, with many tales on IT guys' blogs about troubleshooting problems that 'can't exist' according to their tech support and never finding a resolution other than to use a different brand of hardware altogether!

Everything appeared to be working fine; just the NICs refused to. I can only assume they're doing something right or else they wouldn't be the tech giant they are now. To be honest, the only reason I went with them was because of something leadeater told me close to 2 years ago. Had he not, I probably would have built this server with either an ASUS or ASRock Rack board.

Who knows, if the replacement gives me a whole new spectrum of issues, that might just end up happening.


6 hours ago, Bitter said:

I sincerely hope you don't run into that issue, but what I've read about SM historically is enough to put me off them when looking for deals on used stuff through eBay. They're known for being buggy going back like 2 decades, with many tales on IT guys' blogs about troubleshooting problems that 'can't exist' according to their tech support and never finding a resolution other than to use a different brand of hardware altogether!

Same is true for Tyan, Asus, ASRock etc in the past. I've not had too many problems with Supermicro but they've always fixed the problem for me, or the OEM that is using Supermicro hardware. They have to though when it's under warranty which is a factor plus we're big enough for them to actually care about future sales.

 

I've had an Asus motherboard that wasn't able to handle multiple RAID cards correctly, and that couldn't be fixed. Tyan has a list of issues too big to bother with, and I refuse to ever buy their stuff. ASRock I have never used, but I doubt they are issue-free; who is?

 

Supermicro seems to have a double standard here for support though, I've never had to return a part first nor pay money ever. That kind of support is trash.


4 hours ago, leadeater said:

Same is true for Tyan, Asus, ASRock etc in the past. I've not had too many problems with Supermicro but they've always fixed the problem for me, or the OEM that is using Supermicro hardware. They have to though when it's under warranty which is a factor plus we're big enough for them to actually care about future sales.

 

I've had an Asus motherboard that wasn't able to handle multiple RAID cards correctly, and that couldn't be fixed. Tyan has a list of issues too big to bother with, and I refuse to ever buy their stuff. ASRock I have never used, but I doubt they are issue-free; who is?

 

Supermicro seems to have a double standard here for support though, I've never had to return a part first nor pay money ever. That kind of support is trash.

I've run across blog tales about all of those too. It seemed the only way to guarantee a working system was to buy all the parts within a single ecosystem, like all Dell or all HP, because they're all validated to work with their own stuff, even if their stuff is made by someone else like SM, Tyan, etc.


25 minutes ago, Bitter said:

like all Dell or all HP because they're all validated to work with their own stuff, even if their stuff is made by someone else like SM, Tyan, etc.

Dell, HPE and Lenovo all do their own design and manufacturing though, system board wise. Supermicro is common outside of those 3 big names, like Nutanix for example, which is SM hardware. We've always used HP/HPE servers and issues with those are extremely rare. I've also had good success with IBM/Lenovo, though it was not quite as polished as HPE at the time. I've not used any Lenovo server equipment since the IBM sale of x Server systems and storage to Lenovo.

One of my home lab servers is an IBM x3500 M4, which has been excellent, even with non-IBM parts. I've got an IBM M5015 and LSI 9361 in it with no problems, along with an Emulex 10Gb NIC.


5 hours ago, leadeater said:

I've had an Asus motherboard that wasn't able to handle multiple RAID cards correctly, and that couldn't be fixed. Tyan has a list of issues too big to bother with, and I refuse to ever buy their stuff. ASRock I have never used, but I doubt they are issue-free; who is?

 

Supermicro seems to have a double standard here for support though, I've never had to return a part first nor pay money ever. That kind of support is trash.

When the 3x LSI 9201-16i's first showed up I put them in my existing server (ASRock Rack EP2C602-4L/D16) and there were 0 issues, so I'd like to imagine stepping up to SM would mean the board can handle them as well.

My assumption about the LONG RMA process & security deposit is that they want to weed out the people who will just waste their time. Had I built this within the 30-day return period Newegg offers I would have used that, but things didn't turn out that way. Plus it's a cross-shipment: according to their papers they're going to FedEx Standard Overnight the new (or refurbished) board to me without waiting for me to ship the defective one back, and I just need to make sure I ship it back eventually.

 

I will let them know if the replacement has more issues. I have no idea what they'll do for me from there.


So far, I think progress has been made.

[Image: Screenshot_1.png]

They're behaving. Except for the Gigabit NIC. It's fine right now but likes to stop working. May have to try a different NIC.

 

I was able to assign IPs; however, we now have a new problem.

[Image: Screenshot_3.png]

They refuse to recognize the networks.

 

Despite having configured IPs that I know are correct

[Image: Screenshot_2.png]

I cannot ping any other device on either network.

The local network is fine though. (1Gbit NIC)

 

I'm currently lost. I can't figure out what the problem is.

 

I installed the latest driver.

I tried switching the addresses around.

I played with Jumbo Packets.

Nothing is helping.

 

Does Windows Server by default block outgoing & incoming pings?

Does Windows Server REQUIRE a router on the network?

I'm currently out of ideas.

 

Suggestions?
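On the router question: as far as IP itself goes, two hosts on the same subnet talk to each other directly, and a gateway only matters for traffic leaving that subnet. So one quick sanity check is whether both ends of each 10Gbit link really share a subnet. A minimal sketch using Python's standard `ipaddress` module; the addresses are hypothetical placeholders, since the real ones are only in the screenshot, and `same_subnet` is a made-up helper name:

```python
import ipaddress


def same_subnet(local, remote, prefix_len):
    """True if `remote` lies inside the network that `local`/`prefix_len` belongs to."""
    network = ipaddress.ip_interface(f"{local}/{prefix_len}").network
    return ipaddress.ip_address(remote) in network


if __name__ == "__main__":
    # Hypothetical addresses; substitute the ones assigned to each 10Gbit port.
    print(same_subnet("10.0.0.1", "10.0.0.2", 24))  # both in 10.0.0.0/24
    print(same_subnet("10.0.0.1", "10.0.1.2", 24))  # different /24 networks
```

If this returns False for a link, the two ends would need a router between them (or a corrected mask or address) before pings could get through, regardless of drivers.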
