TopHatProductions115

HomeLab/HomeDatacentre and PowerUsers of the Forum


Posted · Original Poster (OP)

Hey - thanks for dropping by! This thread is for discussion of personal setups that involve (or revolve around) enterprise hardware/software (ESXi, Windows Server, etc.), distributed computing/HPC clusters, workstations, server racks/farms, mainframes, multi-device management/administration, MS AD (Active Directory), database access/management, hypervisor management, VPN/VLAN, VPS/multi-user remote access, and other service-provisioning tasks. Think of it as r/homelab or r/homedatacenter, without the cancer risk of going to Reddit. That said, those subreddits are actually better than most, so don't be scared to visit them if you need something that isn't available here - and then report back here with your findings. The only restriction is that the setup's primary purpose (or at least one of its main purposes) has to be for high-performance computation of some sort, and not just for bragging rights.


Also might be time to go storm r/homedatacenter to keep it alive :3

For instance, I currently use my workstation for Plex Media Server, Simple DNSCrypt (acting as a local DNS resolver), hosting the occasional game server, multimedia encoding/livestreaming, and Moonlight game streaming (like a personal version of Google Stadia). I'm also working toward a personal initiative that should be ready by the middle of 2022 if everything goes as planned :D

This thread is for heavy tasks that either benefit from or require the compute capabilities afforded by high core-count CPUs, considerable RAM capacities, and/or considerable GPU acceleration (i.e., NVIDIA CUDA). Software H.264/H.265 and CUDA-accelerated video encoding counts. AV1 also counts, because that thing's a monster 😂 If you own a public-facing web server or file server, that counts too. Nextcloud and VPS setups count as well (as long as they're port-forwarded and accessible over the Internet). In short, if it's publicly facing (accessible over the Internet) and can be used by multiple people, the server counts :)

NVENC is not included, since almost every modern NVIDIA card can do it. AMD's VCE is also out, for similar reasons. Intel Quick Sync only counts if it's used for tasks like video editing and transcoding. If your configuration doesn't meet any of the above points, there is one more possibility - the GPU itself. If you use your GPU for non-gaming tasks (GPGPU, ANN, ML/AI, PCI passthrough, or other specialised workloads) on a regular basis (at least monthly), you can still post here. Radeon Pro/FirePro, Quadro/Tesla, and other workstation cards are welcome if you're using them for non-consumer workloads regularly (and not just for ePeen).

Crypto mining is not covered in this thread - please leave that for a thread of its own. With that, here are a few related threads on the forum:

 

 

If you, by some rare chance, want your thread listed here, please feel free to say so below :) If your computer has a Xeon (or i7 equivalent), Threadripper (or Ryzen 9 equivalent), EPYC, Opteron, or other specialised CPU in it, it's definitely welcome. Just make sure it's actually doing something (like video transcoding or 3D rendering). Mobile workstations (like HP's EliteBook 8770W and Dell's Precision M7720) are allowed as well. 

 


 

If you want, please feel free to post your benchmarking results below. Please note that the following test suites and hardware loads are expected here:

  • 3DMark
  • Cinebench R15 (Vanilla/modded) and R20
  • Unigine Valley/Heaven/Superposition
  • V-Ray 
  • FurMark
  • HWBOT HDBC and ffmpeg
  • CrystalDiskMark
  • F@H/BOINC
  • UserBenchmark (no longer accepted)

If you want to use other tools, feel free to do so. However, it may be more difficult to get comparison numbers (for relative performance). Please tell us what settings you used when benchmarking, to allow for easier comparisons. Otherwise, it defeats the purpose of comparing benchmarks.
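For the ffmpeg entry above, one easy way to make results comparable is to post the exact command line along with the score. Something like this (just a sketch - the input clip name is a placeholder, so mention whatever source and settings you actually used):

# Software x265 encode, timed, with the output thrown away so disk speed doesn't skew things
ffmpeg -benchmark -i sample_1080p.mp4 \
       -c:v libx265 -preset medium -crf 23 \
       -f null -

Report the preset, CRF, and source clip alongside the resulting fps/speed figure.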

 

 

 

 

Have fun!!!

 


So just to clarify, are you looking for people to share what they have done? Or projects currently underway? Or is this to contain links to other threads where the content is discussed? Just wanting to better understand the contributions you are looking for / the intent of this thread. 


My home test lab where I mess with things before pushing them to the production rack.


Build Logs: Cotton Candy Threads | Lucid Visions | Matte Machine (4U Rackmount) | Noc Noc | NUC | Storage Log

 

Cotton Candy Threads - CPU AMD Threadripper 2950X | GPU EVGA FTW3 RTX 2080 Ti | MOBO Asus ROG Zenith Extreme | MEMORY 128GB (8x 16GB) Corsair Vengeance RGB 3200 | STORAGE 3x Samsung 960 Evo SSD + 4x Crucial P1 1TB + 2x Seagate Ironwolf 8TB 7.2k HDDs | PSU Corsair HX1200i w/ Cablemod Pro Extensions | COOLING Cooler Master TR4 ML360 | CASE Lian Li O11 Dynamic Black | LIGHTING 9x Corsair HD120 Fans, 4x Corsair Addressable RGB Strips, 2x Corsair Commander Pro | PCPP
 
LUCID VISIONS - CPU AMD Ryzen 9 3950X | GPU PowerColor 5700XT Liquid Devil | MOBO Crosshair VII Hero | MEMORY 64GB (4x 16GB) Trident-Z Neo @ 3733 | STORAGE Samsung 960 Pro SSD | PSU Corsair RM1000i | COOLING EKWB Custom Loop | CASE Define S2 Vision | LIGHTING 4x Fractal Prisma AL-14 Fans, Built-in RGB Strip | PCPP
 
Just NCASE mITX - CPU Intel Core i7 8700K @ 5.2GHz | GPU EVGA RTX 2080 Ti XC | MOBO Asus Z370-I Gaming | MEMORY 16GB (2x 8GB) G.Skill Trident-Z RGB 3000 | STORAGE Samsung 960 Evo 500GB SSD + Crucial MX500 1TB M.2 SSD | PSU Corsair SF600 | COOLING Noctua NH-U9S w/ Redux Push/Pull Fans | CASE NCase M1v5 | LIGHTING 2x Cablemod Addressable RGB Strips | PCPP
 
Noc Noc, Who's There? - CPU AMD Threadripper 1950X | GPU ASUS RTX 2080 Ti OC | MOBO ASRock X399M Taichi | MEMORY 32GB (4x 8GB) GSkill Trident-Z 3200 | STORAGE Samsung 970 Evo SSD | PSU Corsair HX1000i w/ Cablemod Pro B&W Kit | COOLING Noctua U9 TR4 w/ 2x Redux 92mm | CASE Corsair 280X White | FANS 6x Noctua 140mm Redux | PCPP

Not quite sure I'm clear on what this topic is for? It's extremely broad, and for the things that do and do not count - count towards what? Confused.

Posted · Original Poster (OP)

@leadeater
 

Quote

Think of it as r/homelab, without the cancer risk of going to Reddit. The only restriction is that the setup's primary purpose (or at least one of its main purposes) has to be for high-performance computation of some sort, and not just for bragging rights.

 


So I finally got a lancache VM up and running tonight after working on it for a good part of the day. The documentation is so-so and partially out of date in places, due to the project going from just Steam caching to multi-service caching. It's not generally needed for 1-2 users, but I find my game installs break sometimes, especially if I reinstall Linux, and I have very slow download speeds (currently 5 Mbps, going to 25 after the next billing cycle). Having to redownload games that are potentially 80+ GB is painful (looking at you, ESO). I have it set up to go client -> lancache -> Pi-hole -> upstream DNS. This is also easier to maintain than some sort of backup scheme in place of caching.
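In case anyone wants to try the same thing, this is roughly what the deployment boils down to with the official Docker images (an untested-as-written sketch - the IPs and paths are placeholders for my setup, and the env variable names should be double-checked against the current lancache docs):

# The monolithic container serves the cached content over HTTP/HTTPS
docker run -d --name lancache \
  -v /srv/lancache/cache:/data/cache \
  -v /srv/lancache/logs:/data/logs \
  -p 80:80 -p 443:443 \
  lancachenet/monolithic:latest

# The DNS container points supported CDNs at the cache IP and forwards
# everything else to the upstream resolver (Pi-hole, in my case)
docker run -d --name lancache-dns \
  -e LANCACHE_IP=192.168.1.10 \
  -e UPSTREAM_DNS=192.168.1.2 \
  -p 53:53/udp \
  lancachenet/lancache-dns:latest

Then the clients just need to use the lancache-dns box as their DNS server.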

 

My next project will probably be getting the Sickbeard MP4 Automator Python script to automatically encode my media files into a Plex direct-play-friendly format upon download (yarr harr), if they aren't already.
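For reference, the end result I'm after is basically what a single ffmpeg invocation like this produces (a hand-rolled sketch, not the actual Automator script - file names are placeholders): H.264 video plus AAC audio in an MP4 with the moov atom up front is about as direct-play-friendly as Plex gets.

# Re-encode to H.264/AAC MP4 so Plex clients can direct play it
ffmpeg -i input.mkv \
       -c:v libx264 -preset slow -crf 20 \
       -c:a aac -b:a 192k \
       -movflags +faststart \
       output.mp4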


[Out-of-date] Want to learn how to make your own custom Windows 10 image?

 

Desktop: AMD R9 3900X | ASUS ROG Strix X570-F | Radeon RX 5700 XT | 16GB Trident Z 3200MHz | 256GB 840 EVO | 960GB Corsair Force LE | EVGA P2 650W

Laptop: Intel M-5Y10c | Intel HD Graphics | 8GB RAM | 250GB Micron SSD | Asus UX305FA

Server: Intel Xeon D 1541 | ASRock Rack D1541D4I-2L2T | 32GB Hynix ECC DDR4 | 2x1TB 2x8TB Western Digital HDDs


Do you mean like this?

 


 

Specs: (copied and pasted from my other post)

 

Equipment list:

Supermicro 6027TR-HTRF (four dual-socket, half-width blade nodes in a 2U rackmount). Each node has:

  • dual Intel Xeon E5-2690 (v1) (8 cores, 2.9 GHz stock, 3.3 GHz max all-core turbo, 3.6 GHz max turbo)
  • 8x Samsung 16 GB DDR3-1866 ECC Reg. 2Rx4 RAM (128 GB per node, 512 GB total for the whole system), running at DDR3-1600 speed (because it's 2R)
  • SATA SSD for the OS, HGST 7200 rpm SATA HDD for data
  • Mellanox ConnectX-4 dual-port 100 Gbps (4x EDR InfiniBand) NIC

(The storage configuration varies a little depending on which OS I am booting into - I keep them physically separate with different OS SSDs and data HDDs.)

 

Mellanox 36-port 100 Gbps 4x EDR Infiniband externally managed switch (MSB-7890)

 

Qnap TS-832X 8-bay NAS (8x 10 TB HGST SATA 7.2krpm drives, in RAID5)

Qnap TS-832X 8-bay NAS (7x 6 TB HGST SATA 7.2krpm drives, in RAID5)

(those two are tied together via dual SFP+ to SFP+ 10GbE connections)

 

Buffalo Linkstation 441DE (4x 6 TB HGST SATA 7.2 krpm drives, in RAID5)

 

Netgear GS116 16-port 1 GbE switch

Netgear GS208 8-port 1 GbE switch

 

There's a bunch of other stuff that's not pictured here (4 workstations, another NAS, and some old, decommissioned servers). I'll have to get longer cables before I can bring those systems back up online.

 

==end of copy and paste==

 

The relatively "new(er)" thing that I'm going to be testing is different ways of building/compiling OpenFOAM on CentOS 7.6.1810 with InfiniBand enabled - specifically, whether I build OpenMPI as part of the "normal" build process as outlined in OpenFOAM's OpenFOAM v6/CentOS 7 instructions, or whether I DISABLE building OpenMPI 2.2.1 per those instructions and instead use the OpenMPI 1.10.7 from the CentOS repo, in order to try to resolve an issue I was having when running the Motorbike OpenFOAM benchmark.
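Roughly, the system-OpenMPI variant I'm going to try looks like this (a sketch of the plan, not verified instructions - the module name and the exact spot in etc/bashrc may differ between CentOS/OpenFOAM versions):

# Use the distro's OpenMPI 1.10.7 instead of building the ThirdParty one
yum install -y openmpi openmpi-devel environment-modules
module load mpi/openmpi-x86_64

# In OpenFOAM-6/etc/bashrc, point the build at the system MPI, i.e.:
#   export WM_MPLIB=SYSTEMOPENMPI
# then source the environment and build as per the normal instructions
cd $HOME/OpenFOAM/OpenFOAM-6
source etc/bashrc
./Allwmake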

 

(Ironically, using the exact same physical hardware but Ubuntu 18.04 LTS, I had no problems getting everything to work. It's CentOS that I have an issue with; the idea is that I don't want to have to switch to a different OS whenever I want to run a different application - I want it all to run on CentOS. So I'm doing some testing/research for the OpenFOAM development team, since I submitted a bug report about the issue I was seeing after running through their install instructions.)

 

I'm starting with OpenFOAM and that benchmark because it's a case that's readily and publicly available, and I can use it to make sure the system is up and running as it should be. I'm also looking into moving to Salome for FEA, and I've already used GPGPU on my other workstations with DualSPHysics for SPH/particle modelling and simulation (my last run was a simulation of an offshore power-generation installation).

 

Here is an animation of a CFD simulation of a Wankel internal combustion engine that ran on this system (it took just shy of 22 hours to run). On my 8-core workstation, the same case would have taken almost 7.5 days (~176 hours), so this system is cutting the run times of my HPC/CAE/CFD/FEA applications down SIGNIFICANTLY.

[attached image: IMG_1880.JPG]


The small update from this past weekend: I now have a 52-port L2 managed GbE switch (Netgear GSM7248) in place of the 16-port GbE switch that I had previously (Netgear GS116).

 

This is in preparation for my office moving to the basement from the room it currently occupies.

 

The proposal is that I might actually end up consolidating almost all of my centralised network/computing equipment (with very few exceptions) into the rack, now that I have it up and running.

 

It's going to be quite the PITA to take down though, when we eventually move to a bigger house.


The other small news is that over the last month or two, I've managed to kill all of my Intel SSDs by burning through the write endurance limit on every one of the drives.

 

So now I'm looking at what I can do about it, as all of the consumer-grade drives are being pulled from the micro cluster.

 

The latest round of SSD deaths occurred after a little over two years of ownership (out of a 5-year warranty), and based on the power-on hours data from SMART, the actual usage is even less than that - about 1.65 years.
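(For anyone who wants to keep an eye on their own drives, the wear data I'm referring to comes straight out of SMART via smartmontools - something like the below. The exact attribute names vary by vendor; Intel SATA drives report things like Media_Wearout_Indicator and Host_Writes, other brands use Total_LBAs_Written, and NVMe drives report "Percentage Used" instead.)

# SATA/SAS drive: dump SMART and pick out the hours and wear/write counters
smartctl -a /dev/sda | grep -Ei 'power_on_hours|wearout|host_writes|total_lbas_written'

# NVMe drive: the health log carries "Percentage Used" and "Data Units Written"
smartctl -a /dev/nvme0 | grep -Ei 'percentage used|data units written|power on hours'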

 

So yeah...that happened.

 

Anybody here ever played with gluster/pNFS/Ganesha before?

7 hours ago, alpha754293 said:

Anybody here ever played with gluster/pNFS/Ganesha before?

Limited amount, Ganesha backed by a Ceph cluster. Gluster itself is pretty simple/basic so I don't think you'll have any issue with that.

9 hours ago, leadeater said:

Limited amount, Ganesha backed by a Ceph cluster. Gluster itself is pretty simple/basic so I don't think you'll have any issue with that.

Interesting.

 

Thanks.

 

Yeah, I'm trying to decide on what I want to do ever since I wore through the write endurance of all of my Intel SSDs.

 

I'm trying to decide whether I want to switch over to data-centre/enterprise-grade SSDs (which support 3 DWPD), or create a tmpfs on each of my compute nodes and export it through a parallel/distributed file system like GlusterFS, Ceph, or pNFS (although the pNFS server isn't supported in CentOS 7.6.1810). I'm not sure which is better for my usage scenario.

Upside with tmpfs (RAM drive) is that I won't have the write endurance issue that I am currently facing with SSDs (even enterprise grade SSDs).

Downside with tmpfs is that it's volatile memory which means that there is a potential risk for data loss, even with high availability (especially if the nodes are physically connected to the same power supply/source).

 

On the other hand, using the new usage data that I have from the freshly worn-out SSDs: IF my usage pattern persists, I might actually be able to get away with replacing them with enterprise-grade SSDs, and those would be sufficient over the life of the system. I'm not really sure yet, partly because the enterprise-grade SSDs are larger capacity, so I might be inclined to use them more, especially if I DO end up deploying either GlusterFS or Ceph. Alternatively, all of the enterprise-grade drives could go into a new head node for the cluster and just be a "conventional" NFSoRDMA export, which would simplify things for me. (A new head node might also have the fringe benefit of letting me take advantage of NVMe.)

 

Decisions, decisions, decisions (especially when, again, I'm trying to get the best bang for the buck, and working with a VERY limited budget.)

 

 

6 hours ago, alpha754293 said:

-snip-

For what you want I wouldn't go with Ceph; it's more resilient than, say, Gluster and has great throughput over the cluster, but it's not good at low latency, and per-client throughput can be lower than other options. It's a lot harder to get really good performance out of Ceph compared to, say, Gluster with underlying ZFS volumes etc. Lustre is another option for you.

20 hours ago, leadeater said:

For what you want I wouldn't go with Ceph; it's more resilient than, say, Gluster and has great throughput over the cluster, but it's not good at low latency, and per-client throughput can be lower than other options. It's a lot harder to get really good performance out of Ceph compared to, say, Gluster with underlying ZFS volumes etc. Lustre is another option for you.

Yeah, I was reading about the difference between the two, and at least one source I found online said that GlusterFS is better for large, sequential transfers whereas Ceph works better for lots of smaller files or more random transfers.

 

Yeah, I think that I've mentioned Lustre in my other thread as well.

 

Still trying to decide between a parallel/distributed RAM drive and enterprise SSDs - or whether I just get the SSDs and build a new head node that pretty much only does that (presenting the enterprise SSDs as a single RAID0 volume to the network as a "standard/vanilla" NFSoRDMA export).
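If I go the head-node route, the rough shape of it would be something like this (a sketch only - device names, paths, and the RDMA module/port details are assumptions that would need to be checked against the distro's NFSoRDMA docs):

# Stripe the enterprise SSDs into a single RAID0 md device
mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sd[b-e]
mkfs.xfs /dev/md0
mkdir -p /export/scratch && mount /dev/md0 /export/scratch

# Export it over NFS
echo '/export/scratch *(rw,no_root_squash,async)' >> /etc/exports
exportfs -ra

# Enable the RDMA transport for the NFS server (NFSoRDMA normally listens on 20049)
modprobe svcrdma
echo 'rdma 20049' > /proc/fs/nfsd/portlist

# And on each compute node, mount over RDMA
mount -o rdma,port=20049 headnode:/export/scratch /mnt/scratch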

On 8/13/2019 at 4:08 PM, leadeater said:

For what you want I wouldn't go with Ceph; it's more resilient than, say, Gluster and has great throughput over the cluster, but it's not good at low latency, and per-client throughput can be lower than other options. It's a lot harder to get really good performance out of Ceph compared to, say, Gluster with underlying ZFS volumes etc. Lustre is another option for you.

For Gluster, can the data servers also be the clients or is the assumed model that the data servers are separate from the clients?

 

Thanks.

 

(I was trying to play with pNFS over the weekend and I couldn't figure out how to make the data servers and the clients export to the same mount point.)

5 hours ago, alpha754293 said:

For Gluster, can the data servers also be the clients or is the assumed model that the data servers are separate from the clients?

You can - most probably don't do that, but there isn't anything that would stop you. I suspect what you're wanting to do isn't anything different from hyper-converged infrastructure.

6 hours ago, leadeater said:

You can - most probably don't do that, but there isn't anything that would stop you. I suspect what you're wanting to do isn't anything different from hyper-converged infrastructure.

Sort of.

 

The idea currently stems from this: I've got four nodes, and I want to allocate half of each node's RAM to a RAM drive (tmpfs) and then use that as the source of the space (the "data server"), which in turn serves those same four nodes as clients.

In other words, each node right now has 128 GB of RAM.

 

If I only allocate half of the RAM to each local node, then each node will only get 64 GB.

 

But if I can pool them together using GlusterFS, then all four nodes would be able to see and address a total of 256 GB (combined) which is more than any single node can address/provide.

 

I'm not sure if that really means "hyperconverged" because I thought that converged meant something different.

11 minutes ago, alpha754293 said:

But if I can pool them together using GlusterFS, then all four nodes would be able to see and address a total of 256 GB (combined) which is more than any single node can address/provide.

 

I'm not sure if that really means "hyperconverged" because I thought that converged meant something different.

Yea, pretty much. All it means in the VM hosting world is that each node serves as both storage and compute in a scale-out fashion, so when you add a node you're adding storage capacity, storage performance, and compute resources.

14 hours ago, leadeater said:

Yea, pretty much. All it means in the VM hosting world is that each node serves as both storage and compute in a scale-out fashion, so when you add a node you're adding storage capacity, storage performance, and compute resources.

Thanks.


For those who might be interested, here are my current test results with GlusterFS:

 

For those that might be following the saga, here's an update:

 

I was unable to mount tmpfs using pNFS.

 

Other people (here and elsewhere) suggested that I use GlusterFS, so I've deployed that and am testing it now.

 

On my compute nodes, I created a 64 GB RAM drive on each node:

 

# mount -t tmpfs -o size=64g tmpfs /bricks/brick1


and edited my /etc/fstab to match. *edit* - I ended up removing this line from /etc/fstab, due in part to the bricks being on volatile memory; recreating the brick mount points from scratch on each reboot helped keep the configuration clean. (E.g., if I deleted a GlusterFS volume and then tried to create another one using the same brick mount points, it wouldn't let me.)
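(For reference, the other way around the "brick is already part of a volume" complaint - without recreating the brick mounts - is to clear the GlusterFS markers off the old brick path. This is to the best of my understanding, so treat it as a sketch:)

# Remove the per-brick metadata left behind by the deleted volume
setfattr -x trusted.glusterfs.volume-id /bricks/brick1/gv0
setfattr -x trusted.gfid /bricks/brick1/gv0
rm -rf /bricks/brick1/gv0/.glusterfs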

 

I then created the mount points for the GlusterFS volume and then created said volume:

 

# gluster volume create gv0 transport rdma node{1..4}:/bricks/brick1/gv0


but that was a no-go when I tried to mount it, so I disabled SELinux (based on the error message that was being written to the log file), deleted the volume, and created it again with:

 

# gluster volume create gv0 transport tcp,rdma node{1..4}:/bricks/brick1/gv0


Started the volume up, and this time I was able to mount it with:

 

# mount -t glusterfs -o transport=rdma,direct-io-mode=enable node1:/gv0 /mnt/gv0


Out of all of the test trials, here's the best result that I've been able to get so far. (The results are VERY sporadic and kind of all over the map - I haven't quite figured out why just yet.)

 

[root@node1 gv0]# for i in `seq -w 1 4`; do dd if=/dev/zero of=10Gfile$i bs=1024k count=10240; done
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 5.47401 s, 2.0 GB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 5.64206 s, 1.9 GB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 5.70306 s, 1.9 GB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 5.56882 s, 1.9 GB/s


Interestingly enough, when I try to do the same thing on /dev/shm, I only max out at around 2.8 GB/s.

 

So at best right now, with GlusterFS, I'm able to get about 16 Gbps throughput on four 64 GB RAM drives (for a total of 256 GB split across four nodes).

 

Note that this IS with a distributed volume for the time being.

 

Here are the results with the dispersed volume:

 

[root@node1 gv1]# for i in `seq -w 1 4`; do dd if=/dev/zero of=10Gfile$i bs=1024k count=10240; done
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 19.7886 s, 543 MB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 20.9642 s, 512 MB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 20.6107 s, 521 MB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 21.7163 s, 494 MB/s


It's quite a lot slower.


Looks like I missed this thread being started. I'll have to share my gear when I find the time. It's too bad the server enthusiasts here are in the minority on the forum. Seeing more discussions like this would be great.


Guides & Tutorials:

VFIO GPU Pass-though w/ Looking Glass KVM on Ubuntu 19.04

A How-To Guide: Building a Rudimentary Disk Enclosure

Three Methods to Resetting a Windows Login Password

A Beginners Guide to Debian CLI Based File Servers

A Beginners Guide to PROXMOX

How to Use Rsync on Microsoft Windows for Cross-platform Automatic Data Replication

A How To Guide: Setting up SMB3.0 Multichannel on FreeNAS

How You can Reset Your Windows Login Password with Hiren's BootCD - (Depreciated)

 

Guide/Tutorial in Progress:

How to Setup Drive Sharing in Windows 10

 

In the Queue:

How to Format a HDD/SSD in Windows

How to Flash a RAID Card to IT Mode

 

Don't see what you need? Check the Full List or *PM me, if I haven't made it I'll add it to the list.

*NOTE: I'll only add it to the list if the request is something I know I can do.

