
These Servers are TOO EXPENSIVE

nicklmg

With your resources, Linus.

 

 

I believe you could get hold of some of those Optane DIMM modules...? (No, not that Optane Memory mess Intel released first.)

Edit: the official name is Optane DC Persistent Memory.

 

Use those as a "shit ton" of RAM for ZFS
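
If the DIMMs ran in Memory Mode they would just show up as ordinary RAM, and the only ZFS-side tweak would be letting the ARC grow into it. A minimal sketch of that tuning, assuming ZFS on Linux; the 1 TiB figure is purely illustrative:

```python
# Hypothetical sketch: let the ZFS ARC grow into Optane PMem exposed as plain RAM.
# Assumes ZFS on Linux, where zfs_arc_max is a runtime-writable module parameter.
ARC_MAX_BYTES = 1 * 1024**4  # ~1 TiB ARC cap; size it to the installed DIMM capacity

with open("/sys/module/zfs/parameters/zfs_arc_max", "w") as f:
    f.write(str(ARC_MAX_BYTES))
```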

Can Anybody Link A Virtual Machine while I go download some RAM?

 


Also... Linus.

 

 

Now that there are 1.5 TB 2.5" Optane 905P drives...

Why not upgrade the NVMe storage server you have full of Intel 750 SSDs?

Can Anybody Link A Virtual Machine while I go download some RAM?

 


Well, it could have been Epyc...

...or you could just shell out ten grand for one of those 1U Intel Ruler SSD storage servers.

"Mankind’s greatest mistake will be its inability to control the technology it has created."


At our office we tried to implement a tiered storage system for a 40 TB server (20 x 2 TB HDDs in RAID 0, no redundancy) with 4 x 200 GB SSDs for writing large amounts of data. Unfortunately, we ran into a limitation with the write-back cache size (1 GB). Our cluster of 15 machines, each with two Xeon E5-2690 v3 CPUs (24 cores per machine), was able to fill that write-back cache pretty quickly, so the write speed reverted to that of the RAID 0 HDD array. The SSDs were basically useless. I know this is write vs. read, but we were disappointed with the performance overall. Our newer storage system is 20 TB made of 20 x 1 TB SSDs in RAID 0. Much faster at writing than the older system with the tiered storage. It cost more... but it was well worth it.


Have you thought about using a framework like Apache Ignite? I know you'd have to sort out the same sort of policies around data aging and prioritization, but it is a scalable solution. Ignite is an in-memory data fabric. From what I understand, Ignite can work with a distributed file system like HDFS, GPFS (aka Spectrum Scale), etc. Then again, 7k for ~8.4 TB of NVMe drives sounds less expensive than however many servers you would need to have 8.4 TB of available memory.

 

I know Ignite is used by a major SaaS provider to serve their MySQL DB in memory for their clients.
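
For what it's worth, Ignite has thin clients too, so you don't have to touch it from Java. A minimal sketch with the Python thin client (pyignite); the host and cache name are made up for illustration:

```python
# Minimal sketch using the Apache Ignite Python thin client (pyignite).
# The host, port and cache name below are placeholders, not a real deployment.
from pyignite import Client

client = Client()
client.connect("ignite-node.example", 10800)   # 10800 is the default thin-client port

# A cache behaves like a distributed key-value store backed by the cluster's memory.
cache = client.get_or_create_cache("project-index")
cache.put("PROJ1", "hot tier, ~40 GB")
print(cache.get("PROJ1"))

client.close()
```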

 

Anyways just thought I'd throw it out there. 


Let me start by saying I don't have nearly the amount of data or the workload you guys have.

But here's what I did: I'm using FreeNAS with 2 NVMe drives in RAID 0 where I dump my files from the SD cards. Then I copy the files to my local NVMe for editing. I export locally to NVMe and then send the exported file back to the NVMe RAID 0. From there I copy everything to the HDD RAID 1 I have set up (2 x 10 TB disks).

While the files are being copied from NVMe to HDD, the speed doesn't really matter because you're not actively using them anymore.

 

At some point I'll create some automation for this; I got the idea from AWS S3: land everything in one tier, and once you're no longer touching a file, move it to Glacier and call it a day.
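
That automation could be as small as a nightly cron job that sweeps anything untouched for a couple of weeks from the NVMe stripe to the HDD mirror. A rough sketch, with placeholder paths and a made-up 14-day threshold:

```python
# Rough sketch of the "sweep cold files from the NVMe stripe to the HDD mirror" idea.
# Paths and the 14-day threshold are placeholders only.
import shutil
import time
from pathlib import Path

FAST = Path("/mnt/nvme-stripe/ingest")     # where SD card dumps and exports land
SLOW = Path("/mnt/hdd-mirror/archive")     # 2 x 10 TB RAID 1
MAX_AGE_DAYS = 14

def sweep():
    SLOW.mkdir(parents=True, exist_ok=True)
    cutoff = time.time() - MAX_AGE_DAYS * 86400
    for item in FAST.iterdir():
        # Treat the last modification time as a "still being worked on" signal.
        if item.stat().st_mtime < cutoff:
            shutil.move(str(item), str(SLOW / item.name))

if __name__ == "__main__":
    sweep()
```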


Storage tiering works great in Ceph, though it would be overkill for your use case.


Spending thousands of dollars on storage is always a pain, but quite necessary. Going forward, with increasing data and complexity, it becomes a big pain to avoid performance issues. At some point it makes sense to invest in some type of tiered storage from any vendor. After using and maintaining HPE 3PARs, going to Cisco UCS was the best thing for me: it gives me something that is really easy to manage and says goodbye to most of the bottleneck issues you might normally see. Sooner is always better, as migrating big data sets is a great pain in the butt, especially if you have a lot of datastores and VMs.

Hope you find the best fit for you :). Love LTT, hope you do another maintenance video as it was the funniest thing I have ever seen :D


I'm sure Linus will get this all sorted, just in time for his employees to convince him to quadruple the number of pixels for the third time.  


As I watched this video, a feeling crept over me that approaching this problem with a Windows Server solution won't give you the full control you're wishing for. I'm a Linux server admin, and building a solution like the one you described in the video is pretty straightforward.

 

You set up your server with both high-speed, low-capacity drives and mechanical drives; for the sake of argument, two 1 TB NVMe SSDs and four 10 TB HDDs (plus a small SSD for the operating system). You create two file systems, ZFS or Btrfs: one of 2 TB on the NVMe SSDs and one of 20 TB mirrored on the HDDs. Both file systems are exposed to the network as Samba shares, one for "workflow" and one for archives.

Now you create an archiving/de-archiving script which decides when to move projects around. Let's assume every project starts as a folder with the project's name and all of its files below it. I know a project will be around 400 GB, but say it's a tenth of that size for the sake of this argument.

We declare "enough space" as 25% or more left on the fast drives, so 500 GB or more free is "enough space". If the script runs once a day, during night time when probably no one edits? , this leaves with an example of 40 GB per project space for 12 new projects during the day before the script runs again.

 

First the script performs an inventory. This is done using the Linux file attribute "access time" (atime), which is updated every time a file is opened, whether or not it has been edited. It scans the atime on the fast drives and creates a list of the projects ordered by atime: most recently accessed first, projects that haven't been accessed for a while last. Along with this inventory, the total size of each project is also recorded. The list might look something like this (the first line is a header):

project  atime       size (GB)
PROJ1    2019-03-04  40.5
PROJ3    2019-03-04  39.3
PROJ2    2019-03-02  40.3
(and so on, you get the picture)

 

Second, another list is created of all the projects in the archive, with their atimes. If a project there is accessed, the script notes that this project is becoming "important" again and flags it to be moved back to fast storage.

 

You can choose to leave files on fast storage until you run low on space (say at 75% disk usage), or move a project to archive by default when it hasn't been accessed for a week. Likewise, move a project from archive back to fast storage when it has been accessed in the past day, keeping track of disk usage on the fast drives and, if needed, flagging the project with the oldest atime on fast storage to be moved to archive.

 

With a script like this you can be pretty versatile and build it completely to your liking. It's also possible to present "one network share" with both file systems in it, keeping the user experience (as shown by Taran in the video) as non-techy as possible.
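
To give an idea of the size of such a script, here is a minimal sketch of the demotion half of the logic described above; the paths, capacity and 75% threshold are examples only, and a real version would add the promotion pass, logging and safety checks:

```python
# Illustrative sketch of the atime-based demotion pass described above.
# Paths, capacity and thresholds are examples, not a finished tool.
import shutil
from pathlib import Path

FAST = Path("/tank/fast/workflow")     # NVMe file system, shared via Samba
ARCHIVE = Path("/tank/slow/archive")   # mirrored HDD file system, shared via Samba
FAST_CAPACITY_GB = 2000
TARGET_USAGE = 0.75                    # start demoting once more than 75% is used

def inventory(root: Path):
    """Return (project, newest_atime, size_gb) rows, most recently accessed first."""
    rows = []
    for project in (p for p in root.iterdir() if p.is_dir()):
        files = [f for f in project.rglob("*") if f.is_file()]
        if not files:
            continue
        newest_atime = max(f.stat().st_atime for f in files)
        size_gb = sum(f.stat().st_size for f in files) / 1e9
        rows.append((project, newest_atime, size_gb))
    return sorted(rows, key=lambda r: r[1], reverse=True)

def demote_stale_projects():
    rows = inventory(FAST)
    used_gb = sum(size for _, _, size in rows)
    while rows and used_gb / FAST_CAPACITY_GB > TARGET_USAGE:
        project, _, size_gb = rows.pop()   # oldest atime = best demotion candidate
        shutil.move(str(project), str(ARCHIVE / project.name))
        used_gb -= size_gb

if __name__ == "__main__":
    demote_stale_projects()
```

One caveat: most Linux distros mount with relatime, which throttles atime updates, so either mount with strictatime or fall back to mtime if you want the inventory to track access faithfully.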

 

@LinusTech I am open to help exploring this possibility.


Dunno if it's a good idea, but you could sync Beyond Compare with the calendar and automatically put the active projects on WHONNOCK and inactive ones on DELTA 1/2, then use the leftover space on the QQ server for projects with scheduled client meetings.
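
A toy sketch of that calendar-driven placement, just to show the shape of the decision; the project names, dates and share paths are made up, and a real version would pull the dates from the actual calendar:

```python
# Toy sketch of calendar-driven placement: projects with an upcoming meeting stay on
# the fast server, the rest go to the slower one. All names, dates and paths are
# made up for illustration.
from datetime import date, timedelta

FAST_SHARE = "//WHONNOCK/projects"     # placeholder paths
SLOW_SHARE = "//DELTA/projects"
ACTIVE_WINDOW = timedelta(days=7)

# In reality this would come from the calendar; hard-coded here as an example.
next_meeting = {
    "PROJ1": date(2019, 3, 8),
    "PROJ2": date(2019, 4, 2),
}

today = date(2019, 3, 4)
for project, meeting in next_meeting.items():
    target = FAST_SHARE if meeting - today <= ACTIVE_WINDOW else SLOW_SHARE
    print(f"{project} -> {target}")
```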


This is why I always say cache drives are stupid.

The data is constantly moving back and forth between HDD and SSD, resulting in higher drive usage that impedes regular use, as well as more drive wear.

It is much better to have separate drives and manually choose what goes where, so that you get to choose exactly what is sped up and what is not.

NEW PC build: Blank Heaven   minimalist white and black PC     Old S340 build log "White Heaven"        The "LIGHTCANON" flashlight build log        Project AntiRoll (prototype)        Custom speaker project


Ryzen 3950X | AMD Vega Frontier Edition | ASUS X570 Pro WS | Corsair Vengeance LPX 64GB | NZXT H500 | Seasonic Prime Fanless TX-700 | Custom loop | Coolermaster SK630 White | Logitech MX Master 2S | Samsung 980 Pro 1TB + 970 Pro 512GB | Samsung 58" 4k TV | Scarlett 2i4 | 2x AT2020

 


30 minutes ago, Tyronialy said:

I'm a Linux server admin, and building a solution like the one you described in the video is pretty straightforward.

 

You set up your server with both high-speed, low-capacity drives and mechanical drives; for the sake of argument, two 1 TB NVMe SSDs and four 10 TB HDDs (plus a small SSD for the operating system). You create two file systems, ZFS or Btrfs: one of 2 TB on the NVMe SSDs and one of 20 TB mirrored on the HDDs. Both file systems are exposed to the network as Samba shares, one for "workflow" and one for archives.

That's not actually storage tiering, it's archiving, and you can do that with Windows too. What I actually recommended was similar to what you suggested, but as a more standard, automated system. File archiving with stubbing is, I think, the best solution for having fast storage for live projects and lots of capacity, with no manual or scripted movement of files and no file paths changing; stubbing is pretty great for that.

 

If you move files around, meaning the file paths change, it's pretty annoying for the editors since you have to re-path everything in the projects.
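
Real HSM stubbing happens at the file-system level (typically via reparse points on NTFS), but the reason it keeps editors' paths intact can be illustrated with a symlink analogy; everything below is a toy example, not how an actual stubbing product works internally:

```python
# Toy analogy for why stubbing keeps project paths valid: the data moves to the
# archive tier, but the original path still resolves. Real archiving software uses
# file-system level stubs (e.g. reparse points), not symlinks; paths are placeholders.
import shutil
from pathlib import Path

def archive_with_stub(file_path: Path, archive_root: Path) -> None:
    archive_root.mkdir(parents=True, exist_ok=True)
    target = archive_root / file_path.name
    shutil.move(str(file_path), str(target))   # move the bulk data to slow storage
    file_path.symlink_to(target)               # leave a "stub" at the original path

# Example (hypothetical paths):
# archive_with_stub(Path("/projects/PROJ2/broll.mov"), Path("/archive/PROJ2"))
```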


Get an HP MSA, pack it full of SAS and SSD drives, and set up tiering.


What about just caching the files on the editors' machines? Instead of having the cache in a centralized location, each workstation would have its own cache, and you would also have less network traffic. I'm not sure if Windows supports this natively, but there may be third-party software to do it. The idea would be that when you attempt to open a file in the cache directory and it does not exist (or the last-modified timestamps don't match), Windows (or whatever third-party software) would stream the file into the cache, either in its entirety or in fixed-size chunks, while also serving it to the file handle that is trying to read. The point of "in its entirety" / "chunks" is that the application wouldn't have to explicitly request a small region for the cache to also pull in what is likely to be read next, since it's near the requested region. Although I'm not sure whether Windows, or the file APIs the application uses, already tries to pre-load more data and buffer it in memory.
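
A minimal sketch of that read-through idea, using the last-modified time as the staleness check; the share and cache locations are placeholders, and a real tool would handle chunked transfers and locking rather than whole-file copies:

```python
# Sketch of the per-workstation read-through cache described above.
# SHARE and CACHE are placeholder locations; chunked/partial transfers, locking and
# invalidation are deliberately left out.
import shutil
from pathlib import Path

SHARE = Path(r"\\server\projects")        # central network share (placeholder)
CACHE = Path(r"C:\EditCache\projects")    # local per-workstation cache (placeholder)

def cached_open(rel_path: str, mode: str = "rb"):
    remote = SHARE / rel_path
    local = CACHE / rel_path
    # Refresh the local copy if it's missing or its last-modified time differs.
    if not local.exists() or int(local.stat().st_mtime) != int(remote.stat().st_mtime):
        local.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(remote, local)        # copy2 preserves the modification time
    return open(local, mode)

# Example: with cached_open("PROJ1/timeline.prproj") as f: data = f.read()
```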


Hey Linus, before splurging on more NVMe drives, look into Storage Spaces a little more. The more drives you have in both the cold and hot tiers, the better the performance. As data is moved from cold to hot, it loads the cold data sequentially to avoid random reads. You could also add a medium tier with lower-end SATA SSDs to improve performance even more. Let me know if you'd want to discuss.


2 minutes ago, twstdude0to1 said:

Hey Linus, before splurging on more NVMe drives, look into Storage Spaces a little more. The more drives you have in both the cold and hot tiers, the better the performance. As data is moved from cold to hot, it loads the cold data sequentially to avoid random reads. You could also add a medium tier with lower-end SATA SSDs to improve performance even more. Let me know if you'd want to discuss.

Too late; fairly sure he mentioned in a recent video (Floatplane-only, going from memory) that he's already purchased the extra SSDs.


1 minute ago, leadeater said:

Too late; fairly sure he mentioned in a recent video (Floatplane-only, going from memory) that he's already purchased the extra SSDs.

Ahh, that's unfortunate. In a future video he could turn the storage server into an expandable storage box and Mini-SAS it to the NVMe server. That could effectively give him huge amounts of storage to use as the cold tier.

