
These Servers are TOO EXPENSIVE

nicklmg

With your resources, Linus.

 

 

I believe you could get hold of some of those Optane DIMM modules...? (No, not that Optane Memory mess Intel released first.)

Edit: the official name is Optane DC Persistent Memory.

 

Use those as a "shit ton" of RAM for ZFS
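
If the DIMMs ran in Memory Mode they would just show up as ordinary RAM, and the only ZFS-side tweak would be letting the ARC grow into it. A minimal sketch of that tuning, assuming ZFS on Linux; the 1 TiB figure is purely illustrative:

```python
# Hypothetical sketch: let the ZFS ARC grow into Optane PMem exposed as plain RAM.
# Assumes ZFS on Linux, where zfs_arc_max is a runtime-writable module parameter.
ARC_MAX_BYTES = 1 * 1024**4  # ~1 TiB ARC cap; size it to the installed DIMM capacity

with open("/sys/module/zfs/parameters/zfs_arc_max", "w") as f:
    f.write(str(ARC_MAX_BYTES))
```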

Can Anybody Link A Virtual Machine while I go download some RAM?

 


Also... Linus.

 

 

Now that there are 1.5 TB 2.5" Optane 905P drives...

Why not upgrade the NVMe storage server you have full of Intel 750 SSDs?

Can Anybody Link A Virtual Machine while I go download some RAM?

 


Well, it could have been Epyc...

...or you could just shell out ten grand for one of those 1U Intel Ruler SSD storage servers.

"Mankind’s greatest mistake will be its inability to control the technology it has created."


At our office we tried to implement a tiered storage system for a 40 TB server (20 x 2 TB HDDs in RAID 0, no redundancy) with 4 x 200 GB SSDs for writing large amounts of data. Unfortunately, we ran into a limitation with the write-back cache size (1 GB). Our cluster of 15 machines, each with two Xeon E5-2690 v3 CPUs (24 cores per machine), was able to fill that write-back cache pretty quickly, so the write speed reverted to that of the RAID 0 HDD array. The SSDs were basically useless. I know this is write vs. read, but we were disappointed with the performance overall. Our newer storage system is 20 TB made of 20 x 1 TB SSDs in RAID 0. Much faster at writing than the older system with the tiered storage. It cost more... but it was well worth it.


Have you thought about using a framework like Apache Ignite? I know you'd have to sort out the same sort of policies around data aging and prioritization, but it is a scalable solution. Ignite is an in-memory data fabric. From what I understand, Ignite can work with a distributed file system like HDFS, GPFS (aka Spectrum Scale), etc. Then again, 7k for ~8.4 TB of NVMe drives sounds less expensive than however many servers you would need to have 8.4 TB of available memory.

 

I know Ignite is used by a major SaaS provider to serve their MySQL DB in memory for their clients.
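
For what it's worth, Ignite has thin clients too, so you don't have to touch it from Java. A minimal sketch with the Python thin client (pyignite); the host and cache name are made up for illustration:

```python
# Minimal sketch using the Apache Ignite Python thin client (pyignite).
# The host, port and cache name below are placeholders, not a real deployment.
from pyignite import Client

client = Client()
client.connect("ignite-node.example", 10800)   # 10800 is the default thin-client port

# A cache behaves like a distributed key-value store backed by the cluster's memory.
cache = client.get_or_create_cache("project-index")
cache.put("PROJ1", "hot tier, ~40 GB")
print(cache.get("PROJ1"))

client.close()
```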

 

Anyways just thought I'd throw it out there. 


Let me start by saying I don't have nearly the amount of data or the workload you guys have.

But here's what I did: I'm using FreeNAS with 2 NVMe drives in RAID 0 where I dump my files from the SD cards. Then I copy the files to my local NVMe for editing. I export locally to NVMe and then send the exported file back to the NVMe RAID 0. From there I copy everything to the HDD RAID 1 I have set up (2 x 10 TB disks).

While the files are being copied from NVMe to HDD, the speed doesn't really matter because you're not actively using them anymore.

 

At some point I'll create some automation for this; I got the idea from AWS S3: land everything in one tier, and once you're no longer touching a file, move it to Glacier and call it a day.
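
That automation could be as small as a nightly cron job that sweeps anything untouched for a couple of weeks from the NVMe stripe to the HDD mirror. A rough sketch, with placeholder paths and a made-up 14-day threshold:

```python
# Rough sketch of the "sweep cold files from the NVMe stripe to the HDD mirror" idea.
# Paths and the 14-day threshold are placeholders only.
import shutil
import time
from pathlib import Path

FAST = Path("/mnt/nvme-stripe/ingest")     # where SD card dumps and exports land
SLOW = Path("/mnt/hdd-mirror/archive")     # 2 x 10 TB RAID 1
MAX_AGE_DAYS = 14

def sweep():
    SLOW.mkdir(parents=True, exist_ok=True)
    cutoff = time.time() - MAX_AGE_DAYS * 86400
    for item in FAST.iterdir():
        # Treat the last modification time as a "still being worked on" signal.
        if item.stat().st_mtime < cutoff:
            shutil.move(str(item), str(SLOW / item.name))

if __name__ == "__main__":
    sweep()
```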


Storage tiering works great in Ceph, though it would be overkill for your use case.


Spending thousands of dollars on storage is always a pain, but quite necessary. Going forward, with increasing data and complexity, it becomes a big pain to avoid performance issues. At some point it makes sense to invest in some type of tiered storage from any vendor. After using and maintaining HPE 3PARs, going to Cisco UCS was the best thing for me: it gives me something that is really easy to manage and says goodbye to most of the bottleneck issues you might normally see. Sooner is always better, as migrating big data sets is a great pain in the butt, especially if you have a lot of datastores and VMs.

Hope you find the best fit for you :). Love LTT, hope you do another maintenance video as it was the funniest thing I have ever seen :D


I'm sure Linus will get this all sorted, just in time for his employees to convince him to quadruple the number of pixels for the third time.  


As I watched this video, a feeling crept over me that approaching this problem with a Windows Server solution won't give you the full control you're wishing for. I'm a Linux server admin, and building a solution like the one you described in the video is pretty straightforward.

 

You set up your server with both high-speed, low-capacity drives and mechanical drives; for the sake of argument, two 1 TB NVMe SSDs and four 10 TB HDDs (plus a small SSD for the operating system). You create two file systems, ZFS or Btrfs: one of 2 TB on the NVMe SSDs and one of 20 TB mirrored on the HDDs. Both file systems are exposed to the network as Samba shares, one for "workflow" and one for archives.

Now you create an archiving/de-archiving script which decides when to move projects around. Let's assume every project starts as a folder with the project's name and all of its files below it. I know a project will be around 400 GB, but say it's a tenth of that size for the sake of this argument.

We declare "enough space" as 25% or more left on the fast drives, so 500 GB or more free is "enough space". If the script runs once a day, during night time when probably no one edits? , this leaves with an example of 40 GB per project space for 12 new projects during the day before the script runs again.

 

First the script performs an inventory. This is done using the Linux file attribute "access time" (atime), which is updated every time a file is opened, whether or not it has been edited. It scans the atime on the fast drives and creates a list of the projects ordered by atime: most recently accessed first, projects that haven't been accessed for a while last. Along with this inventory, the total size of each project is also recorded. The list might look something like this (the first line is a header):

project  atime       size (GB)
PROJ1    2019-03-04  40.5
PROJ3    2019-03-04  39.3
PROJ2    2019-03-02  40.3
(and so on, you get the picture)

 

Second, another list is created of all the projects in the archive, with their atimes. If a project there is accessed, the script notes that this project is becoming "important" again and flags it to be moved back to fast storage.

 

You can choose to leave files on fast storage until you run low on space (say at 75% disk usage), or move a project to archive by default when it hasn't been accessed for a week. Likewise, move a project from archive back to fast storage when it has been accessed in the past day, keeping track of disk usage on the fast drives and, if needed, flagging the project with the oldest atime on fast storage to be moved to archive.

 

With a script like this you can be pretty versatile and build it completely to your liking. It's also possible to present "one network share" with both file systems in it, keeping the user experience (as shown by Taran in the video) as non-techy as possible.
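
To give an idea of the size of such a script, here is a minimal sketch of the demotion half of the logic described above; the paths, capacity and 75% threshold are examples only, and a real version would add the promotion pass, logging and safety checks:

```python
# Illustrative sketch of the atime-based demotion pass described above.
# Paths, capacity and thresholds are examples, not a finished tool.
import shutil
from pathlib import Path

FAST = Path("/tank/fast/workflow")     # NVMe file system, shared via Samba
ARCHIVE = Path("/tank/slow/archive")   # mirrored HDD file system, shared via Samba
FAST_CAPACITY_GB = 2000
TARGET_USAGE = 0.75                    # start demoting once more than 75% is used

def inventory(root: Path):
    """Return (project, newest_atime, size_gb) rows, most recently accessed first."""
    rows = []
    for project in (p for p in root.iterdir() if p.is_dir()):
        files = [f for f in project.rglob("*") if f.is_file()]
        if not files:
            continue
        newest_atime = max(f.stat().st_atime for f in files)
        size_gb = sum(f.stat().st_size for f in files) / 1e9
        rows.append((project, newest_atime, size_gb))
    return sorted(rows, key=lambda r: r[1], reverse=True)

def demote_stale_projects():
    rows = inventory(FAST)
    used_gb = sum(size for _, _, size in rows)
    while rows and used_gb / FAST_CAPACITY_GB > TARGET_USAGE:
        project, _, size_gb = rows.pop()   # oldest atime = best demotion candidate
        shutil.move(str(project), str(ARCHIVE / project.name))
        used_gb -= size_gb

if __name__ == "__main__":
    demote_stale_projects()
```

One caveat: most Linux distros mount with relatime, which throttles atime updates, so either mount with strictatime or fall back to mtime if you want the inventory to track access faithfully.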

 

@LinusTech I am open to help exploring this possibility.


Dunno if it's a good idea, but you could sync Beyond Compare with the calendar and automatically put the active projects on WHONNOCK and inactive ones on DELTA 1/2, then use the leftover space on the QQ server for projects with scheduled client meetings.
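
A toy sketch of that calendar-driven placement, just to show the shape of the decision; the project names, dates and share paths are made up, and a real version would pull the dates from the actual calendar:

```python
# Toy sketch of calendar-driven placement: projects with an upcoming meeting stay on
# the fast server, the rest go to the slower one. All names, dates and paths are
# made up for illustration.
from datetime import date, timedelta

FAST_SHARE = "//WHONNOCK/projects"     # placeholder paths
SLOW_SHARE = "//DELTA/projects"
ACTIVE_WINDOW = timedelta(days=7)

# In reality this would come from the calendar; hard-coded here as an example.
next_meeting = {
    "PROJ1": date(2019, 3, 8),
    "PROJ2": date(2019, 4, 2),
}

today = date(2019, 3, 4)
for project, meeting in next_meeting.items():
    target = FAST_SHARE if meeting - today <= ACTIVE_WINDOW else SLOW_SHARE
    print(f"{project} -> {target}")
```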


This is why I always say cache drives are stupid.

The data is constantly moving back and forth between HDD and SSD, resulting in higher drive usage that impedes regular use, as well as more drive wear.

It is much better to have separate drives and manually choose what goes where, so that you get to choose exactly what is sped up and what is not.

NEW PC build: Blank Heaven   minimalist white and black PC     Old S340 build log "White Heaven"        The "LIGHTCANON" flashlight build log        Project AntiRoll (prototype)        Custom speaker project


Ryzen 3950X | AMD Vega Frontier Edition | ASUS X570 Pro WS | Corsair Vengeance LPX 64GB | NZXT H500 | Seasonic Prime Fanless TX-700 | Custom loop | Coolermaster SK630 White | Logitech MX Master 2S | Samsung 980 Pro 1TB + 970 Pro 512GB | Samsung 58" 4k TV | Scarlett 2i4 | 2x AT2020

 


30 minutes ago, Tyronialy said:

I'm a Linux server admin, and building a solution like the one you described in the video is pretty straightforward.

 

You set up your server with both high-speed, low-capacity drives and mechanical drives; for the sake of argument, two 1 TB NVMe SSDs and four 10 TB HDDs (plus a small SSD for the operating system). You create two file systems, ZFS or Btrfs: one of 2 TB on the NVMe SSDs and one of 20 TB mirrored on the HDDs. Both file systems are exposed to the network as Samba shares, one for "workflow" and one for archives.

That's not actually storage tiering, it's archiving, and you can do that with Windows too. What I actually recommended was similar to what you suggested, but as a more standard, automated system. File archiving with stubbing is, I think, the best solution for having fast storage for live projects and lots of capacity, with no manual or scripted movement of files and no file paths changing; stubbing is pretty great for that.

 

If you move files around, meaning the file paths change, it's pretty annoying for the editors since you have to re-path everything in the projects.
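
Real HSM stubbing happens at the file-system level (typically via reparse points on NTFS), but the reason it keeps editors' paths intact can be illustrated with a symlink analogy; everything below is a toy example, not how an actual stubbing product works internally:

```python
# Toy analogy for why stubbing keeps project paths valid: the data moves to the
# archive tier, but the original path still resolves. Real archiving software uses
# file-system level stubs (e.g. reparse points), not symlinks; paths are placeholders.
import shutil
from pathlib import Path

def archive_with_stub(file_path: Path, archive_root: Path) -> None:
    archive_root.mkdir(parents=True, exist_ok=True)
    target = archive_root / file_path.name
    shutil.move(str(file_path), str(target))   # move the bulk data to slow storage
    file_path.symlink_to(target)               # leave a "stub" at the original path

# Example (hypothetical paths):
# archive_with_stub(Path("/projects/PROJ2/broll.mov"), Path("/archive/PROJ2"))
```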


Get an HP MSA, pack it full of SAS and SSD drives, and set up tiering.


What about just caching the files on the editors' machines? Instead of having the cache in a centralized location, each workstation would have its own cache, and you would also have less network traffic. I'm not sure if Windows supports this natively, but there may be third-party software to do it. The idea would be that when you attempt to open a file in the cache directory and it does not exist (or the last-modified timestamps don't match), Windows (or whatever third-party software) would stream the file into the cache, either in its entirety or in fixed-size chunks, while also serving it to the file handle that is trying to read. The point of "in its entirety" / "chunks" is that the application wouldn't have to explicitly request a small region for the cache to also pull in what is likely to be read next, since it's near the requested region. Although I'm not sure whether Windows, or the file APIs the application uses, already tries to pre-load more data and buffer it in memory.
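
A minimal sketch of that read-through idea, using the last-modified time as the staleness check; the share and cache locations are placeholders, and a real tool would handle chunked transfers and locking rather than whole-file copies:

```python
# Sketch of the per-workstation read-through cache described above.
# SHARE and CACHE are placeholder locations; chunked/partial transfers, locking and
# invalidation are deliberately left out.
import shutil
from pathlib import Path

SHARE = Path(r"\\server\projects")        # central network share (placeholder)
CACHE = Path(r"C:\EditCache\projects")    # local per-workstation cache (placeholder)

def cached_open(rel_path: str, mode: str = "rb"):
    remote = SHARE / rel_path
    local = CACHE / rel_path
    # Refresh the local copy if it's missing or its last-modified time differs.
    if not local.exists() or int(local.stat().st_mtime) != int(remote.stat().st_mtime):
        local.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(remote, local)        # copy2 preserves the modification time
    return open(local, mode)

# Example: with cached_open("PROJ1/timeline.prproj") as f: data = f.read()
```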


Hey Linus, before splurging on more NVMe drives, look into Storage Spaces a little more. The more drives you have in both the cold and hot tiers, the better the performance. As data is moved from cold to hot, it loads the cold data sequentially to avoid random reads. You could also add a medium tier with lower-end SATA SSDs to improve performance even more. Let me know if you'd want to discuss.


2 minutes ago, twstdude0to1 said:

Hey Linus, before splurging on more NVMe drives, look into Storage Spaces a little more. The more drives you have in both the cold and hot tiers, the better the performance. As data is moved from cold to hot, it loads the cold data sequentially to avoid random reads. You could also add a medium tier with lower-end SATA SSDs to improve performance even more. Let me know if you'd want to discuss.

Too late; fairly sure he mentioned in a recent video (Floatplane-only, going from memory) that he's already purchased the extra SSDs.


1 minute ago, leadeater said:

Too late; fairly sure he mentioned in a recent video (Floatplane-only, going from memory) that he's already purchased the extra SSDs.

Ahh, that's unfortunate. In a future video he could turn the storage server into an expandable storage box and Mini-SAS it to the NVMe server. That could effectively give him huge amounts of storage to use as the cold tier.

