
Request for advice on PB-storage solutions

I turn to the LTT forum to seek some advice on how to handle storage needs that go beyond what normal consumers usually face.

 

I work on a project in which we expect to produce several PB of data (initially 2-10 PB, but maybe eventually upwards of 50 PB, not counting data backup/redundancy). The data will be a mix of very large files (250+ GB) and millions of small files. The files will be generated on cameras equipped with a 10GbE port and fast flash memory, and will need to be unloaded into our storage solution. The data will be processed in parallel with new data being generated, so high read/write speed is necessary.
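To give a rough sense of scale for the numbers above, here is a back-of-envelope sketch. It assumes decimal units, a 10GbE link running at its full nominal line rate, and no protocol or filesystem overhead, so real transfers will be slower. The function name `transfer_seconds` is just an illustrative helper.

```python
# Back-of-envelope ingest arithmetic; all figures are idealized assumptions.
GB = 10**9   # decimal gigabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes

LINK_BITS_PER_S = 10 * 10**9            # 10GbE nominal line rate
LINK_BYTES_PER_S = LINK_BITS_PER_S / 8  # 1.25 GB/s, ignoring protocol overhead


def transfer_seconds(size_bytes: float, links: int = 1) -> float:
    """Seconds to move size_bytes over `links` parallel 10GbE links at line rate."""
    return size_bytes / (LINK_BYTES_PER_S * links)


one_file_s = transfer_seconds(250 * GB)          # a single 250 GB file
two_pb_days = transfer_seconds(2 * PB) / 86_400  # 2 PB over a single link

print(f"250 GB file over one link: {one_file_s:.0f} s")
print(f"2 PB over one link: {two_pb_days:.1f} days")
```

Even under these optimistic assumptions, a single link needs weeks to move 2 PB, so the number of cameras unloading at the same time matters as much as the total capacity.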

 

My question is whether anyone here has suggestions on where to look for solutions, either bought as a complete system or custom built. I'm aware of a few companies that deliver petabyte-scale solutions, but I may be overlooking an obvious solution to my problem.

 

Furthermore, what kind of disk scheme would you guys suggest? A RAID-based system, such as RAID 5/6/10, JBOD, or something else?
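For comparing the schemes listed above, a minimal sketch of the capacity overhead might look like this. The 20 TB drive size and 10-drive groups are assumptions picked for illustration, not recommendations, and the helper names are hypothetical. It only counts drives; it says nothing about rebuild times or performance.

```python
import math


def data_drives(level: str, group_size: int) -> int:
    """Drives per group that hold user data (the rest hold parity or mirrors)."""
    if level == "jbod":
        return group_size            # no redundancy at all
    if level == "raid5":
        return group_size - 1        # one drive's worth of parity
    if level == "raid6":
        return group_size - 2        # two drives' worth of parity
    if level == "raid10":
        return group_size // 2       # every drive is mirrored
    raise ValueError(f"unknown level: {level}")


def drives_needed(usable_tb: int, level: str, group_size: int, drive_tb: int) -> int:
    """Total drives required to reach usable_tb, in whole groups."""
    usable_per_group = data_drives(level, group_size) * drive_tb
    groups = math.ceil(usable_tb / usable_per_group)
    return groups * group_size


# Assumed figures: 2 PB usable, 20 TB drives, groups of 10 drives.
for level in ("jbod", "raid5", "raid6", "raid10"):
    print(level, drives_needed(2000, level, group_size=10, drive_tb=20))
```

With these assumed figures the counts come out at 100, 120, 130 and 200 drives respectively, which shows how quickly the choice of scheme changes the hardware bill at this scale.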


Thanks for your input!


5 minutes ago, SimonRibergaard said:

I turn to the LTT forum to seek some advice on how to handle storage needs that go beyond what normal consumers usually face.

 


Honestly, when we are talking about PBs, you should go with a professional company that can offer a complete package. Apart from the storage itself, the performance needs to be there, and the network needs to handle everything as well.

 

Maybe it's a good idea to contact a company like 45drives and ask them for help. This looks like a very big task that one person shouldn't take on alone.



As said above, at that scale you're better off with enterprise solutions that take into account your exact needs, usage, and budget.



Thanks for both of your quick replies!
 

I’m already talking to 45drives, but with an investment of this size, we will most likely have to put out a tender, so I want to make sure that I check out the different options available and that I don’t miss an obvious solution :)

 

Again, appreciate your input!


For that amount of data, and at the rates you probably want to unload it, you should be looking at SAN solutions. I'd talk with companies like NetApp and Nimble to see what they can offer and what they recommend, rather than going to a forum on the internet to ask about RAID levels. There is *so* much more to worry about when you're looking at this level of enterprise data storage. Also consider the backup and monitoring solutions you need to put in place for all that storage: Veeam has excellent integration with both of the above vendors' appliances, and something like Grafana is great for agentless monitoring. Consider access to the storage as well; you probably want an Active Directory environment for role-based access, or at the very least LDAP.



You need to be more specific about your needs. What you will most likely need is NAS (file storage), not block storage.

You mention 'high r/w speeds', but you need to be considerably more specific. At >PB scale you will often end up in NL (nearline, aka SATA) territory for drives, simply because it will be a lot of drives. If you really need extreme performance, you'll need to go the SSD/NVMe route, which is a LOT more expensive, especially since you say you intend to store video, which cannot be compressed or deduplicated effectively.
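As a rough illustration of why drive count matters here, the sketch below compares how many nearline drives you would need for capacity against how many you would need just to absorb the ingest stream. Every figure is an assumption for illustration: 20 TB drives, 200 MB/s sustained sequential throughput per drive, and 8 cameras unloading simultaneously at 10GbE line rate. It ignores redundancy overhead and the small-file workload, where per-drive performance is far lower than the sequential figure.

```python
import math

DRIVE_TB = 20           # assumed nearline drive capacity
DRIVE_MB_PER_S = 200    # assumed sustained sequential rate per drive
CAMERA_MB_PER_S = 1250  # 10GbE nominal line rate, ignoring overhead


def drives_for_capacity(usable_pb: int) -> int:
    """Drives needed to hold usable_pb, with no redundancy overhead."""
    return math.ceil(usable_pb * 1000 / DRIVE_TB)


def drives_for_throughput(cameras: int) -> int:
    """Drives needed to absorb `cameras` simultaneous line-rate streams."""
    return math.ceil(cameras * CAMERA_MB_PER_S / DRIVE_MB_PER_S)


capacity_drives = drives_for_capacity(2)      # 2 PB usable
throughput_drives = drives_for_throughput(8)  # 8 hypothetical cameras

print(f"drives for capacity:   {capacity_drives}")
print(f"drives for throughput: {throughput_drives}")
```

Under these assumptions capacity already demands more spindles than sequential ingest does, which is why exact requirements (concurrency, file-size mix, read load during processing) decide whether nearline drives are enough or flash is needed.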

Also, with the workload you mention, tiering does not seem likely to be efficient.

You need more specific requirements for the RFP and tender, and then see what the 'big boys' (NetApp, EMC, HPE, ...) offer.
