
Moving 200TB of Data quickly?

How can I move 200TB of data off of my NAS, re-format, and put it back without taking weeks to do so?

The best solution I can come up with that doesn't cost an arm and a leg is getting some high-capacity SSD arrays to temporarily store the data. Does anyone have a better idea?

 

A bit of background of why this needs to happen:

I'll start off by saying I am NOT an IT professional or a network engineer. But I do know enough to be dangerous - so I'm looking for suggestions from people who know more than me.

 

I manage a video editing NAS for a small corporate video team. It's a Synology Rack Station with 300TB of storage. It's currently divided into 2 volumes - since Synology limits volume size to 200TB. Since my team uses this server only for one purpose - it's severely limiting to have an arbitrary subdivision of storage.

 

According to the Synology documentation - you can do up to 1PB volumes on our model IF you are using Raid 6. Unfortunately I'm currently using Raid 5. Thus my conundrum of having to relocate ALL of the data.

 


What model is the RackStation, and what's the network setup?

 

I'd buy a second RackStation/NAS and fill it with drives. It should copy pretty fast over 10/25/40GbE. Then you have a backup server in case something happens to the main one.

 

 


It's an RS4017xs+ with 2 expansion units filled with 10TB drives, so buying a duplicate setup might be a bit costly for us. As it stands, our backup is LTO-based, so speed is an issue there.



There has to be a firm that rents out servers for just this scenario, or that offers it as a service. It's not an uncommon occurrence.

Either way, I'd figure you should be prepared to pay a good bit out of pocket.

 


Shooting from the hip - buying a duplicate setup would be around $10k all in. Unfortunately I don't think I can get that kind of budget for this operation.



As long as it's not too heavy, you can probably move it in a couple of minutes.

 


LTO is probably your best bet here, especially if that 200TB is important. I'm not sure how you ended up with 200TB of data, especially important data, but an actual LTO backup system would be worth having not only for the migration but also as a backup. You'd be buying something like 100 LTO-6 tapes for that much data though, so either way, this is going to be expensive.


We do a VERY high volume of content with our team (around 300 videos per quarter), and we pull B-roll from projects from the last 2 years quite often, hence the need for immediate access to that much data.

 

We currently use an LTO8 system which gets us just under 10TB per tape. Each tape takes about 24 hours to write - so that ends up being about 20 days just to get the data off 😞
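
For what it's worth, the tape math above can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: it rounds capacity to 10TB per tape, and the `drives` parameter (writing to several tape drives at once) is a hypothetical addition that isn't part of the setup described here.

```python
import math

def tape_offload_days(data_tb: float, tb_per_tape: float,
                      hours_per_tape: float, drives: int = 1) -> float:
    """Days needed to write data_tb to tape, with `drives` tape drives in parallel."""
    tapes = math.ceil(data_tb / tb_per_tape)
    # Each drive writes one tape at a time, so tapes are handled in rounds.
    rounds = math.ceil(tapes / drives)
    return rounds * hours_per_tape / 24

def implied_write_speed_mb_s(tb_per_tape: float, hours_per_tape: float) -> float:
    """Average write speed implied by filling one tape in the given time (decimal units)."""
    return tb_per_tape * 1e12 / (hours_per_tape * 3600) / 1e6

# Figures from the post: 200TB, roughly 10TB per tape, about 24 hours per tape.
print(f"One drive:  {tape_offload_days(200, 10, 24):.0f} days")
print(f"Two drives: {tape_offload_days(200, 10, 24, drives=2):.0f} days (hypothetical)")
print(f"Implied write speed: {implied_write_speed_mb_s(10, 24):.0f} MB/s")
```

The implied write speed is the useful number: if the tape drive is averaging far less than it should, the bottleneck may be somewhere other than the tape itself.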


...Based on what I'm hearing - sounds like I've gotten myself stuck between a rock and a hard place of time vs money?

$10k to replicate the NAS setup and get it done quick vs using LTO and taking over a month to do this?


I still use LTO6 tapes but in a Superloader.

 

Either way, you should have redundancy for that data. Just having RAID 6 or similar in the enclosure isn't enough for how valuable that data is. Enclosures fail, and sometimes the data isn't recoverable, so having a copy in a secondary location, even as an archive if need be, is worth it.

 

Unless you want to plan for a couple of weeks of downtime, you're going to need a second enclosure if you're planning on reformatting the whole thing. In that case, if there's a performance reason for doing all of this, you could simply buy a new enclosure and have it set up for this purpose. You'd still be spending a couple of days transferring that much data.

 

TLDR: get your organization to buy a secondary enclosure for this and for redundancy. It'll be expensive to have 200TB of RAID5 or RAID6 data, but worth it in the long run. Losing all that data would be extremely costly.


I can't argue with any of these points haha. A couple of weeks of downtime is not something we can get away with - a couple of days is. But I really do need to just get management to cough up the budget 🤣


With time being such a factor here, I would get a second setup running and get people using it. Prioritize the projects currently in use, then transfer the rest in batches. Once the transfer is done, you can sync the two machines and use the old one as a second high-speed backup (once it is reconfigured). Keep in mind that one is none and two is one: critical data should always be backed up elsewhere too, so remember the rule of three.

Whoever needs to approve this should be given the two options and what each will cost. No matter which route you go, time will be a huge factor, and downtime can quickly outstrip $10k for another NAS. If you can get a second NAS running with the currently needed data over a very long weekend and then start moving the rest, it should be a relatively painless transition. Make sure to copy, and don't delete anything until the new one is ready to roll. Your suggestion of using a TON of SSDs could work, but it opens you up to a serious risk of data loss and will take a very long time.
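
One way to act on the "copy, don't delete" advice is to verify the copy before touching the original. Here is a minimal sketch in Python, assuming both the old and new storage are mounted as ordinary directories; the function names are made up for illustration and this is not a Synology or TrueNAS tool.

```python
import hashlib
from pathlib import Path

def file_sha256(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large video files never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_copy_problems(source_root: Path, dest_root: Path) -> list[str]:
    """List files under source_root that are missing or different under dest_root."""
    problems = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = dest_root / rel
        if not dst.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        # Compare sizes first; only hash when the sizes match.
        elif (src.stat().st_size != dst.stat().st_size
              or file_sha256(src) != file_sha256(dst)):
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

An empty list from `find_copy_problems` means every source file has an identical copy at the destination. Hashing re-reads everything on both sides, so on hundreds of terabytes it adds real time and is probably best run per project batch.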


Time is money: how much do they want to lose to downtime? A single week of downtime can easily cost more, between labor, lost business, etc., than another NAS.


If the unit is a few years old with drives bought at the same time, you could easily make the argument that you could have cascading hard drive failures at any time, which could result in a complete loss of data. The only way around this is a secondary enclosure that the data can be replicated to, which would also allow you to do this optimization.

 

Really, it's the right answer. It's kind of insane to think y'all have 200TB of priceless data that isn't replicated to another source. Sure, you might have LTO8 tapes of recent stuff, but unless you have a whole archive of the unit, it's not actually backed up. I imagine you just rotate a set of LTO8 tapes for recovery of recent data, in case it gets corrupted/destroyed.


To be fair, not ALL of the data is what could be considered "priceless". As it stands, LTO holds all of the stuff that's "archived": the projects older than a couple of years. The LTO tapes also have backups of the stuff that is "priceless". That being said, I do feel uncomfortable with how little redundancy we have, even for data that isn't TOP priority. It still represents a lot of value.


I think you need to separate needed data from non-needed data.
Surely they don't need 200TB of data live on the server for one month?
If you can trim that down to 20TB, then you can offload the rest to tape over the course of a week and a half or so, then buy 20ish 2TB NVMe drives and offload the working data to those, verify, do your work, and write it all back. That should be doable in a night or a weekend.

 

 


On some Synology NAS units you can upgrade from RAID 5 to RAID 6, but in this situation it would take weeks to complete the job.

I'm not sure if you can access the array while it's being converted.

What you could do is switch to TrueNAS:

1. Buy a server or PC with a lot of drive sleds (roughly $1k).

2. Buy 150-200TB worth of storage/HDDs, which would be roughly $2k.

3. Set up ZFS with this capacity and copy the first Synology pool.

4. Take the drives (pool 1) out of the Synology and add them to the TrueNAS install.

5. Now copy the second Synology pool.

6. Add the remaining HDDs from the Synology NAS (pool 2) to TrueNAS as hot spares, or put them in cold storage.


My organization has a graphics archive of similar structure/value. We had redundant chassis with about 26TB each, and one failed (the primary). I was able to use the secondary to duplicate to a new chassis.

 

Having just one chassis to duplicate from is miles better than having to pull from tape in the event of multiple drive failures. I would strongly suggest that if they want to reformat the current storage unit, they simply buy another one so that the formatting can be done without potentially losing data. The old unit can act as a backup.

 

200TB is a lot of data, so nothing to scoff at. It should be redundant on both disk and tape, the tape being off site or remote enough to survive a fire.


This sounds very doable - and I do run TrueNAS at home - so I do have at least some experience there haha.

But where are you finding 200TB of storage for $2k? Please tell me where so I can buy it all! Haha


150TB is possible. 

200TB would be a good offer.

 

So if you could trim one pool down to less than 150TB, this would be enough.


I have some bad news: taking weeks is essentially unavoidable at any semblance of reasonable cost.

 

Absolute best case, if you can sustain 400MB/s, it's roughly 5 to 6 days for each copy, so 10 to 12 days in total.
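
The estimate above is easy to rerun for other speeds. A minimal sketch in Python, assuming decimal units (1TB = 10^12 bytes) and a throughput that really is sustained for the whole copy; the 10GbE figure is the theoretical line rate, which a real copy of many files is unlikely to reach.

```python
def transfer_days(data_tb: float, throughput_mb_s: float) -> float:
    """Days needed to move data_tb terabytes at a sustained throughput in MB/s."""
    seconds = data_tb * 1e12 / (throughput_mb_s * 1e6)
    return seconds / 86_400

def link_rate_mb_s(gigabits_per_second: float) -> float:
    """Theoretical line rate of a network link, converted from Gbit/s to MB/s."""
    return gigabits_per_second * 1000 / 8

one_way = transfer_days(200, 400)
print(f"200TB at 400MB/s: {one_way:.1f} days one way, {2 * one_way:.1f} days there and back")
print(f"200TB at 10GbE line rate: {transfer_days(200, link_rate_mb_s(10)):.1f} days one way")
```

At 400MB/s this works out to a little under 6 days per direction, which is in the same ballpark as the figure quoted above.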

 


Why not just convert the RAID level from 5 to 6?

https://kb.synology.com/en-br/DSM/help/DSM/StorageManager/storage_pool_change_raid_type?version=7


2 hours ago, Agall said:

-snap-

Sounds like you and @Fatty 227 both need to start looking at different storage solutions with more built-in redundancy, like a NetApp or, as a cheaper option, a QNAP dual-controller ZFS model.

Going with NetApp, and probably QNAP too, you'd gain access redundancy through HA controllers and dual paths to all disks via dual redundant SAS controllers in the disk shelves, so there is no single point of failure. To be totally up front though, NetApp is decently more expensive than Synology/QNAP.

Obviously NetApp and QNAP are not the only options in this regard, just what I have used most commonly. HPE also has many good options, along with Dell/EMC etc.

P.S. You could be cheeky and rent an AWS Snowball Edge NVMe, which has 210TB of capacity, copy your data to that and back, and then tell AWS "nah, never mind, we changed our mind, here is your equipment back". Just don't tell me you did this 😉


@leadeater - Unfortunately it won't let me change the RAID type as I'm two deep, so to speak. The pool is made of 3 RAID 5 arrays together, so it's not giving me the option to change. Also, without a complete snapshot backup, that scares me haha.

 

10 Days I'd say is doable down time. A month is not.

 

Hahahahaha not me contacting AWS right now 🤣
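
For anyone weighing the RAID 5 to RAID 6 rebuild, here is a rough Python sketch of what the extra parity drive costs in usable space. The layout in the example (3 arrays of 12 x 10TB drives) is a guess for illustration only, since the actual drive count per array isn't stated in this thread, and the result ignores filesystem overhead.

```python
def usable_tb(arrays: int, drives_per_array: int, drive_tb: float,
              parity_drives: int) -> float:
    """Usable capacity of a pool built from identical RAID arrays, before filesystem overhead."""
    if drives_per_array <= parity_drives:
        raise ValueError("each array needs more drives than parity drives")
    # RAID 5 gives up one drive per array to parity, RAID 6 gives up two.
    return arrays * (drives_per_array - parity_drives) * drive_tb

# Hypothetical layout: 3 arrays of 12 x 10TB drives each.
raid5 = usable_tb(3, 12, 10, parity_drives=1)
raid6 = usable_tb(3, 12, 10, parity_drives=2)
print(f"RAID 5: {raid5:.0f}TB, RAID 6: {raid6:.0f}TB, difference: {raid5 - raid6:.0f}TB")
```

With identical drives, each array gives up one more drive's worth of space under RAID 6, so it's worth checking that the data still fits before committing to the rebuild.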


I think you may have to add more disks (3 in your case) for the option to come up. I've never done a migration with Synology or really used them, so I don't know the practical details of doing it on their gear. The problem is spending money on disks only to find out it still won't let you do it.

Edit:

Actually never mind, I think you are using the RAID Group feature, for which the Synology documentation says:

Quote

The RAID type of RAID arrays in a RAID Group cannot be changed. The RAID Group feature is available on specific models only.

 


I would take the approach of "if it ain't broke, don't fix it", unless the company is adamant that they want a single volume.

If it's a mixture of "Working Data" and "Archive Data", then I would do a data shuffle to redesign the volumes into working projects and archived projects, if that makes it easier for them to find what they're looking for.


The problem with those solutions is cost versus benefit, along with the higher upfront cost. It's also still putting your eggs in one basket, where it's susceptible to other types of failures compared to a separate redundant unit.

 

I've found it difficult to make even the most robust argument on why the company should spend more upfront. It took a hilarious amount of effort just to convince them to spend money on a robust office printer rather than continuing to buy and maintain a dozen small/medium office printers that fail after a couple years of heavy use (cost/click was probably the only reason they ended up going with it). 

 

In this case, attempting to sell a solution with $10k+ upfront costs would likely prevent them from doing it altogether. I'm not saying it isn't worth proposing, just that it's very unlikely for a small/medium company.

 

Even with redundancy within the chassis, you're still relying on a single layer that could otherwise be duplicated for about the same cost.

 

Most executives seem like they'd rather run something till it dies than prevent downtime with proactive replacement. Maybe it's different if there's a technically educated executive who would know better.

 

I'm not in sales, but I have seen two different groups of senior executives as a civilian with the same role who both followed the same characteristics.
