Backing up multiple TB of data!?

Hihi,

 

I'm looking for some advice on backing up large quantities of data to an offsite storage provider.

This is for the company I work for so we're dealing with important data and lots of it. In the realm of 32TB and more.

 

I obviously cannot go into detail about the company's practices.

 

We have two 16TB Windows shares that hold expensive data that's mainly output from sensors and such.
It can cost £££££'s to recapture this sensor data so it's extremely important it's backed up.

 

We're looking at backing it up to something like Amazon Glacier. However we have no idea how we'd get the data from the shares to the storage provider.

We want to keep storage size low, for example by taking full snapshots every week or two with differential backups in between, and being able to delete old full snapshots.

 

We've recently purchased a 48TB Synology storage server we can use as a temporary storage location.
E.g. copying the shares to the storage server, then performing the backup on that rather than trying to back up from a Windows share directly.

 

In our previous experience with two different backup solution providers, both failed to manage this: one couldn't handle the quantity of data, and the other couldn't copy from the Windows shares without file read errors.

 

Any advice on how other companies manage to back up large amounts of data to the cloud reliably would be really appreciated.

I've looked at Arq, Veeam and others but can't find one that matches our needs.

 

Many thanks!


Backblaze is quite good at $5 per terabyte per month, it works well with all kinds of files, and it's quite reliable. They also have Synology integration.

But do keep in mind that it is only for backups, not for long term storage.

I only see your reply if you @ me.

This reply/comment was generated by AI.


3 minutes ago, Origami Cactus said:

Backblaze is quite good at $5 per terabyte per month, it works well with all kinds of files, and it's quite reliable. They also have Synology integration.

But do keep in mind that it is only for backups, not for long term storage.

I did look into this, however I'm in need of a cold archive system where files can be pulled from within the last 7-10 years due to law and regulation surrounding the data we process where I work.

 

It's a nightmare trying to keep so much data for so long when 1-2TB of data can be produced in one week for one project.


2 minutes ago, MajesticFudgie said:

I did look into this, however I'm in need of a cold archive system where files can be pulled from within the last 7-10 years due to law and regulation surrounding the data we process where I work.

 

It's a nightmare trying to keep so much data for so long when 1-2TB of data can be produced in one week for one project.

Mhm. I thought Backblaze also had a cold archive feature if needed, idk if they still have it.

Well then, Amazon's cold archive is pretty good too.


5 minutes ago, Origami Cactus said:

Mhm. I thought Backblaze also had a cold archive feature if needed, idk if they still have it.

Well then, Amazon's cold archive is pretty good too.

Yeah,

 

The main issue is getting the data into the cloud with the ability to get rid of old backups and grab a file that's needed at a moment's notice.

 

Wasabi looks like a decent storage provider. But again, getting a large amount of data into the cloud reliably is an issue.


Just now, MajesticFudgie said:

Yeah,

 

The main issue is getting the data into the cloud with the ability to get rid of old backups and grab a file that's needed at a moment's notice.

 

Wasabi looks like a decent storage provider. But again, getting a large amount of data into the cloud reliably is an issue.

If you're in the US, Amazon Glacier had a feature where they send a van to you with lots of HDDs, and then they pick your data up. IDK how much it costs.


I was going to say that you should just back up locally, ship the drives to a datacenter, and have them mount the drives if you need access. Or build your own server, colocate it, ship the drives there, and have them mounted.


Backblaze B2 will serve you well for the file storage. For managing which files are uploaded and retrieving specific ones, there are many cloud data management companies that work with B2 as a storage backend. I don't have any specific ones to recommend. Many of the content management companies are set up for video management, but I know there are some that do general file management.

Looking to buy GTX690, other multi-GPU cards, or single-slot graphics cards: 

 


AWS has their Snowball, which is effectively a portable file server in a case specifically meant for transport. They ship the Snowball to you, you load the data onto it, and you ship it back to AWS, where they load your data into an S3 bucket.


5 minutes ago, 2FA said:

AWS has their Snowball, which is effectively a portable file server in a case specifically meant for transport. They ship the Snowball to you, you load the data onto it, and you ship it back to AWS, where they load your data into an S3 bucket.

Backblaze has the same thing, which they call Fireball. While either one helps with the initial ingest, it doesn't answer the question of how new data will be uploaded, or how both new and existing data will be retrieved. That is where a software provider that uses AWS or Backblaze as a backend comes in.


With that amount of data generated per week, if you are wanting it offsite I am hoping you have a decent enough connection.

 

Personally I would just host an off-site backup file server and then use Veeam as the backup tool to back up the server with the shares on it to the off-site backup file server. You can create a local backup copy first (and have it driven to the server); that way you don't have to transfer terabytes over the connection. Just remember to do encrypted backups (and with Veeam use WAN target). Just my opinion though; it could greatly depend on how you actually have things set up and what is at your disposal.
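
To put rough numbers on the connection question, here is a back-of-the-envelope sketch. The uplink speeds and the 80% efficiency factor are assumptions picked for illustration; only the 32TB total and the 1-2TB per week come from the thread.

```python
def transfer_days(terabytes: float, uplink_mbps: float, efficiency: float = 0.8) -> float:
    """Days needed to upload `terabytes` (decimal TB) over an uplink of
    `uplink_mbps` megabits per second.

    `efficiency` is an assumed fraction of the line rate that is actually
    usable after protocol overhead and other traffic.
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (uplink_mbps * 1e6 * efficiency)
    return seconds / 86400


if __name__ == "__main__":
    # Hypothetical uplink speeds; substitute the real one.
    for mbps in (100, 1000):
        print(f"{mbps} Mbps: initial 32TB takes {transfer_days(32, mbps):.1f} days, "
              f"2TB of weekly changes takes {transfer_days(2, mbps):.1f} days")
```

Under those assumptions the initial 32TB takes roughly 37 days on a 100 Mbps uplink, which is why seeding the first copy by physically moving disks is worth considering, while the weekly 1-2TB is manageable over the wire.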

3735928559 - Beware of the dead beef


16 hours ago, MajesticFudgie said:

The main issue is getting the data into the cloud with the ability to get rid of old backups and grab a file that's needed at a moment's notice.

That's where a decent backup product comes in, or a data management tool to manage the archiving of data.

 

The bigger question I have is whether you are actually looking for an archive solution or a backup solution. These are different things that serve different needs, even though they are very similar. It sounds to me like you need archiving, but traditional backups will work for you too.

 

I would look at something like Commvault with Solution Set licensing. Get a DL160/DL360-like server with dual 2x SSD RAID 1 storage for the DDB and Index Cache, dedup the data to your Synology NAS, then AUX Copy the data into the cloud. You'll save a lot of cloud storage cost by sending it to cloud storage deduplicated, and Commvault will manage the data retention and pruning of data.
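
As a rough illustration of what "retention and pruning" means when fulls and differentials are mixed, here is a minimal sketch. It is not how Commvault does it; the `Backup` record, the `prune` function and the `keep_days` window are all hypothetical names made up for this example. The one rule it demonstrates is that a full can only be deleted once no retained differential still depends on it.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Backup:
    name: str
    taken: date
    base: Optional[str] = None  # name of the full this differential needs; None for a full


def prune(backups: List[Backup], today: date, keep_days: int) -> Tuple[List[Backup], List[Backup]]:
    """Split backups into (keep, delete) for a simple age-based retention window."""
    cutoff = today - timedelta(days=keep_days)
    fulls = [b for b in backups if b.base is None]
    # Differentials inside the window are kept.
    diffs_kept = [b for b in backups if b.base is not None and b.taken >= cutoff]
    # Fulls inside the window are kept, plus any full a kept differential depends on.
    needed = {b.name for b in fulls if b.taken >= cutoff}
    needed |= {b.base for b in diffs_kept}
    # Never delete the most recent full, even if it is older than the window.
    if fulls:
        needed.add(max(fulls, key=lambda b: b.taken).name)
    keep = [b for b in fulls if b.name in needed] + diffs_kept
    keep_names = {b.name for b in keep}
    delete = [b for b in backups if b.name not in keep_names]
    return keep, delete
```

With a 7-10 year legal retention requirement the window would be far longer than a couple of weeks, but the dependency rule stays the same.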

 

I'd also be careful of using Glacier just because it looks cheap. There are additional costs around using it, and last I checked you can't put data directly into Glacier; you have to land it in S3 and then set up a tiering policy to move it into Glacier (this may have changed). Azure Archive works in a similar fashion.


4 hours ago, leadeater said:

That's where a decent backup product comes in, or a data management tool to manage the archiving of data.

The bigger question I have is whether you are actually looking for an archive solution or a backup solution. These are different things that serve different needs, even though they are very similar. It sounds to me like you need archiving, but traditional backups will work for you too.

I would look at something like Commvault with Solution Set licensing. Get a DL160/DL360-like server with dual 2x SSD RAID 1 storage for the DDB and Index Cache, dedup the data to your Synology NAS, then AUX Copy the data into the cloud. You'll save a lot of cloud storage cost by sending it to cloud storage deduplicated, and Commvault will manage the data retention and pruning of data.

I'd also be careful of using Glacier just because it looks cheap. There are additional costs around using it, and last I checked you can't put data directly into Glacier; you have to land it in S3 and then set up a tiering policy to move it into Glacier (this may have changed). Azure Archive works in a similar fashion.

Hm,

 

Well, we need to archive older data as well as back up recent data. We need to be able to pull a file that's been deleted/archived at any time in the past several years.

Generally, because we're using a limited-size share, we clear it out by deleting files; they can later be found in backups if they're needed again. A file normally sits for a good few months after the last time it was needed.

 

We're trying to avoid buying more storage; we really want to get data offsite in case of a natural disaster or fire.

 

Wasabi is looking like a decent alternative to Glacier in terms of pricing.


6 hours ago, MajesticFudgie said:

Hm,

 

Well, we need to archive older data as well as back up recent data. We need to be able to pull a file that's been deleted/archived at any time in the past several years.

Generally, because we're using a limited-size share, we clear it out by deleting files; they can later be found in backups if they're needed again. A file normally sits for a good few months after the last time it was needed.

We're trying to avoid buying more storage; we really want to get data offsite in case of a natural disaster or fire.

 

Wasabi is looking like a decent alternative to Glacier in terms of pricing.

Finding a well-priced storage provider isn't really a big problem. Managing the data is more important, so a backup product is a really important factor in being able to make use of the backups you have. If you can't quickly find what you need to restore, and track where it is and what versions of it exist, then you're paying for storage while getting little value out of that expense.

 

One of the key differences between a backup solution and an archive solution in your case comes down to how you are managing the onsite data. Currently you are deleting what you deem no longer required on site, and therefore relying on your backups/alternate copies of the data. You can get an archive product to do this automatically for you using metrics like file access time and last modified time. You can even leave behind a stub of the archived file, so you can see it exists when browsing the share, and when you access/open the file it triggers a recall of the file automatically in the background. There is a delay in the file actually opening, of course.
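
To make the "metrics like file access time and last modified time" part concrete, here is a minimal sketch of how candidates for archiving could be picked. It only lists files; it does not move anything or create stubs, and it is not how any particular archive product works. The function name `archive_candidates` and the `min_idle_days` threshold are made up for the example. Note that access times are not always updated reliably, depending on filesystem settings, so treat them as a hint.

```python
import os
import time
from pathlib import Path
from typing import Iterator, Optional


def archive_candidates(root: str, min_idle_days: float,
                       now: Optional[float] = None) -> Iterator[Path]:
    """Yield files under `root` that were neither read nor modified
    within the last `min_idle_days` days."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                # Unreadable or vanished file: skip it rather than abort the scan.
                continue
            # Use the more recent of last access and last modification.
            if max(st.st_atime, st.st_mtime) < cutoff:
                yield path
```

Usage would be something like `list(archive_candidates(r"\\server\share", 180))`, where the share path and the 180 days are placeholders.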

 

Commvault can do both backup and archiving.

 

You can back up the data directly to cloud/offsite, but I suggested using the Synology since you already have it, and it's better and faster to stage the data to local disk, deduplicated and optimized for copying into the cloud. It is much faster to AUX Copy data to the cloud than it is to copy it directly.

 

Depending on your data type you can save a lot of storage by using deduplicated backups/archives.

 

Example of database backups (MSSQL, MySQL, Postgres etc.) where we take daily fulls, so the difference between each backup is very, very small.

[2 screenshots]

 

VM backups don't deduplicate as well as database backups do

[2 screenshots]

 

File share backups are a bit better than VM backups

[screenshot]

 

I can't show you the file share backups as well as I'd like because we moved to using NetApp SnapVault for those, which means Commvault isn't really doing the backups and deduplication, just telling the storage array what to do. Before this change we had about 580TB of used disk storage for backups, which without deduplication would have been 8PB.

 

We have a secondary copy of on-disk data backups off site at another datacenter; they replicate to each other, but you can't do that. Our longer-term backups go out to LTO-7 tape. Cloud still really can't beat the price of tape backups, but there are other benefits to using cloud backups, like being able to turn that backup into a live VM if your backup tool can do that.
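
For a sense of scale, 8PB stored in about 580TB is a ratio of roughly 13.8:1 (taking 1PB as 1000TB). The toy sketch below shows why daily fulls deduplicate so well, using fixed-size chunks and SHA-256 hashes. This is an illustration of the general idea only, not Commvault's algorithm; `dedup_stats` and the 4096-byte chunk size are assumptions for the example.

```python
import hashlib
from typing import Iterable, Tuple


def dedup_stats(blobs: Iterable[bytes], chunk_size: int = 4096) -> Tuple[int, int]:
    """Return (logical_bytes, stored_bytes) when identical chunks are stored once."""
    seen = set()
    logical = stored = 0
    for blob in blobs:
        for offset in range(0, len(blob), chunk_size):
            chunk = blob[offset:offset + chunk_size]
            logical += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                # First time this content is seen: it costs storage.
                seen.add(digest)
                stored += len(chunk)
    return logical, stored


if __name__ == "__main__":
    # Two "daily fulls" that differ only in their last chunk.
    day1 = b"".join(bytes([i]) * 4096 for i in range(10))
    day2 = day1[:-4096] + bytes([99]) * 4096
    logical, stored = dedup_stats([day1, day2])
    print(f"logical={logical} stored={stored} ratio={logical / stored:.2f}:1")
```

The more similar each backup is to the previous ones, the higher the ratio climbs, which matches the pattern described above for database, file share and VM backups.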


5 hours ago, leadeater said:

Finding a well-priced storage provider isn't really a big problem. Managing the data is more important, so a backup product is a really important factor in being able to make use of the backups you have. If you can't quickly find what you need to restore, and track where it is and what versions of it exist, then you're paying for storage while getting little value out of that expense.

One of the key differences between a backup solution and an archive solution in your case comes down to how you are managing the onsite data. Currently you are deleting what you deem no longer required on site, and therefore relying on your backups/alternate copies of the data. You can get an archive product to do this automatically for you using metrics like file access time and last modified time. You can even leave behind a stub of the archived file, so you can see it exists when browsing the share, and when you access/open the file it triggers a recall of the file automatically in the background. There is a delay in the file actually opening, of course.

Commvault can do both backup and archiving.

You can back up the data directly to cloud/offsite, but I suggested using the Synology since you already have it, and it's better and faster to stage the data to local disk, deduplicated and optimized for copying into the cloud. It is much faster to AUX Copy data to the cloud than it is to copy it directly.

Depending on your data type you can save a lot of storage by using deduplicated backups/archives.

Example of database backups (MSSQL, MySQL, Postgres etc.) where we take daily fulls, so the difference between each backup is very, very small.

[2 screenshots]

VM backups don't deduplicate as well as database backups do

[2 screenshots]

File share backups are a bit better than VM backups

[screenshot]

I can't show you the file share backups as well as I'd like because we moved to using NetApp SnapVault for those, which means Commvault isn't really doing the backups and deduplication, just telling the storage array what to do. Before this change we had about 580TB of used disk storage for backups, which without deduplication would have been 8PB.

We have a secondary copy of on-disk data backups off site at another datacenter; they replicate to each other, but you can't do that. Our longer-term backups go out to LTO-7 tape. Cloud still really can't beat the price of tape backups, but there are other benefits to using cloud backups, like being able to turn that backup into a live VM if your backup tool can do that.

Ah, thanks for the info. Does Commvault create archived file stubs?

 

This may be the solution we're looking for.

Currently 99% of our infrastructure is virtualised across three ESXi/vSphere hosts.

A few machines run Windows Server and provide shares.

 

Would Commvault work with this setup?


Consider using Datto. Their ideal setup is backing up to a large share (which you have) and then that replicates out to cloud. 

You can run Alto or NAS on your own or buy into a local service provider's Siris operation. 


4 hours ago, MajesticFudgie said:

Ah, thanks for the info. Does Commvault create archived file stubs?

It can, yes. I have to warn you though: Commvault is one of the most feature-rich backup products out there, but also one of the most expensive. For your data size and deployment type, though, they have competitive options because of how much competition Veeam gives them.
