Data deduplication

If I have block-level deduplication running on my server, and it removes duplicates, what happens to my data if I then unplug that hard drive and plug it into another machine?

Assuming the unplugged hard drive was one where duplicate blocks were found and removed, does that mean files won't be accessible on the new computer now?

https://linustechtips.com/topic/404908-data-deduplication/

Assuming the unplugged hard drive was one where duplicate blocks were found and removed, does that mean files won't be accessible on the new computer now?

Yes, that's what it means. 


but presumably everything would be accessible if I plugged it into another server machine where deduplication was available?

Nope. 

Deduplication basically deletes the duplicate from the disk and replaces it with a pointer to the original, i.e. there is only one copy of the file when deduplication is on. If you have it saved in 4 places across 4 hard drives, and deduplication is working across hard drives, then only 1 of those hard drives will actually hold the file. The other 3 will only hold a pointer.

I mean, it gets more complicated across multiple hard drives. It depends if your software also considers how dedupe would affect access times. Then it gets a lot more complicated.
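To make the "pointer" idea concrete, here is a minimal sketch of block-level dedupe in Python. It is a generic illustration, not how Windows Server implements it; the `DedupStore` class, the tiny chunk size, and the file names are all made up for the example.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose so the example is easy to follow


class DedupStore:
    """Toy block-level deduplicating store (hypothetical, for illustration)."""

    def __init__(self):
        self.chunks = {}  # chunk hash -> chunk bytes; each unique chunk is kept once
        self.files = {}   # file name -> ordered list of chunk hashes (the "pointers")

    def write(self, name, data):
        refs = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if an identical one is not already present.
            self.chunks.setdefault(digest, chunk)
            refs.append(digest)
        self.files[name] = refs

    def read(self, name):
        # Reassemble the file by following its pointers into the chunk store.
        return b"".join(self.chunks[d] for d in self.files[name])


store = DedupStore()
store.write("a.txt", b"AAAABBBBCCCC")
store.write("b.txt", b"AAAABBBBDDDD")
```

Both files read back in full, but the chunks `AAAA` and `BBBB` are stored only once. A file stays readable only as long as the chunk store its pointers refer to is available.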


If I have block-level deduplication running on my server, and it removes duplicates, what happens to my data if I then unplug that hard drive and plug it into another machine?

Assuming the unplugged hard drive was one where duplicate blocks were found and removed, does that mean files won't be accessible on the new computer now?

What specific implementation of dedupe and HDD setup are we talking about here?


What specific implementation of dedupe and HDD setup are we talking about here?

Windows Server 2012 Standard.

4 physical drives, 1 volume each. No storage pooling.

My question is: if multiple of those volumes have dedupe switched on, and I remove only 1 of those physical drives and stick it into, say, a Windows 8 machine, can I still access the duplicate data? Alternatively, if I put it into another Windows Server 2012 machine, can I still access the duplicate data?

I read here: http://blogs.technet.com/b/filecab/archive/2012/05/21/introduction-to-data-deduplication-in-windows-server-2012.aspx

that each volume is its own atomic unit and can be moved around even with dedupe turned on. But that is confusing to me because, as Vitalius pointed out, if duplicate data is found, the dupes are deleted and pointers are created on the volume instead. If that volume gets moved around, how does it know what to point to? In other words, how is this atomic unit thing meant to work if the data is not physically on the drive?


- snip -

Okay, I must admit I'm not very knowledgeable about Win Server's implementation of dedupe. However, reading this section:

 

3) Portability: A volume that is under deduplication control is an atomic unit. You can back up the volume and restore it to another server. You can rip it out of one Windows 2012 server and move it to another. Everything that is required to access your data is located on the drive. All of the deduplication settings are maintained on the volume and will be picked up by the deduplication filter when the volume is mounted. The only thing that is not retained on the volume are the schedule settings that are part of the task-scheduler engine. If you move the volume to a server that is not running the Data Deduplication feature, you will only be able to access the files that have not been deduplicated.

I gather the following:

You can move the volume to another machine, but it also has to be a Windows Server with deduplication available. So if you move it to a Windows 8 machine, I would expect the deduplicated files not to be readable there.

Also, since you didn't mention your specific setup at the start, Vitalius could have interpreted your question as "What happens when I remove a single HDD from a multi-HDD volume w/ dedupe enabled?", in which case obviously the pointers on that one HDD would point to data located on other HDDs. Not sure, but just a thought. He can clarify if necessary.

As it says on that site, "A volume that is under deduplication control is an atomic unit", i.e. the volume, not each HDD. In your case, since you have one volume per HDD if I've understood correctly, that would mean that you can indeed take one HDD and move it to another machine, but the deduped files would only be accessible if that other machine also has deduplication available.

As said, I don't have any experience with dedupe on Windows. I've only looked into it on ZFS, which I think has a rather different implementation. But going by the site you linked, that's what I make of it.
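As a rough sketch of that reading of the portability paragraph: the chunk data travels with the volume, and the host only needs to know how to resolve it. The `Volume` class, `read_file` function, and `host_has_dedup` flag below are hypothetical names for illustration, not anything from Windows.

```python
class Volume:
    """Toy model of a volume that carries its own dedupe data (hypothetical)."""

    def __init__(self):
        self.plain_files = {}    # name -> bytes, stored normally
        self.chunk_store = {}    # chunk id -> bytes, lives on the volume itself
        self.deduped_files = {}  # name -> list of chunk ids


def read_file(volume, name, host_has_dedup):
    """Read a file as a host with or without dedupe support would see it."""
    if name in volume.plain_files:
        # Files that were never deduplicated are readable anywhere.
        return volume.plain_files[name]
    if not host_has_dedup:
        # The data is on the volume, but this host cannot follow the pointers.
        raise OSError(f"{name} is deduplicated; this host cannot resolve it")
    chunk_ids = volume.deduped_files[name]
    return b"".join(volume.chunk_store[c] for c in chunk_ids)


vol = Volume()
vol.plain_files["notes.txt"] = b"never optimized"
vol.chunk_store = {1: b"shared ", 2: b"data"}
vol.deduped_files["big.iso"] = [1, 2]
```

With `host_has_dedup=True` both files can be read. With `host_has_dedup=False` only `notes.txt` can, which matches the quoted statement that a server without the feature can only access files that have not been deduplicated.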


To me it sounds like the website is saying this:

 

If I have an identical file/block located on 2 different volumes, then dedupe will delete one of those, and create a pointer that points to the location on the other volume. This works because something somewhere is keeping track of where these pointers are going

 

I guess I'm trying to understand how this stuff works in general now, because I fail to understand how volumes can be atomic units. Clearly the physical file/block must be completely removed from disk by dedupe. At that point it technically doesn't exist on that disk anymore. But the site says that if you move that disk to another server with dedupe available, the deleted file will be accessible... which doesn't make sense, because it doesn't exist on there...


@GoodBytes knows lots more about Windows than I do, maybe he can be of help, or catch any errors I've made in my assumptions about how Windows does volumes etc. :)

 

To me it sounds like the website is saying this:

 

If I have an identical file/block located on 2 different volumes, then dedupe will delete one of those, and create a pointer that points to the location on the other volume. This works because something somewhere is keeping track of where these pointers are going

 

Without being absolutely sure, I don't think dedupe would work across different volumes. I would expect it to work across different disks within the same volume, but not across volume borders. I have, however, not yet been able to find a reliable source to confirm this.

EDIT:

It seems that this is correct:

Deduplication with Windows works within a single volume (though multiple volumes can have it enabled)

source: http://www.techrepublic.com/blog/data-center/windows-server-2012-deduplication-how-and-where-to-tweak/

So deduplication would not work across volume boundaries according to that site, only within the same volume.

/EDIT

 

A volume that is under deduplication control is an atomic unit.

Note: volume, singular. Not volumes.

That is why a volume is an atomic unit but a disk is not necessarily one, unless the volume consists of only a single disk.

It says you can move a volume to another server, not a single disk:

You can back up the volume and restore it to another server. You can rip it out of one Windows 2012 server and move it to another

At least that's what I make of it.

If you take out one disk from a multi-disk volume, then yes, the data may not be available on the new machine, because it may simply not be on that disk. But if you move an entire volume (consisting of multiple disks) over to another machine, then everything should be somewhere on those drives, distributed.

Now, if you have volumes which consist of single disks and enable dedupe, then I would expect it to keep one copy of the data on each such disk, because it wouldn't deduplicate over volume boundaries (or at least I would not expect it to).

This is just based on some googling and what I know about deduplication in general. If Windows does something distinctly different, I hope somebody catches any errors I've made.

EDIT2:

This source also says that it's per-volume, not across volumes:

Deduplication is performed on a per-volume basis.

https://redmondmag.com/articles/2014/03/13/data-deduplication-in-windows-server.aspx

So you need to make a clear distinction between volumes and disks. Individual disks can only be moved between systems without problems if they contain an entire volume; as soon as a volume spans multiple disks, it seems you need to move all associated drives to the new system. Which makes sense in my book.
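A small sketch of what per-volume dedupe would mean for the setup in question (several single-disk volumes). Everything here is an assumption for illustration: the `Volume` class, the drive letters, and the chunk size are invented, and the real on-disk layout is certainly different.

```python
import hashlib


class Volume:
    """Toy volume with its own private chunk store (hypothetical)."""

    def __init__(self, label):
        self.label = label
        self.chunks = {}  # chunk hash -> bytes, never shared with other volumes
        self.files = {}   # file name -> list of chunk hashes

    def write(self, name, data, chunk_size=4):
        refs = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # dedupe within this volume only
            refs.append(digest)
        self.files[name] = refs

    def read(self, name):
        return b"".join(self.chunks[d] for d in self.files[name])


volumes = {label: Volume(label) for label in "DEFG"}
report = b"same bytes on two volumes"

volumes["D"].write("report.bin", report)
volumes["D"].write("copy.bin", report)    # deduplicated against report.bin on D
volumes["E"].write("report.bin", report)  # E keeps its own copy of the chunks

# "Unplug" volume E and take it to another machine.
moved = volumes.pop("E")
```

In this model the two files on D share their chunks, while E holds its own full set, so `moved.read("report.bin")` still works after the disk has been pulled and D loses nothing either.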

Edited by alpenwasser
