
How to make transparent compression, like NTFS compression? (Python 3 preferably)

Poet129

I've been working on many different ways of compressing files. Most of them aren't great, but I think I've got something good this time. It isn't magic, but it should work. Anyway, on to the question: I was hoping I could implement this compression method in a program you install that "integrates" with Windows Explorer so that the compressed files appear as normal, exactly like how NTFS compression works. I realize I probably wouldn't be able to do it as well as NTFS compression, since I don't have the Windows source code and obviously can't build it into Windows. I was hoping there would be something like a file attribute I could set that would change how programs interpret the files (that last bit is just an idea of how it could be done).


What would be your goal? Better compression ratio? Speed?

 

Can you make another file path where all the compressed files would go? Not sure how Windows does it, but on Linux you can have a filesystem that is just another filesystem, modified so that it stores its files in another location.

 

What don't you like about the included NTFS compression?


6 minutes ago, Electronics Wizardy said:

What would be your goal? Better compression ratio? Speed?

 

Can you make another file path where all the compressed files would go?

 

Not sure how Windows does it, but on Linux you can have a filesystem that is just another filesystem, modified so that it stores its files in another location.

 

What don't you like about the included NTFS compression?

Better compression ratio. Yes. Okay. It just isn't good enough, so I'm trying to make a better version.

The third statement sounds like something that would work for me, but what would be an existing implementation that I could pull from?


4 minutes ago, Poet129 said:

Better compression ratio. Yes. Okay. It just isn't good enough, so I'm trying to make a better version.

The third statement sounds like something that would work for me, but what would be an existing implementation that I could pull from?

I have yet to see something on Windows for this; I'd look at the Linux filesystems (or just use Linux for storage). Zstd is about as good as you will get for real-time compression.

 

But filesystem compression doesn't really do much normally. What files are you storing? Most files are already compressed.
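For reference, a rough sketch of what Zstd looks like from Python, using the third-party zstandard package (pip install zstandard); the file name and compression level here are just placeholders:

```python
import zstandard as zstd  # third-party binding for the Zstandard library

# Read a file, compress it at a moderate level, then verify a round trip.
with open("example.bin", "rb") as f:  # placeholder input file
    original = f.read()

compressed = zstd.ZstdCompressor(level=3).compress(original)
restored = zstd.ZstdDecompressor().decompress(compressed)

assert restored == original
print(f"{len(original)} -> {len(compressed)} bytes")
```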


1 minute ago, Electronics Wizardy said:

I have yet to see something on Windows for this; I'd look at the Linux filesystems (or just use Linux for storage). Zstd is about as good as you will get for real-time compression.

 

But filesystem compression doesn't really do much normally. What files are you storing? Most files are already compressed.

I'll look into Zstd. As for files, I'm not storing anything in particular; I'm just trying to make a better option, perhaps a faster one at some point. To be honest, I just want some experience, and this is something I found interesting.


1 minute ago, Poet129 said:

I'll look into Zstd. As for files, I'm not storing anything in particular; I'm just trying to make a better option, perhaps a faster one at some point. To be honest, I just want some experience, and this is something I found interesting.

Yeah, filesystem compression normally doesn't help, as most files (Word docs, movies, pictures) are already compressed.

 

If your goal is to learn, I'd try to do it on Linux, as the filesystem stack there is open source and, I'd say, just better for this.
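A quick way to convince yourself of the point above about already-compressed files: run a general-purpose compressor over random bytes (a stand-in for JPEG/MP4/DOCX content, which has almost no redundancy left) versus repetitive text. A rough illustration using only the standard library:

```python
import os
import zlib

# Random bytes stand in for data that is already compressed; repetitive
# text shows what a compressor can do when redundancy is actually there.
samples = {
    "already compressed (random)": os.urandom(1_000_000),
    "repetitive text": b"the quick brown fox jumps over the lazy dog\n" * 20_000,
}

for name, data in samples.items():
    out = zlib.compress(data, 9)
    print(f"{name}: {len(data):,} -> {len(out):,} bytes ({len(out) / len(data):.0%})")
```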


38 minutes ago, Poet129 said:

I was hoping I could implement this compression method in a program you install that "integrates" with Windows Explorer so that the compressed files appear as normal, exactly like how NTFS compression works.

NTFS compression happens at the filesystem layer, not the application layer, which is why it works transparently. You can't do that without writing a custom filesystem.



First... make a program that receives two parameters or arguments: input file name, output file name.

The program must be able to take an uncompressed file and compress it, and must also recognize that the input file was previously compressed and decompress it. Alternatively, add a third parameter (the method, e.g. "compress" or "decompress") to tell your program what to do with the input file (compress input to output, decompress input to output).

 

The program MUST NOT rely on anything else to compress the file: it must not rely on video cards or third-party programs. Your code must do everything.
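A minimal Python 3 sketch of that command-line shape is below. The trivial run-length encoder is only a stand-in for whatever algorithm you actually write, and the argument names are illustrative:

```python
#!/usr/bin/env python3
"""Toy file (de)compressor: method, input file, output file."""
import argparse


def rle_compress(data: bytes) -> bytes:
    # Encode runs of identical bytes as (count, value) pairs, count <= 255.
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while run < 255 and i + run < len(data) and data[i + run] == data[i]:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)


def rle_decompress(data: bytes) -> bytes:
    # Expand each (count, value) pair back into a run of identical bytes.
    out = bytearray()
    for i in range(0, len(data), 2):
        count, value = data[i], data[i + 1]
        out += bytes([value]) * count
    return bytes(out)


def main() -> None:
    parser = argparse.ArgumentParser(description="Toy file (de)compressor")
    parser.add_argument("method", choices=["compress", "decompress"])
    parser.add_argument("input_file")
    parser.add_argument("output_file")
    args = parser.parse_args()

    with open(args.input_file, "rb") as f:
        data = f.read()
    result = rle_compress(data) if args.method == "compress" else rle_decompress(data)
    with open(args.output_file, "wb") as f:
        f.write(result)


if __name__ == "__main__":
    main()
```

Usage would look something like "python compress.py compress report.txt report.rle" (hypothetical file names).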

 

Prove that you have some kind of compression working, then you can think about transparent compression and other crap... the most important bit is the actual compression.

 

You can do something similar to zip folders in Windows but that's quite advanced. You can do whole file systems in Linux and have your file system mounted as a regular drive, but that's also very advanced, and you need a lot of C and C++ knowledge for that... same for the Windows stuff.

It's very difficult if not impossible to do that in an interpreted language like Python.

 

NTFS compression is not that great, because the aim wasn't the best compression ratio; the aim was near-real-time compression with as little overhead as possible. They use small buffers and compress small chunks of data, so the compression algorithm can't use big portions of a file to find repeating data and compress the file significantly.

It's also designed with SEEKING in mind, so that an application can jump to a random position in a file and read chunks of it with minimal latency: worst case, the decompression routine only has to read 8-64 KB and decompress it to serve 512 bytes or 4096 bytes or whatever. When you compress with zip or other formats, you have configurable "minimum chunks" which are usually set to huge sizes, or you have to extract everything up to that point in the file in order to read it.
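To make the chunking idea concrete, here is a small pure-Python sketch of that layout (zlib stands in for whatever codec NTFS actually uses, and the 64 KiB chunk size and function names are only illustrative): the file is stored as independently compressed chunks plus an index, so a random read only decompresses the chunks it touches.

```python
import zlib

CHUNK = 64 * 1024  # compress in independent 64 KiB chunks


def compress_chunked(data: bytes):
    """Return (index, blob): index[i] is the offset of chunk i inside blob."""
    index, blob = [], bytearray()
    for start in range(0, len(data), CHUNK):
        index.append(len(blob))
        blob += zlib.compress(data[start:start + CHUNK])
    index.append(len(blob))  # end sentinel so chunk i spans index[i]:index[i+1]
    return index, bytes(blob)


def read_at(index, blob, offset, size):
    """Serve a random read by decompressing only the chunks it touches."""
    first = offset // CHUNK
    last = (offset + size - 1) // CHUNK
    out = bytearray()
    for i in range(first, last + 1):
        out += zlib.decompress(blob[index[i]:index[i + 1]])
    skip = offset - first * CHUNK
    return bytes(out[skip:skip + size])


# Round-trip check on some repetitive sample data.
data = b"hello world, " * 100_000
index, blob = compress_chunked(data)
assert read_at(index, blob, 123_456, 4096) == data[123_456:123_456 + 4096]
```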

 

See this blog post from one of the Microsoft guys: https://devblogs.microsoft.com/oldnewthing/20160718-00/?p=93895

Quote

Transparent file compression such as that used by NTFS has very different requirements from archival file compression such as that used by WinZip.

Programs like WinZip are not under time constraints; they can take a very long time to analyze the data in order to produce high compression ratios. Furthermore, the only operations typically offered by these programs are “Compress this entire file” and “Uncompress this entire file”. If you want to read the last byte of a file, you have to uncompress the whole thing and throw away all but the last byte. If you want to update a byte in the middle of the file, you have to uncompress it, update the byte, then recompress the whole thing.

Transparent file compression, on the other hand, is under real-time pressure. Programs expect to be able to seek to a random position in a file and read a byte; they also expect to be able to seek to a random position in a file and write a byte, leaving the other bytes of the file unchanged. And these operations need to be O(1), or close to it.

In practice, what this means is that the original file is broken up into chunks, and each chunk is compressed independently by an algorithm that strikes a balance between speed and compression. Compressing each chunk independently means that you can uncompress an arbitrary chunk of a file without having to uncompress any chunks that it is dependent upon. However, since the chunks are independent, they cannot take advantage of redundancy that is present in another chunk. (For example, if two chunks are identical, they still need to be compressed separately; the second chunk cannot say “I’m a copy of that chunk over there.”)

 

Also see the explanation from the developers: https://web.archive.org/web/20190430175652/https://blogs.msdn.microsoft.com/ntdebugging/2008/05/20/understanding-ntfs-compression/

 

