Jump to content

That's basically how a compression algorithm works. Find the most common element(s) in a file, replace them with a short pattern. Then continue with the next common element and replace it with the next shortest unique pattern.

 

~edit: So the question would be, do you simply want to compress the file's contents? If so, there should be a ton of libraries that can do that already. Do you have something more specific in mind, like: The source is human readable text and the compressed file needs to be readable, etc.

Remember to either quote or @mention others, so they are notified of your reply

Link to post
Share on other sites

Yeah ... a basic compression would be something like this

[ 1 byte - how many characters back to look or 0 if there's some uncompressible text following] 

[ 1 byte - length  ] 

[ n bytes - data if first byte is 0 ] 

 

So your text file ( \n is the enter character, assuming it's just the LF part, because ENTER in Windows is two bytes, CR - move cursor to start of row - and LF - go down a row - , or \r and \n 😞

12345678[\n]
12345678[\n]
1234321

could be compressed into 

[0][9] [ 123456789\n]  [9][9]  [9][4] [0][3][321]

 

So 21 bytes in total ... the original was 27 bytes. 

 

Compressors squeeze more bits by resorting to tricks like not always using 2 bytes like in above example, for example they could make a rule like ... if the first bit is 1, then it's an uncompressible sequence and the rest of 7 bits is the length of the uncompressed data... so you save a byte for every uncompressed piece, but now you can only go back up to 127 characters because that 8th bit always have to be 0 otherwise. 

So with the above trick the sequence could change to  

 

[128+9] [12345678\n] [9][9] [9][4] [128+3] [321]  - so you saved 2 more bytes. 

 

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×