
Python, File to PNG Compressor.

Poet129
Solved by Poet129
2 minutes ago, WereCatf said:

No need to.

Well, I would like to thank everybody who participated in helping me make this compressor, helped me test it, and even showed me how it isn't better in most cases. Thanks again, I will definitely look for help here again in my future projects.

1 minute ago, Kilrah said:

So the whole "color" thing doesn't actually matter at all, you're just mapping things to a 24bit number. 

Yes, it does matter because of the number of combinations of two pixels.


But the number of combinations in 2 24bit numbers is always the same, doesn't matter that you interpret them as a color.

And you're losing space by having considered combinations instead of permutations, precisely because of the color thing. 
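
To put numbers on that point, here is a small sketch. The names are made up for illustration and are not taken from Poet129's program. It counts ordered pairs (permutations, where order matters) and unordered pairs (combinations) of 24-bit values, and checks how many bits an index into each set would need.

```python
BITS_PER_VALUE = 24
n = 2 ** BITS_PER_VALUE  # number of distinct 24-bit values

# Ordered pairs (a, b): (1, 2) and (2, 1) count as different entries.
ordered_pairs = n * n

# Unordered pairs {a, b}, with a == b allowed: the order is thrown away.
unordered_pairs = n * (n + 1) // 2


def bits_needed(count: int) -> int:
    """Smallest number of bits that can index `count` distinct items."""
    return max(1, (count - 1).bit_length())


ordered_bits = bits_needed(ordered_pairs)
unordered_bits = bits_needed(unordered_pairs)
```

Under these assumptions both indexes need the same number of bits, because the unordered count is slightly more than half of the ordered count. The unordered index no longer says which value came first, so it carries less information for the same storage.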

F@H
Desktop: i9-13900K, ASUS Z790-E, 64GB DDR5-6000 CL36, RTX3080, 2TB MP600 Pro XT, 2TB SX8200Pro, 2x16TB Ironwolf RAID0, Corsair HX1200, Antec Vortex 360 AIO, Thermaltake Versa H25 TG, Samsung 4K curved 49" TV, 23" secondary, Mountain Everest Max

Mobile SFF rig: i9-9900K, Noctua NH-L9i, Asrock Z390 Phantom ITX-AC, 32GB, GTX1070, 2x1TB SX8200Pro RAID0, 2x5TB 2.5" HDD RAID0, Athena 500W Flex (Noctua fan), Custom 4.7l 3D printed case

 

Asus Zenbook UM325UA, Ryzen 7 5700u, 16GB, 1TB, OLED

 

GPD Win 2


1 minute ago, Kilrah said:

But the number of combinations in 2 24bit numbers is always the same, doesn't matter that you interpret them as a color.

So what you're saying is that a 48 bit number would work just as well?


Of course...

When we gave you the number of combinations of colors at the beginning we didn't care that they were colors, they're just 2 24-bit numbers.

 

BTW you'd probably want to use multiples of 32bit for processing efficiency.


14 minutes ago, Kilrah said:

Of course...

When we gave you the number of combinations of colors at the beginning we didn't care that they were colors, they're just 2 24-bit numbers.

 

BTW you'd probably want to use multiples of 32bit for processing efficiency.

Just curious, why do multiples of 32 bits mean higher efficiency?


Because your CPU's instructions work on 32 or 64 bit numbers. If you're using 24 bit numbers then on every 32 bit instruction there's 8 bits worth of wasted potential.


5 minutes ago, Kilrah said:

Because your CPU's instructions work on 32 or 64 bit numbers. If you're using 24 bit numbers then on every 32 bit instruction there's 8 bits worth of wasted potential.

So I would need to use 64 bit then?


And you could also resort to some tricks to reduce the amount used ... for example, how multi-byte UTF-8 works, see the video below from around 6:30 onwards  (the whole video is informative, but the utf-8 trick is explained there) :

 

Or even simpler: if the first bit of a byte is 1, that means another byte follows. So you encode a number like 1500 (0000 0101 1101 1100 in binary) as [1000 1011] [0101 1100]. The leading 1 bit in the first byte tells your program there's at least one more byte, so keep going. The leading 0 bit in the second byte means there are no more bytes.

Worst case scenario, this means your 48 bits would be encoded in 7 bytes instead of 6 bytes (each byte only carries 7 bits of the value), but if the value is often within 14 bits you can pack it in just 2 bytes, saving space.
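
A minimal sketch of that continuation-bit scheme, assuming the most significant 7-bit group is written first as in the 1500 example. The function names are made up for illustration.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer using 7 payload bits per byte."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    groups = [value & 0x7F]  # last byte: top bit stays 0, meaning "stop"
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)  # earlier bytes: top bit 1, "keep going"
        value >>= 7
    return bytes(reversed(groups))


def decode_varint(data: bytes) -> Tuple[int, int]:
    """Return (value, bytes_consumed) for the first encoded number in data."""
    value = 0
    for consumed, byte in enumerate(data, start=1):
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, consumed
    raise ValueError("truncated input: last byte still has the continuation bit set")


encoded_1500 = encode_varint(1500)
longest_48_bit = encode_varint(2 ** 48 - 1)
```

Here `encoded_1500` is two bytes long and `longest_48_bit` is seven bytes long, so the scheme only pays off if small values are common in the data.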

 

 

The reason you want to work with 32 bit or 64 bit numbers is that you have a 32 bit or 64 bit processor, with 32 bit or 64 bit wide registers. The processor moves data around in chunks of 4 or 8 bytes, it reads data from RAM in 32 bit or 64 bit or 128 bit chunks, and even the caches inside the processor work in multiples of 32 bit.

Using just 24 bits results in wasted, unused memory bandwidth: with each 24 bit value sitting in a 32 bit slot, you're not using 25% of the amount available.
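
The 25% figure can be checked with a few lines. This is only a sketch with arbitrary sample values. It compares three-byte packing against one value per 32 bit slot.

```python
import struct

values = [0x000000, 0x123456, 0xFFFFFF]  # arbitrary 24-bit sample values

# Tightly packed: 3 bytes per value, no padding.
packed_24 = b"".join(v.to_bytes(3, "little") for v in values)

# One value per 32-bit slot: the top byte of every slot is always zero.
packed_32 = struct.pack(f"<{len(values)}I", *values)

unused_fraction = 1 - len(packed_24) / len(packed_32)

# Reading the 32-bit layout back is a single call; the 24-bit layout
# has to be sliced apart value by value.
unpacked_32 = list(struct.unpack(f"<{len(values)}I", packed_32))
unpacked_24 = [
    int.from_bytes(packed_24[i:i + 3], "little")
    for i in range(0, len(packed_24), 3)
]
```

The tight layout is smaller on disk, while the padded layout matches what the CPU handles natively, so which one is better depends on whether size or processing speed matters more.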

 

 


6 minutes ago, WereCatf said:

This thread is a really wild ride!

A typical XY problem :)


In case anyone is wondering I'm currently generating the 32bit combination dictionary file. Previously I was using a 16 bit dictionary file.


1 minute ago, Poet129 said:

In case anyone is wondering I'm currently generating the 32bit combination dictionary file. Previously I was using a 16 bit dictionary file.

Moving from 16 bit to 24 bit decreased the file size, so I expect the same this time.


9 hours ago, Poet129 said:

what about generating every single combination on the fly would it still be faster than having it written?

Yes, a few orders of magnitude faster.

9 hours ago, Poet129 said:

Since all of them would be required for my program.

This is what computers are for, my friend: handling arbitrary inputs without having to directly account for every single one.

Don't ask to ask, just ask... please 🤨

sudo chmod -R 000 /*


@Poet129 I'm not sure if your question is genuine or not :D

 

System calls take ages from the CPU's perspective, and storing data in thousands of small files will put a huge burden on the file system. It's like riding an upside-down dead horse. Storing the dictionary in one file, putting all the bytes next to each other, is a better idea. It can be read and parsed into an array, and that array can be indexed, which is way faster than a file read.

 

As others said, doing some computation could be faster than reading from memory, especially if the dictionary is hundreds of MB in size, which does not fit into cache. Measure the speed of different implementations if performance matters. If the algorithm is reasonably parallel and there are numbers to crunch, a GPU could do the heavy lifting, but it won't make a difference if you can't feed it with enough data.
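
Here is a rough sketch of both options side by side. Since the actual encoding method isn't known, it assumes a made-up dictionary where entry i is simply the number i written as 2 bytes. The entry size, entry count, file name, and function names are all placeholders.

```python
import os
import tempfile
import timeit

ENTRY_SIZE = 2         # bytes per entry (hypothetical, kept small for the demo)
ENTRY_COUNT = 2 ** 16  # number of entries (hypothetical)


def compute_entry(index: int) -> bytes:
    """Derive an entry on the fly instead of storing it."""
    return index.to_bytes(ENTRY_SIZE, "big")


def build_dictionary_file(path: str) -> None:
    """Write every entry back to back into one single file."""
    with open(path, "wb") as fh:
        fh.write(b"".join(compute_entry(i) for i in range(ENTRY_COUNT)))


def load_dictionary(path: str) -> bytes:
    """One read call; afterwards every lookup is plain slicing in memory."""
    with open(path, "rb") as fh:
        return fh.read()


def lookup_entry(table: bytes, index: int) -> bytes:
    start = index * ENTRY_SIZE
    return table[start:start + ENTRY_SIZE]


with tempfile.TemporaryDirectory() as tmp:
    dictionary_path = os.path.join(tmp, "dictionary.bin")
    build_dictionary_file(dictionary_path)
    table = load_dictionary(dictionary_path)

# Measure rather than guess: the winner depends on the machine and the entry format.
lookup_seconds = timeit.timeit(lambda: lookup_entry(table, 12345), number=100_000)
compute_seconds = timeit.timeit(lambda: compute_entry(12345), number=100_000)
```

Both paths return the same bytes for every index, so the only open question is speed, and the two timings answer that for a given machine.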

 

BTW I still don't get what your encoding method is.

ಠ_ಠ


35 minutes ago, shadow_ray said:

System calls take ages from the CPU's perspective, and storing data in thousands of small files will put a huge burden on the file system. It's like riding an upside-down dead horse. Storing the dictionary in one file, putting all the bytes next to each other, is a better idea. It can be read and parsed into an array, and that array can be indexed, which is way faster than a file read.

This is what I'm doing. I only have one dictionary file now.


Just now, Poet129 said:

This is what I'm doing. I only have one dictionary file now.

Unrelated, but I think I stored 680 bytes (85 sets of 64 bits) in 651 bytes, in the following picture.

85 Sets of 64 bits.png


35 minutes ago, shadow_ray said:

As others said, doing some computation could be faster than reading from memory, especially if the dictionary is hundreds of MB in size, which does not fit into cache. Measure the speed of different implementations if performance matters. If the algorithm is reasonably parallel and there are numbers to crunch, a GPU could do the heavy lifting, but it won't make a difference if you can't feed it with enough data.

I will likely push for higher speed, but I would like to make sure I can compress well enough that it is worth the time spent on optimization.


23 minutes ago, Poet129 said:

Unrelated, but I think I stored 680 bytes (85 sets of 64 bits) in 651 bytes, in the following picture.

 

You probably did,  but you have to keep in mind the size of the decompressor as well. If the decompressor has a built in dictionary that's tens or hundreds of MB, then you just hid the data from the file into the decompressor. 

 

If the decompressor has a reasonable size, like 1-5 MB, then it's fine, because a size of 5 MB doesn't matter that much when you compress a CD worth of files... even better for a DVD worth of files, or 50 GB worth of files. 

 

You fail if your decompressor needs tens, hundreds of MB of dictionary or tables built in.

Look at 7zip for example,  or command line unzip programs which are less than 100 KB in size, and can compress quite well, depending on algorithms chosen to compress. 

 

 


1 minute ago, mariushm said:

You probably did,  but you have to keep in mind the size of the decompressor as well. If the decompressor has a built in dictionary that's tens or hundreds of MB, then you just hid the data from the file into the decompressor. 

 

If the decompressor has a reasonable size, like 1-5 MB, then it's fine, because a size of 5 MB doesn't matter that much when you compress a CD worth of files... even better for a DVD worth of files, or 50 GB worth of files. 

 

You fail if your decompressor needs tens, hundreds of MB of dictionary or tables built in.

Look at 7zip for example,  or command line unzip programs which are less than 100 KB in size, and can compress quite well, depending on algorithms chosen to compress. 

 

 

Okay, but in this case there is no compression; it is just raw binary in a picture. Paired with binary compression it could be even smaller.


5 minutes ago, mariushm said:

You probably did,  but you have to keep in mind the size of the decompressor as well. If the decompressor has a built in dictionary that's tens or hundreds of MB, then you just hid the data from the file into the decompressor. 

 

If the decompressor has a reasonable size, like 1-5 MB, then it's fine, because a size of 5 MB doesn't matter that much when you compress a CD worth of files... even better for a DVD worth of files, or 50 GB worth of files. 

 

You fail if your decompressor needs tens, hundreds of MB of dictionary or tables built in.

Look at 7zip for example,  or command line unzip programs which are less than 100 KB in size, and can compress quite well, depending on algorithms chosen to compress. 

 

 

Also, this would be completely okay if your goal was network compression with a calculable dictionary file.


1 minute ago, Poet129 said:

Okay, but in this case there is no compression; it is just raw binary in a picture. Paired with binary compression it could be even smaller.

PNG is a compressed image format. It compresses the image data using the DEFLATE algorithm, the same one zip uses.

 

To open the image, you need to decompress the image. It just so happens your Python has that decompression code built in, or the image library you use in Python has it.

 


13 minutes ago, mariushm said:

To open the image, you need to decompress the image. It just so happens your Python has that decompression code built in, or the image library you use in Python has it.

Okay, but if my standard installation has it, then wouldn't almost every version that isn't super cut down have it too? So I would be using integrated files rather than files that are packaged during installation.


18 minutes ago, Poet129 said:

Okay, but if my standard installation has it, then wouldn't almost every version that isn't super cut down have it too? So I would be using integrated files rather than files that are packaged during installation.

There are unzip / un-deflate decompressors that only take 2-3 KB of disk space, here's just one example that's less than 2 KB in binary : https://github.com/pfalcon/uzlib

The standard zlib or whatever is as little as 15 KB.

 

Why would I go through the hassle of arranging bits in an image, compressing the png with deflate and adding the png file format headers and checksums (png puts a CRC-32 on every chunk, and the zlib stream inside it carries an Adler-32, so your code also needs to calculate checksums; the library does that for you without you realizing) when I can just apply the deflate algorithm and end up in the same place?

 

Did you actually try to simply ZIP those 680 bytes? They probably compress to around 500-600 bytes... your scheme got it down to 651, but you have a lot of overhead in the png signature, fields, checksums etc.

The actual compressed data starts after IDAT ... you have almost 64 bytes of overhead at the start, and there's around 8-12 bytes of png specific stuff at the end.
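
To see where that overhead comes from, here is a sketch that builds the smallest possible 8-bit grayscale PNG by hand with Python's zlib and struct modules and compares it with the bare deflate stream. The payload is arbitrary stand-in data, not the 680 bytes from the picture, and an image library may write extra chunks, so real files can carry more overhead than this.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(kind: bytes, data: bytes) -> bytes:
    """Length + type + data + CRC-32 over type and data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def bytes_to_gray_png(payload: bytes, width: int) -> bytes:
    """Store one payload byte per pixel in an 8-bit grayscale PNG."""
    if width <= 0 or len(payload) % width:
        raise ValueError("payload length must be a multiple of width")
    height = len(payload) // width
    # Every scanline starts with a filter-type byte (0 = no filtering).
    raw = b"".join(
        b"\x00" + payload[i:i + width] for i in range(0, len(payload), width)
    )
    # width, height, bit depth 8, color type 0 (grayscale), then
    # compression, filter and interlace methods, all 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", zlib.compress(raw, 9))
        + png_chunk(b"IEND", b"")
    )


payload = bytes((i * 37) % 256 for i in range(680))  # stand-in data
png_bytes = bytes_to_gray_png(payload, width=68)
plain_deflate = zlib.compress(payload, 9)

print(len(payload), len(plain_deflate), len(png_bytes))
```

In this minimal layout the signature, IHDR, IDAT framing and IEND add up to a fixed 57 bytes around the compressed stream, and the PNG additionally compresses one filter byte per row that the plain deflate call never sees.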

 

 


Speaking of generating images ... you can use that PPM trick I showed you above, but instead of rgb, save in PGM format which is grayscale, so each byte  becomes a pixel with color intensity between 0.. 255.

The only difference is having P5 in the header instead of P6.

 

For example, I copy-pasted a couple paragraphs in Notepad and saved the document as image.pgm  and checked the file size: 1140 bytes.

So I'm gonna make a picture with 8 bit color depth (grayscale, 256 shades possible) this means there's 1140 bytes = 1140 pixels to put in the picture... so let's find a  X * Y set that results in 1140 ... you could do 114 x 10  or you could do 285 x 4 or  57 x 20

 

So it's just a matter of adding the PGM header .. P5 (signature), width and height, and the maximum gray value (255, so 256 levels for grayscale pixels) ... that's it ... so add it before the text and save.

P5
57 20
255
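
The same header can be added and stripped in a few lines of Python. This is a sketch that assumes exactly the three-line header shown above with no comment lines. The function names and the sample text are placeholders.

```python
def wrap_as_pgm(payload: bytes, width: int, height: int) -> bytes:
    """Prefix raw bytes with a binary PGM (P5) header, one byte per pixel."""
    if width * height != len(payload):
        raise ValueError("width * height must equal the payload length")
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + payload


def unwrap_pgm(blob: bytes) -> bytes:
    """Recover the raw bytes from a file written by wrap_as_pgm."""
    # Split off only the three header lines; the payload may contain newlines.
    magic, dims, maxval, pixels = blob.split(b"\n", 3)
    if magic != b"P5" or maxval != b"255":
        raise ValueError("not an 8-bit binary PGM in the expected layout")
    width, height = (int(part) for part in dims.split())
    return pixels[:width * height]


sample_text = (b"sample text line\n" * 100)[:1140]  # 1140 bytes = 57 x 20 pixels
pgm_blob = wrap_as_pgm(sample_text, width=57, height=20)
recovered = unwrap_pgm(pgm_blob)
```

Writing `pgm_blob` to a file named image.pgm gives a picture that image viewers with PGM support can open, and `unwrap_pgm` returns the original bytes as long as the file has not gone through a lossy format in between.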

image.pgm

 


 

Now you can open the image with IrfanView and save as PNG or GIF or whatever, and you should be able to save it back to PGM to recover the text, or you can try to output to some uncompressed formats like maybe PCX, TGA ... BMP (but note it's gonna be messed up: BMP stores its rows from bottom to top, so the lines will be out of order when reading the bmp as a text file)

 

Funny enough, 7zip can shrink the image.pgm file to around 800 bytes... the text is not that compressible. Saving to png or gif will result in a bigger size, because the image is more or less noise.

 

 


1 hour ago, mariushm said:

Now you can open the image with IrfanView and save as PNG or GIF or whatever, and you should be able to save it back to PGM to recover the text, or you can try to output to some uncompressed formats like maybe PCX, TGA ... BMP

Saving it as JPG (quality 100 for just a little bit of corruption) and converting back to PGM is MUCH more fun though 🤣

 

Quote

Her uniform stilk giu. Mottly.
Jackie bould ree a blurrfc version ofhesselfin she sfmipolished steeldooss nf the elfvbtos car. Gray miliuaqy uniform, black and bkue stripes on thd sleeves and oaot ldgs/ Black for the Spabe Force+ blue forthe Navy, gray for the Specibl Forcet, the actual branch shf had belonged to, once!upon a sime/ A slight hint of red to her tanned face, proofof a fdw hours too!nany soaling uq thesun onthe loc`l beadhes over the holidays/Long brown durls coimed ane pinned at theback oe hes head+ below hfr officfr’s cap.Thiny twinsilver bars on her lbpems+ shotlder boards, and shirt collar proclaimjng her pld sank of Lieuten`nt Commander. Medals cecoratinf her chest . . - amd the auttons of her jacket straining tp keep the coat psoperly closed.
Of course, it had been a dfcade sjnce JacarandaMacLenzie had been a Lieutenant Comnander io the United Planfts Space Force. For h`lf of that dfcace,her job had been to sit in a chair and trbnslase speecier for politicians . . . bnd for the otges hamf+ it h`d bdento sit ima bhairas b qnlitician, except for when she was standing and making speeche

 
