
Python, File to PNG Compressor.

Poet129
Go to solution Solved by Poet129,
2 minutes ago, WereCatf said:

No need to.

Well, I would like to thank everybody who participated in helping me make this compressor, helped me test it, and even showed me how it isn't better in most cases. Thanks again, I will definitely look for help here again in my future projects.

43 minutes ago, Poet129 said:

I would do this but I can't right now because I can't upload that much to two websites.

I have now tried this; however, the first site crashes because it is too much data.


1 minute ago, Kilrah said:

You are using a specific input text that happens to play very nicely with PNG's filtering. More random data or text won't fare anywhere near as well, and you could implement the same algorithm yourself without the convoluted process of going through an image...

 

All you're showing is that some algorithms work better with some input data than others. If one were to tailor a compression algorithm for that specific file you could likely make it even smaller.

 

It wasn't just for that file, it was for binary. Plus, this is what I'm asking for: I would just like help with getting a version of those two sites, or whatever, that can be run in a script and achieves the same amount of compression.


16 minutes ago, Poet129 said:

It wasn't just for that file it was for binary

But your example achieves half size only because of the specific input file you used. The proof is the example file you posted there: it has the same name but different content, and it didn't compress nearly as well.

 

BTW both those files seem to show understanding of that filtering already but that was never mentioned so this thread is just trolling at this point. 

 

16 minutes ago, Poet129 said:

I would just like help with getting a version of those two sites or whatever that can be run in a script that does the same amount of compression.

So just read the png spec, reimplement the way it filters the data first, compress the result with DEFLATE like it does (e.g. using zlib), drop all the image-specific headers and stuff and you're done.
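A minimal sketch of that suggestion in Python (the language in the thread title). It assumes a single PNG-style "Sub" filter applied to the whole byte stream, which is simpler than the per-scanline filter selection a real PNG encoder performs. The names `sub_filter`, `sub_unfilter`, `pack` and `unpack` are made up for illustration; only `zlib.compress` and `zlib.decompress` are real library calls.

```python
import zlib


def sub_filter(data: bytes) -> bytes:
    """Replace each byte with its difference from the previous byte (mod 256)."""
    out = bytearray(len(data))
    prev = 0
    for i, value in enumerate(data):
        out[i] = (value - prev) & 0xFF
        prev = value
    return bytes(out)


def sub_unfilter(data: bytes) -> bytes:
    """Undo sub_filter by accumulating the differences (mod 256)."""
    out = bytearray(len(data))
    prev = 0
    for i, delta in enumerate(data):
        prev = (prev + delta) & 0xFF
        out[i] = prev
    return bytes(out)


def pack(data: bytes) -> bytes:
    """Filter, then DEFLATE via zlib. No image headers are written."""
    return zlib.compress(sub_filter(data), 9)


def unpack(blob: bytes) -> bytes:
    """Inverse of pack."""
    return sub_unfilter(zlib.decompress(blob))


if __name__ == "__main__":
    counting = bytes(range(256))  # every byte value once, in order
    print("plain zlib:", len(zlib.compress(counting, 9)))
    print("filter + zlib:", len(pack(counting)))
    assert unpack(pack(counting)) == counting
```

On a counting sequence every filtered byte after the first becomes 1, so DEFLATE has an easy job. On ordinary text or random data the filter should be expected to help little or not at all.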


Oh wow

You took an extremely nice example of an input, which happens to compress very well as an image.

Regular file compressors are not optimized for specific niche cases, they're designed for lots of inputs.

 

Further, you also "helped" the PNG compressor by giving it the ideal format to work with: you made the image exactly 8 pixels wide, so every row lines up with one character. Had you made the width something that's not a multiple of 8, it would not have compressed nearly as well.

 

PNG has some "filters", and its DEFLATE stage looks for sequences that repeat (think of them as "tiles" of pixels) in order to compress well the areas of an image that have the same color: think of a screenshot of this forum, or an Excel chart, or whatever.

You gave the compressor the ASCII table, which has big chunks of 0s and 1s, for example your first four lines would be:

 

00000000

00000001

00000010

00000011

 

The compressor finds a tile that is 2x2 pixels, or a 2x4 or even a 4x4 tile, that appears again in the image, so it can put that tile in a "dictionary"; when it detects the tile again, it just uses a couple of bits to point to it in the dictionary.

 

00000000 00000000

00000001 00000001

00000010 00000010

00000011 00000011

 

So like I said... this particular input example works so well because from each line to the next only a bit or two changes and most of the bits stay the same, so PNG finds lots of "tiles" that repeat and compress well.

If you made it 10 bits wide, or some other weird width that's not a multiple of 8, the pattern may be distorted enough to not compress as well.

 

In the real world, an input text will not be so "consecutive"; it will be more random. A book or some executable file will be full of random-looking bits that are not all going to align into big squares or rectangles of 0s and 1s that compress well. Your picture will look like snow on old analogue TVs, and the PNG filters won't find many "tiles" that repeat often enough to be worth adding to an internal dictionary.

 

anyway, for kicks, I reproduced your example by making my own ASCII codes file (without newlines) and compressing it.

So you get 256 x 8 bytes = 2048 bytes (and by the way, your input file only had 254 ASCII codes).

7zip compressed it down to 419 or 478 bytes, depending on the algorithm used. 7zip supports BZip2, which is optimized for text, and PPMd, which can work great for some inputs but is single threaded and slow. LZMA and LZMA2 are the defaults, but they are optimized for multi-threading and BIG files, so they have a lot of overhead and small files don't compress as well.

So using Bzip2 got 2048 bytes down to  419 bytes (20.45%) and PPMd got it down to 478 bytes (23.33%)
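For anyone who wants to repeat this kind of comparison from a script, here is a rough sketch using only Python's standard library. It rebuilds the 2048-character table of 1s and 0s (all 256 codes, no newlines) and reports compressed sizes. The helper names `ascii_bit_table` and `compressed_sizes` are invented for this example, and the numbers will not match the 7zip figures above, since the container formats and settings differ.

```python
import bz2
import lzma
import zlib


def ascii_bit_table() -> bytes:
    """Codes 0..255 written as 8 binary digits each, no separators."""
    return "".join(format(code, "08b") for code in range(256)).encode("ascii")


def compressed_sizes(data: bytes) -> dict:
    """Size in bytes after each standard-library compressor."""
    return {
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data)),
    }


if __name__ == "__main__":
    table = ascii_bit_table()
    print("input:", len(table), "bytes")
    for name, size in compressed_sizes(table).items():
        print(f"{name}: {size} bytes ({100 * size / len(table):.2f}%)")
```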

 

The original ASCII code table of 256 bytes won't compress well, because compressors typically look for bytes and sequences of bytes that repeat... but there are compressors which work with bits and have extremely complex compression schemes, like, let's say, the PAQ compressors.

Download one of the latest versions and play with it - the PAQ page is here : http://mattmahoney.net/dc/paq.html   and here's direct download link to PAQ8 : http://mattmahoney.net/dc/paq8jc.zip

 

c:\Temp\topng>paq8jc.exe -7 input_ascii.paq  input_ascii.txt
Creating archive input_ascii.paq.paq8jc with 1 file(s)...
input_ascii.txt 256 -> 173
256 -> 207
Time 0.11 sec, used 1017681280 bytes of memory

c:\Temp\topng>paq8jc.exe -7 input.paq  input.txt
Creating archive input.paq.paq8jc with 1 file(s)...
input.txt 2048 -> 109
2048 -> 138
Time 0.16 sec, used 1017681297 bytes of memory

 

So it managed to compress those 2048 1s and 0s into 138 bytes, better than PNG, and even managed to shrink the 256 bytes of characters that don't repeat down to 207 bytes. Actually, the compressed data is 109 bytes and 173 bytes respectively; the final sizes include the headers (which include the input file name, so had I used a.b as the file name the final files would have been a few bytes smaller).

 

And this PAQ is just one of the latest, I didn't pick one that's specially optimized for text input, not sure this one is.

 

Again... it only works because you chose a particularly bad (unrepresentative) test case that has a lot of 1s and 0s that repeat nicely, and also because you gave the picture a specific format, with a width of 8 pixels or a multiple of 8.

 

Here's what I suggest... get any book, I'd suggest getting some Jules Verne book from Project Gutenberg, it's a good author and you may actually enjoy reading the book.  Go to any chapter and extract enough text to have something like 4096 bytes in your text file.... that will be 32768  1s and 0s

 

Now arrange that as a 512 x 64 picture or some other pattern. I'd also suggest something that's not a multiple of 8, like let's say 198 x 165: this makes a total of 32670 1s and 0s, 98 fewer than the input, but that's fine, it only means you'll lose about a dozen characters at decompression (you want to see how well such resolutions compress, 100% perfection is less important here).

 

You can make any resolution you want with the PGM file format. Just take your 1s and 0s input, separate the digits with spaces and replace each 1 with 255, so your image will be "0 255 0 0 255 ...", and add the 3 lines of header at the top, if I remember correctly: "P2 [newline] x y [newline] 255 [newline]".
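The search-and-replace recipe can also be scripted. This is a small sketch, assuming the plain-text "P2" PGM layout described above; `bits_to_pgm` is a hypothetical helper name, and it drops surplus bits (or pads with 0s) so the data fits the requested resolution.

```python
def bits_to_pgm(bits: str, width: int, height: int) -> str:
    """Build a plain-text (P2) PGM image from a string of '0' and '1' characters.

    Bits beyond width * height are dropped; missing bits are padded with '0'.
    """
    needed = width * height
    cells = bits[:needed].ljust(needed, "0")
    rows = []
    for y in range(height):
        row = cells[y * width:(y + 1) * width]
        # 0 stays black (0), 1 becomes white (255)
        rows.append(" ".join("255" if bit == "1" else "0" for bit in row))
    return f"P2\n{width} {height}\n255\n" + "\n".join(rows) + "\n"


if __name__ == "__main__":
    text = b"Hi"
    bits = "".join(format(byte, "08b") for byte in text)
    print(bits_to_pgm(bits, 8, 2), end="")
```

One row of pixels is written per text line here for readability. Very wide rows produce long lines, which some strict PGM readers may reject, so wrapping the values may be needed in that case.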

 

 

Now try to save these images as PNG and see how well they compress... pay particular attention to those that don't have a width that is a multiple of 8.

 

input_ascii.txt input.txt input_bz2.7z


Alrighty. So I've been playing around and I've figured out a way that you can persist every combination of your 2^24 colors while only having to store any given color one time: SQL.

If you put all of your colors into a single table, say Colors, then you can run the query below to get 2^24 sized chunks of color combinations:

SELECT *
FROM Colors
CROSS JOIN Colors AS Alternate
WHERE Alternate.Color = @chunkIndex;


Effectively, what this does is save every possible combination of two colors on disk, but uses a reasonable amount of space.

Chunk sizes are 16M integers, as this query returns every record of the cross join where the alternate color is equal to chunkIndex.

This chunk size is a little large, so make sure that you have your RDBMS properly configured to run the table in memory, and if possible, use the query as a stored procedure (so that it is properly optimized). Make sure to index the column, so that it is faster to access.
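To see what one chunk of that query looks like, here is a toy sketch using Python's built-in `sqlite3` module with an in-memory database and only 4 "colors" instead of 2^24. The table and column names follow the query above; the `Base` alias, the `fetch_chunk` helper and the tiny data set are assumptions made for the example.

```python
import sqlite3


def make_colors(conn: sqlite3.Connection, count: int) -> None:
    """Create the Colors table and store each color value exactly once."""
    conn.execute("CREATE TABLE Colors (Color INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO Colors (Color) VALUES (?)",
        [(value,) for value in range(count)],
    )


def fetch_chunk(conn: sqlite3.Connection, chunk_index: int) -> list:
    """Return every (color, alternate) pair whose alternate equals chunk_index."""
    query = (
        "SELECT Base.Color, Alternate.Color "
        "FROM Colors AS Base "
        "CROSS JOIN Colors AS Alternate "
        "WHERE Alternate.Color = ? "
        "ORDER BY Base.Color"
    )
    return conn.execute(query, (chunk_index,)).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    make_colors(connection, 4)
    print(fetch_chunk(connection, 2))
```

Each chunk has as many rows as there are colors, so with the full table a chunk would hold 2^24 pairs, while the table itself still stores each color once.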

I still say that it's faster to generate the data on the fly to do your testing with, but you insist on persisting it, so there you go: A way to persist a MASSIVE amount of data in a reasonable amount of space and computational power.

ENCRYPTION IS NOT A CRIME


I created a small Java application that converts an arbitrary file into a black and white PNG (1 bit color depth). A pixel is black if the corresponding bit in the input file is zero and white if the corresponding bit is one. The output should match your manual process from the video.

 

Minor deviation: The application attempts to make the resulting image file as close to a square as possible, for easier viewing.

Note: The application will crash if the resulting image size is too large! (max. PNG size is ~2,500 Megapixel)

 



import javax.imageio.ImageIO;
import java.awt.Color;
import java.awt.image.BufferedImage;
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.stream.IntStream;


public class BinaryToImage {

	public static void main(final String[] args) {
		if (args == null || args.length == 0) {
			System.out.println("Please provide a file name");
			return;
		}

		final String fileName = args[0];
		final File file = new File(fileName);

		if (!file.exists()) {
			System.out.println("The specified file does not exist: " + fileName);
			return;
		}

		try (final InputStream inputStream = new FileInputStream(file);
			 final BufferedInputStream bufferedInputStream = new BufferedInputStream(inputStream)) {
			int octet;

			final long size = file.length();
			final long bits = size * 8;
			System.out.format("Input file size is %,d byte (%,d bits)\n", size, bits);

			final int width = getMostSquareDivisor(bits);
			// divide as long first, then narrow; casting bits alone would overflow for large files
			final int height = (int) (bits / width);

			System.out.format("Create image with %,d x %,d pixel (%,d pixels)\n", width, height, width * height);
			final BufferedImage image = new BufferedImage(
					width,
					height,
					BufferedImage.TYPE_BYTE_BINARY);

			int y = 0;
			int x = 0;

			while ((octet = bufferedInputStream.read()) != -1) {
				for (int n = 0; n < 8; n++) {
					final int value = (octet >>> n & 0x01) == 0
							? Color.BLACK.getRGB()
							: Color.WHITE.getRGB();

					image.setRGB(x, y, value);
					x++;

					if (x >= width) {
						x = 0;
						y++;
					}
				}
			}

			final File output = new File(fileName + ".png");
			ImageIO.write(image, "png", output);
			System.out.format("Output file size is %,d byte (%,d byte smaller)\n", output.length(), file.length() - output.length());
		} catch (final IOException e) {
			e.printStackTrace();
		}
	}

	private static int getMostSquareDivisor(final long size) {
		final double sqrt = Math.ceil(Math.sqrt(size));
		if (sqrt * sqrt == size) {
			return (int) sqrt;
		}

		return IntStream.range(8, (int) sqrt)
				.filter(divisor -> size % divisor == 0)
				.max()
				.orElse(8);
	}
}
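For readers following along in Python (the language in the thread title), here is a rough standard-library counterpart. It is a sketch, not a port: it uses a fixed row width instead of the most-square divisor, it maps the most significant bit of each byte to the leftmost pixel (the Java code above starts from the least significant bit), and it pads the last row with zero bytes without recording the original length. The helper names `_chunk` and `bytes_to_png` are invented for the example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(tag: bytes, payload: bytes) -> bytes:
    """One PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(tag + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + tag + payload + struct.pack(">I", crc)


def bytes_to_png(data: bytes, row_bytes: int = 8) -> bytes:
    """Encode data as a 1-bit grayscale PNG, row_bytes * 8 pixels wide.

    A 0 bit becomes a black pixel and a 1 bit a white pixel.
    """
    if not data or row_bytes < 1:
        raise ValueError("need at least one input byte and one byte per row")
    padded = data + b"\x00" * (-len(data) % row_bytes)
    height = len(padded) // row_bytes
    width = row_bytes * 8
    # every scanline starts with a filter-type byte; 0 means "no filter"
    raw = b"".join(
        b"\x00" + padded[r * row_bytes:(r + 1) * row_bytes] for r in range(height)
    )
    # bit depth 1, color type 0 (grayscale), default compression/filter/interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 1, 0, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw, 9))
        + _chunk(b"IEND", b"")
    )


if __name__ == "__main__":
    png = bytes_to_png(b"Python, File to PNG Compressor.")
    with open("example.png", "wb") as handle:
        handle.write(png)
    print(len(png), "bytes written")
```

Because the pixel rows are just the input bytes plus a filter byte, the size of the PNG is essentially the zlib-compressed size of the file plus fixed chunk overhead.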

 

Using the application's own source code for testing, I get the following results:

 

The original .java file is 2,184 bytes.

The resulting image file is 1,224 bytes.

Running the PNG through Zopfli further reduces this to 1,196 bytes(1)

 

Using different compression algorithms:

.zip: 1,117 bytes

.7z: 1,092 bytes

.xz: 1,076 bytes

 

I'm sure you can find other cases where "image compression" wins, but keep in mind that zip/7z/xz are general purpose compressors that aren't optimized for ASCII text. And did I mention the application crashes if the PNG is too large? You can't compress files larger than a few MB this way.

 

1: Zopfli is an improved Deflate compressor by Google engineers that often achieves better compression. The resulting PNG is 100% compatible with existing image libraries. The downside is that for larger images it can take hours to recompress the image.


Yeah, so here's my contribution of  file to png and reverse, all in 50 lines of code :

 

<?php 
$data = file_get_contents(__DIR__.'/input.txt');
$pixels = strlen($data)*8+64; // encode the length in first 64 bits
// determine width and then increase it to be a multiple of 8
$width = intval(ceil(sqrt($pixels))); 
while ($width % 8 != 0) $width++;
$height = intval(ceil($pixels/$width));
echo "\nCreating a $width x $height image to encode $pixels bits of information.";
// create image 
$image = imagecreate($width,$height) or die('Failed to create image');
// create the two colors and put them in the color palette
$colors = [];
$colors[0] = imagecolorallocate($image,255,255,255);
$colors[1] = imagecolorallocate($image,0,0,0);
$sizestring = str_pad(strlen($data),8,' ',STR_PAD_LEFT);
$data = $sizestring.$data;
$y=0;
$x=0;
for ($i=0;$i<strlen($data);$i++) {
	// extract byte from string, convert to ascii code, 
	// convert to binary, pad with 0s to get 8 bits
	$bincode = str_pad(decbin(ord(substr($data,$i,1))),8,'0',STR_PAD_LEFT);
	for ($j=0;$j<8;$j++) {
		$bit = intval(substr($bincode,$j,1));
		$result = imagesetpixel($image,$x,$y,$colors[$bit]);
		$x++; if ($x==$width) { $y++;$x=0;}
	}
}
imagepng($image,__DIR__.'/output.png');

// The decoding part
$data  = '';
$buffer= '';
$image = imagecreatefrompng(__DIR__ .'/output.png');
$width = imagesx($image);
$height= imagesy($image);
for ($y=0;$y<$height;$y++) {
	for ($x=0;$x<$width;$x++) {
	 // palette index 0 is white (bit 0), index 1 is black (bit 1)
	 $color = imagecolorat($image,$x,$y);
	 $buffer .= ($color == 0) ? '0' : '1';
	 if (strlen($buffer)==8) {
		$data .= chr(bindec($buffer)); // 8 binary digits back to one byte
		$buffer='';
	 }
	}
}
// flush a trailing partial byte, if any
if (strlen($buffer)>0) { $buffer = str_pad($buffer,8,'0',STR_PAD_RIGHT); $data .= chr(bindec($buffer));}
$size = intval(trim(substr($data,0,8)));
file_put_contents(__DIR__.'/output.txt',substr($data,8,$size));
?>

 

A virtual cookie to anyone who can guess what the input text in the zip below is from.

 

Anyway, my input is 10664 bytes. I made the code above create widths that are a multiple of 8 so that bytes are aligned vertically, for even better compression in PNG.

 

5128 bytes - The default png compressor in php

4940 bytes - optimized with optipng (don't have zopfli on my pc)

4258 bytes - 7zip bzip2

3895 bytes - 7zip ppmd

3606 bytes - paq8jc

 

So even when given the best scenario (aligned bytes), the image does not result in the best compression.

 

Code and input text and compressed examples in attached archive.

 

output.png.41280503b4410d855d0755d3fe906321.png

 

text_to_png.zip


3 hours ago, mariushm said:

A virtual cookie to anyone who can guess what the input text in the zip below is from.

 

Can you message me your input?

I want to compare your numbers to Brotli.

 

Sucked it up and got the .zip


18 minutes ago, Poet129 said:

Compared to 7-Zip: 50.4KB

Wait, you first "compressed" the picture, then you compress the "compressed" picture with 7zip? What's that supposed to demonstrate? If you wanted to do a comparison, you'd 7zip the "uncompressed" picture!

24 minutes ago, Poet129 said:

What picture compressed looks like: 49.63KB

You saved it as a compressed PNG. It's the PNG-compression that makes your file that size. You still haven't grasped the fact that PNG already does compression by default. Compressing a compressed PNG with 7zip will obviously make it larger, not smaller, and you are only demonstrating even further that you have no clue what you're doing.

Hand, n. A singular instrument worn at the end of the human arm and commonly thrust into somebody’s pocket.


1 minute ago, WereCatf said:

Wait, you first "compressed" the picture, then you compress the "compressed" picture with 7zip? What's that supposed to demonstrate? If you wanted to do a comparison, you'd 7zip the "uncompressed" picture!

That's what I did.


Just now, Poet129 said:

That's what I did.

No, you didn't. The 7-zip file contains the same picture as above, not an uncompressed one.


3 minutes ago, WereCatf said:

No, you didn't. The 7-zip file contains the same picture as above, not an uncompressed one.

Sorry I made a mistake. 48.88KB

Picture.7z


6 minutes ago, Poet129 said:

Sorry I made a mistake. 49KB

Your "compressed" file there, with PNG-compression set to 0, is 56.8KB, so 7-zip wins by a country-mile. You only managed to demonstrate that it's the PNG-compression that does all the work, not your code.
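The effect is easy to reproduce in a few lines of Python. This sketch uses `zlib` (the DEFLATE implementation that PNG's compression stage is built on) as a stand-in for both the PNG encoder and 7-Zip, and a made-up word list as input, so the numbers are only illustrative and are not the poster's files. `sample_text` and `report` are hypothetical helper names.

```python
import random
import zlib


def sample_text(n_words: int = 5000, seed: int = 0) -> bytes:
    """Deterministic pseudo-text built from a small, made-up vocabulary."""
    rng = random.Random(seed)
    words = ["png", "filter", "deflate", "pixel", "binary", "tile", "zip", "data"]
    return " ".join(rng.choice(words) for _ in range(n_words)).encode("ascii")


def report(data: bytes) -> dict:
    """Sizes for: no compression, one compression pass, and a second pass."""
    stored = zlib.compress(data, 0)   # level 0: data is only wrapped, not compressed
    once = zlib.compress(data, 9)     # one real compression pass
    twice = zlib.compress(once, 9)    # compressing already-compressed data
    return {
        "input": len(data),
        "level0": len(stored),
        "level9": len(once),
        "level9_twice": len(twice),
    }


if __name__ == "__main__":
    for name, size in report(sample_text()).items():
        print(f"{name}: {size} bytes")
```

With compression level 0 the output is slightly larger than the input, because only framing bytes are added. All of the saving comes from the real compression pass, and a second pass over its output typically gains little or nothing.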


1 minute ago, WereCatf said:

PNG-compression set to 0, is 56.8KB

Okay but I built this into the script so does that make any difference?


Just now, Poet129 said:

Okay but I built this into the script so does that make any difference?

Irrelevant. Your script saves the files with PNG-compression enabled.


2 minutes ago, WereCatf said:

Irrelevant. Your script saves the files with PNG-compression enabled.

Who said I couldn't use other compression methods to help mine be better? It would be like adding up the file savings of Windows zip and 7-Zip, except that you can't do that.


1 minute ago, Poet129 said:

Who said I couldn't use other compression methods to help mine be better like if you could add the file savings of windows zip and 7Zip

The point is that your "compression" actually makes the file larger, not smaller, so it's not actually "compressing" anything.


1 minute ago, WereCatf said:

The point is that your "compression" actually makes the file larger, not smaller, so it's not actually "compressing" anything.

Here is it compressing the file after it has been png compressed.

Compression-1.png

Compression-1.png.png

Picture.7z


3 minutes ago, Poet129 said:

Here is it compressing the file after it has been png compressed

You are just repeating the same mistake.


7 minutes ago, WereCatf said:

You are just repeating the same mistake.

Now I'm confused. I started with a PNG-compressed image and still made it smaller: not smaller than 7zip, but still smaller than the original PNG-compressed image.


22 minutes ago, Poet129 said:

Now I'm confused. I started with a PNG-compressed image and still made it smaller: not smaller than 7zip, but still smaller than the original PNG-compressed image.

Well, if you wanted to make a compressor that's reliant on other compressors to do the hard work..


4 minutes ago, WereCatf said:

Well, if you wanted to make a compressor that's reliant on other compressors to do the hard work..

Yes, perhaps, but it still made it smaller than the original file, even if it was PNG-compressed to begin with. This is my point. But I do understand that, since 7zip is better in most instances, there is little point in it unless you are willing to go out of your way for potentially smaller files on particular inputs. But I think that's pretty good for a sixteen year old to figure out... With help.
