
How to extract exactly n-bits from file?

ClobberXD

This question is related to this topic:

Now, I'm trying my hand at file compression, and would like to extract exactly 5-bits from the file, for each iteration. I know that one has to extract a full-byte at a time (can someone explain this too?), but I'm looking to use a direct solution. Is it possible? How else can this be achieved?

 

Thanks! :)

 

Nothing to see here ;)


Read byte by byte and use the first 5 bits.

i5 4670k @ 4.2GHz (Coolermaster Hyper 212 Evo); ASrock Z87 EXTREME4; 8GB Kingston HyperX Beast DDR3 RAM @ 2133MHz; Asus DirectCU GTX 560; Super Flower Golden King 550 Platinum PSU;1TB Seagate Barracuda;Corsair 200r case. 


1 minute ago, Nineshadow said:

Read byte by byte and use the first 5 bits.

If I read a full-byte, and take the first 5, I'll only have 3 out of the next 5 bits, left in the byte...


Create a buffer.

Read a full byte into it, then transfer the first 5 bits from the buffer to a new string (or whatever you call it in C).

When the buffer has fewer than 5 bits left, read a new byte and append it to the buffer.

This method has some overhead, though not much.

But I think reading byte by byte like this will thrash the I/O.

Wouldn't it be better to create a large buffer, say 64K or 128K, and work with that? When it empties, read the next 64K.
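For what it's worth, the buffer idea above might look roughly like this in C++. This is only a sketch: the `BitBuffer` name and its `push_byte`/`take` methods are made up for illustration, and taking bits most-significant-first is an assumption, since nobody here has pinned down a bit order.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: accumulates whole bytes and hands out n-bit groups,
// most significant bit first.
struct BitBuffer {
    std::uint32_t bits = 0;  // pending bits, right-aligned
    int count = 0;           // how many pending bits are valid

    // Append one byte read from the file (needs room for 8 more bits).
    void push_byte(std::uint8_t b) {
        assert(count <= 24);
        bits = (bits << 8) | b;
        count += 8;
    }

    // Remove and return the oldest n bits (1 <= n <= 24 and n <= count).
    std::uint32_t take(int n) {
        assert(n >= 1 && n <= 24 && n <= count);
        count -= n;
        std::uint32_t out = (bits >> count) & ((1u << n) - 1u);
        bits &= (1u << count) - 1u;  // drop the bits just handed out
        return out;
    }
};
```

Pushing the bytes 0x41 and 0x7A ("Az") and calling `take(5)` three times gives 8, 5 and 29, with one bit left over. The caller is expected to push another byte whenever `count` drops below 5.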


15 minutes ago, Nineshadow said:

Read byte by byte and use the first 5 bits.

And how do I separate the first-5 bits? Should I individually check each bit? Will "bit-fields" be useful?


12 minutes ago, Anand_Geforce said:

If I read a full-byte, and take the first 5, I'll only have 3 out of the next 5 bits, left in the byte...

Manage it. It won't be too hard to do.

1 minute ago, Anand_Geforce said:

And how do I separate the first-5 bits? Should I individually check each bit? Will "bit-fields" be useful?

Depends. What are you storing them in? A structure? A character/integer? An array of booleans?


Computers work with bytes, not bits. Get this in your head: a byte is the smallest unit you can read or write.

As I said in a previous post, you need to understand that how data is stored in a file doesn't have to match how you keep it in memory or how you work with it.

You don't read 5 bits. You don't even read one byte at a time from the file, because that's not efficient.

Instead, write code that reads a chunk of data from the file when needed and keeps it in an array in memory. When your code needs a few bits, a function goes to the byte in the array where those bits start and takes 5 bits from that byte and, if necessary, the next byte in the array (reading more from the file if that byte hasn't been loaded yet).

If you're lazy, make an array that holds only 0 or 1, one element per bit: read a few bytes from the file and put the 8 bits of each byte into 8 positions of that array.

Then, when your code needs the next 5 bits, a function reads them from that array, puts them in a byte variable, and remembers its position so the next call continues from there. If fewer than 5 bits are left in the array, it reads more bytes from the file and appends their bits.
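A rough C++ sketch of the "position yourself on the byte" approach described above, assuming the file's bytes are already in a `std::vector` and that bits are numbered from the most significant bit of the first byte. `bits_at` is a hypothetical name, not something from a library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns n bits (1..8) starting at absolute bit
// position bitpos in data, counting from the most significant bit of data[0].
std::uint8_t bits_at(const std::vector<std::uint8_t>& data, std::size_t bitpos, int n) {
    assert(n >= 1 && n <= 8);
    assert(bitpos + n <= data.size() * 8);
    std::size_t byte = bitpos / 8;              // byte holding the first wanted bit
    int offset = static_cast<int>(bitpos % 8);  // bits to skip inside that byte
    // Put the current byte and (if there is one) the next byte side by side.
    unsigned window = static_cast<unsigned>(data[byte]) << 8;
    if (byte + 1 < data.size()) window |= data[byte + 1];
    // Shift the wanted bits down to the bottom and mask off the rest.
    return static_cast<std::uint8_t>((window >> (16 - offset - n)) & ((1u << n) - 1u));
}
```

To walk a buffer 5 bits at a time, keep a running bit position and add 5 after each call. No per-bit array or string conversion is needed.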

 

 

 


2 minutes ago, mariushm said:

You don't read 5 bits, you don't even read one byte at a time from the file, it's not efficient. 

You need to write some code that when needed, will read a bunch of data from the file, keep it in memory in an array and when your code needs a few bits, a function will go in the array and position itself on the byte from where you need to extract bits and take 5 bits from that byte and potentially the next byte in the array (and if that one isn't read, read it from the file)

If you're lazy, just make up an array that will hold only 0 or 1, one for each bit, read a few bytes from the file and put the 8 bits of each byte into 8 positions of that array holding 0 or 1

Then when your code needs next 5 bits, a function just goes in that array and reads those 5 bits and puts them in a byte variable and keep track of where you were last positioned in the array, to continue reading the next 5 bits from that position or if there's less than 5 bits left in the buffer, to read more bytes from the file and populate the array with more information.

that's more or less what I suggested


6 minutes ago, Nineshadow said:

What are you storing them in? A structure? A character/integer? An array of booleans?

What do you suggest? I'm converting every 5-bits into 8-bit chars...


3 minutes ago, Anand_Geforce said:

What do you suggest? I'm converting every 5-bits into 8-bit chars...

That's probably the best way of doing it.

Anyway, use binary operators. (eg. '<<' and '>>')
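To make the `<<` and `>>` suggestion concrete, here is a small C++ sketch. It assumes "first 5 bits" means the 5 most significant bits of the byte; the function names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Top ("first") 5 bits of a byte: shift the low 3 bits out.
std::uint8_t high5(std::uint8_t b) {
    return static_cast<std::uint8_t>(b >> 3);
}

// Low 3 bits of a byte: mask everything else off.
std::uint8_t low3(std::uint8_t b) {
    return static_cast<std::uint8_t>(b & 0x07);
}

// Join 3 leftover bits with the top 2 bits of the next byte to form the
// next 5-bit group.
std::uint8_t join(std::uint8_t leftover3, std::uint8_t next) {
    assert(leftover3 <= 7);
    return static_cast<std::uint8_t>((leftover3 << 2) | (next >> 6));
}
```

For the byte 0x41 (0100 0001), `high5` gives 8 and `low3` gives 1. Joining that leftover with 0x7A gives 5, the second 5-bit group.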


6 minutes ago, mariushm said:

The computer work with bytes, not bits. Get this in your head. A byte is the minimum possible to read or write. 

I've said it in a previous post, you need to learn and understand that how data is actually stored in a file doesn't have to be exactly how you keep the data in memory and how you work with the data.

 

You don't read 5 bits, you don't even read one byte at a time from the file, it's not efficient. 

You need to write some code that when needed, will read a bunch of data from the file, keep it in memory in an array and when your code needs a few bits, a function will go in the array and position itself on the byte from where you need to extract bits and take 5 bits from that byte and potentially the next byte in the array (and if that one isn't read, read it from the file)

If you're lazy, just make up an array that will hold only 0 or 1, one for each bit, read a few bytes from the file and put the 8 bits of each byte into 8 positions of that array holding 0 or 1

Then when your code needs next 5 bits, a function just goes in that array and reads those 5 bits and puts them in a byte variable and keep track of where you were last positioned in the array, to continue reading the next 5 bits from that position or if there's less than 5 bits left in the buffer, to read more bytes from the file and populate the array with more information.

 

 

 

I understand... thanks!


5 hours ago, Anand_Geforce said:

This question is related to this topic:

Now, I'm trying my hand at file compression, and would like to extract exactly 5-bits from the file, for each iteration. I know that one has to extract a full-byte at a time (can someone explain this too?), but I'm looking to use a direct solution. Is it possible? How else can this be achieved?

 

Thanks! :)

 

What compression algorithm are you trying to implement?


This is the exact same problem we talked about in the other thread. If you want 5 bits out of 1 byte, you just do what was talked about in your Bitwise I/O thread. Just as a refresher though: https://en.wikipedia.org/wiki/Bit_array

If you're really smart about it, you'll load all the bits you might want to look at into your bitArray (that you should have made by now, that other thread is quite old) and then you can just index the bits you want using the methods you've already made. 

Please learn that modern computers are not bit-addressable, nor are they word addressable, they are only byte addressable

 

As for how many bytes to extract from the file when you only want n consecutive bits, that's simple: you extract n/8 bytes, rounded up. (That assumes the bits start on a byte boundary; if they start partway through a byte, they can spill into one more byte.)
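The rounding-up rule above is usually written with integer arithmetic. A small sketch in C++ (the helper names are made up):

```cpp
#include <cstddef>

// Bytes needed to hold n bits that start on a byte boundary: ceil(n / 8).
constexpr std::size_t bytes_for_bits(std::size_t n) {
    return (n + 7) / 8;
}

// Bytes touched by n bits (n >= 1) that start `offset` bits (0..7) into a byte.
constexpr std::size_t bytes_spanned(std::size_t offset, std::size_t n) {
    return (offset + n + 7) / 8;
}

static_assert(bytes_for_bits(5) == 1, "5 bits fit in one byte");
static_assert(bytes_spanned(5, 5) == 2, "5 bits starting at bit 5 cross into a second byte");
```

Adding 7 before dividing avoids floating point. Plain `5 / 8` in integer arithmetic is 0, not 1.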

ENCRYPTION IS NOT A CRIME


18 hours ago, Unimportant said:

What compression algorithm are you trying to implement?

It's my own, but it's pretty simple - instead of having 8-bits for each char, I write only 5-bits each char. It's certainly not fool-proof, but then again, I do this for the love of coding, and not for creating efficient compression algos. This is just the beginning - I'll try making it more efficient later...


8 hours ago, straight_stewie said:

This is the exact same problem we talked about in the other thread. If you want 5 bits out of 1 byte, you just do what was talked about in your Bitwise I/O thread. Just as a refresher though: https://en.wikipedia.org/wiki/Bit_array

As for how to figure out how many bytes from the file to extract when you only want n consecutive bits, that's simple, you extract n/8 (any decimal is rounded up) bytes from the file.

I remember... It's the last part I was confused about (and am still...) Let's say n=5, how do I extract 5/8 bits from a file, when the tiniest unit we can address is a byte? I thought of using

ifstream.read(&bit_array,sizeof((5/8)*char));

But I'm apprehensive about it... What do you think?

 

Thanks! :)


4 hours ago, Anand_Geforce said:

It's my own, but it's pretty simple - instead of having 8-bits for each char, I write only 5-bits each char. It's certainly not fool-proof, but then again, I do this for the love of coding, and not for creating efficient compression algos. This is just the beginning - I'll try making it more efficient later...

It's not about efficiency, that just plain does not work. You can't just throw away 3 bits out of every byte and call it "compression". That's destruction: The original data is lost forever.


3 hours ago, Anand_Geforce said:

 


ifstream.read(&bit_array,sizeof((5/8)*char));

 

That's just wrong on so many levels. For starters "5/8" will yield 0. And again, you cannot read bits from a file, only bytes. Read the whole file to a buffer (probably in chunks, 64K at a time for example) and then work with the data from the buffer. 
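As a sketch of the chunked-read suggestion in C++: `read` takes a whole number of bytes, and `gcount` reports how many were actually read. `read_all` is a hypothetical helper, and 64 KiB is just the figure mentioned above.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole file in 64 KiB chunks and returns its
// bytes. Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> read_all(const std::string& path) {
    std::vector<unsigned char> data;
    std::ifstream in(path, std::ios::binary);
    std::vector<char> chunk(64 * 1024);
    while (in) {
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        std::streamsize got = in.gcount();  // bytes actually read; short at end of file
        data.insert(data.end(), chunk.begin(), chunk.begin() + got);
    }
    return data;
}
```

Once the bytes are in memory, the 5-bit groups are pulled out with shifts and masks; the file itself is only ever read in whole bytes.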


Without knowing exactly what compression he invented, you can't say that it won't work.  For example, let's say he "invented" a compression algorithm optimized for TEXT documents, where he uses 5 bits to store any character between "a" and "z" and maybe a couple of other characters like space (27 characters if I'm not wrong) and more bits if it's outside this range.

 

You can have 32 possible values in 5 bits, from 0 to 31. Say we use values 1 to 30 for common characters: lowercase ASCII runs from code 97 for "a" to 122 for "z", so we map "a".."z" to 1..26, use 27 for space, and have 3 other positions (28 to 30) for other characters. The last position (31, or 11111) indicates that a character outside this range follows, which means the following 8 bits contain the incompressible character. (The value 0 is left unused for reasons you'd figure out if you really care.)

 

So if a file contains only lowercase characters, the compression ratio for this algorithm would be 5/8 * 100 = 62.5%. However, any character that can't be compressed takes 13 bits to store, for a ratio of 13/8 * 100 = 162.5%.
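The size arithmetic above can be checked with a short C++ sketch. It assumes the simplest version of the scheme (5 bits for 'a'..'z' and space, a 13-bit escape for everything else) and ignores the spare codes; the function names are illustrative.

```cpp
#include <cstddef>
#include <string>

// Size in bits of `text` under the assumed scheme: 5 bits for 'a'..'z' and
// space, otherwise a 5-bit escape code plus the raw 8-bit character.
std::size_t encoded_bits(const std::string& text) {
    std::size_t bits = 0;
    for (char c : text) {
        bool compact = (c >= 'a' && c <= 'z') || c == ' ';
        bits += compact ? 5 : 13;
    }
    return bits;
}

// Compressed size as a percentage of the original (8 bits per character),
// before any padding to a whole byte.
double ratio_percent(const std::string& text) {
    if (text.empty()) return 0.0;
    return 100.0 * static_cast<double>(encoded_bits(text)) / (8.0 * static_cast<double>(text.size()));
}
```

All-lowercase text comes out at 62.5%, and a single escaped character at 162.5%, so the scheme only pays off when escapes are rare.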

 

It can be further improved. For example, one of those 3 extra positions could work like this: 5 bits holding the value 28, followed by 3 bits, mean that up to 8 characters following this code are all uppercase, so convert them to lowercase and compress. The text "TESTING" could then be compressed as [ 5 bits value 28 ] [ 3 bits value 6, meaning 7 characters ] [ 5 bits "t" ] [ 5 bits "e" ] [ 5 bits "s" ] [ 5 bits "t" ] [ 5 bits "i" ] [ 5 bits "n" ] [ 5 bits "g" ]. You just compressed a 7-byte (56-bit) text into 43 bits, but you have to add 5 bits of padding to round up to 48 bits (6 bytes), because the minimum you can write to a file is a byte. So you achieved a compression ratio of 6/7 * 100 = 85.7%.

 

It's a very bad compression algorithm, but then again so is RLE if it's used outside what it was designed for..


4 hours ago, Anand_Geforce said:

I remember... It's the last part I was confused about (and am still...) Let's say n=5, how do I extract 5/8 bits from a file, when the tiniest unit we can address is a byte? I thought of using

You don't. You round up to the nearest byte, then use the bit-shifting methods we discussed in your previous thread. So for n=5 bits, that's 5/8 of a byte, which rounds up to 1 byte: extract one byte.

Most likely, however, you will want to read the entire file into some data structure, like a byte array, so that it can be accessed easily and logically. Then you can index whichever byte you want.

 

THE SMALLEST UNIT OF MEMORY THAT YOU CAN WORK WITH IS A BYTE!!


Anand, I was bored and decided to write a small piece of code to show you how you would read data from a file and return 5 bits at a time to some other part of your program.

The code below is in PHP, which is loosely typed, so it may be a bit harder to follow, but the basic concept is this:

Try to read one byte from the file. If that succeeds, convert the byte into a string of 8 characters, each "0" or "1", and append this string to a variable kept in memory.

When your program needs 5 bits, a function checks whether that string holds at least 5 characters. If so, it takes the first 5 characters, converts them back to a number ("00101" becomes the value 5), and removes them from the string. If fewer than 5 characters are stored, a byte is read from the file, its 8 characters are appended to the string, and the first 5 characters are then converted to a number.

There's one special case: when the end of the file is detected, anywhere between 0 and 4 characters may be left in that string. In this case, "0" characters are added at the beginning of the string to make exactly 5 characters.

If we're at the end of the file and no bits are left in memory, the function simply returns 255, which is obviously invalid because it's outside the range representable in 5 bits (0..31). The code outside the function needs to check for it.

Again, this is a stupid way to code: very unoptimized, with a lot of wasted CPU cycles. In an ideal world you read a large number of bytes from the file at a time, not one byte at a time. It's common to read multiples of 512 or 4096 bytes because that's the sector size of hard drives, so the operating system would read that much data in one shot anyway. You would also NOT do any conversions to strings or char arrays. You'd just go to the byte in the array where the bits you want start, and if needed read the remaining bits from the byte that immediately follows.

 

Here's the code, hope it's easy enough to follow:


 

<?php

class BitReader {
 private $handle;
 private $bits;
 private $bitscount;
 
 // this function is automatically executed every time the class is initialized, basically sets safe initial values for variables. 
 function __construct() {
     $this->handle = FALSE; // no file opened
     $this->bits = "";
     $this->bitscount = 0;

 }

 public function OpenFile($filename) {
     $this->handle = fopen($filename,'rb'); // open file for reading, in binary mode
     // should add error checking of course
}
 public function CloseFile() { // close the file
     fclose($this->handle);
     $this->handle = FALSE;
     $this->bits = "";
     $this->bitscount = 0;
 }
 
 // converts a byte to a 8 character string, for example "01010010"
 // php has a built in function for this called decbin but let's make it
 // simple to convert to other programming languages
 private function GetBits($text) {
    $value = ord($text); // php thing, force conversion to BYTE
    $s = "";
    for ($i=0;$i<8;$i++) {
        $s = (($value & 1) ? "1" : "0") . $s;
        $value = $value >> 1;
    }
    //echo "getBits $text $value $s \n";
    return $s;
 }
 // convert a 5 character string like "01010" to a byte
 private function ConvertStringToNumber($text) {
   $value = 0;
   for ($i=0;$i<5;$i++) {
       $value = $value * 2; // same as << 1
       if (substr($text,$i,1) == '1') $value = $value +1;
   }
   echo "convert $text $value \n";
   return $value;
 }
 
 public function ReadBits() {
     if ($this->handle == FALSE) return 255; // file not opened, so just reply as if we're at end of file and there's no bits to return

     if ($this->bitscount >= 5) {
       // we have at least 5 bits still left in memory, no need to read from file
       $text = substr($this->bits,0,5); // get first 5 characters;
       $this->bits = substr($this->bits,5); // remove them from string
       $this->bitscount = $this->bitscount - 5;
       return $this->ConvertStringToNumber($text);
     }
     // we have less than 5 bits stored in memory, maybe no bits at all
     // can we read one more byte from file? are we at the end?
     $eof = FALSE;
     $value = fread($this->handle,1); // read one byte from file
     // strict comparison: a loose == FALSE would also match the character "0" (0x30)
     if ($value === FALSE || $value === "") $eof = TRUE; // nothing was read, assume end of file
         
     if ($eof == TRUE) {
        // end of file so can't read more bits, just return whatever is left
        // add "0" bits to the left, if there's less than 5 bits in memory
        if ($this->bits!="") {
            while (strlen($this->bits) !=5) $this->bits = "0".$this->bits;
            $value = $this->ConvertStringToNumber($this->bits);
            $this->bits = "";
            $this->bitscount = 0;
            return $value;
        }
     } else {
        //var_dump($value);
        // add the bits to memory
        $this->bits = $this->bits . $this->GetBits($value);
        $this->bitscount = $this->bitscount + 8;
        // get 5 bits
        $value = substr($this->bits,0,5); // take the first 5 characters from string
        $this->bits = substr($this->bits,5); // remove the first 5 characters from memory string
        $this->bitscount = $this->bitscount - 5;
        return $this->ConvertStringToNumber($value); // convert to number and return it
     }
     // if we're at end of file and no more bits are in memory and this
     // function is called, return a number that's obviously bigger than what
     // could be stored in 5 bits, let's say 255;
     return 255;
 }
 
}

$count = 1;
$filereader = new BitReader();

// tip : create a text file with 2 characters in it, this means 16 bits, so 4 function calls 
// 3 x 5 bits + 1 x 1 bit aligned to the right
// the 5th time the function is called, it will return 255 telling you end of file is reached

$filereader->OpenFile('y:/temp/test.txt'); // <-- replace this with your text file name

// read first 5 bits
$value = $filereader->ReadBits();
// while there's no error (no end of file), show the 5 bits with a counter and read the next 5 bits
while ($value !=  255) {
    echo $count. ": " . $value . "\n";
    $value = $filereader->ReadBits();
    $count++;
}
$filereader->CloseFile();

?>

 

Here's the output for a text file containing the letters "Az"  0x41 , 0x7A or "0100 0001 0111 1010" :

 

convert 01000 8
1: 8
convert 00101 5
2: 5
convert 11101 29
3: 29
convert 00000 0
4: 0

 

The lines with "convert" are "debug" messages I intentionally left uncommented in a function to make it easier to see what happens.

 

 

 


23 hours ago, Unimportant said:

It's not about efficiency, that just plain does not work. You can't just throw away 3 bits out of every byte and call it "compression". That's destruction: The original data is lost forever.

I think you've misunderstood... Keep reading...

23 hours ago, mariushm said:

Without knowing exactly what compression he invented, you can't say that it won't work.  For example, let's say he "invented" a compression algorithm optimized for TEXT documents, where he uses 5 bits to store any character between "a" and "z" and maybe a couple of other characters like space (27 characters if I'm not wrong) and more bits if it's outside this range.

 

It's a very bad compression algorithm, but then again so is RLE if it's used outside what it was designed for..

Exactly! :) - 00001 for 'a', 00010 for 'b', 00011 for 'c', and so on... 5 bits are enough for basic ascii characters (26 values for a-z, and around 25-30 for other symbols). So text file compression is a go! Of course as I mentioned before, this is a very basic stupid idea, which is not fool-proof, but it's a start!


Fair enough. Although 5 bits only allows for 32 different combinations so you probably want 6 bits (64 combinations) if you want to store capital letters, normal letters and numbers (26 capital letters + 26 small letters + 10 numbers = 62) Leaving 2 for a space character and '.'

 


4 minutes ago, Unimportant said:

Fair enough. Although 5 bits only allows for 32 different combinations so you probably want 6 bits (64 combinations) if you want to store capital letters, normal letters and numbers (26 capital letters + 26 small letters + 10 numbers = 62) Leaving 2 for a space character and '.'

 

OK... Will do. Thanks!


With the observation that, since the minimum quantity you can write to a file is ONE byte, you may be left with fewer than 8 bits that must be written to the file. Say 3 bits are left over: you have to pad them with 5 extra bits. But how do you tell the decompressor to ignore these 5 extra bits when it decodes 5-bit chunks?

 

One idea would be to make all those 5 bits "0" and just enforce the rule that the decoder will assume it reached the end of file when it reads 5 consecutive bits that are "0". 

 

Another idea would be to invent a sort of file format for your compression, for example just say that the compressed file is split into blocks of compressed data which can have varying lengths, and each block has two bytes at the beginning which signifies how many 5 bits chunks are stored in this block. PNG and other formats do this sort of thing, they're made out of blocks each with a name and length and other key values after which data follows.

 

So, for example, if you have 10 lowercase characters to compress, the compressed file will be  [ byte 1 = value 0 ] [ byte 2 = value 10 ] [  7 bytes = 56 bits , 50 bits of actual compressed data and 6 bits of padding ]

Byte 1 * 256 + Byte 2 = 0 * 256 + 10 = 10 , which tells your decoder that this block contains 10 chunks of 5 bits , which equals 50 bits, but since decoder can only read multiples of 8 bits, the decoder will know that it needs to read  the next multiple of 8 after 50, which is 56 bits ( / 8 bits = 7 bytes) 

So on one hand even for the smallest text file, you'll have two extra bytes in the compressed file, but on the other hand you gain the ability to use the "00000" code which could be used to improve the compression if used like a sort of wildcard, like i mentioned using a special code before uppercase letters to encode them as lowercase.

With two extra bytes, you can count up to 65,535 chunks of 5 bits in a "block", which equals 327,675 bits, or just under 40,960 bytes (40 KB), so two extra bytes every 40 KB is really not such a big deal.
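A hedged C++ sketch of that two-byte block header. The function names are invented here, and the byte order (high byte first) simply follows the "Byte 1 * 256 + Byte 2" formula above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encoder side: append the 2-byte chunk count, high byte first.
void put_header(std::vector<std::uint8_t>& out, std::uint16_t chunks) {
    out.push_back(static_cast<std::uint8_t>(chunks >> 8));    // byte 1
    out.push_back(static_cast<std::uint8_t>(chunks & 0xFF));  // byte 2
}

// Decoder side: chunk count = byte 1 * 256 + byte 2.
std::uint16_t get_chunks(std::uint8_t b1, std::uint8_t b2) {
    return static_cast<std::uint16_t>(b1 * 256 + b2);
}

// Payload bytes that follow the header: chunks * 5 bits, rounded up to
// whole bytes. The padding bits at the end are ignored by the decoder.
std::size_t payload_bytes(std::uint16_t chunks) {
    return (static_cast<std::size_t>(chunks) * 5 + 7) / 8;
}
```

For the 10-character example, the header bytes are 0 and 10 and `payload_bytes(10)` is 7, matching the 56 bits worked out above.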

 


Here's a little example program that does what i think you want.

It takes pure ASCII text files only and supports only capital letters (A-Z), small letters (a-z), digits (0-9), space and period '.'. It replaces line breaks with spaces because there are not enough combinations left in 6 bits to store those.

 

The program will stop if it encounters any other, unsupported, characters.

 

#include <iostream>
#include <fstream>
#include <memory>
#include <cstring>

//Reads a block of n size from the given file, returns the actual amount of bytes read.
int 
ReadBlock(std::ifstream& InFile, std::unique_ptr<char[]>& pInBuffer, int n)
{
	if (!InFile.read(pInBuffer.get(), n))
		return InFile.gcount();

	return n;	
}

//Re-encodes the given buffer contents.
//space = 0
//A-Z = 1 - 26
//a-z = 27 - 52
//0-9 = 53 - 62
//'.' = 63
//Replaces line breaks with a space because there are no combinations left in 6 bits.
//Returns false if there were unsupported characters in the input.
bool
ReEncode(std::unique_ptr<char[]>& pInBuffer, int n)
{
	//Loop through all bytes in buffer
	for (int i = 0; i < n; i++)
	{
		//Get current character to handle.
		char c = pInBuffer[i];
		
		//Handle capital letters...
		if (c >= 'A' && c <= 'Z')
		{
			pInBuffer[i] = c - 'A' + 1;
			continue;
		}

		//Handle small letters...
		if (c >= 'a' && c <= 'z')
		{
			pInBuffer[i] = c - 'a' + 27;
			continue;
		}

		//Handle numbers...
		if (c >= '0' && c <= '9')
		{
			pInBuffer[i] = c - '0' + 53;
			continue;
		}

		//space...
		if (c == ' ')
		{
			pInBuffer[i] = 0;
			continue;
		}

		//period...
		if (c == '.')
		{
			pInBuffer[i] = 63;
			continue;
		}

		//Replace CR and LF with space.
		if (c == '\n' || c == '\r')
		{
			pInBuffer[i] = 0;
			continue;
		}

		//Unsupported character.
		return false;	
	}

	return true;
}

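//Packs n 6-bit values from pInBuffer into pOutBuffer, low bits first.
//Returns the number of output bytes used.
int
RepackBytes(std::unique_ptr<char[]>& pInBuffer, std::unique_ptr<char[]>& pOutBuffer, int n)
{
	//Bits used in current output byte.
	int BitsUsed = 0; 

	//Current output byte.
	int i_out = 0;

	//Loop through all characters in the buffer.
	for (int i = 0; i < n; i++)
	{
		char c = pInBuffer[i];

		//Starting a fresh output byte: clear it so data from a previous block can't leak in.
		if (BitsUsed == 0)
			pOutBuffer[i_out] = 0;

		//Use the bits that are left in the current output byte for the low bits of the input byte.
		pOutBuffer[i_out] |= c << BitsUsed;
		BitsUsed += 6;

		//If 8 or more bits used, shift to next output byte
		if (BitsUsed >= 8)
		{
			BitsUsed -= 8;
			i_out++;

			//Only write the next byte if some high bits of c spilled over.
			//Writing unconditionally would run one byte past a full output buffer.
			if (BitsUsed > 0)
				pOutBuffer[i_out] = c >> (6 - BitsUsed);
		}
	}

	//Count the last byte only if it is partially filled.
	return (BitsUsed > 0) ? i_out + 1 : i_out;
}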

int 
main()
{
	try
	{
		//Size of chunks we will read from file. Reading single bytes is inefficient so we read blocks.
		const int InBlockSize = 1024 * 16;

		//Size of output block. 
		const int OutBlockSize = (InBlockSize / 8) * 6;

		//Allocate buffers for these blocks.
		std::unique_ptr<char[]> pInBuffer(new char[InBlockSize]);
		std::unique_ptr<char[]> pOutBuffer(new char[OutBlockSize]);

		//Clear output buffer.
		memset(pOutBuffer.get(), 0, OutBlockSize);
	
		//Open input and output file.
		std::ifstream	InFile("in.txt", std::ios::binary | std::ios::in);
		if (!InFile.is_open())
		{
			std::cout << "Failed to open input file!" << '\n';
			return 1;
		}

		std::ofstream	OutFile("out.txt", std::ios::binary | std::ios::out);
		if (!OutFile.is_open())
		{
			std::cout << "Failed to open output file!" << '\n';
			return 1;
		}

		//Loop as long as we read full blocks. When an incomplete block is read (end of file) process it and exit.
		int BytesRead;		
		do
		{
			//Read a block.
			BytesRead = ReadBlock(InFile, pInBuffer, InBlockSize);

			//Encode the block to our 6 bit format.
			if (!ReEncode(pInBuffer, BytesRead))
			{
				std::cout << "Unsupported characters in input!" << '\n';
				return 2;
			}

			//Repack the bytes.
			int PackedBytes = RepackBytes(pInBuffer, pOutBuffer, BytesRead);

			//Write to output file.
			if (!OutFile.write(pOutBuffer.get(), PackedBytes))
			{
				std::cout << "Failed writing output file!" << '\n';
				return 3;
			}	
		} while (BytesRead == InBlockSize);

	}
	catch (std::bad_alloc&)
	{
		std::cout << "Failed to allocate buffer!" << '\n';
		return 4;
	}
	catch (const std::exception& e)
	{
		std::cout << "Exception: " << e.what() << '\n';
		return 5;
	}

	return 0;
}

 
