Programming language with native support for direct bitwise file I/O

ClobberXD · July 1, 2016

It'd be nice to have a programming language that natively supports direct bitwise file I/O... to output bits, we could simply:

int main()
{
  Bit my_bit = 01010011;
  ofstream outfile(filepath);
  Bit_write(outfile, my_bit);
  return 0;
}

(Note the 'Bit' data type...)

Is there any such language? Can we create such a function? Can we create such a data type (or template, STL container, etc.)?

Nuluvius · July 1, 2016

In C++ there is the std::bitset that you could have a look at.
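For anyone landing here later, a minimal sketch of what std::bitset gives you, using the pattern from the question. The function name question_bits is made up for illustration, and the binary literal assumes C++14 or newer.

```cpp
#include <bitset>

// Hypothetical helper: the pattern from the question as a std::bitset.
// 0b01010011 is a C++14 binary literal; a bare 01010011 would be read as octal.
std::bitset<8> question_bits() {
    return std::bitset<8>(0b01010011);
}
```

From there, question_bits().test(0) checks the rightmost bit, count() gives the number of 1 bits, and to_string() gives back the text "01010011".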

Unimportant · July 1, 2016

This looks like an XY problem to me (http://mywiki.wooledge.org/XyProblem).

What are you really trying to do? Because it doesn't make much sense. If you want to put a literal value in your code, how hard is it to do the conversion to a hex or dec value yourself?

ClobberXD · July 1, 2016

2 hours ago, Unimportant said:

This looks like an XY problem to me (http://mywiki.wooledge.org/XyProblem).

What are you really trying to do? Because it doesn't make much sense. If you want to put a literal value in your code, how hard is it to do the conversion to a hex or dec value yourself?

If we could directly read/write binary data ( 1s and 0s only, not ios::binary) with the help of an auxiliary data-type it'll be very convenient...

And about the XY-Problem... Nice one!!!

Unimportant · July 1, 2016

This might help:

http://stackoverflow.com/questions/537303/binary-literals

straight_stewie · July 2, 2016

You can use a bit-array.

As an example, let's assume you only need to manipulate one byte at a time:

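BitArray^ myBits = gcnew BitArray( 8, true ); // 8 bits, all 1, or 255 if viewed as a byte.
// BitArray takes an array of bool; read left to right the pattern below is 01000001 (65),
// but note that BitArray treats index 0 as the least significant bit when copied to a byte.
BitArray^ mySecondBits = gcnew BitArray( gcnew array<bool>{ false, true, false, false, false, false, false, true } );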

Nuluvius · July 2, 2016

13 hours ago, Anand_Geforce said:

If we could directly read/write binary data ( 1s and 0s only, not ios::binary) with the help of an auxiliary data-type it'll be very convenient...

You keep repeating that... Yet it's still no clearer what exactly you are trying to do.

If none of the existing types match your expectations then simply make your own. You can encapsulate whatever functionality you want...

What is the issue?

ClobberXD · July 2, 2016

9 hours ago, Nuluvius said:

You keep repeating that... Yet it's still no clearer what exactly you are trying to do.

If none of the existing types match your expectations then simply make your own. You can encapsulate whatever functionality you want...

What is the issue?

Then can you help me with creating a data type 'Bit' that contains only 1s and 0s? How is it possible - templates, typedefs?

And a write function that can directly write those 1s and 0s to a file, and a read function to read data directly as 1s & 0s...

"Simply make your own." - How exactly do I achieve that?

fizzlesticks · July 2, 2016

2 hours ago, Anand_Geforce said:

Then can you help me with creating a data type 'Bit' that contains only 1s and 0s? How is it possible - templates, typedefs?

And a write function that can directly write those 1s and 0s to a file, and a read function to read data directly as 1s & 0s...

"Simply make your own." - How exactly do I achieve that?

You can't make a type that is only 1 bit; the smallest you can get is 1 byte (there are bitfields, but don't even try to use them). To work with single bits you can use an unsigned char and bitwise operators, or use a bitset, which is essentially a vector of bits.

As for reading and writing to files, that's how file.read() and file.write() already work. write() takes a pointer to wherever your data starts plus a size, and writes those 1s and 0s. read() does the same thing in the other direction.

straight_stewie · July 2, 2016

Just now, Anand_Geforce said:

Then can you help me with creating a data type 'Bit' that contains only 1s and 0s

A type that logically represents a binary value already exists: A "boolean". A boolean value is either true or false (1 or 0). Refactor your logic into boolean logic, but beware, each boolean can take up to 4 bytes of memory.

TBH there is no way to actually get bit level control (of the type you are talking about). Even a BitArray is really just a boolean array with some methods on top of it that allow it to be easily converted to other data types, and treated abstractly as a collection of bits.

Regardless, what you keep asking for is impossible, unless you're willing to build memory just for that task. This is simply because modern memory is byte addressable and not bit addressable. But IMO it's unnecessary: If bit level control was really that helpful, modern processors would support it.

ClobberXD · July 3, 2016

7 hours ago, fizzlesticks said:

You can't make a type that is only 1 bit; the smallest you can get is 1 byte (there are bitfields, but don't even try to use them). To work with single bits you can use an unsigned char and bitwise operators, or use a bitset, which is essentially a vector of bits.

As for reading and writing to files, that's how file.read() and file.write() already work. write() takes a pointer to wherever your data starts plus a size, and writes those 1s and 0s. read() does the same thing in the other direction.

file.read() and file.write() can directly write 1s & 0s?!

ClobberXD · July 3, 2016

5 hours ago, straight_stewie said:

A type that logically represents a binary value already exists: A "boolean". A boolean value is either true or false (1 or 0). Refactor your logic into boolean logic, but beware, each boolean can take up to 4 bytes of memory.

TBH there is no way to actually get bit level control (of the type you are talking about). Even a BitArray is really just a boolean array with some methods on top of it that allow it to be easily converted to other data types, and treated abstractly as a collection of bits.

Regardless, what you keep asking for is impossible, unless you're willing to build memory just for that task. This is simply because modern memory is byte addressable and not bit addressable. But IMO it's unnecessary: If bit level control was really that helpful, modern processors would support it.

So then the BitArray is my best bet? Right... I'll play around with BitArray for some time.

Thanks!

mariushm · July 3, 2016

The minimum amount you can read or write is 1 byte (8 bits).

A programming language has several data types, which can hold anything from just two states (true or false, 0 or 1) to billions of possible values (float, double data types).

The simplest data type, called boolean in most programming languages, has just two states, true or false, so in theory it can be stored in memory as 1 bit. However, since everything in the computer world is in multiples of 8 bits, programming languages generally use 1 byte (or more) to store one boolean variable in memory or when saved to disk.

In some programming languages true is -1 and false is 0 (-1 is 11111111 in binary as a signed byte); in other programming languages false is 0 and true is 1, so the bit pattern that stands for true differs between languages. To avoid this confusion, a common suggestion is to use an unsigned byte instead of a boolean to store information that can have just two states, because this way you know for sure that false is 0 and true is 1 (or whatever you define it to be). Using a byte instead of a boolean won't really cause more memory to be used by the application, and processors are optimized to work with bytes anyway.

If for some reason you absolutely need to store to disk 8 variables that can be true or false in a tiny amount of disk space (1 byte), this is very easy to do.

final byte = (0|1) x 2^7 + (0|1) x 2^6 + (0|1) x 2^5 + (0|1) x 2^4 + (0|1) x 2^3 + (0|1) x 2^2 + (0|1) x 2^1 + (0|1) x 2^0

final byte = (0|1) x 128 + (0|1) x 64 + (0|1) x 32 + (0|1) x 16 + (0|1) x 8 + (0|1) x 4 + (0|1) x 2 + (0|1) x 1
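To make the formula concrete, here is a small sketch. The name pack8 and the choice of std::array are my assumptions; the first element is taken as the 2^7 bit and the last as the 2^0 bit, as in the formula.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: bits[0] is the most significant bit (weight 128),
// bits[7] the least significant (weight 1), matching the formula above.
std::uint8_t pack8(const std::array<bool, 8>& bits) {
    std::uint8_t result = 0;
    for (bool bit : bits) {
        // Shifting left by one doubles the running value, then the new bit is added on the right.
        result = static_cast<std::uint8_t>((result << 1) | (bit ? 1 : 0));
    }
    return result;
}
```

Passing the eight values 1 1 0 0 1 1 1 0 gives 128 + 64 + 8 + 4 + 2 = 206.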

And then to determine if one bit inside a byte is 0 or 1 you can just use bitwise operators: https://en.wikipedia.org/wiki/Bitwise_operation or http://www.tutorialspoint.com/cprogramming/c_operators.htm

You start with the & (AND) operator, which combines two values bit by bit and gives 1 only where both bits are 1:

0 & 0 = 0

0 & 1 = 0

1 & 1 = 1

1 & 0 = 0

So let's say we want to know if the last bit of a byte is 1 or not, we can easily determine this by doing this:

bool is_last_bit_set = number & 1;

Now if you want to figure out whether an arbitrary bit inside that byte is set to 1 or not, you can use the >> operator, which moves all bits to the right by a specific number of positions and puts 0 bits on the left side:

#define bitGet(v, n) (((v) >> (n)) & 1)  // v = value, n = bit position (0 = rightmost bit)

So let's say your v is 206 (11001110). n can be between 0 and 7, where 7 is the leftmost bit. To find the state of the third bit from the right, you call bitGet(value, 2): v is shifted to the right by two positions and becomes 00110011 (51), whose last bit is 1, so & 1 gives 1, or true.
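The same thing written as a function instead of a macro, as a sketch only. The name bit_get is my own, and it assumes n stays between 0 and 7.

```cpp
#include <cstdint>

// Function form of the macro: returns the state of bit n of v (n = 0 is the rightmost bit).
constexpr bool bit_get(std::uint8_t v, unsigned n) {
    return ((v >> n) & 1u) != 0;
}
```

With v = 206 (11001110), bit_get(206, 2) is true and bit_get(206, 0) is false.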

To set a bit to 1 inside a byte you can use the bitwise OR operator (clearing a bit to 0 uses AND with an inverted mask instead). For example, to set a bit to 1 (same rules apply, n between 0 and 7):

#define bit_set(v, n) ((v) |= (1 << (n)))

P.S. Just use an unsigned byte (I think it's the default, but it doesn't hurt to specify it). In signed bytes, each byte can have values between -128 and 127 (256 values in total) and the first bit of the 8 is the sign bit. The unsigned byte data type ranges from 0 to 255, so the

straight_stewie · July 3, 2016

Just now, Anand_Geforce said:

So then the BitArray is my best bet? Right... I'll play around with BitArray for some time.

What exactly are you trying to do by manipulating things at the bit level? That would really help us point you in the right direction.

A BitArray may not be the best option for you because it has extra overhead that you probably don't need.

ClobberXD · July 3, 2016

1 hour ago, straight_stewie said:

What exactly are you trying to do by manipulating things at the bit level? That would really help us point you in the right direction.

A BitArray may not be the best option for you because it has extra overhead that you probably don't need.

It may be helpful for many applications, but right now I'm just experimenting with bit I/O, and also trying to implement the Huffman coding algorithm using the above idea...

mariushm · July 3, 2016

Just use unsigned byte data type in your code to store 0 or 1 (false or true) and when writing data to disk pack eight bytes at a time into a single unsigned byte variable and write that byte to disk. If you need to write a number of bits that aren't a multiple of 8, just modify your code to pad the last byte with 0 bits until you get a full byte to write to drive.

It doesn't matter how your bits are represented in the code and in memory versus how they're actually written to disk; don't force yourself to keep things in memory exactly how they're supposed to be written to disk at the end.

ClobberXD · July 3, 2016

7 minutes ago, mariushm said:

Just use unsigned byte data type in your code to store 0 or 1 (false or true) and when writing data to disk pack eight bytes at a time into a single unsigned byte variable and write that byte to disk. If you need to write a number of bits that aren't a multiple of 8, just modify your code to pad the last byte with 0 bits until you get a full byte to write to drive.

It doesn't matter how your bits are represented in the code and in memory versus how they're actually written to disk; don't force yourself to keep things in memory exactly how they're supposed to be written to disk at the end.

Can you demonstrate to me how to pack the eight byte objects into a single unsigned byte?

straight_stewie · July 3, 2016

Just now, Anand_Geforce said:

Can you demonstrate to me how to pack the eight byte objects into a single unsigned byte?

Sorry @mariushm

fizzlesticks · July 3, 2016

46 minutes ago, Anand_Geforce said:

Can you demonstrate to me how to pack the eight byte objects into a single unsigned byte?

That was a typo, it's 8 bits in 1 byte.

ClobberXD · July 3, 2016

1 minute ago, fizzlesticks said:

That was a typo, it's 8 bits in 1 byte.

Oops! Thanks!

Nuluvius · July 3, 2016

19 hours ago, Anand_Geforce said:

"Simply make your own." - How exactly do I achieve that?

What I mean is that you can pack whatever functionality you like behind some interface. For instance, if you just want to be able to visualise and manipulate the bit patterns, then you could surface something that consists of 0s and 1s, while the encapsulated implementation reads and writes that structure using unsigned chars and file I/O.

Without knowing exactly what you are trying to do (and I suspect you may be confused about that yourself), any suggestion is fairly arbitrary. That's why there's such a variance in the responses. You are asking a question that makes little sense and you seem unable to clarify.

6 hours ago, straight_stewie said:

A BitArray may not be the best option for you because it has extra overhead that you probably don't need.

Incidentally that's VC++ and not C++.

ClobberXD · July 3, 2016

6 minutes ago, Nuluvius said:

What I mean is that you can pack whatever functionality you like behind some interface. For instance, if you just want to be able to visualise and manipulate the bit patterns, then you could surface something that consists of 0s and 1s, while the encapsulated implementation reads and writes that structure using unsigned chars and file I/O.

Without knowing exactly what you are trying to do (and I suspect you may be confused about that yourself), any suggestion is fairly arbitrary. That's why there's such a variance in the responses. You are asking a question that makes little sense and you seem unable to clarify.

Incidentally that's VC++ and not C++.

You mean to say that we can simply call a function to read/write bits, but the function definition has all the dirty bitwise operations? I understand the idea, and consider it to be an excellent one, but I don't know how... As you suspect, I'm totally in the dark about bits and manipulating them, and I'm trying to learn.

Nuluvius · July 3, 2016

17 minutes ago, Anand_Geforce said:

You mean to say that we can simply call a function to read/write bits, but the function definition has all the dirty bitwise operations? I understand the idea, and consider it to be an excellent one, but I don't know how... As you suspect, I'm totally in the dark about bits and manipulating them, and I'm trying to learn.

Yes you are going to hide the implementation mechanics of reading, writing, storing and manipulating behind something - a function or a whole class. You can then surface something to the outside world such as a string representing that data in 0s and 1s for example.
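Something along these lines, as a sketch only. BitBuffer and its members are hypothetical names; the point is just that the bytes stay hidden inside and the outside world sees a string of 0s and 1s.

```cpp
#include <bitset>
#include <string>
#include <vector>

// Hypothetical wrapper: data is stored as bytes internally and shown as text on request.
class BitBuffer {
public:
    void append_byte(unsigned char byte) { bytes_.push_back(byte); }

    // Surface the stored data as a string of '0' and '1' characters, 8 per byte.
    std::string to_string() const {
        std::string out;
        for (unsigned char byte : bytes_) {
            out += std::bitset<8>(byte).to_string();
        }
        return out;
    }

private:
    std::vector<unsigned char> bytes_;
};
```

After append_byte(0x53), to_string() returns "01010011", while the caller never touches a shift or mask.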

However from the end of your comment:

17 minutes ago, Anand_Geforce said:

As you suspect, I'm totally in the dark about bits and manipulating them, and I'm trying to learn.

It seems to imply that this may not be what you are after, and that instead you perhaps need to focus on the mechanics of reading, writing, storing and manipulating bits. That you will achieve in C++ with the unsigned char type: 8 bits, 0000 0000 to 1111 1111, i.e. 0 to 255.

ClobberXD · July 3, 2016

2 hours ago, Nuluvius said:

Yes you are going to hide the implementation mechanics of reading, writing, storing and manipulating behind something - a function or a whole class. You can then surface something to the outside world such as a string representing that data in 0s and 1s for example.

However from the end of your comment:

It seems to imply that this may not be what you are after, and that instead you perhaps need to focus on the mechanics of reading, writing, storing and manipulating bits. That you will achieve in C++ with the unsigned char type: 8 bits, 0000 0000 to 1111 1111, i.e. 0 to 255.

It is what I'm after! Thanks a ton! I'll learn everything to do with bits, and create a class to make it simple to work with bits.

Unimportant · July 3, 2016

Bit manipulation has been baked into the language since the early days of C. No need to build a class around it. I'd even advise against it, because any decent C/C++ programmer can easily read and understand the normal way of doing bit manipulation; if you encapsulate it in a class you're obfuscating something that is already standardized, simple and clear.

The most commonly used operators for bit manipulation are AND (&), OR (|), NOT (~) and XOR (^) (not to be confused with logical AND (&&), OR (||), NOT (!)), along with the bit shift operators (<<, >>).

Examples:

#include <cstdint>

//Data is 8 bit unsigned.
uint8_t Data = 0;

//Set a single bit, for example bit 3 (plain assignment, so all other bits become 0)
Data = (1 << 3);

//Set multiple bits, for example bit 4 and 6 (all other bits become 0)
Data = (1 << 4) | (1 << 6);

//Set all bits, except bit 4 and 6 (the cast keeps only the low 8 bits of the int result)
Data = static_cast<uint8_t>(~((1 << 4) | (1 << 6)));

//Set bit 2 while keeping all other bits as they were
Data |= (1 << 2);

//Clear bit 3 while keeping all other bits as they were
Data &= ~(1 << 3);

//etc...

However, it is considered bad practice to scatter literals throughout your code. These are called 'magic numbers' (https://en.wikipedia.org/wiki/Magic_number_(programming)#Unnamed_numerical_constants), and they make it hard for other programmers, and for yourself later, to remember what all these stray numbers mean. It's also hard to update code with literals all over the place, especially if a certain literal is used multiple times.

The common practice is to define your literals as const values. If a literal is to be composed from individual bitvalues then create bitmasks and use them to compose the literal. This way the bits have names and meaning so anyone can see at a glance what is happening.

For example:

/*
Consider writing code for a HD44780 LCD display 
(https://en.wikipedia.org/wiki/Hitachi_HD44780_LCD_controller#Instruction_set)
Where the display on/off command consists of following bit pattern:
0 0 0 0 1 D C B  where:
D = display on or off
C = cursor on or off
B = cursor blink on or off
*/

//We could do this to turn on the display and cursor and blink off
uint8_t Command = 14;  //14 decimal = 00001110 binary

//However, this tells us nothing, you yourself won't know what this does in 3 months
//Therefore let's create bitmasks
const uint8_t bDisplayCmd = (1 << 3);
const uint8_t bDisplayOn = (1 << 2);
const uint8_t bCursorOn = (1 << 1);
const uint8_t bBlinkOn = (1 << 0);

//Now we can compose our command from meaningful bit names:
Command = bDisplayCmd | bDisplayOn | bCursorOn;
