Big Endian

Wictorian · October 15, 2022

I'm implementing the SHA-256 algorithm and I need to use big-endian representation. I've researched it and understand how it works; however, I don't know how to integrate it into SHA-256. Do you have any ideas?

Eigenvektor · October 15, 2022

Little endian and big endian are just the way a CPU handles multi-byte numbers internally. I'm not quite sure how it factors into SHA-256 for your case (sources?), but I assume it is to ensure that two systems using the algorithm are using the same representation, so they arrive at the same hash for the same source.

For example the hexadecimal value 0A 0B 0C 0D would be represented as 0A 0B 0C 0D in the memory of a big endian system, while little endian would store it as 0D 0C 0B 0A instead. That would change the hash of it depending on how the algorithm approaches hashing the number. So the algorithm would have to take the endianness of the underlying system into account to ensure it processes the bytes in the same order on both.

Wictorian · October 15, 2022

I have to convert the length of the input to big endian and store it using 48 bytes. In the example it had a bunch of bytes with 8 zeros to fill the space. However, it doesn't exactly explain what I am supposed to do.

wanderingfool2 · October 16, 2022

19 hours ago, Wictorian said:

I have to convert the length of the input to big endian and store it using 48 bytes. In the example it had a bunch of bytes with 8 zeros to fill the space. However, it doesn't exactly explain what I am supposed to do.

Can you be specific on why you are needing big endian/what you are using it for? If you are trying to implement SHA256 you really shouldn't be needing to care about the endianness of the system.

e.g. Bitshifting is endianness independent. e.g 1024 >> 2 should always equal 256 whether you use little or big endian.

If you are trying to save things into a file and need to represent the binary data as big endian vs little endian, that might be a thing...but then again, from my knowledge Apple computers have been little-endian since the switch to Intel, and x86 is little endian (so unless you are programming for a big-endian architecture it's unlikely you would need to worry as much)....well, I guess some languages do actually do things internally as big endian, but the bigger thing is to learn more first before you start thinking about that kind of thing.

So, if you could be more specific we can at least guide you...as saying it's for SHA256 doesn't make much sense (internally iirc it stores values in a 32 bit unsigned ints which means again any bit shifting won't be affected by the endianness).
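To illustrate the point about shifts: a shift works on the numeric value, not on the bytes in memory, so the result is the same on any machine. A minimal Python sketch, where the helper name rotr32 is made up for illustration and the 32 bit mask is an assumption needed because Python integers are not limited to 32 bits:

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits, like an unsigned 32 bit int


def rotr32(value: int, count: int) -> int:
    """Rotate a 32 bit unsigned value right by count bits (hypothetical helper)."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & MASK32


print(1024 >> 2)          # 256 on every platform
print(hex(rotr32(1, 1)))  # 0x80000000
```

Byte order only comes into play when raw bytes are turned into such integers, or integers back into bytes.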

22 hours ago, Eigenvektor said:

Little endian and big endian are just the way a CPU handles multi-byte numbers internally. I'm not quite sure how it factors into SHA-256 for your case (sources?), but I assume it is to ensure that two systems using the algorithm are using the same representation, so they arrive at the same hash for the same source.

I'm not actually sure I've ever seen a SHA-256 even differ due to endianness; since the bit shifting is done on 32 bit unsigned ints...it's in the spec of SHA-256, so honestly I am struggling to think of anything (unless literally saving and processing the file...but then again, I think I've only really had to care about endianness maybe a few times in all the years of projects I've done)

Eigenvektor · October 16, 2022

55 minutes ago, wanderingfool2 said:

I'm not actually sure I've ever seen a SHA-256 even differ due to endianness;

That's why I asked for sources, because I've never had to account for it, but I wasn't sure I might be missing something.

mariushm · October 16, 2022

From the very little I read, SHA-256 works with 64 byte (512 bit) chunks even when hashing a stream of bytes, so if your last chunk is shorter than 64 bytes it has to be padded (a single 1 bit, then 0s, then the message length as a 64 bit big endian number).

But the 32 bit words inside each chunk are treated as big endian no matter the architecture, so if you're on a little endian architecture (ex x86, x64) you would probably read each group of 4 bytes and reverse them when loading them into a native integer.
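As a rough sketch of that step, assuming the layout in the Wikipedia pseudocode (64 byte chunks, each split into sixteen 32 bit big endian words), Python's struct module can do the conversion without checking the host CPU. The variable names and the dummy block contents are only for illustration:

```python
import struct

block = bytes(range(64))  # stand-in for one 64 byte chunk of padded input

# ">16L" = sixteen unsigned 32 bit integers, big endian, whatever the host CPU is
words = struct.unpack(">16L", block)

print(len(words))     # 16
print(hex(words[0]))  # 0x10203 (bytes 00 01 02 03 read as one big endian word)
```

Because the format string names the byte order explicitly, the same code gives the same words on little endian and big endian machines.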

I'd suggest finding some open source sha-256 code for windows/linux and some code for mac and see the differences.

anyway, considering how little programming experience you have, why do you tackle such difficult projects (for you)?

edit: here, maybe check core / sha256.js in this open source javascript package: https://github.com/bitwiseshiftleft/sjcl/

wikipedia also has pseudocode for sha-256 on its page : https://en.wikipedia.org/wiki/SHA-2

Wictorian · October 16, 2022

4 hours ago, mariushm said:

anyway, considering how little programming experience you have, why do you tackle such difficult projects (for you)?

I am not sure if implementing sha256 is hard. And honestly I'm lost.

Wictorian · October 16, 2022

7 hours ago, Eigenvektor said:

That's why I asked for sources, because I've never had to account for it, but I wasn't sure I might be missing something.

@wanderingfool2 @mariushm As I said in my previous reply, I need to convert a number to a 64 bit representation. I am not entirely sure what endianness is.

source: https://blog.boot.dev/cryptography/how-sha-2-works-step-by-step-sha-256/

Eigenvektor · October 16, 2022

1 hour ago, Wictorian said:

I am not entirely sure what endianness is.

A system's "endianness" simply tells you whether it is using big-endian or little-endian.

As you said, the number 88 converted to binary is 1011000. The 64 bit representation of that would be

63| 0000 0000 0000 0000 |48

47| 0000 0000 0000 0000 |32

31| 0000 0000 0000 0000 |16

15| 0000 0000 0101 1000 |0

For brevity such numbers are usually represented in hexadecimal instead. In this case it would be

00 00 00 00 00 00 00 58

A 32/64 bit system that is using big-endian would store the number in memory exactly this way: 00 00 00 00 00 00 00 58

The least significant byte is on the right, at the highest address: bits 63…0 (64 bit) or 63…32, 31…0 (32 bit)

On a 64 bit system using little-endian on the other hand, it would be stored as 58 00 00 00 00 00 00 00

The least significant byte is on the left, at the lowest address. Only the order of the bytes changes, not the order of the bits inside each byte.

(A 32 bit little-endian system such as x86 would typically store it the same way, as 58 00 00 00, 00 00 00 00: two 32 bit values, low half first, each with its bytes reversed)

So the instructions are simply telling you to attach the length as "00 00 00 00 00 00 00 58" at the end.
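In Python this can be sketched roughly as follows. The function name sha256_pad is made up for illustration, and the sketch assumes the whole input is available as a bytes object (for example the 11 byte input b"hello world", which is 88 bits long):

```python
def sha256_pad(message: bytes) -> bytes:
    """Return message padded to a multiple of 64 bytes (hypothetical helper)."""
    bit_length = len(message) * 8
    padded = message + b"\x80"                     # a single 1 bit, then seven 0 bits
    padded += b"\x00" * ((56 - len(padded)) % 64)  # zero-fill until 8 bytes short of a full block
    padded += bit_length.to_bytes(8, "big")        # length in bits as a 64 bit big endian integer
    return padded


padded = sha256_pad(b"hello world")  # 11 bytes = 88 bits
print(len(padded))                   # 64
print(padded[-8:].hex(" "))          # 00 00 00 00 00 00 00 58
```

int.to_bytes(8, "big") always produces the most significant byte first, so there is no need to check what the CPU does internally.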

Wictorian · October 16, 2022

ok I think I get it. thanks

Eigenvektor · October 16, 2022

8 minutes ago, Wictorian said:

ok I think I get it. thanks

Here's a source that might explain it a bit better: https://www.section.io/engineering-education/what-is-little-endian-and-big-endian/

The reason "endianness" needs to be specified is because some systems (e.g. AMD/Intel) natively use little-endian while others use big-endian. So if e.g. you don't specify how your size is stored and you transfer it across the network to a system with a different endianness you could run into issues.

If your machine natively stores numbers in big-endian byte order but the receiving system expects little-endian instead, it would interpret the value as a different size. If it knows the value uses big-endian byte ordering, then it can "reverse" their order to arrive at the expected value.

Wictorian · October 16, 2022

I don't think I have to worry about that. What I will be doing is storing the binary numbers in a text file or something.

Eigenvektor · October 16, 2022

Just now, Wictorian said:

I don't think I have to worry about that. What I will be doing is storing the binary numbers in a text file or something.

Depends on what you're doing. It's usually not going to matter as long as you work with higher level languages and use them to write and parse these values rather than going directly to memory.

However, imagine that your program is able to run on different architectures. You write the size on a big-endian system to your file as a hex string: "00 58". Let's say for speed reasons you parse these as "bytes" directly into memory before you start working with them.

If the system you open the file on is using little-endian, it would then see "00 58" in memory and interpret this as the number "58 00" instead, because it treats the first byte as the least significant one. So now your size is suddenly 22528 instead of the expected 88. That's a buffer overrun waiting to happen at some point. It's definitely important to make sure every system treats values the same way. So your program would have to make sure to write "00 58" into memory as "58 00", so the system interprets it as "00 58" again.
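The misreading is easy to reproduce in Python, which lets you choose the byte order when converting raw bytes to a number. A small sketch (the variable name raw is just for illustration):

```python
raw = bytes([0x00, 0x58])  # the two bytes as written by the big-endian system

print(int.from_bytes(raw, "big"))     # 88
print(int.from_bytes(raw, "little"))  # 22528
```

Stating the byte order explicitly when reading, as here, avoids depending on the machine's native order.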

mariushm · October 16, 2022

It does matter, depending on how you store the numbers in the file.

Let's say you have 3 numbers : 100 , 30000 and 500000

You could write them as you read them, with characters, either one number on each line, or as numbers separated by a character (tab, comma, |, space whatever) but this would take a lot of space :

100[space]30000[space]500000

That's 16 bytes used to store 3 numbers into a text file, and you would have a hard time reading the numbers back into your program when you need to, because you would basically have to read one character at a time until you encounter space or the end of file, and then add that character to the end of the number.

So instead of using characters why not output actual bytes to your file?

So to store 100, you would need a single byte because 1 byte (8 bits) can hold any value between 0 and 255

To store 30,000, two bytes would be enough because 16 bits can represent numbers between 0 and 65535 = you can write 30,000 as 117*256 + 48 so that's your two bytes : 117, 48

To store 500,000 you could use three bytes because 24 bits can hold any value between 0 and 16,777,215 so you could write 500,000 as 7*65536 + 161*256 + 32 so your 3 bytes are 7, 161, 32

So you could store your 3 numbers as this sequence of bytes : 100 , 117, 48 , 7, 161 , 32

But if you store them like this, you have no way of knowing where the first number ends and the next number starts ... will your program read 1, 2 or 3 bytes for the first number?

So to avoid this, you can decide to use a fixed number of bytes for every number, even if you could actually use fewer bytes.... so let's say you decide to use 4 bytes for each number, because at some point you may have a value bigger than 16.7 million.

Then your 3 numbers become :

100 : 0 , 0 , 0 , 100 = 0 x 256³ + 0 x 256² + 0 x 256¹ + 100 x 256⁰ (256⁰ = 1)

30,000 : 0 , 0 , 117, 48

500,000: 0 , 7 , 161, 32

and because you know any value is always using 4 bytes, you no longer have to use a separator between numbers and your data in the file would be :

0,0,0,100, 0,0,117,48, 0,7,161,32

Now, the way I wrote the numbers is called Big Endian because the byte that has the most influence over the value is the one at the lowest address (at the start).

In Little Endian, the byte that has the most influence over the value will be at the highest address, so in memory your numbers will be stored like this:

100 : 100 , 0 , 0 , 0

30,000 : 48 , 117, 0, 0

500,000: 32 , 161, 7, 0
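These byte lists can be checked with a few lines of Python, using int.to_bytes with a fixed width of 4 bytes. This is only a sketch for verifying the arithmetic; note that Python reports the low byte of 500,000 as 32 in decimal (0x20 in hexadecimal):

```python
numbers = [100, 30000, 500000]

for n in numbers:
    big = list(n.to_bytes(4, "big"))        # most significant byte first
    little = list(n.to_bytes(4, "little"))  # least significant byte first
    print(n, big, little)

# 100 [0, 0, 0, 100] [100, 0, 0, 0]
# 30000 [0, 0, 117, 48] [48, 117, 0, 0]
# 500000 [0, 7, 161, 32] [32, 161, 7, 0]
```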

If you hold these 3 numbers into an array of unsigned 32 bit (4 byte) integers and you use the proper file write functions to write the numbers to an open file, the numbers will be written how they're stored in memory - if you're on Mac with a PowerPC processor, they'd be written in Big Endian, or they'd be written in Little Endian if on Intel cpu... if you're running your code on Windows they'd be written in Little Endian

With Python, you can tell which byte order is in use by checking sys.byteorder

import sys

if sys.byteorder == 'little':
    print('little endian')
else:
    print('big endian')

You can write some values in a specific way to file by converting them to little endian or big endian

# Here is how to pack little (<) or big (>) endian:

import struct

# Little Endian
struct.pack('<L', 1234)
# b'\xd2\x04\x00\x00'
# Big Endian
struct.pack('>L', 1234)
# b'\x00\x00\x04\xd2'

In PHP, you would have the same issue. Let's say I have the 3 numbers in an array.

You can tell the endianness of the system it's running on with a code snippet like this :

function isLittleEndian() {
    return unpack('S',"\x01\x00")[1] === 1;
}

and I'd write the values like this :

<?php 

$numbers = [100,30000, 500000];

$handle = fopen('output.bin','w'); // open file for writing

for ($i=0;$i<3;$i++) {
  /* see more at https://www.php.net/manual/en/function.pack.php
  L 	unsigned long (always 32 bit, machine byte order - WHATEVER THAT IS)
  N 	unsigned long (always 32 bit, big endian byte order)
  V 	unsigned long (always 32 bit, little endian byte order)
  */

  $bytes = pack('N',$numbers[$i]);	// returns a 4 byte string, number in big endian, 32 bit representation
  fwrite($handle,$bytes,4); // write the 4 bytes
}

fclose($handle); // close the file

?>
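Since the original question is about Python, here is a rough Python counterpart of the PHP snippet, which also reads the file back to show the round trip. The temporary directory and the variable names are assumptions made for this sketch, not part of the PHP code:

```python
import os
import struct
import tempfile

numbers = [100, 30000, 500000]
path = os.path.join(tempfile.mkdtemp(), "output.bin")  # throwaway location for the demo

# Write each number as 4 bytes, big endian ('>L'), like pack('N', ...) in PHP
with open(path, "wb") as handle:
    for n in numbers:
        handle.write(struct.pack(">L", n))

# Read the file back, 4 bytes per number, using the same byte order
with open(path, "rb") as handle:
    data = handle.read()
restored = [value for (value,) in struct.iter_unpack(">L", data)]

print(len(data))  # 12
print(restored)   # [100, 30000, 500000]
```

Because both the writer and the reader name the byte order, the file reads back the same on any machine.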

Quoting wikipedia :

A big-endian system stores the most significant byte of a word at the smallest memory address and the least significant byte at the largest.

A little-endian system, in contrast, stores the least-significant byte at the smallest address.

edit : also worth keeping in mind that a lot of stuff uses network byte order in their specification .. for example PNG format defines all multi byte integers as network byte order (Big Endian) , JPG uses big endian , MKV video format uses network byte order

Older formats ex GIF use little endian, PCX is little endian, TGA is little endian, BMP is little endian (because they made Windows for little endian CPUs and optimized everything for least processing needed) ... older compression schemes used in DOS games use little endian a lot (because the code could just copy into a ram buffer and unpack with least byte manipulation as possible)

Wictorian · October 16, 2022

yeah thanks I guess I will do this then. Actually as I will be storing literal bytes allocating 1 byte is better than 3 bytes.

What's funny is that I wrote the first half of the code on Windows and was planning to continue on Mac. However my Mac has Intel cpu so it would be ok apparently.

Franck · October 17, 2022

16 hours ago, Wictorian said:

yeah thanks I guess I will do this then. Actually as I will be storing literal bytes allocating 1 byte is better than 3 bytes.

What's funny is that I wrote the first half of the code on Windows and was planning to continue on Mac. However my Mac has Intel cpu so it would be ok apparently.

The language you are using really doesn't have a bug-free, 20-year-old version built in? What language is this?

Wictorian · October 17, 2022

14 minutes ago, Franck said:

The language you are using really doesn't have a bug-free, 20-year-old version built in? What language is this?

Python

Franck · October 17, 2022

2 hours ago, Wictorian said:

Python

Unless they removed it, Python had encryption baked in 25 years ago

wanderingfool2 · October 17, 2022

https://docs.python.org/3/library/hashlib.html
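For reference, a minimal sketch of using that module, for example to check a hand-written implementation against a known-good digest (the input string is just an example):

```python
import hashlib

digest = hashlib.sha256(b"hello world").hexdigest()
print(digest)
# b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
```

If your own code produces a different hex string for the same input, the padding or the byte order of the words is a likely place to look.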
