
C - Variable storage size (bytes)

Hi P
Solved by Mira Yurizaki


Correct me if I'm wrong, but aren't data types such as short int used for embedded systems? (due to memory limitations, I think)

 

With that in mind, is there any reason to ever use such data types for desktop development? If so, in which scenarios would they be useful?

 

Thank you :)

 


char, short, int, and long were the original names of the integer data types. int and long can differ depending on the compiler implementation, as int is only guaranteed to be at least 16 bits and long at least 32 bits. That is to say, short and int can mean the same thing.

 

In any case, you shouldn't be using the original names for the data types. You should be using the C99 names, as they make the size and signedness of the data type explicit.

Edited by Mira Yurizaki

5 minutes ago, Hi P said:

Correct me if I'm wrong, but aren't data types such as short int used for embedded systems? (due to memory limitations, I think)

 

With that in mind, is there any reason to ever use such data types for desktop development? If so, in which scenarios would they be useful?

Driver development, for example, and interacting with certain kinds of devices. Emulation. Virtual machines. Hell, even just saving some memory -- imagine e.g. having a large database with 500 million records; if all those records were 32-bit integers, the database would weigh in at around 2 GB (plus whatever header magic and such one would need in addition), but if one only needed e.g. 16-bit integers for those records, one could immediately cut the database's size in half!


1 minute ago, Mira Yurizaki said:

char, short, int, and long were the original names of the integer data types. int and long can differ depending on the compiler implementation, as int is only guaranteed to be at least 16 bits and long at least 32 bits. That is to say, short and int can mean the same thing.

This is why I, personally, always use the types that specifically state the variable's size, e.g. uint32_t or int8_t. I really, really do not like the ambiguity of using int and such.


Yes, useful for defining various structures (here's an example), for arrays of bytes or fixed-length records, for storing strings, e.g. UTF-16 on Windows...
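For instance, a fixed-length record can be described with fixed-width types so the field sizes don't shift between compilers (a simplified sketch, loosely modelled on the WAVE "fmt " chunk and used purely for illustration; padding and endianness still need care):

#include <cstdint>

// Simplified WAVE "fmt " chunk: every field has an exact, compiler-independent width.
struct WavFmtChunk {
    uint16_t audio_format;     // 1 = PCM
    uint16_t num_channels;
    uint32_t sample_rate;
    uint32_t byte_rate;
    uint16_t block_align;
    uint16_t bits_per_sample;
};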

 

 


1 minute ago, WereCatf said:

This is why I, personally, always use the types that specifically state the variable's size, e.g. uint32_t or int8_t. I really, really do not like the ambiguity of using int and such.

That's interesting, and it actually makes a lot of sense. Is that a good coding practice or just a habit of yours?


1 minute ago, Hi P said:

That's interesting, and it actually makes a lot of sense. Is that a good coding practice or just a habit of yours?

Both.


13 minutes ago, Hi P said:

That's interesting, and it actually makes a lot of sense. Is that a good coding practice or just a habit of yours?

I edited my response.

 

C99 defined unambiguous integer data types precisely because the original names were convoluted. For example, all of the following can mean a signed 16-bit integer (on an implementation where int is 16 bits):

  • short
  • short int
  • signed short
  • signed short int
  • signed
  • int
  • signed int

Whereas the new C99 naming convention only has:

  • int16_t
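In practice the fixed-width names look like this (a minimal sketch; C++ spelling shown, the C header is <stdint.h>):

#include <cstdint>

int16_t  temperature = -40;     // exactly 16 bits, signed
uint8_t  flags       = 0x0F;    // exactly 8 bits, unsigned
uint32_t packet_id   = 123456;  // exactly 32 bits, unsigned
int64_t  file_offset = -1;      // exactly 64 bits, signed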

10 hours ago, Hi P said:

Correct me if I'm wrong, but aren't data types such as short int used for embedded systems? (due to memory limitations, I think)

 

With that in mind, is there any reason to ever use such data types for desktop development? If so, in which scenarios would they be useful?

 

Thank you :)

 

Types like short int indeed aren't used much anymore. If you need a fixed-width integer for some reason, use the C99 fixed-width integers as already mentioned. (But keep in mind these are optional - for example, if CHAR_BIT > 8, then there is no (u)int8_t, because it would have to have padding bits and the fixed-width integer types are defined as having no padding.)

 

However, plain int is often used (as it should be).

The reason int has no fixed width (but must be able to hold at least the range [−32,767, +32,767]) is that it gives implementers the opportunity to give int a width that is handled best by the target platform. For example, if the standard had forced int to be 32-bit, then all 16-bit machines would be inefficient at working with ints. Conversely, if the standard had forced int to be 16-bit, then any 32-bit platform that cannot handle unaligned access would have to constantly mask values.

 

Instead, compiler implementers simply make int 16-bit on the 16-bit platform and 32-bit on the 32-bit platform.

 

Thus, one should use int wherever possible, when sure int will be large enough to hold all possible values. Many programmers of desktop applications assume int to be at least 32-bit (the Google style guide does too), as it has been 32-bit on PCs for the longest time and the chance that a modern desktop application would ever have to be ported to some small 16-bit system is slim. Only on 8-bit systems, such as tiny microcontrollers, does this not apply, because even the smallest allowed int width is too much for an 8-bit system to handle efficiently.
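If you lean on that assumption, you can also make it explicit so the build simply fails on a platform where it doesn't hold (a minimal sketch):

#include <climits>

// Fail compilation instead of silently misbehaving if int is narrower than 32 bits.
static_assert(INT_MAX >= 2147483647, "this code assumes int is at least 32 bits wide");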

 

  • Use signed int wherever you can; you may assume int is at least 32-bit on desktop platforms [−2,147,483,647, +2,147,483,647].
  • If you get any unsigned values (returned from an API-function call or whatever), safely convert them to signed ASAP (a sketch of such a conversion follows below this list).
  • If you need the extra bit of range, use a larger data type instead; don't go unsigned.
  • Only use unsigned and fixed-width types when there's a reason, for example when doing bitwise operations.
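One way to do that "safe conversion" from the second bullet could look like this (an illustrative sketch - the helper name and throwing on failure are just one possible choice):

#include <cstdint>
#include <limits>
#include <stdexcept>

// Convert an unsigned API result to signed int; throw instead of silently
// wrapping around if the value does not fit.
int to_signed(std::uint64_t value)
{
    if (value > static_cast<std::uint64_t>(std::numeric_limits<int>::max()))
        throw std::out_of_range("unsigned value too large for int");
    return static_cast<int>(value);
}

// Usage (with some hypothetical API returning an unsigned count):
// int count = to_signed(get_unsigned_count());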

@Unimportant: You raise some valid points, namely that the size of int (and unsigned int) has been left implementation-defined for a reason, but I think your suggestions overshoot by quite a bit.

E.g.:

Quote

If you get any unsigned values (returned from an API-function call or whatever), safely convert them to signed ASAP

Why? If the API returns an unsigned value it's likely because there's no possible way for the value to be negative (size, number of elements, age, there's a whole lot of types/values out there that logically never can be negative). Why would you convert them to a less fitting (and possibly slower, as unsigned integer arithmetic can be faster in some cases) data type? Also, how do you go about converting the value safely? To convert safely from unsigned to signed you'll always need the next bigger signed type or make some assumptions about the possible/reasonable range of return values you expect. Which opens a whole new can of worms, like safely figuring out what the next bigger signed type is (the standard makes no guarantee that a long int is actually bigger than an [unsigned] int) or handling the possible error case that the conversion was not possible.

 

Ideally we'd all be using the (u)int_fast(8|16|32|64)_t types when we care about performance while having requirements on the possible range of values, as these are supposed to be the fastest data types at least as wide as the requested size (but possibly wider) - but unfortunately these also come with their own set of problems, like not necessarily being the fastest choice in practice, for compatibility reasons.
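For illustration, the spelling of those types looks like this (a minimal sketch):

#include <cstdint>

// "Fast" variants: at least the stated width, but the implementation may pick
// a wider type if that is faster on the target.
int_fast16_t  counter = 0;   // >= 16 bits; often 32 or 64 bits in practice
uint_fast32_t hash    = 0;   // >= 32 bits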

 

In the end, there's no 100% clear cut solution. Especially not shunning unsigned integers completely.


5 hours ago, zhick said:

Especially not shunning unsigned integers completely.

Working with embedded systems, it would make no sense to shun anything. If I were working with a range of values that just happens to be, say, 0-150, I'm not going to throw it into a 16-bit variable when an unsigned 8-bit value would do.

 

Yes, it's a byte. But bytes add up when your memory size is less than the L1 cache on desktop processors.


25 minutes ago, Mira Yurizaki said:

Working with embedded systems, it would make no sense to shun anything. If I were working with a range of values that just happens to be, say, 0-150, I'm not going to throw it into a 16-bit variable when an unsigned 8-bit value would do.

 

Yes, it's a byte. But bytes add up when your memory size is less than the L1 cache on desktop processors.

Mind if I jump in with a question?

 

A couple of days ago I tried to store an integer in an int8_t in C++; it turns out that when I attempted to print the value, it printed the corresponding ASCII character instead of the number.

 

How do you properly store an integer value in an 8-bit variable? (in C++)


5 minutes ago, Hi P said:

Mind if I jump in with a question?

 

A couple of days ago I tried to store an integer in an int8_t in C++; it turns out that when I attempted to print the value, it printed the corresponding ASCII character instead of the number.

 

How do you properly store an integer value in an 8-bit variable? (in C++)

It was probably treating it as a character, because int8_t is usually typedef'd from signed char.

 

If you want to print it out as a number and you're using cout, prefix the variable with a +.

 

EDIT: That may work, but you can also find other ways to do it, like casting it to an int - I realize the + trick isn't the most intuitive method.
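A minimal sketch showing both ways (assuming the value fits in an int8_t to begin with):

#include <cstdint>
#include <iostream>

int main()
{
    std::int8_t value = 42;
    std::cout << value << '\n';                   // prints '*' - the char overload is chosen
    std::cout << +value << '\n';                  // unary + promotes to int, prints 42
    std::cout << static_cast<int>(value) << '\n'; // prints 42 as well
}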


16 hours ago, zhick said:

Why?

Several reasons:

1) The arithmetic conversions make it so that mixing signed and unsigned is the source of a whole class of bugs.

Let's take the classic example:

unsigned int i = 1;
if (i < -1)
{
	std::cout << "Apparantly, 1 is smaller then -1";
}

Which prints:

Quote

Apparently, 1 is smaller than -1

Or:

unsigned int j = 1;
int i = 0; //<-signed!
std::cout << i - j;

Which prints:

Quote

4294967295

And does not give warnings on gcc, even with -Wconversion set.

Both of these problems are caused by the fact that the signed value is converted to unsigned prior to the operation.

 

Of course, if the signed type is wider the rules change:

unsigned int j = 1;
long long i = 0; //<-still signed!
std::cout << i - j;

Which prints:

Quote

-1

as expected.

 

Of course, in small trivial examples like these it's easy to see what's going on and where the problem is. But in a real application, where the hardcoded numbers might themselves be variables or expressions, it's easy to lose track quickly and introduce bugs.

 

2) Unsigned ints model modular arithmetic, not non-negative integers.

When it's not okay to have negative numbers, having wraparound is also not okay in most cases. It's much safer to have underflow at INT_MIN, a value we won't even get close to most of the time, than to have it below 0, a value we work near all the time. This ties in with:

16 hours ago, zhick said:

If the API returns an unsigned value it's likely because there's no possible way for the value to be negative (size, number of elements, age, there's a whole lot of types/values out there that logically never can be negative). Why would you convert them to a less fitting (and possibly slower, as unsigned integer arithmetic can be faster in some cases) data type?

Which sounds all fine and dandy until you need the difference between two things that can never be negative.

Imagine we're writing some code to regulate the speed of an electric motor. Also imagine the motor can only spin one way - it's mechanically impossible for it to spin the other way, so we don't need negative speed values. Let's document that using unsigned values - you get gems like this:

unsigned int targetSpeed = 3000; //RPM
unsigned int measuredSpeed = 3005; 
std::cout << "We need to adjust speed by " << targetSpeed - measuredSpeed << " RPM"; 

Which prints:

Quote

We need to adjust speed by 4294967291 RPM

Yes, you can code around this by checking which number is bigger first, and then only subtracting the smaller number from the bigger one, and then... It does not sound like you're making the code any clearer or more maintainable this way...

 

There's a reason pointer arithmetic returns a ptrdiff_t, which is a signed type.

 

3) It's easier to debug.

It's much better to catch a value that should not be negative being negative red-handed than to have it wrap around to some huge value and possibly escape detection.

Note that using unsigned types to document that a value should not be negative does not enforce anything:

#include <iostream>

void
i_only_take_unsigned_values(unsigned int i)
{
	std::cout << "But not really, I just turn them into a really large number! " << i;
}

int 
main()
{	
	i_only_take_unsigned_values(-1);	

	return 0;
}

Which prints:

Quote

But not really, I just turn them into a really large number! 4294967295

So it does not really help much. But worse, you've lost the information you needed to catch the bug, namely that a negative number was passed originally.

Why not just accept signed values and actually check?

void
i_only_take_unsigned_values(int i)
{
	MyAssert(i >= 0); //Throws when condition not met.

	//... 
}

C++20 will give us some nice new toys in the form of contracts to improve things even further.

 

16 hours ago, zhick said:

Why would you convert them to a less fitting (and possibly slower, as unsigned integer arithmetic can be faster in some cases) data type?

In most cases, signed arithmetic is faster, because signed overflow is undefined behavior, while unsigned overflow is perfectly defined.

The fact that signed overflow is undefined opens up a whole range of optimization possibilities for the compiler.
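A classic illustration (a sketch only - what the optimizer actually emits naturally depends on the compiler):

// Because signed overflow is undefined, the compiler may assume x + 1 > x
// always holds and fold the signed version down to "return true".
// Unsigned arithmetic wraps, so the unsigned version must handle x == UINT_MAX.
bool next_is_greater(int x)          { return x + 1 > x; } // typically becomes: return true
bool next_is_greater(unsigned int x) { return x + 1 > x; } // false when x == UINT_MAX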

 

16 hours ago, zhick said:

To convert safely from unsigned to signed you'll always need the next bigger signed type or make some assumptions about the possible/reasonable range of return values you expect.

The latter makes sense in a whole lot of cases. If I have a std::vector that holds all the open documents for my application, is it any problem if I safely cast the vector size to int? Having 2+ billion documents open at a time is ridiculous and won't ever happen. For things that can truly be large there's int64_t. Stuff even bigger than that is probably something very special that would benefit from using some very large number class anyway.

 

Safe casts are available pre-made, for example gsl::narrow<> from the Guidelines Support Library.
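A rough, hand-rolled equivalent, just to show the idea (illustrative only - gsl::narrow itself is the better-tested choice):

#include <stdexcept>

// Cast, then verify nothing was lost: the value must survive the round trip
// and must not have flipped sign along the way.
template <typename To, typename From>
To checked_narrow(From value)
{
    To result = static_cast<To>(value);
    if (static_cast<From>(result) != value || ((result < To{}) != (value < From{})))
        throw std::runtime_error("narrowing conversion lost information");
    return result;
}

// Usage: int open_docs = checked_narrow<int>(documents.size()); // documents being some std::vector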

 

Some people will argue one should just use signed values when required and unsigned values elsewhere, and "simply" not mix them. That is poor advice, because in a real program values and their types propagate throughout the code, and somewhere, someplace, those values will meet in some comparison or expression.

 

 


8 hours ago, Mira Yurizaki said:

Working with embedded systems, it would make no sense to shun anything. If I were working with a range of values that just happens to be, say, 0-150, I'm not going to throw it into a 16-bit variable when an unsigned 8-bit value would do.

 

Yes, it's a byte. But bytes add up when your memory size is less than the L1 cache on desktop processors.

Fair enough - on systems that small, the application is probably so small anyway that it's still manageable to keep track of everything and avoid the bugs I described in my previous post. Lots of "best practices" go out the window on tiny systems because memory is too precious, but luckily with tiny systems come tiny programs.


8 hours ago, Hi P said:

Mind if I jump in with a question?

 

A couple of days ago I tried to store an integer in an int8_t in C++; it turns out that when I attempted to print the value, it printed the corresponding ASCII character instead of the number.

 

How do you properly store an integer value in an 8-bit variable? (in C++)

It did properly store the integer in the 8-bit variable (assuming it fits).

It's just that operator << for streams has overloads for all basic types. Since int8_t is usually an alias for signed char, it chose the overload that prints a character.

Static cast it to int to have it call the int overload.

int8_t i = 5;
std::cout << static_cast<int>(i);

 


Using small types can be advantageous for performance. Consider a large array of values that you iterate over: if you can use a smaller type, many more array entries fit into a single cache line, resulting in fewer memory fetches and cache misses -- which can be a huge detriment to performance.
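A quick back-of-the-envelope sketch (assuming a 64-byte cache line, which is typical for current desktop CPUs):

#include <cstddef>
#include <cstdint>
#include <iostream>

int main()
{
    constexpr std::size_t cache_line = 64; // assumption: 64-byte lines
    std::cout << "uint8_t per line:  " << cache_line / sizeof(std::uint8_t)  << '\n'; // 64
    std::cout << "uint32_t per line: " << cache_line / sizeof(std::uint32_t) << '\n'; // 16
    std::cout << "uint64_t per line: " << cache_line / sizeof(std::uint64_t) << '\n'; // 8
}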


  • 3 weeks later...
On 7/20/2019 at 4:04 AM, ShaneC said:

Using small types can be advantageous for performance. Consider a large array of values that you iterate over: if you can use a smaller type, many more array entries fit into a single cache line, resulting in fewer memory fetches and cache misses -- which can be a huge detriment to performance.

Also, since modern practice (cstdint) basically forces you to be specific about bit count AND signedness, there is no reason not to optimize this everywhere you can. I always think ahead and use types with bounds that I know I won't exceed (even in the future), but also no more than that. It's very easy to do, and if you do it everywhere you can be sure to get some measurable gain, as well as produce much cleaner APIs (don't you just hate it when APIs use signed integers for indices?).


2 hours ago, Fredrik Svantesson said:

don't you just hate it when APIs use signed integers for indices?

No, see my post above.

 

Even the committee and Bjarne Stroustrup have repeatedly admitted the STL got it wrong when it used unsigned for subscripts and sizes. (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1428r0.pdf)

 

Quote

The original use of unsigned for the STL was a bad mistake and should be corrected (eventually).

 


9 hours ago, Unimportant said:

No, see my post above.

 

Even the committee and Bjarne Stroustrup have repeatedly admitted the STL got it wrong when it used unsigned for subscripts and sizes. (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1428r0.pdf)

 

 

 

Thanks for the perspective! I guess I still disagree though, even if it means I'm disagreeing with Stroustrup himself. I didn't read the document you posted; however, your reasons seem to basically come down to laziness. Just as you typed, the difference between two unsigned integers can be found by taking the max of the two numbers and subtracting the other from it. Now you could argue that it's "tedious", but I just feel like it's the proper way to do things.

 

I completely disagree that it would somehow automatically make code unclear; most big codebases already have some kind of math functions (especially for getting min/max). Writing a small function that basically does

uint64_t diff(uint64_t x, uint64_t y) { return std::max(x, y) - std::min(x, y); }

 is pretty clean and clear imo.


A side of me was thinking, after reading some of the comments here: why bother with data types of multiple sizes or signedness at all? Just make the data type size equal to the word size of the processor and use only one signedness convention. Or even better, just use a single data type for a number.

 

(I'm not really advocating this, by the way).


7 hours ago, Mira Yurizaki said:

A side of me was thinking, after reading some of the comments here: why bother with data types of multiple sizes or signedness at all? Just make the data type size equal to the word size of the processor and use only one signedness convention. Or even better, just use a single data type for a number.

 

(I'm not really advocating this, by the way).

Maximizing performance/size (which can sometimes be the same thing).

 

If you don't need a 32-bit integer, then using one is wasting space in both the CPU's cache and in memory. And if you decide to write some parts of your code in assembly, you can get really creative with how you store data in registers to squeeze out extra performance. Remember, while the cache is faster to access than main memory, it still takes time to access.

 

Obviously, with the rise of higher-level languages, optimizations like that don't occur as often as they used to. But if you're looking to improve the performance of your program, that is one avenue you'll likely eventually consider.

 

Probably the most common use of differently sized data types today that affects people day to day (even if they don't know it) is in databases. Wasted storage space can add up really, really quickly, and that storage can be expensive. Databases have to know exactly how data is laid out on disk so that they can read records in quickly.

 

 

Plus there's just the fact that if you're working with lower-level languages, that's half the fun. I work as a web developer, but I was using C/C++ long before that, just because I love knowing how computers work behind the scenes.

 

 


2 minutes ago, JacobFW said:

-snip-

I appreciate the insight, but my post was mostly a rhetorical question.

 

I've worked with embedded systems. Systems that have less RAM than your typical CPU has L1 cache. So I understand the need to have variable data sizes.

