84% PERFORMANCE INCREASE?

Gat Pelsinger
Solved by Eigenvektor
Using the register keyword in C, which lets us store a variable in a CPU register, gave me an 84.6% performance improvement. But my code is dumb. It just increments a ULL variable, so nearly all of the runtime is memory access and the gain from keeping the variable in a register is going to be huge, which is why I wasn't surprised. But I have two questions. There are so many algorithms and data structures where memory optimization is necessary, so why have I never seen anyone use this so-called "deprecated" keyword when it gives clear performance benefits? And how many register variables can I create? A CPU doesn't have as many registers as cells in RAM, right?

Microsoft owns my soul.

 

Also, Dell is evil, but HP kinda nice.


Because it would be extremely difficult to optimize any reasonably large program that way. You might just as well be writing assembler if you do that. It also makes the code less portable. Ideally any modern optimizing compiler should take care of such things for you already.

 

There are very few registers, and they are 8, 16, 32 or 64 bits in size (x86_64). E.g. there are 16 64-bit general-purpose registers on x86_64 (8 on 32-bit x86), which can be subdivided into 32-, 16- or 8-bit registers. This is extremely tiny compared to the gigabytes of RAM you have available today.

 

Here's an overview of the registers available on modern x86_64: https://en.wikibooks.org/wiki/X86_Assembly/X86_Architecture

Remember to either quote or @mention others, so they are notified of your reply


@Eigenvektor But I didn't get any compiler warnings or errors when I allocated many register variables at once. Is it perhaps moving those variables back and forth between registers and memory? I probably have to do performance testing to see what really happens.



1 hour ago, Gat Pelsinger said:

@Eigenvektor But I didn't get any compiler warnings or errors when I allocated many register variables at once.

How many is "many" and what size were they? Technically a 64-bit register like RAX can hold up to eight 8-bit values. While you can only directly access two of these (AL/AH), you can use bit shift operations to make full use of it. But that adds a lot of mental complexity/overhead to keep track of when you want to write anything remotely non-trivial.

 

Also keep in mind that the "register" keyword is just a hint for the compiler that you would like to store that value in a register. The compiler is free to ignore you. A modern compiler will generally be better at figuring out which value is the best to keep in a register (assuming you've turned on optimizations), so the performance difference is most likely negligible once you write more complex things (and you actually know what you're doing, otherwise I would expect the compiler's code to outperform you, by a lot).



Also, if you really want to understand what happens, you should... look at the assembly output of the compiler to check whether it really did what you think it did, or to see what it did instead if it doesn't match.

F@H
Desktop: i9-13900K, ASUS Z790-E, 64GB DDR5-6000 CL36, RTX3080, 2TB MP600 Pro XT, 2TB SX8200Pro, 2x16TB Ironwolf RAID0, Corsair HX1200, Antec Vortex 360 AIO, Thermaltake Versa H25 TG, Samsung 4K curved 49" TV, 23" secondary, Mountain Everest Max

Mobile SFF rig: i9-9900K, Noctua NH-L9i, Asrock Z390 Phantom ITX-AC, 32GB, GTX1070, 2x1TB SX8200Pro RAID0, 2x5TB 2.5" HDD RAID0, Athena 500W Flex (Noctua fan), Custom 4.7l 3D printed case

 

Asus Zenbook UM325UA, Ryzen 7 5700u, 16GB, 1TB, OLED

 

GPD Win 2


@Eigenvektor @Kilrah

 

So I allocated like 51 unsigned long long variables with the register keyword and wrote a while loop for each of them (thankfully ChatGPT helps with automation). Note that I am using time.h to measure the performance. It looks like the first 7 variables are allocated in registers, as their loops each take about 2.4 seconds, except the 5th variable, which for some reason takes 4.9 seconds. This was consistent every time I ran my program, so something is going on with the 5th variable. But after the first 7 variables, the execution time jumps to 12.4 seconds for all the rest, which I think is the same time I got without the register keyword. So that's it? I can allocate about 7 variables in the registers, which always might change depending on the platform, OS and the situation? Note that I was using unsigned long long variables. Will I be able to allocate more if I use a smaller size variable? Also, if you are curious, all my program does is count a variable up to 10 billion.



59 minutes ago, Gat Pelsinger said:

@Eigenvektor @Kilrah

So that's it? I can allocate about 7 variables in the registers, which always might change depending on the platform, OS and the situation?

More likely compiler and target CPU architecture, but yes.

 

As you said, the keyword is deprecated, which means future compilers might completely ignore it or no longer support it altogether (i.e. refuse to compile because the keyword is unknown).

 

Quote

Will I be able to allocate more if I use a smaller size variable?

Possible, but again depends on how the compiler treats the code.

 

You should probably try this with a more complex algorithm. And set the compiler to full optimization.

 

It should already use registers if it thinks it appropriate.

 

Incrementing a register's contents is naturally quite fast; most modern CPUs should be able to do this in one clock cycle. With vector instructions you could possibly even do it for multiple values at once.

