
How much memory do functions take?

Gat Pelsinger

As programmers, we're usually told to care only about the memory used by variables, arrays, vectors, lists, and the like. But how much memory do functions themselves take? How is their memory managed behind the scenes?

 

Microsoft owns my soul.

 

Also, Dell is evil, but HP kinda nice.


The code itself is usually negligible; "what a function takes" is really whatever you allocate inside that function, i.e. the things you mentioned first.

Your favorite dev environment may have profiling tools / listing files to find code size.

F@H
Desktop: i9-13900K, ASUS Z790-E, 64GB DDR5-6000 CL36, RTX3080, 2TB MP600 Pro XT, 2TB SX8200Pro, 2x16TB Ironwolf RAID0, Corsair HX1200, Antec Vortex 360 AIO, Thermaltake Versa H25 TG, Samsung 4K curved 49" TV, 23" secondary, Mountain Everest Max

Mobile SFF rig: i9-9900K, Noctua NH-L9i, Asrock Z390 Phantom ITX-AC, 32GB, GTX1070, 2x1TB SX8200Pro RAID0, 2x5TB 2.5" HDD RAID0, Athena 500W Flex (Noctua fan), Custom 4.7l 3D printed case

 

Asus Zenbook UM325UA, Ryzen 7 5700u, 16GB, 1TB, OLED

 

GPD Win 2


Since you asked specifically about the function itself and not the variables inside it, we can check that pretty easily using godbolt.org. Here's an empty function and the assembly code it compiles to without optimization:

 

[Screenshots: an empty C function and the unoptimized x86-64 assembly it compiles to]
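For reference, here is roughly what those screenshots show (my own reconstruction, assuming GCC at -O0 on x86-64; the exact output varies by compiler and version):

void do_nothing(void) {}

do_nothing:
        push    rbp          # save the caller's base pointer: 8 bytes on the stack
        mov     rbp, rsp     # set up this function's stack frame
        nop
        pop     rbp          # restore the caller's base pointer
        ret                  # return to the caller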

 

The only instructions here that deal with memory are push and pop. Specifically, the push instruction pushes the value of the base pointer register onto the stack, which takes 64 bits, or 8 bytes, on a 64-bit machine. Thus, a function call requires 8 bytes of memory on the stack at a minimum.

Computer engineering grad student, cybersecurity researcher, and hobbyist embedded systems developer

 

Daily Driver:

CPU: Ryzen 7 4800H | GPU: RTX 2060 | RAM: 16GB DDR4 3200MHz C16

 

Gaming PC:

CPU: Ryzen 5 5600X | GPU: EVGA RTX 2080Ti | RAM: 32GB DDR4 3200MHz C16


9 minutes ago, Gat Pelsinger said:

@dcgreen2k So, 8 bytes with unlimited code or...

Simply calling the function requires 8 bytes of memory on the stack, assuming the compiler has not inlined it. If you create any variables inside that function, the memory usage will increase.
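To illustrate the inlining caveat with a quick sketch of my own (the names add and caller are made up for this example): with a trivial function, an optimizing compiler will usually inline the call so that no call, push, or stack frame setup happens at all.

/* Hypothetical example, not from this thread. */
static int add(int a, int b) {
    return a + b;
}

int caller(int x) {
    return add(x, 2);   /* at -O0 this is a real call with its own stack frame;
                           at -O2 GCC typically inlines it to something like
                           "lea eax, [rdi+2]" followed by "ret", with no stack traffic */
}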

 

If you're asking whether the instructions themselves take up space in memory - yes, they do. They will be loaded into memory when the program starts, but this space is generally negligible compared to how much space data takes up.

Computer engineering grad student, cybersecurity researcher, and hobbyist embedded systems developer

 

Daily Driver:

CPU: Ryzen 7 4800H | GPU: RTX 2060 | RAM: 16GB DDR4 3200MHz C16

 

Gaming PC:

CPU: Ryzen 5 5600X | GPU: EVGA RTX 2080Ti | RAM: 32GB DDR4 3200MHz C16


@dcgreen2k Of course variables are a different thing. What I was asking about is the raw instructions that the function holds. Don't they require memory? Or do they, but very little? Or do they get freed once each instruction has been executed? That's what I'm confused about. Look, you've got your .exe file and you run it. The system is going to read the instructions and has to put them in memory for the CPU to execute, right? Or does it not need to put them in memory (I actually think this one is correct)? Like, the CPU asks the storage to read the file, the storage passes the instructions to the CPU, so does the CPU just execute each instruction right away and discard it once it has done what it says, or does it put the instruction in memory first and then perform the action?

Microsoft owns my soul.

 

Also, Dell is evil, but HP kinda nice.


I see what you're asking now. Yes, the instructions themselves require memory. Those instructions stay in memory until the program stops.

 

At a simplified level, this is what the memory layout of a C program looks like:

[Diagram: simplified memory layout of a C program, showing the text, data, BSS, heap, and stack segments]

 

The text section is where the program's instructions are placed, and they get read from the .exe file when the program starts.

 

Why do the instructions get placed in memory instead of just being read from disk whenever they're needed? Put simply, getting them from disk would be way too slow. In fact, the instructions are often copied from memory into the CPU's instruction cache so that they can be accessed fast enough to keep the CPU fed. RAM is surprisingly slow compared to cache that's very close to the CPU.
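To make that layout concrete, here is a small sketch of my own (the section names assume a typical GCC/Linux toolchain; exact placement varies by compiler and platform):

#include <stdlib.h>

const char banner[] = "hello";   /* read-only data (.rodata)           */
int counter = 42;                /* initialized data (.data)           */
int scratch[1024];               /* zero-initialized data (.bss)       */

int main(void)                   /* the instructions themselves: .text */
{
    int local = 7;               /* stack                              */
    int *buf = malloc(64);       /* heap                               */
    free(buf);
    return local == 7 ? 0 : 1;
}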

Computer engineering grad student, cybersecurity researcher, and hobbyist embedded systems developer

 

Daily Driver:

CPU: Ryzen 7 4800H | GPU: RTX 2060 | RAM: 16GB DDR4 3200MHz C16

 

Gaming PC:

CPU: Ryzen 5 5600X | GPU: EVGA RTX 2080Ti | RAM: 32GB DDR4 3200MHz C16


@dcgreen2k So the instructions do stay in memory until the whole program ends? Isn't that bad for optimization? Say I'm running a function and have already executed a lot of instructions, so I'm well past the beginning of the function and won't need those earlier instructions again. Why are they still occupying memory?

Microsoft owns my soul.

 

Also, Dell is evil, but HP kinda nice.


It's because there's no way to know for sure that previous instructions won't be executed again. As a simple example, here's how a loop is written in assembly:

 

[Screenshots: a C function containing a loop and the unoptimized x86-64 assembly it compiles to]
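Again, roughly what those screenshots show (my reconstruction, assuming GCC at -O0 on x86-64; labels and stack offsets will differ):

int sum(int n) {
    int total = 0;
    for (int i = n; i > 0; i--)
        total += i;
    return total;
}

sum:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-20], edi     # store the argument n
        mov     DWORD PTR [rbp-4], 0        # total = 0
        mov     eax, DWORD PTR [rbp-20]
        mov     DWORD PTR [rbp-8], eax      # i = n
        jmp     .L2
.L3:
        mov     eax, DWORD PTR [rbp-8]
        add     DWORD PTR [rbp-4], eax      # total += i
        sub     DWORD PTR [rbp-8], 1        # i--
.L2:
        cmp     DWORD PTR [rbp-8], 0        # is i still greater than 0?
        jg      .L3                         # if so, jump back to earlier instructions
        mov     eax, DWORD PTR [rbp-4]      # return total
        pop     rbp
        ret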

 

Close to the bottom of the function, we can see the instruction jg .L3. This means that the CPU will jump back to previously executed instructions if the right conditions are met. There is no way for the CPU to know this until it executes the cmp instruction, so those earlier instructions must stay in memory.

Computer engineering grad student, cybersecurity researcher, and hobbyist embedded systems developer

 

Daily Driver:

CPU: Ryzen 7 4800H | GPU: RTX 2060 | RAM: 16GB DDR4 3200MHz C16

 

Gaming PC:

CPU: Ryzen 5 5600X | GPU: EVGA RTX 2080Ti | RAM: 32GB DDR4 3200MHz C16


@dcgreen2k Brooo, of course loops are a different thing. But if I just had x = 2 + 2 in the function, and no code that will reuse that instruction, which the compiler knows, can't it optimize it and remove that instruction from memory? Also, exactly how much memory is used by functions with some code in them (with an example)?

Microsoft owns my soul.

 

Also, Dell is evil, but HP kinda nice.


The point is, if you look at the size of the code in a typical exe, it's going to be a few megabytes at most, which is nothing compared to the data.

Some are larger, but that's usually just because data is packed into the same file.

 

  

33 minutes ago, Gat Pelsinger said:

which the compiler knows, can't it optimize it and remove that instruction from memory? Also

Technically it's the OS that's loading the whole code in memory, not the compiler.

F@H
Desktop: i9-13900K, ASUS Z790-E, 64GB DDR5-6000 CL36, RTX3080, 2TB MP600 Pro XT, 2TB SX8200Pro, 2x16TB Ironwolf RAID0, Corsair HX1200, Antec Vortex 360 AIO, Thermaltake Versa H25 TG, Samsung 4K curved 49" TV, 23" secondary, Mountain Everest Max

Mobile SFF rig: i9-9900K, Noctua NH-L9i, Asrock Z390 Phantom ITX-AC, 32GB, GTX1070, 2x1TB SX8200Pro RAID0, 2x5TB 2.5" HDD RAID0, Athena 500W Flex (Noctua fan), Custom 4.7l 3D printed case

 

Asus Zenbook UM325UA, Ryzen 7 5700u, 16GB, 1TB, OLED

 

GPD Win 2


Once the functions get turned into machine code and made into an executable, the compiler can't do any more optimization. We know that instructions must be loaded into memory before they're executed, so any possibility that instructions could be removed from memory would need to be supported by both the CPU and operating system.

 

Let's see how much space a program's instructions actually take up. Here's a large program I wrote a while back. It's an IDE for MIPS assembly with a built-in interpreter and debugger.

 

[Screenshot: the MIPS assembly IDE mentioned above]

 

The executable file takes up 288.4kB of space on disk while its source code takes up 194.3kB. Using the readelf -S command, I can see how much space each of the executable's sections takes up.

 

[Screenshot: readelf -S output listing the executable's section sizes]

 

The .text section is where the actual instructions reside, and they will be placed into memory when the program starts. hex 27dde equals decimal 163294, which means the instructions take up just over 163kB.

 

Keeping that in mind, let's see how much RAM this program uses, shown in the RES column below.

[Screenshot: process memory usage, with the RES column showing about 113MB]

 

Ah. The program takes up 113MB in RAM while its instructions take up only 163kB.

 

To conclude, it would simply be a waste of time to optimize how much memory the instructions use.

Computer engineering grad student, cybersecurity researcher, and hobbyist embedded systems developer

 

Daily Driver:

CPU: Ryzen 7 4800H | GPU: RTX 2060 | RAM: 16GB DDR4 3200MHz C16

 

Gaming PC:

CPU: Ryzen 5 5600X | GPU: EVGA RTX 2080Ti | RAM: 32GB DDR4 3200MHz C16


@dcgreen2k I see. But what about embedded systems, which have very limited resources? Can they benefit from this? Anyways, in OOP languages, if the object of the function is destroyed, so are the instructions, right? What about non-OOP languages?

Microsoft owns my soul.

 

Also, Dell is evil, but HP kinda nice.


4 minutes ago, Gat Pelsinger said:

Anyways, in OOP languages, if the object of the function is destroyed, so are the instructions, right?

In OOP you can have 1000 objects in RAM that each hold their own data, but the code that gets executed on all of them is shared; that's the whole point. So there's only one copy of the code and as many instances of the data as needed, and it's obviously the latter that uses the RAM.
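A small C sketch of that point (my own example with made-up names): the "method" exists exactly once in the text section no matter how many instances of the data you create.

#include <stdlib.h>

struct point { double x, y; };

/* One copy of this code in memory, shared by every struct point. */
double length_squared(const struct point *p) {
    return p->x * p->x + p->y * p->y;
}

int main(void) {
    /* 1000 instances of the data, but still only one copy of the code. */
    struct point *pts = calloc(1000, sizeof *pts);
    double total = 0.0;
    for (int i = 0; i < 1000; i++)
        total += length_squared(&pts[i]);
    free(pts);
    return total == 0.0 ? 0 : 1;
}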

 

5 minutes ago, Gat Pelsinger said:

But what about embedded systems, which have very limited resources?

Embedded systems will typically run code directly from flash memory rather than copying it into RAM, but that again comes down to the OS, or the lack of one.

F@H
Desktop: i9-13900K, ASUS Z790-E, 64GB DDR5-6000 CL36, RTX3080, 2TB MP600 Pro XT, 2TB SX8200Pro, 2x16TB Ironwolf RAID0, Corsair HX1200, Antec Vortex 360 AIO, Thermaltake Versa H25 TG, Samsung 4K curved 49" TV, 23" secondary, Mountain Everest Max

Mobile SFF rig: i9-9900K, Noctua NH-L9i, Asrock Z390 Phantom ITX-AC, 32GB, GTX1070, 2x1TB SX8200Pro RAID0, 2x5TB 2.5" HDD RAID0, Athena 500W Flex (Noctua fan), Custom 4.7l 3D printed case

 

Asus Zenbook UM325UA, Ryzen 7 5700u, 16GB, 1TB, OLED

 

GPD Win 2


1 minute ago, Kilrah said:

Embedded systems will typically run code directly from flash memory rather than copying it into RAM, but that again comes down to the OS, or the lack of one.

Correct, embedded systems (at least the ones I work with) usually have entirely separate memory for the program code.

Computer engineering grad student, cybersecurity researcher, and hobbyist embedded systems developer

 

Daily Driver:

CPU: Ryzen 7 4800H | GPU: RTX 2060 | RAM: 16GB DDR4 3200MHz C16

 

Gaming PC:

CPU: Ryzen 5 5600X | GPU: EVGA RTX 2080Ti | RAM: 32GB DDR4 3200MHz C16


In 99% of cases they take up so little space that it doesn't really matter; writing readable and functional code is more important. Writing embedded and memory-constrained applications is a different matter.

Sudo make me a sandwich 


On embedded systems the binary is kept in flash memory. A lot of microcontrollers have flash "accelerators" that cache or read ahead pages of flash memory to make reading the program as fast as possible.

 

On SOME microcontrollers, it's possible to use certain keywords to keep critical functions in RAM, so that a super tight part of the code stays resident all the time. Some microcontrollers even have segmented RAM, allowing you to put some chunks of RAM to "sleep" or turn them off to reduce power while keeping the portions of RAM that hold the critical functions working.
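For example, a sketch under assumptions: a GCC-style embedded toolchain whose linker script defines a RAM-resident section (here called .ramfunc) that the startup code copies from flash into RAM; some vendors expose the same idea as a keyword such as IAR's __ramfunc. The attribute is real GCC syntax, but the section name and copy-to-RAM behavior depend entirely on the vendor's linker script.

/* Placed in a RAM section (name and startup-copy behavior are toolchain-specific). */
__attribute__((section(".ramfunc")))
void timing_critical_handler(void) {
    /* executes from RAM instead of flash, avoiding flash wait states */
}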

 

On Windows, binaries are loaded into RAM... in the case of very big executables (e.g. a 500 MB game executable), if the game needs RAM, the OS may dump parts of the executable to the page file to free space... but a lot of the executable is not instructions, it's resources that don't have to be loaded into RAM.

 

 


  • 2 weeks later...

Given there are things such as dynamic memory management (ASLR) and "recycling" or "efficiency" and "throttling"…

that requires more context to answer

 

whilst a page file might contain more faults if the drive is slower, the fault tolerance on RAM is higher.

 

getting an elephant into a car will take longer than waiting for a lorry to transport it?


  • 1 month later...

@dcgreen2k I just wanted to come back to this thread and ask: your example is a very large program, which might have really large vectors and code that mostly runs in a loop acting on that data, in which case the ratio of data to instruction memory would be huge. But I want to get results for a very small program which doesn't have much array-like data, only a few stack-allocated variables and a lot of logical instructions. Now I want to see what the ratio will be.

Microsoft owns my soul.

 

Also, Dell is evil, but HP kinda nice.


On 1/29/2024 at 4:09 AM, Gat Pelsinger said:

@dcgreen2k Now I want to see what the ratio will be.

What kind of answer do you expect to get here?

 

If you reduce the number of variables the program has, then obviously the ratio of code to stack would increase.

 

If you have a program that is pure logic, at best it would start to approach 1:0.

 

That doesn't mean the size of the code in memory isn't still small enough to be irrelevant, especially compared to the data it may be working on.

 

Even when your program is mostly logic (say, a video encoder), it might still load tons of data into the heap at runtime to do its job.

Remember to either quote or @mention others, so they are notified of your reply


@Eigenvektor

 

No look, I was kind of sad because of this unoptimization that instructions cannot be removed from memory when the program is still executing. But as @dcgreen2k said, the ratio between memory used by instructions and by variables is big (variables use more memory), though that could be because his example might be something like a program that multiplies two really large matrices: the actual instructions will be very few, running in a loop, while the data is massive and takes a lot more memory. But first, I wanted to see the ratio and actual memory usage for a simple hello world program, and then a more real-world scenario where you have a program that actually does stuff and has a lot of logic and different instructions but not a lot of data. And also, in his program the instructions did take 163 kB of memory, and that will only add up the more instructions your program has.

Microsoft owns my soul.

 

Also, Dell is evil, but HP kinda nice.


22 hours ago, Gat Pelsinger said:

No look, I was kind of sad because of this unoptimization that instructions cannot be removed from memory when the program is still executing.

Unoptimization implies somebody decided to make it worse on purpose. If anything, you could call it a lack of optimization. But I dare say that is far from the truth; it is a classic tradeoff. The common implementation is optimized towards less complexity and faster execution time.

 


Loading a function into memory on demand means there is a pause every time a new function is called, while it is fetched from disk. This would add unnecessary I/O, plus regularly fetching small amounts of data is fairly inefficient (due to overhead). You could mitigate some of the drawback with a pre-fetcher that loads functions that may soon be needed, but that requires additional logic and a look-ahead to determine which functions are likely to be called in the near future.

 

Unloading functions from memory is even more complex, because you need logic that can determine which functions can be safely removed. Ideally you would also want a caching algorithm to avoid constantly loading and unloading the same function over and over again, e.g. when a function is called in a loop.

 

And as you should know, loading and unloading small bits of data quickly causes memory fragmentation, which is another can of worms.

 

Furthermore, who is responsible for loading/unloading these functions? The CPU, the OS, some runtime, the program itself? You need a layer that has insight into your program's code as it is executing to determine what needs to be loaded/unloaded, reserve or free memory, then perform the actual loading.

 

I'm sure there are more things I didn't consider that make this a bad idea for most use cases. It is much simpler to load and cache the program as a whole on startup and unload it when it stops running.

 

If you, as a developer, know there's a part of your program that has a large memory footprint and is rarely needed, you can always perform that optimization on your own. Split your program into multiple executable files where the main executable is responsible for invoking the others as needed. Of course that comes with its own form of tradeoff, since you'll likely need to use IPC.
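One concrete mechanism along those lines (my own sketch, not something described above) is putting the rarely-needed part into a shared library and loading and unloading it explicitly with POSIX dlopen/dlclose; the library name and symbol below are made up for illustration.

#include <dlfcn.h>   /* POSIX dynamic loading; link with -ldl on glibc */
#include <stdio.h>

int main(void) {
    void *handle = dlopen("./libbigfeature.so", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    void (*run)(void) = (void (*)(void))dlsym(handle, "run_big_feature");
    if (run)
        run();          /* the library's code is mapped into memory only now */

    dlclose(handle);    /* the code may be unmapped again once no longer used */
    return 0;
}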

 

Android does this to some degree. If an Activity goes into the background, the OS may kill (and recreate) it at any time. But that is much simpler than doing it at the function level, because there is a well-defined entry and exit point (is the Activity visible?). However, it is still something a developer must be aware of. If you think you can cache something in a static variable, think again, because it may be null the next time around.

 

22 hours ago, Gat Pelsinger said:

But first, I wanted to see the ratio and actual memory usage for a simple hello world program, and then a more real-world scenario where you have a program that actually does stuff and has a lot of logic and different instructions but not a lot of data.

The more logic your program contains, the more variables it is likely going to need. On average the ratio will probably be fairly similar for most typical programs.

 

For a simple hello world program, at the very least you'll have the string "Hello world" stored in the binary (typically in a read-only data section). printf will need to loop over it at least once to check for placeholders such as %s, so it will temporarily create at least one pointer on the stack while the function is in scope.

 

I would say a video encoder is a good example of a program where the ratio can vary wildly. The encoder should contain very little hard coded data and is mostly pure logic. The data to work on is loaded from disk into memory as needed, converted, then written back to disk (and possibly cached).

 

If you encode a tiny video file, the ratio of logic to data will trend towards 1:0; if you encode a multi-terabyte video, it'll trend towards 0:1. That doesn't mean it matters either way, because the amount of memory used by its functions is effectively irrelevant on a machine with gigabytes of memory.

Remember to either quote or @mention others, so they are notified of your reply


Program memory will either be constrained or it won't. On any consumer oriented operating system these days, it's not going to matter. For more specialized purposes where it does matter (in which case the data memory will have its own constraints completely independent of program memory constraints), you'll either be forced to adhere to those constraints, or you'll need to utilize functionality built into the architecture to let you work around it.

 

Writing firmware for an IoT device is a good example where your program memory will be constrained. The system will have a limited ROM space where your program will live, and it will need to fit within that space.

 

Another example would be something like the NES or N64. Both of these systems' architectures gave the developer more control over how memory was used. On the N64, there was something called a TLB (Translation Lookaside Buffer). Using this, a developer could move program memory around, loading and unloading it as needed and then pointing to it using the TLB.

 

The NES had something similar. The console itself only supports 40KB of ROM (only 32KB of which is reserved for program memory), yet there are games far larger than this that work just fine (Kirby's Adventure has 512KB of program data). This was achieved using an MMC (Memory Management Controller) chip. Similar to the N64's TLB, this let the programmer map the NES's 32KB of program memory onto different banks of ROM storage.

 

TL;DR: If your program memory matters, there will be specific and well-defined reasons that it matters and/or platform specific mitigations for it.


@Eigenvektor

 

I agree with your opinion. But in my view, memory that isn't needed is wasted memory, and memory is expensive. It's okay if using more memory makes your code run faster, but memory you simply won't need afterwards is waste. The overhead you mentioned might be real, but it doesn't seem as dramatic as you made it sound, and on top of that, the programmer is the one who would look after this optimization; the system doesn't need to care about it, so if a function has been deallocated and the program calls it later anyway, the program is free to crash.

 

Anyways, I cannot change what has been the standard for more than 40 years. As a programmer, if I really want to optimize my code to that degree, which is equivalent to slamming an axe into my foot because I am a C programmer and the CPU and memory are my overlords, what are the ways I can get rid of code memory I do not need? I heard earlier that loading code from DLLs can help with this? And one more question: if a stack-allocated variable goes out of scope, does it get deallocated? Does that mean I can create multiple code blocks (which I didn't even know you could do in C; I thought code blocks were only used with C keywords like for, if, while, etc.), and the memory will be freed after the block exits?
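On the block-scope part of that question, a minimal sketch in plain C (my own example): bare braces do create a scope, but leaving the scope only ends the variable's lifetime; whether the compiler actually reuses that stack space afterwards is an implementation detail, and at -O0 it often just reserves the maximum frame size up front.

#include <stdio.h>

int main(void) {
    {                          /* a bare block, not attached to if/for/while */
        char big[4096];        /* lives on the stack, only inside this block */
        snprintf(big, sizeof big, "temporary buffer");
        puts(big);
    }
    /* 'big' no longer exists here; its stack space may be reused for whatever
       comes next, but nothing is "freed" in the malloc/free sense. */
    puts("done");
    return 0;
}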

Microsoft owns my soul.

 

Also, Dell is evil, but HP kinda nice.

