Stack or heap? :confused:

When I first started writing programs in C, I was always confused about why I sometimes had to manually allocate and deallocate memory for some of my variables, while in most other cases I just declared a type and an identifier and was done right then and there. I also never quite understood why in some languages (like Java, JavaScript, C#) I never had to do such a thing. Recently, however, I dug deep and tried to understand what was really going on under the hood, and it took me right back to how a program stores its execution data, using a stack and a heap.


Figure 1: Simplistic representation of the stack and heap in a typical program

Understanding the differences between the two, in terms of how a program executes, is crucial to understanding the memory management that makes all the magic possible. In general, the stack is used for memory whose size and lifetime are fixed by the code itself (local variables and function calls), while the heap is used for dynamic allocation, where size and lifetime are decided at run time. Both of them usually live in RAM, although the operating system may move rarely used pages of either one out to disk when physical memory runs low.

The stack is sort of like a specialized butler that a thread of execution within our program uses to keep track of which functions are running and where to return to when each one finishes. The local variables of each function, along with some other book-keeping data (such as the return address), are stored on the stack. Access to this storage is very fast, and the layout of each function's block is worked out at compile time, so reserving it at run time costs almost nothing. Note that a stack is a “LIFO” (last in, first out) data structure, and the CPU supports it directly with a dedicated stack pointer register. Every time a function is called (and yes, this includes main() too), a block at the top of the stack is reserved for the variables associated with that function. Right after that function finishes executing, its block, along with all its data, is popped off the stack (so the space is free to be reserved by another function call), and execution continues in the caller, whose block is now at the top of the stack.

This goes hand in hand with the scopes we have for local variables, and it is why, once a function has returned, the variables within its scope are no longer accessible. Note that there is a way to keep a variable around between calls, by using the static keyword in its declaration. Such a variable is stored outside the stack for the whole run of the program, but it can still only be accessed by name from the function it is scoped within.
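To make the stack behaviour above a bit more concrete, here is a minimal sketch in C. The function names sum_to() and count_calls() are made up for this post. The first one only uses ordinary local variables, so every call gets a fresh copy on the stack. The second one uses the static keyword, so its counter survives between calls even though nothing outside the function can name it.

```c
#include <assert.h>

/* Hypothetical helper: `total` and `i` are ordinary local variables, so
   they live in this call's block on the stack and vanish on return. */
int sum_to(int n)
{
    assert(n >= 0);              /* precondition assumed by this sketch */
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;                /* the value is copied out to the caller */
}

/* Hypothetical helper: `calls` is static, so it is stored outside the
   stack and keeps its value between calls. It is still only visible
   by name inside this function. */
int count_calls(void)
{
    static int calls = 0;
    calls++;
    return calls;
}
```

Calling sum_to(4) twice gives the same answer both times because total starts from scratch on every call, while calling count_calls() twice gives 1 and then 2.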

Now the heap is a not-so-specialized, very generic region of memory, which is not managed by the CPU directly. It is a somewhat free-for-all region in terms of access from all the threads of execution within our program. The heap can usually grow far bigger than the stack, which is why you will see people use it for variables that hold large data structures or that change size on the fly during program execution. When using the heap, it is the responsibility of the program itself to allocate and de-allocate the blocks of memory it needs. In C this is done with calls like malloc() and calloc(), along with free() to avoid memory leaks. Since the heap does not have a fixed size (it typically starts small, and the allocator asks the operating system for more as needed), it can grow and shrink based on the program logic and the data structures it is storing. This flexibility has a cost: allocation takes more work, and heap data is always reached through a pointer, an extra level of indirection that can make access slower than for a stack variable. And finally, unlike the stack, data stored on the heap is not tied to the function that created it. Any part of the program that holds a pointer to it can use it until it is freed, which makes it a natural fit when data has to be shared between threads.
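As a rough illustration of the malloc() and free() pairing described above, here is a small sketch. The function make_squares() is a name I made up for this example. It hands back a block of heap memory that keeps existing after the function returns, which is exactly what a local array could not do.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns a heap-allocated array holding
   0*0, 1*1, ..., (count-1)*(count-1).
   The caller owns the block and must free() it.
   Returns NULL if the allocation fails. */
int *make_squares(size_t count)
{
    assert(count > 0);           /* precondition assumed by this sketch */

    int *squares = malloc(count * sizeof *squares);
    if (squares == NULL)
        return NULL;             /* always check: malloc() can fail */

    for (size_t i = 0; i < count; i++)
        squares[i] = (int)(i * i);

    return squares;              /* the block outlives this function's stack block */
}
```

A caller would do something like int *s = make_squares(5); use s[0] through s[4], and then call free(s) once it is done. Forgetting that last step is the memory leak mentioned above.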

But wait…what to use and when?

Before you decide what to use in a specific situation, it helps to know the practical differences between the two in terms of performance, usability, limits and efficiency. As noted above, allocation on the stack is generally faster than on the heap, because allocating and de-allocating is just a matter of moving the stack pointer up or down. The same small region of the stack also gets reused over and over, so it tends to stay in the processor cache, which helps make things faster too. With a heap, there is a lot more overhead during allocation and de-allocation, since the allocator has to find a suitable free block and keep its own book-keeping up to date. And since heap data can be shared between the execution threads of a program, the allocator has to synchronize internally, and the program has to synchronize access to any data it shares. However, since allocation on the heap is dynamic, there is no fixed size limit short of the memory the OS is willing to hand out, and that also makes resizing easier (using realloc()). Note that with a stack variable there is no explicit need for de-allocation, since the compiler generates the code that releases it when the function returns. This does mean, however, that the stack size is limited and that stack variables cannot be resized. Also, since a stack is organized by nature, its memory is used very compactly, which helps with access speed, whereas a heap can become quite fragmented over time.
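The resizing point is easier to see in code. Below is a minimal sketch of a growable array built on realloc(). The struct int_vec and the functions int_vec_push() and int_vec_free() are names I invented for this post, and the doubling strategy with a starting capacity of 4 is just one common choice, not the only way to do it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of ints, for illustration only.
   Start from a zeroed struct: struct int_vec v = {0}; */
struct int_vec {
    int *data;      /* heap block, or NULL while empty */
    size_t len;     /* number of elements in use */
    size_t cap;     /* number of elements the block can hold */
};

/* Appends value, doubling the capacity with realloc() when the block is full.
   Returns 0 on success, -1 if more memory could not be obtained. */
int int_vec_push(struct int_vec *v, int value)
{
    assert(v != NULL);

    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        /* realloc(NULL, n) behaves like malloc(n), so the first push works too */
        int *grown = realloc(v->data, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;           /* the old block is untouched and still owned by v */
        v->data = grown;
        v->cap = new_cap;
    }

    v->data[v->len++] = value;
    return 0;
}

/* Releases the heap block and resets the struct to its empty state. */
void int_vec_free(struct int_vec *v)
{
    assert(v != NULL);
    free(v->data);
    v->data = NULL;
    v->len = 0;
    v->cap = 0;
}
```

Note that the result of realloc() goes into a temporary pointer first. Assigning it straight to v->data would lose the only pointer to the old block if the call failed.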

So based on all these characteristics and their implications, how do you decide what to use in a specific scenario? Well, I personally have some general rules that I use most of the time. If I need a large block of memory (for a very large data structure) and need access to it from outside the function that created it (or sometimes just locally, if it is too big for the stack), then I go with the heap. On the flip side, if the variables are relatively small and are only needed within specific function blocks, then I stick with stack allocation. In cases where size is really an issue (and you are worried about stack overflow), just stick with the heap. Also, note that if the size of a variable changes during program execution (like a long array that keeps growing and shrinking), it is better to allocate it on the heap and manage it manually using the allocation functions the language provides.
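Here is one way these rules of thumb could look in practice. It is only a sketch: median_of() is a hypothetical function, and the cutoff STACK_SCRATCH_MAX of 64 elements is an arbitrary number I picked for the example, not a recommended limit. The function needs a scratch copy of its input, so it uses a small fixed array on the stack when the input is small and falls back to the heap when it is not.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STACK_SCRATCH_MAX 64   /* assumed cutoff, chosen only for this sketch */

/* Comparison callback for qsort(): ascending order of ints. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: stores the median of values[0..n-1] in *out
   (the upper middle element when n is even).
   Returns 0 on success, -1 if heap scratch space could not be allocated. */
int median_of(const int *values, size_t n, int *out)
{
    assert(values != NULL && out != NULL && n > 0);

    int local_buf[STACK_SCRATCH_MAX];   /* small, fixed size: fine on the stack */
    int *scratch = local_buf;

    if (n > STACK_SCRATCH_MAX) {
        /* Too big for the fixed buffer: fall back to the heap. */
        scratch = malloc(n * sizeof *scratch);
        if (scratch == NULL)
            return -1;
    }

    memcpy(scratch, values, n * sizeof *scratch);
    qsort(scratch, n, sizeof *scratch, compare_ints);
    *out = scratch[n / 2];

    if (scratch != local_buf)
        free(scratch);                  /* only the heap block needs free() */
    return 0;
}
```

The stack buffer is released automatically when median_of() returns, while the heap path has to pair its malloc() with a free() on every way out of the function.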

TL;DR

Stick with the stack if you are not dealing with large or dynamically sized variables (like big arrays and structs). If you need a dynamically allocated variable that outlives the function that created it (and you are fine with taking a hit on speed), then use the heap. When using the heap, make sure to de-allocate memory to avoid memory leaks. Be careful with large variables on the stack, since they can cause a stack overflow.

Written on February 13, 2019