How Programs Remember: Stack and Heap

#unsorted

Introduction

Learning high-level programming languages usually involves the concept that “unused variables are automatically garbage collected”. However, as one gets closer to the lower levels or starts exploring performance issues, it becomes apparent that the core memory concepts of Stack and Heap are not so clear. How do programming languages allocate and manage memory? Specifically, what does garbage collection (GC) do?

Why Do We Need Stack and Heap?

Programs need to store variables in memory while the operating system, acting as a mediator between the program and hardware, uses Stack and Heap to manage that memory:

Efficiency: Modern CPUs use caches to store frequently accessed memory segments. When memory changes, the cache must be updated, which incurs a cost. From a performance standpoint, the more contiguous and compact the data in memory, the higher the CPU cache hit rate, leading to better performance. Using the predictable memory allocation of the Stack allows for extremely high access efficiency.
Flexibility: For data that needs to change dynamically and for large storage demands, the flexible storage characteristics of the Heap complement the weaknesses of the Stack.

Stack

The Stack is a “last in, first out” data structure used to store information during function execution. Imagine a stack of pancakes eaten from the top downward, with the last pancake put on the stack being the first to be eaten, and content is modified through push (adding) and pop (removing) operations. The principle is as follows:

The system creates a new Stack Frame on the Stack.
It places the function’s parameters, local variables, return address, register state, etc., into it.
After the function completes, the entire frame is popped off.

Advantages:
- Fast and Frequent Operations: Structurally, there’s no need to find positions or determine whether data is still in use, leading to high operational efficiency due to predictable outcomes.
- Compact Storage: Efficient use of memory leads to convenient caching.
Disadvantages:
- Inability to Handle Dynamic Size Data: Structurally, it is challenging to elegantly handle data with dynamic sizes.
- Limited Space: Stack requires fixed size allocation, usually several MB, which may lead to Stack Overflow in cases of large data or recursion.

Heap

Process
 ├─ Thread A → Stack A
 ├─ Thread B → Stack B
 └─ Heap (shared)

The Heap is an area of memory used for storing “dynamically allocated data”. It does not have strict entry/exit order constraints and can adopt different allocation strategies (First Fit, Best Fit, Worst Fit).

Due to the Stack’s smaller fixed size and the need to compactly push data during writes, the Stack typically only holds pointers to the Heap, with actual data allocation occurring on the Heap to avoid small Stack capacity and lifecycle limitations.

Advantages:
- Flexible Writing and Release: Can dynamically request more space from the OS.
- Can Store Data That Lives Across Multiple Functions: Since the Stack has fixed capacity and limited data lifetimes, data that needs to persist across function calls or sizes that cannot be determined at compile time is usually allocated on the Heap, with the Stack only saving pointers to the Heap.
Disadvantages:
- Slower Allocation Speed: Requires searching for suitable memory blocks to store data.
- Requires Additional Management Mechanisms: In high-level languages, this is usually handled by garbage collectors (GC). In C, it is done through:
  - malloc / calloc / realloc: Allocating memory from the Heap.
  - free: Releasing allocated memory.
- Can Cause Memory Fragmentation: Long-term allocation and deallocation of differently sized blocks can create non-contiguous available space.

Go: Observing Variables Escaping to the Heap

The Go compiler can use escape analysis to decide whether variables should be allocated on the Stack or Heap. If the compiler cannot prove that a variable will not be used after a function returns, the variable will “escape” to the Heap:

go build -gcflags="-m" main.go

./heap-test.go:5:5: moved to heap: x
./heap-test.go:11:5: moved to heap: count
./heap-test.go:20:9: moved to heap: arr
./heap-test.go:27:12: 100 escapes to heap

package main

import "fmt"

// Example 1: Returning a pointer to a local variable
func createPointer() *int {
    x := 42  // x is initially on the stack
    return &x  // Returning the pointer causes x to escape to the heap
}

// Example 2: Variable being captured by closure
func createClosure() func() int {
    count := 0  // count escapes to the heap
    return func() int {
        count++
        return count
    }
}

// Example 3: Variable too large
func largeArray() *[10000]int {
    var arr [10000]int  // Too large, escapes to the heap
    return &arr
}

// Example 4: Interface type
func useInterface() interface{} {
    x := 100  // Assigning to interface causes escape
    return x
}

func main() {
    // Example 1
    ptr := createPointer()
    fmt.Println("Pointer value:", *ptr)

    // Example 2
    counter := createClosure()
    fmt.Println("Count:", counter()) // 1
    fmt.Println("Count:", counter()) // 2

    // Example 3
    arr := largeArray()
    fmt.Println("First element of array:", arr[0])

    // Example 4
    val := useInterface()
    fmt.Println("Interface value:", val)
}

Site Search

How Programs Remember: Stack and Heap

Introduction

Why Do We Need Stack and Heap?

Stack

Heap

Go: Observing Variables Escaping to the Heap

Further Reading