
How Programs Execute: CPU, RAM & Memory Management

Overview

When you run a program, an intricate dance occurs between the CPU, RAM, and storage. This document explains exactly how your computer transforms code into actions.

Computer Architecture Fundamentals

CPU Architecture

Inside the CPU

Key CPU Components

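| Component | Purpose | Typical access latency |
|---|---|---|
| Registers | Hold the operands and results the CPU is working on right now | 1 cycle (~0.3 ns) |
| L1 Cache | First-level cache; the smallest and fastest cache level | 3-4 cycles (~1 ns) |
| L2 Cache | Second-level cache; larger and slower than L1 | 10-20 cycles (~3 ns) |
| L3 Cache | Third-level cache, shared among cores | 30-70 cycles (~10 ns) |
| RAM | Main system memory | 100-300 cycles (~100 ns) |
| SSD | Solid-state storage | ~50,000 ns (~50 μs) |
| HDD | Hard disk drive | ~5,000,000 ns (~5 ms) |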

Program Execution Flow

From Storage to Execution

Memory Hierarchy

CPU Instruction Cycle (Fetch-Decode-Execute)

Example: Adding Two Numbers

Instruction: ADD R1, R2, R3  (R1 = R2 + R3)

1. FETCH:
- PC = 0x1000 (the program counter holds the address of this instruction)
- Load the instruction from memory address 0x1000
- PC = 0x1004 (advance to the next instruction, assuming fixed 4-byte instructions)

2. DECODE:
- Opcode: ADD
- Source operand 1: R2 (register 2)
- Source operand 2: R3 (register 3)
- Destination: R1 (register 1)

3. EXECUTE:
- Read the value from R2 (e.g., 10)
- Read the value from R3 (e.g., 20)
- The ALU computes 10 + 20 = 30

4. STORE (write-back):
- Write the result (30) to R1
- Update status flags (zero flag, carry flag, etc.) on architectures where ADD sets them

RAM Organization for a Program

Memory Layout of a Process

Memory Segments Explained

1. Text/Code Segment

  • Contains compiled machine code (instructions)
  • Read-only and executable
  • Shared among multiple instances of the same program
  • Fixed size at load time

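2. Data Segment

  • Initialized Data: Global and static variables with non-zero initial values
    int globalVar = 100;  // Stored in data segment
    static int count = 1; // Stored in data segment
  • Variables explicitly initialized to zero (e.g., static int count = 0;) usually land in BSS instead, because compilers such as GCC treat them like uninitialized ones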

3. BSS Segment (Block Started by Symbol)

  • Uninitialized global and static variables
  • Automatically set to zero before main() runs
  • Takes no space in the executable file; only its total size is recorded
    int globalArray[1000]; // Stored in BSS
    static int flag; // Stored in BSS

4. Heap

  • Dynamic memory allocation
  • Conventionally grows upward toward higher addresses
  • Managed by the programmer (malloc/free in C, new/delete in C++)
  • Lives until it is explicitly freed or the program exits
    int* ptr = malloc(sizeof(int) * 100); // Allocated on heap

5. Stack

  • Automatic memory allocation
  • Grows downward toward lower addresses on most architectures
  • Stores local variables, return addresses, and function parameters that are not passed in registers
  • Automatically cleaned up when the function returns
    void function() {
        int localVar = 10; // Stored on stack
    }

Variable Storage in Memory

Example Program Analysis

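#include <stdio.h>
#include <stdlib.h>

int globalVar = 100; // Data segment
static int staticVar = 200; // Data segment
int uninitGlobal; // BSS segment

void function(int param) { // param on stack
    int localVar = 10; // Stack
    static int staticLocal = 5; // Data segment
    int* heapVar = malloc(sizeof(int)); // Pointer on stack, data on heap
    if (heapVar == NULL) {
        return; // Allocation failed
    }
    *heapVar = 20; // Value stored on heap

    // %p expects a void *, so cast each pointer
    printf("Address of globalVar:    %p\n", (void*)&globalVar);
    printf("Address of staticVar:    %p\n", (void*)&staticVar);
    printf("Address of staticLocal:  %p\n", (void*)&staticLocal);
    printf("Address of uninitGlobal: %p\n", (void*)&uninitGlobal);
    printf("Address of heap data:    %p\n", (void*)heapVar);
    printf("Address of localVar:     %p\n", (void*)&localVar);
    printf("Address of param:        %p\n", (void*)&param);

    free(heapVar);
}

int main(void) {
    int mainLocal = 5; // Stack
    function(mainLocal);
    return 0;
}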

How CPU Executes Instructions

Assembly to Machine Code

CPU Registers During Execution

Complete Program Execution Example

Simple C Program

int main() {
    int a = 5;
    int b = 10;
    int c = a + b;
    return c;
}

Step-by-Step Execution

Memory Access Pattern

Function Call Stack

How Function Calls Work

Function Call Example

void function2(int x) {
    int local2 = x * 2;
    return;
}

void function1(int y) {
    int local1 = y + 1;
    function2(local1);
    return;
}

int main() {
    int a = 5;
    function1(a);
    return 0;
}

Dynamic Memory Allocation

Heap Management

malloc/free Process

CPU Pipeline

Modern CPUs Execute Multiple Instructions Simultaneously

Pipeline Stages

  1. Fetch (IF): Get instruction from memory
  2. Decode (ID): Interpret instruction and read registers
  3. Execute (EX): Perform operation in ALU
  4. Memory (MEM): Read or write data memory (loads and stores only)
  5. Writeback (WB): Write result to register

Cache Memory

How Cache Works

Cache Line Example

Virtual Memory

Virtual to Physical Address Translation

Page Table Structure

Complete System View

Performance Comparison

Access Time Comparison

Human-Scale Time Analogy

If accessing a CPU register took 1 second, here's how long other operations would take:

| Memory Level | Actual Time | If Register = 1 Second |
|---|---|---|
| CPU Register | 0.3 ns | 1 second |
| L1 Cache | 1 ns | 3 seconds |
| L2 Cache | 3 ns | 10 seconds |
| L3 Cache | 10 ns | 33 seconds |
| RAM | 100 ns | 5.5 minutes |
| SSD | 50 μs | 1.9 days |
| HDD | 5 ms | 6.4 months |

Key Takeaways

  • Speed vs Size Trade-off: Faster memory costs far more per byte, so each faster level is smaller
  • Locality Matters: Programs that access nearby memory locations run faster, because caches load whole lines and keep recently used data close to the CPU
  • Cache is Critical: Modern CPUs devote a large share of their silicon area to cache to bridge the speed gap to RAM
  • RAM is Slow: Despite being "fast" by human standards, RAM is ~100x slower than L1 cache
  • Disk is Extremely Slow: SSDs are roughly 170,000x slower than registers; HDDs are nearly 17 million times slower