Binary Exploitation: Understanding What Happens When an Executable Runs

Consider what happens when you write a C program such as:

#include <stdio.h>

int fun() {
    int a = 10;
    return a * 10;
}

int main() {
    int b = fun();
    b = b + fun();
    printf("%d", b);
}

This program prints 200. But what exactly happens between writing the code and seeing the output? Let’s break down the process of compiling and running this program.

What is Compiling?

Compiling converts human-readable C code into machine code that the computer can execute. Here’s how you can compile your code:

  • 64-bit Binary: gcc main.c -o main
  • 32-bit Binary: gcc -m32 main.c -o main

The Compilation Process

  • Preprocessing: The preprocessor handles all preprocessor directives, which are lines that start with #. For example, #include <stdio.h> pastes in the contents of the stdio.h header, which declares standard input-output functions such as printf. The header only declares these functions; their code is added later, at link time.
  • Compilation: After preprocessing, the expanded C code is compiled into assembly code, a low-level, human-readable representation of the program. Assembly code uses mnemonics and labels to represent the basic instructions that the CPU can execute.
  • Assembly: The assembler translates the assembly code into machine code and stores it in an object file. Machine code consists of binary instructions that the CPU can execute directly. It is often shown in hexadecimal for readability, but fundamentally it’s just a series of bits (0s and 1s).
  • Linking: The linker combines the object files, together with any library code the program uses (or references to it, for dynamically linked libraries), into a single executable file. It resolves references to functions and variables that are defined in other files, ensuring that all parts of the program can work together.

Key Elements of an Executable

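  • Heap: A memory area for dynamic allocation, managed with functions like malloc(), calloc(), and free(). Memory is taken from the heap only when the program requests it at run time. Static and global variables do not live on the heap; they are placed in the executable’s data sections (such as .data and .bss) and exist for the whole run of the program.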

  • Registers: Small, fast storage areas within the CPU used to store data and memory addresses:

    • General-Purpose Registers (e.g., EAX, EBX): Used for calculations and temporary storage.
    • Base Pointer (EBP or RBP): Points to the base of the current stack frame, essential for accessing function parameters and local variables.
    • Stack Pointer (ESP or RSP): Points to the top of the stack, that is, the most recently pushed item. It is named ESP on a 32-bit CPU and RSP on a 64-bit CPU.
  • Stack: A stack is a data structure that works on a Last In, First Out (LIFO) basis. It supports two basic operations: push (add an item to the top) and pop (remove the item from the top).
    • The image below is a good representation of how a stack works.
    • But why is a stack used?

The stack is used to manage function calls and local variables efficiently. When a function is called, a stack frame is created that holds the function’s return address, its local variables, and any parameters passed on the stack. Because the stack is LIFO, the most recently called function is always the first to finish, so its frame is always on top. Allocation and deallocation are therefore cheap: entering a function reserves a frame by moving the stack pointer, and returning releases it by moving the stack pointer back. This keeps function calls fast, even in programs with many calls or deep recursion. {: .prompt-tip }

How the Stack and Registers Work

When a function like fun() is called in your program, the CPU performs several steps to manage memory and execute instructions:

  1. Stack Frame Setup:

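    • The call instruction pushes the return address, which is the address of the instruction in main() that follows the call, onto the stack.
    • The caller’s base pointer (EBP or RBP) is saved on the stack, and the base pointer is then set to the base of fun’s stack frame so that local variables can be addressed relative to it.
    • The stack pointer (ESP or RSP) typically moves to make room for fun’s local variables, such as a. An optimizing compiler may skip some of these steps, for example by keeping a in a register.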
  2. Instruction Execution:

    • The instruction pointer (EIP or RIP) holds the address of the next instruction to be executed, moving sequentially unless altered by control flow instructions.
  3. Using Registers:

    • General-purpose registers are used for calculations and for passing back return values. For example, fun() multiplies a by 10 and returns the result in EAX (RAX on a 64-bit CPU).
  4. Returning from a Function:

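    • The stack pointer resets to the previous stack frame’s position, deallocating fun’s frame, and the saved base pointer is restored. The ret instruction then pops the return address into the instruction pointer, returning control to main().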
  5. Completion of Execution:

    • As main() continues, it uses the return values from fun(), and upon completion, the stack is cleaned up, and control returns to the operating system.

Conclusion

Understanding the compilation process and the role of memory structures like the stack and heap helps demystify what happens when you run a compiled binary. This knowledge is crucial for optimizing program performance and managing resources efficiently.

Resources

  1. demystifying the secret structure you’ve been using all along - By Low Level Learning
  2. x86 Assembly Crash Course - By HackUCF

Feel free to reach out if you notice any errors or have suggestions for improvements. I’ll be making changes over time. {: .prompt-info }