Last time, we talked about including contextual information to help us
actually fix crashes that happen in the field. Minidumps are a great
way to easily save a snapshot of the most important parts of a running
(or crashed) process, but it's often useful to understand the
low-level mechanics of a C++ call stack (on x86). Given some basic
principles about function calls, we will derive code that walks a
call stack.
C++ function call stack entries are stored on the x86 stack, which
grows downward in memory. That is, pushing on the stack subtracts
from the stack pointer. The ESP register points to the
most-recently-written item on the stack; thus, push eax
is equivalent to:
sub esp, 4
mov [esp], eax
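To make the push semantics concrete, here is a minimal C++ model of a 32-bit stack. It is a sketch under simplifying assumptions: the `ToyStack` type is hypothetical, memory is a 64-byte array, and ESP is an offset into that array rather than a real address.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Toy model of a 32-bit x86 stack: a byte array plus an "ESP" that holds an
// offset into it. (Real ESP holds a virtual address; an index keeps this portable.)
struct ToyStack {
    unsigned char memory[64] = {};
    std::uint32_t esp = sizeof(memory);  // empty stack: ESP sits just past the highest byte

    // push value  ==  sub esp, 4 ; mov [esp], value
    void push(std::uint32_t value) {
        assert(esp >= 4 && "toy stack overflow");
        esp -= 4;                                          // the stack grows downward
        std::memcpy(&memory[esp], &value, sizeof value);   // write at the new top
    }

    // mov result, [esp + offset]
    std::uint32_t read(std::uint32_t offset) const {
        assert(esp + offset + 4 <= sizeof(memory));
        std::uint32_t value;
        std::memcpy(&value, &memory[esp + offset], sizeof value);
        return value;
    }
};
```

Pushing `0xAAAAAAAA` and then `0xBBBBBBBB` moves ESP from 64 to 56; `read(0)` returns the second value and `read(4)` the first, because earlier pushes sit at higher addresses.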
Let's say we're calling a function:
int __stdcall foo(int x, int y)
The __stdcall
calling convention pushes arguments onto the stack from right to left
and returns the result in the EAX register, so calling
foo(1, 2) generates this code:
push 2
push 1
call foo
; result in eax
If you aren't familiar with assembly, I know this is a lot to absorb,
but bear with me; we're almost there. We haven't seen the
call instruction before. It pushes the return address (the address of
the instruction just after the call, which is where EIP points) onto
the stack and then jumps to the target function.
If we didn't store the instruction pointer, the called function would
not know where to return when it was done.
The final piece of information we need to construct a C++ call stack is
that functions live in memory, functions have names, and thus sections
of memory have names. If we can get access to a mapping of memory
addresses to function names (say, with the
/MAP
linker option), and we can read instruction pointers up the call
stack, we can generate a symbolic stack trace.
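Turning an address into a name is then a search for the nearest function start at or below it. The sketch below assumes the MAP file has already been parsed into a sorted table; `SymbolTable` and `symbolize` are hypothetical names, and it ignores function sizes, so an address past the end of the last function still maps to that function.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>

// Hypothetical symbol table: function start address -> function name, the kind
// of information a /MAP file lists. std::map keeps the keys sorted.
using SymbolTable = std::map<std::uint32_t, std::string>;

// Find the function containing `address`: the entry with the greatest start
// address that is <= address. Returns "??" if the address precedes every symbol.
std::string symbolize(const SymbolTable& symbols, std::uint32_t address) {
    auto it = symbols.upper_bound(address);  // first symbol starting *after* address
    if (it == symbols.begin()) {
        return "??";
    }
    auto containing = std::prev(it);         // the last symbol starting at or before it
    assert(containing->first <= address);
    return containing->second;
}
```

With entries for `main` at 0x401000, `foo` at 0x401080 and `bar` at 0x401100, an address such as 0x4010A3 resolves to `foo`.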
How do we read the instruction pointers up the call stack?
Unfortunately, just knowing the return address from the current
function is not enough. How do you know the location of the caller's
caller? Without extra information, you don't. Fortunately, most
functions have that information in the form of a function prologue:
push ebp
mov ebp, esp
and epilogue:
mov esp, ebp
pop ebp
These bits of code appear at the beginning and end of most functions (the
compiler may omit them as an optimization, e.g. MSVC's /Oy), allowing you
to use the EBP register as the "current stack frame".
In a function with this prologue, arguments are accessed at positive offsets
from EBP, and locals at negative offsets:
; int foo(int x, int y)
; ...
[EBP+12] = y argument
[EBP+8] = x argument
[EBP+4] = return address (set by call instruction)
[EBP] = previous stack frame
[EBP-4] = local variable 1
[EBP-8] = local variable 2
; ...
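The following toy model replays the `foo(1, 2)` call and the prologue on a simulated stack, so the offsets in the table can be checked rather than taken on faith. `ToyMachine`, `enter_foo` and `leave_foo` are hypothetical, ESP and EBP are offsets into a byte array, and the return address is an arbitrary constant.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Toy 32-bit machine: a byte array for the stack, with ESP and EBP held as
// offsets into it (real registers hold addresses).
struct ToyMachine {
    unsigned char memory[64] = {};
    std::uint32_t esp = sizeof(memory);  // empty stack; grows toward 0
    std::uint32_t ebp = 0;               // frame pointer of the (pretend) caller

    std::uint32_t load(std::uint32_t addr) const {  // mov result, [addr]
        assert(addr + 4 <= sizeof(memory));
        std::uint32_t value;
        std::memcpy(&value, &memory[addr], sizeof value);
        return value;
    }
    void push(std::uint32_t value) {                 // sub esp, 4 ; mov [esp], value
        assert(esp >= 4);
        esp -= 4;
        std::memcpy(&memory[esp], &value, sizeof value);
    }
    std::uint32_t pop() {                            // mov result, [esp] ; add esp, 4
        std::uint32_t value = load(esp);
        esp += 4;
        return value;
    }
};

// Caller side of foo(1, 2), then foo's prologue.
void enter_foo(ToyMachine& m, std::uint32_t return_address) {
    m.push(2);               // push 2     ; y, the rightmost argument, goes first
    m.push(1);               // push 1     ; x
    m.push(return_address);  // call foo   ; pushes the return address, then jumps
    m.push(m.ebp);           // push ebp   ; save the caller's frame pointer
    m.ebp = m.esp;           // mov ebp, esp
}

// foo's epilogue followed by `ret 8`, which pops the return address and then
// discards the 8 bytes of arguments (the __stdcall callee cleans up).
std::uint32_t leave_foo(ToyMachine& m) {
    m.esp = m.ebp;                            // mov esp, ebp
    m.ebp = m.pop();                          // pop ebp
    std::uint32_t return_address = m.pop();   // ret pops the return address...
    m.esp += 8;                               // ...and the "8" in ret 8 drops x and y
    return return_address;
}
```

After `enter_foo`, reading at `ebp + 12`, `ebp + 8`, `ebp + 4` and `ebp` yields 2, 1, the return address and the saved caller EBP, matching the table. `leave_foo` models the epilogue plus `ret 8`; under `__stdcall` the callee, not the caller, removes its arguments, so ESP ends up back where it started.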
Look! For any stack frame EBP, the return address into the caller is
at [EBP+4] and the caller's stack frame is at [EBP].
By repeatedly dereferencing EBP, we can walk
the call stack, all the way to the top!
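#include <windows.h>  // IsBadReadPtr
#include <vector>

// Layout of a frame set up by the standard prologue.
struct stack_frame {
    stack_frame* previous;          // [EBP]:   caller's saved EBP
    unsigned long return_address;   // [EBP+4]: pushed by the call instruction
};

// x86 MSVC only: inline __asm is not available on x64.
std::vector<unsigned long> get_call_stack() {
    std::vector<unsigned long> call_stack;
    stack_frame* current_frame;
    // Start from this function's own frame.
    __asm mov current_frame, ebp
    // Follow the saved-EBP chain until it points at unreadable memory.
    // (IsBadReadPtr is documented as obsolete; it is good enough for a demo.)
    while (!IsBadReadPtr(current_frame, sizeof(stack_frame))) {
        call_stack.push_back(current_frame->return_address);
        current_frame = current_frame->previous;
    }
    return call_stack;
}
// Convert the array of addresses to names with the aforementioned MAP file.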
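The walking loop itself can be exercised without inline assembly by building a chain of frames by hand. This is a portable stand-in for `get_call_stack`, not a replacement: `walk_frame_chain` is a hypothetical name, and it stops at a null link where the real version relies on `IsBadReadPtr`.

```cpp
#include <cassert>
#include <vector>

// Same layout as the frame built by the prologue: saved EBP, then return address.
struct stack_frame {
    stack_frame* previous;         // [EBP]
    unsigned long return_address;  // [EBP+4]
};

// Follow the saved-frame links from the innermost frame outward, collecting
// return addresses. In this sketch a null link marks the outermost frame.
std::vector<unsigned long> walk_frame_chain(const stack_frame* frame) {
    std::vector<unsigned long> call_stack;
    while (frame != nullptr) {
        assert(frame->previous != frame && "frame chain loops back on itself");
        call_stack.push_back(frame->return_address);
        frame = frame->previous;
    }
    return call_stack;
}
```

Three hand-built frames linked inner to middle to outer produce their return addresses innermost first, which is the order a debugger prints a call stack.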
Yay, now we know how to grab a stack trace from any location in the
code. This implementation is not robust (a single function compiled without
a frame pointer breaks the chain), but the concepts are
correct: functions have names, functions live in memory, and we can
determine which memory addresses are on the call stack. Now that you
know how to manually grab a call stack, let Microsoft do the heavy
lifting with the
StackWalk64
function.
Next time, we'll talk about setting up your very own Microsoft Symbol Server so you can
grab accurate function names from every version of your software.