Reporting Crashes in IMVU: Call Stacks and Minidumps

Feb 26, 2009 00:29



So far, we've implemented reporting for Python exceptions that bubble out of the main loop, C++ exceptions that bubble into Python (and then out of the main loop), and structured exceptions that bubble into Python (and then out of the main loop). This is a fairly comprehensive set of failure conditions, but there's still a big piece missing from our reporting.

Imagine that you implement this error reporting and have customers try the new version of your software. You'll soon have a collection of crash reports, and one thing will stand out clearly. Without the context in which crashes happened (call stacks, variable values, perhaps log files), it's very hard to determine their cause(s). And without determining their cause(s), it's very hard to fix them.

Reporting log files is easy enough: just attach them to the error report. You may need to deal with privacy concerns or limit the size of the log files that get uploaded, but those are straightforward problems.
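Here is a minimal sketch of the size limit. It assumes the newest lines are the most useful (they sit closest to the crash) and that the log is already in memory as a `std::string`. `tail_for_report` is a hypothetical helper, not part of IMVU's reporter:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: keep at most the last `max_bytes` of a log so the
// crash report stays small. Drops from the front, then skips ahead to the
// next line start so the report never begins with a partial line.
std::string tail_for_report(const std::string& log, std::size_t max_bytes) {
    if (log.size() <= max_bytes) {
        return log;  // already small enough
    }
    std::size_t start = log.size() - max_bytes;  // start >= 1 here
    // If the cut lands mid-line, advance past the next newline.
    if (log[start - 1] != '\n') {
        std::size_t nl = log.find('\n', start);
        start = (nl == std::string::npos) ? log.size() : nl + 1;
    }
    return log.substr(start);
}
```

Cutting on line boundaries means the result can be shorter than the budget, but each attached line is complete and readable.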

Because Python has batteries included, grabbing the call stack from a Python exception is trivial. Just take a quick look at the traceback module.

Structured exceptions are a little harder. The structure of a call stack on x86 is machine- and sometimes compiler-dependent. Fortunately, Microsoft provides an API that dumps the relevant process state to a file you can open in Visual Studio or WinDbg to inspect the stack trace and other selected data. These files are called minidumps, and they're pretty small. Just call MiniDumpWriteDump with the exception's context (its EXCEPTION_POINTERS) and submit the generated file with your crash report.

Grabbing a call stack from C++ exceptions is even harder, and may not even be desirable. If you regularly use C++ exceptions to communicate errors from C++ to Python, grabbing a call stack or writing a minidump on every single throw is probably too expensive. If you want to do it anyway, here's one way.

With Visual C++, C++ exceptions are implemented on top of Windows' structured exception handling (SEH): the try and catch statements in your C++ code cause the compiler to generate SEH code behind the scenes. However, by the time your C++ catch clauses run, the stack has already been unwound. Remember that SEH works in three stages: first it runs filter expressions until it finds one that can handle the exception; then it unwinds the stack, destroying any objects allocated on it; finally it runs the chosen exception handler. Your C++ catch block runs in that last stage, after unwinding, so you can't get an accurate call stack from inside it. Filter expressions, on the other hand, run in the first stage while the stack is still intact, so we can use an SEH filter to grab a call stack at the point where the exception was thrown, before we handle it.
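The ordering is easy to observe in standard C++ on any compiler: a local object in the throwing function is destroyed before the catch block is entered. This small sketch records events in order; the names are illustrative only:

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Records the order of events so we can see when unwinding happens.
std::vector<std::string> g_events;

// Lives in the throwing frame; its destructor runs during unwinding.
struct FrameGuard {
    ~FrameGuard() { g_events.push_back("guard destroyed"); }
};

void throwing_frame() {
    FrameGuard guard;
    g_events.push_back("throw");
    throw std::runtime_error("boom");
}

// Performs one throw/catch round trip and returns the event log.
std::vector<std::string> run_demo() {
    g_events.clear();
    try {
        throwing_frame();
    } catch (const std::runtime_error&) {
        // throwing_frame's locals are already gone by the time we get here.
        g_events.push_back("catch");
    }
    return g_events;
}
```

`run_demo()` yields `throw`, `guard destroyed`, `catch`: the frame that threw no longer exists when the handler runs, which is why a stack captured from the catch block is already missing the interesting frames.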

First, let's determine the SEH exception code the compiler uses for C++ exceptions (warning: both this code and its output are compiler-dependent):

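```cpp
#include <windows.h>
#include <cstdio>
#include <exception>

int main() {
    DWORD code;
    __try {
        throw std::exception();
    }
    // The filter expression runs first: record the SEH exception code,
    // then tell SEH to run the handler below.
    __except (code = GetExceptionCode(), EXCEPTION_EXECUTE_HANDLER) {
        printf("%lX\n", code);  // DWORD is unsigned long, hence %lX
    }
}
```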
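With Visual C++ this program prints E06D7363. Windows exception codes use the NTSTATUS layout: the top two bits hold the severity, bit 29 is the "customer" flag for codes not defined by the system, and in this particular value the low three bytes spell "msc" in ASCII. The portable sketch below decodes it; the constant is MSVC's value, other compilers may use something else, and the helper names are illustrative:

```cpp
#include <cstdint>
#include <string>

// SEH code that Visual C++ raises for C++ exceptions.
constexpr std::uint32_t kMsvcCppExceptionCode = 0xE06D7363;

// Top two bits: severity (3 means error).
constexpr unsigned severity(std::uint32_t code) { return code >> 30; }

// Bit 29: set for codes that are not system-defined ("customer" codes).
constexpr bool is_customer_code(std::uint32_t code) {
    return ((code >> 29) & 1u) != 0;
}

// Low three bytes read as ASCII, most significant byte first.
std::string low_bytes_as_ascii(std::uint32_t code) {
    std::string s;
    for (int shift = 16; shift >= 0; shift -= 8) {
        s.push_back(static_cast<char>((code >> shift) & 0xFFu));
    }
    return s;
}
```

Decoding the constant gives severity 3, the customer bit set, and the string "msc", which makes the value easy to recognize in a debugger or a crash log.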
Once we have that, we can write our exception-catching function like this:

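```cpp
#include <windows.h>
#include <cstdio>
#include <stdexcept>

// SEH code for C++ exceptions, as printed by the program above
// (0xE06D7363 with Visual C++).
const DWORD CPP_EXCEPTION_CODE = 0xE06D7363;

void throw_cpp_exception() {
    throw std::runtime_error("hi");
}

bool writeMiniDump(const EXCEPTION_POINTERS* ep) {
    // ...
    return true;
}

void catch_seh_exception() {
    __try {
        throw_cpp_exception();
    }
    // The filter runs before unwinding, while the stack is intact. It writes
    // a minidump for C++ exceptions only, then returns
    // EXCEPTION_CONTINUE_SEARCH so the exception keeps propagating
    // to the C++ catch clause in main().
    __except (
        (CPP_EXCEPTION_CODE == GetExceptionCode()) &&
            writeMiniDump(GetExceptionInformation()),
        EXCEPTION_CONTINUE_SEARCH
    ) {
    }
}

int main() {
    try {
        catch_seh_exception();
    } catch (const std::exception& e) {
        printf("%s\n", e.what());
    }
}
```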
Now we've got call stacks and program state for C++, SEH, and Python exceptions, which makes fixing reported crashes dramatically easier.
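The SEH filter above only works with Visual C++. On other compilers, a portable and much cruder analog of the same idea is to capture state in the exception object's constructor, which runs at the throw site before any unwinding. This sketch is not a real call stack: it assumes hypothetical `TrailScope` markers placed by hand in the functions you care about, and all names are illustrative:

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical breadcrumb trail: each instrumented function pushes its name
// on entry and pops it on exit (including during unwinding).
thread_local std::vector<std::string> t_trail;

struct TrailScope {
    explicit TrailScope(const char* name) { t_trail.push_back(name); }
    ~TrailScope() { t_trail.pop_back(); }
    TrailScope(const TrailScope&) = delete;
    TrailScope& operator=(const TrailScope&) = delete;
};

// Constructed at the throw site, before unwinding, so it can copy the
// trail while it is still complete.
class TracedError : public std::runtime_error {
public:
    explicit TracedError(const std::string& what)
        : std::runtime_error(what), trail_(t_trail) {}
    const std::vector<std::string>& trail() const { return trail_; }
private:
    std::vector<std::string> trail_;
};

void parse_mesh() {
    TrailScope scope("parse_mesh");
    throw TracedError("bad mesh");
}

void load_asset() {
    TrailScope scope("load_asset");
    parse_mesh();
}

std::vector<std::string> catch_and_report() {
    TrailScope scope("catch_and_report");
    try {
        load_asset();
    } catch (const TracedError& e) {
        // t_trail is back to {"catch_and_report"} here, but the exception
        // still carries the trail captured where it was thrown.
        return e.trail();
    }
    return {};
}
```

The cost is manual instrumentation and a vector copy per throw, so, like a minidump per exception, it may be too expensive for code that throws routinely.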

Next time I'll go into more detail about how C++ stack traces work, and we'll see if we can grab them more efficiently.
