Free Software / Tracking memory corruptions

Sep 07, 2005 16:38

This is one of the most important reasons why I like free software.

I use IMAPFilter to filter my email on a remote IMAP server. Recent glibc versions detect double frees and memory corruption, and IMAPFilter was crashing with a "*** glibc detected *** corrupted double-linked list: 0xc001babe ***" error, rendering it useless. Somebody had already reported the bug, which hit me as soon as I upgraded glibc on Debian unstable. After poking around a bit, I have a patch that fixes the issue.

Here is how you would go about tracking down the root cause of errors like this:

1. Get the source, compile the program with debugging symbols and install it.

If you are running Debian: fetch the sources with apt-get source $software, install the packages required to build it with apt-get build-dep $software, remove the dh_strip directive from the $software/debian/rules file, make sure that CFLAGS includes the -g option, build the package with dpkg-buildpackage -rfakeroot, and install it with dpkg -i $software.deb.
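The Debian steps above can be sketched as a small shell script. Treat it as a rough outline rather than a tested recipe: the function names are made up for illustration, sudo is an assumption about how you get root, the sed call assumes GNU sed, and the final glob assumes the binary package carries the same name as the source package. DEB_BUILD_OPTIONS="nostrip noopt" is the Debian Policy way of asking for an unstripped, unoptimized build; it only helps if the package's rules file honours it, which is why the manual dh_strip edit is kept as well.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical helpers.

# Comment out every plain dh_strip call in a debian/rules file so the
# built binaries keep their debugging symbols. Assumes GNU sed (-i, -E).
disable_dh_strip() {
    sed -i -E 's/^([[:space:]]*)(dh_strip([[:space:]].*)?)$/\1# \2/' "$1"
}

# Fetch, build and install an unstripped package. The body runs in a
# subshell so the caller's working directory is left alone. Nothing is
# executed until the function is called.
build_debug_package() (
    software=$1
    apt-get source "$software" &&
    sudo apt-get build-dep "$software" &&
    cd "$software"-*/ &&                  # assumes one unpacked source tree
    disable_dh_strip debian/rules &&
    DEB_BUILD_OPTIONS="nostrip noopt" dpkg-buildpackage -rfakeroot -us -uc &&
    sudo dpkg -i ../"$software"_*.deb     # assumes binary name == source name
)

# Usage (not run here):
#   build_debug_package imapfilter
```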

2. Run $software under gdb, let it crash, and look at the stack backtrace.

Though some would say that a live gdb session is for wimps and real programmers use it only on core files, I am of the opinion that they are being plain stupid. Use the right tools for doing the job at hand efficiently, always!
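For a quick first look before going interactive, step 2 can be sketched as a one-shot run that starts the program under gdb and prints the backtrace once it stops. The function name is made up for illustration; -batch, -ex and --args are standard gdb options.

```shell
#!/bin/sh
# Sketch only: crash_backtrace is a hypothetical helper name.

# Run a program under gdb non-interactively. "run" starts it; when it
# stops (for example on SIGABRT), "bt" prints the stack backtrace and
# gdb exits. Everything after --args is the program and its arguments.
crash_backtrace() {
    gdb -batch -ex run -ex bt --args "$@"
}

# Usage (not run here):
#   crash_backtrace imapfilter
```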

3. Set a breakpoint on each function that appears in the stack backtrace, run the program, and whenever a breakpoint is hit, call the glibc mcheck function and continue.

Calling mcheck(0) and continuing by hand on every hit gets cumbersome, so attach commands to each breakpoint; gdb then runs them whenever the breakpoint is hit and resumes the program automatically.

(gdb) b request_login
Breakpoint 11 at 0x0804dadb
(gdb) command 11
Type commands for when breakpoint 11 is hit, one per line.
End with a line saying just "end".
>call mcheck(0)
>continue
>end
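If the backtrace lists many functions, typing the session above once per breakpoint gets old quickly. A minimal sketch of one way around that is to generate a gdb command file and load it with gdb -x. The function name and the mcheck.gdb file name are made up for illustration. The (int) cast is an assumption about your setup: newer gdb versions refuse to call a function whose return type they do not know, which is the case when libc has no debugging information installed.

```shell
#!/bin/sh
# Sketch only: gen_mcheck_breakpoints is a hypothetical helper name.

# Print a gdb command file that sets a breakpoint on each named
# function, calls mcheck(0) when it is hit, and resumes the program.
# A bare "commands" attaches to the breakpoint set just before it.
gen_mcheck_breakpoints() {
    for fn in "$@"; do
        printf 'break %s\n' "$fn"
        printf 'commands\n'
        printf '  call (int) mcheck(0)\n'
        printf '  continue\n'
        printf 'end\n'
    done
}

# Usage (not run here); list every function from your backtrace:
#   gen_mcheck_breakpoints request_login > mcheck.gdb
#   gdb -x mcheck.gdb --args imapfilter
```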

4. When the program aborts because of the problem ...

Breakpoint 11, 0x0804dadb in request_login ()
at request.c:51
51 if ((s = session_find(server, user)))
$57 = 0
block freed twice

Program received signal SIGABRT, Aborted.
0x401d29e7 in raise () from /lib/tls/libc.so.6
(gdb)

... examine the stack trace and the code of the functions in the suspect stack frames.

5. Fix the bug, share your code :-)

For a full example, look at Debian bug #326007. There are other, more elaborate methods, such as running $software through valgrind or using electric-fence or libdmalloc, that upstream software developers should use.

freedom, tech, debian, linux, code, email
