It involved a crash in the 32-bit optimized build of an executable that ran fine as a 32-bit or 64-bit debug build, and even as a 64-bit optimized build. No crash in the debug build meant no debug symbols to work with! So I started printing log messages like “inside this function: all looking good”. This exercise led me to the culprit function. Nothing outrageous was happening inside it, and a few more printf's led me to the line where the crash occurred.
my_list *pList = NULL;
my_container->list = pList;
std::cout << "Size of list " << pList->size; // And here it crashed
Points to be noted here -
1. Valgrind didn't report any memory errors in this area of the code.
my_container->list. In fact they didn't make any change at all to
3. At the point of the crash, pList and my_container->list held different values: my_container->list still had the value it was assigned, but pList held a junk value.
Conclusion - pList got corrupted somewhere. But how was that possible?
Initially, my guess was that the compiler had optimized away the local variable pList. But if that were the cause, why didn't the 64-bit optimized build fail too? And how could such a trivial bug slip into gcc? It was an utterly wrong guess.
Anyway, I made a temporary fix by replacing the crashing line with -
std::cout << "Size of list " << my_container->list->size; // And here it stopped crashing.
It was not really a fix, though. Some memory had been corrupted, and I merely avoided reading it by reading a different memory location that held the same value.
So how did I find out the problem?
I loaded into gdb an optimized build that had not been stripped of its symbols. gdb has a handy command,
disassemble, which I used to disassemble the culprit function.
The native code displayed by the disassemble command showed that the compiler had inlined call_some_func(). From it I could get the memory location at which the value of the variable pList was stored. You can locate it in the native code by looking for nearby function calls that the compiler didn't optimize away, and then matching the native code against the actual C/C++ code. There is no formula for this: comparing the C/C++ code with the native code and working out which native instructions correspond to which C/C++ line is the only way.
But this didn't tell me how the variable pList got corrupted. What I needed was a watch on the memory location that stored the pointer pList. So I ran the executable in gdb and, around the line "initialize_list (&pList);", started printing the native code.
The command I used -
x/10i $pc <-- This prints the next 10 instructions, starting at the current program counter.
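To clarify things, pList is a pointer that holds the address where the list is stored. my_container->list is the list field of the struct that my_container points to. After the assignment it holds that same address, but the field itself lives at a different memory location from pList.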
For the line
"my_container->list = pList;", there would be instructions that move the value of pList into a register and from there into the list field. That register would contain the memory address at which the list pointed to by pList is stored, and the instruction that loads it typically shows where pList itself is kept.
Once I got the memory address that stored the pointer pList, a
watch on that memory address revealed that a static buffer overflow caused the corruption. It was done by an
The bottom line: if you get a spooky crash and either don't have a debug build or can't reproduce the crash in one, don't panic. I was lucky to get some pointers on debugging assembly code from this guy. What the whole experience taught me was to make use of gdb as much as possible. Here's a list of a few handy gdb commands.