Hey kids, come here - grampa is telling a story from the war... ;)

I'm currently hunting a bunch of race conditions in a rather large C/C++ code base. Some of them manifest quite visibly as heap memory corruption (the reason I started searching). Others are subtle and only show up as slightly off output values in maybe one run out of a thousand.

Ten years ago, when you faced bugs like these, all you could probably do was read through all the code, verify the locking structure and look out for races. But things have changed substantially: we have valgrind.

Valgrind is a binary instrumentation framework plus a collection of tools built on top of it. It takes your compiled binary, instruments every memory load and store, and intercepts a bunch of important C library functions (memory management, thread synchronization, ...). The tools then perform various checks on your code while it runs, which puts valgrind in the category of dynamic analysis tools.

Luckily the buggy code also runs on Linux, where valgrind is available, so I gave it a try. And boy, did it find subtle bugs...

There are two different tools available for race conditions: helgrind and drd. I'm not yet 100% sure how either of them works, but as far as I understand, they track memory loads and stores together with the synchronization primitives and build a happens-before graph of code segments. When two segments access the same shared memory in a conflicting way (at least one of them writes, the classic reader/writer problem) without a happens-before relationship between them, that is a race condition and gets reported.

Sadly, I'm also finding real races in well-known open source libraries that are supposed to be thread-safe and work correctly... it seems they are not. This tells me that their developers are not using all the tools available to check their code.

And this is my message: if you are writing an open source library or application that has even remotely anything to do with threads, save yourself and your users a lot of trouble and use valgrind!