October 2015

Needles and haystacks: finding the one bad request among billions with tcpdump

This week, we had a few weird crashes in an HTTP server that we could not easily reproduce, and we had a hard time pinpointing the source of the issue. We knew the crashes were triggered by bad input, but since the process was receiving around 3000 requests per second at the time, it was hard to isolate the exact request(s) that made it crash.

The idea we had was to capture HTTP request data up to the point where the process crashed. Then we would open the trace and look at the last requests received before the crash; the faulty one would be in there somewhere.
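
To make the "look at the tail of the trace" idea concrete, here is a minimal Python sketch. It assumes the capture has already been decoded into (timestamp, raw request) pairs in capture order and that the crash time is known. The function name, the `keep` parameter, and the sample data are hypothetical, for illustration only.

```python
from collections import deque


def last_requests_before_crash(requests, crash_time, keep=5):
    """Return the last `keep` requests seen at or before `crash_time`.

    `requests` is an iterable of (timestamp, raw_request) pairs,
    assumed to be sorted in capture order.
    """
    # Bounded buffer: once it holds `keep` items, the oldest one is dropped.
    tail = deque(maxlen=keep)
    for timestamp, raw in requests:
        if timestamp > crash_time:
            # Anything captured after the crash cannot have caused it.
            break
        tail.append((timestamp, raw))
    return list(tail)


if __name__ == "__main__":
    # Made-up sample data; a real trace would hold far more entries.
    sample = [
        (1.0, "GET /a"),
        (2.0, "GET /b"),
        (3.0, "GET /c"),
        (4.0, "GET /d"),
    ]
    for timestamp, raw in last_requests_before_crash(sample, crash_time=3.5, keep=2):
        print(timestamp, raw)
```

Because only the last few entries are kept in memory, the same approach works on a trace of any length: the suspects are always the handful of requests closest to the crash.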