So I’ve got this process. It’s a long-running process. It runs for months and months and months, daemonized, pulling messages off a queue and executing the jobs contained therein.
I was proud of my process. It was simple, elegant, and did its job without complaint, merrily humming away, without any human interaction at all for 98 days.
Then one of the things it does stopped working. Oh well, I can’t complain, this thing has worked perfectly for 98 days, something was bound to screw up sooner or later. So the first thing I do is tail the log file to see what broke.
“Hmmm, this is weird,” I thought to myself. The problem only appeared today, but the last entries in the logfile are from a month ago. WTF? What’s it been doing for the past month?
So I check everything that was supposed to happen in the past month, and it all happened. So clearly something is screwy with the log file. Oh wait! The partition must be full of logfile cruft, so it got truncated. Nope. 12% full. Not even close.
So I run the module that broke manually so I can see what happens. Wait a minute – now there are new entries in the log, but dated last month? Wait…it can’t be…
Yes, ladies and gentlemen: my super-awesome, multi-buffered, multiple-filehandle-caching logging function. It gets the current timestamp, and I forgot that the standard time struct returns the month zero-indexed (January is 0, December is 11), so I never added 1. EVERY log entry EVER written by this thing over the past 98 days has the month off by one.
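For anyone who hasn't been bitten by this yet, here is a rough sketch of the fix. It assumes C and its `struct tm`, which may not be what the original logger used; `format_log_timestamp` is a made-up name for illustration, not the actual function.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: write "YYYY-MM-DD HH:MM:SS" into buf.
 * In struct tm, tm_mon runs 0..11 and tm_year counts years since 1900,
 * so both need an offset before they are fit for human eyes. */
static int format_log_timestamp(const struct tm *t, char *buf, size_t len)
{
    return snprintf(buf, len, "%04d-%02d-%02d %02d:%02d:%02d",
                    t->tm_year + 1900, /* years since 1900 */
                    t->tm_mon + 1,     /* the +1 that went missing */
                    t->tm_mday, t->tm_hour, t->tm_min, t->tm_sec);
}
```

In a logger you would pass it the result of `localtime()` on the current `time_t`. Letting `strftime` do the work with `"%Y-%m-%d %H:%M:%S"` sidesteps the problem entirely, since it applies both offsets for you.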
I blame those confusing summer months, during which I wrote this thing, when I never know what number month it is.
BTW the thing that broke was really easy to fix. So was the log function.