Wednesday, February 6, 2008

Debugging in live environments

Had to work late because a customer called and wanted me to look at a problem with a failed installation. It's always kind of cool to see your software in a real live environment. Seeing your product running on a test box in the lab isn't the same as seeing it running on a customer's server. Partly it's the notion that someone finds your code valuable. You can see it's serving a real purpose - solving a real problem.

The fun happens when they call because it's not working as expected. You don't have all the safety nets and conveniences of your development environment. For one thing, if I screw up my development PC, who cares? I could just reinstall it. That obviously wouldn't fly with a customer's server. And you don't have your tools at your disposal. When code crashes in my development environment or QA, I just attach with my remote debugger and analyze the core dump. No such luck when you're VPN'd in to a customer environment.

I always like a challenge. Hard-to-figure-out problems are what keep me coming to work every day. The best is connecting to a customer environment and then saying "What the fuck? How on Earth did the program get in this state?" Things just don't behave the same way in the wild as they do in the confines of your nice controlled environment.

In tonight's episode, I encountered the Windows version of what UNIX developers would refer to as "Your listening socket got inherited across a fork call because you didn't close all the open file descriptors". It's a common problem if you've ever written a UNIX daemon - the daemon runs some other program via fork/exec, and because you didn't close the file descriptors before calling execve(), that program inherits all of the daemon's open descriptors. This can result in problems such as socket communication errors and an inability to start the server if the original daemon gets restarted (because the child still holds the listening socket, so the port is already in use).
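For illustration, here's a minimal Python sketch of the UNIX side of this. It is not the code from the product in question; the helper names make_listener and child_sees_fd are made up for the example, and it assumes a POSIX system with Python 3.7 or later. The child process simply checks whether the parent's descriptor number is open in its own descriptor table.

```python
import os
import socket
import subprocess
import sys

# Tiny program run in the child: is the given descriptor number open here?
PROBE = (
    "import os, sys\n"
    "try:\n"
    "    os.fstat(int(sys.argv[1]))\n"
    "    print('open')\n"
    "except OSError:\n"
    "    print('closed')\n"
)


def make_listener() -> socket.socket:
    """Create a listening TCP socket on an OS-assigned loopback port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen()
    return srv


def child_sees_fd(fd: int, close_fds: bool) -> bool:
    """Spawn a child and report whether descriptor `fd` is open in it."""
    result = subprocess.run(
        [sys.executable, "-c", PROBE, str(fd)],
        close_fds=close_fds,  # False: leave inheritable descriptors alone
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "open"


if __name__ == "__main__":
    srv = make_listener()
    fd = srv.fileno()
    # Python marks new descriptors non-inheritable by default, so opt in
    # here to mimic a daemon that forgot to set close-on-exec.
    os.set_inheritable(fd, True)
    print("leaky spawn, child sees socket:", child_sees_fd(fd, close_fds=False))
    print("careful spawn, child sees socket:", child_sees_fd(fd, close_fds=True))
    srv.close()
```

As long as a child holding the descriptor is still running, the listening socket stays open even after the daemon exits, which is why a restarted daemon can find its port still in use.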

Win32 has an equivalent - whenever you call CreateProcess(), there is a flag (bInheritHandles) that dictates whether the new process inherits the inheritable handles of the parent. Setting the flag to FALSE prevents the behavior described above. The problem comes in, though, when there is enough abstraction in your program that you don't realize this has occurred. For example, your code calls a function called RunSomePerlCode, which uses a class called SpawnProcess, which calls _spawnve(), which doesn't even offer you the opportunity to tell it not to inherit the handles (so you don't think about the implications in regard to handles).
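One way to keep the decision from getting buried is to carry it through every layer as an explicit parameter with a safe default. Here's a rough Python sketch of that idea, not the actual code: spawn_process and run_some_python_code are hypothetical stand-ins for SpawnProcess and RunSomePerlCode, and the inherit_handles parameter is my own invention. It leans on the close_fds argument of Python's subprocess module, which asks the runtime not to hand the parent's descriptors or handles to the child.

```python
import subprocess
import sys
from typing import Sequence


def spawn_process(argv: Sequence[str], *, inherit_handles: bool = False) -> int:
    """Hypothetical low-level wrapper: run argv, wait, return its exit code.

    Inheritance is off unless the caller explicitly asks for it, so the
    choice shows up in the signature instead of hiding in a default.
    """
    # close_fds=True tells subprocess not to pass the parent's open
    # descriptors/handles to the child (standard streams excepted).
    proc = subprocess.Popen(list(argv), close_fds=not inherit_handles)
    return proc.wait()


def run_some_python_code(source: str, *, inherit_handles: bool = False) -> int:
    """Hypothetical high-level helper, standing in for RunSomePerlCode."""
    # Forward the flag rather than swallowing it, so callers at the top of
    # the stack can still see and control it.
    return spawn_process(
        [sys.executable, "-c", source],
        inherit_handles=inherit_handles,
    )


if __name__ == "__main__":
    print("exit code:", run_some_python_code("import sys; sys.exit(7)"))
```

The point of the keyword-only argument is that anyone who wants inheritance has to type it out at the call site, which at least forces the question.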

Enough rambling for one evening. Off to bed...