Saturday, December 9, 2006

Shuttle clocks

Somebody asked for my opinion about this, and since I was going to write an entry on software quality today anyway, the timing couldn't have been better.

It was on CNN and Slashdot a couple of weeks ago. NASA is concerned about what happens if the space shuttle's computer clocks roll over to January 1st during a flight.

http://news.com.com/Computer+glitch+may+limit+next+shuttle+launch/2100-11397_3-6133088.html

To a layperson, it reeks of "Y2K"-ness. You know, the notion that it's stupid: if they can get people into outer space, why can't they make a clock that keeps track of time as well as the $6.99 clock on my nightstand?

To many software engineers (including many in my office), it seems stupid as well. They write code for a living and find it difficult to believe that "NASA could be so stupid".

I disagree with both groups. The reality is that the space shuttle was built with 1970s technology by hundreds of independent contractors. Some have asserted that it is the most complicated machine ever built by man. It has thousands of interdependent systems that have to run in synchronization in order to work properly.

NASA's onboard shuttle computers also run some of the most validated software on Earth (no pun intended).

http://www.fastcompany.com/online/06/writestuff.html

Want to talk about software quality? Try 17 errors found in the last 11 revisions of a 420,000-line codebase. That's phenomenal. The average defect rate for commercial software is about 1 per 1,000 lines of code (and that's highly conservative). For those unwilling to do the math, that means a commercial application of the same size would have 420 errors (roughly 25 times as many as the shuttle software had across all 11 revisions combined). And then think about how often your PC crashes.
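
For anyone who does want to do the math, here is a minimal sketch of the comparison in Python. The inputs are the figures quoted above. The 1-defect-per-1,000-lines rate is my rough assumption for commercial software, not a measured value, and the variable names are just for illustration.

```python
# Back-of-the-envelope comparison using the figures quoted in this post.
LINES_OF_CODE = 420_000      # size of the shuttle codebase
SHUTTLE_ERRORS = 17          # errors found across the last 11 revisions
LINES_PER_DEFECT = 1_000     # assumed commercial average: 1 defect per 1,000 lines

# Defects a commercial application of the same size would be expected to have.
commercial_errors = LINES_OF_CODE / LINES_PER_DEFECT

# How many times more defects that is than the shuttle total.
ratio = commercial_errors / SHUTTLE_ERRORS

print(f"Expected commercial defects: {commercial_errors:.0f}")
print(f"Ratio to shuttle errors:     {ratio:.1f}x")
```

Note that this compares a single commercial release against the shuttle's total over 11 revisions, so if anything it understates the gap.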

Also bear in mind that this was not an unknown issue. Most bugs are found as a result of running into them. This one has been a recognized issue for years. Given that the issue was well understood and there are always bigger fish to fry, NASA made a conscious decision not to run the risk of introducing more bugs by making a change.

So, I guess the answer is "no", I can't fault NASA too much for choosing not to fix a known issue that has zero probability of occurring in normal operation (meaning not flying the shuttle over New Year's), when there are plenty of other issues to worry about -- like faulty O-rings and insulating foam coming off the fuel tank at supersonic speeds.