One of the things I’ve always found amusing about Operations is that people think we spend all of our time fixing broken things. It’s true to some extent: something breaks, we fix it. But the most challenging issues in Operations aren’t that simple, binary break/fix. The really interesting issues are performance issues, intermittent bugs, and transitory problems: the “it’s a little slow,” “something is off,” or “it’s not quite right” issues. Not only are those problems really challenging, but they can also be really interesting (and often fun) to solve.
Diagnosing and solving those problems, however, is very different from dealing with break/fix issues. You still need good diagnostic skills to deal with break/fix issues, but you can usually rely on the change-test-validate cycle: make a change, test that it fixes the issue, and then validate that you haven’t broken anything else. With less binary problems, you sometimes don’t even know where to start, or you are working from vague, qualitative feedback: “Something doesn’t feel right.”
Operations Loves Data