We recently talked with Karl Matthias, site reliability engineer at New Relic. Karl works on the company's huge-scale database infrastructure, where over 300 billion metrics are written to disk every day.
In this podcast, Karl talks about the role of automation at New Relic. He explains how automation and configuration management enable consistency and reliability in the company's infrastructure, letting operations staff work with development engineers on higher-level problems and focus on managing capacity.
We also discuss:
- How the demand to ship multiple times a day has changed the way infrastructure is managed
- How to streamline with cloud computing
- The costs of managing extra capacity, and developing and maintaining tooling
- Tips for bringing about change within an organization
One of the hardest parts of bringing about change? Karl says it's convincing your organization that there's a problem. He suggests taking time to engage your team in the conversation before jumping to the solutions that you've brainstormed. Solicit a lot of feedback and hone your argument before approaching management, so that you've built support and have identified concrete, shared solutions.
Listen in, and tell us what you think. If you have a story about making positive changes on your ops team, email [email protected].
- Hear additional conversations with Gene Kim, Kelsey Hightower, Gareth Rushgrove and others.
- Watch video interviews with Puppet Labs customers.
Theme music: "Smirking" by BenSoundBeats