published on 2 January 2015

We recently talked with Karl Matthias, site reliability engineer at New Relic. Karl works on the company's large-scale database infrastructure, where more than 300 billion metrics are written to disk every day.

In this podcast, Karl talks about the role of automation at New Relic. He explains how automation and configuration management bring consistency and reliability to the company's infrastructure, freeing operations staff to work alongside development engineers on higher-level problems and on managing capacity.

We also discuss:

  • How the demand to ship multiple times a day has changed the way infrastructure is managed
  • How to streamline with cloud computing
  • The costs of managing extra capacity, and developing and maintaining tooling
  • Tips for bringing about change within an organization

One of the hardest parts of bringing about change? Karl says it's convincing your organization that there's a problem. He suggests taking time to engage your team in the conversation before jumping to the solutions that you've brainstormed. Solicit a lot of feedback and hone your argument before approaching management, so that you've built support and have identified concrete, shared solutions.

Listen in, and tell us what you think. If you have a story about making positive changes on your ops team, email [email protected].

Theme music: "Smirking" by BenSoundBeats