Published on 11 September 2012 by

Continuing our series of PuppetConf speaker interviews, we talked to Ryan Park about how Pinterest uses Puppet, his upcoming PuppetConf talk, and his advice for other sysadmins.

Puppet Labs: Tell me a bit about yourself and your background.

Ryan Park: I joined Pinterest about a year ago. When I arrived, I was the first full-time ops hire, and the first person really automating the infrastructure. Before that, servers weren't really standardized; we simply installed software and configured machines as needed.

One of the first things I did at Pinterest was bring in Puppet. I spent the majority of my first three months configuring the environment and converting the existing servers to be managed by Puppet.

Setting up a configuration management system was the most important thing to do when I arrived at Pinterest. Having Puppet meant we didn't need to do things a hundred times for a hundred servers. For example, using Puppet meant we could easily grant access to new employees—giving access to 20 servers here and 30 servers there. We could trust that our servers were all configured the same way and that nothing was different or screwy about any individual host.

Puppet Labs: What brought you to Puppet?

Ryan: I had used Puppet before joining Pinterest. We also looked at another tool because one of my co-workers had more experience with it; however, we ultimately decided it would be faster to get Puppet up and running. My knowledge of the Puppet language helped us get started quickly.

Puppet Labs: Tell us a bit more about your talk at PuppetConf.

Ryan: My PuppetConf talk is titled "Puppet at Pinterest." Pinterest is a heavy user of Amazon Web Services, primarily Amazon EC2 virtual server instances. Our infrastructure has grown rapidly and we currently run hundreds of EC2 servers.

At Pinterest, we chose to use the Puppet Dashboard as the "source of truth" in our environment: it tells us what servers we have and what functions they perform, and we feed its data into many other systems. We built an internal tool, which we call the Puppet API: a REST interface that we can use to pull information from the Puppet Dashboard database. This makes it easy to build tools that integrate this data with the rest of our infrastructure.
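To make the idea concrete, here is a minimal sketch of a read-only REST interface over a node inventory. It is an illustration only, not Pinterest's Puppet API: the `NODES` dictionary, the `/nodes` URL layout, and the `call` helper are all assumptions, and a real version would read from the Puppet Dashboard database rather than from memory.

```python
import json

# Hypothetical inventory standing in for rows read from the Puppet Dashboard
# database; the host names, roles, and classes are invented for illustration.
NODES = {
    "web001": {"role": "web", "classes": ["base", "nginx"]},
    "db001": {"role": "db", "classes": ["base", "mysql"]},
}


def app(environ, start_response):
    """Minimal WSGI app: /nodes lists node names, /nodes/<name> returns one node."""
    path = environ.get("PATH_INFO", "").strip("/")
    parts = path.split("/") if path else []
    if parts == ["nodes"]:
        status, body = "200 OK", sorted(NODES)
    elif len(parts) == 2 and parts[0] == "nodes" and parts[1] in NODES:
        status, body = "200 OK", NODES[parts[1]]
    else:
        status, body = "404 Not Found", {"error": "not found"}
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]


def call(path):
    """Invoke the app in-process and return (status, decoded JSON body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response))
    return captured["status"], json.loads(body)
```

To try it over HTTP, the app can be served with the standard library: `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.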

For example, we feed this data to Amazon’s Route 53 in order to set up domain names for our servers. We run regular audits to confirm that EC2 and Puppet Dashboard match in terms of what servers are up and in a valid state. We use Puppet Dashboard data to monitor systems and to run security audits. In the talk, we'll show a couple of these tools and share how we've used them to integrate with other systems like Amazon EC2.
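An audit of this kind boils down to comparing two lists of hosts. The sketch below shows one way to do that; the function name and the host names are hypothetical, and fetching the two lists (from the EC2 API and from the Dashboard database) is left out.

```python
def audit_inventory(ec2_hosts, dashboard_hosts):
    """Report hosts that appear in one inventory but not in the other."""
    ec2, dashboard = set(ec2_hosts), set(dashboard_hosts)
    return {
        # Running in EC2 but unknown to the Dashboard: possibly unmanaged.
        "missing_from_dashboard": sorted(ec2 - dashboard),
        # Known to the Dashboard but not running in EC2: possibly stale.
        "missing_from_ec2": sorted(dashboard - ec2),
    }


# Example with invented host names.
report = audit_inventory(
    ec2_hosts=["web001", "web002", "db001"],
    dashboard_hosts=["web001", "db001", "cache001"],
)
```

Here `report["missing_from_dashboard"]` would contain `web002` and `report["missing_from_ec2"]` would contain `cache001`; an empty report means the two inventories agree.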

One of the benefits of EC2 is that we can auto-scale some of our servers: ramp up to 100% of capacity during the day when we’re serving the most traffic, and ramp down during the night. Again, this is done through our REST API: we can provision a new server and bring it up (or shut it down) without any manual configuration.

Puppet Labs: Why PuppetConf?

Ryan: Puppet’s been an important piece of infrastructure at Pinterest. There’s a very active user community around Puppet, but there’s still a pretty big learning curve. If we’ve learned anything about Puppet here at Pinterest, we’d like to share it with the community and help young startups and established enterprises take advantage of what we’ve learned.

Puppet Labs: What's the biggest lesson you've learned from using Puppet at Pinterest?

Ryan: It's been critical to keep a definitive server list in some kind of database, and use the same data to configure Puppet and all our other systems. In our case we're using Puppet Dashboard, which is a database, a front-end, and an External Node Classifier all in one. We could have used LDAP or a custom database app instead, but Puppet Dashboard was the easiest to get up and running. The central inventory database has allowed us to write tools that use this data in innovative ways. We've built dashboards of metrics and monitoring data coming from each of our servers, and that's only possible if we know what servers we have up and running. If we didn't have a database storing our configuration, it would be difficult or impossible for us to propagate this data to other systems.
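For readers unfamiliar with External Node Classifiers: an ENC is an executable that Puppet calls with a node name as its argument and that prints a YAML document describing the node's classes and parameters. The sketch below shows that idea with an in-memory inventory. It is not how Puppet Dashboard implements its ENC; the `INVENTORY` contents and function names are assumptions, and the hand-written YAML emitter only handles a non-empty list of classes and flat string parameters.

```python
import json
import sys

# Hypothetical inventory; a real ENC would query the central database instead.
INVENTORY = {
    "web001": {"classes": ["base", "nginx"], "parameters": {"role": "web"}},
    "db001": {"classes": ["base", "mysql"], "parameters": {"role": "db"}},
}


def classify(hostname, inventory=INVENTORY):
    """Return ENC-style YAML for a known node, or None for an unknown one."""
    node = inventory.get(hostname)
    if node is None:
        return None
    # json.dumps produces double-quoted strings, which are also valid YAML scalars.
    lines = ["---", "classes:"]
    lines += ["  - " + json.dumps(name) for name in node["classes"]]
    lines.append("parameters:")
    lines += [
        "  {}: {}".format(json.dumps(key), json.dumps(value))
        for key, value in sorted(node["parameters"].items())
    ]
    return "\n".join(lines) + "\n"


def main(argv):
    """Entry point: print the classification and return a process exit code."""
    if len(argv) != 2:
        return 1
    output = classify(argv[1])
    if output is None:
        # A non-zero exit tells Puppet the classification failed.
        return 1
    sys.stdout.write(output)
    return 0
```

To use something like this as an ENC, the script would call `sys.exit(main(sys.argv))` under the usual `if __name__ == "__main__":` guard, and the Puppet master would be configured with `node_terminus = exec` and `external_nodes` pointing at the script.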

Puppet Labs: Do you have any advice for other operations folks from your experiences?

Ryan: We’ve found it important to hire software engineers who understand operations, and operations engineers who understand our product. I have a CS degree focused on Human-Computer Interaction, and I was a front-end engineer for many years. I think this helps me understand the bigger picture about what we're building. At Pinterest, our mission is to help our users dream, plan, and prepare for the things they do in their lives. This means we're not creating technology for technology's sake, but rather to deliver a great experience to our users. So we love to find operations engineers who have a wide CS background and a wide range of professional interests and experiences. Likewise, many of our software engineers have operational experience at previous companies, and that also helps us all stay on the same page.

Cross-posted from puppetconf.com.
