The recent Puppet Enterprise 2015.3 release included the first version of the new Code Management service, which makes it easier than ever to roll out changes to your Puppet code. Code Management handles the work of moving your code from a version control system like GitHub to your Puppet master. Check out this blog post by Lindsey Smith to learn more.
Along with Code Management, this release also included a new feature, file sync, which Code Management uses to seamlessly distribute code changes to compile masters. This post looks inside file sync: what problems it solves, how it works, how to use it, and its role in the future of Puppet.
The synchronization problem
In the life of every Puppet infrastructure, there comes a point when a single Puppet master is no longer enough. The standard approach to scaling a Puppet infrastructure is to adopt a multi-master architecture, where the primary master is known as the "Master of Masters" ("MoM" for short), and its main responsibility is to host the certificate authority. The other masters are referred to as “compile masters” because they are tasked with compiling catalogs for agents. Agents communicate with a load balancer, which distributes traffic among the pool of masters. This is a battle-tested, proven way of scaling up a Puppet infrastructure to handle a larger number of agents.
The problem with this architecture, of course, is one of consistency: Each master will be serving catalogs based on the contents of its code directory (/etc/puppetlabs/code by default on Puppet 4+ and PE 2015.2+ masters). If the manifests in the code directory differ between compile masters, it becomes impossible to predict what’s going to happen during an agent run, since requests from an agent may receive a different response depending on which master handled the request. This is a fundamental issue with any multi-master deployment, and there has not been a comprehensive solution to it … until now.
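To make the consistency problem concrete, here is a minimal sketch of one way to detect drift between code directories: hash every file path and its contents into a single digest and compare the result from each master. The function name code_dir_digest is hypothetical, and this is not how file sync itself tracks versions; it only illustrates what "the code directories differ" means in practice.

```python
import hashlib
from pathlib import Path


def code_dir_digest(root):
    """Return a SHA-256 digest covering every file path and its contents under root.

    Hypothetical helper for illustration; not part of Puppet or file sync.
    """
    root = Path(root)
    digest = hashlib.sha256()
    # Sort by relative path so the result does not depend on listing order.
    files = sorted(
        (p.relative_to(root).as_posix(), p)
        for p in root.rglob("*")
        if p.is_file()
    )
    for rel, path in files:
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

Running this against /etc/puppetlabs/code on two masters and comparing the digests would show whether they are serving the same code at that moment.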
Code deployment challenges
Along with the problem of keeping manifests in sync across compile masters, there are a couple of other problems associated with deploying Puppet code that file sync addresses.
The environment cache
The first of these issues relates to Puppet’s environment cache. The environment cache is an optional feature inside the Puppet master that uses an internal cache to store the contents of manifests on a per-environment basis. The benefit of this caching is that it can drastically speed up catalog compilation times — an important performance characteristic, especially as the size of your infrastructure grows. However, as with all caching, it comes with a cost: The cache must be invalidated when new code is deployed, because changes made to manifests will not be live until the cache expires, which can easily be a source of confusion.
The behavior of this cache is controlled by the environment_timeout setting in puppet.conf, and by default, no caching is performed. You can configure environment_timeout to use a fixed time interval (say, 5 minutes), but that sort of configuration is not recommended because of the unpredictability it introduces. Most Puppet masters (including Puppet Server) use a pool of Ruby interpreters to handle requests from agents efficiently, and the timeout applies on a per-interpreter basis, so different interpreters can expire their caches at different times. Instead, if you want to use caching for performance reasons, the recommended setting is environment_timeout = unlimited, which means that the cache will never auto-expire. You'll need to invalidate it using Puppet Server’s administrative API.
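For a sense of what that manual invalidation looks like, here is a rough sketch in Python. It assumes the admin API exposes cache invalidation as a DELETE on /puppet-admin-api/v1/environment-cache on the master's HTTPS port (8140 here), and that the caller holds a client certificate the server is configured to accept; check both against the Puppet Server documentation for your version. The function names and the certificate path parameters are hypothetical.

```python
import ssl
import urllib.request

# Assumption: the admin API endpoint for flushing the environment cache.
CACHE_PATH = "/puppet-admin-api/v1/environment-cache"


def build_cache_flush_request(master_host, port=8140):
    """Build (but do not send) the DELETE request that flushes the cache."""
    url = f"https://{master_host}:{port}{CACHE_PATH}"
    return urllib.request.Request(url, method="DELETE")


def flush_environment_cache(master_host, cert, key, ca_cert, port=8140):
    """Send the request, authenticating with a client certificate.

    cert, key and ca_cert are file paths supplied by the caller (hypothetical
    parameters; the real locations depend on your installation).
    """
    context = ssl.create_default_context(cafile=ca_cert)
    context.load_cert_chain(certfile=cert, keyfile=key)
    request = build_cache_flush_request(master_host, port)
    with urllib.request.urlopen(request, context=context) as response:
        return response.status
```

In a multi-master setup, something like this would have to run against every master after every deployment, which is exactly the chore described above.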
Now, if you’re like me, that sounds like a slightly annoying headache you would prefer not to have to worry about. Wouldn’t it be great if something inside the Puppet master would just take care of all of this for you? Well, file sync does exactly that, and I’ll get into the details on that a little bit later in this post.
Safe code deployment
The other problem we’ve solved with file sync is the issue of deploying new code safely onto a running Puppet master. The main issue is that changing manifests on a running Puppet master can result in the master reading those files while they are in an incomplete or inconsistent state. On a production system under significant load, this is likely to result in failed agent runs, which means an unpredictable Puppet infrastructure. File sync gives us a way to deploy code safely while avoiding this pitfall. We’ve been referring to this process as an “atomic code deployment.” There’s a lot more to say about atomic code deployment than I’ll cover in this post, but stay tuned for a future post by Chris Price, which will go into a lot more detail on this topic.
How it works
Operationally, the first thing added by file sync is that compile masters now “phone home” to the MoM: They ask if there is a new version of the code that needs to be deployed. Thus, the burden of pushing out changes to each of your masters simultaneously is eliminated — file sync takes care of this automatically.
The most noticeable change introduced by file sync is the staging directory, /etc/puppetlabs/code-staging by default. Gone are the days of simply dropping new code into the live code directory and hoping for the best; as mentioned earlier, that process is fraught with peril. Instead, new code is placed in the staging directory and then committed to file sync. This commit happens on the MoM, and once it is complete, compile masters fetch the new version of the code from the MoM the next time they check in. The new code is fetched into file sync’s internal data directory, not straight into the live code directory. This allows file sync to get all of the expensive network I/O out of the way without interrupting the server, and then to safely deploy the new code to the live code directory via the following steps:
- Handling of incoming requests from agents is temporarily paused.
- Any requests that were already being handled when the new code was deployed are allowed to complete.
- At this point, file sync knows that it is safe to deploy the new code to the live code directory. This is done efficiently, not by simply copying the directory wholesale. That matters because the system is effectively paused until this operation is complete, given that requests from agents are not being handled.
- That pesky environment cache is invalidated! This is done immediately after the new code has been deployed, and it’s handled automatically by Puppet Server, so you don’t have to worry about it.
- Finally, request handling is resumed and the system resumes normal operations.
This process takes place on both the MoM and the compile masters, so the benefits of automatic environment cache invalidation and atomic code deployment are realized on all of the masters. And all of this is built into Puppet Server, so there are no additional processes or services to worry about with the addition of file sync.
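The five steps above can be modeled with a small toy in Python. This is only a sketch of the pause, drain, deploy, invalidate, resume sequence, not Puppet Server's actual implementation: the class DeployGate and its methods are hypothetical, the "code" is a plain dictionary rather than files on disk, and the efficient on-disk update mentioned in step three is not modeled.

```python
import threading


class DeployGate:
    """Toy model of the pause / drain / deploy / invalidate / resume sequence."""

    def __init__(self, live_code):
        self._cond = threading.Condition()
        self._deploy_lock = threading.Lock()  # one deployment at a time
        self._in_flight = 0
        self._paused = False
        self.live_code = dict(live_code)
        self.env_cache = {}

    def handle_request(self, environment):
        # New requests wait while a deployment has paused request handling.
        with self._cond:
            while self._paused:
                self._cond.wait()
            self._in_flight += 1
        try:
            # "Compile" from the cache, filling it from live code on a miss.
            if environment not in self.env_cache:
                self.env_cache[environment] = self.live_code[environment]
            return self.env_cache[environment]
        finally:
            with self._cond:
                self._in_flight -= 1
                self._cond.notify_all()

    def deploy(self, new_code):
        with self._deploy_lock, self._cond:
            self._paused = True              # 1. pause incoming requests
            while self._in_flight > 0:       # 2. let in-flight requests finish
                self._cond.wait()
            self.live_code = dict(new_code)  # 3. swap in the new code
            self.env_cache.clear()           # 4. invalidate the environment cache
            self._paused = False             # 5. resume request handling
            self._cond.notify_all()
```

Because the cache is cleared while no request is running, a request never sees a mix of old cache entries and new code, which is the property the real sequence is designed to provide.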
It’s worth noting that file sync does not completely remove the problem of compile masters being out of sync with the MoM, since each compile master checks in and synchronizes with the MoM independently. However, it drastically reduces the likelihood of any inconsistency, since compile masters are regularly checking for new code on the MoM. Furthermore, file sync gives us the ability to know exactly which version of the code was used to handle a request from an agent. We’ve got several exciting ideas for reporting and visualization tools that could be built on top of this, making it much easier to understand what’s going on in your Puppet infrastructure.
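A tiny model makes that window of inconsistency, and the value of version tagging, easier to see. Everything here is hypothetical (the MoM and CompileMaster classes, the integer version numbers, the shape of the response); it only illustrates independent check-ins and responses labeled with the code version that produced them.

```python
class MoM:
    """Hypothetical stand-in for the MoM: holds the latest committed version."""

    def __init__(self):
        self.latest_version = 0

    def commit(self):
        self.latest_version += 1
        return self.latest_version


class CompileMaster:
    """Hypothetical compile master that syncs only when it checks in."""

    def __init__(self, name):
        self.name = name
        self.code_version = 0

    def poll(self, mom):
        """Check in with the MoM; return True if newer code was picked up."""
        if mom.latest_version != self.code_version:
            self.code_version = mom.latest_version
            return True
        return False

    def handle_request(self, node):
        # Tag the response with the code version that produced it.
        return {
            "node": node,
            "served_by": self.name,
            "code_version": self.code_version,
        }
```

If one compile master has polled after a commit and another has not, their responses carry different code_version values until the second one checks in, and the tag tells you exactly which code each agent run was compiled against.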
How do I use it?
If you read Lindsey Smith's recent blog post, you know that we've been thinking about deploying and managing Puppet code holistically, not just about what happens to code once it lands on a single Puppet master. In support of that idea, file sync is tightly integrated with the new Code Management feature, which Lindsey describes in his post. Code Management knows about file sync, so it deploys new code to the staging directory and then triggers a file sync commit. This enables a workflow in which you can write Puppet code on your laptop, push it to GitHub (or Stash, or GitLab, etc.) and Code Management and file sync take it from there, automatically and safely deploying the new code to all your Puppet masters.
If Code Management is not your cup of tea, you can “roll your own” integration with file sync. Simply place new code in the staging directory by whatever means you prefer, and then commit it to file sync via the HTTP API when ready. Additional information on this can be found on our documentation site.
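As a rough sketch of what a roll-your-own commit might look like, the Python below builds the HTTP request. It assumes the commit endpoint is a POST to /file-sync/v1/commit on the MoM's port 8140 with a JSON body of {"commit-all": true}; treat the path, port, and body as assumptions to verify against the documentation for your release. The function name is hypothetical, and sending the request would additionally need a client certificate the server accepts.

```python
import json
import urllib.request


def build_commit_request(mom_host, port=8140):
    """Build (but do not send) a request asking file sync to commit staged code.

    Assumption: endpoint path and request body as described in the lead-in.
    """
    url = f"https://{mom_host}:{port}/file-sync/v1/commit"
    body = json.dumps({"commit-all": True}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The workflow is then: copy the new code into the staging directory by whatever means you prefer, send this request to the MoM, and let file sync handle distribution to the compile masters.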
In the future, file sync is going to play an even more important role in Puppet’s technology stack. In particular, atomic code deployments are enabling us to do all sorts of interesting things that will make Puppet smarter, faster, and more efficient. File sync also lays the foundation for Puppet Server to be aware of how your code has changed over time, opening the door for future possibilities like the retrieval of historical content. Stay tuned — some of these changes will be shipping in a Puppet Enterprise release later this year.
For now, however, we’re very happy to have shipped the initial version of file sync in Puppet Enterprise 2015.3. Lots of work went into this feature over the course of a couple years. We’re really excited to see it out in action, and we'd love feedback on what's awesome about file sync and on how we can make it even better.
Kevin Corcoran is a senior software engineer at Puppet Labs.
- Official documentation on file sync
- What's going on in there? New metrics available in Puppet Server by Chris Price
- PuppetConf 2015 video: Puppet Server: 2015 and Beyond!. The middle portion of this talk is all about file sync.
- PuppetConf 2014 video: The Puppet Master on the JVM
- PuppetConf 2015 video: r10k and Code Management: Out of the Box
- Managing Infrastructure as Code — Now Easier than Ever by Lindsey Smith
- Puppet Server: Bringing SOA to a Puppet Master Near You by Chris Price