Inside Puppet: About Determinism
Why do we want configuration management? There are plenty of reasons, but at the core of them is that we want to streamline the configuration and deployment of systems. We want this process to be repeatable, well-understood, and predictable. We want to make it deterministic.
Determinism, the idea that a process should produce the same outcome every time it is applied, is not a new idea, but it is also not one that system administrators routinely apply to their daily work. Can you prove that the bash script you just wrote to deploy your new NoSQL database will work across all of your system configurations? Will it work the same way if you execute it again next week, or next year? Automating the deployment and configuration of systems is meaningless if we can't guarantee that those systems will complete the configuration process with predictable results.
For years, organizations have tried to handle this problem by using scripts to deploy and manage their environments. These scripts evolve over years as they're passed from admin to admin, accumulating the personal quirks of their authors and long-irrelevant legacy workarounds. It's not until a script is actually needed, usually when a server fails, that sysadmins discover it's tightly pegged to a particular version of bash or sed, or requires a specific return code from an init script that no longer exists. The script's lack of predictability hits the organization at the worst possible moment.
Periodically, organizations recognize that these scripts have become unmanageable and attempt to refactor or rebuild them from scratch, starting a whole new cycle of trial and error. This tug of war with scripts takes time away from other, more valuable tasks. How many times have you fixed or updated the same process when you could have been learning something new or applying new ideas to your environment?
Your credibility as an admin is dependent on your ability to construct reliable, stable, consistent systems. It's critical that the process you employ works in a predictable way every time.
Puppet's View of the Server
What makes Puppet different from a library of shell scripts, and from other configuration management tools, is that it forms a model of your system configuration prior to performing any activity on the system itself.
Before a single command is executed on a host, Puppet has constructed what we refer to as a "resource graph" of that node's existing configuration, and determined how your changes will impact that graph. Below is a very simple example of such a resource graph:
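As a rough analogue in code (a sketch, not Puppet's internal data structures), a resource graph can be modeled as a mapping from each resource to the resources it depends on, which is the kind of relationship Puppet's `require` and `before` metaparameters express. The resource titles below are hypothetical, written in Puppet's `Type[title]` style only for readability. Python's standard `graphlib` module (3.9+) then yields an order in which every resource comes after its dependencies:

```python
from graphlib import TopologicalSorter

# Hypothetical resources for an NTP setup. Each key maps to the set of
# resources it depends on (what require/before would express in a manifest).
resource_graph = {
    "Package[ntp]": set(),
    "File[/etc/ntp.conf]": {"Package[ntp]"},
    "Service[ntpd]": {"Package[ntp]", "File[/etc/ntp.conf]"},
}

# Compute an application order from the model before "running" anything:
# every resource appears after all of the resources it depends on.
order = list(TopologicalSorter(resource_graph).static_order())
print(order)
# → ['Package[ntp]', 'File[/etc/ntp.conf]', 'Service[ntpd]']
```

If the declared dependencies contained a cycle, `static_order()` would raise `graphlib.CycleError` rather than guess at an order; Puppet likewise refuses to apply a catalog that contains a dependency cycle.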
It's a difficult leap for new users, particularly because we sysadmins are trained to think procedurally rather than deterministically. Admins tend to think in terms of processes: managing a server is a group of activities that, when properly applied, leave the system in a desired state. We spend most of our careers mastering those processes (and trying to pass certification exams that prove we have).
Puppet asks you to think differently about what administration means and to focus on what the server is supposed to look like after all processes are finished. It doesn't ignore process, but treats it as a means to an end: you tell Puppet what you want the server to look like, and Puppet works out how to get the server there.
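The shift from "which steps do I run" to "which state do I want" can be sketched in a few lines. This is a minimal illustration, not Puppet's implementation: current and desired state are plain dictionaries with hypothetical resource names, and `plan` and `converge` are made-up helper names. The property worth noticing is that converging a second time finds nothing left to do.

```python
# A minimal sketch of declarative convergence. Resource names and states are
# hypothetical placeholders; "applying" a change just records the new state.


def plan(current: dict[str, str], desired: dict[str, str]) -> list[tuple[str, str, str]]:
    """List (resource, current_state, desired_state) for every resource that differs."""
    changes = []
    for name in sorted(desired):
        have = current.get(name, "absent")
        want = desired[name]
        if have != want:
            changes.append((name, have, want))
    return changes


def converge(current: dict[str, str], desired: dict[str, str]) -> dict[str, str]:
    """Apply only the changes the plan calls for and return the resulting state."""
    new_state = dict(current)
    for name, _have, want in plan(current, desired):
        new_state[name] = want  # stand-in for the real command that makes this true
    return new_state


desired = {"Package[ntp]": "installed", "Service[ntpd]": "running"}
current = {"Package[ntp]": "absent", "Service[ntpd]": "stopped"}

first_run = converge(current, desired)
second_run = converge(first_run, desired)
print(plan(first_run, desired))  # → []
```

Because `converge` acts only on differences between the two states, running it again is harmless, which is exactly the kind of repeatability a hand-written procedural script struggles to guarantee.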
Noop and Dry Runs
One of the big advantages of the resource graph model is Puppet's "noop" mode, which allows you to simulate a change before you deploy it. Other tools can run in some semblance of a dry-run or do-nothing mode, but they often just repeat back what your script says without showing its impact on your system as a whole. Puppet understands what your environment currently looks like as of the last Puppet run (represented by the orange rectangle in the example below) and what the end state is supposed to look like (the green parallelogram). Puppet then tells you both what it will do when executed without noop and how that action would change the overall configuration of your system. These steps are identical in noop mode and in regular execution mode; the only difference is whether Puppet actually performs the action in the red shape at the bottom or merely reports what would happen if you ran Puppet normally.
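A toy version of this idea, under simplifying assumptions (dictionary-based state, hypothetical resource names, and a made-up `run` function rather than any Puppet API), shows why noop output can be trusted: the change list is computed the same way in both modes, and the `noop` flag only decides whether it is applied.

```python
# A minimal sketch of a noop (dry-run) mode built on a desired-state model.
# Resource names and states are hypothetical; nothing touches a real system.


def run(current: dict[str, str], desired: dict[str, str], noop: bool = False):
    """Compute the same change list in both modes; apply it only when noop is False."""
    changes = [
        (name, current.get(name, "absent"), want)
        for name, want in sorted(desired.items())
        if current.get(name, "absent") != want
    ]
    new_state = dict(current)
    for name, have, want in changes:
        print(f"{name}: {'would change' if noop else 'changing'} {have} -> {want}")
        if not noop:
            new_state[name] = want  # stand-in for actually running the change
    return new_state, changes


current = {"Package[ntp]": "installed", "Service[ntpd]": "stopped"}
desired = {"Package[ntp]": "installed", "Service[ntpd]": "running"}

dry_state, dry_changes = run(current, desired, noop=True)
# prints: Service[ntpd]: would change stopped -> running
real_state, real_changes = run(current, desired)
# prints: Service[ntpd]: changing stopped -> running
```

The already-installed package produces no output in either mode, and the dry run leaves `current` untouched while reporting exactly the change the real run then makes.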
Where Determinism Poses a Challenge
This resource graph model can be a problem if you don't know what a server is supposed to look like at the end of your administration process. Puppet insists that you understand what an application's impact will be before you begin installing it. This is how it should be. Nothing should be placed on a production system unless you, the administrator, can fully document and understand the changes that application will impose; yet we are all guilty of starting procedures on systems without fully understanding their impact.
While this deterministic core is at the heart of Puppet, some elements can still cause confusion. Until recently, resources could be applied in a non-deterministic order on nodes, so that a manifest might compile and execute in one order on one node and in a different order on another. This stemmed from the assumption that all resources are atomic, and it is remedied in Puppet 2.7 and Puppet Enterprise 2.0.
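One way to make ordering deterministic is to break ties between resources that have no dependency on each other using a stable rule. The sketch below uses Kahn's topological sort with a min-heap so that, among resources that become ready at the same time, the one with the smallest title goes first. Sorting by title is an assumption chosen for illustration; Puppet's own tie-breaking rule may differ. The function name and resource titles are hypothetical, and every dependency is assumed to also be declared as a key.

```python
import heapq


def deterministic_order(deps: dict[str, set[str]]) -> list[str]:
    """Topologically sort resources, breaking ties by title (hypothetical helper).

    deps maps each resource to the set of resources it must come after.
    Every dependency must also appear as a key in deps.
    """
    remaining = {name: set(before) for name, before in deps.items()}
    dependents: dict[str, list[str]] = {name: [] for name in deps}
    for name, before in deps.items():
        for dep in before:
            dependents[dep].append(name)

    # Min-heap of resources whose dependencies are all satisfied.
    ready = [name for name, before in remaining.items() if not before]
    heapq.heapify(ready)

    order = []
    while ready:
        name = heapq.heappop(ready)  # smallest title among ready resources
        order.append(name)
        for child in dependents[name]:
            remaining[child].discard(name)
            if not remaining[child]:
                heapq.heappush(ready, child)

    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order


# The same three hypothetical resources, declared in two different orders.
declared_one_way = {
    "User[alice]": set(),
    "User[bob]": set(),
    "File[/home/alice]": {"User[alice]"},
}
declared_another_way = {
    "File[/home/alice]": {"User[alice]"},
    "User[bob]": set(),
    "User[alice]": set(),
}

print(deterministic_order(declared_one_way))
# → ['User[alice]', 'File[/home/alice]', 'User[bob]']
print(deterministic_order(declared_another_way))
# → ['User[alice]', 'File[/home/alice]', 'User[bob]']
```

Both declaration orders produce the same sequence, which is the kind of node-to-node consistency the fix described above provides: explicit dependencies are still honored, and everything else falls into a fixed order.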
Determinism is not a panacea: you still have to validate that your Puppet manifests do what they're supposed to, and you should put them through the same rigorous testing you would expect in a development environment. What determinism does give you is greater confidence that something working in your test environment will work in production, and that a proposed change will not disrupt a production environment through a surprising interaction.
Further Advantages to a Deterministic Approach with Puppet
Aside from noop mode, what do you as an administrator get out of Puppet's deterministic nature? A few notable benefits:
- Your Puppet manifests describe what your systems look like and how to install them. While there is no replacement for good long-form documentation, in the real world people rarely have the time to write it. Fortunately, Puppet manifests can serve as accurate documentation of a server's entire configured environment, from unusual post-install procedures to open ports. In fact, this can be better than written documentation, because it is by definition always up to date.
- When you do deploy, the process is automated. Your best and most senior engineer does not have to be the one who pushes the button, and on deploy night you are very unlikely to be writing or heavily modifying legacy bash or Perl scripts.
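- You can examine the resource graph itself to check how Puppet has related and ordered your resources (see "How to look at your resource graph in Puppet").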
You can ultimately think of the choice between a deterministic and a non-deterministic tool as the choice between predictability and randomness or, better, between configuration management and ad-hoc system administration. With Puppet, you think your problem through up front, which saves you time and effort later on. Non-deterministic tools may seem to solve a short-term problem, but with them you are merely accruing technical debt.