Scaling open source Puppet
Editor’s note: This blog post was originally published on ssconsultinggroup.net, and has been republished here with the author’s permission.
In my Puppet travels over the last 10 or so years, one topic has come up time and again: how to scale open source Puppet to thousands of nodes.
While the best route is to use Puppet Enterprise, with its solid support and a team of talented engineers to help you in your configuration management journey, sometimes the right solution for your needs is open source Puppet. What follows is the result of my resolving to get to the bottom of the procedure, make it easy to repeat, and help you scale open source Puppet implementations for larger environments.
Even though this article presents a somewhat rudimentary configuration, you can add PuppetDB, external instrumentation and telemetry, etc., and grow the environment to a truly large enterprise-class system.
The design of such an environment is highly flexible, with many hundreds of potential configurations. In this specific scenario, I plan to cover an independent Puppet master performing CA duties for the architecture, with several catalog compilers placed behind a TCP load balancer to handle catalog compilation. Once the specific moving parts are identified, modern systems engineering practice can be applied to the environment to expand and scale the installation as needed.
The Puppet Master
Every Puppet implementation has one of these. Whether or not there are extra compilers, the primary master is tooled to be not only a CA master but also a catalog compiler in its own right. Even if you tune the master and place it on beefy hardware, you can expect to eventually reach a limit to the number of nodes you can serve. If you add PuppetDB to the mix, there will be different requirements, but generally speaking, you will want to offload PuppetDB to a different server so as to keep the master free to serve CA requests to the environment.
The Load Balancer
For this function, you simply need a TCP load balancer. An Elastic Load Balancer in AWS would serve nicely, as would HAProxy on a largish machine (t2.xlarge). In short, this load balancer needs to be able to see that a port is up on the destination nodes in the serving pool, and then proxy connections to a member of that pool that is in a healthy state. You may also wish to poll the Puppet health check API endpoint at the load balancer:
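The original configuration isn't reproduced here, but as a sketch (assuming HAProxy and Puppet Server's `/status/v1/simple` endpoint on port 8140), the health check might look like this:

```shell
# haproxy.cfg fragment -- a sketch, not a drop-in configuration
frontend puppet_catalog
    bind *:8140
    mode tcp
    default_backend compilers

backend compilers
    mode tcp
    balance roundrobin
    # Poll Puppet Server's simple status endpoint over TLS; it returns
    # HTTP 200 ("running") only when the compiler is healthy
    option httpchk GET /status/v1/simple
    server compiler1 compiler1.example.com:8140 check check-ssl verify none
```

The `/status/v1/simple` endpoint is designed for exactly this kind of simple health checking: it answers with a 200 only when the service is up.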
This check ensures the pool is healthy, and the load balancer only forwards requests to healthy catalog compilers.
Note also that for the purposes of this discussion, it is assumed you have set up this load balancer and assigned the name compile.example.com to the VIP.
The Catalog Compilers
These are simply Puppet Server installations that have had the CA utility turned off, with client nodes configured to look to the master for CA services instead. These nodes sit behind the load balancer and take catalog requests from Puppet agents as though each were the only Puppet Server, performing standard Puppet Server duties (minus the CA work).
My practice and work in this implementation was done in AWS. You can do the same work in Digital Ocean, Linode, or on physical hardware. The important part is not the size or location of the nodes I’ve used, but the configuration I will enumerate below. As long as the configuration is maintained, results should be relatively consistent from platform to platform.
I performed this installation several times as though the setup did not have DNS resolution. By this, I mean that I did all name resolution in hosts files. You can easily manage these names in Route 53, or you can add "A records" in your own DNS serving infrastructure. The process I outline here is the former, using hosts files.
First, lay out the names of your systems and the structure of their addresses as needed. In the case of this reference implementation, the
/etc/hosts file is as follows:
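The original listing isn't reproduced here; a minimal sketch using the hostnames from this walkthrough (the private addresses are placeholders; substitute your own) might look like:

```
# /etc/hosts -- placeholder addresses, adjust for your network
10.0.0.10   puppet.example.com     puppet
10.0.0.20   compile.example.com    compile
10.0.0.21   compiler1.example.com  compiler1
```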
For each node I provision, I immediately configure the
/etc/hosts file to contain this information so all nodes can reach each other by name. This is to satisfy the stated requirements of Puppet itself that name resolution/DNS needs to be configured and functioning.
Next, we need to install Puppet Server on the master. This is straightforward as mentioned in the Puppet docs here: https://puppet.com/docs/puppet/latest/puppet_platform.html#task-383
So, on RHEL 7, you would enable the Puppet platform repo:
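For example, for the Puppet 6 collection (the collection version here is an assumption; pick the one you intend to deploy):

```shell
# Enable the Puppet 6 platform repository on RHEL/CentOS 7
sudo rpm -Uvh https://yum.puppet.com/puppet6-release-el-7.noarch.rpm
```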
Or on Ubuntu Xenial:
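Again assuming the Puppet 6 collection:

```shell
# Enable the Puppet 6 platform repository on Ubuntu 16.04 (Xenial)
wget https://apt.puppet.com/puppet6-release-xenial.deb
sudo dpkg -i puppet6-release-xenial.deb
sudo apt-get update
```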
Then you would install the Puppet Server package itself:
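The package name is the same on both platforms:

```shell
# RHEL family
sudo yum install puppetserver

# ...or on Ubuntu
sudo apt-get install puppetserver
```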
Be sure to source the newly installed profile so the following commands can be found in your path. To do so, run the following command before continuing:
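The puppet-agent package drops a profile script for this purpose; sourcing it puts the Puppet binaries on your PATH:

```shell
# Adds /opt/puppetlabs/bin (puppet, puppetserver, etc.) to PATH
source /etc/profile.d/puppet-agent.sh
```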
At this point, we want to configure Puppet Server before allowing it to start, thus allowing for alternate DNS names when signing certificate requests. This accommodates the name on the load balancer VIP as well as the individual compiler node names when you begin standing up catalog compilers. To do this, we need to edit the file
/etc/puppetlabs/puppetserver/conf.d/ca.conf. The file's inline documentation enumerates the line we need to add:
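In Puppet Server 6, the relevant setting is `allow-subject-alt-names`, which defaults to false. A sketch of the addition:

```
# /etc/puppetlabs/puppetserver/conf.d/ca.conf
certificate-authority: {
    # Accept CSRs that carry X.509 subject alternative names
    # (needed so compiler certs can also carry the VIP's names)
    allow-subject-alt-names: true
}
```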
subject-alt-name is an X.509 extension that allows additional names to be associated with a certificate. Here, we're leveraging the extension so that certificates signed by the CA master can carry all the names of the VIP and the compilers, and so that those names are acceptable to any connecting node.
The final step before starting Puppet Server is to generate a root and intermediate signing CA for Puppet Server, as it will be terminating SSL requests for the architecture. To do this, simply run:
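With the puppetserver CLI tooling, this is a single command:

```shell
# Generate a root CA and an intermediate signing CA for this deployment
puppetserver ca setup
```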
Once you have added the above line and set up the CA, it is time to start the server. On either platform, run the following systemd commands:
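```shell
sudo systemctl start puppetserver
sudo systemctl enable puppetserver
```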
When Puppet Server starts, it will begin behaving as a CA master, capable of both terminating SSL and compiling catalogs. The Puppet documentation for that file is obliquely referenced here and weighs heavily in this configuration.
At this point, Puppet Server is running and is accepting requests for signing.
Your First Compiler
Next, we need to install a compiler, making sure it will accept catalog compile requests but not provide CA services at all.
This server needs to know about itself and its own job, where the CA (Puppet) master is, and what names it has and is responsible for. First, install Puppet Server on the compiler (in our example,
compiler1.example.com). As soon as Puppet Server is installed, but before it is started, edit the
/etc/puppetlabs/puppet/puppet.conf file on the compiler you are configuring and create a
"main" section as follows:
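Using the names from this reference implementation, that section is:

```
[main]
server = puppet.example.com
dns_alt_names = compiler1.example.com,compiler1,compile.example.com,compile
```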
In this way, you’re specifying all the names that a particular compiler is authorized to "answer" for: its own certname, its own hostname, the load balancer’s certname, and the hostname portion of the load balancer's certname as well.
Next, tell the compiler that it has specific certs. Above, you’ve already told it where its Puppet master is (
server=puppet.example.com). You’ve also told it what its own names are (
compiler1.example.com,compiler1,compile.example.com,compile) which are its own host names and the host names of the VIP on the load balancer. You also need to tell the Puppet Server on the compiler the values necessary for it to configure Jetty. Edit the
/etc/puppetlabs/puppetserver/conf.d/webserver.conf and add these lines to the end of the top section:
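The exact lines aren't reproduced here; a sketch, pointing Jetty at the standard Puppet SSL file locations for this compiler, might look like this (adjust the certname in the paths to match your host):

```
ssl-cert: /etc/puppetlabs/puppet/ssl/certs/compiler1.example.com.pem
ssl-key: /etc/puppetlabs/puppet/ssl/private_keys/compiler1.example.com.pem
ssl-ca-cert: /etc/puppetlabs/puppet/ssl/certs/ca.pem
ssl-crl-path: /etc/puppetlabs/puppet/ssl/crl.pem
```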
Finally, disable the local CA service on the compiler itself. This is accomplished by editing the file
/etc/puppetlabs/puppetserver/services.d/ca.cfg. There are two lines that need to be commented/uncommented:
The distributed version of the file has the CA enabled:
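As shipped, it looks like this:

```
# To enable the CA service, leave the following line uncommented
puppetlabs.services.ca.certificate-authority-service/certificate-authority-service
# To disable the CA service, comment out the above line and uncomment the line below
#puppetlabs.services.ca.certificate-authority-disabled-service/certificate-authority-disabled-service
```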
The inline documentation tells you to comment the second line and to uncomment the fourth line to disable the CA service. Do that here so that the file looks like this:
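```
# To enable the CA service, leave the following line uncommented
#puppetlabs.services.ca.certificate-authority-service/certificate-authority-service
# To disable the CA service, comment out the above line and uncomment the line below
puppetlabs.services.ca.certificate-authority-disabled-service/certificate-authority-disabled-service
```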
Once all these components are in place on your catalog compiler, you need to connect your catalog compiler to the master in the usual fashion. First, request that your local certificate be signed:
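With Puppet 6 agents, this is done with `puppet ssl bootstrap` (older agents used an initial `puppet agent -t` run for the same purpose):

```shell
# Generate keys and submit a CSR to the CA master,
# retrying until the certificate is signed
puppet ssl bootstrap --waitforcert 60
```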
Then, on the master, sign the certificate request by specifying the machine’s certname:
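```shell
puppetserver ca sign --certname compiler1.example.com
```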
When you sign the certificate request, the Puppet CA master receives all the alternative names from the compiler and signs every name the compiler represents, including compile.example.com. This allows an agent connecting to compile.example.com to treat the VIP as the catalog compiler, and it will accept any of the names it sees in that communication. When the agent connects to compile.example.com and gets forwarded to, say, compiler42.example.com, it doesn’t blink, because the signed cert is "acceptable" to the CA infrastructure it is interacting with.
Once you have signed the catalog compiler’s certificate request, then return to the catalog compiler and perform a Puppet run:
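```shell
puppet agent -t
```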
Then, turn on the Puppet Server daemon and set it to start at boot:
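```shell
sudo systemctl start puppetserver
sudo systemctl enable puppetserver
```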
At this point, the master, the catalog compiler, and the load balancer are all up and running, functioning as designed. The final portion is to connect a Puppet agent to this infrastructure so it works as expected.
Connecting Puppet Agents
On any agent, you would install Puppet as you would normally by first enabling the platform repo just as we did for the Puppet Servers:
On RHEL 7, enable the Puppet platform repo as follows:
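Assuming the same Puppet 6 collection used above:

```shell
sudo rpm -Uvh https://yum.puppet.com/puppet6-release-el-7.noarch.rpm
```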
Or on Ubuntu Xenial:
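```shell
wget https://apt.puppet.com/puppet6-release-xenial.deb
sudo dpkg -i puppet6-release-xenial.deb
sudo apt-get update
```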
Once the platform repos are configured, then install the Puppet agent as follows:
On the Red Hat family of servers:
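```shell
# RHEL family
sudo yum install puppet-agent

# ...or on Ubuntu
sudo apt-get install puppet-agent
```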
Finally, before executing your first Puppet run, you need to edit the puppet.conf to tell the node where its resources are. (You may wish to manage
puppet.conf with Puppet in your fleet as the size grows.)
Add the following lines to the puppet.conf, adjusted to reflect your infrastructure:
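The key is that certificate traffic goes to the CA master while catalog traffic goes to the load balancer VIP. With the names used in this walkthrough, that's:

```
[main]
# Catalog requests go to the load balancer VIP...
server = compile.example.com
# ...while CA (certificate) requests go to the master
ca_server = puppet.example.com
```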
After you’ve edited the puppet.conf, bootstrap the SSL as you did above on the compilers:
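```shell
puppet ssl bootstrap --waitforcert 60
```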
Sign the certificate request on your master:
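Here agent1.example.com is a hypothetical certname; substitute your node's actual certname:

```shell
puppetserver ca sign --certname agent1.example.com
```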
Finally, on the new Puppet agent, ensure a Puppet run completes without error:
On the agent node:
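```shell
puppet agent -t
```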
If everything has been performed correctly, the Puppet agent machine will request a certificate signing from the master. You will sign the agent’s certificate request manually (or via autosign.conf). The agent then requests its catalog, but instead of asking the master, you’ve specified it should send that request to your load balancer VIP. The load balancer forwards the request to one of the compilers in the pool, and a standard Puppet run completes for the agent.
It is at this point you can follow the "adding a catalog compiler" procedure to scale your compile farm, or simply continue connecting agents to the infrastructure.
If you’ve completed everything above, you should now have a scalable infrastructure for open source Puppet. The master is serving CA certificate signing and the load balancer is handing off requests to individual compilers for processing. Agents are configured in such a way as to send those certificate requests to the master and catalogs to the load balancer VIP, allowing for a greater volume of requests.
It should be noted that no tuning is called out in this procedure, but Puppet publishes a great deal of information that can help you increase capacity even further. The basic tuning guide for Puppet Server can be found here:
It gives you guidelines for tuning your server based on its hardware specifications, and assists you in scheduling runs, tuning parameters, and generally ensuring Puppet Server operates optimally for your infrastructure needs.
Jerald Sheets is the owner and CEO of S & S Consulting Group. He is a Puppet Certified Consultant with Norseman Defense Services, performs consulting and training on the Puppet Enterprise and open source platforms, and is a Puppet Certified Professional 2014 & 2018. He can be reached via the following methods:
Puppet Slack: @CVQuesty