Autoscaling Puppet compile masters with AWS
In a classic Puppet deployment architecture, compile masters are widely used as the number of managed nodes grows. Multiple compile masters sit behind a load balancer to handle the additional workload. It is not uncommon to see Puppet adopters launching compile masters in a public cloud, such as Amazon Web Services (AWS) or Google Cloud Platform. However, I am sometimes asked: can we take advantage of the autoscaling features provided by cloud vendors so that compile masters can automatically scale in/out as needed? The answer is a big Yes! This article walks through this topic and provides a solution to the challenges along the way. It is based on an AWS environment and focuses on resolving compile master–related challenges rather than basic AWS operations.
First, let’s take a look at the workflow during a scale in/out period.
Scale out workflow
- Automatically spawn a new EC2 instance when needed
- Automatically install Puppet Agent and get registered under Puppet Master
- Automatically classify the new node into compile master node group
- Automatically trigger a Puppet run on the Master so that the Master knows the existence of the newly added compile master
- Start serving!
Scale in workflow
- Automatically purge the node during shutdown stage
- Automatically clean up the EC2 instance
What if the compile master is shut down/rebooted by the administrator instead of by a scale-in signal? Given the above scale-in logic, whenever the instance is shut down, the node will be automatically purged. Thus, if it comes back up after a reboot, the Puppet Master will no longer recognize it. We need to find a way to tackle this issue.
Now I am going to give an example that walks through the whole automation enablement. This was tested in our lab environment; additional customization may be needed based on your real needs. All the scripts needed can be found here.
Create an AWS Load Balancer and add its DNS name as a dns_alt_name
I put this step first simply because I want to have a DNS name for the Load Balancer so that we can set it as a dns_alt_name for the Master and the compile masters. I will skip the detailed steps of creating the Load Balancer. I selected Network Load Balancer as the type and created two listeners forwarding to the Puppet8140 and Puppet8142 target groups. The two corresponding target groups should be configured as TCP ports 8140 and 8142, respectively.
Now I have a DNS name of the Network LoadBalancer.
Add LoadBalancer’s DNS into Puppet Master as a DNS alternative name
Fresh PE installation: Please follow Puppet Installation to add the Load Balancer DNS name as a dns_alt_name.
For an existing running PE environment: Please follow regenerate master certificates to regenerate the master certificate and add the Load Balancer DNS name as a dns_alt_name.
Also, according to our installation guides, configure the necessary settings for the Load Balancer.
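For a fresh installation, one way to supply the Load Balancer DNS name is in pe.conf at install time. A minimal sketch, assuming a hypothetical NLB DNS name (substitute your own, and verify the parameter against the installation guide for your PE version):

```
# pe.conf fragment (fresh PE install) -- the NLB DNS name below is a placeholder
"pe_install::puppet_master_dnsaltnames": ["puppet", "puppet-nlb-0123456789.elb.us-east-1.amazonaws.com"]
```

For an existing environment, the same name is added to the dns_alt_names list when regenerating the master certificate.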
Create an AWS EFS and upload files
An EFS share is needed here as a centralized repository to store all the scripts and files. I will not cover EFS creation details here; an AWS document about EFS creation can be found here. One requirement for us is that the EFS must be created within the same network as the compile masters. Pay attention to adjusting the EFS Network Access and the EC2 subnet Security Group settings to grant mount access to the compile masters.
Once created, please follow EFS mount instructions on the EFS page to mount it on any machine and upload all the files from this repository.
Create dummy certificate files
Our node purging is going to be done through remote API calls, which require a bundle of authenticated certificate files to be presented. We can create a dummy certificate bundle by using the puppetserver ca generate --certname <dummy name> command. In our example, I generated a certificate with a dummy name.
Three files are created and their paths are listed in the command output. We need to upload the private key and the certificate files to EFS. We also need to upload ca.pem (found at /etc/puppetlabs/puppet/ssl/certs) to EFS.
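The whole sequence might look like the following sketch. The certname dummy-purger, the EFS mount point, and the destination file names are assumptions; the source paths follow the puppetserver ca generate defaults:

```shell
# Generate a dummy certificate bundle on the CA node (certname is a placeholder)
puppetserver ca generate --certname dummy-purger

# The command prints the paths of the generated files; by default:
#   /etc/puppetlabs/puppet/ssl/certs/dummy-purger.pem         (certificate)
#   /etc/puppetlabs/puppet/ssl/private_keys/dummy-purger.pem  (private key)

# Copy the certificate, the private key, and the CA cert to the mounted EFS share
cp /etc/puppetlabs/puppet/ssl/certs/dummy-purger.pem        /mnt/efs/dummy-purger.cert.pem
cp /etc/puppetlabs/puppet/ssl/private_keys/dummy-purger.pem /mnt/efs/dummy-purger.key.pem
cp /etc/puppetlabs/puppet/ssl/certs/ca.pem                  /mnt/efs/ca.pem
```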
In the end, we should have at least these files stored in the EFS as the following:
Note that three of the files need executable permissions, so we need to add them accordingly.
Whitelisting the dummy certificate
Node purging involves both the PuppetDB and certificate_authority APIs, so we need to whitelist the dummy certificate for both. Puppet has provided articles about the settings here:
You can also configure it more concretely from the Puppet console:
Whitelisting certificate_authority: This can be done by logging into the PE Console -> Classification -> All Nodes -> PE Infrastructure -> PE Certificate Authority -> Configuration -> add the following parameter in the Data section:
Whitelisting PuppetDB: This can be done by logging into the PE Console -> Classification -> All Nodes -> PE Infrastructure -> PE PuppetDB -> Configuration -> under class
puppet_enterprise::profile::puppetdb, add the following parameter settings:
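In Hiera terms, the two console settings above correspond to parameters like the following sketch, where dummy-purger is the hypothetical dummy certname from earlier:

```
# Hiera/console parameters -- a sketch; 'dummy-purger' is a placeholder certname
puppet_enterprise::profile::certificate_authority::client_whitelist:
  - dummy-purger
puppet_enterprise::profile::puppetdb::whitelisted_certnames:
  - dummy-purger
```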
Now, we are able to call the APIs using the dummy certificate!
Create an RBAC token
Once a compile master is added/removed, we need to initiate a Puppet run on the master of masters (MoM) so that the change can be captured in time. We will use an Orchestrator API call to achieve this goal, so an RBAC token needs to be generated.
Once generated, note down the token content.
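One way to generate a token is with puppet-access on the master. A sketch; the lifetime value is an example, and the logged-in user needs permission to run Puppet via the orchestrator:

```shell
# Writes the token to ~/.puppetlabs/token by default
puppet access login --lifetime 180d
cat ~/.puppetlabs/token
```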
Configure variable values into scripts
For the three executable scripts launcher.sh, purgescript.sh, and startupscript.sh, I defined some variables at the beginning. Now it is time to configure those values, such as rbactoken for the RBAC token we just obtained and cerfile for the dummy certificate file name that we created before. Here are some variable examples:
Set up the values for all these variables using the data we have collected.
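For illustration, the variable section at the top of the scripts might end up looking like this; every value below is a placeholder to be replaced with your own environment's data:

```shell
# Placeholder values -- substitute your own environment's data
master="master.example.com"                          # MoM FQDN
rbactoken="REPLACE_WITH_RBAC_TOKEN"                  # RBAC token from the previous step
cerfile="dummy-purger"                               # certname of the dummy certificate
efsdnsname="fs-0123abcd.efs.us-east-1.amazonaws.com" # EFS DNS name
```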
Now our preparations on EFS are all set!
Configure Puppet Server autosign
To automatically sign new certificate requests from onboarding compile masters, we need to configure autosigning. In my example, for simplicity, my compile masters are launched in a specified subnet in a VPC, so their host names are predictable, following an IP-related pattern.
Enable autosign: By default, the autosign setting in the master section of the CA's puppet.conf file is set to $confdir/autosign.conf, and basic autosigning functionality is enabled upon installation. Please refer to the autosign doc if you want to configure other autosign approaches.
Here, we are using basic autosigning. We go to the corresponding autosign.conf file and add the autosign pattern. In my example, the autosign.conf is located at:
With content like:
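As an illustration, note that basic autosign.conf entries are certnames or leading-wildcard domain globs, so an entry covering the compile masters' internal domain could look like the following sketch (the domain is an assumption for EC2 default hostnames):

```
# /etc/puppetlabs/puppet/autosign.conf -- example entry
*.ec2.internal
```

If you need a stricter, IP-range-aware match, policy-based autosigning with a custom script is the tool for that.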
This is a very basic setting. In production, you may want to use a more concrete filter rule. For example, you could launch all your compile masters in a dedicated compile master subnet with a dedicated IP range; then all the compile masters will, by default, obtain a predictable default FQDN.
Configure compile masters to be auto-classified into the PE Master group
The new onboarding compile masters should be automatically classified into the PE Master group after their certs are signed. The launcher.sh script from our script repo uses the following command to install the Puppet agent:
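A sketch of what that installer invocation could look like, using PE's frictionless installer; the master hostname is a placeholder:

```shell
# Frictionless agent install, requesting a trusted extension at install time
curl -k https://master.example.com:8140/packages/current/install.bash | \
  sudo bash -s extension_requests:pp_role=awsloadbalancer
```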
We added a trusted extension called pp_role with the value awsloadbalancer for the onboarding compile masters. Thus, all that is left to do is: Go to Classification -> PE Infrastructure -> PE Master -> Rules -> add a rule matching the trusted fact trusted.extensions.pp_role with the value awsloadbalancer.
Now all the compile masters will be classified automatically into the PE Master group.
Now we are all set, and we can start creating auto-scaling groups!
Create launch configuration and auto-scaling groups for compile masters
Please follow the official guidelines to create an autoscaling group. Some steps need to be updated as follows.
Create launch configuration
On the Step 3: Configure details page, expand Advanced Details. In the User Data field, copy and paste the script from our script repo. Then set the EFS DNS name for the variable efsdnsname in the script.
This customized user data tells the new instance to mount our EFS share and run the startup script launcher.sh. It creates a local folder /home/awsnodemanagement and copies all the files from EFS into that folder.
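A sketch of such a user data script, assuming the hypothetical EFS DNS name used earlier and an NFS mount (the actual script in the repo may differ in details):

```shell
#!/bin/bash
# User data sketch -- efsdnsname is the variable mentioned above; the value is a placeholder
efsdnsname="fs-0123abcd.efs.us-east-1.amazonaws.com"

# Mount the EFS share temporarily and copy the files locally
mkdir -p /mnt/efs /home/awsnodemanagement
mount -t nfs4 -o nfsvers=4.1 "${efsdnsname}:/" /mnt/efs
cp -r /mnt/efs/* /home/awsnodemanagement/
chmod +x /home/awsnodemanagement/*.sh
umount /mnt/efs

# Hand off to the startup script
/home/awsnodemanagement/launcher.sh
```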
Create an auto-scaling group
We can create an auto-scaling group now from our created Launch Configuration.
Step 1: Expand Advanced Details, tick Receive traffic from one or more load balancers, and add the Target Groups we created in the previous sections:
Step 2: We can configure desired auto-scaling conditions and number of compile master instances. An example could be:
Now we are all set!
Actual effects of our implementation
For test purposes, I can manually adjust the autoscaling threshold to trigger an autoscaling action. For example, I can change the target value of Average CPU Utilization to 20% or lower and then start a Puppet agent run. Once we have verified that autoscaling happens, we can set the desired value. The total number of serving compile masters will vary based on the actual workload as well as the autoscaling thresholds. More information about AWS auto-scaling groups can be found here.
Meanwhile, when setting the target value, for example the average CPU utilization, you need to consider the maximum JRuby instances allocated together with the instance type of the compile masters. For example, if you are using a 4-vCPU instance, it is easy for your CPU to hit 80% when it gets busy. In such cases, you could set 70% as the threshold to spawn a new compile master.
For example, with three compile masters, the
puppet infra status output:
When there are five compile masters, the
puppet infra status output:
All of the scale-ins/outs happen without any human intervention! Every action item is automated: scaling compile masters in/out, signing/purging their certificates, classifying them into the PE Master group, and configuring them to serve new requests. This gives us a lot of flexibility and reduces the overhead of managing compile masters manually.
Workflow and scripts analysis
Now it is time to examine how the scripts work with our workflow.
The content of this file (the user data) is only used during auto-scaling group creation; it is one-time only. Basically, it does two things:
- Mount EFS share temporarily and copy over files
- Trigger an execution of launcher.sh
After these two steps, we should see the following eight files in each compile master's /home/awsnodemanagement folder. For example:
Five files are from our script repo and three files are the dummy certificate files.
This is the core setup script (launcher.sh) that runs when a new EC2 instance is launched due to a scale-out. Basically, it does two things:
- Install Puppet agent
This runs a standard installation script plus adds a trusted extension, pp_role. As mentioned, pp_role will be used to classify the compile masters. Since we already configured autosigning, during this step the compile master will install all the needed packages and start the services needed for a compile master.
- Enable a service to control startup and shutdown actions
Lastly, we enable and start a systemd service. This service triggers a script run when the instance is started or shut down. This step also triggers a Puppet agent run on the MoM from the startup script so that the MoM is well informed of the existence of the new compile master. More details can be found in the next sections.
This is a unit service that makes sure a script is executed whenever the instance receives a startup or shutdown/reboot signal.
startupscript.sh is the script for startup and
purgescript.sh is for shutdown.
Note: The lab environment for this blog is based on systemd. The steps are only tested on systemd-enabled OSes. For OSes without systemd, you just need to work out a similar mechanism to enable a service that executes:
- startupscript.sh when the instance is starting up
- purgescript.sh when the instance is being shut down/rebooted
This script (purgescript.sh) basically does three things when the instance is being shut down, terminated, or rebooted.
- Purge the instance itself via API calls to the MoM as well as PuppetDB.
- Trigger a Puppet agent run on the MoM to announce the change.
- Remove the local ssl keys. This step is for cases where a compile master is rebooted instead of terminated. Since steps 1 and 2 are always executed, a compile master will also be purged after a reboot. To enable it again at the next startup, we remove the ssl folders here as a preparation step.
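The three steps above can be sketched with the Puppet CA, PuppetDB, and orchestrator APIs. All hostnames, file names, and the token below are placeholders; the actual purgescript.sh may differ in details:

```shell
#!/bin/bash
# Sketch of what purgescript.sh does -- values are placeholders
master="master.example.com"
certname="$(/opt/puppetlabs/bin/puppet config print certname)"
cert="/home/awsnodemanagement/dummy-purger.cert.pem"   # dummy certificate
key="/home/awsnodemanagement/dummy-purger.key.pem"     # dummy private key
cacert="/home/awsnodemanagement/ca.pem"
rbactoken="REPLACE_WITH_RBAC_TOKEN"

# 1a. Revoke and clean this node's certificate via the CA API
curl -s -X PUT --cert "$cert" --key "$key" --cacert "$cacert" \
     -H 'Content-Type: application/json' -d '{"desired_state":"revoked"}' \
     "https://${master}:8140/puppet-ca/v1/certificate_status/${certname}"
curl -s -X DELETE --cert "$cert" --key "$key" --cacert "$cacert" \
     "https://${master}:8140/puppet-ca/v1/certificate_status/${certname}"

# 1b. Deactivate the node in PuppetDB
curl -s -X POST --cert "$cert" --key "$key" --cacert "$cacert" \
     -H 'Content-Type: application/json' \
     -d "{\"command\":\"deactivate node\",\"version\":3,\"payload\":{\"certname\":\"${certname}\"}}" \
     "https://${master}:8081/pdb/cmd/v1"

# 2. Trigger a Puppet run on the MoM via the orchestrator API
curl -s -X POST -H "X-Authentication: ${rbactoken}" \
     -H 'Content-Type: application/json' --cacert "$cacert" \
     -d "{\"environment\":\"production\",\"scope\":{\"nodes\":[\"${master}\"]}}" \
     "https://${master}:8143/orchestrator/v1/command/deploy"

# 3. Remove local SSL data so a rebooted instance can re-enroll with a fresh cert
rm -rf /etc/puppetlabs/puppet/ssl
```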
Similarly, this script (startupscript.sh) is for cases where a compile master is rebooted instead of terminated. Basically, it does two things:
- Trigger a Puppet run. Given that the ssl folders have been removed by purgescript.sh, a fresh Puppet run will create new keys and make sure the node is back online as a compile master.
- Trigger a Puppet run on MoM to make sure MoM is up-to-date.
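Taken together, a sketch of startupscript.sh under the same assumptions as before (placeholder hostname and token):

```shell
#!/bin/bash
# Sketch of what startupscript.sh does -- values are placeholders
master="master.example.com"
rbactoken="REPLACE_WITH_RBAC_TOKEN"

# 1. A fresh agent run generates new SSL keys (the CSR is autosigned)
#    and re-applies the compile master configuration
/opt/puppetlabs/bin/puppet agent -t

# 2. Trigger a Puppet run on the MoM so it picks up the re-enrolled compile master
curl -s -k -X POST -H "X-Authentication: ${rbactoken}" \
     -H 'Content-Type: application/json' \
     -d "{\"environment\":\"production\",\"scope\":{\"nodes\":[\"${master}\"]}}" \
     "https://${master}:8143/orchestrator/v1/command/deploy"
```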
Now we are done with everything! I am happy that my Puppet infrastructure is now flexible and intelligent, and will automatically adjust itself to changes in the actual workload.