Autoscaling Puppet compile masters with AWS

In the classic Puppet deployment architecture, compile masters are widely used as the number of managed nodes grows. Multiple compile masters sit behind a load balancer to absorb the additional workload. It is not rare to see Puppet adopters launching their compile masters in a public cloud such as Amazon Web Services (AWS) or Google Cloud Platform. However, I am sometimes asked: can we take advantage of the autoscaling features provided by cloud vendors so that compile masters automatically scale in and out as needed? The answer is a big yes! This article walks through the topic and provides a solution to the challenges along the way. It is based on an AWS environment and focuses on resolving compile master–related challenges rather than basic AWS operations.

First, let’s take a look at the workflow during a scale in/out period.

Scale out workflow

  1. Automatically spawn a new EC2 instance when needed
  2. Automatically install the Puppet agent and register with the Puppet master
  3. Automatically classify the new node into the compile master node group
  4. Automatically trigger a Puppet run on the master so that the master knows about the newly added compile master
  5. Start serving!

Scale in workflow

  1. Automatically purge the node during the shutdown stage
  2. Automatically clean up the EC2 instance

Additional concern

What if the compile master is shut down or rebooted by the administrator instead of by a scale-in signal? Given the scale-in logic above, as long as the instance is shut down, the node will be automatically purged. Thus, if it comes back up from a reboot, the Puppet master will no longer recognize it. We need to find a way to tackle this issue.

Implementation steps

Now I am going to walk through the whole automation setup with an example. It was tested in our lab environment; additional customization may be needed based on your own requirements. All the scripts needed can be found here.

Create an AWS load balancer and add its DNS name as a dns_alt_name

I put this step first simply because I want a DNS name for the load balancer that we can add as a dns_alt_names entry for the master and the compile masters. I will skip the detailed steps of creating the load balancer. I selected Network Load Balancer as the type and created two listeners forwarding to the Puppet8140 and Puppet8142 target groups.

The two corresponding target groups should be configured for TCP ports 8140 and 8142, respectively.
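
If you prefer the CLI over the console, a rough sketch of the equivalent AWS CLI calls is shown below; the load balancer name and the VPC/subnet IDs are placeholders for your own values.

# Sketch: create the Network Load Balancer, the two target groups, and the listeners
aws elbv2 create-load-balancer --name puppet-compilers-nlb --type network \
    --scheme internal --subnets subnet-0123456789abcdef0
aws elbv2 create-target-group --name Puppet8140 --protocol TCP --port 8140 --vpc-id vpc-0123456789abcdef0
aws elbv2 create-target-group --name Puppet8142 --protocol TCP --port 8142 --vpc-id vpc-0123456789abcdef0
aws elbv2 create-listener --load-balancer-arn <NLB ARN> --protocol TCP --port 8140 \
    --default-actions Type=forward,TargetGroupArn=<Puppet8140 target group ARN>
aws elbv2 create-listener --load-balancer-arn <NLB ARN> --protocol TCP --port 8142 \
    --default-actions Type=forward,TargetGroupArn=<Puppet8142 target group ARN>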

Now I have the DNS name of the Network Load Balancer.

Add the load balancer's DNS name to the Puppet master as a DNS alternative name

For a fresh PE installation: follow Puppet Installation to add the load balancer DNS name as a dns_alt_name. For an existing PE environment: follow regenerate master certificates to regenerate the master certificate and add the load balancer DNS name to dns_alt_names.

Also, configure the necessary settings for the load balancer according to our installation guides.
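
For the fresh-install path, one way to do this, for example, is to list the load balancer DNS name among the master's DNS alt names in pe.conf before running the installer. This is only a sketch; the parameter name here is the one used by the PE installer (verify it against your PE version's docs), and the NLB DNS name is a placeholder:

# pe.conf fragment (sketch): include the NLB DNS name in the master certificate
"pe_install::puppet_master_dnsaltnames": ["puppet", "ip-10-0-0-100.ap-southeast-1.compute.internal", "<your NLB DNS name>"]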

Create an AWS EFS and upload files

An EFS share is needed here as a centralized repository for all the scripts and files. I will not cover EFS creation details here; an AWS document about EFS creation can be found here. One requirement for us is that the EFS must be created in the same network as the compile masters. Be sure to adjust the EFS network access settings and the EC2 subnet security group so that the compile masters are allowed to mount the share.

Once created, please follow EFS mount instructions on the EFS page to mount it on any machine and upload all the files from this repository.
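
For reference, mounting the share and uploading the files typically looks something like the sketch below; the EFS DNS name and the local path to your clone of the script repo are placeholders:

# Sketch: mount the EFS share over NFS and copy the repo files onto it
yum install -y nfs-utils
mkdir -p /mnt/efs
mount -t nfs4 -o nfsvers=4.1 <EFS DNS name>:/ /mnt/efs
cp /path/to/script-repo/* /mnt/efs/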

Create dummy certificate files

We are going to purge nodes via remote API calls, which requires presenting a bundle of authenticated certificate files. We can create a dummy certificate bundle using the puppetserver ca generate --certname <dummy name> command. In our example, I generate a certificate named aws-lambada.example.com:

puppetserver ca generate --certname aws-lambada.example.com

Three files are created and their paths are listed in the command output. We need to upload the private key and the certificate files to EFS, and we also need to upload ca.pem (found at /etc/puppetlabs/puppet/ssl/certs).

In the end, we should have at least the following files stored on the EFS:

[root@ip-10-0-0-100 efs]# ls -l
total 28
-rw-r--r--. 1 root root 3243 Apr 16 13:47 aws-lambada.example.com.key.pem
-rw-r--r--. 1 root root 2004 Apr 16 13:47 aws-lambada.example.com.pem
-rw-r--r--. 1 root root 3903 Apr 16 13:47 ca.pem
-rwxr-xr-x. 1 root root  941 May  4 03:28 launcher.sh
-rw-r--r--. 1 root root  332 Apr 18 03:29 nodepurge.service
-rwxr-xr-x. 1 root root 1532 May  4 03:46 purgescript.sh
-rwxr-xr-x. 1 root root  508 May  4 03:52 startupscript.sh

Note that three of the files need executable permissions, so we add them accordingly:

chmod a+x launcher.sh purgescript.sh startupscript.sh

Whitelisting the dummy certificate

Purging a node involves both the PuppetDB and certificate_authority APIs, so we need to whitelist the dummy certificate in both places. Puppet provides documentation on these settings:

client_whitelist from certificate authority

certificate_whitelist in the PuppetDB settings

You can also configure this from the PE console, more concretely:

Whitelisting certificate_authority: Log into the PE console -> Classification -> All Nodes -> PE Infrastructure -> PE Certificate Authority -> Configuration -> add the following parameter in the Data section:

Class: puppet_enterprise::profile::certificate_authority
Parameter: client_whitelist
Value: ["<certificate name of the dummy cert>"], in our example ["aws-lambada.example.com"]

Whitelisting PuppetDB: Log into the PE console -> Classification -> All Nodes -> PE Infrastructure -> PE PuppetDB -> Configuration -> under the class puppet_enterprise::profile::puppetdb, add the following parameter settings:

Parameter: whitelisted_certnames
Value: ["<certificate name of the dummy cert>"], in our example ["aws-lambada.example.com"]

Now, we are able to call the APIs using the dummy certificate!
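
A quick way to verify this, for example, is to query the PuppetDB nodes endpoint with the dummy certificate from the directory holding the certificate files; a JSON list of nodes coming back means the whitelist works (the master hostname is a placeholder):

# Sketch: test the dummy certificate against PuppetDB
curl --cert aws-lambada.example.com.pem \
     --key aws-lambada.example.com.key.pem \
     --cacert ca.pem \
     https://<master hostname>:8081/pdb/query/v4/nodes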

Create an RBAC token

Once a compile master is added or removed, we need to initiate a Puppet run on the master of masters (MoM) so that the change is captured in time. We will use an orchestrator API call to achieve this, so an RBAC token needs to be generated.

Please follow this link to generate an RBAC token with enough permission to use the orchestrator API. We need the following permissions:

* Job orchestrator: Start, stop and view jobs
* Puppet agent: Run Puppet on agent nodes

Once generated, note down the token content.
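
For reference, such an orchestrator call looks roughly like the sketch below; the master hostname is a placeholder, and the exact call used in the scripts may differ slightly:

# Sketch: trigger a Puppet run on the MoM through the orchestrator API using the RBAC token
curl --cacert ca.pem \
     -H "X-Authentication: <RBAC token>" \
     -H "Content-Type: application/json" \
     -X POST "https://<master hostname>:8143/orchestrator/v1/command/deploy" \
     -d '{"environment": "production", "scope": {"nodes": ["<master hostname>"]}}'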

Configure variable values in the scripts

The three executable scripts, launcher.sh, purgescript.sh, and startupscript.sh, define some variables at the beginning. Now it is time to fill in those values, such as rbactoken for the RBAC token we just obtained and cerfile for the name of the dummy certificate file we created earlier. Here are some variable examples:

purgescript.sh

momhostname=<MoM Hostname> 
rbactoken=<RBAC Token> 
cerfile=<certfile name> 
keyfile=<keyfile name>

Set up the values for all these variables using the data we have collected.
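
With the lab values from this walkthrough, for example, the filled-in variables would look like this (the token value is a placeholder):

momhostname=ip-10-0-0-100.ap-southeast-1.compute.internal
rbactoken=<RBAC token string from the previous step>
cerfile=aws-lambada.example.com.pem
keyfile=aws-lambada.example.com.key.pem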

Now our preparations on EFS are all set!

Configure Puppet Server autosign

To automatically sign new certificate requests from onboarding compile masters, we need to configure autosigning. In my example, for simplicity, the compile masters are launched in a specific subnet of a VPC, so their host names are predictable, following an IP-based pattern.

Enable autosign: By default, the autosign setting in the master section of the CA’s puppet.conf file is set to $confdir/autosign.conf. The basic autosigning functionality is enabled upon installation. Please refer to the autosign doc if you want to configure other autosign approaches.

Here, we are using basic autosign. We go to the corresponding autosign.conf file and add the autosign pattern. In my example, the autosign.conf is located at:

/etc/puppetlabs/puppet/autosign.conf

With content like:

*.ap-southeast-1.compute.internal

This is a very basic setting. In production, you may want to use a more specific filter rule. For example, you could launch all your compile masters in a dedicated compile master subnet with its own IP range. All the compile masters will then obtain a default FQDN of the form ip-<ip-address>.<region>.compute.internal. In order to autosign only compile masters, you may want to use policy-based autosigning.
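
As a hypothetical sketch of that approach, the autosign setting in puppet.conf can point to an executable that receives the requesting certname as its first argument and the CSR on stdin, and signs only when it exits 0. The script path and subnet below are illustrative:

#!/bin/bash
# Hypothetical policy-based autosign executable, e.g. /etc/puppetlabs/puppet/autosign-policy.sh,
# referenced by autosign = <path to this script> in the [master] section of puppet.conf.
certname="$1"
csr="$(cat)"   # the CSR arrives on stdin in PEM form; inspect it for extensions if needed

# Sign only hosts whose default EC2 hostname falls in the compile master subnet (10.0.100.0/24 here)
if [[ "$certname" =~ ^ip-10-0-100-[0-9]+\.ap-southeast-1\.compute\.internal$ ]]; then
  exit 0   # sign the request
fi
exit 1     # reject everything else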

Configure compile masters to be auto-classified into the PE Master group

The new onboarding compile masters should be automatically classified into the PE Master group after their certs are signed. The launcher.sh script from our script repo uses the following command to install the Puppet agent:

/bin/curl -k https://${momhostname}:8140/packages/current/install.bash | sudo bash -s main:dns_alt_names=$awslbdns extension_requests:pp_role=awsloadbalancer

We add a trusted extension called pp_role with the value awsloadbalancer to the onboarding compile masters. Thus, all that is left to do is go to Classification -> PE Infrastructure -> PE Master -> Rules -> and add the following rule:

Tick: Nodes must match all rules
Fact: trusted.extensions.pp_role
Operator: =
Value: awsloadbalancer

Now all the compile masters will be classified automatically into the PE Master group.

Now we are all set, and we can start creating auto-scaling groups!

Create launch configuration and auto-scaling groups for compile masters

Please follow the official guidelines to create an autoscaling group. Some steps need to be updated as follows.

Create launch configuration

Step 3: On the Configure details page, expand Advanced Details. In the User Data field, copy and paste the script from our script repo. Then set the EFS DNS name for the efsdnsname variable in the script.

This customized user data tells the new instance to mount our EFS share and run the startup script launcher.sh. It creates a local folder, /home/awsnodemanagement, and copies all the files from EFS into it.
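
A rough sketch of what that user data amounts to is shown below; the actual userdata.txt lives in the script repo, and the EFS DNS name is a placeholder:

#!/bin/bash
# Sketch: mount EFS temporarily, copy the files locally, then hand off to launcher.sh
efsdnsname=<your EFS DNS name>
yum install -y nfs-utils
mkdir -p /mnt/efs /home/awsnodemanagement
mount -t nfs4 -o nfsvers=4.1 "${efsdnsname}:/" /mnt/efs
cp /mnt/efs/* /home/awsnodemanagement/
umount /mnt/efs
/home/awsnodemanagement/launcher.sh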

Create an auto-scaling group

We can now create an auto-scaling group from the launch configuration we just created.

Step 1: Expand Advanced Details, tick Receive traffic from one or more load balancers, and add the target groups we created in the previous sections: Puppet8140 and Puppet8142.

Step 2: Configure the desired auto-scaling conditions and the number of compile master instances.

Now we are all set!

Actual effects of our implementation

For test purposes, I can manually adjust the auto-scaling threshold to trigger a scaling action. For example, I can change the target value of Average CPU Utilization to 20% or lower and then start a Puppet agent run. Once we have verified that autoscaling happens, we can set the threshold to the desired value. The total number of serving compile masters will vary based on the actual workload as well as the auto-scaling threshold. More information about AWS auto-scaling groups can be found here.

Meanwhile, when setting the target value, for example average CPU utilization, you need to consider the maximum number of JRuby instances allocated together with the instance type of the compile masters. For example, if you are using a 4-vCPU instance, it is easy for CPU utilization to hit 80% when the node gets busy, so you could set 70% as the threshold to spawn a new compile master in such cases.

For example, with three compile masters, the puppet infra status output looks like this:

[root@ip-10-0-0-100 efs]# puppet infra status
2020-05-11 04:52:35.474956 WARN  puppetlabs.facter - locale environment variables were bad; continuing with LANG=C LC_ALL=C
Notice: Contacting services for status information...
Classifier: Running on Primary Master, https://ip-10-0-0-100.ap-southeast-1.compute.internal:4433/classifier-api
RBAC: Running on Primary Master, https://ip-10-0-0-100.ap-southeast-1.compute.internal:4433/rbac-api
Activity Service: Running on Primary Master, https://ip-10-0-0-100.ap-southeast-1.compute.internal:4433/activity-api
Puppet Server: Running on Primary Master, https://ip-10-0-0-100.ap-southeast-1.compute.internal:8140/
Orchestrator: Running on Primary Master, https://ip-10-0-0-100.ap-southeast-1.compute.internal:8143/orchestrator
PCP Broker: Running on Primary Master, wss://ip-10-0-0-100.ap-southeast-1.compute.internal:8142/pcp
PCP Broker v2: Running on Primary Master, wss://ip-10-0-0-100.ap-southeast-1.compute.internal:8142/pcp2
PuppetDB: Running on Primary Master, https://ip-10-0-0-100.ap-southeast-1.compute.internal:8081/pdb
Puppet Server: Running, https://ip-10-0-100-4.ap-southeast-1.compute.internal:8140/
Puppet Server: Running, https://ip-10-0-100-5.ap-southeast-1.compute.internal:8140/
Puppet Server: Running, https://ip-10-0-100-8.ap-southeast-1.compute.internal:8140/
2020-05-11 04:52:41 +0000
11 of 11 services are fully operational.
[root@ip-10-0-0-100 efs]#

With five compile masters, the puppet infra status output looks like this:

...
Classifier: Running on Primary Master, https://ip-10-0-0-100.ap-southeast-1.compute.internal:4433/classifier-api
RBAC: Running on Primary Master, https://ip-10-0-0-100.ap-southeast-1.compute.internal:4433/rbac-api
Activity Service: Running on Primary Master, https://ip-10-0-0-100.ap-southeast-1.compute.internal:4433/activity-api
Puppet Server: Running on Primary Master, https://ip-10-0-0-100.ap-southeast-1.compute.internal:8140/
Orchestrator: Running on Primary Master, https://ip-10-0-0-100.ap-southeast-1.compute.internal:8143/orchestrator
PCP Broker: Running on Primary Master, wss://ip-10-0-0-100.ap-southeast-1.compute.internal:8142/pcp
PCP Broker v2: Running on Primary Master, wss://ip-10-0-0-100.ap-southeast-1.compute.internal:8142/pcp2
PuppetDB: Running on Primary Master, https://ip-10-0-0-100.ap-southeast-1.compute.internal:8081/pdb
Puppet Server: Running, https://ip-10-0-100-10.ap-southeast-1.compute.internal:8140/
Puppet Server: Running, https://ip-10-0-100-4.ap-southeast-1.compute.internal:8140/
Puppet Server: Running, https://ip-10-0-100-5.ap-southeast-1.compute.internal:8140/
Puppet Server: Running, https://ip-10-0-100-6.ap-southeast-1.compute.internal:8140/
Puppet Server: Running, https://ip-10-0-100-8.ap-southeast-1.compute.internal:8140/
2020-05-11 04:36:58 +0000
13 of 13 services are fully operational.
[root@ip-10-0-0-100 ~]#

None of the scale-in/scale-out operations require human intervention; everything is automated, including scaling compile masters in and out, signing and purging their certificates, classifying them into the PE Master group, and configuring them to serve new requests. This gives us a lot of flexibility and reduces the overhead of managing compile masters manually.

Workflow and scripts analysis

Now it is time to examine how the scripts work with our workflow.

userdata.txt

The content from this file is only used during auto-scaling group creation. It is one-time only. Basically, it only does two things:

  1. Mount EFS share temporarily and copy over files
  2. Trigger an execution of launcher.sh.

After these two steps, we should see the following eight files in the /home/awsnodemanagement folder on each compile master. For example:

[root@ip-10-0-0-100 efs]# ll
total 32
-rw-r--r--. 1 root root 3243 Apr 16 13:47 aws-lambada.example.com.key.pem
-rw-r--r--. 1 root root 2004 Apr 16 13:47 aws-lambada.example.com.pem
-rw-r--r--. 1 root root 3903 Apr 16 13:47 ca.pem
-rwxr-xr-x. 1 root root  890 May  6 08:41 launcher.sh
-rw-r--r--. 1 root root  330 May  6 08:13 nodepurge.service
-rwxr-xr-x. 1 root root 1654 May  6 08:14 purgescript.sh
-rwxr-xr-x. 1 root root  544 May  6 07:43 startupscript.sh
-rw-r--r--. 1 root root  330 May  6 07:38 userdata.txt

Five files are from our script repo and three files are the dummy certificate files.

launcher.sh

This is the core setup script, run when a new EC2 instance is launched during a scale-out. Basically, it does two things:

  1. Install the Puppet agent. This runs the standard installation script and adds a trusted extension, pp_role. As mentioned, pp_role is used to classify the compile masters. Since we have already configured autosigning, during this step the compile master installs all the needed packages and starts the services needed by a compile master.
  2. Enable a service to control startup and shutdown actions. Lastly, we enable and start a systemd service (a sketch follows this list). This service triggers a script run when the instance is started or shut down. This step also triggers a Puppet agent run on the MoM from the startup script so that the MoM is informed about the new compile master. More details can be found in the next sections.
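
The second step, for example, boils down to something like the following sketch; the real commands live in launcher.sh:

# Sketch: register and enable the startup/shutdown hook service
cp /home/awsnodemanagement/nodepurge.service /etc/systemd/system/nodepurge.service
systemctl daemon-reload
systemctl enable nodepurge.service
systemctl start nodepurge.service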

nodepurge.service

This is a systemd unit that makes sure a script is executed when the instance starts up or receives a shutdown or reboot signal: startupscript.sh runs at startup and purgescript.sh runs at shutdown.
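
As a sketch of what such a unit typically looks like (the actual nodepurge.service ships in the repo), a oneshot service with RemainAfterExit=yes gives us an ExecStop hook that fires on shutdown or reboot:

[Unit]
Description=Register/purge this compile master on startup and shutdown
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/home/awsnodemanagement/startupscript.sh
ExecStop=/home/awsnodemanagement/purgescript.sh

[Install]
WantedBy=multi-user.target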

Note: The lab environment for this blog post is based on systemd, and the steps have only been tested on systemd-enabled OSes. For OSes without systemd, you just need to work out a similar way to enable a service that executes:

  • startupscript.sh when the instance is starting up
  • purgescript.sh when the instance is being shut down or rebooted

purgescript.sh

This script basically does three things when the instance is being shut down, terminated, or rebooted.

  1. Purge the instance itself via API calls to the MoM as well as PuppetDB (a sketch of these calls follows this list).
  2. Trigger a Puppet agent run on the MoM to inform it of the change.
  3. Remove the local SSL keys. This step is for cases where a compile master is rebooted instead of terminated. Since steps 1 and 2 always run, a compile master is purged after a reboot as well. To let it re-enroll at the next startup, we remove the SSL directories here as a preparation step.
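
For reference, the purge in step 1 amounts to revoking and deleting the node's certificate via the CA API and deactivating the node in PuppetDB, authenticated with the dummy certificate we whitelisted earlier. A rough sketch is below; the MoM hostname is a placeholder and the actual calls live in purgescript.sh:

#!/bin/bash
# Sketch: purge this node from the CA and PuppetDB using the whitelisted dummy certificate
certname=$(/opt/puppetlabs/bin/puppet config print certname)
cert=/home/awsnodemanagement/aws-lambada.example.com.pem
key=/home/awsnodemanagement/aws-lambada.example.com.key.pem
cacert=/home/awsnodemanagement/ca.pem
mom=<MoM hostname>

# Revoke, then delete, the node's certificate on the CA
curl --cert "$cert" --key "$key" --cacert "$cacert" \
     -H "Content-Type: application/json" -X PUT \
     -d '{"desired_state":"revoked"}' \
     "https://${mom}:8140/puppet-ca/v1/certificate_status/${certname}"
curl --cert "$cert" --key "$key" --cacert "$cacert" -X DELETE \
     "https://${mom}:8140/puppet-ca/v1/certificate_status/${certname}"

# Deactivate the node in PuppetDB
curl --cert "$cert" --key "$key" --cacert "$cacert" \
     -H "Content-Type: application/json" -X POST \
     -d "{\"command\":\"deactivate node\",\"version\":3,\"payload\":{\"certname\":\"${certname}\",\"producer_timestamp\":\"$(date -u +%Y-%m-%dT%H:%M:%SZ)\"}}" \
     "https://${mom}:8081/pdb/cmd/v1"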

startupscript.sh

Similarly, this script handles the case where a compile master is rebooted instead of terminated. Basically, it does two things:

  1. Trigger a local Puppet run. Given that the SSL directories were removed by purgescript.sh, a fresh Puppet run creates new keys and certificates and brings the node back online as a compile master.
  2. Trigger a Puppet run on the MoM to make sure the MoM is up to date.

Now we are done with everything! I am happy that my Puppet infrastructure is flexible and intelligent, and will automatically adjust itself to changes in the actual workload.
