Infrastructure repair with Bolt

Not to beat our own drum or anything, but Puppet is a great tool for managing and configuring your entire infrastructure. It allows you to ensure trusted, consistent state across all your nodes and lets you deploy changes to your Puppet codebase at the click of a button.

But what happens when the agent cannot reach the master for these configuration updates?

Puppet 5.5.16 & 6.4.4 and prior had a long-standing bug that prevented the agent from using a proxy when the HTTP_PROXY environment variable was defined. This issue, along with a number of other HTTP proxy issues, was recently fixed, so the agent now correctly respects those variables and settings. However, in some cases this means that the Puppet agent may attempt to connect to Puppet Server via the previously ignored proxy.

In many environments, an HTTP proxy is configured to only allow connections from internal hosts to external hosts, and it will reject any attempt to "reflect" off of the proxy from an internal host to another internal host. In these environments, Puppet agents may no longer be able to connect to their Puppet Server after upgrading to 5.5.17, 6.4.4 or 6.8.0+. And since the agents can't get a catalog, you can't use Puppet to remedy the issue.

You can, however, use Bolt to remedy the issue! Bolt is well suited to solve a problem like this because it does not rely on agents getting a catalog from Puppet Server. That means that we can use it for out-of-band infrastructure repair.

Configuring agents to not use a proxy

If you're in such an environment and you've upgraded Puppet, it's likely that you've lost Puppet control over your agents: if your proxy doesn't allow internal connections, agent runs will fail.

To resolve this issue, configure the agent to bypass the proxy and connect directly to the Puppet Server by adding the FQDN of the Puppet Server to the NO_PROXY environment variable. This can also be accomplished using the no_proxy setting in the latest releases; however, that setting will be overridden by the HTTP_PROXY environment variable until a fix is released in 6.9.0 (and backported to 5.5.18 & 6.4.5).

This guide will show you how to use Bolt to ensure that both the system environment variable NO_PROXY and the no_proxy Puppet setting include your Puppet Server's FQDN. Note that changes to the global NO_PROXY environment variable will affect all child processes that Puppet executes or services that it starts.
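As a quick illustration of the merge we're after, here's a small shell sketch that appends the server's FQDN to a NO_PROXY list only if it isn't already present (the hostname is a placeholder, not your real server name):

```shell
# Append the Puppet Server FQDN to NO_PROXY only if it's not already listed.
# "puppet.example.com" is a placeholder for your Puppet Server's FQDN.
NO_PROXY="localhost,127.0.0.1"
fqdn="puppet.example.com"
case ",${NO_PROXY}," in
  *",${fqdn},"*) : ;;                         # already listed, nothing to do
  *) NO_PROXY="${NO_PROXY},${fqdn}" ;;
esac
export NO_PROXY
echo "$NO_PROXY"    # prints localhost,127.0.0.1,puppet.example.com
```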

Bolt target group setup

Note: We need to run different commands to check and set environment variables for Windows and Linux nodes. To make it easier to do so, we'll configure target groups for each based on PuppetDB queries. If you already have similar groups configured in your infrastructure, then you can skip this section.

If you don't already have Bolt running, you can find instructions for that on the installation page. To configure the target groups, we'll use the Bolt PuppetDB plugin. Before we can use the plugin, we need to configure Bolt to connect to PuppetDB. For this example, I will authenticate with PuppetDB using a PE RBAC token.

I obtain a token with puppet-access login -l 0 and an SSL CA cert, and save both to a directory called proxy_patch. In my case, the CA cert of interest was obtained by copying the cert stored at /etc/puppetlabs/puppet/ssl/certs/ca.pem from my Puppet master host to my laptop.

Now I create a Bolt configuration file bolt.yaml with the following configuration:

---
puppetdb:
  server_urls: ["https://ox6m3vjvwj66xlr.delivery.puppetlabs.net:8081"]
  cacert: ~/proxy_patch/ca.pem
  token: ~/proxy_patch/token

Now that I have Bolt configured to connect to PuppetDB, I can write a Bolt inventory file to organize connection information for the Puppet agent nodes queried from the database. We will use the version 2 inventory format, which adds support for the PuppetDB plugin.

---
version: 2
groups:
  - name: linux-agents
    targets:
      - _plugin: puppetdb
        query: inventory[certname]{facts.os.family != "windows"}
        uri: facts.networking.hostname
    config:
      transport: ssh
      ssh:
        user: root
        private-key: ~/.ssh/id_rsa

In the example inventory file we configure the PuppetDB plugin to query for non-Windows targets. For each fact set that is returned, a target uri is set from facts.networking.hostname. It is important to note that values set under the plugin entry, such as uri, are generated dynamically for each target that matches the query, while the static information about how to connect to those dynamically generated targets lives in the group-level config section. In this case I set the transport to ssh, configured to log in as root using a private key stored in my .ssh directory. You can find more information about configuring Bolt transports at https://puppet.com/docs/bolt/latest/bolt_configuration_options.html.
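If you don't have PuppetDB available, a group with statically listed targets has the same shape, just without the plugin entry; a minimal sketch with placeholder hostnames:

```yaml
---
version: 2
groups:
  - name: linux-agents
    targets:
      - agent1.example.com
      - agent2.example.com
    config:
      transport: ssh
      ssh:
        user: root
        private-key: ~/.ssh/id_rsa
```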

At this point we can verify that we can connect to the agent nodes by running a simple Bolt command to echo the hostname on all the targets generated by the plugin. For example:

$ bolt command run hostname --targets linux-agents
Started on tp5t3a5vq63c0ef...
Started on ox6m3vjvwj66xlr...
Finished on tp5t3a5vq63c0ef:
  STDOUT:
    tp5t3a5vq63c0ef
Finished on ox6m3vjvwj66xlr:
  STDOUT:
    ox6m3vjvwj66xlr
Successful on 2 nodes: tp5t3a5vq63c0ef,ox6m3vjvwj66xlr
Ran on 2 nodes in 0.52 sec

Now that we have connected to our Linux nodes, let's also add configuration for our Windows nodes.

---
version: 2
groups:
  - name: linux-agents
    targets:
      - _plugin: puppetdb
        query: inventory[certname]{facts.os.family != "windows"}
        uri: facts.networking.hostname
    config:
      transport: ssh
      ssh:
        user: root
        private-key: ~/.ssh/id_rsa
  - name: windows-agents
    targets:
      - _plugin: puppetdb
        query: inventory[certname]{facts.os.family = "windows"}
        uri: facts.networking.hostname
    config:
      transport: winrm
      winrm:
        ca-cert: ~/proxy_patch/ca.pem
        user: Administrator
        password:
          _plugin: prompt
          message: Winrm password please

Notice that the static configuration for the Windows agents contains another plugin reference. In this case we use the prompt plugin to get the WinRM password from the Bolt operator. Because the prompt plugin is set at the group level, the user will be prompted only once when the windows-agents targets are requested, and that value will be used to authenticate with all targets in the group.

Bolt's puppet_conf task

Bolt ships with some useful modules for managing infrastructure, including the puppet_conf module, which contains a task for getting and setting Puppet configuration. We can examine the task information with the following command:

$ bolt task show puppet_conf

puppet_conf - Inspect puppet agent configuration settings

USAGE:
bolt task run --nodes <node-name> puppet_conf action=<value> section=<value> setting=<value> value=<value>

PARAMETERS:
- action: Enum[get, set]
    The operation (get, set) to perform on the configuration setting
- section: Optional[String[1]]
    The section of the config file. Defaults to main
- setting: String[1]
    The name of the config entry to set/get
- value: Optional[String[1]]
    The value you are setting. Only required for set

MODULE:
built-in module

Now that we have two target groups, one for our Windows nodes and one for our Linux nodes, we can use Bolt to check the settings and environment variables for all the nodes in these groups.

First, let's use the puppet_conf task to check whether nodes have the no_proxy setting. In this example we're looking at the Linux nodes, but the task invocation will be the same for Windows nodes:

$ bolt task run puppet_conf action=get setting=no_proxy --targets linux-agents
Started on tp5t3a5vq63c0ef...
Started on ox6m3vjvwj66xlr...
Finished on tp5t3a5vq63c0ef:
  {
    "status": "localhost, 127.0.0.1",
    "setting": "no_proxy",
    "section": "main"
  }
Finished on ox6m3vjvwj66xlr:
  {
    "status": "localhost, 127.0.0.1",
    "setting": "no_proxy",
    "section": "main"
  }
Successful on 2 nodes: tp5t3a5vq63c0ef,ox6m3vjvwj66xlr
Ran on 2 nodes in 2.67 sec

We see that the setting does not include our Puppet Server's FQDN.

Similarly, we can check whether the NO_PROXY environment variable is set. Here we check our Windows nodes; to do the same on our Linux nodes, we'd use the command echo $NO_PROXY instead:

$ bolt command run 'Write-Host $env:NO_PROXY' -t windows-agents
Winrm password please:
Started on x4yml978aq0ct77...
Finished on x4yml978aq0ct77:
Successful on 1 node: x4yml978aq0ct77
Ran on 1 node in 1.1 sec

We see that nothing is printed, so the environment variable is unset.
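Because echo $NO_PROXY only reflects the current shell, on Linux you can also look at what is persisted in /etc/environment. Here's a self-contained sketch of that check, using a temp file as a stand-in for the real path:

```shell
# Check the value as persisted on disk rather than in the current shell.
# A temp file stands in for /etc/environment so this sketch is self-contained.
envfile=$(mktemp)
printf 'PATH=/usr/bin\nNO_PROXY=localhost,127.0.0.1\n' > "$envfile"
value=$(grep '^NO_PROXY=' "$envfile" | cut -d= -f2-)
echo "$value"    # prints localhost,127.0.0.1
rm -f "$envfile"
```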

Fixing the issue

Now that we have an idea of what we need to accomplish, it is time to use Bolt's most powerful capability: the plan. We want to set global environment variables on both Windows and Linux nodes as well as configure Puppet settings.

Plans live in modules, so let's create a module called proxy_patch under site-modules, with a plans directory containing a file called init.pp.

$ tree
.
├── bolt.yaml
├── ca.pem
├── inventory.yaml
├── Puppetfile
├── site-modules
│   └── proxy_patch
│       └── plans
│           └── init.pp
└── token

3 directories, 6 files

Save the following plan to init.pp:

plan proxy_patch(TargetSpec $nodes, String $no_proxy_fqdn_list){
  # Split targets into windows and linux OS
  $resolved_targets = get_targets($nodes)
  $resolved_targets.apply_prep
  $partition = $resolved_targets.partition |$target| {$target.facts['os']['family'] == 'windows'}
  $windows_targets = $partition[0]
  $nix_targets = $partition[1]

  # Use windows_env module to set global NO_PROXY environment var
  apply($windows_targets) {
    windows_env { 'NO_PROXY':
      ensure    => present,
      mergemode => clobber,
      value     => $no_proxy_fqdn_list,
    } ~>
    service { 'puppet':
      ensure => 'running'
    }
  }

  # Use stdlib module to set global NO_PROXY environment var
  apply($nix_targets) {
    file_line { "no_proxy_env_var":
      ensure  => present,
      line    => "NO_PROXY=${no_proxy_fqdn_list}",
      path    => "/etc/environment",
    } ~>
    service { 'puppet':
      ensure => 'running'
    }
  }

  # Add the 'no_proxy' option to puppet conf
  run_task('puppet_conf', $resolved_targets, 'action' => 'set', 'setting' => 'no_proxy', 'value' => $no_proxy_fqdn_list)
}

The plan accepts two parameters, $nodes and $no_proxy_fqdn_list. The $nodes parameter represents the targets we wish to run the plan against. The FQDN list is a comma-separated list containing the FQDN of your Puppet Server. It is important to note that with this implementation the NO_PROXY environment variable will always be replaced with the $no_proxy_fqdn_list argument. You may consider modifying the plan to query the contents of NO_PROXY and append the $no_proxy_fqdn_list argument as you see fit.
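As a hedged sketch of that append approach (illustrative only, not part of the plan above), the merge could look like this in the plan language, given an existing value read from the target, for example with run_command:

```puppet
# Illustrative only: merge an existing list with the new FQDNs, de-duplicating,
# before handing the combined value to the resources in the apply blocks.
$existing = 'localhost,127.0.0.1'   # e.g. read from the target via run_command
$combined = ($existing.split(',') + $no_proxy_fqdn_list.split(',')).unique.join(',')
```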

The first step of the plan partitions the targets based on operating system. We want to use different modules for managing system environment variables based on target OS. We accomplish environment variable management by applying Puppet manifest code. Specifically, we use the windows_env resource to set the NO_PROXY environment variable on Windows targets and the file_line resource from the stdlib module to manage /etc/environment on our Linux targets. In both cases we notify the Puppet service that environment variables have changed.

Once we have set the environment variable we can use a task from the puppet_conf module to configure the no_proxy Puppet setting. Note that the task is cross-platform, so we do not need different invocations based on target OS!

Before we can run the plan, we need to download the puppet-windows_env and puppetlabs-stdlib modules (the puppet_conf module ships with the Bolt system packages). To do that, save the following to a file called Puppetfile in the same directory as bolt.yaml.

mod 'puppet-windows_env', '3.2.0'
mod 'puppetlabs-stdlib', '6.1.0'

We can use Bolt to install those modules with: bolt puppetfile install.

Now that we have the required modules, we can run the plan against all targets in our inventory (passed to the $nodes plan parameter), supplying the FQDN of our Puppet Server.

$ bolt plan run proxy_patch no_proxy_fqdn_list='localhost,127.0.0.1,https://q6b6x52w8k8xv1i.delivery.puppetlabs.net:8140' -t all
Winrm password please:
Starting: plan proxy_patch
Starting: install puppet and gather facts on tp5t3a5vq63c0ef, ox6m3vjvwj66xlr, x4yml978aq0ct77
Finished: install puppet and gather facts with 0 failures in 6.95 sec
Starting: apply catalog on x4yml978aq0ct77
Finished: apply catalog with 0 failures in 8.29 sec
Starting: apply catalog on tp5t3a5vq63c0ef, ox6m3vjvwj66xlr
Finished: apply catalog with 0 failures in 4.69 sec
Starting: task puppet_conf on tp5t3a5vq63c0ef, ox6m3vjvwj66xlr, x4yml978aq0ct77
Finished: task puppet_conf with 0 failures in 5.24 sec
Finished: plan proxy_patch in 25.18 sec
Plan completed successfully with no result

Now let's verify the environment variables were set as expected for both the Windows and Linux based targets.

$ bolt command run 'Write-Host $env:NO_PROXY' -t windows-agents
Winrm password please:
Started on x4yml978aq0ct77...
Finished on x4yml978aq0ct77:
  STDOUT:
    localhost,127.0.0.1,https://q6b6x52w8k8xv1i.delivery.puppetlabs.net:8140
Successful on 1 node: x4yml978aq0ct77
Ran on 1 node in 0.99 sec
$ bolt command run 'echo $NO_PROXY' -t linux-agents
Started on ox6m3vjvwj66xlr...
Started on tp5t3a5vq63c0ef...
Finished on ox6m3vjvwj66xlr:
  STDOUT:
    localhost,127.0.0.1,https://q6b6x52w8k8xv1i.delivery.puppetlabs.net:8140
Finished on tp5t3a5vq63c0ef:
  STDOUT:
    localhost,127.0.0.1,https://q6b6x52w8k8xv1i.delivery.puppetlabs.net:8140
Successful on 2 nodes: tp5t3a5vq63c0ef,ox6m3vjvwj66xlr
Ran on 2 nodes in 0.54 sec

We have confirmed with a Bolt command that the environment variables have been updated! Now let's check the Puppet setting with the puppet_conf task.

$ bolt task run puppet_conf action=get setting=no_proxy --targets all
Winrm password please:
Started on tp5t3a5vq63c0ef...
Started on ox6m3vjvwj66xlr...
Started on x4yml978aq0ct77...
Finished on ox6m3vjvwj66xlr:
  {
    "status": "localhost,127.0.0.1,https://q6b6x52w8k8xv1i.delivery.puppetlabs.net:8140",
    "setting": "no_proxy",
    "section": "main"
  }
Finished on tp5t3a5vq63c0ef:
  {
    "status": "localhost,127.0.0.1,https://q6b6x52w8k8xv1i.delivery.puppetlabs.net:8140",
    "setting": "no_proxy",
    "section": "main"
  }
Finished on x4yml978aq0ct77:
  {
    "status": "localhost,127.0.0.1,https://q6b6x52w8k8xv1i.delivery.puppetlabs.net:8140",
    "setting": "no_proxy",
    "section": "main"
  }
Successful on 3 nodes: tp5t3a5vq63c0ef,ox6m3vjvwj66xlr,x4yml978aq0ct77
Ran on 3 nodes in 6.8 sec

Now we've also confirmed that the no_proxy settings have been updated, and we can breathe easy knowing that our Windows and Linux Puppet agents won't be cut off from our Puppet Server by attempting to use a proxy connection instead of connecting directly.

Cas Donoghue is a software engineer at Puppet.
