Introducing Puppet Data Service

What is this blog post about?

I work on the Solutions Architects team here at Puppet. We are sometimes the first line of defense in solving some of our customers' most challenging or unique problems, and every so often, we see trends in problems that we'd previously considered to be edge cases. For example, customers often ask us two questions:

  1. How can we connect external configuration data to Puppet?

  2. How can we let service owners change their own configuration faster, without onboarding them to Git approval processes?

These questions (and others) have inspired the Solutions Architects team to develop Puppet Data Service. Puppet Data Service (PDS) is an add-on for Puppet Enterprise that provides a REST API and database storage for configuration data. This article explains why PDS exists and how it can help your daily Puppet automation practices.

Management summary

PDS provides:

  1. A database for trusted node data

  2. A database-backed Hiera backend

  3. A REST API for managing the above

By providing these features, PDS enables new workflows for managing configuration without using Git processes, such as self-service workflows for infrastructure owners, DevOps teams or end users of infrastructure.

Please note: This software is not supported by Puppet and does not qualify for Puppet Support plans. It's provided without guarantee or warranty. Its status is experimental.

Puppet configuration 101

(Note: Skip this section if you are an experienced Puppet Enterprise (PE) admin and know the secrets of facts, trusted facts, and Hiera.)

As a seasoned practitioner, you know that one of the secrets of a successful configuration management strategy is keeping a Single Source of Truth. (This principle is closely related to “DRY,” or “Don't Repeat Yourself.”) The idea is that each piece of your configuration should be stored in one and only one predictable location. This is important because when you need to change that configuration, you want to modify it once, quickly and efficiently, and have all dependent configurations updated automatically. It is also important to realize that Puppet code is usually not the right place to store hardcoded configuration; you want your code to stay flexible and reusable across different teams, segments, and platforms within your infrastructure.

Puppet Enterprise has many options for storing, maintaining, and retrieving configuration data to be used in code. In this article, we will focus on the two most frequently used configuration points: trusted facts and Hiera.

Facts and trusted facts

Puppet comes with many built-in facts which are available on any Puppet node. Puppet teams can also write their own custom facts.

A disadvantage of facts is that the node itself supplies them at the start of a Puppet run. A user with admin access to a node can therefore tamper with their values on that node, possibly impersonating another node type and thus changing the node's configuration.

To prevent this from happening, Puppet offers trusted facts. Those facts are built into the node certificate before signing and cannot be changed later without invalidating the certificate's signature. Because these facts are tamper-resistant, updating them is a heavy process, requiring creating and signing a new certificate for the node.

Hiera data

So our node sends all its facts to the Puppet server and asks it to provide configuration. The Puppet server compiles a catalog using modules and classes available to it which are assigned to the node using the Node Classifier (usually, the PE Console). Classes reside within Puppet modules written in the Puppet language.

Puppet modules (such as puppetlabs-mysql for managing a MySQL server, or puppetlabs-ntp for managing the NTP service) are designed to be reusable. This means that they are generic and do not contain configuration information specific to anyone's environment.

Hiera is a hierarchical database for configuration data looked up by Puppet during catalog compilation. This way, you achieve a single source of truth again; all "logic" for your configuration is in Puppet code, and all "data" for your configuration is inside your Hiera database. This allows the general purpose Forge modules to be customized to your specific needs without changing their code.

By default, the Hiera database consists of YAML text files stored on the Puppet server. You determine how the files are organized and in which order they are looked up.
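To make the lookup order concrete, here is a minimal Python sketch of a first-match hierarchical lookup. It is a toy model, not Hiera itself: the level names, the keys, and the in-memory `DATA` dictionary are all made up for illustration and stand in for YAML files on disk.

```python
# Toy model of a Hiera-style hierarchy. Each level stands in for one YAML file.
# All level names, keys, and values here are illustrative assumptions.
DATA = {
    "nodes/web01.example.com": {"ntp::servers": ["10.0.0.1"]},
    "os/RedHat": {
        "ntp::servers": ["rhel.pool.example.com"],
        "motd::content": "RHEL host\n",
    },
    "common": {
        "ntp::servers": ["pool.example.com"],
        "motd::content": "Welcome\n",
    },
}


def hierarchy_for(facts):
    """Return the levels to search, most specific first."""
    return [
        "nodes/{}".format(facts["certname"]),
        "os/{}".format(facts["os_family"]),
        "common",
    ]


def lookup(key, facts, data=DATA):
    """Return the value from the first level that defines the key."""
    for level in hierarchy_for(facts):
        level_data = data.get(level, {})
        if key in level_data:
            return level_data[key]
    return None


web01 = {"certname": "web01.example.com", "os_family": "RedHat"}
db01 = {"certname": "db01.example.com", "os_family": "Debian"}

print(lookup("ntp::servers", web01))   # the node-level value wins
print(lookup("motd::content", web01))  # falls through to the OS level
print(lookup("motd::content", db01))   # falls through to common
```

Real Hiera also offers merge strategies beyond first-match; this sketch only models the default behavior of returning the first value found.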

Usually, Hiera data is part of a Git repository we call the control repo. To change Hiera data, you need to follow the process for submitting a Git pull (or merge) request and get an approval, get the change tested, and finally deployed. This is well-suited to the Puppet platform team, but less so for end users, such as DevOps teams or service owners who just want to quickly change a configuration item inside their application.

PDS to the rescue

Puppet Data Service, or PDS for short, is a Puppet Enterprise extension that provides you with an additional way to store and access configuration data for Puppet.

PDS solves the problems described above by offering the following features for Puppet teams and users:

  1. It provides an alternative source (database) for trusted node data, appearing to Puppet as trusted facts that the local node admin cannot tamper with. Site administrators can update this trusted data without requiring certificate changes on every update!

  2. It provides an additional Hiera backend, so that arbitrary data can be stored in the PDS database and benefit from the simplified management workflows.

  3. It allows management of this data with either an easy-to-use CLI tool or a simple and robust REST API.

  4. Since you can add any data you wish, storing class parameters in PDS lets service owners supply their own classification, which is merged with the centralized classification from the site administrators.

Now service owners or DevOps teams can start managing their own configuration data without going through the full Git pull request and approval process. You can also combine the PDS Hiera backend with the original flat-file YAML backend, giving the PDS backend priority but keeping the YAML backend as a fallback.

[Diagram: a possible architecture including PDS]

In the diagram above, you see a possible architecture including PDS:

  • A self-service portal (ServiceNow in this example) calling the PDS REST API to manage node and Hiera data.
  • PDS server (the API service, usually installed on the PE Primary and compilers).
  • A modular backend interface that provides compatibility with various databases. At the time of writing, PDS supports PE-PostgreSQL.

PDS is compatible with, and has been tested on, PE 2019 and PE 2021.4 or newer versions.
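As a rough illustration of what a self-service portal might do when it calls the PDS REST API, the Python sketch below builds (but does not send) an authenticated JSON request using only the standard library. Every specific in it is an assumption made for the example: the host name, port, URL path, payload fields, HTTP method, and the `Bearer` authorization scheme are placeholders. Consult the PDS README for the actual API.

```python
import json
import urllib.request


def build_pds_request(base_url, path, token, payload, method="PUT"):
    """Build an authenticated JSON request object without sending it."""
    url = "{}/{}".format(base_url.rstrip("/"), path.lstrip("/"))
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(url=url, data=body, method=method)
    request.add_header("Content-Type", "application/json")
    # Assumption: the token is passed as a bearer token.
    request.add_header("Authorization", "Bearer {}".format(token))
    return request


# Hypothetical values throughout; none of these come from the PDS documentation.
request = build_pds_request(
    base_url="https://pe-primary.example.com:8160/v1",
    path="nodes/mynode.myorg.com",
    token="REPLACE_WITH_YOUR_TOKEN",
    payload={"classes": ["motd"]},
)

print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`, with TLS verification configured for your PE certificate authority.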

Installing PDS

The PDS code lives in the GitHub repository puppetlabs/puppet-data-service. Its README documents the installation and configuration process.

The Puppet module which automates the installation is puppetlabs-puppet_data_service.

A short demonstration of installation and use of PDS can be found on YouTube.

A simple "happy path" installation flow for the PE Standard architecture (without Compilers) goes like this:

  1. Install the puppetlabs-puppet_data_service module

  2. Classify your primary server as follows:

[Screenshot: PE Console classification of the primary server]

You can get the URL to the PDS package for your OS and PE version from https://github.com/puppetlabs/puppet-data-service/releases. Do not specify the token parameter here.

  3. Add the PDS token in the Configuration data tab:

[Screenshot: PDS token in the Configuration data tab]

You’ll need to specify a random string that will be used as your own token value when authenticating CLI commands or API calls. By configuring the token in the Configuration data tab, the value will be automatically converted to Sensitive and not appear in any logs.

  4. Run the Puppet agent on the primary server.
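For the token configured in the Configuration data tab, any sufficiently long random string should do. One possible way to generate it is with Python's standard `secrets` module, as sketched below. The function name `generate_pds_token` and the 32-byte length are choices made for this example, and whether PDS places any format requirements on the token is not covered here.

```python
import secrets


def generate_pds_token(num_bytes=32):
    """Return a URL-safe random string built from num_bytes of OS randomness."""
    return secrets.token_urlsafe(num_bytes)


token = generate_pds_token()
print(token)
```

Treat the generated value like a password: paste it into the Configuration data tab and keep it out of shell history and shared documents.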

Now that PDS is configured, you can add it to your Hiera configuration so that the Puppet server can use PDS data when compiling catalogs.

Example use case for PDS

In this section, we will look at a simple use case that shows how PDS can be useful.

Let's suppose that our organization requires some Linux servers to have a specific MOTD (Message of the Day) configured. The OS platform team doesn't have detailed information and wishes to delegate this configuration to the service owners.

Modifying Puppet setup

Let's assume that the platform team has already installed and configured PDS so it's available to everyone in the organization. They have also added the following Hiera configuration to the central hiera.yaml file, thereby configuring PDS as an additional Hiera backend:


By placing the PDS-backed hierarchy before the default YAML ones, they made sure that Hiera lookups first look inside PDS before consulting the YAML backend. This way it’s now possible to override Hiera YAML values using PDS.
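The override behavior can be modeled in a few lines of Python. This is a conceptual sketch only: the two dictionaries stand in for the PDS and YAML backends, and the keys and values are invented for illustration.

```python
def lookup_with_fallback(key, backends):
    """Search backends in order; return (backend_name, value) for the first hit."""
    for name, data in backends:
        if key in data:
            return name, data[key]
    return None, None


# Illustrative data only: in a real setup the first backend would query PDS
# and the second would read YAML files from the control repo.
pds_data = {"motd::content": "Maintenance window: Saturday 02:00 UTC\n"}
yaml_data = {"motd::content": "Welcome\n", "ntp::servers": ["pool.example.com"]}

# Order matters: PDS is consulted before YAML.
backends = [("pds", pds_data), ("yaml", yaml_data)]

print(lookup_with_fallback("motd::content", backends))  # answered by "pds"
print(lookup_with_fallback("ntp::servers", backends))   # answered by "yaml"
```

A key present in both backends is served from PDS, while a key that exists only in YAML still resolves, which is the fallback behavior described above.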

Using PDS node data

Furthermore, in site.pp they have added the following line:


We will see below why and how this is useful.

Finally, the module puppetlabs/motd is installed on the Puppet primary server and the class motd is available for classification.

The service owners need a simple way to configure MOTD on the servers they own.

First, they need to make sure that the servers that need a MOTD file are classified with the class motd. To that end, they need to add this class to the node's data in PDS.


To verify what is specified for that node inside PDS, do:


We see that the motd class is now specified in the "classes" array.

Let’s see what this means for Puppet’s facts. Run:


Then, view the facts of the node in the PE Console. You will see something like this:


We can see the PDS-managed external facts for the node.
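As a rough model of that structure, the Python sketch below reads the class list found under `trusted.external.pds.classes` and falls back to an empty list when a node has no PDS data. It is not Puppet code, and the exact shape of the `trusted` dictionary beyond the `external`, `pds`, and `classes` keys is an assumption.

```python
def pds_classes(trusted):
    """Return the list under trusted.external.pds.classes, or [] if absent."""
    node = trusted
    for part in ("external", "pds", "classes"):
        if not isinstance(node, dict) or part not in node:
            return []
        node = node[part]
    return list(node) if isinstance(node, list) else []


# Assumed shape of the trusted data for a node managed in PDS.
trusted = {
    "certname": "mynode.myorg.com",
    "external": {"pds": {"classes": ["motd"]}},
}

print(pds_classes(trusted))                          # ['motd']
print(pds_classes({"certname": "other.myorg.com"}))  # []
```

Returning an empty list for nodes without PDS data keeps the inclusion step harmless for nodes that nobody has registered yet.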

Since the first thing we did in this section was to add a line to site.pp including the classes defined in trusted.external.pds.classes, the class motd will now be included. If we now run Puppet on the node mynode.myorg.com, and then view the file /etc/motd, we will see something like this:


This is the default MOTD configured by the puppetlabs/motd module.

Now, to set a specific MOTD message, we can use PDS Hiera data.

Using PDS Hiera data

Execute the command:


You can view the Hiera data value inside PDS:


Or:


You can also use the puppet lookup command to verify that the Hiera value can be successfully looked up:


After you run Puppet on the node, the MOTD will be updated with the content you just configured in Hiera.

Conclusion

We have discussed the challenges PDS aims to solve. We then refreshed our memory of Puppet facts and Hiera data and looked at how PDS interacts with them. Finally, we presented a sample use case in which the message of the day (MOTD) was automated using Puppet and PDS, without any Git operations on Hiera data or the control repo.
