
First Look: Installing and Using Hiera (part 1 of 2)

In a previous blog post, we introduced use cases for separating configuration data from Puppet code. This post (part one of a two-part series) goes in depth on installing, configuring, and using Hiera, but let's first look at WHY we would need Hiera.

Introduction to the SSH module

One of the benefits of Hiera is its ability to take an existing module and adapt it to a hierarchical-based lookup system. Typically, one of the first modules that people adapt to Puppet code is the SSH module. Let's look at a simple ssh class definition:
class ssh {
  $ssh_packages      = ['openssh','openssh-clients','openssh-server']
  $permit_root_login = 'no'
  $ssh_users         = ['root','jeff','gary','hunter']

  package { $ssh_packages:
    ensure => present,
    before => File['/etc/ssh/sshd_config'],
  }

  file { '/etc/ssh/sshd_config':
    ensure  => present,
    owner   => 'root',
    group   => 'root',
    mode    => '0644',
    # Template uses $permit_root_login and $ssh_users
    content => template('ssh/sshd_config.erb'),
  }

  service { 'sshd':
    ensure     => running,
    enable     => true,
    hasstatus  => true,
    hasrestart => true,
  }
}
The template used above looks like the following:
Protocol 2
SyslogFacility AUTHPRIV
PasswordAuthentication yes
ChallengeResponseAuthentication no
GSSAPIAuthentication yes
GSSAPICleanupCredentials yes

# PermitRootLogin Setting
PermitRootLogin <%= @permit_root_login %>

# Allow individual Users
<% @ssh_users.each do |user| -%>
AllowUsers <%= user %>
<% end -%>

# Accept locale-related environment variables
X11Forwarding yes
Subsystem	sftp	/usr/libexec/openssh/sftp-server
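To see what the dynamic part of this template produces, the following Ruby sketch renders a cut-down copy of it with the standard ERB library. It runs outside Puppet, so the values are passed in as plain method arguments; the `render_sshd_config` helper and the `SSHD_TEMPLATE` constant are names invented for this example, and Ruby 2.6 or later is assumed for the `trim_mode:` keyword. The sketch uses the `AllowUsers` directive that sshd expects.

```ruby
require 'erb'

# Cut-down copy of the sshd_config template (assumption: only the
# dynamic lines are kept; local variables stand in for Puppet variables).
SSHD_TEMPLATE = <<~'TEMPLATE'
  # PermitRootLogin Setting
  PermitRootLogin <%= permit_root_login %>

  # Allow individual Users
  <% ssh_users.each do |user| -%>
  AllowUsers <%= user %>
  <% end -%>
TEMPLATE

# Hypothetical helper: renders the template with the given values.
# trim_mode '-' makes a trailing "-%>" swallow the newline after the tag.
def render_sshd_config(permit_root_login, ssh_users)
  ERB.new(SSHD_TEMPLATE, trim_mode: '-').result(binding)
end

puts render_sshd_config('no', %w[root jeff gary hunter])
```

Each element of the users array becomes its own AllowUsers line, which is why the shape of the array returned by a lookup matters later in this post.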
This module declares three packages (openssh, openssh-clients, openssh-server), ensures a proper sshd_config file, and starts the sshd service. While this works fine for RedHat distributions, the module will fail on other Linux variants (such as Debian or Ubuntu), where the packages have different names. Normally, logic is introduced into the module that decides which package names to use based on the operating system of the node. Instead of doing that, let's use Hiera to solve our problem by changing three lines:
$ssh_packages      = hiera('ssh_packages')
$permit_root_login = hiera('permit_root_login')
$ssh_users         = hiera('ssh_users')
Instead of hard-coding the values, we now use Hiera to look up the packages to declare in our module, the users to permit, and the permit_root_login parameter used in the sshd_config file. Hiera will still return an array for the $ssh_packages and $ssh_users variables, but the elements in that array can change depending on the operating system of the node. Before we can do this, though, we need to set up Hiera, its hierarchy, and the data directory that it will use for parameter lookups.

Install Hiera

As of this writing, Hiera is not installed with Puppet or Puppet Enterprise and must be installed using RubyGems—though it will be included in the next version of Puppet. Hiera has two separate gems: hiera and hiera-puppet. The hiera gem contains the hiera library source code, the default YAML backend, and the hiera binary that can be used to execute lookups from the command line. The hiera-puppet gem contains the custom functions necessary to call Hiera from Puppet. To install these libraries, do the following:
gem install hiera hiera-puppet
(Note that if you're running Puppet Enterprise, you will need to use the gem binary located in /opt/puppet/bin.)

The last step is to load the custom Hiera functions that Puppet needs for parameter lookups into Puppet itself. These functions come bundled with the hiera-puppet gem, but they are currently installed into your system's gem path, where Puppet does not load them. To remedy this, let's download a copy of hiera-puppet from source and place it in our Puppet Master's modulepath so that the functions are available from within Puppet.
    1. Get your Puppet Master's module path by entering puppet master --configprint modulepath
    2. Change to the modulepath directory that was output from the previous step
    3. Enter the following command to download a tarball of the hiera-puppet source code, create a directory called 'hiera-puppet', expand the contents of the tarball to the 'hiera-puppet' directory, and remove the 'hiera-puppet' tarball:
curl -L https://github.com/puppetlabs/hiera-puppet/tarball/master -o 'hiera-puppet.tar.gz' \
  && mkdir hiera-puppet \
  && tar -xzf hiera-puppet.tar.gz -C hiera-puppet --strip-components 1 \
  && rm hiera-puppet.tar.gz
Now the custom Hiera functions are available to be used by the Puppet Master. Let's move on to configuring Hiera.

Configuring Hiera with YAML & hiera.yaml

Hiera is configured through the /etc/puppetlabs/puppet/hiera.yaml configuration file. This file is written in YAML, a data serialization format that is simple, human-readable, and widely supported by scripting languages. The hiera.yaml configuration file is what Hiera uses to determine the order of its lookups and the location of the data directory where the YAML data files are kept. Let's look at an example hiera.yaml configuration file that we can drop into place for our ssh module and break it down piece by piece:
---
:hierarchy:
    - '%{operatingsystem}'
    - common
:backends:
    - yaml
:yaml:
    :datadir: '/etc/puppetlabs/puppet/hieradata'
We see that our chosen backend is YAML, and that our data will be stored in /etc/puppetlabs/puppet/hieradata instead of being embedded in our modules. This is looking promising! The last, and also the most important, piece is the hierarchy itself. We've chosen to have two levels: a common level that applies to all hosts, and a higher-priority level that contains any operating-system-specific data. When we call the hiera() function in Puppet, Hiera reads its hiera.yaml configuration file to find the backends to query and the directory where the backend data is kept. Let's look at how we might add configuration data to Hiera's datadir.
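As a rough illustration of what this configuration tells Hiera, the following Ruby sketch parses the same settings and expands the hierarchy into the ordered list of data files to consult. It is not Hiera's own code: the `candidate_files` helper, the `HIERA_YAML` constant, and the hand-written scope hash of fact values are assumptions made for this example.

```ruby
require 'yaml'

# Same settings as the hiera.yaml example in this section.
HIERA_YAML = <<~'CONFIG'
  ---
  :hierarchy:
    - '%{operatingsystem}'
    - common
  :backends:
    - yaml
  :yaml:
    :datadir: '/etc/puppetlabs/puppet/hieradata'
CONFIG

# Hypothetical helper: turns each hierarchy level into a data file path,
# replacing %{name} tokens with values from the scope hash.
def candidate_files(config_text, scope)
  # Keys such as :hierarchy are loaded as Ruby symbols, so Symbol must be permitted.
  config  = YAML.safe_load(config_text, permitted_classes: [Symbol])
  datadir = config[:yaml][:datadir]
  config[:hierarchy].map do |level|
    name = level.gsub(/%\{([^}]+)\}/) { scope.fetch(Regexp.last_match(1)) }
    File.join(datadir, "#{name}.yaml")
  end
end

# Highest-priority file first, common.yaml last.
puts candidate_files(HIERA_YAML, 'operatingsystem' => 'RedHat')
```

The order of the returned paths mirrors the order of the hierarchy, which is what gives the operating-system level priority over the common level.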

Introduction to the YAML data backend

The YAML data backend is the quickest Hiera backend to begin using, and is included with Hiera. YAML is an extremely readable data serialization format, so it makes sense to utilize it if you don't have a specific need for another format. In the hiera.yaml configuration file above, we created a hierarchy of two levels: %{operatingsystem} and common. Assuming that we are configuring a RedHat system, Hiera will look in the datadir directory for two files in this order: RedHat.yaml and common.yaml. Why? The highest level in the hierarchy queries Facter for the operatingsystem fact (which, in this case, returns 'RedHat'), and then searches for a YAML file of that name. The second level is just the string common, so it looks for a file called 'common.yaml'. Let's take a look at those files:


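# /etc/puppetlabs/puppet/hieradata/RedHat.yaml
---
ssh_packages:
  - 'openssh'
  - 'openssh-clients'
  - 'openssh-server'

# /etc/puppetlabs/puppet/hieradata/common.yaml
---
permit_root_login: 'no'
ssh_users:
  - root
  - jeff
  - gary
  - hunter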
With the hiera.yaml configuration file set up and our Hiera data directory containing YAML files, we can begin performing lookups and inspecting the resulting data.

Hiera data lookups

Using our RedHat node and the current Hiera setup, what would be the value of $permit_root_login in this line from our ssh Puppet manifest:
$permit_root_login = hiera('permit_root_login')
The answer is 'no'. How did we get that? Hiera performed a lookup for 'permit_root_login' and searched the highest priority file in the hierarchy - RedHat.yaml (based on the node's 'operatingsystem' fact being the string 'RedHat'). Hiera didn't find the parameter in that file so it moved to the next, and final, level of the hierarchy and searched common.yaml. Because the parameter is defined in common.yaml, it returned the value back to Puppet. What if we wanted all RedHat nodes to set the value of $permit_root_login to be 'without-password'? Using Hiera, we would modify the RedHat.yaml file and add the following line:
permit_root_login : 'without-password'
Because the RedHat.yaml file is queried BEFORE the common.yaml file, RedHat nodes would get this value, while all other nodes would get the value of 'no' from common.yaml. Taking this example one step further, what if we wanted all Debian nodes to have the value of $permit_root_login set to 'yes'? We would need to create a file called Debian.yaml, place it in the Hiera data directory, and enter the following:
permit_root_login : 'yes'
Now, when a Debian node contacted Puppet, Hiera would query the Debian.yaml file BEFORE common.yaml, and the value of $permit_root_login would get the value set in Debian.yaml (which, in this case, would be 'yes'). This logic could be repeated over and over for any parameter and with as many hierarchy levels as you desire.
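The priority lookup described in this section can be sketched in a few lines of Ruby. This is a simplified model rather than Hiera's implementation: the `priority_lookup` helper and the in-memory `PRIORITY_DATA` hash (standing in for the files in the data directory, with both overrides applied) are assumptions made for this example, and the sketch returns nil for a missing key.

```ruby
require 'yaml'

# In-memory stand-ins for RedHat.yaml, Debian.yaml and common.yaml.
PRIORITY_DATA = {
  'RedHat' => YAML.safe_load("permit_root_login: 'without-password'"),
  'Debian' => YAML.safe_load("permit_root_login: 'yes'"),
  'common' => YAML.safe_load("permit_root_login: 'no'")
}.freeze

# Hypothetical helper: walks the hierarchy from highest to lowest priority
# and returns the first value found for the key.
def priority_lookup(key, operatingsystem, data = PRIORITY_DATA)
  [operatingsystem, 'common'].each do |level|
    values = data[level]
    return values[key] if values && values.key?(key)
  end
  nil
end

puts priority_lookup('permit_root_login', 'RedHat')  # without-password
puts priority_lookup('permit_root_login', 'Debian')  # yes
puts priority_lookup('permit_root_login', 'Solaris') # no (falls through to common)
```

A node whose operating system has no data file of its own simply falls through to the common level, which is why common.yaml is a good home for site-wide defaults.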

Beyond Basic Lookups: Concatenating Values With Hiera

By default, Hiera uses a priority lookup, which means that the first time it encounters a parameter in the hierarchy it accepts that value and returns it to Puppet. This is how higher levels in the hierarchy can override values that might be set in lower levels. What if you wanted to search through ALL levels of the hierarchy and return EVERY value for a specific parameter? Hiera has that ability with the hiera_hash() and hiera_array() functions.

There are two variables that currently return arrays: $ssh_packages and $ssh_users. Right now, both are set with a priority lookup, so the ENTIRE array comes from the first hierarchy level in which Hiera encounters the 'ssh_users' or 'ssh_packages' parameter. What if we wanted $ssh_users to always contain the root user, while the other users change depending on the operating system of the node? The best way to do this is the hiera_array() function, which searches ALL hierarchy levels and returns an array containing the values of ssh_users from EVERY hierarchy level in which it encountered the parameter. Let's modify our Hiera YAML files to reflect this change:


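# /etc/puppetlabs/puppet/hieradata/common.yaml
---
permit_root_login: 'no'
ssh_users:
  - 'root'

# /etc/puppetlabs/puppet/hieradata/RedHat.yaml
---
ssh_packages:
  - 'openssh'
  - 'openssh-clients'
  - 'openssh-server'
ssh_users:
  - 'gary'
  - 'jeff'

# /etc/puppetlabs/puppet/hieradata/Debian.yaml
---
permit_root_login: 'yes'
ssh_users:
  - 'hunter'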
Finally, modify the following line in the ssh module:
$ssh_users         = hiera_array('ssh_users')
After making the changes, which users will be added to /etc/ssh/sshd_config file on a RedHat node? The answer is root, gary, and jeff. Why? The root user will ALWAYS be included in /etc/ssh/sshd_config because the common.yaml file that EVERY node evaluates contains the value of 'root' for the ssh_users parameter. Next, because this is a RedHat node, Hiera will concatenate the values of 'gary' and 'jeff' to the array because those are the values for the ssh_users parameter in RedHat.yaml. What if we run this on a Debian node? The answer is root and hunter (because the value of the ssh_users parameter in the Debian.yaml file is 'hunter').
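The difference from a priority lookup can be modeled with a short Ruby sketch that collects the value from every level instead of stopping at the first match. As before, this is an approximation and not Hiera's own code: the `array_lookup` helper and the in-memory `MERGE_DATA` hash are invented for this example, and the ordering (highest-priority level first) and the removal of duplicates are assumptions of the sketch.

```ruby
require 'yaml'

# In-memory stand-ins for the ssh_users entries in the three data files.
MERGE_DATA = {
  'common' => YAML.safe_load("ssh_users:\n  - root\n"),
  'RedHat' => YAML.safe_load("ssh_users:\n  - gary\n  - jeff\n"),
  'Debian' => YAML.safe_load("ssh_users:\n  - hunter\n")
}.freeze

# Hypothetical helper: gathers the key from EVERY hierarchy level that
# defines it and flattens the results into a single array.
def array_lookup(key, operatingsystem, data = MERGE_DATA)
  [operatingsystem, 'common']
    .map { |level| data.fetch(level, {})[key] }
    .compact
    .flatten
    .uniq
end

p array_lookup('ssh_users', 'RedHat') # ["gary", "jeff", "root"]
p array_lookup('ssh_users', 'Debian') # ["hunter", "root"]
```

Because every level contributes, adding a user to common.yaml affects all nodes, while per-OS files only ever add to the list and cannot remove an entry defined lower down.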

Hiera Best Practices

Hiera is still new to many people, and the concept of a hierarchical lookup system can seem a bit foreign initially. Because of this, there are a couple of best practices that are important to observe when getting started with Hiera and Puppet.

Keep hierarchies to a minimum

This is the time-proven rule of "Just because you can, doesn't mean you should." Hierarchy levels are incredibly dynamic tools that will allow you to do a number of things that were previously difficult, but too many of them can lead to problems when debugging (e.g. "Where was that parameter set, again?"). Three to four hierarchy levels should be enough for most sites; if you have more than that, you might want to rethink your approach.

Version control your Hiera data directory separately from your Puppet repository

The benefit of the :datadir: parameter in hiera.yaml is that its value can include variables, such as Facter fact values or the node's Puppet environment, to determine the path of your Hiera data directory. For example, a site using two Puppet environments called 'development' and 'production' that has implemented the ssh module we outlined above might have the following directory tree at /etc/puppetlabs/puppet/environments:
    |-- development
    |   |-- hieradata
    |   |   |-- Debian.yaml
    |   |   |-- RedHat.yaml
    |   |   `-- common.yaml
    |   |-- manifests
    |   |   `-- site.pp
    |   `-- modules
    |       `-- ssh
    `-- production
        |-- hieradata
        |   |-- Debian.yaml
        |   |-- RedHat.yaml
        |   `-- common.yaml
        |-- manifests
        |   `-- site.pp
        `-- modules
            `-- ssh
This site's hiera.yaml configuration file would look like the following:
---
:hierarchy:
    - '%{operatingsystem}'
    - common
:backends:
    - yaml
:yaml:
    :datadir: '/etc/puppetlabs/puppet/environments/%{environment}/hieradata'
Hiera automatically substitutes the name of the current environment for %{environment} in hiera.yaml, which allows each environment to have its own Hiera data directory that is kept completely separate from the Puppet manifests and modules.
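The effect of that substitution is easy to check with a small Ruby sketch. The `datadir_for` helper and the `DATADIR_TEMPLATE` constant are hypothetical names used only here; the sketch does plain string replacement and does not call Hiera.

```ruby
# The :datadir: value from the hiera.yaml shown in this section.
DATADIR_TEMPLATE = '/etc/puppetlabs/puppet/environments/%{environment}/hieradata'

# Hypothetical helper: fills in %{environment} to show which data directory
# a node in the given Puppet environment would read from.
def datadir_for(environment)
  DATADIR_TEMPLATE.gsub('%{environment}', environment)
end

%w[development production].each do |env|
  puts File.join(datadir_for(env), 'common.yaml')
end
```

Each environment resolves to its own copy of common.yaml, so a data change can be tried in development before it reaches production.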

What now?

This post serves as an introduction to using Hiera with Puppet and familiarizes you with the concepts of hierarchical lookup systems, priority lookups, multilevel lookups, and data separation. The concepts in this post will walk you through getting a working Hiera setup, but there is much more that can be done (Hiera as an ENC, custom backends, and so on). The next post in this series will introduce these advanced Hiera concepts and much more. Until then, enjoy experimenting with Hiera!