published on 27 January 2012

You've bought Pro Puppet, downloaded a couple of modules from the Puppet Forge (and have written some of your own too), and you're on your way to implementing your Puppet environment when it hits you: something feels bulky with the way you've designed your Puppet code. Your modules may not be portable between environments (development, testing, production) without significant tweaks, each of your node declarations may require a number of variables in order for the code to work, or you're constantly needing to open up your modules to account for changes in your environment.

There's GOT to be an easier way to do this, right?

We hear stories from many customers about problems in their Puppet environments, and many of them can be traced back to the way their configuration data is integrated with their Puppet code. Configuration data is the term we use for the environment-specific data that needs to be plugged in to your Puppet code (i.e. variables, class parameters). Take the following bit of Puppet code for example:

$dnsserver    = '8.8.8.8'
$searchdomain = 'puppetlabs.vm'

file { '/etc/resolv.conf':
  ensure  => present,
  content => "search ${searchdomain}\n nameserver ${dnsserver}\n",
}

The configuration data in this example would be the hard-coded variables $dnsserver and $searchdomain and the Puppet code would be the file resource block declaring /etc/resolv.conf. This example is intentionally kept simple in order to highlight the methods by which you will separate your configuration data from your Puppet code, but imagine code that needs to set different variables in different environments (MySQL servers, databases, usernames, and passwords, for example) and you can see how the above example can quickly become unwieldy. How else can this be done?

Legacy Method - Node Inheritance

The first method that people usually tried was node inheritance. By defining variables in separate node definition blocks, and inheriting from a nested list of definitions, you could SIMULATE data separation with this method. This was the go-to method before Puppet 2.6 was released, and as such we consider it to be a legacy solution that we don't recommend using with versions of Puppet newer than 0.25 (note that if you're still using node inheritance, please read this advisory on dynamic scoping).

node common {
  $dnsserver    = '8.8.8.8'
  $searchdomain = 'puppetlabs.vm'
}

node production inherits common {
  $dnsserver = '10.13.1.3'
}

node 'agent.puppetlabs.vm' inherits production {
  file { '/etc/resolv.conf':
    content => "search ${searchdomain}\n nameserver ${dnsserver}\n",
  }
}

PROS

  • It was the easiest method to employ.
  • Your data was in one location and, technically, separate from your modules.

CONS

  • There was no easy way to find the value of a variable for a specific node.
  • FINDING the value of a variable required "human parsing," or reading through each and every node declaration to trace variable values.
  • The data still resided in your Puppet code repository.
  • There are better ways to implement this strategy, and this should be considered a legacy solution provided solely for information purposes.

Parameterized Classes

Puppet version 2.6 gave us the ability to pass parameters with class declarations. This allows you to completely remove configuration data from your classes and provide 'sane' default values should a class declaration not pass a parameter. While this is an entry-level step in beginning to separate your configuration data from your Puppet code (the data is now in its own class—in this case dns::params), the configuration data is STILL in your Puppet code repository (and thus isn't a full separation). See below for an example:

class dns::params {
  $dnsserver    = '8.8.8.8'
  $searchdomain = 'puppetlabs.vm'
}

class dns(
  $dnsserver    = $dns::params::dnsserver,
  $searchdomain = $dns::params::searchdomain
) inherits dns::params {

  file { '/etc/resolv.conf':
    content => "search ${searchdomain}\n nameserver ${dnsserver}\n",
  }
}

PROS

  • Class parameters can be defaulted back to a 'sane' value as outlined in our Smart Parameter Defaults document.
  • Modules that utilize this methodology are more portable—parameters need only be changed in a single 'params' class.

CONS

  • All logic must be embedded in each module's 'params' class.
  • If you use this methodology to keep your configuration data separate, every module must have a 'params' class and any logic you introduce (picking different values based on operating system, for example) must be repeated in every module.
  • The data isn't truly separate from your Puppet code as it still resides INSIDE the module (and, technically, your Puppet code repository).

External Node Classifier

Many large sites decide to use an External Node Classifier script to solve the problem of looking up configuration data. External Node Classifiers (also known as ENCs) allow you to provide class declarations, parameters, and variables to Puppet in the form of YAML. The previous example would look like this in YAML:

classes:
  - dns
parameters:
  searchdomain: 'puppetlabs.vm'
  dnsserver:    '8.8.8.8'

PROS

  • Flexible - you design how the information lookup is done (query a database, parse a hostname or other Facter fact, etc).
  • Can be written in any language: shell, perl, ruby, python, etc…
  • Plugs into your existing CMDB (Configuration Management Database) to retrieve information that already exists in another source of truth.

CONS

  • You are responsible for writing and maintaining the External Node Classifier script.
  • If the script breaks, your Puppet runs are endangered.
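
To make the ENC contract concrete, here is a minimal sketch in Ruby. It is not a production classifier: the NODE_DATA hash and the classify helper are hypothetical stand-ins for whatever database or CMDB query you would really perform. What it does illustrate is the interface: Puppet calls the script with the node's certname as its only argument, reads YAML from standard output, and treats a non-zero exit code as a failure.

```ruby
#!/usr/bin/env ruby
require 'yaml'

# Hypothetical in-memory data source; a real ENC would query a CMDB or database.
NODE_DATA = {
  'agent.puppetlabs.vm' => {
    'classes'    => ['dns'],
    'parameters' => {
      'searchdomain' => 'puppetlabs.vm',
      'dnsserver'    => '8.8.8.8'
    }
  }
}.freeze

# Returns the classification for a node as a YAML string, or nil if the node is unknown.
def classify(certname)
  data = NODE_DATA[certname]
  data && data.to_yaml
end

# Only act when run as a script with a certname argument.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  output = classify(ARGV[0])
  # abort prints to stderr and exits non-zero, which signals a failed lookup to Puppet.
  abort("unknown node: #{ARGV[0]}") unless output
  puts output
end
```

To try something like this, you would set node_terminus = exec and point external_nodes at the script's path in the master's puppet.conf.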

Extlookup

Extlookup was introduced in Puppet version 2.6.0 as a hierarchical way to look up the values of parameters or variables based on Facter fact values. To use Extlookup, you would first define a data directory holding the files that Extlookup searches, and then specify a lookup precedence built from fact values (location, environment, operatingsystem, etc): look for a parameter/variable in a file named after the node's certname FIRST, then in a file named after the node's environment SECOND, and so on. Finally, you would assign a parameter/variable's value by calling the built-in (as of Puppet version 2.6.0) 'extlookup()' function.

$extlookup_datadir    = "/etc/puppetlabs/puppet/data"
$extlookup_precedence = [$environment, 'common']

node 'agent.puppetlabs.vm' {
  include dns
}

class dns {
  $dnsserver    = extlookup('dnsserver')
  $searchdomain = extlookup('searchdomain')

  file { '/etc/resolv.conf':
    content => "search ${searchdomain}\n nameserver ${dnsserver}\n",
  }
}

Sample common.csv file used with Extlookup

dnsserver,8.8.8.8
searchdomain,puppetlabs.vm

PROS

  • Extlookup supports a dynamic and hierarchical lookup based on a node's Facter fact values.
  • There could be a single node declaration that would use Extlookup to look up the value of every variable/parameter used in Puppet.
  • The extlookup() function is built into Puppet as of version 2.6.0.

CONS

  • You must use comma-separated value files (CSV) ONLY for your lookups (i.e. variable, value), so structured data (like arrays and hashes) is not supported.
  • Data lookups only return the first-matched value.
  • It doesn't have the ability to concatenate a list of matches returned throughout the full hierarchy.
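
The first-match behavior is easier to see in a small simulation. The Ruby sketch below is not Extlookup's actual implementation. It is an illustration under assumptions: the DATA_FILES hash stands in for CSV files in the data directory, the production entry and its value are hypothetical, and the helper names parse_csv and first_match are made up for this example.

```ruby
# In-memory stand-ins for CSV files in the data directory (hypothetical contents).
DATA_FILES = {
  'production' => "dnsserver,10.13.1.3\n",
  'common'     => "dnsserver,8.8.8.8\nsearchdomain,puppetlabs.vm\n"
}.freeze

# Parse simple "key,value" lines into a hash, skipping blank or malformed lines.
def parse_csv(text)
  pairs = text.each_line.map { |line| line.strip.split(',', 2) }
  pairs.select { |pair| pair.length == 2 }.to_h
end

# Walk the precedence list in order and return the first value found.
# Sources without a data file are skipped; later matches are never consulted.
def first_match(key, precedence)
  precedence.each do |source|
    next unless DATA_FILES.key?(source)
    values = parse_csv(DATA_FILES[source])
    return values[key] if values.key?(key)
  end
  raise KeyError, "no value found for #{key}"
end

puts first_match('dnsserver', ['production', 'common'])
puts first_match('searchdomain', ['production', 'common'])
```

With a precedence of production then common, the dnsserver lookup stops at the production data, while searchdomain falls through to common. Nothing in this model combines the two dnsserver entries into a list, which is the limitation described above.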

Introducing: Hiera

Hiera, short for "hierarchy" and written by R.I. Pienaar, is a pluggable, hierarchical database that can query YAML and JSON files (and any other data serialization for which you write a custom backend), as well as Puppet manifests, for configuration data. Hiera builds upon the model that Extlookup created and also adds support for structured data. With Hiera, you can dynamically look up parameters based on a node's Facter facts. Let's look at configuring Hiera for use with the previous example:

The hiera.yaml configuration file:

---
:backends:
  - yaml
:hierarchy:
  - "%{environment}"
  - common
:yaml:
  :datadir: /etc/puppetlabs/puppet/hieradata

The common.yaml file that Hiera uses for parameter lookup:

---
dnsserver    : '8.8.8.8'
searchdomain : 'puppetlabs.vm'

Puppet code using Hiera:

class dns {
  $dnsserver    = hiera('dnsserver')
  $searchdomain = hiera('searchdomain')

  file { '/etc/resolv.conf':
    content => "search ${searchdomain}\n nameserver ${dnsserver}\n",
  }
}

PROS

  • Data is truly separated from your Puppet code—it exists in an entirely separate directory structure.
  • Parameter lookup is hierarchical and dynamic based on Facter facts that describe your node.
  • Hiera supports structured data—like arrays and hashes—that can be fed back to Puppet.
  • Using Hiera, your Puppet modules contain zero proprietary data (which makes the module much more portable).
  • Hiera will be integrated with the next version of Puppet (codenamed Telly).

CONS

  • As of this writing, Hiera is not YET built into Puppet, so utilizing it requires an initial installation step.
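
To show how a hierarchy turns into a lookup, here is a rough Ruby model of the idea. It is a sketch under assumptions, not Hiera's own code: the HIERADATA contents (including the production values and the nameservers key) are hypothetical, and resolve_hierarchy, lookup, and lookup_array are illustrative names rather than Hiera's API.

```ruby
require 'yaml'

# Hypothetical data files, inlined as YAML strings so the sketch is self-contained.
HIERADATA = {
  'production' => YAML.load(<<~PRODUCTION),
    dnsserver: '10.13.1.3'
    nameservers:
      - '10.13.1.3'
  PRODUCTION
  'common' => YAML.load(<<~COMMON)
    dnsserver: '8.8.8.8'
    searchdomain: 'puppetlabs.vm'
    nameservers:
      - '8.8.8.8'
  COMMON
}.freeze

HIERARCHY = ['%{environment}', 'common'].freeze

# Replace %{name} tokens with values from the node's scope (facts and variables).
def resolve_hierarchy(scope)
  HIERARCHY.map do |level|
    level.gsub(/%\{(\w+)\}/) { scope.fetch(Regexp.last_match(1), '') }
  end
end

# Priority lookup: the first hierarchy level that defines the key wins.
def lookup(key, scope)
  resolve_hierarchy(scope).each do |level|
    data = HIERADATA[level]
    return data[key] if data && data.key?(key)
  end
  nil
end

# Array lookup: collect values from every level, most specific first.
def lookup_array(key, scope)
  resolve_hierarchy(scope).flat_map do |level|
    data = HIERADATA[level]
    data && data.key?(key) ? Array(data[key]) : []
  end.uniq
end

scope = { 'environment' => 'production' }
puts lookup('dnsserver', scope)
puts lookup_array('nameservers', scope).inspect
```

The priority lookup mirrors the first-match behavior of Extlookup, while the array lookup gathers matches from the whole hierarchy, which is the kind of structured result Extlookup could not return.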

Conclusion

While there are a myriad of options to solve the problem of configuration data and Puppet code separation, we recommend using Hiera for its ability to adapt to every situation. This post only gives a brief glimpse of its awesome functionality. Stay tuned for a post dedicated to Hiera, where we will be looking in-depth at its usage, flexibility, and advanced features that can simplify the management of your environment whether you're a sysadmin of 10, 100, or 10,000 nodes!


Good analysis; in the end, Hiera is the winner. It has been supported since PE 3.x, and the class can be shorter:

class dns (
  $dnsserver,
  $searchdomain
) {
  file { '/etc/resolv.conf':
    content => "search ${searchdomain}\n nameserver ${dnsserver}\n",
  }
}
