About Hiera
Puppet’s strength is in reusable code. Code that serves many needs must be configurable: put site-specific information in external configuration data files, rather than in the code itself.
Puppet uses Hiera to do two things:
- Store the configuration data in key-value pairs
- Look up what data a particular module needs for a given node during catalog compilation. This is done via:
  - Automatic Parameter Lookup for classes included in the catalog
  - Explicit lookup calls
Hiera’s hierarchical lookups follow a “defaults, with overrides” pattern, meaning you specify common data one time, and override it in situations where the default won’t work. Hiera uses Puppet’s facts to specify data sources, so you can structure your overrides to suit your infrastructure. While using facts for this purpose is common, data sources can also be defined without the use of facts.
Puppet 5 comes with support for JSON, YAML, and EYAML files.
Related topics: Automatic Parameter Lookup.
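As a rough illustration of what key-value data looks like, the sketch below shows a hypothetical YAML data file. The key names ntp::servers and secure_server are the ones used later on this page; the file name and all values are assumptions made up for this example.

```yaml
# data/common.yaml (hypothetical file name and values)

# Namespaced key: <class name>::<parameter name>. Automatic Parameter Lookup
# can use it for the "servers" parameter of the "ntp" class.
ntp::servers:
  - 0.pool.ntp.org
  - 1.pool.ntp.org

# Key without a namespace: only found through an explicit lookup call.
secure_server: "vault.example.com"
```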
Hiera hierarchies
Hiera looks up data by following a hierarchy, an ordered list of data sources. Hierarchies are configured in a hiera.yaml configuration file. Each level of the hierarchy tells Hiera how to access some kind of data source. A hierarchy is usually organized like this:

    ---
    version: 5
    defaults:  # Used for any hierarchy level that omits these keys.
      datadir: data         # This path is relative to hiera.yaml's directory.
      data_hash: yaml_data  # Use the built-in YAML backend.

    hierarchy:
      - name: "Per-node data"                   # Human-readable name.
        path: "nodes/%{trusted.certname}.yaml"  # File path, relative to datadir.
        # ^^^ IMPORTANT: include the file extension!

      - name: "Per-datacenter business group data" # Uses custom facts.
        path: "location/%{facts.whereami}/%{facts.group}.yaml"

      - name: "Global business group data"
        path: "groups/%{facts.group}.yaml"

      - name: "Per-datacenter secret data (encrypted)"
        lookup_key: eyaml_lookup_key   # Uses non-default backend.
        path: "secrets/%{facts.whereami}.eyaml"
        options:
          pkcs7_private_key: /etc/puppetlabs/puppet/eyaml/private_key.pkcs7.pem
          pkcs7_public_key: /etc/puppetlabs/puppet/eyaml/public_key.pkcs7.pem

      - name: "Per-OS defaults"
        path: "os/%{facts.os.family}.yaml"

      - name: "Common data"
        path: "common.yaml"

In this example, every level configures the path to a YAML file on disk.
Hierarchies interpolate variables
Most levels of a hierarchy interpolate variables into their configuration:

    path: "os/%{facts.os.family}.yaml"

The percent-and-braces %{variable} syntax is a Hiera interpolation token. It is similar to the Puppet language’s ${expression} interpolation tokens. Wherever you use an interpolation token, Hiera determines the variable’s value and inserts it into the hierarchy.
The facts.os.family token uses Hiera's special key.subkey notation for accessing elements of hashes and arrays. It is equivalent to $facts['os']['family'] in the Puppet language, but the dot notation produces an empty string instead of raising an error if part of the data is missing. Make sure that an empty interpolation does not end up matching an unintended path.
You can only interpolate values into certain parts of the config file. For more info, see the hiera.yaml format reference.
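The fragment below sketches how one level's path resolves for different fact values. The resolved paths in the comments are worked out by hand from the rules described above, not taken from real output, and the fact values are examples only.

```yaml
hierarchy:
  - name: "Per-OS defaults"
    path: "os/%{facts.os.family}.yaml"
    # facts.os.family is "Debian"  ->  os/Debian.yaml
    # facts.os.family is "RedHat"  ->  os/RedHat.yaml
    # facts.os.family is missing   ->  os/.yaml  (empty string inserted)
```

In the last case a file named os/.yaml would normally not exist, so the level is skipped. The thing to check is that no real file happens to sit at the path produced by an empty value.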
With node-specific variables, each node gets a customized set of paths to data. The hierarchy is always the same.
Hiera searches the hierarchy in order
After Hiera replaces the variables to make a list of concrete data sources, it checks those data sources in the order they were written.
Generally, if a data source doesn’t exist, or doesn’t specify a value for the current key, Hiera skips it and moves on to the next source until it finds one that provides a value, and then uses that value. This first-found behavior is the default merge strategy, but it does not always apply: a lookup can instead collect data from all data sources and merge the results.
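One way to request a merge instead of the first-found behavior is the lookup_options key in a data file. The sketch below assumes the ntp::servers key used elsewhere on this page and picks the unique merge strategy purely for illustration; the file name is also an assumption.

```yaml
# In a data file such as data/common.yaml (file name is an assumption)
lookup_options:
  ntp::servers:
    merge: unique   # combine the arrays found at every level, dropping duplicates
```

With this setting, a lookup of ntp::servers draws on every data source that defines the key rather than stopping at the first one.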
Earlier data sources have priority over later ones. In the example above, the node-specific data has the highest priority, and can override data from any other level. Business group data is separated into local and global sources, with the local one overriding the global one. Common data used by all nodes always goes last.
That’s how Hiera’s “defaults, with overrides” approach to data works — you specify common data at lower levels of the hierarchy, and override it at higher levels for groups of nodes with special needs.
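To make the pattern concrete, here is a sketch of two data files from the example hierarchy. The file paths follow that hierarchy; the node name thrush.example.com is borrowed from the example later on this page, and the server names are invented. The two files are shown as two YAML documents separated by `---`.

```yaml
# data/common.yaml: the default, used by every node
ntp::servers:
  - 0.pool.ntp.org
  - 1.pool.ntp.org
---
# data/nodes/thrush.example.com.yaml: override for a single node
ntp::servers:
  - ntp1.example.com
```

Under the default first-found behavior, thrush.example.com gets the value from its per-node file, and every other node falls through to common.yaml.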
Layered hierarchies
Hiera uses layers of data with a hiera.yaml for each layer.
Each layer can configure its own independent hierarchy. Before a lookup, Hiera combines them into a single super-hierarchy: global → environment → module.
There is also a fourth layer, default_hierarchy, that can be used in a module’s hiera.yaml. It only comes into effect when there is no data for a key in any of the other regular hierarchies.
Assume the example hierarchy above is the environment hierarchy for the production environment. If the global layer also had the following hierarchy:

    ---
    version: 5
    hierarchy:
      - name: "Data exported from our old self-service config tool"
        path: "selfserve/%{trusted.certname}.json"
        data_hash: json_data
        datadir: data

And the NTP module had the following hierarchy for default data:

    ---
    version: 5
    hierarchy:
      - name: "OS values"
        path: "os/%{facts.os.name}.yaml"
      - name: "Common values"
        path: "common.yaml"
    defaults:
      data_hash: yaml_data
      datadir: data

Then in a lookup for the ntp::servers key, thrush.example.com would use the following combined hierarchy:
- <CODEDIR>/data/selfserve/thrush.example.com.json
- <CODEDIR>/environments/production/data/nodes/thrush.example.com.yaml
- <CODEDIR>/environments/production/data/location/belfast/ops.yaml
- <CODEDIR>/environments/production/data/groups/ops.yaml
- <CODEDIR>/environments/production/data/os/Debian.yaml
- <CODEDIR>/environments/production/data/common.yaml
- <CODEDIR>/environments/production/modules/ntp/data/os/Ubuntu.yaml
- <CODEDIR>/environments/production/modules/ntp/data/common.yaml
The combined hierarchy works the same way as a layer hierarchy. Hiera skips empty data sources, and either returns the first found value or merges all found values.
Note: datadir refers to the directory named ‘data’ next to the hiera.yaml file.
Tips for making a good hierarchy
- Make a short hierarchy. With fewer levels, the data files are easier to work with.
- Use the roles and profiles method to manage less data in Hiera. Sorting hundreds of class parameters is easier than sorting thousands.
- If the built-in facts don’t provide an easy way to represent differences in your infrastructure, make custom facts. For example, create a custom datacenter fact that is based on information particular to your network layout so that each datacenter is uniquely identifiable.
- Give each environment – production, test, development – its own hierarchy.
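For a value that is fixed per node, one lightweight option is an external fact, which Facter can read from a YAML file. This is a simpler mechanism than a custom fact computed from the network layout, and is shown only as a sketch. It assumes the external facts directory of a Puppet agent on Linux; the fact name datacenter and its value are hypothetical.

```yaml
# /etc/puppetlabs/facter/facts.d/datacenter.yaml (assumed location)
# Each top-level key becomes a fact with that name.
datacenter: belfast
```

A hierarchy level could then refer to it with a path such as "datacenters/%{facts.datacenter}.yaml".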
Hiera configuration layers
Hiera uses three independent layers of configuration. Each layer has its own hierarchy, and they’re linked into one super-hierarchy before doing a lookup.
The three layers are searched in the following order: global → environment → module. Hiera searches every data source in the global layer’s hierarchy before checking any source in the environment layer.
The global layer
The configuration file for the global layer is located, by default, in $confdir/hiera.yaml. You can change the location by changing the hiera_config setting in puppet.conf.
Hiera has one global hierarchy. Because it goes before the environment layer, it’s useful for temporary overrides, for example, when your ops team needs to bypass its normal change processes.
The global layer is the only place where legacy Hiera 3 backends can be used, which makes it an important piece of the transition period while you migrate your backends to support Hiera 5. It supports the following config formats: hiera.yaml v5, hiera.yaml v3 (deprecated).
Other than the above use cases, try to avoid the global layer. Specify all normal data in the environment layer.
The environment layer
The configuration file for the environment layer is located, by default, in <ENVIRONMENT DIR>/hiera.yaml.
The environment layer is where most of your Hiera data hierarchy definition happens. Every Puppet environment has its own hierarchy configuration, which applies to nodes in that environment. Supported config formats include: v5, v3 (deprecated).
The module layer
The configuration file for a module layer is located, by default, in a module's <MODULE>/hiera.yaml.
The module layer sets default values and merge behavior for a module’s class parameters. It is a convenient alternative to the params.pp pattern.
To get the same behavior as params.pp, use the default_hierarchy, as those bindings are excluded from merges. When placed in the module’s regular hierarchy, the bindings are merged when a merge lookup is performed.
The module layer comes last in Hiera’s lookup order, so environment data set by a user overrides the default data set by the module’s author.
A module's hierarchy is used only for keys in that module's namespace:
Lookup key | Relevant module hierarchy |
---|---|
ntp::servers | ntp |
jenkins::port | jenkins |
secure_server | (none) |
Hiera uses the ntp module’s hierarchy when looking up ntp::servers, but uses the jenkins module’s hierarchy when looking up jenkins::port. Hiera never checks the ntp module for a key beginning with jenkins::.
When you use the lookup function for keys that don’t have a namespace (for example, secure_server), the module layer is not consulted.
The three-layer system means that each environment has its own hierarchy, and so do modules. You can make hierarchy changes on an environment-by-environment basis. Module data is also customizable.