About Hiera
Puppet’s strength is in reusable code. Code that serves many needs must be configurable: put site-specific information in external configuration data files, rather than in the code itself.
Puppet uses Hiera to do two things:
Store the configuration data in key-value pairs
Look up what data a particular module needs for a given node during catalog compilation
Automatic Parameter Lookup for classes included in the catalog
Explicit lookup calls
Hiera’s hierarchical lookups follow a “defaults, with overrides” pattern, meaning you specify common data once, and override it in situations where the default won’t work. Hiera uses Puppet’s facts to specify data sources, so you can structure your overrides to suit your infrastructure. While using facts for this purpose is common, data-sources may well be defined without the use of facts.
Puppet 5 comes with support for JSON, YAML, and EYAML files.
Related topics: Automatic Parameter Lookup.
Hiera hierarchies
Hiera looks up data by following a hierarchy — an ordered list of data sources.
Hierarchies are configured in a hiera.yaml
configuration file. Each level of the hierarchy tells Hiera how to access some kind of data source.
Hierarchies interpolate variables
path: "os/%{facts.os.family}.yaml"
The percent-and-braces %{variable}
syntax is a Hiera
interpolation token. It is similar to the Puppet language’s ${expression}
interpolation tokens. Wherever you use an interpolation token, Hiera determines the variable’s value and
inserts it into the hierarchy.
The facts.os.family
uses the Hiera special key.subkey
notation for accessing elements of hashes and arrays. It is
equivalent to $facts['os']['family']
in the
Puppet language but the 'dot' notation
will produce an empty string instead of raising an error if parts of the data is missing.
Make sure that an empty interpolation does not end up matching an unintended path.
You can only interpolate values into certain parts of the config file.
For more info, see the hiera.yaml
format reference.
With node-specific variables, each node gets a customized set of paths to data. The hierarchy is always the same.
Hiera searches the hierarchy in order
Once Hiera replaces the variables to make a list of concrete data sources, it checks those data sources in the order they were written.
Generally, if a data source doesn’t exist, or doesn’t specify a value for the current key, Hiera skips it and moves on to the next source, until it finds one that exists — then it uses it. Note that this is the default merge strategy, but does not always apply, for example, Hiera can use data from all data sources and merge the result.
Earlier data sources have priority over later ones. In the example above, the node-specific data has the highest priority, and can override data from any other level. Business group data is separated into local and global sources, with the local one overriding the global one. Common data used by all nodes always goes last.
That’s how Hiera’s “defaults, with overrides” approach to data works — you specify common data at lower levels of the hierarchy, and override it at higher levels for groups of nodes with special needs.
Layered hierarchies
Hiera uses layers of
data with a hiera.yaml
for each
layer.
Each layer can configure its own independent hierarchy. Before a lookup, Hiera combines them into a single super-hierarchy: global → environment → module.
default_hierarchy
- that can be used in a module’s
hiera.yaml.
It only
comes into effect when there is no data for a key in any of the other regular
hierarchiesversion: 5
hierarchy:
- name: "Data exported from our old self-service config tool"
path: "selfserve/%{trusted.certname}.json"
data_hash: json_data
datadir: data
And
the NTP module had the following hierarchy for default
data:version: 5
hierarchy:
- name: "OS values"
path: "os/%{facts.os.name}.yaml"
- name: "Common values"
path: "common.yaml"
defaults:
data_hash: yaml_data
datadir: data
Then in a lookup for the ntp::servers
key, thrush.example.com
would use the following combined hierarchy:
<CODEDIR>/data/selfserve/thrush.example.com.json
<CODEDIR>/environments/production/data/nodes/thrush.example.com.yaml
<CODEDIR>/environments/production/data/location/belfast/ops.yaml
<CODEDIR>/environments/production/data/groups/ops.yaml
<CODEDIR>/environments/production/data/os/Debian.yaml
<CODEDIR>/environments/production/data/common.yaml
<CODEDIR>/environments/production/modules/ntp/data/os/Ubuntu.yaml
<CODEDIR>/environments/production/modules/ntp/data/common.yaml
The combined hierarchy works the same way as a layer hierarchy. Hiera skips empty data sources, and either returns the first found value or merges all found values.
datadir
refers to the directory named ‘data’ next to the hiera.yaml
.Tips for making a good hierarchy
- Make a short hierarchy. Data files will be easier to work with.
Use the roles and profiles method to manage less data in Hiera. Sorting hundreds of class parameters is easier than sorting thousands.
If the built-in facts don’t provide an easy way to represent differences in your infrastructure, make custom facts. For example, create a custom datacenter fact that is based on information particular to your network layout so that each datacenter is uniquely identifiable.
Give each environment – production, test, development – its own hierarchy.
Hiera configuration layers
Hiera uses three independent layers of configuration. Each layer has its own hierarchy, and they’re linked into one super-hierarchy before doing a lookup.
The three layers are searched in the following order: global → environment → module. Hiera searches every data source in the global layer’s hierarchy before checking any source in the environment layer.
The global layer
The configuration file for the global layer is located, by default, in
$confdir/hiera.yaml
. You
can change the location by changing the hiera_config
setting in puppet.conf
.
Hiera has one global hierarchy. Since it goes before the environment layer, it’s useful for temporary overrides, for example, when your ops team needs to bypass its normal change processes.
The global layer is
the only place where legacy Hiera 3
backends can be used - it’s an important piece of the transition period when you migrate you
backends to support Hiera 5. It supports
the following config formats: hiera.yaml
v5, hiera.yaml
v3 (deprecated).
Other than the above use cases, try to avoid the global layer. All normal data should be specified in the environment layer.
The environment layer
The configuration file for the environment layer is located, by default, in <ENVIRONMENT DIR>/hiera.yaml
.
The environment layer is where most of your Hiera data hierarchy definition happens. Every Puppet environment has its own hierarchy configuration, which applies to nodes in that environment. Supported config formats include: v5, v3 (deprecated).
The module layer
The configuration file for a module layer is located, by default, in a
module's <MODULE>/hiera.yaml
.
The module layer sets
default values and merge behavior for a module’s class parameters. It is a convenient
alternative to the params.pp
pattern.
params.pp
, the default_hierarchy
should be used, as those bindings are excluded
from merges. When placed in the regular hierarchy in the module’s hierarchy the bindings
will be merged when a merge lookup is performed.It comes last in Hiera’s lookup order, so environment data set by a user overrides the default data set by the module’s author.
Lookup key | Relevant module hierarchy |
---|---|
ntp::servers |
ntp |
jenkins::port |
jenkins |
secure_server |
(none) |
Hiera uses the ntp
module’s hierarchy when looking up ntp::servers
, but uses the jenkins
module’s hierarchy when looking up jenkins::port
. Hiera never checks the module for a key beginning
with jenkins::
.
When you use the lookup function for keys that don’t have a namespace
(for example, secure_server
), the module layer is not consulted.
The three-layer system means that each environment has its own hierarchy, and so do modules. You can make hierarchy changes on an environment-by-environment basis. Module data is also customizable.