Catalog compilation

When configuring a node, the agent uses a document called a catalog, which it downloads from the primary server. For each resource under management, the catalog describes its desired state and can specify ordered dependency information.

Puppet manifests are concise because they can express variation between nodes with conditional logic, templates, and functions. Puppet resolves these on the primary server and gives the agent a specific catalog.
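
As a rough illustration of this resolution step, the following Python sketch models a manifest's conditional logic as a function that runs on the compiling side and returns a flat, node-specific list of resources. The function name, the os_family key, and the resource dictionaries are hypothetical stand-ins chosen for this example; they are not Puppet's actual catalog format.

```python
def compile_catalog(facts):
    """Resolve node-specific variation and return a flat list of resources."""
    # Conditional logic lives here, on the compiling side (comparable to a
    # conditional on an operating system fact in a manifest).
    package = "httpd" if facts["os_family"] == "RedHat" else "apache2"

    # The result contains no conditionals: only this node's desired state
    # and its dependency information.
    return [
        {"type": "package", "title": package, "ensure": "installed"},
        {
            "type": "service",
            "title": package,
            "ensure": "running",
            "require": f"Package[{package}]",
        },
    ]


redhat_catalog = compile_catalog({"os_family": "RedHat"})
debian_catalog = compile_catalog({"os_family": "Debian"})
```

Each node ends up with a different list, and neither list carries the branching logic that produced it.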

This allows Puppet to:

  • Separate privileges, because each node receives only its own resources.
  • Reduce the agent’s CPU and memory consumption.
  • Simulate changes by running the agent in no-op mode, checking the agent's current state and reporting what would have changed without making any changes.
  • Query PuppetDB for information about managed resources on any node.

Note: The puppet apply command compiles the catalog on its own node and then applies it, so it plays the role of both primary server and agent. To compile a catalog on the primary server for testing, run puppet catalog compile on a Puppet Server host that has access to your environments, modules, manifests, and Hiera data.

For more information about PuppetDB queries, see PuppetDB API.

Puppet compiles a catalog using three sources of configuration information:
  • Agent-provided data

  • External data

  • Manifests and modules, including associated templates and file sources

These sources are used by both agent-server deployments and stand-alone puppet apply nodes.

Agent-provided data

When an agent requests a catalog, it sends four pieces of information to the primary server:

  • The node's name, which is almost always the same as the node's certname and is embedded in the request URL. For example, /puppet/v3/catalog/web01.example.com?environment=production.
  • The node's certificate, which contains its certname and sometimes additional information that can be used for policy-based autosigning and adding new trusted facts. This is the one item not used by puppet apply.

  • The node's facts.

  • The node's requested environment, which is embedded in the request URL. For example, /puppet/v3/catalog/web01.example.com?environment=production. Before requesting a catalog, the agent requests its environment from the primary server. If the primary server doesn't provide an environment, the environment information in the agent's config file is used.
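
To make the request format concrete, here is a minimal Python sketch that pulls the node name and the requested environment out of a catalog request path like the one shown above, and applies the fallback rule for the environment. The function names are hypothetical, and the sketch only handles the URL shape used in this section; it is not how the primary server itself parses requests.

```python
from urllib.parse import parse_qs, urlsplit

CATALOG_PREFIX = "/puppet/v3/catalog/"


def parse_catalog_request(url_path):
    """Return (node_name, requested_environment) from a catalog request path."""
    parts = urlsplit(url_path)
    if not parts.path.startswith(CATALOG_PREFIX):
        raise ValueError(f"not a catalog request: {url_path!r}")
    node_name = parts.path[len(CATALOG_PREFIX):]
    # parse_qs returns a list per key; take the first value if present.
    environment = parse_qs(parts.query).get("environment", [None])[0]
    return node_name, environment


def choose_environment(server_environment, agent_config_environment):
    """Prefer the environment supplied by the server; otherwise fall back
    to the value from the agent's config file."""
    if server_environment is not None:
        return server_environment
    return agent_config_environment


request = parse_catalog_request(
    "/puppet/v3/catalog/web01.example.com?environment=production"
)
```

Here request holds the node name and the environment as a pair, which are two of the four items the agent sends.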

For more information about additional data in certificates, see SSL configuration: CSR attributes and certificate extensions.

External data

Puppet uses two main kinds of external data during catalog compilation:

  • Data from an external node classifier (ENC) or other node terminus, which is available before compilation starts. This data is in the form of a node object and can contain any of the following:
    • Classes
    • Class configuration parameters
    • Top-scope variables for the node
    • Environment information, which overrides the environment information in the agent's configuration
  • Data from other sources, which can be invoked by the main manifest or by classes or defined types in modules. This kind of data includes:
    • Exported resources queried from PuppetDB.
    • The results of functions, which can access data sources including Hiera or an external configuration management database.

For more information about ENCs, see Writing external node classifiers.
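
The node object described above can be pictured with a small ENC-style classifier. The following Python sketch is only an illustration: the lookup table, the node name, the datacenter variable, and the class parameter are all made up, and a real ENC would typically query a database or CMDB. The top-level keys classes, parameters, and environment correspond to the classes, top-scope variables, and environment information listed above. The output is rendered as JSON, on the assumption that JSON-style syntax is acceptable as YAML for a simple structure like this; a YAML library is the safer choice in practice.

```python
import json

# Hypothetical classification data, keyed by node name.
NODES = {
    "web01.example.com": {
        "classes": {"apache": {"default_vhost": False}},  # class + parameters
        "parameters": {"datacenter": "ams1"},              # top-scope variables
        "environment": "production",                       # overrides the agent's setting
    },
}


def classify(node_name):
    """Return the node object for a node as a plain dictionary."""
    # Unknown nodes get an empty class list, comparable to a blank node object.
    return NODES.get(node_name, {"classes": {}})


def render(node_name):
    """Serialize the node object for printing to standard output."""
    return json.dumps(classify(node_name), indent=2)


web01_output = render("web01.example.com")
```

Wired up as an executable, such a script would receive the node name as its only argument and print the result of render to standard output.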

Manifests and modules

Manifests and modules are at the center of a Puppet deployment. They include the main manifest, modules downloaded from the Forge, and modules written specifically for your site.

For more information about manifests and modules, see The main manifest directory and Module fundamentals.

The catalog compilation process

This simplified description doesn’t delve into the internals of the parser, the model, and the evaluator. Some items are presented out of order for the sake of clarity. The process begins after the catalog request has been received.

Note: For practical purposes, treat puppet apply nodes as a combined agent and primary server.
  1. Retrieve the node object.
    • After the primary server has received the agent-provided information for this request, it asks its configured node terminus for a node object.
    • By default, the primary server uses the plain node terminus, which returns a blank node object. In this case, only manifests and agent-provided information are used in compilation.
    • The next most common node terminus is the exec node terminus, which requests data from an ENC. This can return classes, variables, an environment, or a combination of the three, depending on how the ENC is designed.
    • You can also write a custom node terminus that retrieves classes, variables, and environments from an external system.
  2. Set variables from the node object, from facts, and from the certificate.
    • All of these variables are available for use by any manifest or template during subsequent stages of compilation.
    • The node’s facts are set as top-scope variables.
    • The node’s facts are set in the protected $facts hash, and certain data from the node’s certificate is set in the protected $trusted hash.
    • Any variables provided by the primary server are set.
  3. Evaluate the main manifest.
    • Puppet parses the main manifest. The node’s environment can specify a main manifest; if it doesn’t, the primary server uses the main manifest from the agent's config file.
    • If there are node definitions in the manifest, Puppet must find one that matches the node’s name. If at least one node definition is present and Puppet cannot find a match, it fails compilation.
    • Code outside of node definitions is evaluated. Resources in the code are added to the node's catalog, and any classes declared in the code are loaded and declared.
      Note: Classes are usually defined in modules, although the main manifest can also contain class definitions.
    • If a matching node definition is found, the code in it is evaluated at node scope, overriding any top-scope variables. Resources in the code are added to the node's catalog, and any classes declared in the code are loaded and declared.
  4. Load and evaluate classes from modules.
    • If classes were declared in the main manifest and their definitions were not present, Puppet loads the manifests containing them from its collection of modules. It follows the normal manifest naming conventions to find the files it should load.
    • The set of locations Puppet loads modules from is called the modulepath. The primary server serves each environment with its own modulepath.
    • When a class is loaded, the Puppet code in it is evaluated, and any resources in it are added to the catalog. If it was declared at node scope, it has access to node-scope variables; otherwise, it has access to only top-scope variables.
    • Classes can also declare other classes; if they do, Puppet loads and evaluates those in the same way.

  5. Evaluate classes from the node object.
    • Puppet loads from modules and evaluates any classes that were specified by the node object. Resources from those classes are added to the catalog.
    • If a matching node definition was found when the main manifest was evaluated, these classes are evaluated at node scope, which means that they can access any node-scope variables set by the main manifest.
    • If no node definitions were present in the main manifest, they are evaluated at top scope.
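
Several of the steps above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Puppet's implementation: all function names are hypothetical, the order in which variables override one another in build_top_scope is an assumption, and node matching is reduced to an exact name plus the default node definition (regular-expression and partial-name matching are left out). The class-to-file mapping follows the usual module layout, where a module's main class lives in manifests/init.pp and other classes map to nested paths under manifests.

```python
from pathlib import PurePosixPath


class CompileError(Exception):
    """Raised when compilation cannot continue."""


def build_top_scope(facts, trusted, node_vars):
    """Step 2: set variables before any manifest is evaluated."""
    scope = dict(facts)               # facts as top-scope variables
    scope.update(node_vars)           # variables from the node object (order assumed)
    scope["facts"] = dict(facts)      # the $facts hash
    scope["trusted"] = dict(trusted)  # the $trusted hash, from the certificate
    return scope


def match_node_definition(node_definitions, node_name):
    """Step 3: pick the node definition for this node, if any exist."""
    if not node_definitions:
        return None                   # no node definitions at all: nothing to match
    if node_name in node_definitions:
        return node_name
    if "default" in node_definitions:
        return "default"
    # At least one definition exists but none matches: compilation fails.
    raise CompileError(f"no node definition matches {node_name!r}")


def class_manifest_path(class_name):
    """Step 4: map a class name to its manifest, relative to a modulepath entry."""
    module, *rest = class_name.split("::")
    if not rest:
        return PurePosixPath(module, "manifests", "init.pp")
    return PurePosixPath(module, "manifests", *rest).with_suffix(".pp")


def scope_for_node_object_classes(matched_definition):
    """Step 5: node-object classes run at node scope only if a definition matched."""
    return "node" if matched_definition is not None else "top"


scope = build_top_scope(
    {"kernel": "Linux"}, {"certname": "web01.example.com"}, {"role": "web"}
)
matched = match_node_definition({"web01.example.com", "default"}, "db01.example.com")
manifest = class_manifest_path("apache::mod::php")
```

In this run, matched falls back to the default definition, so classes from the node object would be evaluated at node scope.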