Catalog compilation

When configuring a node, the agent uses a document called a catalog, which it downloads from the master. For each resource under management, the catalog describes its desired state and can specify ordered dependency information.

Puppet manifests are concise because they can express variation between nodes with conditional logic, templates, and functions. Puppet resolves these on the master and gives the agent a specific catalog.

This allows Puppet to:

  • Separate privileges, since each node only receives its own resources.
  • Reduce the agent’s CPU and memory consumption.
  • Simulate changes by running the agent in no-op mode, checking the agent's current state and reporting what would have changed without making any changes.
  • Query PuppetDB for information about managed resources on any node.
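
As an illustration of the first two points, here is a toy Python model of resolving per-node variation on the compiling side. It is not Puppet's implementation: the function name `compile_for`, the `os_family` fact key, and the dictionary shape used for resources are all assumptions made for this sketch.

```python
def compile_for(facts):
    """Resolve per-node variation and return a flat list of resources."""
    # The conditional logic runs here, on the compiling side.
    if facts.get("os_family") == "Debian":
        package = "apache2"
    else:
        package = "httpd"

    # The result holds only concrete desired state; no conditionals remain.
    return [
        {"type": "package", "title": package, "ensure": "installed"},
        {"type": "service", "title": package, "ensure": "running",
         "require": f"package[{package}]"},
    ]


debian_catalog = compile_for({"os_family": "Debian"})
redhat_catalog = compile_for({"os_family": "RedHat"})
print(debian_catalog[0]["title"])
print(redhat_catalog[0]["title"])
```

Each node receives only the resources that apply to it, and the agent never has to evaluate the branching itself.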

Note: The Puppet apply command compiles its own catalog and then applies it, so it plays the role of both master and agent.

For more information about PuppetDB queries, see PuppetDB API.

Puppet compiles a catalog using three sources of configuration information:
  • Agent-provided data

  • External data

  • Manifests and modules, including associated templates and file sources

These sources are used by both agent-master deployments and by stand-alone Puppet apply nodes.

Agent-provided data

When an agent requests a catalog, it sends four pieces of information to the master:

  • The node's name, which is almost always the same as the node's certname and is embedded in the request URL. For example, /puppet/v3/catalog/web01.example.com?environment=production.
  • The node's certificate, which contains its certname and sometimes additional information that can be used for policy-based autosigning and adding new trusted facts. This is the one item not used by Puppet apply.

  • The node's facts.

  • The node's requested environment, which is embedded in the request URL. For example, /puppet/v3/catalog/web01.example.com?environment=production. Before requesting a catalog, the agent requests its environment from the master. If the master doesn't provide an environment, the environment information in the agent's config file is used.
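
The request path in the examples above can be assembled from the node name and the environment. The following Python sketch only reproduces the shape of that path; the helper name `catalog_request_path` is hypothetical, and the rest of the request (the facts and the certificate) is not modeled.

```python
from urllib.parse import quote, urlencode


def catalog_request_path(node_name, environment):
    """Build the path and query string shown in the examples above."""
    # quote() escapes characters that are not safe in a URL path segment.
    path = "/puppet/v3/catalog/" + quote(node_name, safe="")
    # The requested environment travels as a query parameter.
    return path + "?" + urlencode({"environment": environment})


print(catalog_request_path("web01.example.com", "production"))
```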

For more information about additional data in certs, see SSL configuration: CSR attributes and certificate extensions.

External data

Puppet uses two main kinds of external data during catalog compilation:

  • Data from an external node classifier (ENC) or other node terminus, which is available before compilation starts. This data is in the form of a node object and can contain any of the following:
    • Classes
    • Class configuration parameters 
    • Top-scope variables for the node
    • Environment information, which overrides the environment information in the agent's configuration
  • Data from other sources, which can be invoked by the main manifest or by classes or defined types in modules. This kind of data includes:
    • Exported resources queried from PuppetDB
    • The results of functions, which can access data sources including Hiera or an external configuration management database

For more information about ENCs, see Writing external node classifiers.
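
As a rough sketch of what an ENC can return, the following Python script builds a node object containing classes, class parameters, a top-scope variable, and an environment. The class names, parameter names, and values are hypothetical. The script prints JSON on the assumption that the consuming YAML parser accepts JSON-style flow syntax; check the ENC documentation for the exact output format your Puppet version expects.

```python
#!/usr/bin/env python3
import json
import sys


def build_node_object(node_name):
    """Return classes, class parameters, variables, and an environment."""
    node = {
        # Class name -> class configuration parameters (hypothetical names).
        "classes": {"ntp": {"servers": ["0.pool.ntp.org", "1.pool.ntp.org"]}},
        # Top-scope variables for the node.
        "parameters": {"datacenter": "example-dc"},
        # Overrides the environment from the agent's configuration.
        "environment": "production",
    }
    # Illustrative policy: nodes whose name starts with "web" get a web profile.
    if node_name.startswith("web"):
        node["classes"]["profile::web"] = {"listen_port": 80}
    return node


if __name__ == "__main__":
    # The node name is taken from the first command-line argument.
    name = sys.argv[1] if len(sys.argv) > 1 else "web01.example.com"
    print(json.dumps(build_node_object(name), indent=2))
```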

Manifests and modules

Manifests and modules are at the center of a Puppet deployment. They include the main manifest, modules downloaded from the Forge, and modules written specifically for your site.

For more information about manifests and modules, see The main manifest directory and Module fundamentals.

The catalog compilation process

This simplified description doesn’t delve into the internals of the parser, the model, or the evaluator. Some items are presented out of order for the sake of clarity. The process begins after the catalog request has been received.

Note: For practical purposes, treat Puppet apply nodes as a combined agent and master.
  1. Retrieve the node object.
    • Once the master has the agent-provided information for this request, it asks its configured node terminus for a node object.
    • By default, the master uses the plain node terminus, which returns a blank node object. In this case, only manifests and agent-provided information are used in compilation.
    • The next most common node terminus is the exec node terminus, which requests data from an ENC. This can return classes, variables, an environment, or a combination of the three, depending on how the ENC is designed.

    • Less commonly, the ldap node terminus fetches information from an LDAP database.

    • You can also write a custom node terminus that retrieves classes, variables, and environments from an external system.

  2. Set variables from the node object, from facts, and from the certificate.
    • All of these variables are available for use by any manifest or template during subsequent stages of compilation.
    • The node’s facts are set as top-scope variables.
    • The node’s facts are set in the protected $facts hash, and certain data from the node’s certificate is set in the protected $trusted hash.
    • Any variables provided by the master are set.
  3. Evaluate the main manifest.
    • Puppet parses the main manifest. The node’s environment can specify a main manifest; if it doesn’t, the master uses the main manifest from the agent's config file.
    • If there are node definitions in the manifest, Puppet must find one that matches the node’s name. If at least one node definition is present and Puppet cannot find a match, it fails compilation.
    • Code outside of node definitions is evaluated. Resources in the code are added to the node's catalog, and any classes declared in the code are loaded and declared.
      Note: Classes are usually defined in modules, although the main manifest can also contain class definitions.
    • If a matching node definition is found, the code in it is evaluated at node scope (overriding top-scope variables). Resources in the code are added to the node's catalog, and any classes declared in the code are loaded and declared.
  4. Load and evaluate classes from modules.
    • If classes were declared in the main manifest and their definitions were not present, Puppet loads the manifests containing them from its collection of modules. It follows the normal manifest naming conventions to find the files it should load.
    • The set of locations Puppet loads modules from is called the modulepath. The master serves each environment with its own modulepath.
    • When a class is loaded, the Puppet code in it is evaluated, and any resources in it are added to the catalog. If it was declared at node scope, it has access to node-scope variables; otherwise, it has access to only top-scope variables.
    • Classes can also declare other classes; if they do, Puppet loads and evaluates those in the same way.

  5. Evaluate classes from the node object.
    • Puppet loads from modules and evaluates any classes that were specified by the node object. Resources from those classes are added to the catalog. If a matching node definition was found when the main manifest was evaluated, these classes are evaluated at node scope, which means that they can access any node-scope variables set by the main manifest. If no node definitions were present in the main manifest, they are evaluated at top scope.
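
The scoping rules in steps 2, 3, and 5 can be outlined in a few lines of Python. This is a deliberately simplified model, not Puppet's compiler: node definitions are represented as a dictionary from an exact node name to the variables that definition sets, any other matching rules are omitted, and the order in which facts and node-object variables are merged is an assumption of the sketch. All names are hypothetical.

```python
class CompilationError(Exception):
    """Raised when node definitions exist but none matches the node."""


def blank_node_object():
    # What this sketch assumes the plain node terminus yields: nothing extra.
    return {"classes": [], "parameters": {}}


def compile_outline(node_name, facts, node_object, node_definitions):
    # Step 2: set variables from the node object and from facts.
    top_scope = dict(facts)
    top_scope.update(node_object["parameters"])  # merge order is an assumption
    top_scope["facts"] = dict(facts)  # stands in for the $facts hash

    # Step 3: if any node definitions exist, one of them must match.
    node_scope = None
    if node_definitions:
        if node_name not in node_definitions:
            raise CompilationError(f"no node definition matches {node_name}")
        # Node-scope variables override top-scope variables of the same name.
        node_scope = {**top_scope, **node_definitions[node_name]}

    # Step 5: classes from the node object are evaluated at node scope
    # only when a node definition matched; otherwise at top scope.
    matched = node_scope is not None
    return {
        "node_object_class_scope": "node" if matched else "top",
        "classes": list(node_object["classes"]),
        "variables": node_scope if matched else top_scope,
    }


result = compile_outline(
    "web01.example.com",
    {"os_family": "Debian"},
    {"classes": ["ntp"], "parameters": {"datacenter": "example-dc"}},
    {"web01.example.com": {"role": "web"}},
)
print(result["node_object_class_scope"])
```

Passing an empty dictionary of node definitions makes the same call report top scope, and passing definitions that do not include the node's name raises `CompilationError`, mirroring the failure described in step 3.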

For more information, see Configuration, Indirection, Writing external node classifiers, The LDAP node classifier, Language: node definitions, Language: Scope, and Directories: The modulepath (default config).
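
To make the class-loading step more concrete, the following Python sketch maps a class name to the manifest files Puppet would look for under each modulepath directory, using the usual module layout (`init.pp` for the class named after the module, nested paths for namespaced classes). The helper name `candidate_manifests` and the example directories are assumptions; the sketch does not handle class names written with a leading `::`.

```python
import posixpath


def candidate_manifests(class_name, modulepath):
    """List, in modulepath order, where a class's manifest could live."""
    parts = class_name.split("::")
    module, rest = parts[0], parts[1:]
    # The class named after the module lives in init.pp;
    # namespaced classes map to nested directories and a .pp file.
    relative = posixpath.join(*rest) + ".pp" if rest else "init.pp"
    return [posixpath.join(d, module, "manifests", relative) for d in modulepath]


# Example modulepath directories, chosen for illustration.
modulepath = [
    "/etc/puppetlabs/code/environments/production/modules",
    "/etc/puppetlabs/code/modules",
]
for path in candidate_manifests("apache::mod::ssl", modulepath):
    print(path)
```

Because each environment is served with its own modulepath, the same class name can resolve to different files in different environments.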