The Fact Is…

One of the major interfaces for extending Puppet is Facter and custom facts. A fact is a key/value pair that represents some aspect of node state, such as its IP address, uptime, operating system, or whether it's a virtual machine. Custom facts give the Puppet master data at catalog compile time so that it can customise the configuration for a specific node (with PuppetDB, this data can also be queried outside that node's own catalog compile). Facts are provided by Facter, an independent, cross-platform Ruby library designed to gather information on all the nodes you will be managing with Puppet.
As an example of using a fact within Puppet, you can use this data during a catalog compile to make a decision about the catalog that you send back to the node. A fact that reports the node's operating system can drive conditional logic in a class that selects the right set of packages to configure ntp on that particular system. Because the package names differ between Linux distributions (let alone between Linux and Windows), this fact simplifies ntp configuration. Alternatively, you could use the $::ipaddress fact to set the appropriate listen directive in Apache or SSH.
You can provide Puppet custom facts by:
  • Extending Puppet using Ruby
  • Using the facter.d library shipped with the Puppet Labs stdlib module
  • Simply setting an environment variable available during a Puppet agent run
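To make the last option concrete, here is a minimal Ruby sketch of the idea behind environment-variable facts. It assumes the convention that a variable named FACTER_<name> surfaces as a fact called <name>. The helper `env_facts` is hypothetical and only imitates that mapping; it is not Facter's own code.

```ruby
# Hypothetical helper: imitates how variables prefixed with FACTER_ can
# surface as facts. An illustration only, not Facter's implementation.
def env_facts(env)
  env.each_with_object({}) do |(name, value), facts|
    match = /\AFACTER_(\w+)\z/i.match(name)
    next unless match # ignore variables without the prefix

    facts[match[1].downcase] = value
  end
end

# Whoever controls the agent's environment controls the value of the fact.
facts = env_facts({ 'FACTER_datacenter' => 'lon1', 'PATH' => '/usr/bin' })
puts facts['datacenter']
```

The point of the sketch is how little stands between a user with a shell on the node and the value the master eventually sees.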
In this way, you can provide rich state information about your nodes to the infrastructure that compiles the catalog. It might become common practice for you to alter behaviour in Puppet classes on the basis of node state data for all your modules; in fact, it's one of the things that makes Puppet so usable and flexible!
However, there are some security implications here. For example, one of the more common facts is the 'hostname' fact. Unless you choose otherwise, this will match the certname of the node (i.e. the CN on the certificate generated by the Puppet CA when you provisioned the node), though Puppet won't use the same mechanism to arrive at that determination. Depending on your platform, the value of $::hostname is probably obtained by running a command like hostname from within the Facter libraries used by Puppet. Related facts such as $::fqdn might run hostname -f, or introspect some other aspect of the system to derive the value. In any case, the node works out the value to populate the key and sends it to the master. On the other hand, the value of $clientcert (which is available to the Puppet master during a catalog compile) is taken from the CN of the Puppet CA signed SSL certificate used by that particular agent (see below, however: we can still do better than $clientcert).
From a security perspective, there is a vast difference between an assertion backed by PKI that a node's hostname is 'xyz' and the same assertion made via a fact. This is especially true given the variety of ways you might arrive at the value that was sent to the Puppet master. Assuming we trust the PKI, the statement that the node's hostname matches the certificate (if that's the way we run our PKI) is significantly more trustworthy than the assertion that the hostname fact is the correct one. This is the major reason Puppet uses SSL and PKI for node identification.
Admittedly, in order to get to the point where you can send that data to a Puppet master, you need sufficient rights over the private key material on a node. The implication for secure module writing is that if you want to securely distribute the right configuration settings and associated data back down to nodes, you must use trustworthy data when compiling a catalog. For settings with a smaller security impact (e.g. OS package names), trusting facts is cheap and secure enough that you could just use them. However, you probably don't want a module that distributes private SSL keys based on a node's assertion of its identity via facts; you would want this module to send that data only to properly authenticated nodes and to be sure of their identity when compiling the catalog. For modules with security critical information, the decision about how a class configures a node should be made solely using data from the Puppet master: that might be data in the Puppet code that comprises your class, or in an ENC, or in your Hiera backend. It should in no way be influenced by an untrusted assertion from a node. Even then, the root of the decision tree within your code should be your PKI.
EDIT 2013-02-28: Whilst $clientcert is marginally better than a normal fact in that it is inserted directly by Puppet and not overridable in the same way, it still doesn't represent the validated certificate when used in scope during catalog compilation. Something like the following parser function (props to Stephen Johnson) would serve to provide that data into scope, in preference to using $clientcert:
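module Puppet::Parser::Functions
  # certcheck() takes no arguments and is usable as an rvalue in manifests.
  newfunction(:certcheck, :type => :rvalue, :doc => <<-EOS
    Returns the actual certname
    EOS
  ) do |arguments|
    # host is resolved on the compiling scope on the master, not sent as a fact
    host
  end
end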
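The principle argued above (sensitive decisions should hang off the authenticated certname and master-side data, never off facts) can be sketched in plain Ruby. Everything here is hypothetical and for illustration only: the `KEY_OWNERS` table stands in for data you would keep in Hiera, an ENC, or your Puppet code, and `key_for` is not part of Puppet.

```ruby
# Hypothetical master-side data, keyed by certname. In a real deployment
# this would live in Hiera, an ENC, or the Puppet code itself.
KEY_OWNERS = {
  'web01.example.com' => 'web-tls-key',
  'db01.example.com'  => 'db-tls-key'
}.freeze

# Select sensitive data using only the authenticated certname.
# `facts` is accepted purely to show that it is deliberately ignored.
def key_for(certname, facts = {})
  KEY_OWNERS.fetch(certname) do
    raise ArgumentError, "no key assigned to #{certname}"
  end
end

# A node authenticated as web01 that claims to be db01 via facts
# still only receives the key assigned to web01.
spoofed = { 'hostname' => 'db01', 'fqdn' => 'db01.example.com' }
puts key_for('web01.example.com', spoofed)
```

Failing loudly for an unknown certname is a design choice in this sketch: a node nobody has assigned data to gets nothing rather than a default.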
Basing conditional decisions on your PKI bases that decision on the SSL certificate that the node used to authenticate (so interpolating and traversing your Hiera hierarchy should probably use data provided by the certcheck function above, *not* $hostname or $fqdn). If you completely trust data supplied via facts, and a catalog selected by those facts includes data, settings, or anything else that could be used for privilege escalation, then the moment a node spoofs its fact data, you've been hacked. The true danger here is that the escalation achieved is root on something else. If your organisation has different sets of users with root on different sets of boxes, your module design philosophy could lead to leakage across node sets. If a user has root and they have access to the private key, they can run Puppet and spoof facts, because Puppet runs as root. The spoofed facts may result in a catalog being returned that leaks security-sensitive data, configurations and information that might be used to perform mischief elsewhere.
Also, quick side note: if users have root on multiple boxes, there's nothing to stop them moving SSL certs around the place and putting incorrect configurations on the wrong boxes. Don't give people root if you don't have to!
The alternative to using facts in modules to make decisions, as I mentioned above, is using trustable data. By that, I mean a combination of PKI validated information and the data you have control over on your Puppet master. It may increase the amount of static data you need to manage, but this is a trade-off between security, how much you can be bothered, and what the outcome of it all going wrong is. Incidentally, this significantly promotes the idea of encoding useful data in hostnames and therefore Puppet certificate names (see my personal blog post on the simple things; this is an extension of that post).
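As a rough illustration of encoding useful data in certificate names, the Ruby sketch below derives a role and environment from a certname. The naming convention (`<role><number>.<environment>.<domain>`), the pattern, and the `parse_certname` helper are all assumptions made up for this example; substitute your own scheme.

```ruby
# Hypothetical naming convention: <role><number>.<environment>.<domain>,
# e.g. "web01.prod.example.com". Adjust the pattern to your own scheme.
CERTNAME_PATTERN =
  /\A(?<role>[a-z]+)(?<index>\d+)\.(?<environment>[a-z]+)\.(?<domain>.+)\z/

# Derive node metadata from the authenticated certname instead of facts.
# Returns nil when the name does not follow the convention.
def parse_certname(certname)
  match = CERTNAME_PATTERN.match(certname)
  return nil unless match

  {
    role: match[:role],
    index: match[:index].to_i,
    environment: match[:environment],
    domain: match[:domain]
  }
end

info = parse_certname('web01.prod.example.com')
puts "#{info[:role]} in #{info[:environment]}"
```

Because the input is the name on the certificate, the derived role and environment inherit the trust you place in your PKI rather than in anything the node reports about itself.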
I personally have no problem with using facts to make decisions with low security impact outcomes. Ultimately, however, what all this comes down to is the good old fashioned security principle of not trusting user input and sanitizing the hell out of it before you do anything with it.  

Learn More

  • Download Facter
  • Read our Custom Facts Documentation
  • Watch our webinar on writing custom facts in Facter