Datadog and Puppet: Sending facts as tags for metrics and events
Tags are pieces of metadata in Datadog that are key to correlating data from various sources. Tags simplify extracting business information to make it easier for organizations to answer questions such as “do spikes in CPU usage correlate with deployments of applications?” or “does a certain server role contribute to a majority of our support tickets?”
Puppet users can take advantage of extensible information Puppet knows about a system to easily send specific facts (core or external) as tags (metadata) with either metrics or events. Sales engineer Elizabeth Plumb has a great video here from a previous PuppetConf about the Puppet and Datadog integration.
Note: This guide assumes you are using the official Datadog agent module for Puppet, version 3.5.0 or later.
Setting up Tags for Metrics
The “datadog_agent” class has a “facts_to_tags” parameter that takes an array of strings, each naming a fact. Once the class is classified to your nodes with this parameter set and Puppet has run on them, each node’s Datadog config file is updated with these facts and their values.
In this case, I used “osfamily” to separate out my Linux/Windows nodes, “trusted.extensions.pp_role” which is part of my server role provisioning (trusted facts info here), “datacenter” which is a custom external fact I wrote for my environment, and “ec2_metadata.ami-id” which is another built-in fact, to group based on ami-ids.
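The original classification was shown as a screenshot. As a rough sketch, an equivalent class declaration in a Puppet manifest might look like the following. The “api_key” value is a placeholder, and the fact list simply repeats the four facts named above; “datacenter” is a custom fact, so substitute facts that exist in your own environment.

```puppet
# Sketch only: '<YOUR_DD_API_KEY>' is a placeholder, not a real key.
# 'datacenter' is a custom external fact from the author's environment.
class { 'datadog_agent':
  api_key       => '<YOUR_DD_API_KEY>',
  facts_to_tags => [
    'osfamily',                    # separates Linux and Windows nodes
    'trusted.extensions.pp_role',  # server role from provisioning
    'datacenter',                  # custom external fact
    'ec2_metadata.ami-id',         # groups nodes by AMI ID
  ],
}
```

The same array can be set through the console or Hiera instead of a manifest; only the parameter name and the list of fact paths matter.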
Changes to this list take effect only after the code has been deployed and the node has completed its next Puppet run. After that, it’s easy to confirm that the tags are reporting by looking at the node’s metrics in the Datadog metrics section.
Setting up Tags for Events
Adding facts as tags for events assumes you have enabled the “puppet_run_reports” parameter for the “datadog_agent” class on your Puppetserver; directions can be found here. Enabling these tags is as easy as setting a single parameter, also an array of strings as above, through Hiera. In this case, I used the Data section of classification to configure this parameter on my Puppetserver.
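The text above does not name the parameter, so the sketch below rests on an assumption: it uses “report_fact_tags”, which is the name the module’s documentation uses for this purpose as far as I can tell. Verify it against the README for your module version, since some releases may handle trusted facts through a separate parameter. The declaration is written as a Puppet class for consistency; in Hiera the same values would be set under the matching “datadog_agent::” keys.

```puppet
# Sketch for the Puppetserver node only.
# ASSUMPTION: 'report_fact_tags' is the report-tagging parameter in your
# module version; '<YOUR_DD_API_KEY>' is a placeholder.
class { 'datadog_agent':
  api_key            => '<YOUR_DD_API_KEY>',
  puppet_run_reports => true,
  report_fact_tags   => [
    'osfamily',
    'trusted.extensions.pp_role',
    'datacenter',
    'ec2_metadata.ami-id',
  ],
}
```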
Notice I used the same tags as above. This is intentional and important: it lets me correlate events, which are run reports sent to Datadog from the server, with metrics that are sent from the nodes themselves.
At the time of writing, in order to finish the setup, you need to commit your changes to your master (through the console or however you set your Hiera data today) and run Puppet. Once Puppet has updated the configuration file on the Puppetserver, you need to restart Puppetserver (systemctl restart pe-puppetserver for Puppet Enterprise or systemctl restart puppetserver on open source) so that the report processor re-reads the configuration file. Report processors are typically cached, so every change to their configuration requires a restart of the Puppet server.
Once Puppetserver has restarted, the tags appear on the Puppet run report events in the Datadog event stream.
Tips for Tagging with facts
The following tips help ensure that the tags you send are meaningful to your organization and that you get value from both tools through the integration.
- When selecting Puppet facts to be set as tags, the result will be a key:value pair where the key matches the full path to the fact (for example, indexing into a hash with dot notation) and the value is the result of the fact at that index. This means that if you select a fact that is a hash at the top level, such as “trusted”, the value will be the whole hash as a string, which will not be very useful.
- Dynamic facts that can change each run, such as uptime or memory usage, will clutter the tags in Datadog and are most likely already being tracked by Datadog. Try to limit the tags to those that are static to help build or filter data towards business value.
- Facts that are good candidates to use as tags are organization-related data that will help with event/ticket routing. These can be operating system family/versions, trusted facts related to provisioning metadata, or custom facts related to server ownership or team responsibility.
- If you are unsure which facts might be good candidates, start with business questions: What are you trying to show with your dashboard? From where does that information come?
- Selecting the same facts, or having some consistency, for both metrics and events makes it easier for you to correlate the data in Datadog. If you are trying to find a connection between cpu/mem/disk usage and Puppet failures for a specific type of server in your environment, you will need something to help join these sets of data.
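As an illustration of these tips, the sketch below keeps one list of fact paths in a variable so that the same list can be passed to both the metrics and the events parameter. The variable name “$datadog_tag_facts” is hypothetical, and the commented-out entries are examples of facts the tips above suggest avoiding.

```puppet
# Hypothetical variable name; the point is one shared, static list.
$datadog_tag_facts = [
  'osfamily',                    # static leaf value
  'trusted.extensions.pp_role',  # dot notation reaches a leaf in the hash
  'datacenter',                  # custom fact describing ownership or location
  # 'trusted',                   # avoid: top-level hash, tagged as one long string
  # 'uptime',                    # avoid: changes on every run
  # 'memory.system.used',        # avoid: dynamic, and already a Datadog metric
]
```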