
Enhance your Puppet Enterprise support workflow with pe_status_check

Use pe_status_check to monitor your PE infrastructure and perform preventative maintenance

puppetlabs-pe_status_check is a new supported module for Puppet Enterprise. It provides a series of system status indicators that the Puppet Support team has determined help avoid support incidents or outages.

Using this module and its accompanying documentation, you can build preventative maintenance workflows and, should a support ticket still be required, improve the quality of the information in it, which helps decrease the time to resolution for any incident.

Using the module

The pe_status_check module is available on the Forge; follow the standard guidance for supported modules to add it to your system.

The core functionality of the module is in the two structured facts that it provides:

  • pe_status_check
  • agent_status_check

pe_status_check is confined to Puppet Enterprise infrastructure components, and each status check is further confined to the infrastructure node types on which that indicator is relevant, for example, Primary / Replica / Compiler / PSQL nodes.

agent_status_check is confined to all agent nodes that are NOT Puppet infrastructure components. These indicators will remain limited in volume and execution time so as not to adversely impact the agent population’s runtime.

Both facts return structured values that map each indicator ID to its state (a boolean). Below is a sample value from a Primary Puppet Server:

pe_status_check { S0001 => true, S0002 => true, S0003 => true, S0004 => true, S0005 => true, S0006 => true, S0007 => true, S0008 => true, S0009 => true, S0010 => true, S0011 => true, S0012 => true, S0013 => true, S0014 => true, S0016 => true, S0017 => true, S0018 => true, S0019 => true, S0021 => true, S0022 => false, S0030 => true, S0031 => true, S0033 => true, S0034 => true, S0036 => true, S0039 => true, S0040 => true }

An indicator set to true means that the status check is returning the expected value; an indicator set to false means that the system is in a fault state, or trending towards one.
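To pick out the faulted indicators from a value like the sample above, a small filter is enough. The following is a minimal sketch, assuming a POSIX shell and that the fact is printed in the `ID => true|false` hash format shown above; `list_faulted_indicators` is a hypothetical helper name, not part of the module.

```shell
#!/bin/sh
# Hypothetical helper (not part of pe_status_check): print the ID of every
# indicator whose state is false. Reads a pe_status_check or agent_status_check
# value on stdin in the "ID => true|false" format, on one line or many.
list_faulted_indicators() {
  # Turn commas and braces into newlines so each "ID => state" pair is on its
  # own line, then print only the IDs whose state is false.
  tr ',{}' '\n\n\n' \
    | sed -n 's/^[[:space:]]*\([[:alnum:]_][[:alnum:]_]*\)[[:space:]]*=>[[:space:]]*false[[:space:]]*$/\1/p'
}
```

On a node, you could feed it the fact value directly, for example `facter -p pe_status_check | list_faulted_indicators` run as root; for the sample above this would print `S0022`.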

How to use this data

When any indicator is reported as false, the first step is to consult the reference table in the documentation.

Using the sample above, S0022 is at fault. According to the documentation, this “determines if there is a valid Puppet Enterprise license in place at /etc/puppetlabs/license.key on the primary server which is not expiring in the next 90 days.”

The first step towards resolution is to follow the documentation for “Self-service steps.” In this case, I am directed to a Support KB article that helps resolve such issues.

If the self-service steps fail or you are unable to complete them, the next step is to raise a case with Puppet Support, including the information requested in the “What to include in a Support ticket” section. In this case I am asked to “Open a Support ticket referencing S0022 and provide the output of the following commands ls -la /etc/puppetlabs/license.key and cat /etc/puppetlabs/license.key”, at which point the Support team will help resolve the issue further.
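Gathering the requested output into a single file makes it easy to attach to the ticket. This is a minimal sketch, assuming a POSIX shell on the primary server; `collect_s0022_info` and the default output file name are hypothetical, and only the two commands come from the documentation quoted above. The file will contain the license key contents, so handle it accordingly.

```shell
#!/bin/sh
# Hypothetical helper: capture the output the S0022 documentation asks for
# (ls -la and cat of the license key) into one file for the Support ticket.
# Usage: collect_s0022_info [license_path] [output_file]
collect_s0022_info() {
  s0022_license="${1:-/etc/puppetlabs/license.key}"   # path checked by S0022
  s0022_outfile="${2:-s0022_support_info.txt}"        # hypothetical file name
  {
    echo "### ls -la $s0022_license"
    ls -la "$s0022_license" 2>&1
    echo "### cat $s0022_license"
    cat "$s0022_license" 2>&1
  } > "$s0022_outfile"
  # Print the path of the file to attach.
  echo "$s0022_outfile"
}
```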

Using the provided class for notification

The module also includes an optional class that increases the visibility of any pe_status_check indicators that are in a fault state.

When this class is applied as described in the module instructions, your infrastructure node reports will include an “intentional change” event in the form of a notification for every indicator in a fault state.

See the following example:

Notice: /Stage[main]/Pe_status_check/Notify[pe_status_check S0022]/message: defined 'message' as 'S0022 is at fault. The indicator determines if there is a valid Puppet Enterprise license in place at /etc/puppetlabs/license.key on your primary which is not going to expire in the next 90 days. Refer to documentation for required action.'

These notifications can be consumed directly from the reports in the console, retrieved with a PQL query, or used as the input to a custom workflow if you use a custom report processor.
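As a starting point for the PQL route, here is a sketch of two query builders. It assumes the Notify titles follow the `pe_status_check <ID>` pattern in the example above and that `puppet query` (from the PE client tools) is available; the function names are hypothetical. The events query relies on the `events` entity's `latest_report?`, `resource_type`, and `resource_title` fields; the fact query reuses the `inventory` form from the certificate example later in this post.

```shell
#!/bin/sh
# Hypothetical helpers for retrieving pe_status_check results from PuppetDB.

# PQL for the pe_status_check notifications in each node's latest report.
pe_status_events_query() {
  printf '%s\n' 'events[certname, resource_title, message] { latest_report? = true and resource_type = "Notify" and resource_title ~ "^pe_status_check " }'
}

# PQL for nodes where a given indicator is false.
# $1 = fact name (pe_status_check or agent_status_check), $2 = indicator ID.
failing_nodes_query() {
  printf 'inventory[certname] { facts.%s.%s = false }\n' "$1" "$2"
}

# Run the events query (requires puppet query and PuppetDB access).
show_pe_status_events() {
  puppet query "$(pe_status_events_query)"
}
```

For example, `puppet query "$(failing_nodes_query pe_status_check S0022)"` would list the certnames of nodes reporting S0022 as false.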

You can also configure this class to ignore any given indicator if that is appropriate for your environment.

Example of utilizing pe_status_check to resolve issues within Puppet Enterprise

Puppet Enterprise has many built-in and supported tasks and plans for configuring and maintaining the installation. Using the facts provided by this module to target these tasks/plans to remediate an issue is a good "out of the box" use case.

Take the example of expiring SSL certificates used for Puppet communication in your agent population. The AS001 indicator provided by the agent_status_check fact gives you a 90-day warning before these certificates expire. If you act before expiration, renewal does not impact service.

To do this, we can use PQL to return the certname of all nodes where agent_status_check.AS001 = false and feed them into the enterprise_tasks::agent_cert_regen plan, a supported and automated way to prevent these certificates from reaching expiration.

puppet plan run enterprise_tasks::agent_cert_regen agent=$(puppet query 'inventory[certname] { facts.agent_status_check.AS001 = false }' | jq -r '.[].certname' | paste -sd, -) master=$(puppet config print certname)
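If you want to schedule this, it helps to skip the plan entirely when no agent reports AS001 as false. The following is a minimal sketch of the same one-liner split into two hypothetical functions, `expiring_agents` and `regen_agent_certs`; it assumes it runs on the primary server with `puppet query` and `jq` available, and uses only the plan parameters from the command above.

```shell
#!/bin/sh
# Hypothetical wrapper around the one-liner above.

# Comma-separated certnames of agents whose AS001 indicator is false
# (empty when there are none). Requires puppet query and jq.
expiring_agents() {
  puppet query 'inventory[certname] { facts.agent_status_check.AS001 = false }' \
    | jq -r '.[].certname' \
    | paste -sd, -
}

# Run the supported cert regen plan for a comma-separated agent list,
# doing nothing if the list is empty.
regen_agent_certs() {
  regen_agents="$1"
  if [ -z "$regen_agents" ]; then
    echo "No agents report AS001 = false; skipping agent_cert_regen."
    return 0
  fi
  puppet plan run enterprise_tasks::agent_cert_regen \
    agent="$regen_agents" master="$(puppet config print certname)"
}

# Usage (on the primary server):
#   regen_agent_certs "$(expiring_agents)"
```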

The above is just one example; other built-in and custom tasks and plans can be used to resolve many of the issues these indicators report.

Suggested external use cases

Where direct automated resolution is not possible or not appropriate, you could also integrate with third-party tools, such as Splunk or ServiceNow, to build custom workflows based on the indicators, for example:

  • Raising a ticket with the internal infrastructure department if disk utilization is trending up.
  • Triggering the build of a new compiler if existing compilers are under consistent load.
  • Sending an email to your account representative at Puppet if your license is near expiration.

Learn more

  • Watch this demo on preventative maintenance with Puppet Enterprise status check.