December 12, 2021

How to Use Bolt Tasks for Better Splunk Compliance Reporting

Due to its declarative DSL and resource abstraction, Puppet is an excellent solution for enforcing compliance across any server estate. Puppet modules provide automatic hardening and middleware configurations. Configuration drift is reported as corrective changes to Puppet Enterprise Console. But there's an alternative approach for compliance testing: using Puppet tasks and plans with Bolt, then sending the findings to a Splunk report.

How Does Splunk Help with Compliance?

Splunk is a one-stop solution for collecting audit trails and reporting on compliance. It lets you report on data from any machine source and visualize events on those machines. For compliance reporting, it helps you find and address obstacles and errors earlier in the application lifecycle, from coding through staging and testing to deployment.

More and more organizations are required to scan their infrastructure and determine whether it complies with standardized security and hardening compliance benchmarks like CIS, NIST, PCI or BSI. The "scanning", "testing" or "issue detection" step is often separated from "remediating" or "enforcing". That's where Splunk and Bolt come in.

    Community-developed Puppet modules offer two basic approaches to testing for compliance: using custom facts or using noop runs (a noop run is a run of the Puppet agent which reports on required changes but does not enforce them). The advantage of those approaches is that testing and enforcing compliance can be bundled in one compliance module. There are also disadvantages: using large numbers of facts can lead to scalability issues and noop runs can overwhelm PuppetDB with a large number of noop change events.
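The noop approach mentioned above corresponds to running the Puppet agent with its --noop flag; for example:

```shell
# One-off agent run that reports required changes without applying them
puppet agent --test --noop
```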

    Some additional advantages of using Bolt tasks for compliance testing:

    • You can choose the language you are familiar with and which is supported on the target systems
    • It's easy to collaborate on benchmark sets, tasks are easy to explain and to maintain
    • The system is flexible and can be integrated with any reporting platform
    • Compliance tests can be run using Bolt or via Puppet Enterprise orchestrator

    Let's take a look at how to use Bolt tasks to send compliance information to Splunk, and how Splunk reports can use that compliance data.

    Related: Check out our podcast on Bolt: Uniting Models and Tasks

    Building Compliance Testing Tasks

    Before we start experimenting, let's create an empty module which will contain our tasks. We will use Puppet Development Kit, or PDK (see PDK installation instructions).

    mkdir ~/modules
    cd ~/modules
    pdk new module bolt_compliance
    cd bolt_compliance

    As an example, let's create a task testing for control 1.1.2 of the CIS Red Hat Enterprise Linux 7 Benchmark. This control checks whether the /tmp directory is a separate filesystem, as in:

    # mount | grep /tmp
    tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noexec,relatime)

    If the grep fails, the /tmp directory doesn't occur in the list of mounted filesystems and the control fails.

    It's easy to create a Bash task to check for this control. To create an empty task, we use pdk:

     pdk new task cis_rhel7_1_1_2

    This will create an empty task script file and a metadata file. The metadata file should contain a short task description and any parameters the task accepts. We don't have any parameters but we could put something like "1.1.2 Ensure separate partition exists for /tmp" into the metadata description field.
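As a sketch, the metadata file (tasks/cis_rhel7_1_1_2.json) could look like this, using the same keys that appear in the send_to_splunk metadata later in this post:

```json
{
  "puppet_task_version": 1,
  "supports_noop": false,
  "description": "1.1.2 Ensure separate partition exists for /tmp",
  "parameters": {}
}
```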

    Modify tasks/cis_rhel7_1_1_2.sh like so:

    #!/bin/bash
    #
    # CIS Red Hat Enterprise Linux 7 Benchmark Server Level 1
    #
    # 1.1.2 Ensure separate partition exists for /tmp
    #
     if mount | grep /tmp > /dev/null ; then
       echo "Control passed: /tmp is a separate filesystem"
     else
       echo "Control failed: /tmp is not a separate filesystem"
     fi

    We can now run the task using Bolt:

     $ bolt task run bolt_compliance::cis_rhel7_1_1_2 -n localhost
    Started on localhost...
    Finished on localhost:
      Control failed: /tmp is not a separate filesystem
      {
      }
    Successful on 1 node: localhost
    Ran on 1 node in 0.11 seconds

    Note that the opening and closing braces with nothing in between mean that there is no structured (JSON) output from the task; we only print a message, which Bolt returns as plain text output.

    So the basic building block is done. We can write a control-specific task for any control using Bash or any other language available on the systems we want to test for compliance. For example, here is an implementation of the CIS control 5.1.1 in Python.

    pdk new task cis_rhel7_5_1_1

    Note that pdk will create a Bash template file cis_rhel7_5_1_1.sh. We need to rename it to cis_rhel7_5_1_1.py because we are going to use Python for this task.

    #!/usr/bin/env python
    #
    # CIS Red Hat Enterprise Linux 7 Benchmark Server Level 1
    #
    # 5.1.1 Ensure cron daemon is enabled (Scored)
    
    import subprocess
    import json
    
    command = 'systemctl is-enabled crond'
    
    result = {}
    
     # universal_newlines=True decodes the command output to str,
     # so the string concatenation works on both Python 2 and 3
     try:
         output = subprocess.check_output(command, shell=True, universal_newlines=True)
         result['_output'] = "control passed: crond enabled - " + output
         result['compliant'] = True
     except subprocess.CalledProcessError as e:
         result['_output'] = "control failed: crond disabled - " + e.output
         result['compliant'] = False
    
    print(json.dumps(result))

    Note that in this implementation we provide structured output instead of only text. By providing a separate compliant key we make our lives easier later while searching for compliance issues using Splunk.

    After we have created our tasks, we should run pdk validate to check the validity of our metadata files and task naming conventions.

    Writing a Plan to Run Compliance Controls

    Our next step is creating a plan to run a series of controls. We call it run.pp and put it in the plans subdirectory of our module. We need to accept two parameters: an array of controls, and an array of nodes to run the controls on. For the array of controls, we choose the data type Array[String[1]], and for the nodes we use the built-in TargetSpec data type, which allows us to be flexible in the way we specify the nodes.

     plan bolt_compliance::run(
       Array[String[1]] $controls,
       TargetSpec $nodes,
     ) {

       notice("Running controls: ${controls}")

       $controls.each | $control | {
         notice("Running control: ${control}")
         $result = run_task("bolt_compliance::cis_rhel7_${control}", $nodes)
         notice("Result for control ${control}: ${result}")
       }
     }

    Let's test our plan:

    $ bolt plan run bolt_compliance::run --params '{"controls": ["1_1_2", "5_1_1"]}' -n localhost
    Starting: plan bolt_compliance::run
    Running controls: [1_1_2, 5_1_1]
    Running control: 1_1_2
    Starting: task bolt_compliance::cis_rhel7_1_1_2 on localhost
    Finished: task bolt_compliance::cis_rhel7_1_1_2 with 0 failures in 0.01 sec
    Result for control 1_1_2: [{"node":"localhost","status":"success","result":{"_output":"Control failed: /tmp is not a separate filesystem\n"}}]
    Running control: 5_1_1
    Starting: task bolt_compliance::cis_rhel7_5_1_1 on localhost
    Finished: task bolt_compliance::cis_rhel7_5_1_1 with 0 failures in 0.05 sec
    Result for control 5_1_1: [{"node":"localhost","status":"success","result":{"_output":"control failed: crond disabled; Command 'systemctl is-enabled crond' returned non-zero exit status 127\n"}}]
    Finished: plan bolt_compliance::run in 0.09 sec
    Plan completed successfully with no result

    Note that since the controls parameter is an array, we need to specify it using JSON syntax on the command line. Alternatively, we can create a JSON file and refer to it in our parameters:

    $ cat params.json
    {
      "controls": ["1_1_2", "5_1_1"]
    }
    $ bolt plan run bolt_compliance::run --params @params.json -n localhost

    Creating a Splunk Report with the HTTP Event Collector

    The next step is to figure out how to report our findings to Splunk. We will use Splunk's HTTP Event Collector, or HEC service, documented here: Splunk HEC Service. We create a "compliance_report" index and save the provided token.

    Using Postman for experimenting with sending events to Splunk HEC, we find out that we need to send a POST request to https://<splunk-uri>/services/collector with the following headers:

    Content-type: application/json
    Authorization: Splunk <token>

    The request body should be a json object containing a key "event" with another object as value, like so:

     {
       "event": {
         "key1": "value1",
         "key2": "value2"
       }
     }

    We get the response:

    {
        "text": "Success",
        "code": 0
    }
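The same exchange can be reproduced with curl instead of Postman (the endpoint and token below are placeholders):

```shell
# -k skips TLS certificate verification; avoid in production
curl -k https://my.splunk.endpoint/services/collector \
  -H "Content-Type: application/json" \
  -H "Authorization: Splunk my-splunk-token" \
  -d '{"event": {"key1": "value1", "key2": "value2"}}'
```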

    After the event posts successfully, we can confirm in Splunk's Search and Reporting app that it has been indexed:

    [Screenshot: the event indexed in Splunk's Search and Reporting app]

    Using Compliance Results in a Splunk Report

    Now we need to take the output from the compliance testing tasks, turn it into Splunk events, and send those events to build a Splunk report on your compliance. We could write a custom plan function for this, but in this case we choose to use a task. A task gives us more flexibility: we can determine the node the task runs on, which lets us work around firewall issues if the workstation running Bolt doesn't have the required connectivity.

    A task sending output to Splunk could look something like this. First, the metadata file:

    {
     "puppet_task_version": 1,
     "supports_noop": false,
     "description": "Send json data to a Splunk HEC",
     "parameters": {
       "splunk_endpoint": {
         "description": "The Splunk HTTP Event Collector endpoint",
         "type": "String[1]"
       },
       "splunk_token": {
         "description": "The Splunk HTTP Event Collector token",
         "type": "String[1]"
       },
       "data": {
         "description": "The data to be sent to Splunk",
         "type": "Hash"
       }
     }
    }

    The task should accept three parameters: the Splunk endpoint, the Splunk token and the data we want to send.

    An example implementation of the send_to_splunk task in Python:

    #!/usr/bin/env python
    
    import sys
    import json
    import requests
    
    params = json.load(sys.stdin)
    
    splunk_endpoint = params['splunk_endpoint']
    splunk_token = params['splunk_token']
    data = params['data']
    
    headers = {
       'Content-Type': 'application/json',
       'Authorization': 'Splunk ' + splunk_token
    }
    
    # warning: don't use `verify=False` in production!
    response = requests.post(
       splunk_endpoint, headers=headers, json=data, verify=False)
    
    result = {}
    result['_output'] = response.text
    
    print(json.dumps(result))
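The task can be exercised on its own before wiring it into the plan; assuming the placeholder endpoint and token, an invocation could look like:

```shell
bolt task run bolt_compliance::send_to_splunk -n localhost \
  --params '{
    "splunk_endpoint": "https://my.splunk.endpoint/services/collector",
    "splunk_token": "my-splunk-token",
    "data": {"event": {"control": "test", "compliant": true}}
  }'
```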

    Modifying the Plan to Send Data to Splunk

    The final step of this adventure is to modify our plan to send data to Splunk as follows:

    plan bolt_compliance::run(
     Array[String[1]] $controls,
     TargetSpec $nodes,
    ) {
    
     notice("Running controls: ${controls}")
    
     $default_task_args = {
       splunk_endpoint => 'https://my.splunk.endpoint/services/collector',
       splunk_token => 'my-splunk-token',
     }
    
      $controls.each | $control | {
        notice("Running control: ${control}")

        # run the $control task on the $nodes
        $results = run_task("bolt_compliance::cis_rhel7_${control}", $nodes)
        notice("Result for control ${control}: ${results}")

        $results.each | $result | {
          # take the task's output ($result.value) and merge in some extra data
          $result_hash = $result.value + {
            target  => $result.target.name, # add the host name to the event
            control => $control,            # add the control ID to the event
            message => $result.message,     # add the textual message to the event
          }

          # construct the Splunk event
          $task_args = $default_task_args + { data => { event => $result_hash } }

          # send the event to Splunk
          $splunk_result = run_task('bolt_compliance::send_to_splunk', 'localhost', $task_args)

          notice("Result from Splunk: ${splunk_result}")
        }
      }
     }
    

    Let's test the new plan on two test CentOS 7 nodes, defined in our inventory.yaml so we can refer to them using the symbolic name all:

    $ bolt plan run bolt_compliance::run --params '{"controls": ["1_1_2", "5_1_1"]}' -n all
    Starting: plan bolt_compliance::run
    Running controls: [1_1_2, 5_1_1]
    Running control: 1_1_2
    Starting: task bolt_compliance::cis_rhel7_1_1_2 on macs6lesp8kcgfl.delivery.puppetlabs.net, jwpaw4v58f8shqq.delivery.puppetlabs.net
    Finished: task bolt_compliance::cis_rhel7_1_1_2 with 0 failures in 5.08 sec
    Result for control 1_1_2: [{"node":"macs6lesp8kcgfl.delivery.puppetlabs.net","status":"success","result":{"_output":"Control failed: /tmp is not a separate filesystem"}},{"node":"jwpaw4v58f8shqq.delivery.puppetlabs.net","status":"success","result":{"_output":"Control failed: /tmp is not a separate filesystem"}}]
    Starting: task bolt_compliance::send_to_splunk on localhost
    Finished: task bolt_compliance::send_to_splunk with 0 failures in 0.84 sec
    Result from Splunk: [{"node":"localhost","status":"success","result":{"_output":"{\"text\":\"Success\",\"code\":0}"}}]
    Starting: task bolt_compliance::send_to_splunk on localhost
    Finished: task bolt_compliance::send_to_splunk with 0 failures in 0.88 sec
    Result from Splunk: [{"node":"localhost","status":"success","result":{"_output":"{\"text\":\"Success\",\"code\":0}"}}]
    Running control: 5_1_1
    Starting: task bolt_compliance::cis_rhel7_5_1_1 on macs6lesp8kcgfl.delivery.puppetlabs.net, jwpaw4v58f8shqq.delivery.puppetlabs.net
    Finished: task bolt_compliance::cis_rhel7_5_1_1 with 0 failures in 7.61 sec
    Result for control 5_1_1: [{"node":"macs6lesp8kcgfl.delivery.puppetlabs.net","status":"success","result":{"compliant":true,"_output":"control passed: crond enabled - enabled\n"}},{"node":"jwpaw4v58f8shqq.delivery.puppetlabs.net","status":"success","result":{"compliant":false,"_output":"control failed: crond disabled - disabled\n"}}]
    Starting: task bolt_compliance::send_to_splunk on localhost
    Finished: task bolt_compliance::send_to_splunk with 0 failures in 0.83 sec
    Result from Splunk: [{"node":"localhost","status":"success","result":{"_output":"{\"text\":\"Success\",\"code\":0}"}}]
    Starting: task bolt_compliance::send_to_splunk on localhost
    Finished: task bolt_compliance::send_to_splunk with 0 failures in 0.83 sec
    Result from Splunk: [{"node":"localhost","status":"success","result":{"_output":"{\"text\":\"Success\",\"code\":0}"}}]
    Finished: plan bolt_compliance::run in 16.12 sec

    If we do a search in Splunk now, we see the indexed compliance events:

    [Screenshot: the compliance events in Splunk search results]
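With the events indexed, a Splunk search over the compliance_report index can surface failing controls directly, thanks to the structured compliant field. For example, a sketch using the target, control and message fields our plan adds to each event:

```
index="compliance_report" compliant=false | table target, control, message
```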

    Getting Rid of Hard-Wired Splunk Configurations

    It's easy to move the Splunk configuration items from the plan code into a separate configuration file. We just need to replace this code:

     $default_task_args = {
       splunk_endpoint => 'https://my.splunk.endpoint/services/collector',
       splunk_token => 'my-splunk-token',
     }

    With this code:

     $default_task_args = loadyaml('splunk-config.yaml')

    splunk-config.yaml should contain our configuration items like so:

    splunk_endpoint: "https://my.splunk.endpoint/services/collector"
    splunk_token: "my-splunk-token"

    The function loadyaml() is supplied by the puppetlabs-stdlib module, so we need to install it on the system where we run the plan:

    puppet module install puppetlabs-stdlib

    We also need to make sure that Bolt knows where to find the puppetlabs-stdlib module. We can supply the --modulepath parameter to every Bolt invocation, but the easier way is to put this configuration into bolt.yaml. The modulepath should also include the path to the directory containing the bolt_compliance module, ~/modules in this example:

    modulepath: "~/modules:~/.puppetlabs/etc/code/modules"

    This blog was originally published on December 12, 2018 and has since been updated for relevance and accuracy.
