Running Puppet Comply® at Scale

Introduction

Puppet Comply can help you assess your compliance status across a large and diverse estate of systems under management. In general, you can use Comply to scan estates of up to 10,000 nodes without any special considerations. Scans and reports on larger estates still run without error, but certain operations may take longer than desired because of the volume of data that must be processed. This document presents additional options and factors to consider when optimizing Puppet Comply for estates larger than 10,000 nodes.

It is beneficial to determine your anticipated PostgreSQL capacity as part of the initial installation, because resizing PostgreSQL later can be a tedious and time-consuming process.

Scanning at Scale Approach

Scan execution is handled by the Puppet orchestrator.

The task concurrency parameter determines the number of concurrent scans. The default in PE 2023.2 has been increased to 1000; it is lower in earlier versions. This means that if the parameter is set to 1000 and you run a scan of 5000 nodes, 4000 tasks are initially queued. The orchestrator is fully consumed until the scans complete on all 5000 nodes, and any new tasks initiated within the Puppet ecosystem are placed at the back of the queue.
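The queuing behavior described above can be sketched as follows. This is an illustration of the arithmetic only, not a Comply or orchestrator API; the function name is invented for this example.

```python
def initial_queue_depth(scan_nodes: int, task_concurrency: int = 1000) -> int:
    """Number of scan tasks queued behind the running batch when a scan starts.

    The default of 1000 mirrors the PE 2023.2 task concurrency default.
    """
    return max(0, scan_nodes - task_concurrency)

# A 5,000-node scan with the default concurrency of 1000:
# 1,000 scans start immediately and 4,000 wait in the queue.
print(initial_queue_depth(5000))  # 4000
```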

For this reason, you should create a scanning schedule that meets the needs of your organization.

The general recommendation is to execute scans in batches that do not exceed 10,000 nodes. For example, to scan an entire estate of 50,000 nodes, configure five separate scheduled scans at a suitable interval.

A common approach is to first run a batch of 1,000 nodes to determine how long it takes, then derive the interval from that duration (duration × 10, plus a buffer).
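That heuristic can be sketched as follows. The function names and the 30-minute default buffer are illustrative assumptions for this example, not part of Comply.

```python
import math

def batches_needed(estate_size: int, batch_limit: int = 10_000) -> int:
    """Number of scheduled scans needed to keep each batch at or
    below the recommended 10,000-node limit."""
    return math.ceil(estate_size / batch_limit)

def scan_interval_minutes(pilot_duration_min: float, buffer_min: float = 30) -> float:
    """Interval between batches: duration of a 1,000-node pilot batch
    multiplied by 10, plus a safety buffer."""
    return pilot_duration_min * 10 + buffer_min

print(batches_needed(50_000))      # 5 scheduled scans for a 50,000-node estate
print(scan_interval_minutes(12))   # 150.0 minutes if the pilot took 12 minutes
```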

Finally, we recommend scheduling scans to coincide with periods of minimal workload to help ensure adequate network throughput.

For more information about optimizing performance, see Tune task and plan performance in Puppet Enterprise (PE).

System Requirements

To support environments with more than 10,000 nodes, your Comply installation needs a total of at least 16GB of memory and 100GB of storage space available.

Configuration and Tuning

Data Retention

Puppet Comply defaults to retaining all historical scan data; this configuration is not recommended when scanning large numbers of nodes. Enable the "Enable data retention policy" configuration option and set the "Scan data retention period in weeks" value to the minimum number of weeks your organization requires.

Depending on your node count, scan frequency, and desired retention period, you may also need to adjust the "Comply PostgreSQL capacity" and "Comply PostgreSQL memory" values under the "Additional Support" configuration section. The table below shows recommended values for representative node counts. Note that these values assume one scan per week and the default data retention period of 14 weeks.

Nodes      PostgreSQL Capacity    PostgreSQL Memory
1,000      Default                Default
10,000     25Gi                   Default
50,000     75Gi                   4Gi
75,000     105Gi                  8Gi
100,000    135Gi                  12Gi

As a general rule of thumb, add 6Gi to the PostgreSQL capacity for each additional 5,000 nodes beyond the values in the table. For retention periods longer than the 14-week default, divide the calculated storage requirement by 14 to determine the per-week storage requirement, then multiply by the desired retention period.
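Under those rules, a sizing estimate can be sketched as follows. This is a rough sketch anchored to the 50,000-node table entry (75Gi), not an official formula; below 50,000 nodes, use the table values directly.

```python
def estimate_pg_capacity_gi(nodes: int, retention_weeks: int = 14) -> float:
    """Rough Comply PostgreSQL capacity estimate in Gi.

    Anchored to the 50,000-node / 75Gi table value, adding 6Gi per
    additional 5,000 nodes, then scaled linearly for retention periods
    other than the 14-week default. Assumes one scan per week.
    """
    if nodes < 50_000:
        raise ValueError("below 50,000 nodes, use the table values directly")
    capacity_at_default_retention = 75 + 6 * (nodes - 50_000) / 5_000
    return capacity_at_default_retention / 14 * retention_weeks

print(estimate_pg_capacity_gi(75_000))       # 105.0 Gi, matching the table
print(estimate_pg_capacity_gi(100_000))      # 135.0 Gi, matching the table
print(estimate_pg_capacity_gi(50_000, 28))   # 150.0 Gi for a 28-week retention
```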

Note: Comply PostgreSQL capacity cannot be modified after initial installation without contacting Puppet support.