Query structure

Summary

PuppetDB's query API can retrieve data objects from PuppetDB for use in other applications. For example, the PuppetDB-termini for Puppet Servers use this API to collect exported resources.

The query API is implemented as HTTP URLs on the PuppetDB server. By default, it can only be accessed over the network via host-verified HTTPS; see the jetty settings if you need to access the API over unencrypted HTTP.

Query structure

A query consists of an HTTP GET request to an endpoint URL, which can optionally include:

A query URL parameter, whose value is a query string.
Other URL parameters, to configure paging or other behavior.

That is, most queries will look like a GET request to a URL that resembles the following:

https://puppetdb:8081/pdb/query/v4/<ENDPOINT>?query=<QUERY STRING>

Alternatively, you can provide the entity context instead of using the <ENDPOINT> suffix with the following:

https://puppetdb:8081/pdb/query/v4?query=<QUERY STRING>

Consult the root endpoint documentation for more details.
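
As a minimal sketch of the two URL forms above, the following Python builds one URL that targets an entity endpoint and one that targets the root endpoint. It assumes the AST query language, where the "from" operator supplies the entity context at the root endpoint. The certname value and the helper names are illustrative, not part of PuppetDB.

```python
import json
from urllib.parse import quote

# Host and port are taken from the example URLs in this section.
BASE = "https://puppetdb:8081/pdb/query/v4"


def endpoint_query_url(endpoint, ast_query):
    """Build a URL that queries a specific entity endpoint (hypothetical helper)."""
    return f"{BASE}/{endpoint}?query={quote(json.dumps(ast_query), safe='')}"


def root_query_url(entity, ast_query):
    """Build a root endpoint URL; the AST "from" operator names the entity."""
    wrapped = ["from", entity, ast_query]
    return f"{BASE}?query={quote(json.dumps(wrapped), safe='')}"


# Illustrative condition: match one node by certname.
condition = ["=", "certname", "node1.example.com"]

print(endpoint_query_url("nodes", condition))
print(root_query_url("nodes", condition))
```

Both URLs describe the same search; only the place where the entity is named differs.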

API URLs

API URLs generally look like this:

https://<SERVER>:<PORT>/pdb/query/<API VERSION>/<ENDPOINT>?<PARAMETER>=<VALUE>&<PARAMETER>=<VALUE>

For example: https://puppetdb:8081/pdb/query/v4/resources?limit=50&offset=50.
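
The URL template above can be sketched as a small Python function. The function name `api_url` and its argument names are hypothetical; only the URL layout comes from this page.

```python
from urllib.parse import urlencode


def api_url(server, port, endpoint, params=None, version="v4"):
    """Assemble https://<SERVER>:<PORT>/pdb/query/<API VERSION>/<ENDPOINT>?<PARAMETERS>."""
    url = f"https://{server}:{port}/pdb/query/{version}/{endpoint}"
    if params:
        # urlencode joins PARAMETER=VALUE pairs with "&" and URL-encodes the values.
        url += "?" + urlencode(params)
    return url


print(api_url("puppetdb", 8081, "resources", {"limit": 50, "offset": 50}))
# prints https://puppetdb:8081/pdb/query/v4/resources?limit=50&offset=50
```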

API version

After the /pdb/query/ prefix, the first part of an API URL is the API version, written in the v4 format. This section describes version 4 of the API, so every URL will begin with /pdb/query/v4.

Entity endpoints

After the version, URLs are organized into a number of endpoints that express the entity you wish to query for.

Conceptually, an entity endpoint represents a PuppetDB entity. Each version of the PuppetDB API defines a fixed set of endpoints.

See the entities documentation for a list of the available endpoints. Each endpoint may have additional sub-endpoints under it; these are generally just shortcuts for the most common types of query, so that you can write terser and simpler query strings.

URL parameters

Finally, the URL may include some URL parameters. Some endpoints require certain parameters; for others they're optional or disallowed. Each endpoint's page lists the parameters it accepts, and most endpoints also support the paging parameters.

A group of parameters begins with a question mark (?). Each parameter is formatted as <PARAMETER>=<VALUE>, and additional parameters are separated by ampersands (&). All parameter values must be URL-encoded.
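
To illustrate the URL-encoding requirement, here is a short Python sketch that encodes a parameter group and then decodes it again. The query value and the `limit` of 10 are illustrative. Note that `urlencode` uses form-style encoding, so spaces become `+` rather than `%20`.

```python
from urllib.parse import parse_qs, urlencode

# Raw (unencoded) parameter values. The query string is illustrative.
params = {
    "query": '["=", "certname", "node1.example.com"]',
    "limit": 10,
}

# Brackets, quotes, commas, and "=" inside values are percent-encoded.
encoded = urlencode(params)
print("?" + encoded)

# Decoding recovers the original values, as a server would see them.
decoded = parse_qs(encoded)
print(decoded["query"][0])
```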

`query`

The most common URL parameter is query, which lets you define the set of results returned by most endpoints.

There are two query languages available in PuppetDB. Consult the documentation for each for more details.

AST query language: a JSON-based query language.
Puppet query language (PQL): a newer query language designed for human users, intended to make querying simpler than with the legacy AST language.

A complete query string describes a comparison operation. When you submit a query, PuppetDB checks every possible result from the endpoint against the comparison in the query string and returns only the objects that match.

Note: Only the root endpoint supports PQL.
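
For comparison, this Python sketch expresses the same illustrative search (one node, matched by certname) in both languages and sends each to the root endpoint. The certname is an assumption, and the exact syntax of each language is defined in its own documentation; treat these strings as examples rather than a reference.

```python
import json
from urllib.parse import urlencode

# Root endpoint, using the host and port from the example URLs in this section.
ROOT = "https://puppetdb:8081/pdb/query/v4"

# AST: a JSON array. At the root endpoint, "from" names the entity to query.
ast_query = json.dumps(["from", "nodes", ["=", "certname", "node1.example.com"]])

# PQL: the same search written in the Puppet query language.
pql_query = 'nodes { certname = "node1.example.com" }'

urls = {
    "ast": ROOT + "?" + urlencode({"query": ast_query}),
    "pql": ROOT + "?" + urlencode({"query": pql_query}),
}

for language, url in urls.items():
    print(language, url)
```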

Paging

The next most common URL parameters are the paging parameters.

Most PuppetDB query endpoints support paged results via a set of shared URL parameters. For more information, please see the documentation on paging.
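
As a rough sketch of paged access, the Python below generates the URLs for successive pages of an endpoint. The `limit` and `offset` parameters appear in the example URL earlier on this page; `order_by` and its JSON format are taken from the paging documentation and should be checked there. The function name and page sizes are hypothetical.

```python
import json
from urllib.parse import urlencode


def page_urls(endpoint, page_size, pages, order_field="certname"):
    """Yield one URL per page (hypothetical helper, fixed page count)."""
    base = f"https://puppetdb:8081/pdb/query/v4/{endpoint}"
    # A stable sort order keeps pages from overlapping or skipping rows.
    order_by = json.dumps([{"field": order_field, "order": "asc"}])
    for page in range(pages):
        params = {
            "order_by": order_by,
            "limit": page_size,
            "offset": page * page_size,
        }
        yield base + "?" + urlencode(params)


for url in page_urls("nodes", page_size=50, pages=3):
    print(url)
```

In practice a client would keep requesting pages until one comes back with fewer than `page_size` results, rather than fixing the page count in advance.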

Query responses

All queries return data with a content type of application/json. Each endpoint's page describes the format of its return data.
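
A client can check the content type before decoding the body. The Python sketch below shows one way to do that; the function name and the sample body are illustrative, and the real shape of the data depends on the endpoint.

```python
import json


def decode_response(content_type, body):
    """Decode a query response body after checking its content type."""
    # Ignore optional parameters such as "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/json":
        raise ValueError(f"unexpected content type: {content_type}")
    return json.loads(body)


# Illustrative body; each endpoint's page describes the real format.
sample_body = '[{"certname": "node1.example.com"}]'
rows = decode_response("application/json; charset=utf-8", sample_body)
print(rows[0]["certname"])
```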

Rich data

Puppet 6 supports rich_data types like Timestamp and SemVer, and enables rich data by default. When rich data is enabled, readable string representations of rich data values may appear in the report resource event old_value and new_value fields, and in catalog parameter values.

For example, a Timestamp value would be recorded in PuppetDB as a string like "2012-10-10T00:00:00.000000000 UTC", and a Deferred value would be recorded as a string like "Deferred({'name' => 'join', 'arguments' => [[1, 2, 3], ':']})".
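
Because these values arrive as plain strings, a client that needs a real timestamp has to parse them itself. The following Python sketch assumes that every Timestamp string has exactly the shape of the example above (nanosecond fraction, then " UTC"); that is an assumption, not a documented guarantee, and the function name is hypothetical.

```python
from datetime import datetime, timezone


def parse_rich_timestamp(text):
    """Parse a string like "2012-10-10T00:00:00.000000000 UTC" (assumed format)."""
    stamp, zone = text.rsplit(" ", 1)
    if zone != "UTC":
        raise ValueError(f"unsupported zone: {zone}")
    whole, _, fraction = stamp.partition(".")
    # Python datetimes hold microseconds, so keep only the first six digits.
    micros = int((fraction + "000000")[:6])
    parsed = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


print(parse_rich_timestamp("2012-10-10T00:00:00.000000000 UTC").isoformat())
```

Any precision beyond microseconds is discarded by this approach.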

Tutorial and tips

For a walkthrough on constructing queries, see the query tutorial page. For quick tips on using curl to make ad hoc queries, see the curl tips page.

Experimental query termination

PuppetDB now monitors all queries for client disconnects, and terminates the query (including the database work) as soon as the client is gone. The same mechanism also helps enforce relevant query timeouts promptly.

For now, this subsystem can be disabled by setting the environment variable PDB_PROMPTLY_END_QUERIES to false. Doing so might be helpful if you encounter this issue with PuppetDB 8.1.0. The variable is likely to be removed in a future release.

Experimental query optimization

Note: this feature is experimental and may be altered or removed in a future release. It is now enabled by default and is expected to be safe, but some caution is still advisable: if something were to go wrong, the result set returned by a query might be incorrect. See below for one way to double-check the results if you suspect something might be amiss.

PuppetDB has an experimental query optimizer that may be able to substantially decrease the cost (and correspondingly decrease the response time) of some queries. It does this by attempting to avoid retrieving unnecessary data when generating a response.

At the moment, this optimization can only be applied for queries that ask for a subset of the available query response fields, for example, a query against the nodes endpoint that only extracts the certname. Further, for any given query it may or may not have any effect at all, and the effect may vary across PuppetDB releases.

This optimization is now enabled by default, but the default can be changed by setting the PDB_QUERY_OPTIMIZE_DROP_UNUSED_JOINS environment variable to by-request before starting puppetdb.

To enable the optimization for an individual query, add optimize_drop_unused_joins=true as a URL parameter. To determine whether PuppetDB attempted to optimize a query, check the logs: any optimization attempts are logged at debug level.
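
As an illustration, the Python below builds a URL for a query of the kind the optimizer targets (one that extracts only the certname from the nodes endpoint) and adds the per-query parameter. The filter on certname is an arbitrary example, and whether the optimizer changes anything for this query is not guaranteed.

```python
import json
from urllib.parse import urlencode

# AST query that asks for a subset of fields: only certname.
# The "^web" filter is illustrative.
query = ["extract", ["certname"], ["~", "certname", "^web"]]

params = {
    "query": json.dumps(query),
    "optimize_drop_unused_joins": "true",
}

url = "https://puppetdb:8081/pdb/query/v4/nodes?" + urlencode(params)
print(url)
```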

(If you happen to have diff and jq installed, you should be able to compare the results of a given query with and without the optimizer by saving the data in each case to a file and then running a command like this: diff -u <(jq -S . result-1.json) <(jq -S . result-2.json).)
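
If diff and jq are not available, a similar comparison can be sketched in Python with only the standard library. Like `jq -S`, it sorts object keys before comparing, and like that command it still treats a different ordering of array elements as a difference. The function names and sample data are illustrative.

```python
import difflib
import json


def normalized(text):
    """Re-serialize JSON with sorted keys, one value per line."""
    return json.dumps(json.loads(text), sort_keys=True, indent=2).splitlines()


def result_diff(text_a, text_b):
    """Return unified diff lines; an empty list means the results match."""
    return list(
        difflib.unified_diff(
            normalized(text_a),
            normalized(text_b),
            fromfile="result-1.json",
            tofile="result-2.json",
            lineterm="",
        )
    )


# Illustrative results: same data, different key order.
with_optimizer = '[{"certname": "node1.example.com", "deactivated": null}]'
without_optimizer = '[{"deactivated": null, "certname": "node1.example.com"}]'

print(result_diff(with_optimizer, without_optimizer) or "results match")
```

To compare saved responses, read each file's contents into a string and pass the two strings to `result_diff`.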