Published on 26 July 2017

Whether you’re new to containers or a seasoned veteran, you’ve probably come to realize that actually operating container-based infrastructure can quickly become unwieldy, particularly at scale. Over time, and at scale, some of the subtleties around containers can trip you up and make operations harder than they need to be.

In this blog post, I highlight some of the common challenges with container configuration, and ways you can use Puppet to overcome them. We have come to think of configuration management as being concerned with low-level tasks like installing packages, maintaining configuration files, and ensuring services are running — all things that do not seem to be terribly useful for containers. But these examples are really just manifestations of the larger problem that configuration management addresses: managing the changes to infrastructure over time and at scale. (You may also want to take a look at Lumogon, a new tool for inspecting, reporting on, and analyzing your containers.)

Configuration management and the myth of the immutable container

One of the attractions of containers is that they are immutable — i.e., a container image is an artifact that can be passed around freely with the knowledge that it does not depend on the environment in which it will ultimately be used.

But container images are not immutable at runtime, at least not by default. Container immutability is akin to package immutability: By convention, nobody will change the files that come with a package, but there is no technical control that inhibits that. In a similar vein, the code running inside a container is free to change its local file system, and therefore accumulate state changes over its runtime, guarded only by the social convention to not do that.

Rather than relying on the social convention, and having to debug a problem in the middle of the night caused by someone violating this convention, it is much better to run containers in a manner that makes violating immutability an error. The --read-only flag achieves that for Docker by making it impossible to modify the root file system at runtime.

Depending on the application running inside the container, this might be too drastic, and applications often have legitimate needs for mutable scratch space — for temporary storage of uploaded data, for example. In that case, using the --tmpfs flag to make specific directories writable makes it possible to poke writable holes into the otherwise guaranteed-immutable file system.
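As a concrete sketch of how the two flags combine (the image name here is illustrative):

```shell
# Root file system is immutable; any write outside a tmpfs fails at runtime.
docker run --read-only lutter/someimage

# Same guarantee, but with a writable scratch hole at /tmp
# (its contents live in memory and vanish when the container exits).
docker run --read-only --tmpfs /tmp lutter/someimage
```

With this in place, a process that violates the convention fails loudly with a read-only file system error instead of silently accumulating state.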

Build containers quickly and reliably with Puppet image_build

With Puppet, you can get a better idea of what’s going on inside that container by defining it more clearly in the first place. Puppet is great for configuration management because it has a declarative approach — not an imperative one — which drives straightforward automation. That is, you declare what you want, and Puppet makes it happen.

Manifests that define resources are at the core of how Puppet helps you define container images. With image_build, you can reuse all the knowledge about your infrastructure that you have already worked into your Puppet manifests for the container images you build. If you are not using Puppet yet, image_build gives you access to Puppet's powerful mechanisms for expressing how any aspect of your infrastructure needs to be set up. And you can share that across images, within your organization or with the world at large through the Puppet Forge.

The nice thing here is that building Docker containers with Puppet follows a similar pattern to building containers with Docker alone, which looks something like this:

$ docker build . -t lutter/someimage

With Puppet image_build, you would do something like this:

$ puppet docker build --image-name lutter/someimage 

As you can see, the build syntax is very similar, and containers built from the two images run identically.

When do you rebuild your images to make sure they have the most up-to-date packages? Maybe not as frequently as you’d like, because it’s time-consuming to bring everything back up. If you rebuild your images nightly using Puppet image_build, you can be certain that the image, and the containers launched from it, are current.
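One way to automate that is a nightly cron job. This is only a sketch — the schedule, working directory, and image name are assumptions you would adapt to your own setup:

```shell
# crontab fragment: rebuild the image at 02:00 every night
# so it always carries current base packages.
0 2 * * * cd /opt/images/myapp && puppet docker build --image-name lutter/someimage
```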

To try it, install the image_build module:

$ puppet module install puppetlabs/image_build

Once installed, use Git to clone the Puppet image_build examples to your local machine:

$ git clone https://github.com/puppetlabs/puppetlabs-image_build

Navigate to the puppetlabs-image_build/examples/nginx/manifests directory and take a look at the init.pp manifest file:

include 'dummy_service'

class { 'nginx': }

nginx::resource::vhost { 'default':
  www_root => '/var/www/html',
}

file { '/var/www/html/index.html':
  ensure  => present,
  content => 'Hello Puppet and Docker',
}

exec { 'Disable Nginx daemon mode':
  path    => '/bin',
  command => 'echo "daemon off;" >> /etc/nginx/nginx.conf',
  unless  => 'grep "daemon off" /etc/nginx/nginx.conf',
}

Instead of passing the runtime details you might pass on the command line, you can store them in a metadata.yaml file:

cmd: nginx
expose: 80
image_name: puppet/nginx

Creating this image from the command line just adds the word puppet to a familiar Docker build command:

$ puppet docker build

When you execute the main image_build command from within the sample puppetlabs-image_build/examples/nginx/ directory, you’ll see command-line output that looks like a familiar Docker build:

[Screenshot: command-line output from running puppet docker build]

Once the build is complete, running the Puppet-created image is like running any other:

$ docker run -d -p 8080:80 puppet/nginx

If you run curl (or point your browser at the host), you’ll see NGINX is indeed running inside the container and serving up the index.html page:

$ curl http://0.0.0.0:8080
Hello Puppet and Docker

Annotate your containers with metadata

If you build your container images with Dockerfiles, you’re familiar with the Dockerfile's ability to create consistent images. But you still may not have all the detail you want and need to manage the image later. If a Dockerfile requires fedora:24, you might assume Fedora is coming from a trusted repository — Docker Hub or Fedora itself — but maybe it’s really a local Fedora image that’s been modified. The Dockerfile alone can’t tell you everything.

Here’s an example NGINX image built from a Dockerfile:

# Pull base image.
FROM dockerfile/ubuntu

# Install Nginx.
RUN \
  add-apt-repository -y ppa:nginx/stable && \
  apt-get update && \
  apt-get install -y nginx && \
  rm -rf /var/lib/apt/lists/* && \
  echo "\ndaemon off;" >> /etc/nginx/nginx.conf && \
  chown -R www-data:www-data /var/lib/nginx

# Define mountable directories.
VOLUME ["/etc/nginx/sites-enabled", "/etc/nginx/certs", "/etc/nginx/conf.d", "/var/log/nginx", "/var/www/html"]

# Define working directory.
WORKDIR /etc/nginx

# Define default command.
CMD ["nginx"]

# Expose ports.
EXPOSE 80
EXPOSE 443

This is certainly straightforward, but it still leaves a lot of questions — for example, which version of Ubuntu is pulled from dockerfile/ubuntu, and which version of NGINX is the Ubuntu repository offering up? Or even, "How do you run containers off this image?"

It's extremely helpful to be able to get answers directly from the running container. Besides the questions above, you might want to know who built the underlying image, when they built it, how they built it, and what software is actually running in the container.

With some foresight at build time, all these questions and many more can be conveniently answered by applying labels to container images. To ensure consistency, you'll want to define your organization's standards for the meanings of these labels, and when they need to be applied to images.

At their simplest, labels are key-value strings attached to a Docker image. For example, the following directive in a Dockerfile will annotate the image with a release status:

LABEL com.example.release-status="beta"
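Once the image is built, reading a label back is a one-liner with docker inspect (the image name here is assumed):

```shell
# Print the value of a single label on the image.
docker inspect \
  --format '{{ index .Config.Labels "com.example.release-status" }}' \
  lutter/someimage
```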

You can enhance images with other kinds of information through labels, too; for example, placing the Dockerfile with which an image was built into the image itself is a great way to preserve details about how the image was built. Then set a label — say com.example.dockerfile — to the location of the Dockerfile inside the image, and it becomes very easy to extract that Dockerfile from your containers at runtime for troubleshooting and inspection.

Labels don't need to be restricted to static data. It's possible to define something akin to an API for container introspection by setting labels to commands that can then be run via docker exec inside the container. For example, setting the label com.example.cmd.pkgs to the command that needs to be run to list all packages in the image gives you a way to do just that via docker exec regardless of the distribution and package management tool used inside the image.
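As a hedged sketch of that pattern — the container name and the package command recorded in the label are assumptions, chosen here for a Debian-based image:

```shell
# In the Dockerfile, record how to list packages on this particular base image:
#   LABEL com.example.cmd.pkgs="dpkg -l"

# At runtime, look the command up and run it inside the container,
# without needing to know which distribution the image is based on.
PKG_CMD=$(docker inspect \
  --format '{{ index .Config.Labels "com.example.cmd.pkgs" }}' mycontainer)
docker exec mycontainer $PKG_CMD
```

The caller never hard-codes dpkg, rpm, or apk; each image carries its own answer.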

Use Puppet to better manage schedulers and orchestrators

Container schedulers like Kubernetes give you powerful ways to automate deployment of containers across multiple hosts, and can enforce certain characteristics of the deployment, like the number of copies of an image that should be running. With that power comes the responsibility to manage the scheduler itself, and ensure it's running in the intended manner.

For example, Kubernetes needs to be given accurate information about the compute and storage resources of each node at its disposal. With Puppet, you can determine host properties dynamically with Facter. It gathers system information that you can use as dynamic variables in your Puppet code:

$ facter -y | head -n 20

os:
  name: Ubuntu
  family: Debian
  release:
    major: '16.04'
    full: '16.04'
…
memorysize: 1024.00 MB
memoryfree: 1000.65 MB
…


You can use the power of Facter in your Puppet manifests to build labels, this time applied to the Docker daemon itself, which you can then query dynamically. For example, here’s a simple Puppet class that defines OS, kernel and memory labels:

class { 'docker':
  labels => [
    "os=${facts['os']['family']}",
    "kernel=${facts['kernelversion']}",
    "memory=${facts['memory']['system']['total_bytes']}",
  ],
}

You can apply this class across all your container hosts to apply these labels consistently and uniformly, something that is much harder to achieve with a startup script, or from the command line.
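Once the class has been applied, you can confirm the result directly from the daemon on any host (the exact output will vary with each host's facts):

```shell
# List the labels the local Docker daemon was started with.
docker info --format '{{ .Labels }}'
```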

These capabilities make sure that your container infrastructure runs in the manner, and with the configuration, you intended; that it accurately reflects its runtime environment; and that all of this is enforced uniformly and automatically everywhere it is needed.

Puppet makes containers easier, more consistent and transparent

Whether you’re new to containers or an expert, Puppet can give you great ways to make your containers more consistent, more transparent and easier to build. It’s clear that code plus data has advantages over data alone, because the code can act on the data dynamically. Puppet also helps you codify patterns, abstract away the things you don’t care about, and avoid copy-and-paste errors. That means fewer headaches and better outcomes.

David Lutterkort is an advisory software engineer at Puppet.
