
Beyond the golden image: a crotchety sysadmin’s journey to container acceptance

It always comes back to, “Well, it works on my machine.”

For as long as applications have been released to production, we’ve been hearing that sentence when something breaks. Over time, IT organizations have come up with two broad approaches to solving this problem:

  1. Make the development environment look more like production
  2. Make production look more like the development environment

Historically, most organizations have favored the first approach; the second is now in the ascendant. But what do these approaches mean in practice, and how do modern application deployment methods like containers fit in?

The orthodoxy: Make the development environment look more like production

This is the approach that most organizations have experience with. We quickly learned that deploying a new application directly into production can be a recipe for disaster (or at least late-night calls). To mitigate this risk, we created a series of production-like environments for developers to work in prior to release (development, QA, testing, integration), as many as it took to test the application.

This was helpful, but it created a lot of work for operations staff who now had to ensure that multiple environments were correctly configured, rather than just one. After all, it doesn’t matter if I do a bunch of testing in a pre-production environment if that environment doesn’t look like my production environment (with all of the real-world forces affecting it, like load times, network bandwidth, etc.).

At first, this configuration was a manual process. We would (hopefully) write documentation to describe to others how a system should be configured. A person would then sit down and follow the documentation, manually configuring the system. This was slow and error-prone, so over time sysadmins began to write scripts to automate portions of the documented steps.

This was an improvement, but the scripts still had to be run manually, and there were generally manual steps between scripts. We improved the process further by building job execution engines that could automatically run the scripts on a schedule or in response to a predefined event.

This all represented progress, but script-based configuration management is brittle. A script is a good way to set the initial state of a system, but it typically acts without checking what state the system is already in, so it doesn’t work well to maintain that state over time.

Infrastructure as code gave us an approach to configuration management that let us define repeatable, persistent system configuration in a form that could be easily distributed to, and even owned by, software developers. This made it much easier to automate the configuration of a production-like system from scratch. With local virtualization technologies, developers could even create production-like VMs on their own laptops, test the application there, and then deploy to test and production environments with a high degree of confidence.
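To make the difference between running a script and declaring a desired state concrete, here is a minimal Python sketch. It is an illustration only: the function names `append_setting` and `ensure_setting`, the file names, and the `max_connections=100` setting are all invented for this example, and real configuration management tools do far more than this.

```python
import os
import tempfile


def append_setting(path, line):
    """Script-style step: appends blindly, so every rerun adds a duplicate."""
    with open(path, "a") as f:
        f.write(line + "\n")


def ensure_setting(path, line):
    """Desired-state step: changes the file only if the line is missing.

    Returns True when a change was made, False when the file already complied.
    """
    existing = []
    if os.path.exists(path):
        with open(path) as f:
            existing = f.read().splitlines()
    if line in existing:
        return False
    with open(path, "a") as f:
        f.write(line + "\n")
    return True


def demo():
    """Apply each step three times and count the resulting lines in each file."""
    with tempfile.TemporaryDirectory() as d:
        scripted = os.path.join(d, "scripted.conf")   # hypothetical file name
        declared = os.path.join(d, "declared.conf")   # hypothetical file name
        for _ in range(3):
            append_setting(scripted, "max_connections=100")
            ensure_setting(declared, "max_connections=100")
        with open(scripted) as f:
            scripted_lines = f.read().splitlines()
        with open(declared) as f:
            declared_lines = f.read().splitlines()
        return len(scripted_lines), len(declared_lines)


if __name__ == "__main__":
    print(demo())
```

Applied three times, the scripted step leaves three copies of the setting, while the desired-state step leaves exactly one. That property, being safe to apply over and over, is what lets a configuration be enforced continuously rather than set once.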

For operations, this was close to the promised land, but there was an undercurrent of discontent.

The insurgency: Make production look more like the development environment

Periodically, a new technology would spring up that would prompt someone to ask, “Well, wait a minute. Why don’t I just configure my development environment the way I need it, and then just copy that whole environment into production?”

There have been a few iterations of this: hard disk clones, golden images, VM templates, public cloud images. Each new technology made this seem more possible than the previous iteration, but they all broke down due to two fundamental constraints. Technology had to successfully overcome the first constraint before we bothered to spend any time addressing the second.

Let’s walk through these two constraints.

Constraint 1: cloning disks/images/VMs is time-consuming and expensive

The first constraint is that it takes a long time to reproduce an entire operating system, no matter what technology backs it. Disk clones and golden image installations can take hours apiece. VMs and cloud images can take tens of minutes, which is an improvement, but is still glacial with respect to modern demands on availability and responsiveness.

Because these processes are so time-consuming, nobody ever took them seriously as techniques for managing production environments. As a result, nobody bothered to invest in tooling to make the process of deploying an entire image more robust, which led to these technologies fading into irrelevance. This is significant, because a deployment model only matures once people consider it worth building tools for.

How to overcome this: Hello, containers.

Containers have finally given us a way to deploy entire environments without the high overhead of previous techniques. This technological leap has made it feasible to deliver fully configured application instances to production, and because of this, we now see interest in building tooling to streamline the process.

This, ultimately, is what is making this latest iteration of the “golden image” model stick. It’s lightweight enough to be practical, and practical enough to invest in. Technology has finally caught up to the dream. We can solve the “works on my machine” problem by simply deploying “my machine” to production.

Constraint 2: Applications aren’t ready

Now that containers have addressed most of the practical problems associated with deploying fully configured environments to production, we are beginning to seriously consider the second problem: most applications weren’t designed for this model. Because creating configured systems has always been expensive, we’ve built applications on the assumption that they will be deployed to long-lived servers. This assumption leads to a number of application behaviors that are incompatible with a container-deployment model, but the primary ones are these:

  • They save state to the server that they run on.
  • They are monolithic.

Many applications store important information locally on the servers they run on. This is a convenient way to improve performance — it’s much faster to read data from a local disk than it is to retrieve it from a remote database. However, as soon as an application does this, it becomes tied to that server. This eliminates the primary benefit of containerization — there’s no sense in containerizing an application if the container can never be destroyed and redeployed.

To reap the benefits of containerization, an application must be designed to be stateless. This means that it doesn’t store any important information on the server it runs on; anything that must persist lives in a remote source, such as a database. Once an application is stateless, it is trivial to destroy and recreate the container that hosts it. The application will retrieve what it needs from those remote sources as soon as it’s recreated.
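A small Python sketch shows why this matters when a container is destroyed and recreated. Everything here is hypothetical: `RemoteStore` is an in-memory stand-in for whatever external database or cache an application might use, and the two counter classes are invented for illustration.

```python
class RemoteStore:
    """Hypothetical stand-in for an external database or cache.

    It outlives any single application instance in this simulation.
    """

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


class StatefulCounter:
    """Keeps its count in process memory, so the count dies with the instance."""

    def __init__(self):
        self.count = 0

    def handle(self):
        self.count += 1
        return self.count


class StatelessCounter:
    """Keeps nothing locally; every request reads and writes the remote store."""

    def __init__(self, store):
        self.store = store

    def handle(self):
        # A real shared store would need an atomic increment here;
        # this read-then-write is only safe in a single-threaded sketch.
        count = self.store.get("visits", 0) + 1
        self.store.set("visits", count)
        return count


def simulate_redeploy():
    """Serve two requests, replace both instances, then serve one more."""
    store = RemoteStore()
    stateful = StatefulCounter()
    stateless = StatelessCounter(store)
    for _ in range(2):
        stateful.handle()
        stateless.handle()

    # "Destroy and recreate the container": fresh instances, same remote store.
    stateful = StatefulCounter()
    stateless = StatelessCounter(store)
    return stateful.handle(), stateless.handle()


if __name__ == "__main__":
    print(simulate_redeploy())
```

After the simulated redeploy, the stateful version starts counting again from one, while the stateless version carries on from where it left off, because the only thing that was thrown away was the instance itself.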

Because it’s long been more cost- (and space-) effective to buy a single high-powered server than multiple low-powered servers, legacy applications were designed to consolidate as much of their operation as possible on as few servers as possible. This is a perfectly sensible model when the runtime environment is relatively expensive (in terms of money or system resources). However, containers are cheap. They consume few resources and cost next to nothing to instantiate.

It is now practical to design applications so that each distinct function runs in its own container. This provides all sorts of benefits: individual components can be scaled and updated independently, interdependencies are clear and well-defined, and problems are easier to isolate and identify. But it is diabolically hard to reconfigure existing applications to separate their functions like this. It’s much easier to design an application from the ground up for this kind of deployment than to convert it after the fact.

An uneasy truce, but a way forward: a hybrid environment

Because legacy applications are so difficult to retrofit into an architecture that supports containerization, persistent application servers will remain for some time. But this time around, with containerization, the “golden image” model finally appears to have crossed the line from unfulfilled promise to practical application. New applications are built with containerization in mind, and the activity and energy in the container space show no sign of abating.

All indications are that the future belongs to containers, but the present isn’t going anywhere any time soon, and so it’s critical that organizations incorporate best practices for both traditional and containerized applications.

Continuous configuration management allows IT organizations to manage their stateful, monolithic applications most effectively. Container technologies such as Docker, together with orchestrators such as Kubernetes that automate deployments, allow them to manage modern stateless, ephemeral application servers. By using the two in conjunction, enterprises can optimize their present IT estate while building for the containerized future.

As the lines between production and development environments continue to blur, are your teams ready to adapt and embrace the methods of a containerized future?

Greg Sarjeant is the services portfolio lead at Puppet.