The Problem with Patch Management: All the Humans
Patch management is the process of applying software updates to installed software systems. If that sounds simple, you haven’t done it on a large, complex scale.
Every change to infrastructure introduces risk. That risk has to be understood and managed. For software updates, assessing the risk each update introduces can be very difficult. Often this means IT organizations either don’t update often enough, or don’t update at all.
Because sysadmins are busy and the process of managing software updates can be resource-intensive, the work must be prioritized. Vetting which updates are important adds another step to the process of managing them. Is it a security update? How severe is it? If it’s a bug fix release, are we experiencing the bug? What are the potential services this update can affect? Every update not meeting a high bar gets pushed to the bottom of the work stack, or pushed off entirely.
To prioritize the stream of incoming updates, admins must maintain a sufficient level of expertise in a multitude of system components, not all of which provide direct business value. Beyond the components of a given operating system, the differences between the operating systems themselves must be understood. So the work is managed by specialized teams, reinforcing team silos. Each operating system has a particular “best practice” patch management solution, meaning multiple tools must be monitored. Each tool has its own way of doing things, further entrenching specialization.
Patch management today is a human process that is inefficient, costly, and error-prone, that demands unnecessary expertise, and that sacrifices IT agility.
Products on the market today “help” by essentially making it easier to prioritize the work. They alert us to the updates available, inform us what they are, and wait for instructions from the human expert. The tools can’t make basic decisions on behalf of their users, because they lack the necessary insight into the infrastructure, so they defer to the expert.
The fact is, updates are never applied to servers; they are applied to the previously desired configuration of each server. In other words, updates are applied to policy, not machines. The tools lack this insight. They are too dumb to know which updates potentially affect policies in your configuration management system, forcing administrators to specialize further in areas that are not core to business value. The same lack of insight means current tools cannot know that a business application needs to be reloaded because a library it depends on was updated.
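To make the missing insight concrete, here is a minimal sketch in Python of what a smarter tool would need to know: which services depend on which packages, so that an updated library can be traced to the applications that must be reloaded. The service names, the package sets, and the `services_to_reload` helper are all hypothetical; in practice this mapping would presumably come from the policy already held in a configuration management system.

```python
# Hypothetical policy data: which services depend on which packages.
# These names are illustrative only; a real tool would read this from
# the configuration management system rather than a hard-coded dict.
SERVICE_DEPENDENCIES = {
    "billing-app": {"libxml2", "openssl"},
    "web-frontend": {"openssl", "zlib"},
    "report-runner": {"libxml2"},
}


def services_to_reload(updated_packages, dependencies=SERVICE_DEPENDENCIES):
    """Return the sorted names of services that depend on any updated package."""
    updated = set(updated_packages)
    return sorted(
        service
        for service, packages in dependencies.items()
        if packages & updated  # non-empty intersection: service is affected
    )


if __name__ == "__main__":
    # An update to libxml2 touches two of the three services.
    print(services_to_reload(["libxml2"]))  # ['billing-app', 'report-runner']
```

The point of the sketch is only that the answer falls out of policy, not out of expertise about the package itself.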
The tools reinforce the problem.
What’s needed is a way to detect available updates across every operating system and all firmware on physical devices. All verified updates need to be applied in a way that minimizes or eliminates downtime. The updates must be introspectable, auditable, and repeatable. The process should minimize human interaction and required expertise.
The dumb tools need to get smarter.
The solution is, somewhat ironically, to do what the business wants IT to do in the first place: focus on business value. In a previous blog post, Managing Change to Enable Agile Operations, I make the argument that any effective change management process validates all incoming change, including patch management, against the business applications and services themselves. Modern practices such as continuous delivery make this practical by constantly ensuring incoming change results in the desired behavior. Software updates should be treated as just another incoming change.
Once updates are treated as a change equal to development changes, the goal becomes to apply every update that does not break the applications. Whether an update is a security update, bug fix, or new feature is irrelevant.
If it doesn’t break it, ship it.
Patch management should demand attention only when an update breaks the pipeline. Instead of trying to understand every update well enough to prioritize it, the human experts need to dig into only the updates that break business-critical applications.
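The rule "if it doesn’t break it, ship it" can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real tool: the `Update` type, the `triage_updates` function, and the `passes_pipeline` callback are hypothetical names, the version strings are made up, and the callback stands in for whatever continuous delivery pipeline actually exercises the business applications.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Update:
    """A candidate software update (hypothetical, illustrative type)."""
    package: str
    version: str


def triage_updates(
    updates: Iterable[Update],
    passes_pipeline: Callable[[Update], bool],
) -> Tuple[List[Update], List[Update]]:
    """Split updates into those safe to ship and those needing a human."""
    ship: List[Update] = []
    investigate: List[Update] = []
    for update in updates:
        # Whether the update is a security fix, bug fix, or new feature is
        # never consulted; only the pipeline result decides its fate.
        if passes_pipeline(update):
            ship.append(update)
        else:
            investigate.append(update)
    return ship, investigate


if __name__ == "__main__":
    # Illustrative version numbers only.
    candidates = [Update("libxml2", "2.9.1"), Update("kernel", "3.10.0")]

    # Stand-in for a real test run: pretend the kernel update breaks the app.
    def fake_pipeline(update: Update) -> bool:
        return update.package != "kernel"

    ship, investigate = triage_updates(candidates, fake_pipeline)
    print("ship:", [u.package for u in ship])
    print("investigate:", [u.package for u in investigate])
```

Only the `investigate` list ever reaches a human; everything else flows through without anyone having to classify it first.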
In the end, patch management is a part of configuration management. The desired versions of libxml and the kernel are no different from the desired version of the business application in production. But the tools today are too dumb to know how to express this, let alone how to verify and apply it. We as sysadmins should not be forced to be experts in OS internals to stay agile. We must start seeing patches as just another change. The industry can do better. We as sysadmins should demand better.