Module of the Week: pdxcat/amanda – Advanced Network Backup

The following is a guest post by Reid Vandewiele, a system administrator at the Portland State University Computer Action Team (PDX CAT). Reid, William Van Hevelingen, Spencer Krum and other CATs are big contributors to various modules on the Puppet Forge and also host a few of their own. They are active members of the Puppet community and can usually be found on IRC under the monikers marut, blkperl and nibalizer, respectively. Thanks, guys, for the awesome guest post!
Purpose: Provides Amanda server and client configuration
Module: pdxcat/amanda
Puppet Version: 2.7+
Platforms: Debian, Solaris, FreeBSD, SuSE

The Advanced Maryland Automatic Network Disk Archiver, or Amanda for short, is a network backup solution in the same class as Bacula. Proponents tout its smart automatic planner, use of native tools to perform data dumps, ability to recover data from tape in the absence of the tool itself, and the available commercial support through Zmanda. A venerable bastion of free and open source software, Amanda has been around since 1991 and is still actively maintained with the most recent stable version having been released on February 12, 2012.

Let’s Puppetize that!

The pdxcat/amanda module was developed to handle the installation and basic configuration of the client and server components of the Amanda system in a heterogeneous computing environment. Besides the core classes, the module ships a couple of defined types to assist in setting up multi-server authentication. In the most basic configuration, however, it is only necessary to include amanda::server and/or amanda::client on a given node.

Installing the module

Complexity: Easy
Installation Time: 5 minutes

The newest version of pdxcat/amanda can always be obtained from GitHub. That said, it’s a heck of a lot easier and faster to install it from the Puppet Forge. In this example I’ll be using three machines, all running Ubuntu 12.04 “Precise” (please see the note below for an important disclosure regarding the use of this version). This means that I won’t be able to demonstrate pdxcat/amanda on Solaris or FreeBSD, but that’s one of the beautiful aspects of well-maintained Puppet modules: the steps I take to configure my Ubuntu machines using Puppet should translate directly to performing the same configuration on any supported platform. Now on to the Puppet Forge.

The Amanda software follows the traditional Unix philosophy of “do one thing and do it well,” letting appropriate pre-existing utilities dump data at the system level, and relying on either inetd or ssh for network transport. Amanda does not attempt to reinvent the wheel. Correspondingly, the pdxcat/amanda Puppet module depends on pdxcat/xinetd (a fork of ghoneycutt/xinetd) in addition to the utility module ripienaar/concat.

The Puppet Module Tool (PMT) handled that minor bit of complexity for me. After issuing just one command I’m all ready to go.

Well, almost. I actually ran into a snag with this setup that needs to be addressed, but since it’s off-topic I won’t cover it in detail.

Every time Ubuntu comes out with a new release I do this, and every time I forget. It’s a Bad Idea (TM) to assume that less than three weeks after launch the newest bleeding-edge version of Ubuntu will “just work”. As it turns out, at the time of writing the stock Amanda packages that ship with Ubuntu 12.04 were broken; see Bug 932064. I’m sure it will be fixed soon, but in the meantime, and in order to finish this blog post, I was forced to backport Amanda packages from Quantal into my repo to test the module on 12.04 VMs. C'est la vie.

Resource Overview

The README provides a decent overview of the resources intended for the end user, but it doesn’t go into a lot of detail beyond the most basic usage and parameters. It introduces two classes and three defined types. To get at the gory details, though, it’s necessary to open up the manifests themselves. I personally like to use GitHub for this when inspecting unfamiliar modules, but as the PMT has just installed the files in /etc/puppet/modules, they’re only a few keystrokes away.

Amanda::Server

The server class exposes nine parameters to the user, none of which are required. This means that if I wanted to, I could literally issue `include amanda::server` to handle the software installation and take a pass on any additional functionality.

The additional functionality on offer mostly concerns installing one or more manually written Amanda configs, where a “config” in Amanda terms is a set of files and directories that together hold all the information Amanda needs to perform backups. Most of the parameters are tied into this and will be addressed later in the Configuring the module section.

The xinetd parameter is standalone and boolean. When true, the pdxcat/xinetd module will be used to install the Amanda server-side services (amindexd, amidxtaped) in xinetd. When false, no xinetd services will be installed.

Amanda::Client

The client class is much simpler in comparison. This makes sense, since all native Amanda logic is located on the server and the client software usually serves only as a dumb data dumper. The interesting configuration in the client class has to do with setting up authentication to the remote Amanda server.

The remote_user parameter tells the client which Unix user the server will be running as (this matters because Amanda checks the user as part of deciding whether to authenticate a request), and the server parameter tells the client which backup server will be used. As in the server class, the xinetd parameter tells the client whether or not to install xinetd services (amdump, in this case).

If more than one backup server/user is being used at a site, the amanda::amandahosts defined type can be used to specify that kind of additional information.
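Both the client class and amanda::amandahosts ultimately come down to lines in a .amandahosts file. Each line names a remote host, the user it connects as, and optionally the services that host may request. The Python sketch below only models that line format, assuming bsdtcp-style authentication as used in the example config later in this post. amandahosts_line is a hypothetical helper, not part of pdxcat/amanda, which writes the file through Puppet resources; the amdump default is my own choice for illustration.

```python
def amandahosts_line(host: str, user: str, services: tuple[str, ...] = ("amdump",)) -> str:
    """Render one .amandahosts entry: host, user, then any service names.

    Hypothetical helper for illustration; pdxcat/amanda does not ship it.
    """
    if not host or not user:
        raise ValueError("host and user are both required")
    if any(ch.isspace() for ch in host + user):
        raise ValueError("host and user must not contain whitespace")
    return " ".join((host, user, *services))


if __name__ == "__main__":
    # On the client: let the backup server, connecting as "backup", run dumps.
    print(amandahosts_line("server.local", "backup"))
    # On the server: let root on the client browse indexes and restore.
    print(amandahosts_line("client.local", "root", ("amindexd", "amidxtaped")))
```

A second backup server would simply mean a second line of this shape, which is the situation amanda::amandahosts exists to handle.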

Defined Types

When the client and server classes aren’t precise enough, more manual configuration is possible using the defined types provided: amanda::amandahosts, amanda::config, and amanda::ssh_authorized_key. We won’t get into detail about them here. Suffice it to say that they provide a handy platform-agnostic way of tweaking Amanda nuts and bolts in a way that takes advantage of the module’s knowledge of platform quirks.

For those interested in the defined types, the README file for the module contains examples and additional information. It’s certainly worth the extra foray into the code because, as it turns out, outside of the (surprisingly gnarly) package installation, most pdxcat/amanda class logic is a wrapper around these defines. The additional exploration will also expose why pdxcat/amanda declares a dependency on ripienaar/concat (Dun dun dun!).

(Spoiler: the .amandahosts file is managed with the concat pattern.)
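The concat pattern lets several independent resources each contribute a fragment to one file, which is then assembled in a defined order. That is what allows amanda::client and any number of extra amanda::amandahosts declarations to share a single .amandahosts without any one of them owning the whole file. Below is a minimal Python model of the idea, not ripienaar/concat's implementation: Fragment and assemble are illustrative names, sorting on the (order, name) pair is a simplifying assumption about how fragments are ordered, and backup2.local is a hypothetical second server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One contribution to a shared file, loosely modelled on a concat fragment."""
    name: str
    order: str    # compared as a string, so zero-pad values like "05"
    content: str  # each fragment supplies its own trailing newline


def assemble(fragments: list[Fragment]) -> str:
    """Concatenate fragments sorted by (order, name) into the target's content."""
    return "".join(f.content for f in sorted(fragments, key=lambda f: (f.order, f.name)))


if __name__ == "__main__":
    print(assemble([
        Fragment("server.local", "10", "server.local backup amdump\n"),
        Fragment("header", "00", "# Managed by Puppet\n"),
        # Hypothetical second backup server declared somewhere else entirely.
        Fragment("backup2.local", "10", "backup2.local backup amdump\n"),
    ]), end="")
```

Because no single declaration owns the file, adding another backup server is one more fragment rather than an edit to a shared template.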

Testing the module

The top-level view of the module is somewhat misleading when it comes to tests.

At first it appears that the module may contain both smoke tests and rspec tests, but as it turns out the existence of tests/ and spec/ directories is a false positive. The directories are empty in the 0.1.2 release.

Not to be daunted by so ephemeral a problem as missing tests, let’s forge bravely ahead to perform what sanity checking we can. Syntax validation and style evaluation are two easy checks to run, as previous articles in this series have demonstrated. Note that while the Puppet syntax checker is included in core, the puppet-lint tool is not; to use it you’ll need to install the `puppet-lint` gem.

Pretty good! No syntax errors and no heinous violations of style. This is by no means to say that the code is perfect (or even necessarily good), but adherence to the recommendations set forth in the Puppet Labs Style Guide says something meaningful about maintainability and potential community involvement. It suggests that the maintainers are up to date and in sync with the greater Puppet community ...or alternatively that they are aware of puppet-lint and are suckers for perfect output. Whatever works. :-)

The absence of smoke tests is something that could be easily rectified. In fact, the module’s README file already has all the necessary code in it. Filling in the smoke tests for this module will almost literally be a cut-n-paste operation.

Configuring the module

Complexity: Easy
Installation Time: 5 minutes

The client class seems pretty straightforward and even intuitive, but there’s definitely something strange going on with the server. Luckily, the README does a pretty good job of explaining what is going on.

The short version is that the configs_source parameter works kind of like a Puppet File source parameter “puppet:///” URI, but without the “puppet:///” part. That is, “modules/data/amanda” means the “files/amanda” directory out of the “data” module. Okay... Weird, but okay. Let’s just roll with it for a minute and see where this leads.

The “configs” parameter is an array of values specifying which Amanda configs are to be installed on the server. Each value is combined with the configs_source parameter to produce the name of a directory that is effectively rsynced from the puppet master to the agent. In this example, with configs set to [ 'daily', 'archive' ] and configs_source set to 'modules/data/amanda', we are syncing all files and directories from the master’s $modulepath/data/files/amanda/daily and $modulepath/data/files/amanda/archive to the node’s /etc/amanda/daily and /etc/amanda/archive.
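To make the path mapping concrete, here is a small Python model of how configs_source and configs combine. resolve_config_paths is a hypothetical name, not a function in the module. The sketch assumes a single modulepath directory (Puppet itself accepts a search path) and takes configs_directory explicitly rather than guessing the module's per-platform default.

```python
from pathlib import PurePosixPath


def resolve_config_paths(configs_source: str, configs: list[str], modulepath: str,
                         configs_directory: str) -> dict[str, tuple[str, str]]:
    """Map each config name to (directory on the master, directory on the node).

    configs_source follows the 'modules/<module>/<subdir>' convention, which
    refers to <modulepath>/<module>/files/<subdir> on the master.
    """
    parts = PurePosixPath(configs_source).parts
    if len(parts) < 3 or parts[0] != "modules":
        raise ValueError(f"expected 'modules/<module>/<subdir>', got {configs_source!r}")
    module, subdir = parts[1], PurePosixPath(*parts[2:])
    # Assumes a single modulepath directory; Puppet itself accepts a search path.
    base = PurePosixPath(modulepath) / module / "files" / subdir
    node_base = PurePosixPath(configs_directory)
    return {name: (str(base / name), str(node_base / name)) for name in configs}


if __name__ == "__main__":
    mapping = resolve_config_paths("modules/data/amanda", ["daily", "archive"],
                                   "/etc/puppet/modules", "/etc/amanda")
    for name, (master_dir, node_dir) in mapping.items():
        print(f"{name}: {master_dir} -> {node_dir}")
```

Note that a full "puppet:///modules/..." URI is rejected here, matching the convention that configs_source omits that prefix.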

Besides configs and configs_source, there are a few more parameters that can be used to more finely define the behavior of that sync operation. Those parameters are:

  • configs_directory - The basedir on the node to which the config directories will be synced
  • manage_configs_directory - Whether or not to declare a file resource to manage the configs_directory
  • owner - The owner to apply to all files synced as part of a config
  • group - The group to apply to all files synced as part of a config
  • file_mode - The mode to apply to all regular files synced as part of a config
  • directory_mode - The mode to apply to all directories synced as part of a config

Under the hood, config syncing is accomplished with a custom function in the Amanda module that ferrets out the list of files while the catalog is being compiled on the master. It was implemented this way largely to eliminate the need to enumerate every file that makes up an Amanda config, and from a purely functional perspective it works great. It seems a little out of sync with the feel of other Puppet resources, but I can’t think of a better way to do it yet, so I’m inclined to give it my blessing and move on to the test run.
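The sketch below models that compile-time listing in Python: walk a config directory on the master and describe one file resource per entry, applying the owner, group, file_mode and directory_mode parameters. It is not the module's custom function. config_resources and the dictionary keys are illustrative only, and the owner and mode values in the example are arbitrary rather than the module's defaults.

```python
import os
import tempfile


def config_resources(source_root: str, dest_root: str, owner: str, group: str,
                     file_mode: str, directory_mode: str) -> list[dict[str, str]]:
    """Describe one file-type resource per entry found under source_root.

    Models listing a config at catalog-compile time: the manifest never has
    to name individual files, because the directory walk discovers them.
    """
    common = {"owner": owner, "group": group}
    resources = [{"path": dest_root, "ensure": "directory",
                  "mode": directory_mode, **common}]
    for dirpath, dirnames, filenames in os.walk(source_root):
        dirnames.sort()  # sorting in place also fixes the order os.walk descends
        rel = os.path.relpath(dirpath, source_root)  # "." at the top level
        for name in dirnames:
            resources.append({"path": os.path.normpath(os.path.join(dest_root, rel, name)),
                              "ensure": "directory", "mode": directory_mode, **common})
        for name in sorted(filenames):
            resources.append({"path": os.path.normpath(os.path.join(dest_root, rel, name)),
                              "ensure": "file", "mode": file_mode,
                              # Master-side path; a real resource would use a puppet:/// source.
                              "source": os.path.join(dirpath, name), **common})
    return resources


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src:
        for name in ("amanda.conf", "disklist"):
            open(os.path.join(src, name), "w").close()
        for res in config_resources(src, "/etc/amanda/daily", "backup", "backup",
                                    "0644", "0755"):
            print(res["ensure"], res["path"], res["mode"])
```

Sorting both directory and file names keeps the generated resource list stable between runs, so an unchanged config produces an unchanged catalog.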

Example usage

We’ve walked through the config. Now let’s see how it runs on our virtual Amanda server node.

...Wait, what? The error message reads: “Unable to find referenced module ‘data’ at [...]”. Oh. Right: I just finished talking all about how the configs and configs_source parameters work, but I didn’t actually set any of it up. The Amanda module is letting me know in no uncertain terms that it’s not gonna happen until I do.

Setting aside for the moment the details of what an Amanda config looks like, I’ll just sketch out the “data” module that I told Amanda I would be using.
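Presumably the sketch-up amounts to little more than empty config directories where configs_source points. The hypothetical Python helpers below create <modulepath>/data/files/amanda/<config> for each config and report which of the two files a minimal Amanda config needs (amanda.conf and disklist) are still missing. Empty directories are enough for the catalog to compile, but they are not yet a usable Amanda config.

```python
import tempfile
from pathlib import Path

# The two files a minimal Amanda config needs.
REQUIRED_FILES = ("amanda.conf", "disklist")


def sketch_data_module(modulepath: str, configs: list[str]) -> list[Path]:
    """Create empty <modulepath>/data/files/amanda/<config> directories."""
    dirs = []
    for name in configs:
        config_dir = Path(modulepath) / "data" / "files" / "amanda" / name
        config_dir.mkdir(parents=True, exist_ok=True)  # safe to re-run
        dirs.append(config_dir)
    return dirs


def missing_files(config_dir: Path) -> list[str]:
    """Return the required Amanda files not yet present in config_dir."""
    return [name for name in REQUIRED_FILES if not (config_dir / name).is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as modulepath:  # stand-in for /etc/puppet/modules
        for config_dir in sketch_data_module(modulepath, ["daily", "archive"]):
            print(config_dir.name, "missing:", missing_files(config_dir))
```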

Cool. Let’s roll the server again.

A wall of blue and green text (most of which is omitted here) scrolls by. As far as the Amanda module is concerned, the server is ready to go. Let’s do the same thing on the client.

Board is green. Stepping back for a moment, we should now have the following systems in place:

  • puppet.local, on which we installed the pdxcat/amanda module and set up node definitions for server.local and client.local.
  • server.local, with an Amanda server set up and the mock “daily” and “archive” configs installed.
  • client.local, for which server.local should be able to perform a backup.

It all looks pretty good, but I have to break the fourth wall and talk about Amanda again for a minute. In order to test and verify that this all works we need to actually have an Amanda config, and not just the empty directory sketch-up I did to get the catalog(s) to compile. At its simplest, an Amanda config needs to have an amanda.conf file and a disklist file. I’m now going to install the following contrived example.

# File: amanda.conf
define tapetype "vtape" {
  length 64 mbytes
}
define changer "vtapes" {
  tpchanger "chg-disk:/tmp"
}
define dumptype "default" {
  program "GNUTAR"
  auth "bsdtcp"
}
dumpcycle 7 days
dumpuser "backup"
indexdir "/var/tmp/amanda/index"
infofile "/etc/amanda/daily/curinfo"
labelstr "^V[0-9][0-9][0-9]$"
logdir "/etc/amanda/daily"
mailto "root@server.local"
org "pdxcat/amanda"
runspercycle 1
runtapes 1
tapecycle 9
tapetype "vtape"
tpchanger "vtapes"

# File: disklist
client.local /etc default
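Each disklist line names a client host, the disk or directory to back up, and the dumptype defined in amanda.conf that controls how it is dumped. The Python sketch below parses only that simple three-field form and flags dumptypes that amanda.conf does not define. Real disklists also accept optional fields (such as a separate device, spindle and interface) that this sketch deliberately rejects. parse_simple_disklist and DiskListEntry are hypothetical names.

```python
from typing import NamedTuple


class DiskListEntry(NamedTuple):
    host: str
    disk: str
    dumptype: str


def parse_simple_disklist(text: str, known_dumptypes: set[str]) -> list[DiskListEntry]:
    """Parse 'host disk dumptype' lines, skipping blank lines and # comments.

    Only the three-field form is accepted; other valid disklist forms raise.
    """
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        entry = DiskListEntry(*fields)
        if entry.dumptype not in known_dumptypes:
            raise ValueError(f"line {lineno}: undefined dumptype {entry.dumptype!r}")
        entries.append(entry)
    return entries


if __name__ == "__main__":
    disklist = "# File: disklist\nclient.local /etc default\n"
    # "default" is the only dumptype defined in the example amanda.conf.
    for e in parse_simple_disklist(disklist, {"default"}):
        print(e.host, e.disk, e.dumptype)
```

The dumptype check mirrors the dependency between the two files: a disklist entry is only meaningful if amanda.conf defines the dumptype it names.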

What this config will do is back up the /etc directory of the client machine to “virtual tapes” in /tmp on the server. Brilliant, I know. Now to install the config, we just need to include the files in the “data” module, as specified in the Puppet configuration for server.local, which, if you will recall, was given these parameters:

configs        => [ 'daily', 'archive' ],
configs_source => 'modules/data/amanda',

First, install the files in the “data” module on the puppetmaster.

So, pausing for a moment: Why is it that we are specifying that the configs come from the “data” module? Why aren’t we installing them in the Amanda module instead?

The simple answer is that we’re splitting things up like this because we can. The pdxcat/amanda module follows best practice in providing an interface that allows full use of the module without modifying any of the files and directories it provides. Ideally, a reusable module can be thought of and used as a library. When writing a text munger in C, one calls library functions like read() and write(), but doesn’t typically patch libc.so. Separation of data and code is a common maintainability practice, and it is just as applicable to configuration management, especially once your Puppet installation starts getting big.

The “data” module used here represents a site-local module containing non-reusable data or code that is specific to our operation, such as the amanda.conf and disklist files we just created. Using the configs_source parameter in the amanda::server class lets us cleanly tie together reusable functionality from the Amanda module and custom information from our own data module.

Enough talking. Let’s kick off a Puppet run on the server to make this all more real.

Note that the two config files from the “data” module have been installed in /etc/amanda/daily by amanda::server.

For this demo, there’s a manual config-specific step necessary to make the virtual Amanda server pretend that it’s a real boy. Normally a backup server would have some hardware attached to it on which to store the backup data it receives from clients. Maybe a giant disk array, or a standalone tape robot... Something corporeal that is referenced in the config.

Recall from the config given previously that we have opted to use /tmp. Here follows an appropriate hack to initialize our “backup tapes”.
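The hack itself isn't reproduced here, but it presumably has to provide virtual tapes that the changer and labelstr will accept. The Python sketch below assumes, based on my understanding of Amanda 3.x's chg-disk changer, that each virtual tape is a subdirectory named slot1, slot2, and so on inside the changer directory. It creates tapecycle such slots and generates V001, V002, ... labels, checking each against the labelstr regex from the config above. make_slots and vtape_labels are hypothetical helpers, and the example points them at a throwaway temporary directory rather than the real /tmp.

```python
import re
import tempfile
from pathlib import Path

# Values copied from the example amanda.conf in this post.
LABELSTR = r"^V[0-9][0-9][0-9]$"
TAPECYCLE = 9


def vtape_labels(count: int) -> list[str]:
    """Generate V001, V002, ... and confirm every label matches LABELSTR."""
    labels = [f"V{i:03d}" for i in range(1, count + 1)]
    bad = [label for label in labels if not re.match(LABELSTR, label)]
    if bad:
        raise ValueError(f"labels not accepted by labelstr: {bad}")
    return labels


def make_slots(changer_dir: str, count: int) -> list[Path]:
    """Create slot1..slotN under changer_dir (assumed chg-disk layout)."""
    slots = []
    for i in range(1, count + 1):
        slot = Path(changer_dir) / f"slot{i}"
        slot.mkdir(parents=True, exist_ok=True)
        slots.append(slot)
    return slots


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as changer:  # stand-in for /tmp
        for slot, label in zip(make_slots(changer, TAPECYCLE), vtape_labels(TAPECYCLE)):
            print(slot.name, label)
```

Each slot would still need to be labeled with Amanda's amlabel command, run as the dumpuser; check amlabel's man page for the slot syntax your Amanda version expects. The three-digit labelstr also caps this scheme at 999 tapes, which vtape_labels reports as an error.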

And that’s it. We have an Amanda server on which Amanda was set up and the config installed by Puppet, and we have a client machine on which Amanda was installed and configured by Puppet to allow the server to perform backups. There was some manual setup necessary on the server to install “tapes” on which to save backup data, but we’re finally able to answer the important question “Does it work?” with a resounding “Yes!”

The check command ran and returned without error. The amdump command ran and returned silently, in typical Unix fashion. Amanda sends the full report to the user specified in the config (in this case root) but I’ll spare you the gory details of the long and verbose output. Suffice it to say that the backup test was successful.

Conclusion

The pdxcat/amanda Puppet module provides a good head start towards managing the Amanda backup solution. Writing the configs is still left up to the administrator, but installing and ensuring them once written is easy and fast. Much of the legwork on the server side can be handled by Puppet, and all of the client-side configuration is effectively automated. The one notable piece of client configuration that is missing (and was not used in this demonstration) is the amanda-client.conf file, which could have been used to set defaults for client-side restore operations.

Looking through the module code it feels like there is a lot of complexity around installing the client/server packages, but at its core the functionality provided by the module is simple and focused.

What the module does:

  • Install Amanda client and/or server packages
  • Configure the client(s) to allow the server to perform backups
  • Ensure config files written by an administrator are installed on the server
  • Support a variety of platforms

What the module doesn’t do:

  • Parameterize or build Amanda config files (server-side)
  • Manage the Amanda disklist or otherwise provide for puppet-integrated configuration of backup targets
  • Schedule the backup runs
  • Provide tests

Of particular note is that testing the module at present requires some upfront knowledge of Amanda and the writing of a custom config. Well wait a second, we just did all that work, right? How about turning it into an all-in-one stand-alone smoke test for the Puppet module?

Writing this post has brought to light a couple of awesome ideas that would make the Amanda module even better, some of which wouldn’t even take all that much time to code. In no particular order, here’s the brainstorm list from the review:

  1. Smoke tests for the classes and defines
  2. Integrated smoke test for a server that includes setting up a contrived Amanda config
  3. RSpec tests for the classes, defines, and functions
  4. Allow management of amanda-client.conf through the amanda::client class
  5. Provide a defined type to schedule backup runs for a specified config

Since the first draft of this blog post was written, brainstorm item #1 has been added to the module on GitHub, a couple of bugs have been fixed, and the “demo” config that was written for this blog post has been turned into a working smoke test, completing brainstorm item #2. That second item is especially exciting, as it significantly reduces the overhead of installing and testing the Amanda module (albeit with just a single VM), whittling it down to these five commands on Ubuntu:

# puppet module install pdxcat-amanda
# puppet apply /etc/puppet/modules/amanda/tests/demo.pp
# su - backup
$ amcheck demo
$ amdump demo

The detailed results will be mailed to root@localhost. These changes are available in pdxcat/amanda 0.1.3, the newest release on the Puppet Forge at the time of writing.

A solid set of core classes and a good collection of defined types already make pdxcat/amanda well worth using for deploying Amanda with Puppet. There are certainly areas that could be improved, but all in all it’s a decent module for a potentially complicated application.
