Published on 20 August 2015 by Josh Samuelson

TL;DR: Your modules are all in one big Git version control repository and you want to split them up so they’re easier to manage. A little work with Git and r10k can help. Skip to the TL;DR / How To section below for the bare how-to.

You’ve been working with Puppet for a year or two. It started on a whim, just converting some bash scripts or writing something new and using puppet apply. That worked better than you thought, so you kept writing little Puppet manifests until you had a good chunk of your job automated. Eventually, you realized you’d better track this stuff in Git, so you started checking in your one-offs. While you were at it, you moved some files around and organized your one-off manifests into module directories.

Meanwhile, your boss notices that you’re actually going home on time and hasn’t heard you muttering the phrase “stress induced ulcer” in a couple of weeks. This is especially notable since the other system administrators look increasingly sickly and overcaffeinated. So now the jig is up, and your boss wants to know how you’ve been doing it. Next thing you know, you’ve got some budget and a project assigned to you: “puppetize all the things.”

Or maybe it happened this way: You just started at a new job where someone else was doing all that stuff, and now you’re the one getting an ulcer from trying to manage it.

There is hope.

You can easily implement some better practices.

I spent enough years as a sysadmin to know that hardly anyone uses all of the “best practices.” When you’re learning something new, always trying to follow best practices can really slow you down.

This is about learning better practices.

It’s about doing a little better than before. Using puppet apply is a little better than just bash scripts. Tracking Puppet code with Git is a little better than not tracking it. Organizing code into modules is a little better than keeping it all in one folder. Splitting the modules into separate repos is a little better than a single repo. Using r10k to deploy the code is a little better than just copying it all into place.

This is a step-by-step guide, based on my own experience refactoring the build scripts for the VMs we use in Puppet Labs trainings.

The basic plan is:

  1. Pull the modules out of the monolithic Git repo while preserving unique commit history in new repos.
  2. Create a puppetfile to reference your new repos and existing Forge modules.
  3. Use r10k to deploy the modules on the master.

Let’s start with a little sanity check. Ideally, you have duplicates of your production infrastructure for dev, QA and staging. If that’s the case, you will just run these changes through your normal release pipeline. If you don’t, take this opportunity to set up at least one pre-production master, and preferably one agent node for testing.

Puppet Enterprise is free for under 10 nodes, and is very easy to set up quickly for this purpose. If you have the capacity in your pre-production VM environment, why not set up a full mirror of your production infrastructure while you’re at it?

The most important thing is, don’t test this out in your live production environment. If you run through testing in pre-production, there is about a 99 percent chance that it will work perfectly and you can go right to production. If you skip that step and deploy untested on production, there is a 99 percent chance that it will break everything, destroy your entire business, and lead you into a downward spiral of despair and shame. Anyway, once you have the Puppet code written already, it’s kind of fun to spin up new environments.

r10k

r10k is an amazingly helpful tool that is hardly ever described clearly, so here goes: r10k is how you get the right version of your code where you want it. Beyond that basic use, it also allows you to easily manage and use separate environments for dev, production, etc., plus you can do some other really cool stuff. We’re not going to cover that other stuff in this guide, but using r10k now means you’ll be able to grow into some more advanced features as you need them.

We’re assuming that your current code is all in a single repo. If you’re using external modules from the Forge, you may have copied all the code into your own repo, so you never get updates when the upstream author releases a new version. If you’re Git savvy, maybe you’re using submodules to get those updates. r10k will let us just point at the modules on the Forge and/or Git repos to avoid a lot of the pain involved in using those external modules.

The Big Split

Now that we’ve got a high-level view, let’s get into the nuts and bolts. The first step is to split the massive repo into smaller repos, one per module. Merge everything to master before you start this process, because once the code is split into modules, it’s a pain to merge changes made before the split. It’s best to get all the in-progress work done and into master before moving on. You’ll feel better too.

The basic strategy is to make a full clone of the main repo into a local directory named after the module. For each module, use the git filter-branch --subdirectory-filter command to select only the commits that apply to that module; filter-branch is smart enough to preserve only the history of the files you’re filtering on. If you don’t care about history, just copy the files and commit them to a new repo. I did all of this by hand, since we had only a couple of modules and I wanted to check the logs as I went. If you have more than a few, you might want to write a quick bash script, but it probably isn’t worth fine-tuning a script that you’ll run only once.
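If you do go the script route, a rough sketch might look something like this. It assumes the monolithic repo URL and module names used in this post; swap in your own, and treat it as a starting point rather than a finished tool:

#!/bin/bash
# Sketch: split each module out of the monolithic repo into its own local clone.
mkdir -p ~/repo
SOURCE='https://github.com/puppetlabs/puppetlabs-training-bootstrap'
for module in userprefs localrepo learning; do
  git clone "$SOURCE" ~/repo/pltraining-$module
  cd ~/repo/pltraining-$module
  git remote remove origin   # paranoia: don't accidentally push back to the big repo
  git filter-branch --subdirectory-filter modules/$module
  cd -
done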

Any time I use Git commands that rewrite history, I’m extra paranoid, so I added the extra step of deleting the origin remote after I cloned the repo with git remote remove origin. Since you’ll be pushing to a new remote anyway, you might as well delete it right after you clone so you don’t accidentally wipe out 90 percent of your codebase. Also, don’t force push — that’s just foolhardy when you’re doing this kind of thing. You should be pushing to a blank remote, so if you’re tempted to force push, there’s a good chance you’re about to do something very wrong.

If your modules were in a folder called “modules” and you were trying to filter out a module called “localrepo”, the command would be git filter-branch --subdirectory-filter modules/localrepo. This leaves you with a repo containing just that subdirectory, with only the commits for that subdirectory. So a folder called modules/localrepo/manifests will now be just manifests.

Once you’ve got your repo filtered down to just the one module, create a new repo on GitHub (or whatever remote Git server you use) named for that module. For example, when I was moving the “userprefs” module, I made a repo at http://github.com/puppetlabs/pltraining-userprefs since we were going to publish it to the Forge under the pltraining namespace. When you have that repo, just add it as a remote and push your new repo.
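For the userprefs example, those last two steps are just the following (using the repo URL from this post; substitute your own remote):

git remote add origin https://github.com/puppetlabs/pltraining-userprefs
git push origin master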

Now just rinse and repeat. You’ll need to run through the same process for every module in the original before you move on.

Puppetfile

Now that you’ve waded through all the mess of splitting Git history, you can get to the fun part. You’ll build a puppetfile with references to each of your new repos and any Forge modules you use. A puppetfile lives in the root of each environment directory, and r10k uses that file to deploy the modules. For example, if you’re just using the production environment, your puppetfile would go in /etc/puppetlabs/puppet/environments/production. First, you’ll need to tell it where to deploy the modules. We’ll just assume that they go in the local modules directory:

moduledir './modules/'

Then just add references to your modules. For userprefs, we can use the Git repo like this:

mod 'userprefs',
  :git => 'https://github.com/puppetlabs/pltraining-userprefs'

Or, since we publish userprefs on the Forge, we can shorten it and set a version number:

mod 'pltraining/userprefs', '1.0.6'

For your new Git-based modules, you might want to start maintaining a release branch so you know that broken code doesn’t get pushed out to your infrastructure. You can reference a branch in your puppetfile using :ref:

mod 'userprefs',
  :git => 'https://github.com/puppetlabs/pltraining-userprefs',
  :ref => 'release'

Another option would be to create tags against your master branch for stable versions instead of a release branch. The nice thing about a release branch is that it opens the possibility of more seamless automation in the future, where code is deployed just by pushing to that branch.
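If you go the tag route instead, the flow is roughly this (the tag name here is made up for the example):

git tag -a 1.1.0 -m 'Stable release'
git push origin 1.1.0

Then pin to that tag in your puppetfile, since r10k also accepts a :tag option for Git-based modules:

mod 'userprefs',
  :git => 'https://github.com/puppetlabs/pltraining-userprefs',
  :tag => '1.1.0'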

Before deploying the code, run through a quick test of your puppetfile. You just need to clone the original repo to a temporary location, delete the modules directory, and run r10k puppetfile install. Once r10k has finished, use the built-in Git tools to check for differences between your monolithic repo code and the r10k-deployed code. For a quick look at changed files, use git status. To see what’s changed line by line, use git diff.
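Condensed into one run, that check looks something like this (assuming the puppetfile is committed at the root of the original repo; the temporary path is just an example):

git clone https://github.com/puppetlabs/puppetlabs-training-bootstrap ~/temp/ptb
cd ~/temp/ptb
rm -rf modules
r10k puppetfile install
git status
git diff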

If you’re using modules from the Forge, you may notice some differences. If you do, take a look at the code to see if you need to pin your module to a specific version in the puppetfile.

Deleting Everything

Now that you're sure everything works, it's time to delete all of that code from your original repo. Just delete your 'modules' directory and push the change. If you had a bunch of modules from the Forge in your repo, this will be especially cathartic.
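In Git terms, that's just something like:

git rm -r modules
git commit -m "Remove modules now deployed by r10k"
git push origin master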

Here's a screenshot of what it looked like when we did it:

[Screenshot: deleting code from the original repo]

Getting the Code on the Server

Now we need a way to get your code onto the Puppet master server. This is where r10k comes in. For now, we’re going to just do this by hand, but this will set the stage for some pretty magical automated stuff later on. We won’t cover that here, but rest assured that you’re moving in the right direction.

First, ssh to your master and install the r10k gem:

gem install r10k

Change to the Puppet master’s production environment directory:

cd /etc/puppetlabs/puppet/environments/production/

Now, copy your puppetfile into that directory and run this command:

r10k puppetfile install

If you’ve done everything right, the modules should show up with all the right content. That’s right, after all that work you’re back where you started. Except that now you can easily update external modules to whatever version you want, and you can test changes to your own modules by creating branches in Git. We do this all the time when we’re developing our classroom VMs. For example, I was recently improving some of the code for the Learning VM. In our Puppetfile, that module is defined by the following:

mod 'learning',
  :git => 'http://github.com/puppetlabs/pltraining-learning'

Now that module will always be drawn from the master branch of that Git repo. In a branch on my fork called cleanup, I removed some old code that was no longer used, so I updated the Puppetfile to this:

mod 'learning',
  :git => 'http://github.com/joshsamuelson/pltraining-learning',
  :ref => 'cleanup'

After I’d run through testing and those changes were reviewed and merged, I reverted the Puppetfile to point at the main repo.

TL;DR / How To

NOTE: These instructions assume that you’re using Git for managing your Puppet code. If you aren’t, maybe this would be a good time to consider that switch.

IMPORTANT NOTE: All in-progress work needs to be merged into master on the main repo before starting this process. Merging later will require some manual cherry-picking, which could be a real hassle on a big project.

EVEN MORE IMPORTANT NOTE: Don’t test this for the first time on your production Puppet master. If you can’t think of how to test the process, read the whole article.

For each module in your modules directory:

  1. Clone the primary repo to your workstation to a folder named after that module (e.g. userprefs):
    git clone https://github.com/puppetlabs/puppetlabs-training-bootstrap ~/repo/pltraining-userprefs

  2. Change to the repo directory:
    cd ~/repo/pltraining-userprefs

  3. Delete the origin remote from the repo, just so you don't do something really stupid:
    git remote remove origin

  4. Manipulate the repo to get only the history relevant to that subdir:
    git filter-branch --subdirectory-filter modules/userprefs

  5. Add a brand new remote (e.g., one you set up on GitHub):
    git remote add origin https://github.com/puppetlabs/pltraining-userprefs

  6. Push master to new remote repo:
    git push origin master

Create a puppetfile in the original source repo with the following format:

moduledir './modules/'

# For modules published on the Forge:
mod 'pltraining/userprefs', '1.0.6'

# For modules published in a Git repo:
mod 'localrepo',
  :git => 'https://github.com/puppetlabs/pltraining-localrepo'

# To reference a branch other than master in a Git repo:
mod 'learning',
  :git => 'https://github.com/puppetlabs/pltraining-learning',
  :ref => 'release'

Test out your puppetfile:

  1. Clone your original repo to a temporary directory:
    git clone https://github.com/puppetlabs/puppetlabs-training-bootstrap ~/temp/ptb

  2. Change to the temporary directory:
    cd ~/temp/ptb

  3. Delete the modules directory:
    rm -rf modules

  4. Install the modules using r10k:
    r10k puppetfile install

  5. Check for differences between what Git expects and what r10k installed:
    git status
    git diff

What’s Next

Now that you’ve got all that working, it’s probably seeming a little silly to maintain a separate puppetfile for each environment. Maybe you even had the idea that you could just create a new branch in your main repo for each environment, and check those out into your environment directories. To get the development environment modules, you can check out a branch called development into the proper folder. That’s actually the next level of using r10k. Now that you’re using the better practice and splitting up your code, you’re in a great position to move on to the next level.
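We won't cover it here, but that next level is typically driven by an r10k configuration file that maps branches of a control repo to environment directories. As a rough sketch only — the exact keys, file location, and paths depend on your r10k and Puppet versions, and the repo URL here is the example from this post:

# /etc/r10k.yaml
:cachedir: '/var/cache/r10k'
:sources:
  :main:
    remote: 'https://github.com/puppetlabs/puppetlabs-training-bootstrap'
    basedir: '/etc/puppetlabs/puppet/environments'

With something like that in place, r10k deploy environment -p checks out each branch of the repo as its own environment directory and installs the modules listed in that branch's puppetfile.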

I hope this has inspired you to implement some better practices in your Puppet code, even if you don’t end up doing anything more advanced with r10k. We made this change in the education team, and it has really improved our workflow. Because we don’t need to work with environments for our VM builds, this is probably the limit of how far we’ll go with r10k. Depending on your context, you might not need much more.

As much as best practices are important in ops, there is a reason we don't all use best practices all the time. It’s because the half-assed, jury-rigged solutions that we create in the moment are actually working. The line between dirty hack and elegant solution is often surprisingly thin, and can be crossed just by spending a little time improving old code — and just as easily crossed back by a last-minute fix.

That’s why I want to promote this idea of working on better practices. Sometimes all you have time for is to make things a bit better, and you don’t have the time to make them perfect. If you did, you would’ve done it perfectly in the first place.

Josh Samuelson is a training solutions engineer at Puppet Labs.


