
Mapping the Puppet Forge

A long time ago (well, June of this year), the Puppet Forge was running without a leader. In my role as community manager, I saw the Forge as having awesome potential to be the resource for user-generated content in the Puppet community. I knew it was getting more attention, but that knowledge was mostly anecdotal. My next step was to find some data that could tell a good story.

Puppet modules are often the first way people learn and start using Puppet. We’ve had the Puppet Forge for a while, but I didn’t feel like I knew a lot about it. When we were getting ready to interview Product Owners for the Puppet Forge and Modules, I decided I wanted to know more, both to help me prepare for the interview and maybe to give me some insight into usage patterns that I hadn’t thought about.

Like any geek, I love data. I knew we had all sorts of data in our module download logs, but we had never really taken the time to transform that data into awesome information. I started with simple awk/sed/grep to find basic information, like which modules were popular. This worked for a time, but then I wanted to look up modules by name, find popular authors, and do things like ignore version number changes.

First look: hacks on hacks on hacks

I ended up writing my own log processor (that isn’t very exciting). The output, however, is interesting. This was the initial output of a script I whipped up one afternoon. The numbers next to the author or module name are download counts. This covers a period of several months in the May-July timeframe.

    ============= Authors (top 25) ================
    puppetlabs                     33293
    ghoneycutt                      6109
    saz                             4059
    dhoppe                          2485
    DavidSchmitt                    1925
    thias                           1583
    jamtur01                        1059
    ripienaar                        990
    camptocamp                       859
    bobsh                            810
    mstanislav                       786
    rcoleman                         734
    lab42                            643
    brightbox                        598
    razorsedge                       570
    adobe                            565
    rafaelfc                         505
    jeffmccune                       500
    BenoitCattie                     479
    puppetmanaged                    413
    bcarpio                          400
    rocha                            394
    attachmentgenie                  388
    dcsobral                         347
    pdxcat                           340

    ============= Modules (top 25) ================
    puppetlabs-stdlib               5224
    puppetlabs-firewall             2929
    puppetlabs-apt                  2671
    puppetlabs-apache               2232
    puppetlabs-mysql                2126
    puppetlabs-vcsrepo              1575
    saz-sudo                        1383
    puppetlabs-razor                1185
    puppetlabs-mongodb              1137
    puppetlabs-nodejs               1077
    puppetlabs-tftp                 1001
    ripienaar-concat                 990
    puppetlabs-ruby                  931
    puppetlabs-ntp                   813
    puppetlabs-motd                  804
    puppetlabs-nginx                 626
    puppetlabs-cloud_provisioner     594
    saz-memcached                    482
    puppetlabs-rabbitmq              470
    puppetlabs-xinetd                469
    thias-apache_httpd               460
    puppetlabs-java                  458
    puppetlabs-passenger             446
    camptocamp-tomcat                414
    BenoitCattie-nginx               393

I liked what this gave me. I could now see which modules were being downloaded and who was writing popular modules. This was fun to look at and gave me quite a few things to think about, but it wasn’t very visual. I wanted something pretty to look at, something that tells a story better than a fixed-width text dump.
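For anyone curious how this kind of tally works, here is a minimal Python sketch of counting downloads per author and per module while ignoring version numbers. It is not the script that produced the numbers above. It assumes each log line contains a release filename shaped like author-module-version.tar.gz; the RELEASE pattern, the tally function, and the sample log lines are all hypothetical and only illustrate the idea.

```python
import re
from collections import Counter

# Assumption: a download shows up in the log as a release tarball named
# <author>-<module>-<version>.tar.gz. The real log format may differ.
RELEASE = re.compile(
    r"\b(?P<author>\w+)-(?P<module>\w+)-\d[\w.\-]*?\.tar\.gz"
)

def tally(lines):
    """Count downloads per author and per module, ignoring versions."""
    authors, modules = Counter(), Counter()
    for line in lines:
        match = RELEASE.search(line)
        if match is None:
            continue  # not a module download, skip the line
        author, module = match["author"], match["module"]
        authors[author] += 1
        # Dropping the version means every release of a module
        # is counted under the same author-module key.
        modules[f"{author}-{module}"] += 1
    return authors, modules

# Made-up sample lines, for illustration only.
sample = [
    '"GET /puppetlabs-stdlib-2.3.1.tar.gz HTTP/1.1" 200',
    '"GET /puppetlabs-stdlib-2.4.0.tar.gz HTTP/1.1" 200',
    '"GET /saz-sudo-2.0.0.tar.gz HTTP/1.1" 200',
    '"GET /index.html HTTP/1.1" 200',
]
authors, modules = tally(sample)
for name, count in modules.most_common(25):
    print(name, count)
```

Calling most_common(25) on either counter gives a top-25 list like the ones above. Keying modules on author plus module name keeps two authors' modules of the same name (two different nginx modules, say) from being merged.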

Second look: maps on maps on maps

At that point, I did what any self-respecting recovering system administrator does: I handed it to my subconscious for a solution. We needed maps.
  • How many people download modules?
  • Where do they download from?
  • Are we big in Japan?
  • Is Puppet the biggest thing since Björk in Iceland?
  • Where’s the most unexpected/farthest away place people have downloaded our software?
These are questions completely answerable from our logs. So, I started with the module logs and looked into GeoIP and map-making software. Before too long, I came up with a map showing the locations of downloads using GeoIP technology. It is pretty fun to look at and does answer some questions. Are we big in Japan? Well, we’re not doing too badly, but the USA and Western Europe are super-represented.

Then we compare it with download counts per country. Click on the map to go to a version where you can mouse over a country to see its download count.

Puppet Forge Module downloads by country

Here you can see that Australia, India and China are using Puppet modules a ton as well.

I spent a lot of time playing with these maps, and wanted to open them up so you can as well. Be sure to let us know if there’s any other data you’re interested in seeing, and share back any interesting questions you’ve answered.

The Forge still holds worlds of untapped potential and data. Ryan Coleman will probably blog in the future about trends we’re seeing on the Puppet Forge, and about other interesting tidbits of user-generated content in the Puppet ecosystem. Until then, keep on Puppetizing and putting modules on the Forge.