Mapping the Puppet Forge
A long time ago (well, June of this year) the Puppet Forge was running without a leader. In my role as community manager, I saw the Forge as having this awesome potential to be the resource for user-generated content surrounding the Puppet community. I knew it was getting more attention, but that was mostly anecdotal. My next step was to find some data that could tell a good story.
Puppet Modules are often the first way people learn and start using Puppet. We’ve had our Puppet Forge for a while, but I didn’t feel like I knew a lot about it. When we were getting ready to interview Product Owners for the Puppet Forge and Modules, I decided I wanted to know more to help me prepare for the interview, and maybe give me some insight into usage patterns that I hadn’t thought about.
Like any geek, I love data. I knew we had all sorts of data in our module download logs, but we had not ever really taken the time to transform that data into awesome information. I started with simple awk/sed/grep to find basic information, like what modules were popular. This worked for a time, but then I wanted to know modules by name, find popular authors, and do things like ignore version number changes.
First look: hacks on hacks on hacks
I ended up writing my own log processor (that isn’t very exciting). The output, however, is interesting. This was the initial output of a script I whipped up one afternoon. The numbers next to the author or module name are download counts. This covers a period of several months in the May-July timeframe.
============= Authors (top 25) ================
puppetlabs                     33293
ghoneycutt                      6109
saz                             4059
dhoppe                          2485
DavidSchmitt                    1925
thias                           1583
jamtur01                        1059
ripienaar                        990
camptocamp                       859
bobsh                            810
mstanislav                       786
rcoleman                         734
lab42                            643
brightbox                        598
razorsedge                       570
adobe                            565
rafaelfc                         505
jeffmccune                       500
BenoitCattie                     479
puppetmanaged                    413
bcarpio                          400
rocha                            394
attachmentgenie                  388
dcsobral                         347
pdxcat                           340
============= Modules (top 25) ================
puppetlabs-stdlib               5224
puppetlabs-firewall             2929
puppetlabs-apt                  2671
puppetlabs-apache               2232
puppetlabs-mysql                2126
puppetlabs-vcsrepo              1575
saz-sudo                        1383
puppetlabs-razor                1185
puppetlabs-mongodb              1137
puppetlabs-nodejs               1077
puppetlabs-tftp                 1001
ripienaar-concat                 990
puppetlabs-ruby                  931
puppetlabs-ntp                   813
puppetlabs-motd                  804
puppetlabs-nginx                 626
puppetlabs-cloud_provisioner     594
saz-memcached                    482
puppetlabs-rabbitmq              470
puppetlabs-xinetd                469
thias-apache_httpd               460
puppetlabs-java                  458
puppetlabs-passenger             446
camptocamp-tomcat                414
BenoitCattie-nginx               393
I liked what this gave me. I could now see which modules were being downloaded, and who was writing popular modules. This was fun to look at, and gave me quite a few things to think about, but it wasn’t very visual. I wanted something pretty to look at, something that told a story better than a fixed-width text dump.
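For anyone curious what that kind of tallying looks like, here is a minimal Python sketch. It is not the original script, and the log format is an assumption: it supposes each download shows up in a log line as a release filename shaped like `<author>-<module>-<version>.tar.gz` with a three-part version number. The names `RELEASE_RE`, `tally_downloads`, and `print_top` are made up for illustration. Dropping the version from the key is what lets different releases of a module count as the same module.

```python
import re
from collections import Counter

# Assumption (hypothetical format): each download appears in the log as a
# release filename shaped like <author>-<module>-<x.y.z>.tar.gz
RELEASE_RE = re.compile(
    r"\b(?P<author>[A-Za-z0-9]+)-(?P<module>[A-Za-z0-9_]+)-\d+\.\d+\.\d+\.tar\.gz"
)


def tally_downloads(log_lines):
    """Count downloads per author and per author-module, ignoring versions."""
    authors = Counter()
    modules = Counter()
    for line in log_lines:
        match = RELEASE_RE.search(line)
        if match is None:
            continue  # not a module download (or not in the assumed format)
        author = match.group("author")
        authors[author] += 1
        # The version is left out of the key, so 3.2.0 and 4.1.0 of the
        # same module land in the same bucket.
        modules[f"{author}-{match.group('module')}"] += 1
    return authors, modules


def print_top(title, counts, limit=25):
    """Print a fixed-width list of the most downloaded entries."""
    print(f"============= {title} (top {limit}) ================")
    for name, count in counts.most_common(limit):
        print(f"{name:<30} {count:>6}")
```

Feeding it the lines of a log file (for example `tally_downloads(open("access.log"))`, where the filename is a placeholder) and passing each returned counter to `print_top` gives a text dump in the same spirit as the one above.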
Second look: Maps on maps on maps
At that point, I did what any self-respecting recovering system administrator does: I handed it to my subconscious for a solution. We needed maps.
- How many people download modules?
- Where do they download from?
- Are we big in Japan?
- Is Puppet the biggest thing since Björk in Iceland?
- Where’s the most unexpected/farthest away place people have downloaded our software?