One of the many interesting projects I encountered at Supercomputing was OSiRIS (Open Storage Research Infrastructure), a National Science Foundation pilot project that makes it easier for researchers to use their data and share it with their peers.
After the show, I caught up with Benjeman Meekhof, a senior administrator of high-performance computing storage at the University of Michigan, to learn more about OSiRIS. Benjeman agreed that others would find the topic as interesting as I did, so here's an edited version of our conversation.
Paul: Can you give us a little background on OSiRIS?
Benjeman: OSiRIS is an NSF grant to build a multi-institutional research data storage platform. Our platform is built on top of a Ceph storage cluster spanning the University of Michigan, Wayne State University, and Michigan State University. Indiana University is also a collaborator on the project, focusing on advanced network management. Our challenges include bridging Ceph to identity federations such as InCommon, managing software-defined networking (SDN) topology, and enabling our users to manage their data lifecycle and metadata needs. More information is at http://www.osris.org/
Institutions often end up building dedicated storage infrastructure on a per-project basis. We hope to provide a shared, transparent, high-speed infrastructure so that storage is already available when a researcher needs it. We also believe OSiRIS will make multi-institutional collaboration easier than it is with existing on-campus resources. Rather than moving data between campuses, scientists can connect directly to our Ceph infrastructure and work on shared data sets in place. Researchers will be able to mount CephFS directly, use the S3 protocol, or use Ceph as a raw object store.
The replicated, distributed nature of Ceph lets us either prefer that data be stored close to particular groups of users or distribute it evenly across all participating institutions. In either case, we'll use software-defined network routing to ensure that data access takes the most efficient path.
Paul: What milestones has the project hit recently?
Benjeman: At SC16, we deployed a fourth OSiRIS site to demonstrate that it is possible to successfully run a Ceph cluster over higher-latency WAN distances. Finding the practical limits of a multi-site deployment, and illustrating how you would manage such a deployment, is part of our project mandate.
We've also recently had success in proof-of-concept testing with clients from the ATLAS experiment at CERN's LHC (Large Hadron Collider) accessing data in OSiRIS. Full-scale testing is planned soon.
Paul: What role do you play with OSiRIS?
Benjeman: I'm the lead project engineer with UM Advanced Research Computing - Technology Services (ARC-TS). ARC-TS coordinates and leads project engineering under the general direction of our project principal investigators (PIs). My role is to coordinate the efforts of our very talented multi-institutional team and ensure we keep pushing forward on the technical implementation.
Paul: Can you tell us a little bit about the infrastructure in use?
Benjeman: We deploy all our systems via network boot with Foreman. At each site, we start by deploying a small smart proxy from a virtual-machine template, and from there we can build the remaining systems. Admins can easily deploy a new VM or physical host at any site using the central Foreman instance (we use the libvirt plugin to provision new VMs as needed). Everything we deploy is managed by Puppet. At the moment, we run a single Puppet master that serves all sites (UMich, Wayne State, MSU, IU, and, for the SC16 show, a fifth site at Salt Lake City). For our change workflow, we use GitHub in combination with r10k to manage Puppet environments.
Paul: What kind of infrastructure challenges does the team face?
Benjeman: Deploying and managing systems at multiple sites while accounting for local differences is a big challenge. Enabling collaboration among admins at different institutions is another. So far, I think the tools we've chosen are working well.
Paul: I understand OSiRIS started with a green field. Can you describe the criteria that influenced the infrastructure tooling choices?
Benjeman: We knew we had to start with some kind of configuration management tool; I don't think anybody starts a project without one anymore. What pushed us most toward Puppet was the maturity of the tool and the size of its community. For nearly every task we have needed to perform, there was a pre-existing, good-quality Puppet module (including one for Ceph). We also knew that Puppet was widely used, and some of our group already had Puppet experience. In my previous job, I worked with the LHC ATLAS project, where Puppet was the de facto standard tool, so we knew it was popular in the research computing community.
The power and flexibility of Hiera is also a big plus. Hiera allows you to organize site-specific data and present it to the code that manages your infrastructure in a granular, hierarchical way. We thought Hiera would be a good fit for managing the differences between OSiRIS site deployments.
We also needed a deployment system that would scale easily to multiple sites, and Foreman, with its smart-proxy system, was a natural fit. Its integration with Puppet is another plus.
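To make the idea of a hierarchical lookup concrete, here is a minimal Python sketch of the first-match behavior Hiera uses by default: levels are searched from most specific to least specific, and the first level that defines a key wins. The level names, keys, and values below are hypothetical, and real Hiera reads its hierarchy from `hiera.yaml` and its data from YAML files; it also supports merge strategies that this sketch does not model.

```python
from __future__ import annotations

from typing import Any

# Hypothetical hierarchy, most specific level first. Real Hiera reads this
# list from hiera.yaml; these level names are illustrative only.
HIERARCHY = [
    "nodes/{certname}",
    "site/{site}",
    "common",
]

# Hypothetical data for each resolved level; in Hiera each entry would be a YAML file.
DATA: dict[str, dict[str, Any]] = {
    "nodes/um-stor01": {"ntp::servers": ["ntp-special.example.edu"]},
    "site/um": {"ntp::servers": ["ntp.um.example.edu"], "site::name": "UM"},
    "site/msu": {"site::name": "MSU"},
    "common": {"ntp::servers": ["pool.ntp.org"]},
}


def lookup(key: str, facts: dict[str, str]) -> Any:
    """Return the value from the most specific level that defines `key`."""
    for level in HIERARCHY:
        path = level.format(**facts)  # fill in {certname}, {site}, ...
        source = DATA.get(path, {})
        if key in source:
            return source[key]
    raise KeyError(f"{key} not found at any hierarchy level")


if __name__ == "__main__":
    # A node with its own override, a node using its site default,
    # and a node falling through to the common level.
    print(lookup("ntp::servers", {"certname": "um-stor01", "site": "um"}))
    print(lookup("ntp::servers", {"certname": "um-stor02", "site": "um"}))
    print(lookup("ntp::servers", {"certname": "msu-stor01", "site": "msu"}))
```

The point of the design is that a site-wide default is written once at the site level, and only the exceptions need their own entries further up the hierarchy.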
Another criterion was our need to deploy quickly, and with a minimum of time spent on non-core infrastructure building tasks. Our impression was that the initial setup of Foreman and Puppet could be done quickly with the foreman-installer tool, and indeed it was. The Git/r10k workflow is well-documented and covered by various overview articles, so it was easy to get started on that as well.
Paul: How is the team using Puppet at your site?
Benjeman: Typically, when we deploy a new service, the first step is to find a Puppet module to manage it. We really don't do anything that isn't managed by Puppet, and all of our sites are managed from the same Puppet code tree.
We also use Puppet to initialize our new Ceph storage nodes from information in Hiera. Many of them are exactly the same hardware, so a Hiera level is a logical place to organize that information.
The distributed nature of our admin team means we really need a workflow that enables collaboration, and r10k in combination with GitHub is working well for that. Admins can work out changes and deploy them for testing in a private Git branch and its matching Puppet environment without interfering with each other. We require that changes to our production Puppet branch be submitted as pull requests, so another pair of eyes reviews them before they go out.
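With r10k, each branch of the Puppet control repository becomes a Puppet environment, which is what lets each admin test in isolation. The Python sketch below illustrates that branch-to-environment mapping and checks for a hazard it implies: two branch names that sanitize to the same environment would overwrite each other. The sanitizing rule (letters, digits, and underscores only) is an assumption about environment naming, not r10k's exact implementation, and the branch names are hypothetical.

```python
from __future__ import annotations

import re


def branch_to_environment(branch: str) -> str:
    """Hypothetical mapping from a Git branch name to a Puppet environment name.

    Assumes environment names may contain only letters, digits, and underscores,
    so every other character (slashes, dashes, dots) becomes an underscore.
    """
    return re.sub(r"[^A-Za-z0-9_]", "_", branch)


def plan_environments(branches: list[str]) -> dict[str, str]:
    """Map each environment name to the branch it would be deployed from.

    Raises ValueError if two branches would land in the same environment,
    because one admin's test deployment would silently overwrite another's.
    """
    plan: dict[str, str] = {}
    for branch in branches:
        env = branch_to_environment(branch)
        if env in plan:
            raise ValueError(f"branches {plan[env]!r} and {branch!r} both map to {env!r}")
        plan[env] = branch
    return plan


if __name__ == "__main__":
    # "production" only changes through reviewed pull requests; the others are
    # private test branches belonging to individual (hypothetical) admins.
    for env, branch in plan_environments(
        ["production", "alice/ceph-mon-tuning", "bob/new-site-msu"]
    ).items():
        print(f"{branch:25} -> {env}")
```

On a test node, an admin could then apply a private environment with `puppet agent --test --environment alice_ceph_mon_tuning`, leaving production untouched until the pull request is reviewed and merged.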
Paul: What aspect of Puppet do you use most frequently, or rely on the most?
Benjeman: The Hiera integration is important for us. We can capture configuration information, group it as needed at different levels of the hierarchy, and add levels to the hierarchy as needed. In our main Puppet manifest (the site.pp file), we parse information out of our certnames that can then be referenced in a Hiera level, which lets us easily place information specific to system location, role, or other grouping.
For example, we have a couple of different classes of Ceph storage hosts, and every host in a class has the same disk arrangement. By adding a component to the hostname that is parsed and resolved to a YAML file in the hierarchy, we can define the Ceph OSD arrangement just once for each hardware type and let Puppet handle the Ceph disk initialization.
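The certname-parsing approach can be sketched in Python as follows. This only mirrors the idea (OSiRIS does it in `site.pp` with Hiera paths); the naming convention `<site>-<role>-<hwclass>-<nn>.<domain>`, the hardware-class names, and the shortened device lists are all hypothetical. The key property is that hosts sharing a hardware class resolve to the same YAML file, so the OSD layout is written once per class.

```python
from __future__ import annotations

import re

# Hypothetical certname convention: <site>-<role>-<hwclass>-<nn>.<domain>,
# e.g. "um-stor-jbod60-01.example.org". The real OSiRIS scheme may differ.
CERTNAME_RE = re.compile(
    r"^(?P<site>[a-z]+)-(?P<role>[a-z]+)-(?P<hwclass>[a-z0-9]+)-(?P<index>\d+)\."
)

# Hypothetical OSD layouts, defined once per hardware class rather than per host.
# Device lists are shortened for illustration.
HIERA_DATA: dict[str, dict[str, list[str]]] = {
    "hwclass/jbod60.yaml": {"osd_devices": ["/dev/sdb", "/dev/sdc", "/dev/sdd"]},
    "hwclass/jbod24.yaml": {"osd_devices": ["/dev/sdb", "/dev/sdc"]},
}


def parse_certname(certname: str) -> dict[str, str]:
    """Split a certname into the components a site.pp-style parser might extract."""
    match = CERTNAME_RE.match(certname)
    if match is None:
        raise ValueError(f"certname does not follow the naming convention: {certname}")
    return match.groupdict()


def hierarchy_paths(certname: str) -> list[str]:
    """Most specific first: node, hardware class, role, site, then common."""
    parts = parse_certname(certname)
    return [
        f"nodes/{certname}.yaml",
        f"hwclass/{parts['hwclass']}.yaml",
        f"role/{parts['role']}.yaml",
        f"site/{parts['site']}.yaml",
        "common.yaml",
    ]


def osd_devices_for(certname: str) -> list[str]:
    """Return the OSD device list from the first hierarchy level that defines one."""
    for path in hierarchy_paths(certname):
        data = HIERA_DATA.get(path, {})
        if "osd_devices" in data:
            return list(data["osd_devices"])
    raise LookupError(f"no OSD layout defined for {certname}")


if __name__ == "__main__":
    # Two hosts at different sites share one layout because they share a class.
    for host in ["um-stor-jbod60-01.example.org", "wsu-stor-jbod60-03.example.org"]:
        print(host, osd_devices_for(host))
```

Placing the hardware-class level above role and site means a per-class disk layout overrides anything broader, while a per-node file can still override the class for an odd host.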
Paul: If you were talking to someone who's facing similar challenges but was just getting started with Puppet, what advice would you offer?
Benjeman: Spend some time reading about best practices for manifests and Hiera, as well as tutorials, before you start writing. Make sure you have a good basic structure with your roles and profiles. Leverage Hiera wherever it makes sense, but don't put common static data in there that doesn't benefit from it. Define your workflow for testing and development before you start producing. That said, don't get too hung up on planning. As we have moved forward, we've found better ways to structure things into roles and profiles and have made additions to our Hiera structure, but we had to learn by doing.
Paul: What are you looking forward to in the future? What's next for OSiRIS?
Benjeman: One of the things we hope to show is that OSiRIS can expand easily to other sites. At the recent SC16 conference, using the infrastructure discussed here, we temporarily deployed a fourth OSiRIS site as an example of how flexibly and easily the infrastructure can grow. In the near future, we'll be putting a small amount of OSiRIS storage at the Van Andel Institute in Grand Rapids, MI, which will be managed under Puppet the same as our other sites.
We also hope that other projects can follow our template and use similar processes to deploy multi-institutional projects.
As for the project itself, our next big goal is to onboard the LHC ATLAS project and test our capability to support these researchers on a large scale.
Paul: What kind of numbers are we talking about? Can you paint a picture of the speeds and feeds?
Benjeman: Currently, with our most recent upgrade ongoing, we have 2PB of storage across WSU, MSU, and UM. We'll have nearly 4PB when everything comes online in the near future. Over the course of five years, we project growth to 15PB, assuming trends in hard drive price/size ratio continue. How much of this raw storage is usable depends on the data replication options chosen. A typical OSiRIS storage block is a server with attached storage holding 60 disks, each with 8TB of storage. We link these hosts locally with four 25Gb links, for 100Gb of bandwidth between storage elements within a site.
Connections between sites vary. The minimum is 10Gb, but UM and MSU will have a 100Gb link intended to provide higher-speed access to the LHC ATLAS computing sites at those two institutions. Our intersite connections use the Michigan Lambda Rail (MiLR) redundant fiber loop, and through a connection from UM to OmniPOP in Chicago, we have 100Gb to R&E (research and education) networks such as Internet2 and ESnet.
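The capacity figures above can be worked through with a little arithmetic. The per-block numbers (60 disks of 8TB, four 25Gb links) come from the interview; the replica count of 3 and the 4+2 erasure-code profile are assumptions chosen only to show how the replication choice changes usable capacity, since Ceph pools can use either replication or erasure coding. The sketch ignores filesystem overhead and the free-space headroom a real cluster needs.

```python
# Figures from the interview: 60 disks x 8 TB per storage block, four 25 Gb links.
DISKS_PER_BLOCK = 60
DISK_TB = 8
LINKS_PER_HOST = 4
LINK_GBPS = 25

raw_tb_per_block = DISKS_PER_BLOCK * DISK_TB  # 480 TB raw per block
host_gbps = LINKS_PER_HOST * LINK_GBPS  # 100 Gb/s aggregate per host


def replicated_fraction(size: int) -> float:
    """Usable fraction of raw capacity when every object is stored `size` times."""
    return 1 / size


def erasure_coded_fraction(k: int, m: int) -> float:
    """Usable fraction with k data chunks and m coding chunks per object."""
    return k / (k + m)


def usable_pb(raw_pb: float, fraction: float) -> float:
    """Usable capacity, ignoring filesystem overhead and free-space headroom."""
    return raw_pb * fraction


if __name__ == "__main__":
    # Replica count 3 and a 4+2 erasure-code profile are assumptions for illustration.
    for raw_pb in (2, 4, 15):
        print(
            f"{raw_pb:>2} PB raw: "
            f"{usable_pb(raw_pb, replicated_fraction(3)):.2f} PB with 3 replicas, "
            f"{usable_pb(raw_pb, erasure_coded_fraction(4, 2)):.2f} PB with EC 4+2"
        )
```

Under these assumptions, three-way replication keeps one third of raw capacity usable, while a 4+2 erasure-code profile keeps two thirds, at the cost of more computation on reads and writes.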
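The gap between the 10Gb minimum and the 100Gb links matters most when data has to be copied between campuses, which is what OSiRIS aims to avoid by letting researchers work on shared data in place. This idealized Python estimate shows the difference; the 100TB data set and the `efficiency` parameter (a stand-in for protocol overhead and contention) are hypothetical. At full line rate, 100TB takes roughly 22 hours at 10Gb/s and a little over 2 hours at 100Gb/s.

```python
def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Idealized time to move `data_tb` terabytes (decimal) over a link.

    `efficiency` is a hypothetical knob for protocol overhead and contention;
    1.0 means the link runs at full line rate for the whole transfer.
    """
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical 100 TB data set over the minimum and the fastest intersite links.
    for gbps in (10, 100):
        print(f"100 TB over {gbps} Gb/s: {transfer_hours(100, gbps):.1f} hours")
```

Real transfers over long WAN paths rarely sustain full line rate, so an efficiency below 1.0 gives a more realistic, longer estimate.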
To get an idea of the scale of the network bandwidth we have to work with, you might also want to check out this article about our participation in this year's bandwidth challenge at SC16.
Paul: Thanks, Benjeman! I am always amazed at the innovation that takes place in the HPC community, especially the sites using Puppet! I really appreciate you taking the time to tell us about the work you’re doing.
Benjeman: Thanks, Paul! I appreciate the opportunity to talk about OSiRIS, and we're grateful for the work that's gone into making Puppet a useful tool.
Paul Anderson is a senior professional services engineer at Puppet.