The UX-ening of role-based access control for Puppet Tasks
Editor’s note: This post was originally published on an internal blog for Puppet employees. We’ve shared it here for a behind-the-scenes look into the UX practices at Puppet.
For the new role-based access control (RBAC) feature for Puppet Tasks, we adopted a new agile approach to define, design and plan the feature. In this blog post, I’d like to walk you through the UX side of the process.
Agile practices and UX design have felt like opposing methodologies for most of my career up until this point. In the past when I designed only in small increments it led to incomplete workflows and odd, patchwork page layouts. On the flip side, when I presented a full-fledged high fidelity UX vision with detailed specs to engineering we ran into roadblock after roadblock. But why? We had it all figured out, right? We did research, design, iterations, meetings, more research. This surely meant that we had ironed out all the kinks. Nope, not right. It was too big to correctly estimate. We also built it in chunks which, alone, offered no user value and meant we had to implement the whole burrito by the release date or else pull the plug. We were also pushing hard to get the whole thing delivered by a certain date and so had no time to learn or pivot along the way.
That’s what happened at the beginning of 2017 when we built Direct Puppet in the Puppet Enterprise console. It was painful. After some serious retro-ing we decided on a new approach:
Build the thinnest layer of a complete user workflow each sprint and deliver something that provides user value. This allows us to learn and gets us closer to an ideal solution.
That is hard to stomach as a designer. “But what about the big picture? How do we know what we are working toward?” Thankfully, in our retrospective the whole team also agreed that the vision has to come first and we decided that we would spend some time (a finite amount of time) figuring out the direction we wanted to go. We also agreed that this direction might change based on what we learned along the way. That meant we might have to throw away design and engineering work as we went.
The team spent about five weeks doing discovery research, collecting and prioritizing stories, sketching, and defining our designs before we felt ready to pull any engineering work into a sprint. Even then our design was not complete. Once development started, it happened in parallel with design work and was based on just the pieces we felt confident with at the time. The result was that we built this in three iterations, with each iteration a complete workflow that solved the user problem at some level and got us closer to our ideal design.
When engineering was working on workflow layer #1 (which didn’t require any new UI), I was doing research and further refining the design vision. This allowed us to reconsider our next workflow layer as we learned new things. By workflow layer #3, the design and the implementation had merged together into the product we are all excited to be shipping for Puppet Enterprise 2018.1.
This project spanned a few months, with other projects and holidays also in the mix. Below I’ll walk through the first five weeks of discovery, planning, and design and then the following 16 weeks of Agile with design tweaking, learning, and implementation. Be forewarned, this goes into great detail.
Step 1: gathering user stories
Before diving into workflows or sketches, we needed to learn from our users what their RBAC needs were surrounding ad hoc tasks. We needed to gather real job stories. So we talked to four customers and essentially learned that:
Customers need to limit which tasks their users can run on which nodes. (e.g., their developers can run ‘all tasks’ on their dev app nodes, but on their production nodes they can only run safe tasks, like echo or checking a package status.)
Step 2: prioritizing user stories
Drawing on our research and on other conversations that our products team had with customers, the team met to pull together a list of stories, which we then reviewed, revised, and prioritized.
These were the stories we pulled up to the top:
- As an admin, before I can enable my developers to run tasks, I need to limit what tasks they can run and on what nodes, so that they don’t do something destructive.
- As a developer, when I want to run a task, I need to be able to run only the tasks I’m permitted to run, on the nodes where I’m permitted to run them, so I don’t do something I’m not supposed to.
- As a developer, when I want to run a task, I need to view only the nodes that I can run it on.
Step 3: sketching ideas
Once we had our stories in order and knew which we were solving and not, it was time to ‘diverge.’ I love diverging! The idea is to explore the range of possibilities and put vague concepts (good or bad) onto paper. It gives you permission to think outside the straight and obvious path.
So as a feature team — engineers, product owner, engineering manager, QA and all — we got together and sketched our ideas for how to solve those key stories. After comparing all these sketches, we picked the ‘best’ direction that we thought was at an intersection of feasibility, usability, and good experience. Concepts from multiple designs made their way into the ‘best direction.’
Step 4: initial workflow and wireframes
At this point, now that we had a vision in mind (not yet designed), we felt confident about some of the underpinnings of the feature and decided we could start to ticket an initial workflow without any UI design. No GUI design was needed for this layer of work because it used all existing components.
Workflow layer #1
- Admin creates a text file defining their “task node objects,” which associates a list of tasks that can be run on a list of nodes.
- In the RBAC permissions UI, admin selects the name of the “task node object” to assign to a role.
- A consumer running a task can only select tasks they have access to run.
- If the consumer tries to run a task on a set of nodes for which that task is not permitted, the job fails.
This workflow isn’t ideal, but it stitches some basic building blocks together and is functional. Would we want to ship it? No. It’s way too cumbersome and hacky. But it enabled an admin to limit who can run what tasks on what nodes and gave us that foundation to build on. We didn’t spend too much time designing the “task node object” text file or the consumer experience since we knew we were going to change those pieces.
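To make workflow layer #1 concrete, here is a minimal sketch in Python. It is an illustration only: the post does not describe the text file's format, so the JSON layout, the object names, the node names, and the `selectable_tasks` and `run_task` helpers are all assumptions rather than anything Puppet Enterprise actually exposes.

```python
import json

# Hypothetical file format: the post does not say what the text file looked like.
TASK_NODE_OBJECTS = json.loads("""
{
  "dev-full-access": {"tasks": ["*"], "nodes": ["dev-app-1", "dev-app-2"]},
  "prod-safe-tasks": {"tasks": ["echo", "package"], "nodes": ["prod-app-1"]}
}
""")

# Hypothetical role assignments, standing in for the RBAC permissions UI.
ROLE_OBJECTS = {"developers": ["dev-full-access", "prod-safe-tasks"]}


def selectable_tasks(role, all_tasks):
    """Tasks a consumer in `role` may pick from when starting a job."""
    allowed = set()
    for name in ROLE_OBJECTS.get(role, []):
        tasks = TASK_NODE_OBJECTS[name]["tasks"]
        allowed.update(all_tasks if "*" in tasks else tasks)
    return sorted(allowed & set(all_tasks))


def run_task(role, task, nodes):
    """Fail the whole job if the task is not permitted on any target node."""
    permitted = set()
    for name in ROLE_OBJECTS.get(role, []):
        obj = TASK_NODE_OBJECTS[name]
        if "*" in obj["tasks"] or task in obj["tasks"]:
            permitted.update(obj["nodes"])
    denied = sorted(set(nodes) - permitted)
    if denied:
        raise PermissionError(f"task {task!r} not permitted on: {', '.join(denied)}")
    return f"ran {task} on {len(nodes)} node(s)"
```

In this sketch a developer could run `echo` on the production node, but asking for `service` there raises `PermissionError` before anything runs, which mirrors the "job fails" step in the list above.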
While the team collaborated on writing the story ticket for the first workflow layer, I was concurrently working on fleshing out the designs for the vision (or at least what I thought it to be at the time). With this UI, admins would be able to create and manage “task node set” objects and assign them to RBAC roles. This new object would contain a list of tasks and a list of nodes that their consumers could run on. We assumed that users would want to save these objects so they could apply them to multiple roles. (Take note of this assumption here, it comes up again.)
Here’s my first set of wireframes:
Step 5: workflow layer #2
As a team we met to review these designs and agreed on the next wave of work that could be done to support this vision. The “task node object” would be stored to the Orchestrator API in place of the text file.
Workflow layer #2
- Admins create a task node object in a new task node GUI section within RBAC
- They select tasks to include
- They select nodes to limit those tasks to
- Save the new task node object
- In the RBAC user roles section, they select a new permission to “run task node objects”
- They select the task node object from the dropdown
- They hit add
In this meeting we also started to discuss the name of these reusable “task node set things” and then realized that we only needed a temporary name for now. (Agile principle: Defer all decisions to the last responsible moment!) The name we picked was “Dumplings,” a silly name that was very fun to say and that we would hopefully not forget to replace later.
After this meeting, I set out to learn from internal folks (our customer success team) how the designs would hold up to their customers’ needs. After four conversations I took away that these “dumplings” would likely lead to a spaghetti mess where dozens of slightly different dumplings would have to be created for each unique RBAC role. Oops! Believe me, I love spaghetti (and Dozens of Dumplings sounds like a restaurant I would go to), but this concern made sense. Our earlier assumption (the one flagged in Step 4) that the combo of tasks and permitted nodes needed to be reusable no longer seemed valid.
Instead, admins wanted to assign the same set of base tasks to multiple RBAC roles, but permit them to run on different nodes. Therefore, the task list and node list should NOT be bound together in a reusable dumpling. Instead, the task list itself should be reusable as well as the set of nodes, but not together.
Step 6: design pivot
In response to our findings, I set out on a slightly different design direction. This next round had no concept of dumplings (at least not user-facing). Now, the reusable thing was just a “task set,” which users would create and manage and then assign to an RBAC role and limit to a set of nodes.
Reusable task sets that could be assigned to RBAC roles:
Admin selects which "task set" a user has access to, then selects which nodes they are permitted to run that task set on:
Oh no! But what about dumplings?? The developers are building dumplings!! Weren’t they pissed?
No need to fret! Because we were trying to be our best, most Agile selves, we knew what we were building was intended for learning and so there would be changes. No design was final.
And it turns out that dumplings didn’t have to be completely thrown out, at least not behind the scenes. In the RBAC UI, when an admin adds a new permission to a user role with a task set and node set, a dumpling is created in the Orchestrator API. The original bits still work. But the concept of a dumpling to the user was no longer relevant. Engineering did have to remove the dumplings interactions in the GUI, but it was worth it because we learned from it and we were replacing it with something clearly better. When we shopped around this new round of designs with "task sets" to internal folks it had much better traction.
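One way to picture the pivot is the sketch below, written in Python. It assumes a made-up `Dumpling` shape and made-up set names; the real Orchestrator API schema is not described in this post. The point is only that the admin manages task sets and node sets separately, and the pairing is created at the moment a permission is added to a role.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dumpling:
    """Internal task-and-node pairing; hypothetical shape, not the Orchestrator schema."""
    tasks: frozenset
    nodes: frozenset


# Reusable pieces the admin manages separately (all names are made up).
TASK_SETS = {"base-tasks": frozenset({"echo", "package", "service"})}
NODE_SETS = {
    "dev-nodes": frozenset({"dev-app-1", "dev-app-2"}),
    "prod-nodes": frozenset({"prod-app-1"}),
}


def add_permission(role_permissions, role, task_set, node_set):
    """Pair a task set with a node set for one role, creating a dumpling behind the scenes."""
    dumpling = Dumpling(TASK_SETS[task_set], NODE_SETS[node_set])
    role_permissions.setdefault(role, []).append(dumpling)
    return dumpling


# The same task set is reused across two roles, each limited to different nodes.
permissions = {}
add_permission(permissions, "developers", "base-tasks", "dev-nodes")
add_permission(permissions, "operators", "base-tasks", "prod-nodes")
```

Here the admin maintains one task set instead of one hand-built dumpling per role, which is the duplication the customer success team warned about.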
Good thing we delayed the decision to give dumplings a real name, because in the end they didn’t need a name after all.
Step 7: choosing the next piece to build
Feeling confident that this was the better direction, UX, Products, and Engineering met to groom these designs. And, again, we asked ourselves:
What’s the thinnest part of this UI that we can build next to deliver value?
We chose to first build the ability to select a single task and limit the permitted nodes using node groups, since we already have that concept of a reusable list of nodes in the console. That work would enable users to limit what task can be run on what nodes.
Workflow layer #3
- From the RBAC UI, admins select “tasks” from the Object type list
- They can select a single task or all tasks
- They select “all nodes” or a node group to limit permitted nodes
- They click “add” and the permission is added to the RBAC permissions table.
Following this meeting, I augmented the designs to represent workflow layer #3. In this iteration (which will be released in Puppet Enterprise 2018.1 to customers) users select a task (or all tasks) and permit it on either ‘all nodes’ or limit it to a single node group:
The consumer selects a task and can only select nodes within the set they are permitted to run that task on. The tooltip indicates that the result of their query is limited by their permissions.
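The permission model in workflow layer #3 can be sketched roughly as follows, in Python. This is an assumption-laden illustration, not the shipped implementation: the `"*"` wildcard, the group and node names, and the `permitted_nodes` and `filter_query_result` helpers are invented for the example.

```python
ALL = "*"  # stands for "all tasks" or "all nodes" in this sketch

# Hypothetical node groups; the console already has this reusable list-of-nodes concept.
NODE_GROUPS = {
    "dev nodes": {"dev-app-1", "dev-app-2"},
    "prod nodes": {"prod-app-1"},
}

# One tuple per row in the RBAC permissions table: (task, node group).
PERMISSIONS = [
    ("service", "dev nodes"),
    ("package", ALL),
]


def permitted_nodes(task, permissions, inventory):
    """Return the nodes on which `task` may run, given the permission rows."""
    allowed = set()
    for perm_task, group in permissions:
        if perm_task not in (ALL, task):
            continue  # this row grants a different task
        allowed |= set(inventory) if group == ALL else NODE_GROUPS[group]
    return allowed & set(inventory)


def filter_query_result(task, query_result, permissions, inventory):
    """Limit a consumer's node query to the nodes they may target with `task`."""
    allowed = permitted_nodes(task, permissions, inventory)
    return [node for node in query_result if node in allowed]
```

With these rows, a consumer running `service` would only ever be offered the two dev nodes, while `package` could target the whole inventory. A task with no matching row yields an empty list, which is the situation the tooltip is meant to explain.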
Step 8: prototype research
Before we got too far along on this path, I wanted to validate this workflow for ease of use and clarity, as well as make sure that using node groups for limiting nodes in RBAC was going to work for our users. I also wanted to figure out the value of task sets over adding single tasks at a time. We talked to five customers and learned that:
- Node groups, even for customers who weren’t yet using them, were an acceptable way to limit permitted nodes for running tasks in RBAC.
- PQL would also be useful for limiting RBAC task scope, so you didn’t have to create and manage groups for this purpose.
- Task sets were seen as a convenience that could be useful down the road as the feature was used more widely.
- Initially, users were quite content assigning a single task at a time (or all tasks) in RBAC (e.g., “permit running the service task on the dev nodes group” or “permit running the package task on the dev nodes group”).
Last steps: getting it out the door
Engineering created tickets for the above designs, and we groomed workflow layer #3 and started building it. Then we shipped it!
At this point we have completed building a round of design that I consider to be user-ready. It solves the user problem we set out to solve in a minimal, but useful way. This feature will ship as is with Puppet Enterprise 2018.1, but we will continue to improve on it. Next possible steps are:
- Add PQL as a way to specify permitted nodes in RBAC, so admins have more flexibility than what node groups offer today.
- Make it less tedious for admins by adding the ability to add more than one task at a time to a permission.
- Make it less repetitive by allowing for reusable task sets that can be added across multiple RBAC roles.
For the first time, while working on Puppet Tasks and RBAC for Puppet Tasks, following Agile principles didn’t feel like selling my soul as a designer and delivering patchwork solutions. It took a commitment from the team that we were going to keep learning and improving each sprint and that we would only ship what we agreed was useful and usable. We agreed that it was okay to throw work out in the name of learning and improving.
Early on, at the start of each sprint, I found that my UX hackles were up and I had to verify with the team, “But we are not SHIPPING this to customers, right?” My anxieties lessened with each sprint. In the end, the thing we released is simpler and more usable than what was initially designed. We believe we got a better outcome from our incremental approach than we would have gotten otherwise.
And so, in conclusion, y’all should go get some dumplings with a side of Agile for lunch.
Melinda Campbell is a principal UX architect at Puppet.