In today’s episode, we talk to Nick Maludy, DevOps Manager at Encore Technologies. He shares his career journey, going from developer to managing teams and applying DevOps at scale.
Doing DevOps can be hard, and transformation takes time. Nick shares insight on how his team has implemented changes and continues to improve. Learn what tools they used and what advice he has for introducing DevOps into your teams.
Mentioned in the episode:
• Nick's twitter: @NickMaludy
• Puppet's slack channel: http://slack.puppet.com
• Bolt documentation: https://puppet.com/docs/bolt/latest/bolt.html
• Encore tech blog: https://encore.tech/blog/
• S**t Gary Says: http://garylarizza.com/
• Rob Nelson's blog: https://rnelson0.com/
• Bolt hands-on lab: https://puppetlabs.github.io/bolt/
Yasmin Rajabi is a Principal Product Manager at Puppet.
Yasmin [00:00:19] Welcome everyone to Pulling the Strings, the Puppet podcast. My name is Yasmin and today I will be your host. This is my first podcast at Puppet where I'm the host, so please bear with me. I am a product manager at Puppet and I currently focus on Bolt, one of our newest open source projects. I'm really excited for today's session because we have Nick Maludy with me today. For those of you who have been following the Bolt project, Nick's been there from the beginning, so I'm really excited to spend some time with him and learn a little bit about what he's up to. Do you want to introduce yourself?
Nick [00:00:52] Hi everybody. I'm Nick Maludy, I'm the DevOps manager at Encore Technologies.
Yasmin [00:00:57] And so do you want to tell the audience a little bit about when you started using Puppet?
Nick [00:01:03] Sure. So we started using Puppet in the winter, like January 2017 or so. We had some problems in our environment and Puppet was one of the products that came up to help us solve them.
Yasmin [00:01:16] And when did you start using Bolt?
Nick [00:01:18] Bolt was probably the following year. I think you guys released it in the fall of 2017 and I happened to catch one of the YouTube replays from PuppetConf and found it that way.
Yasmin [00:01:30] It feels like it was just yesterday, but also so long ago. So what's your setup like today? How are you using Puppet?
Nick [00:01:38] So we use open source Puppet. We run multiple compile masters with a single CA server, load balanced behind HAProxy right now, and then we use Bolt for all of our remote execution: patching, node deployments, that sort of stuff.
Yasmin [00:01:53] That's awesome. Can you tell me a little bit about how your organization has been adopting DevOps? You said that your title is DevOps manager so you want to let us know about your role and how that's changed?
Nick [00:02:05] Yeah. So it's interesting. I started off originally as an engineer. That lasted about a month, and then we needed someone to fill the role of a leader within the organization. At the time it was just a small team of engineers, so they promoted me to manager and I was the manager of our three or four person team. And then over the following couple of years we had people come and go and teams merge and diverge, and now I'm the manager of basically two distinct teams, but it's a total of about 20 people right now.
Yasmin [00:02:36] Nice.
Nick [00:02:37] One of those teams I would consider more of a development team. That was my original DevOps team, and the focus is on our tooling, our build pipelines, our automation. And then we have another team which is basically our operations team, but we're in the process of teaching them more about DevOps and then hopefully turning them into more of an SRE team. That's my vision there.
Yasmin [00:03:01] That's awesome. So you have both sides of it, Dev and Ops. How has that been going?
Nick [00:03:05] It's good and it's very interesting. I'm learning a lot. I'm a traditional developer so I've never done the operations side of it, but it turns out it's not that different from what I'm used to in terms of just keeping our software running anyway. It's been a lot of fun and it's been a really enjoyable experience as a manager, because I've been helping teach a lot of these engineers new skills, helping show them the vision of how things can be, and then also seeing them realize these little nuggets for themselves and then go off into their own little worlds and achieve good things and make differences inside of their day to day. So it's been really, really rewarding.
Yasmin [00:03:40] That's great. Can you tell me a little bit about the teaching process? What types of things are you implementing or using to enable the team to learn?
Nick [00:03:48] So again, we started off with a traditional operations team. The ops team manages things like VMware networks, our storage, load balancers, Citrix, that sort of stuff. And then also servers, Windows and Linux, not so much the Linux side but mostly the Windows side for that team, and they were just there to keep the lights on, right? Patch things, add, remove, change, that sort of stuff, work tickets, a lot of ServiceNow work. The learning process really began when we saw the success of our DevOps team: how accelerated they were, how little they had to be on call, and how little they had to support their own infrastructure. We were able to automate a lot of it and deploy it consistently across multiple customers. So we wanted to try to replicate that with the other products we supported on the operations side of the house. We identified the key skills, and there's a whole list of them, that these operations people who don't know anything about code and don't know anything about git need to go and start learning. A lot of them had never logged into a Linux command line before. So we identified this skill set and then we shuffled it around and organized it into a roadmap and kind of a course catalog, if you will. We said OK, we need to do the Linux command line first because everything kind of runs on Linux, and then we need to do git on the command line and make sure they know how to use git and GitHub and Bitbucket, and we progressed from there. And now I'm super proud to say we've gone through two full training cycles on Puppet itself, and a team of 20 are, at least at a beginner level, familiar with Puppet and able to go and use it to tackle challenges on their own.
Yasmin [00:05:29] Wow, that's really great. Sounds like you have a really good team.
Nick [00:05:31] Yeah. We're really, really fortunate in that aspect. We also as an organization had to donate things to this learning process, right? It was not just expecting everyone to go pick this stuff up. As a manager I had to plan this stuff out, understand what the curriculum needed to look like, and understand the objectives we needed. But then we needed to purchase some online learning tools to help us, because I don't have time to go and create all this content myself. We supplement here and there where it makes sense, to say hey, here's the way Encore does it versus the way the rest of the world does it. Or here's where we take the baseline tool and then apply it in our customer environments. And then we also have to dedicate time from our organization to allow these people to go do this training and learning effort. So our budget is one hour per day, so five hours a week, where we say hey, engineers at Encore, you have this learning budget. Here are the things you need to accomplish. And we try to set deadlines for each learning objective so that the team is kind of going along at the same pace. It's been slow but sure, and actually not over that long a period. We've been doing this for about eight or nine months now and I feel really happy with the progress we've made in that time.
Yasmin [00:06:40] That's great. What's been the reaction of the team so far?
Nick [00:06:43] It's funny. At the start they hated it. They hated having homework again, because we do not just online learning and videos. We have recommended or required exercises they need to go do. Not that it's anything super challenging, but for example in Puppet they had to go set up a server, set up a Puppet master, write some manifests, apply those to a node, change it around a little bit, and then provide submissions along the way in the form of repos. And at first they hated it. They didn't like doing that homework and that extra effort. But over time, as they run into challenges where they need these skills, I've gotten positive feedback. For example, one of the engineers, a VMware engineer, needed to log into an ESX host, and he said it was the only time in his life he's logged into an ESX host and has not been completely lost on the command line. He was able to go and run basic commands and execute those things without having to go look it up on Google. Another example was just the other day. We were going through and reviewing some of the Puppet training that we had done, and we had an engineer who was a traditional Windows Active Directory engineer. All of a sudden the light kind of clicked on in the middle of the meeting. He was able to visualize and articulate the roles and profiles he wanted to go start writing to replace the Group Policy that we used to have to write. Now we can check it into git and have it as a code repo and then use that for multiple customers, whereas right now he has to point and click all that stuff inside of Group Policy. He's like, this could immediately make my life better. That was again very rewarding.
Yasmin [00:08:09] That's awesome. So sounds like not only are they excited about it but it seems to be having an impact on your business as well.
Nick [00:08:16] I definitely agree with that. We're able to deliver things much more consistently. We're able to deliver things faster. So let me back up and describe Encore as a business. We're a managed service provider primarily. What that means is we basically run other people's IT infrastructure for them, basically the stack that I mentioned earlier. The way we make money is by scaling that same process across multiple customers. And so as a team, if we're able to develop a solution that can scale quickly across multiple customers and be repeatable and consistent, we're better as a business. We can stretch our resources further, our small team of people can manage more and more clients and more and more servers, and we can make more money, which is good for the bottom line. So yeah, from that aspect Bolt and Puppet have helped us from a consistency perspective, right? When we put something in code, in a role or a profile or a manifest, it's written the same way and it's executed the same in every single environment, and that helps us from a reproducibility perspective when we're going and scaling out these different configuration changes across our environments.
Yasmin [00:09:17] So what advice would you have for other organizations that are implementing DevOps, and what tools have helped you? You've mentioned Puppet and Bolt a little bit, but what advice would you have for them?
Nick [00:09:28] I would say the first thing is you need to describe to the engineers that it's not a fearful thing. We're not trying to take anyone's job by implementing DevOps and automation. A lot of the companies that we run into have this fear mindset, and I think it's overcome fairly quickly, but it just needs to be a leadership-driven decision to say hey, we're not trying to fire anybody. We actually want to make all of you smarter and better and work more effectively, because that's better for our business.
Nick [00:09:55] You know, it's an expensive thing to go hire a person and to train them. We don't want to lose any of that knowledge. We just want to make you better as an engineer and make your talents go further. As for other tools and techniques that we've used, there's the agile methodology we use internally. Not capital-A Agile, as I like to say. We're the lowercase kind of agile, where we actually try to be fluid in our process and not let the process and the paperwork slow us down. So we're constantly changing, loosening the rules, tightening the rules, deleting process where it doesn't make sense anymore, and introspecting and looking at ourselves.
Yasmin [00:10:36] That's awesome. And have you run into any roadblocks or any failures that you've learned from that are worth sharing with others who are experiencing this today?
Nick [00:10:47] I would say from a failure perspective, we probably didn't do this soon enough. That's really my biggest complaint. We tried to do other things in the interim to transform the organization or to fix the individual problems that we were having, rather than biting the bullet and saying hey, we just need people to work differently. And I think those individual changes and the band-aids caused more problems and really just kicked the can down the road. It just extended the pain, rather than saying as an organization hey, we need to make a change, and it's going to take time, and it's going to be different, but in the end it will be better.
Yasmin [00:11:22] Since you've been the manager throughout the entire time they've been implementing DevOps, what have been some of the biggest challenges for you in helping the team along the ride?
Nick [00:11:33] I think my biggest challenge is taking a step back from coding day to day. I grew up as a problem solver. I went to school as an engineer. I spent my time post college doing heavy, deep problem solving and engineering work, and having to go and manage people is a lot different. I can't be in the weeds all the time and that's a struggle for me. It's a struggle every day, and I try to find some things here and there where I can use my engineering skills. Like the other day I was writing some Python code to do something that I couldn't figure out how to do in Excel. I did it in five minutes instead of the hour it would have taken me. But yeah, that's been a huge challenge for me as a former engineer going into that management role, taking that step back. And then realizing that people are people, right, and they sometimes need different levels of effort. It's not just a piece of code I can go recompile and rerun. I need to think about the bigger picture and how words and actions can influence people in positive ways, right? And helping encourage people rather than just yelling at them, because yelling doesn't work. We need to put the carrot out there and help feed their minds and their bodies and their spirits to help them work towards that better goal that you want to get them down the road on.
Yasmin [00:12:45] That's great. And have there been any tools that have helped you in this process?
Nick [00:12:52] I think the biggest ones have been Puppet and Bolt. That was our original goal. We started with Puppet; it was our tool to help solidify our server configurations, which were either being done manually or with, like, SSH and for-loop semantics. That's been huge. Bolt has been another one. We were doing our ad hoc remote code execution in this other tool, and Bolt was something that we were able to use to kind of unify that. It's actually helped out on our learning journey as well, where the common language between Bolt and Puppet makes it such that our engineers don't have to learn a whole other language to go and do this ad hoc automation versus configuration management. They can kind of learn one and understand the other one pretty easily. Another tool that we like to use pretty often here is StackStorm. StackStorm is our orchestration layer. It kind of ties all of our automation together in this glue kind of layer. And then finally I think ServiceNow has been a big one for us. We're able to expose these service catalog automation items in a GUI that our customers can then go and consume, and run Puppet commands in the background or run StackStorm actions or something like that.
Yasmin [00:13:58] That's great. So all these tools are kind of fitting into your self-service portal kind of?
Nick [00:14:03] Yeah. Basically ServiceNow is our self-service portal. At a high level it's a light wrapper. It's basically our user interface, and there's a light wrapper that then goes and makes an API call to StackStorm, and StackStorm then usually runs some Puppet plan or some Bolt plan, or applies a Puppet manifest onto an end node. That's the general, high-level, 30,000-foot view of the way that it works.
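The portal-to-orchestrator hand-off Nick describes can be sketched roughly as follows. This is a minimal illustration only, not Encore's implementation: the `CatalogRequest` shape, the catalog item names, and the `example.*` action names are all hypothetical, and the payload layout (an action name plus a parameters map) is simply an assumption about what an orchestration API might accept. Nothing is sent over the network; the sketch only shows how thin the wrapper layer can be.

```python
import json
from dataclasses import dataclass, field


@dataclass
class CatalogRequest:
    """A self-service request as it might arrive from the portal (hypothetical shape)."""
    item: str       # catalog item the customer picked
    targets: list   # nodes the work should run against
    options: dict = field(default_factory=dict)


# Hypothetical mapping from catalog items to orchestration actions.
ACTION_FOR_ITEM = {
    "patch_servers": "example.patch_plan",
    "apply_config": "example.apply_manifest",
}


def build_execution_payload(request: CatalogRequest) -> dict:
    """Translate a portal request into the body of an orchestration API call."""
    if request.item not in ACTION_FOR_ITEM:
        raise ValueError(f"no automation registered for catalog item {request.item!r}")
    # Targets are joined into one comma-separated string; extra options pass through.
    parameters = {"targets": ",".join(request.targets)}
    parameters.update(request.options)
    return {"action": ACTION_FOR_ITEM[request.item], "parameters": parameters}


if __name__ == "__main__":
    req = CatalogRequest("patch_servers", ["web01", "web02"], {"reboot": True})
    print(json.dumps(build_execution_payload(req), indent=2))
```

Keeping the wrapper to a pure translation step means the portal never needs to know how a plan or manifest is actually run; that stays in the orchestration layer.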
Yasmin [00:14:28] That's great. And so obviously I'm a little biased on Bolt, but I'm really curious, just from an organizational standpoint, what impact that software has had, and what things you've been able to implement, maybe to improve workflows, or anything that you could share.
Nick [00:14:44] Yeah, the primary one that I'll talk about is patching. We used to patch our servers quarterly, which was fine. That met our security compliance standards at the time. As you know, over the past couple of years security vulnerabilities are released more and more often. A lot of times there are zero days where you actually need to go patch today because the vulnerability's out in the wild. A lot of that has changed the way that we think about how often these things need to happen, and Bolt has given us the ability to automate our patching process such that we can run it every single week if we need to. And so we run our patching process every Monday or Tuesday depending on whether it's Windows or Linux. And now we don't have the fear, and it all runs in a single day. So if on any given day, say a Thursday or Friday, a new CVE is dropped, I have the confidence in my tool sets and my team's capabilities to go and patch every single one of our systems on our network. And that's pretty huge for me. As a service delivery team we don't have to worry about that first level of defense, right? Making sure those things are patched and our backyard's clean.
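The weekly cadence with an out-of-band option can be sketched as a small scheduling helper. This is a hedged illustration, not Encore's tooling: the episode does not say which operating system is patched on which day, so the Windows-on-Monday, Linux-on-Tuesday mapping below is an assumption, and `next_patch_date` is a hypothetical name.

```python
from datetime import date, timedelta

# Assumption: the episode does not state which OS is patched on which day.
PATCH_WEEKDAY = {"windows": 0, "linux": 1}  # Monday=0, Tuesday=1


def next_patch_date(os_family: str, today: date, emergency: bool = False) -> date:
    """Return the next day a node of this OS family would be patched."""
    if emergency:
        # Out-of-band run, e.g. a CVE disclosed on a Thursday or Friday.
        return today
    # Days until the OS family's regular weekday (0 if today is that day).
    days_ahead = (PATCH_WEEKDAY[os_family] - today.weekday()) % 7
    return today + timedelta(days=days_ahead)


if __name__ == "__main__":
    thursday = date(2019, 10, 3)
    print(next_patch_date("linux", thursday))
    print(next_patch_date("linux", thursday, emergency=True))
```

Because the whole run fits in a single day, the emergency path is just the regular run triggered early rather than a separate process.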
Yasmin [00:15:50] And are you working with the security teams on gathering that CVE data?
Nick [00:15:54] Yeah. So sometimes our security teams will go and send us an email saying hey, this CVE just dropped, we need to go patch it. Other times it's just people reading The Register and seeing how badly something was just broken. Other times it's an engineer parsing through release notes. There's kind of a mixture there. And then the other one is on the security team side: we have vulnerability scans that go on. Those are not as regular as every week, but when we do get output from the vulnerability scans, we'll go and use Bolt to apply patches to all the systems that are affected. Which is also why I'm excited about the new Puppet Remediate. We use a scanner in-house that's covered under Puppet Remediate, and I'm super excited to tie our existing remediation workflows, or our patching workflows, to the output of those tools using that.
Yasmin [00:16:37] I know John is so excited to talk to you. I think it's really nice to see how your Bolt usage has evolved from the beginning of the project, and also just how valuable your feedback has been to our product roadmaps across the board.
Nick [00:16:51] I appreciate that. Thank you.
Yasmin [00:16:52] So I guess we've talked a lot about the things that you've learned and what you've implemented but what are your goals in the next year for the team or the technology and the things you have planned?
Nick [00:17:03] So we're gonna keep educating. We are about halfway through our education bullet points. I think we're through a lot of the easy ones and we're now onto the hard ones. I actually talked to the team late last week and said we're going to stick on the Puppet bullet points for a while and we're gonna go deep on this one. One of the things that we're doing that's kind of interesting is, instead of just going through videos, we're going to do some collaborative learning and tackle some problems together as a team and try that out, see how it works. Which basically means we're going to take a Windows server that's not Puppetized yet, and as a team we're gonna go Puppetize it. We're gonna go through the process of understanding and breaking down the build process, figuring out what the components are and how the configuration changes are made, and then determining how to apply Puppet resources to each one of those things. And then breaking that out into Hiera, understanding what pieces of Hiera need to be implemented. And then finally the operational side of it: what one-off commands do we run? And then pulling those things into Bolt plans. The next one to follow on from that is our coverage on the Windows and VMware side of the house. We have some of our Windows systems covered under Puppet management, but that's a very small number and we're trying to expand that out as much as we can. As we go through this learning objective and more of our Windows engineers are versed in Puppet, they'll be able to help implement that. And then VMware on that side of the house: we have some cool things in the works around VMware and PowerShell to help our Windows engineers and our VMware engineers, who are familiar with things like PowerCLI, be able to run that through Bolt and through Puppet to help manage our VMware environments.
And then I think finally the other one that we're working on is taking more of the self-service approach that we provide for our customers and helping provide that for our engineers. So taking the ops work that we do and trying to put that into Bolt as much as we can. Instead of having people just go run commands manually, we want to provide those things as plans, in Bolt primarily, and then front-end it with ServiceNow or ChatOps so that engineers can go execute that stuff whenever they need to.
Yasmin [00:18:59] That sounds really exciting. That's a packed roadmap.
Nick [00:19:02] Yeah. Yeah. I got a lot of stuff to do.
Yasmin [00:19:06] As you've been doing this and as you're kind of planning it out, are there any resources on the Puppet side or Bolt side that have been helpful to you and that you'd want to share?
Nick [00:19:17] Yeah. On the Puppet side, the Shit Gary Says blog is good. The Puppet documentation itself has been really good. Rob Nelson has a blog, and then also the Puppet Slack, I have to plug. I'm on there quite a bit. It's amazing how a simple question can get answered very quickly by resources that are close to the problem. I think that kind of back and forth helps quite a bit.
Yasmin [00:19:42] Well, since you brought it up, how can people find you on the Puppet Slack?
Nick [00:19:46] That's @nmaludy and I usually hang out in the Bolt channel.
Yasmin [00:19:52] Well, it's been a really good time talking with you and learning about how our tools have helped make an impact at your organization, and your career growth has been awesome. I know you're pretty active on the socials, so do you want to let people know where they can find you when you're not on the Puppet Slack?
Nick [00:20:10] Yeah, @nickmaludy on Twitter, and I also blog occasionally on our blog at encore.tech/blog.
Yasmin [00:20:17] Thank you. And that Puppet Slack channel, for those of you who want to go find Nick or other members of the Puppet team, whether it's Bolt or Puppet or our new product Remediate, is just slack.puppet.com. You can sign up there, and we have a hands-on lab at bolt.guide that'll help you get started if you are starting your DevOps journey and want to get to the elite level that Nick and his team are at. Thanks again, Nick, for joining. We're really glad that you could dial in. Thank you all out there for listening. We hope that this was valuable. We will have all the links in the show notes, and again, if you need anything, hop on the Puppet Slack, talk to Nick, talk to other people at Puppet. We're happy to be part of the conversation.
Nick [00:21:01] Thanks Yasmin.