Provisioning ESXi servers at Puppet with Bolt


A few months ago, as part of the infrastructure team at Puppet, I began work on a project to add about a dozen ESXi servers to our internal CI service. Naturally this was an opportunity to see to what extent I could automate the provisioning of these servers, and this post summarizes what I learned.

The process our team had used previously was to follow a document containing a mish-mash of CLI commands and shell scripts. The document was designed to be worked through from top to bottom as a sort of decision tree, with headings like “Common Network Setup”, followed by “Supplemental Network settings for Application Foo”, and subheadings “Cisco model X” / “Dell model Y”.

For the most part this document outlined an existing standard around what networks, interfaces, and storage backends we needed to configure, but there was some variation based on what kind of device we were configuring. Taking stock of what we’d done in the past, it was clear that we were in a good place to lay the groundwork for something more cohesive and automated.

Bolt as a first step in building better automation

Bolt is a great entry point for building automation in a situation like this. With minimal setup, it can run the collection of one-off commands or scripts you have squirreled away somewhere against groups of servers. You can get as simple or complex as you want (or have time for), and you can add logic that moves some of the decision making embedded in an existing document into code.
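
For example, before writing any scripts at all, you can use Bolt to run a single ad-hoc command across a set of targets. A minimal sketch (the hostnames are placeholders, and it assumes SSH is enabled on the ESXi hosts):

bolt command run 'esxcli system version get' -t esxihost-01.example.com,esxihost-02.example.com -u root -p $esxi_pass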

My goal in the first pass at this was to have a working script that would configure the set of servers I was provisioning with a single Bolt command using ESXCLI.

ESXCLI can configure just about anything in ESXi. I’ll give some examples and provide a link to the command reference. One thing that makes ESXCLI somewhat tricky is that it doesn’t output anything in a lot of situations. If a command returns an exit code other than 0, then Bolt will raise a failure in the output. However, we want to review output from Bolt and be able to tell what actually happened, so we can do one of two things to add verbosity:

  • Option one: Add the --formatter=keyvalue option to each of our commands. This will output a result like “boolean=true” if the command succeeds, or some helpful info if it doesn’t, e.g. “A portgroup with the name storage1 already exists” (see the sketch after this list).
  • Option two: Add list / get commands to show the end result of the configuration we’re changing throughout the script.
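
To make option one concrete, here’s roughly what adding the formatter to one of the portgroup commands from the script below looks like (a sketch; the exact output text comes from ESXCLI itself):

esxcli --formatter=keyvalue network vswitch standard portgroup add -p testnet1 -v vSwitch1
## Prints something like "boolean=true" on success, or a message such as
## "A portgroup with the name testnet1 already exists" if it has already been created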

Here’s a partial provisioning script that adds a vSwitch and some portgroups, as well as a storage backend. I’ve added additional commands (Option Two above) so that we can review the configuration in the output:

dell_esxi_provisioning.sh

#!/bin/sh

##Add a vSwitch for VM port groups
esxcli network vswitch standard add -v vSwitch1
esxcli network vswitch standard uplink add -v vSwitch1 -u vmnic2
esxcli network vswitch standard uplink add -v vSwitch1 -u vmnic3
esxcli network vswitch standard policy failover set -v vSwitch1 -a vmnic2,vmnic3
esxcli network vswitch standard list        ## For Output

##Add some port groups
esxcli network vswitch standard portgroup add -p testnet1 -v vSwitch1
esxcli network vswitch standard portgroup set -p testnet1 --vlan-id 12
esxcli network vswitch standard portgroup add -p testnet2 -v vSwitch1
esxcli network vswitch standard portgroup set -p testnet2 --vlan-id 24
esxcli network vswitch standard portgroup list        ## For Output

##Add storage backends
esxcli storage nfs add -H nfs.example.com -s /shared/vmware -v nfs-share-01
esxcli storage nfs list        ## For Output

When you run this with Bolt, it looks like this:

bolt script run dell_esxi_provisioning.sh -t esxihost-01.example.com -u root -p $esxi_pass

Started on esxihost-01.example.com ...
Finished on esxihost-01.example.com:
  STDOUT:
	vSwitch1
   	Name: vSwitch1
   	Class: cswitch
   	Num Ports: 11776
   	Used Ports: 8
   	Configured Ports: 128
   	MTU: 9000
   	CDP Status: listen
   	Beacon Enabled: false
   	Beacon Interval: 1
   	Beacon Threshold: 3
   	Beacon Required By:
   	Uplinks: vmnic2, vmnic3
   	Portgroups: testnet1, testnet2, Management Network

	Name                Virtual Switch  Active Clients  VLAN ID
	------------------  --------------  --------------  -------
	Management Network  vSwitch1        1               22
	testnet1            vSwitch1        0               12
	testnet2            vSwitch1        0               24

	Volume Name   Host             Share           Accessible  Mounted  Read-Only
	------------  ---------------  --------------  ----------  -------  ---------
	nfs-share-01  nfs.example.com  /shared/vmware  true        true     false

Successful on 1 target: esxihost-01.example.com 
Ran on 1 target in 3.12 sec

Next steps: Expanding our use case to include delegation and provisioning from the Puppet Enterprise console

As I mentioned, the infrastructure team manages ESXi servers on a few different kinds of hardware. At this point, I had a script that could provision any number of ESXi servers on one type of hardware. But what if I wanted to add the ability to provision other hardware platforms, and then delegate provisioning to a colleague with less access than I have?

I wanted to wrap our provisioning script in a task so that someone could run it from the Puppet Enterprise console. This allows the task to be delegated to someone who doesn’t have root access to the ESXi servers; we only need to grant access to the task and the target server (as an inventory object) in the Puppet Enterprise console.

Side note: the concept of an agentless inventory is somewhat new to Puppet Enterprise. If you’re unfamiliar, you can read about it here. By adding agentless nodes to the inventory, I can delegate running a task to a user without that user needing credentials for the target server. As an administrator, I set up the credentials, grant them to the user, and limit the user to running this specific task. Here’s an example of what the permissions look like:

To expand what hardware I can provision with a task, I simply need to write and test a script for each of my other hardware types, plus a small wrapper script. Then I package everything as a Puppet module and install it on my master. The module structure looks like this:

/etc/puppetlabs/code/environments/production/modules/esxi
├── files
│   ├── cisco_esxi_provisioning.sh
│   ├── dell_esxi_provisioning.sh
│   └── hp_esxi_provisioning.sh
└── tasks
    ├── provision.json
    └── provision.sh

The scripts in the files directory are the ones I wrote for each hardware profile, and the tasks directory contains provision.json, which holds the metadata for the “provision” task, as well as provision.sh, the task itself.

# cat tasks/provision.json
{
  "files": [
    "esxi/files/dell_esxi_provisioning.sh",
    "esxi/files/cisco_esxi_provisioning.sh",
    "esxi/files/hp_esxi_provisioning.sh"
  ],
  "input_method": "environment",
  "parameters": {
    "server_type": {
      "type": "Enum['dell','cisco','hp']"
    }
  }
}

# cat tasks/provision.sh
#!/bin/sh

## Select the provisioning script that matches the server_type task parameter
## (passed to the task as the PT_server_type environment variable)
script_file="${PT__installdir}/esxi/files/${PT_server_type}_esxi_provisioning.sh"

## Make it executable, then run it
chmod +x "$script_file"

"$script_file"

When the user goes to run this task in the PE console, they only need to choose the server_type (equipment) that they’re provisioning, and they’re presented with a dropdown for this field:

Screenshot of choosing server type in PE console

Again, the user only has access to run the script I’ve created on a group of targets that I specify in the user’s RBAC permissions, and they don’t need to know (and can’t access) the credentials that I added for them.
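
For testing outside the console (or for anyone who does hold the credentials), the same task can also be run directly with Bolt. A sketch, assuming the module is on Bolt’s modulepath and reusing the placeholder hostname from earlier:

bolt task run esxi::provision server_type=dell -t esxihost-01.example.com -u root -p $esxi_pass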

Example output:

Ideas for further development

For our team’s purposes and for code reusability, I’d like to see each piece of configuration broken down into a single task with input parameters for each item we’re configuring. We could still call each individual task as part of a larger provisioning task (or plan), but this would create a more flexible codebase that could be reused and repurposed as we take on new kinds of hardware, or for more ad-hoc configuration changes. It should also be possible to stage and run patching and upgrades, and handle all sorts of other administrative work, with Bolt, the console, or both. I’ll be exploring this further (and blogging more about it) as we continue to build out and maintain this environment.
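
As a rough sketch of what one of those single-purpose tasks could look like, here’s a hypothetical task that adds a single port group (the file name and parameter names are made up, not something we’ve built yet):

# cat tasks/add_portgroup.sh  (hypothetical)
#!/bin/sh

## Add one port group and tag it with a VLAN, both supplied as task parameters
esxcli network vswitch standard portgroup add -p "$PT_portgroup" -v "$PT_vswitch"
esxcli network vswitch standard portgroup set -p "$PT_portgroup" --vlan-id "$PT_vlan_id"
esxcli network vswitch standard portgroup list        ## For Output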

Erik Hansen is a site reliability engineer at Puppet.
