Configuration Management Series Part 3 of 3: Puppet

At the start of 2013, the MSP that I was working for at the time landed its largest client in the history of the company. The issue: this meant that I needed to set up approximately 40 servers in a relatively small period of time. This aside, the prospect of manually installing and configuring 40 physical nodes did not entertain me at all. Network installs with Kickstart were fine, sure, but I needed more this time.

I had been entertaining the thought of implementing some sort of configuration management for our systems for a while. I had always hit a wall as to exactly what to use it for. Implementing it for the simple management of just our SSH keys or monitoring system configs seemed like a bit of a waste. I had lightly touched Chef before, but as mentioned in the last article, did not have much success in getting it set up fully, and did not have much time to pursue it further at that point.

I now had a great opportunity on my hands. I decided to take another look at Puppet, which I had looked at only briefly, and found that it seemed to have a robust community version, a simple quickstart workflow, and quite functional Windows support. Call it a matter of circumstance, but it left a much better impression on me than Chef did.

I set up a Puppet master, set up our Kickstart server to perform the installs and bootstrap Puppet in the post-install, and never looked back.
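For the curious, the Kickstart side of that bootstrap is only a few lines in the %post section. Here is a minimal sketch of the idea – the master hostname is a placeholder, and it assumes the Puppet Labs yum repository is already available to the installer:

%post
# Install the agent (assumes the Puppet repo is already configured)
yum -y install puppet
# Point the agent at the master; the hostname is a placeholder
cat >> /etc/puppet/puppet.conf <<'EOF'
[agent]
server = puppet.lab.vcts.local
EOF
# Start the agent on boot; it will submit its certificate request
chkconfig puppet on
%end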

For this client, I set up Puppet to manage several things, such as SSH keys, monitoring system agents (Nagios and Munin), and even network settings, with excellent results. I went on to implement it for the rest of our network and administration became magnitudes easier.

Since then I have used Puppet to manage web servers, set up databases, manage routing tables, push out new monitoring system agents and configure them, and even implement custom backup servers using rsnapshot. One of my crowning achievements is using Puppet to fully implement the setup of private VPN software appliances, including OpenVPN instances complete with custom Windows installers.

Ladies and gentlemen, I present to you: Puppet.


I am used to this process by now, but I’d still say that standing up a Puppet master is an extremely easy process. There are plenty of options available – if it’s not possible to use the official repository (although it is recommended), most modern distros do carry relatively up-to-date versions of the Puppet master and agent (3.4.3 in Ubuntu 14.04 and 3.6.2 in CentOS 7). Installation of both the WEBrick and Passenger versions is very straightforward, and few (if any) configuration settings need to be changed to get started.

There is also the emergent Puppet Server, a new Puppet master implementation that is intended to replace the old ones. This is a Clojure-based server (meaning it runs in a JVM), but don’t let that necessarily dissuade you. If the official apt repositories are being used, installing puppetserver will install everything else needed to run it, including the Java runtime.

Funnily enough, in contrast to the beefy 1GB requirements of Chef, I was able to get Puppet Server up and running with a little over 300MB of RAM used. Even though the installation instructions recommend a minimum of 512MB, I was able to run the server with JAVA_ARGS="-Xms256m -Xmx256m" (basically a 256MB heap) and perform the testing I needed to do for this article, without any crashes.

After installing the server, things were as easy as:

  • apt-get installing the puppet agent,
  • Configuring the agent to contact the master,
  • Starting the agent (ie: service puppet start),
  • And finally, using puppet cert list and puppet cert sign to locate and sign the agent’s certificate request.
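Strung together on the command line, the whole exchange looks roughly like this (the hostnames are placeholders from my lab, and puppet config set assumes a reasonably recent 3.x agent – on older versions, just edit /etc/puppet/puppet.conf by hand):

# On the agent
apt-get install puppet
puppet config set --section agent server puppet.lab.vcts.local
service puppet start

# On the master: list pending requests, then sign the agent's
puppet cert list
puppet cert sign srv2.lab.vcts.local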

The full installation instructions can be found in the official Puppet documentation.

Windows Support

As mentioned in the last article, one of the reasons that I actually did choose Puppet over Chef was that its Windows support seemed to be further along.

One of the things that I particularly like about Puppet’s Windows support is the way they made the package resource work for Windows: it can be used to manage Windows Installer packages. This made deployment of Nagios agents and other software utilities to Windows infrastructure extremely easy.

Other than that, several of the basic Puppet resources (such as file, user, group, and service) all work on Windows.
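As a sketch of what that looks like in a manifest – the package name, MSI path, and service name here are all made up for illustration:

# Install an MSI and keep its Windows service running
package { 'Sample Monitoring Agent':
  ensure          => installed,
  source          => 'C:\installers\sample-agent.msi',
  install_options => [ '/quiet' ],
}

service { 'SampleAgent':
  ensure  => running,
  enable  => true,
  require => Package['Sample Monitoring Agent'],
}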

Head on over to the Windows page for more information.


In the very basic model, Puppet agent nodes are managed through the main site manifest located in /etc/puppet/manifests/site.pp. This could look like:

node "srv2.lab.vcts.local" {
  include sample_module
}

Where sample_module is a module – a unit of actions to run against a server (just like cookbooks in Chef).
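A module, at its simplest, is a directory containing a manifests/init.pp that declares a class of the same name. A hypothetical sample_module managing an SSH key (the key material here is obviously fake) might look like:

# modules/sample_module/manifests/init.pp
class sample_module {
  ssh_authorized_key { 'admin@workstation':
    ensure => present,
    user   => 'root',
    type   => 'ssh-rsa',
    key    => 'AAAAB3NzaC1yc2EAAAADAQABAAABAQCfake',
  }
}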

Agents connect to the Puppet master, generally over port 8140 and over an SSL connection. Nodes are authenticated via certificates which Puppet keeps in a central CA:

root@srv1:~# puppet cert list --all
+ "srv1.lab.vcts.local" (SHA256) 00:11:22... (alt names: "DNS:puppet", "DNS:srv1.lab.vcts.local")
+ "srv2.lab.vcts.local" (SHA256) 00:11:22...

This makes for extremely easy node management. All that needs to be done to authorize an agent on the master is to puppet cert sign the pending request. puppet cert revoke is used to add an agent’s certificate to the CRL, removing its access to the server.

As far as data storage goes, I have already written an article on Hiera, Puppet’s data storage mechanism. Check it out here. Again, Hiera is a versatile data storage backend that respects module scope as well, making it an extremely straightforward way to store node parameter data. Better yet, encryption is supported, as are additional storage backends other than the basic JSON and YAML setups.

Execution Model

Right now I would say that Puppet natively supports only a pull model. This is because its push mechanism, puppet kick, seems to be in limbo, as illustrated by the related Redmine and JIRA issues. The alternative is apparently to use MCollective, which I have never touched.

By default, Puppet runs every 30 minutes on nodes, and this can be tuned by playing with settings in /etc/puppet/puppet.conf.

Further to that, one-off runs of the Puppet agent are easy enough: just run puppet agent -t from the command line, which performs a single run with some other options (ie: slightly higher verbosity). This can easily be set up to run out of something like Ansible (and Ansible’s SSH keys can even be managed through Puppet!)

Puppet also supports direct, masterless runs of manifests through the puppet apply command. Incidentally, this is used by some pretty well-known tools, notably packstack.
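For a taste, an inline masterless run is a one-liner (the file content here is just an example):

puppet apply -e 'file { "/tmp/hello.txt": content => "Hello from Puppet\n" }'

Point it at a manifest file with a module path instead, and you have a serviceable masterless workflow:

puppet apply --modulepath modules/ site.pp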


This was neat to come back to during my evaluation yesterday. The following manifest took me maybe about 5 minutes to write, and supplies a good deal of information on how the declarative nature of Puppet works. Here, srv2.lab.vcts.local is set up with a MySQL server, a database, and backups, via Puppet’s upstream-supported MySQL module.

node "srv2.lab.vcts.local" {
  class { '::mysql::server':
    root_password => 'serverpass',
  }
  mysql::db { 'vcts':
    user     => 'vcts',
    password => 'dbpass',
  }
  file { '/data':
    ensure => directory,
  }
  class { '::mysql::server::backup':
    backupuser        => 'sqlbackup',
    backuppassword    => 'backuppass',
    backupdir         => '/data/mysqlbackup',
    backupcompress    => true,
    backuprotate      => 7,
    file_per_database => true,
    time              => [ '22', '00' ],
  }
}

The DSL is Ruby-based, kind of like Chef’s, but unlike Chef’s, it’s not really Ruby anymore. The declarative nature of the DSL means that it’s strongly typed. There is also no top-down workflow – dependencies need to be strung together with require directives that point to the services or packages that need to be in place first. This is both a strength and a weakness, as it is possible to get caught in a complicated string of dependencies and even end up with some circular ones. But when kept simple, it’s a nice way to ensure that things get installed when you want them to be.
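The classic illustration is the package/file/service pattern, where require and notify string the three resources together regardless of declaration order (the ntp example here is mine, not from the client setups above):

package { 'ntp':
  ensure => installed,
}

# The config file needs the package installed first, and any change
# to it should restart the service
file { '/etc/ntp.conf':
  ensure  => file,
  source  => 'puppet:///modules/ntp/ntp.conf',
  require => Package['ntp'],
  notify  => Service['ntp'],
}

service { 'ntp':
  ensure => running,
  enable => true,
}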

Templates are handled by ERB, like Chef. The templates documentation can help out here.
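A template is plain ERB: instance variables from the surrounding class scope are available as @variables. A hypothetical ntp.conf.erb might contain:

# templates/ntp.conf.erb
server <%= @ntp_server %>
driftfile /var/lib/ntp/drift
<% @peers.each do |peer| -%>
peer <%= peer %>
<% end -%>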

The coolest part about developing for Puppet though has to be its high-quality module community at Puppet Forge. I have used several modules from here over the years and all of the ones that I have used have been of excellent quality – two that come to mind are razorsedge/vmwaretools and pdxcat/nrpe. Not just this, but Puppet has an official process for module approval and support with Puppet Enterprise. And to ice the cake, Puppet Labs themselves have 91 modules on the Forge, with several being of excellent quality and documentation, as can be seen by looking at the MySQL module above. It’s this kind of commitment to professionalism that really makes me feel good about Puppet’s extensibility.


A good middle ground that has stood the test of time

Puppet is probably the first of a new wave of configuration management tools that followed the likes of CFEngine. I really wish I had known about it when it first came out – it definitely would have helped me solve some challenges surrounding configuration management much earlier than 2013. And it’s just as usable today, if not more so. With the backing of companies like Google, Cisco, and VMware, Puppet is not going away any time soon. If you are looking for a configuration management system that balances simplicity and utility well, then Puppet is for you. I also can’t close this off without mentioning my love for Puppet Forge, which I personally think is the best module community for its respective product, out of the three that I have reviewed.


In the circles that I’m a part of, and out of the three tools that I have reviewed over this last month, Puppet doesn’t nearly get the love that I think it should. Some people love the Ansible push model. Other developers love Chef because that’s what they are used to and what they have chosen to embrace. Puppet – for no fault of its own, to be honest – seems to be the black sheep here.

Maybe I just need to get out more. A little over two years ago, RedMonk published an article comparing usage of the major configuration management tools. Going by those numbers, even two years ago there was no reason to say that Puppet should be put out to pasture.

The End

I hope you enjoyed this last month of articles about configuration management.

I thought I’d put forward some thoughts I have had while working with and reviewing these tools for the last month. I came into this looking to do a “shootout” of sorts – basically, standing up each tool and comparing their strengths and weaknesses based off a very simple set of tests and a checklist of features.

I soon abandoned that approach. Why? First off – time. I found that I had very limited time to do this, sometimes only a Sunday afternoon of an otherwise busy week, to set up, review, and write about each tool (and even then the articles sometimes came out a few days late).

But more importantly, I didn’t want to hold on to some stubborn opinion about one tool being better than another. So I decided to throw all of that out, approach it with an open mind, and actually immerse myself in the experience of using each tool.

I think that I have come out of it better, with a genuinely objective and educated opinion about the environment that suits each tool best. Ultimately, this has made me a better engineer.

If you have read this far, and have read all three major articles, first off, thank you. Second off, if you are having trouble choosing between one of the three reviewed in this series, or another altogether, you might want to ask yourself the following few questions:

  • What does my team have knowledge on already?
  • Will I be able to ramp up on the solution in a timely manner with the least impact?
  • Will the solution be equally (or better) supported by future infrastructure platforms?

Most of all, keep looking forward and keep an open mind. Don’t let comfort in using one tool keep you closed off to using others.

Thanks again!


TL;DR Tutorials: Hiera

I thought this would be a neat idea for a series, but we will see how it goes. The idea, of course, is that I will aim to give you, the reader, a moderately quick crash course in a specific technology or product – enough to go out into the world with a good working knowledge of whatever topic is covered, from which you can build.

And with that, I present my first topic: Hiera.

About Hiera

A configuration management system is all about reducing work, and keeping the data points that a person needs to manually touch to a minimum. Puppet can go a long way in helping you out with this, but as time goes on and you set up more and more features, with modules that accept parameters, you might find that things will get a little messy. Speaking from my own experience, the place that I used to stick unique parameter data would be the node’s manifest:

node '' {
    include module1
    class { 'module2':
        param1 => 'value1',
        param2 => 'value2',
    }
    class { 'module3':
        param_list => [ 'value1', 'value2', 'value3' ],
    }
}

You can imagine that after you repeat this several hundred times, and try to layer different settings with, say, node inheritance (which, in fact, is now deprecated), it can become nearly as annoying as managing each node directly.

Worse, what if you have to store secure data, such as passwords?

Hiera solves these problems for you. It offers a central place to store all of your configuration data, allowing node manifests to be greatly simplified, and hopefully eliminating the need for inheritance completely. Better yet, multiple options exist for not only encryption, but data storage as well.
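To make that concrete, the noisy node definition from above collapses to a couple of include statements, with the parameters moved into a data file (the names here mirror the earlier hypothetical example):

node 'web01.example.com' {
    include module2
    include module3
}

And the matching YAML, keyed by class scope:

module2::param1: 'value1'
module2::param2: 'value2'
module3::param_list:
  - 'value1'
  - 'value2'
  - 'value3'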

In this article we will cover the basics of using Hiera with a YAML backend, and even better, we will use hiera-eyaml to demonstrate how data can be encrypted as well.

Before we start with examples, it is important to mention the Hiera documentation – which will help you with anything missed here. Namely, we don’t cover direct or merge lookups – if you want to know more about those, check the docs. Be sure to read the README on the hiera-eyaml project page too, so you know how to properly use eYAML.

The Example

Check out the following project on my GitHub, which I will be referencing for the rest of this article.

If you download this, and run it according to the instructions in the README, using, say, the following command:

FACTER_tldrnode=dataset1 puppet apply --hiera_config hiera.yaml \
 --modulepath modules/ -e 'include tldr_hiera'

The generated /tmp/ should look something like this:


In the example, you can change FACTER_tldrnode=dataset1 to FACTER_tldrnode=dataset2 to get different output:


Different output happens as well if you omit or change the value to something outside of these two, but we are going to skip that for now (this is supposed to be short and sweet, after all).

The Module

If you look in the included module, you might find that it does not look too much different than a standard puppet module:

class tldr_hiera (
  $tldr_string    = 'Not found!',
  $tldr_array     = [ 'Not found!' ],
  $tldr_hash      = { 'tldr_key' => 'Not found!' },
  $tldr_encstring = 'Not found!'
) {

  # File resource where Hiera values get written to
  file { '/tmp/':
    content => template('tldr_hiera/'),
  }
}
Parameters are still declared and passed along as per normal, so where is the data coming from?


This is where Hiera comes in. Note that we include --hiera_config hiera.yaml when we invoke puppet. Let’s take a look at an abridged version of that file:

:backends:
  - eyaml
  - yaml
:yaml:
  :datadir: hieradata/
:eyaml:
  :datadir: hieradata/
  :pkcs7_private_key: hieradata/keys/private_key.pkcs7.pem
  :pkcs7_public_key: hieradata/keys/public_key.pkcs7.pem
# We use a custom fact for the hierarchy, just for easy demonstration
:hierarchy:
  - "%{::tldrnode}"
  - common

What’s going on here:

  • We are using both the eYAML and the YAML backends for Hiera data sources (it’s important that eYAML gets included first to ensure encrypted values are queried first, if you are separating content).
  • Data for both backends is stored in the hieradata/ subdirectory
  • Private and public keys for eYAML are configured. These are generated with eyaml createkeys, and should be stored appropriately (the private key especially should have restricted permissions). Never distribute a private key like we have here for anything other than demonstration purposes!
  • The data hierarchy is defined in the hierarchy section. We first reference a custom fact, tldrnode, and then fall back to a common YAML if that cannot be found.

hiera.yaml is just a configuration file for Hiera. After Hiera knows where to look and what to look for, the next step is getting the data.
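Before involving Puppet at all, you can sanity-check lookups with the hiera command-line tool, passing facts as key=value pairs – for example, against the dataset used here:

hiera -c hiera.yaml tldr_hiera::tldr_string ::tldrnode=dataset1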

A Hiera Data File

Here is the directory listing of the hieradata directory on our Vagrant instance.

-rw-r--r-- 1 vagrant vagrant 702 Mar  9 04:59 common.eyaml
-rw-r--r-- 1 vagrant vagrant 267 Mar  9 04:37 common.yaml
-rw-r--r-- 1 vagrant vagrant 283 Mar  9 03:50 dataset1.yaml
-rw-r--r-- 1 vagrant vagrant 284 Mar  9 03:50 dataset2.yaml
drwxr-xr-x 1 vagrant vagrant 136 Mar  9 03:17 keys

In the spirit of keeping things short, we will give an abridged version of one of the data files, dataset1.yaml:

# String
tldr_hiera::tldr_string: string_value_dataset1
# Array
tldr_hiera::tldr_array:
  - element1_dataset1
  - element2_dataset1
# Hash
tldr_hiera::tldr_hash:
    tldr_key: tldr_value_dataset1

And again, what’s going on:

  • Puppet has an auto-lookup function that resolves class parameters from Hiera within the class’s own scope. So, if you want to define settings for the tldr_hiera module, you ensure that all of your values are set up within that namespace.
  • Strings, arrays, and hashes are all set up with an appropriate YAML syntax, seeing as this is the YAML backend.

Lookup Order

There is also a common.yaml file, which acts as a fallback file. This was the common entry in our hierarchy definition.

The lookup order that is defined in that section basically dictates the workflow. So in this instance:

  • The custom fact tldrnode is looked up, and a match is attempted on its value (ie: dataset1.yaml)
  • If that cannot be found, the static value common is used. Note that this is just a convention, you can use whatever name you want.

Values are then looked up according to class scope, as mentioned in the last section.


Encryption

Finally, there is encryption. common.eyaml looks like so:

tldr_hiera::tldr_encstring: >

However, as long as you have the private key, you can go in and edit the file by using eyaml edit common.eyaml and you will get this instead:

tldr_hiera::tldr_encstring: >

You can then edit the value. When you are using eyaml edit and an appropriate text editor, you will get additional instructions on how to add new values as well.
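If you would rather not open an editor at all, the eyaml encrypt subcommand generates ready-to-paste values too – for example (the secret here is obviously a throwaway):

eyaml encrypt --pkcs7-public-key hieradata/keys/public_key.pkcs7.pem \
  -l 'tldr_hiera::tldr_encstring' -s 'some secret value'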

Another note: if you are using vim to edit the eYAML, you will probably need to change the default tab behaviour or it will break the YAML when adding a multi-line hash. Add something like the following to your .vimrc:

filetype plugin indent on
set tabstop=2
set shiftwidth=2
set expandtab

set expandtab is the key directive here – it’s the one that turns tabs into spaces.

Use With a Puppet Master

To use this appropriately with a Puppet master, keep in mind:

  • Open source Puppet expects the master Hiera config to be at /etc/puppet/hiera.yaml
  • Again, be sure to lock down the keys. It is recommended that you keep them out of the data directory – as long as the permissions are tight there should be no issue, but it is probably better to store them somewhere else, such as /etc/puppet/secure/keys. Do not give the private key to anyone else!