Configuration Management Series Part 2.5 – Chef Follow-Up

As luck would have it, I have been working a lot more with Chef in the last couple of weeks, so a supplemental article is in order.

There have been a number of things that I have found out in this period, and there is a good amount that I probably will not cover, but I am going to try and cover some of the key challenges that I have encountered upon my journey. This should give you, the reader, some insight into why I have chosen to continue with Chef, and maybe help you make some good decisions on data management and code design that will save you time on your own ramp-up.

Provisioning with Knife Plugins

First off, the reason why I have decided to keep going with Chef in the first place. Incidentally, it would seem that Chef has the best support for provisioning out of the three tools that I have looked at, especially as far as vSphere is concerned.

As it currently stands, I am in a situation where I need a tool that is going to handle the lifecycle of an instance, end-to-end. Ansible’s vsphere_guest module, unfortunately, was not satisfactory for this job at all, and in fact, working with this particular piece of code was a lesson in frustration. The code is highly inconsistent across operations – specifically, very few features that are available in instance creation are available in template clone operations, and the available functionality in the former was not suitably portable. Puppet seemingly lacks such a feature altogether, outside of Puppet Enterprise. Both of these are show stoppers for me.

Enter knife-vsphere. An amazing tool, and 100% free. Incidentally, cloning from template is the only creation operation supported, which makes sense, considering that a brand new instance would more than likely be useless for provisioning on. Linux guest customization is fully supported, and programmable support for Windows customization is as well (see examples in the README). Finally, it can even bootstrap Chef onto the newly created node. The tool will even destroy instances and remove their entries from the Chef server if the instance label has a matching node.

Having these features in Chef, it was pretty much a no-brainer to see what wins out here.

Of course, vSphere is by far not the only knife plugin that supports stuff like this. Check out knife-ec2 and knife-openstack, both of which support similar behaviour.

Data Bags and Vault, Oh My!

I think my assessment that data bags were close to Hiera was premature. In the time since I wrote that, I have learned a few key takeaways.

First off, data bags are global. And even though I used global lookups directly in my first article about the subject, Hiera objects are designed to be used in a scoped auto-lookup mechanism, as per their documentation. Incidentally, data bags are the ones that are intended to be accessed in a fashion that is more in line with what I described in that article. Live and learn eh?

Conventions can be created that can help with this. For example, one can set up a bag/item pair like foo/bar if the cookbook’s name is foo. Also, cookbooks can still be parameterized elsewhere, and in fact Chef features like roles and environments are better suited for this, by design. (Note: Please read the last section of this article before investing too much into roles and environments, as roles in particular are frowned upon in parts of the Chef community).

Considering the latter part of the last paragraph, one would wonder: what exactly are data bags ultimately good for then? What indeed. A little bit of research for me revealed this good commentary on the subject by awesome community Chef 2015 winner Noah Kantrowitz. Incidentally, Noah recommends creating resources for this, versus using roles or environments.

As it currently stands, my general rule of thumb for using data bags is encryption. Does it need to be encrypted? Then it goes into a bag. Not a regular encrypted bag though – a regular encrypted bag requires a pre-shared key that needs to be distributed to the nodes somehow. There is no structure around controlling access to the data. How does this problem get solved then? Enter chef-vault.

Vault allows encryption via the public keys of the nodes that need the data, effectively ensuring only these nodes have access. This can explicitly be controlled by node or through a search string that allows searching on a key:data pair (ie: os:linux). This addresses both concerns mentioned in the previous paragraph. The only major issue with this setup is that the data needs to be encrypted across the key that will ultimately be supplied to the node, creating a bit of a chicken-before-the-egg problem. Luckily, it looks like Chef is catching up to this, and recently vault options for knife bootstrap were added to get past this. Now, when nodes are created, vault items can be updated by the host doing the provisioning, allowing a node access to vault items even during the initial chef-client run. This is not supported on validator setup, as the ability to do this anonymously (well, with the validator key) could possibly mean a compromise of the data.
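
To make the public-key idea a bit more concrete, here is a minimal Ruby sketch using only the standard openssl library. This is not chef-vault's actual code or API: the node names, the secret, and the helper methods seal_for_nodes and open_as_node are all made up for illustration. It only shows the general principle, which is that the secret is encrypted once per authorised node with that node's public key, so a node that was never granted access has no copy it can decrypt.

```ruby
require 'openssl'

# Hypothetical nodes with their key pairs. In Chef, the public halves would be
# the client keys already registered with the server.
NODE_KEYS = {
  'web1' => OpenSSL::PKey::RSA.new(2048),
  'db1'  => OpenSSL::PKey::RSA.new(2048)
}.freeze

# Encrypt the secret once per authorised node, using only that node's public key.
def seal_for_nodes(secret, public_keys)
  public_keys.transform_values { |pub| pub.public_encrypt(secret) }
end

# A node recovers the secret from its own sealed copy with its private key.
def open_as_node(sealed, node_name, private_key)
  private_key.private_decrypt(sealed.fetch(node_name))
end

public_keys = NODE_KEYS.transform_values(&:public_key)

# Grant access to web1 only; db1 gets no sealed copy at all.
SEALED = seal_for_nodes('s3cr3t-shared-key', public_keys.slice('web1'))

puts open_as_node(SEALED, 'web1', NODE_KEYS['web1'])
```

Here only web1 ends up with an entry. db1 has a key pair but nothing sealed for it, which is roughly what controlling access by node looks like.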

Down the Rabbit Hole

Lastly, I thought I would share some next steps for me. The really fascinating (and frustrating) part of Chef for me is how the community has adopted its own style for using it. There are several years of social coding practices, none of which have been particularly well documented by mainline.

First off, I am currently working on re-structuring my work to be supported by Berkshelf. This is now a mainline part of the Chef DK, and is used to manage cookbooks and their dependencies.

There are also a number of best practices for writing cookbooks, usually referred to as cookbook patterns in the community, which definitely reflects the developer-centric nature of Chef. Based on some very light and non-scientific observation, one of the more popular documents for this seems to be The Environment Cookbook Pattern by Jamie Winsor, the author of Berkshelf.

2 other articles that have helped me so far are found below as well:

You can probably expect to hear more out of me when it comes to Chef, stay tuned!

Configuration Management Series Part 3 of 3: Puppet

At the start of 2013, the MSP that I was working for at the time landed its largest client in the history of the company. The issue: this meant that I needed to set up approximately 40 servers in a relatively small period of time. This aside, the prospect of manually installing and configuring 40 physical nodes did not entertain me at all. Network installs with Kickstart were fine, sure, but I needed more this time.

I had been entertaining the thought of implementing some sort of configuration management for our systems for a while. I had always hit a wall as to exactly what to use it for. Implementing it for the simple management of just our SSH keys or monitoring system configs seemed like a bit of a waste. I had lightly touched Chef before, but as mentioned in the last article, did not have much success in getting it set up fully, and did not have much time to pursue it further at that point.

I now had a great opportunity on my hands. I decided to take another look at Puppet, which I had looked at only briefly, and found that it seemed to have a robust community version, a simple quickstart workflow, and quite functional Windows support. Call it a matter of circumstance, but it left a much better impression on me than Chef did.

I set up a Puppet master, set up our Kickstart server to perform the installs and bootstrap Puppet in the post-install, and never looked back.

For this client, I set up Puppet to manage several things, such as SSH keys, monitoring system agents (Nagios and Munin), and even network settings, with excellent results. I went on to implement it for the rest of our network and administration became magnitudes easier.

Since then I have used Puppet to manage web servers, set up databases, manage routing tables, push out new monitoring system agents and configure them, and even implement custom backup servers using rsnapshot. One of my crowning achievements is using Puppet to fully implement the setup of private VPN software appliances, including OpenVPN instances complete with custom Windows installers.

Ladies and gentlemen, I present to you: Puppet.

Installation

I am used to this process by now, but I’d probably have to say that standing up a Puppet master is still an extremely easy process. There are plenty of options available – if it’s not possible to use the official repository (although it is recommended), most modern distros do carry relatively up to date versions of the Puppet master and agent (3.4.3 in Ubuntu 14.04 and 3.6.2 in CentOS 7). Installation of both the WEBrick and Passenger versions is very straightforward and not many (if any) configuration settings need to be changed to get started.

There is also the emergent Puppet Server, which is a new Puppet master that is intended to replace the old implementations. This is a Clojure-based server (meaning it runs in a JVM), but don’t let that necessarily dissuade you. If the official apt repositories are being used, installing puppetserver will install everything else needed to run it, including the Java runtime.

Funny enough, in contrast to the beefy 1GB requirements of Chef, I was able to get Puppet Server up and running with a little over 300 MB of RAM used. Even though the installation instructions recommend a minimum of 512MB, I was able to run the server with JAVA_ARGS="-Xms256m -Xmx256m" (a 256MB heap size basically) and perform the testing I needed to do for this article, without any crashes.

After installing the server, things were as easy as:

  • apt-get installing the puppet agent,
  • Configuring the agent to contact the master,
  • Starting the agent (ie: service puppet start),
  • And finally, using puppet cert list and puppet cert sign to locate and sign the agent’s certificate request.

The full installation instructions can be found here: http://docs.puppetlabs.com/guides/install_puppet/pre_install.html

Windows Support

As mentioned last article, one of the reasons that I actually did choose Puppet over Chef was that its Windows support seemed to be further along.

One of the things that I do particularly like about Puppet’s Windows support is the way they made the package resource work for Windows: it can be used to manage Windows Installer packages. This made deployment of Nagios agents and other software utilities to Windows infrastructure extremely easy.

Other than that, several of the basic Puppet resources (such as file, user, group, and service) all work on Windows.

Head on over to the Windows page for more information.

Manageability

In the very basic model, Puppet agent nodes are managed through the main site manifest located in /etc/puppet/manifests/site.pp. This could look like:

node "srv2.lab.vcts.local" {
  include sample_module
}

Where sample_module is a module, which is a unit of actions to run against a server (just like cookbooks in Chef).

Agents connect to the Puppet master, generally over port 8140 and over an SSL connection. Nodes are authenticated via certificates which Puppet keeps in a central CA:

root@srv1:~# puppet cert list --all
+ "srv1.lab.vcts.local" (SHA256) 00:11:22... (alt names: "DNS:puppet", "DNS:srv1.lab.vcts.local")
+ "srv2.lab.vcts.local" (SHA256) 00:11:22...

This makes for extremely easy node management. All that needs to be done to authorize an agent on the master is to run puppet cert sign against the pending request. puppet cert revoke is used to add an agent’s certificate to the CRL, removing its access to the server.

As far as data storage goes, I have already written an article on Hiera, Puppet’s data storage mechanism. Check it out here. Again, Hiera is a versatile data storage backend that respects module scope as well, making it an extremely straightforward way to store node parameter data. Better yet, encryption is supported, as are additional storage backends other than the basic JSON and YAML setups.

Execution Model

Right now, I would say that Puppet natively supports only a pull model. This is because its push mechanism, puppet kick, seems to be in limbo, as illustrated by the redmine and JIRA issues. The alternative is apparently to use MCollective, which I have never touched.

By default, Puppet runs every 30 minutes on nodes, and this can be tuned by playing with settings in /etc/puppet/puppet.conf.

Further to that, one-off runs of the puppet agent are easy enough, just run puppet agent -t from the command line, which will perform a single run with some other options (ie: slightly higher verbosity). This can easily be set up to run out of something like Ansible (and Ansible’s SSH keys can even be managed through Puppet!)

Puppet also supports direct runs of manifests, without a master or an agent, thru the puppet apply method. Incidentally, this is used by some pretty well-known tools, notably packstack.

Programmability

This was neat to come back to during my evaluation yesterday. The following manifest took me maybe about 5 minutes to write, and supplies a good deal of information on how the declarative nature of Puppet works. Here, srv2.lab.vcts.local is set up with a MySQL server, a database, and backups, via Puppet’s upstream-supported MySQL module.

node "srv2.lab.vcts.local" {
  
  class { '::mysql::server':
    root_password => 'serverpass',
  }
  
  mysql::db { 'vcts':
    user => 'vcts',
    password => 'dbpass',
  }
  
  file { '/data':
    ensure => directory
  }
  
  class { '::mysql::server::backup':
    backupuser => 'sqlbackup',
    backuppassword => 'backuppass',
    backupdir => '/data/mysqlbackup',
    backupcompress => true,
    backuprotate => 7,
    file_per_database => true,
    time => [ '22', '00' ],
  }
}

The DSL is Ruby-based, kind of like Chef, but unlike Chef, it’s not really Ruby anymore. The declarative nature of the DSL means that it’s strongly-typed. There is also no top-down workflow – dependencies need to be strung together with require directives that point to services or packages that would need to be installed. This is a strength, and a weakness, as it is possible to get caught in a complicated string of dependencies and even end up with some circular ones. But when kept simple, it’s a nice way to ensure that things get installed when you want them to be.
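
To show what stringing dependencies together means in practice, here is a small sketch in plain Ruby using the standard tsort library. It is not how Puppet itself is implemented; the Catalog class and the resource names are hypothetical stand-ins. The idea is simply that each resource lists what it requires, the tool works out an order in which dependencies come first, and a circular chain cannot be ordered at all.

```ruby
require 'tsort'

# Hypothetical stand-in for a catalog: resource name => resources it requires.
class Catalog
  include TSort

  def initialize(requires)
    @requires = requires
  end

  # TSort hook: every declared resource.
  def tsort_each_node(&block)
    @requires.each_key(&block)
  end

  # TSort hook: the resources a given resource requires.
  def tsort_each_child(node, &block)
    @requires.fetch(node, []).each(&block)
  end

  # An order in which resources could be applied, dependencies first.
  def apply_order
    tsort
  end
end

# True when the require chain loops back on itself.
def cyclic?(requires)
  Catalog.new(requires).apply_order
  false
rescue TSort::Cyclic
  true
end

ORDERED = Catalog.new(
  "File['/etc/example/example.conf']" => ["Package['example']"],
  "Service['example']" => ["File['/etc/example/example.conf']"],
  "Package['example']" => []
).apply_order

puts ORDERED
puts cyclic?('a' => ['b'], 'b' => ['a'])
```

The declaration order in the hash does not matter; the package still comes out ahead of the file that requires it. The second call is the circular case, which is the situation that gets painful once a manifest grows.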

Templates are handled by ERB, like Chef. The templates documentation can help out here.
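
As a small illustration of what an ERB template does, here is a sketch in plain Ruby using the erb standard library. The template text, the TemplateScope class, and its variables are hypothetical. In a Puppet module the template would normally live in the module's templates directory and be rendered through the template() function rather than by hand like this.

```ruby
require 'erb'

# Hypothetical template body, using the @variable style seen in ERB templates.
MOTD_TEMPLATE = "Welcome to <%= @hostname %>\nManaged by <%= @tool %>\n"

# Minimal stand-in for the scope a template is rendered in.
class TemplateScope
  def initialize(hostname, tool)
    @hostname = hostname
    @tool = tool
  end

  # Render the template with this object's instance variables in scope.
  def render(template)
    ERB.new(template).result(binding)
  end
end

puts TemplateScope.new('srv2.lab.vcts.local', 'Puppet').render(MOTD_TEMPLATE)
```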

The coolest part about developing for Puppet though has to be its high-quality module community at Puppet Forge. I have used several modules from here over the years and all of the ones that I have used have been of excellent quality – two that come to mind are razorsedge/vmwaretools and pdxcat/nrpe. Not just this, but Puppet has an official process for module approval and support with Puppet Enterprise. And to ice the cake, Puppet Labs themselves have 91 modules on the Forge, with several being of excellent quality and documentation, as can be seen by looking at the MySQL module above. It’s this kind of commitment to professionalism that really makes me feel good about Puppet’s extensibility.

Conclusion

A good middle ground that has stood the test of time

Puppet is probably the first of a new wave of configuration management tools that followed things the likes of CFEngine. I really wish I knew about it when it first came out – it definitely would have helped me solve some challenges surrounding configuration management much earlier than 2013. And it’s just as usable today, if not more so. With the backing of companies like Google, Cisco, and VMware, Puppet is not going away any time soon. If you are looking for a configuration management system that balances simplicity and utility well, then Puppet is for you. I also can’t close this off without mentioning my love for Puppet Forge, which I personally think is the best module community for its respective product, out of the three that I have reviewed.

Dated?

In the circles that I’m a part of, and out of the three tools that I have reviewed over this last month, Puppet doesn’t nearly get the love that I think it should. Some people love the Ansible push model. Other developers love Chef because that’s what they are used to and what they have chosen to embrace. Puppet – for no fault of its own, to be honest – seems to be the black sheep here.

Maybe I just need to get out more. A little over two years ago, RedMonk published this article comparing usage of major configuration management tools. If you look at this, at least 2 years ago, there would be no reason to say that Puppet should be put out to pasture yet.

The End

I hope you enjoyed this last month of articles about configuration management.

I thought I’d put forward some thoughts I have had while working with and reviewing these tools for the last month. I came into this looking to do a “shootout” of sorts – basically, standing up each tool and comparing their strengths and weaknesses based off a very simple set of tests and a checklist of features.

I soon abandoned that approach. Why? First off – time. I found that I had very limited time to do this, sometimes mainly a Sunday afternoon of an otherwise busy week, to set up, review, and write about each tool (and even at that the articles sometimes came out a few days late).

But more importantly for me I didn’t want to hold on to some stubborn opinion about one tool being better than the other. So I decided to throw all of that stuff out and approach it with an open mind, and actually immerse myself in the experience of using each tool.

I think that I have come out of it better, with an actually objective and educated opinion now about the environment that suits each tool best. Ultimately, this has made me a better engineer.

If you have read this far, and have read all three major articles, first off, thank you. Second off, if you are having trouble choosing between one of the three reviewed in this series, or another altogether, you might want to ask yourself the following few questions:

  • What does my team have knowledge on already?
  • Will I be able to ramp up on the solution in a timely manner with the least impact?
  • Will the solution be equally (or better) supported by future infrastructure platforms?

Most of all, keep looking forward and keep an open mind. Don’t let comfort in using one tool keep you closed off to using others.

Thanks again!

Configuration Management Series Part 2 of 3: Chef

Be sure to read the follow up to this article as well!

Back when I was looking at configuration management about 4 years ago, I explored the possibility of using Chef over Puppet. A few things threw me off at that point in time: the heavy reliance on Ruby for the setup and configuration (at least via the main documented methods), the myth that you needed to know Ruby to work with it, and ultimately, time: the ramp-up time to get started seemed so long that it was actually time prohibitive for me at a period in my life where I had very little to spare. A couple of years later, when it came to pick one to work with, I ultimately chose Puppet.

Over the years, I have heard talk every now and then about use of configuration management tools. It saddened me to hear not so much talk about Puppet, but interestingly, a lot of developers that I spoke with really like Chef, and there’s no doubt that some consider it an essential part of their toolbox. Read on, and you will probably see why.

About Chef

Chef seems to have come about as a matter of necessity, and ultimately was the manifestation of the end result of DevOps – infrastructure automation. Creator Adam Jacob probably tells it way better than I could in this video, where he explains the name, his journey in making the software, and OpsCode’s beginnings.

What I actually got from the video was the dispelling of a misconception I had – that Chef actually came from former employees of Puppet that wanted to create something better. If that was the case, it may have just been a natural evolution of the philosophies that Puppet was built on, and how Adam and company thought they could make it better to suit OpsCode’s needs.

Installation

I’m not going to lie – this was probably the most frustrating part of my teardown on Sunday.

Installation is pretty straightforward. Actually, very straightforward now, as opposed to when I tried it out so many years ago. The server install documentation now gives a much better path to installation, versus how it used to be when it involved gems and what not.

Unfortunately, the base install of Chef server is extremely resource intensive. The installer package for the server on Ubuntu is about 460 MB in size, which takes a sizeable amount of time to download and install. You need about 2 GB of RAM to run the whole stack – after installation the total footprint is a little over 1 GB in RAM. I actually had to rebuild my test instance after the initial install with 512 MB failed.

From here, what to do next was a little confusing for me. This ultimately is due to the very modular nature of the Chef management lifecycle – which is great for developers (see below), but is a bit intimidating for first-time admin users or people looking to stand up with little time and minimal knowledge – only having a Sunday afternoon for review is a great example of this.

Ultimately, I pushed through and figured it out. The full path to getting a node up is basically:

This will get you set up with the server, a workstation to do your work, and a node to test with. What this does not do, apparently, is get chef-client to run on boot on the bootstrapped node. I’m not too sure why this is, but the installer package does not include any init scripts, unfortunately. With me being overdue for press time as it is, I chose not to investigate for now, and just ran chef-client manually to test the cookbook that I wrote.

Windows Support

Chef has pretty mature Windows support. Actually, one of the reasons that I chose Puppet over Chef in the first place was that Puppet’s Windows support was further along. I would imagine both are in the same place now.

Chef has also been working on support for PowerShell DSC, MS’s own approach to configuration management through PowerShell.

Head over to the Windows page to see the full feature set for Chef on Windows.

Manageability

Chef takes a very modular approach to management. Ideally, one has the server, and changes are made from workstations thru knife, save possibly organization creation and user management. This includes cookbook design, cookbook uploading, node bootstrapping, run list editing (the list of items that get run on a node during a run), and pretty much everything else about the Chef development lifecycle.

Again, hosts are bootstrapped usually via knife bootstrap – but there are other deployment options as well. See the bootstrap page for more options.

Data storage for nodes is done thru data bags – JSON-formatted data with support for encryption, similar to Hiera and hiera-eyaml for Puppet. I haven’t had much of a chance to look yet, but it looks like the process of having encrypted and unencrypted data co-exist within the same data bag is a lot more automated and developer-friendly, probably better than Puppet and much better than Ansible.

Finally, there are the premium features. Check the Chef server page for more details. These features include a web interface, extra support for push jobs, analytics, and high availability. It should be noted that these features are free for up to 25 nodes.

Execution Model

Some more time needs to be taken by myself to evaluate all of these methods, but generally, chef-client is your go-to for all execution. Take a look at the run model here. This can be run as a daemon, periodically thru cron, direct on the server, or via knife ssh.

Speaking of which, there is also knife, which takes care of most of your Chef management needs.

There is also chef-solo – basically, a standalone Chef that does not require a server to be run. This kind of supplies an agentless push-execution model for Chef that can be useful for orchestration.

Programmability

Chef really is a developer’s dream come true.

The Chef DK installation model encourages a developer-centric development and deployment cycle, allowing all management and development to be done from a single developer’s workstation, with changes checked into central source control.

The DSL is pretty much pure Ruby. When Chef first started to become a thing, this was yet another excuse thrown out to not use it – ie: you would need to learn Ruby to learn Chef. But really, nearly all configuration management systems these days use some sort of language for their DSL, be it Ruby, YAML, JSON, or whatever. I would even submit that one can begin teaching themselves a specific language by taking up a configuration management system – learning Puppet actually helped me understand Ruby and ERB a bit. In addition to that, Chef has a great doc that can help you out – Just Enough Ruby for Chef.

Templating is done in ERB, just like in Puppet. See the template documentation for more details.

Terminology and actual concepts are more in line with Puppet. Cookbooks are pretty much the analog to modules or classes in Puppet, with recipes translating to individual manifests (the run data). You can use knife cookbook to create, verify, and upload cookbooks.

Speaking of development – this is actually a functional recipe:

file '/etc/motd' do
  content 'hello world!'
end

Set up within a cookbook – this would write “hello world!” to /etc/motd. Pretty simple!
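
To see why a recipe like the one above is valid Ruby at all, here is a toy sketch of a block-based DSL. This is not Chef's implementation, and nothing here touches the filesystem: the FileResource class and this file method are hypothetical stand-ins that only record what was declared. The technique is to evaluate the block in the context of a resource object, so that a bare call such as content lands on that object.

```ruby
# Hypothetical stand-in for a resource; it only records what was declared.
class FileResource
  attr_reader :path

  def initialize(path)
    @path = path
    @content = nil
  end

  # With an argument, set the content; without one, read it back.
  def content(value = nil)
    @content = value unless value.nil?
    @content
  end
end

# Toy version of a DSL method: build the resource, then run the block
# with the resource as self so that `content` resolves against it.
def file(path, &block)
  resource = FileResource.new(path)
  resource.instance_eval(&block) if block
  resource
end

MOTD = file '/etc/motd' do
  content 'hello world!'
end

puts "#{MOTD.path}: #{MOTD.content}"
```

A real tool would then go on to converge the system toward what was declared; the sketch stops at capturing the declaration.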

I think the killer app for development for me here though has to be the testing support that Chef has built in:

  • kitchen is Chef’s built-in test suite that allows you to orchestrate a test scenario from scratch using a wide range of virtualization technologies (including Vagrant – so you don’t even need to have code leave your workstation)!
  • learn.chef.io is a great resource for learning Chef, versus testing Chef. It is a hosted training platform that will set up temporary instances for you to use, along with a guided tutorial.

For me, when people say that Chef is more developer-friendly – it’s not necessarily what you can do or what you can use, but the fact that the toolset enables developers to get code out all that faster.

Conclusion

Great for developers

Chef is awesome for developers, and it shows. The Ruby DSL is extremely easy to work with – the fact that they have made it more programmatic than other configuration management systems would seem to allow it to do more without having to extend it too much or look to third parties for extensions. Also, the Chef DK and Kitchen provide for a very pleasant development experience.

A little tough to get ramped up on

If you are in a pinch and don’t have a lot of time to set up and do not know Chef, you might want to figure something else out until you can block a bit of time to teach yourself. There are a lot of options for you to use Chef, which can make it overwhelming. Setting up the Chef server as well is a bit of a commitment of resources that you may want to consider carefully before you undertake it.

Next up in the series – an oldie but a goodie, the tool that got me started down the configuration management rabbit hole – Puppet.

Ansible: Handling Different Operating Systems with Variables

A fundamental part of working with configuration management tools is the ability to handle different operating systems within the same logic stream, such as a module.

I’ve found a few challenges addressing this topic within Ansible, and hence I think it’s worth noting certain things that you can’t do, and what I think is the most logical way to handle it (so far).

Things That do not Work

As with most things I learn in life, I figured these out via trial and error, via an approach something along the lines of “I should be able to do this, shouldn’t I?”

Variables and facts cannot be directly referenced in a handler

I tried this:

# handlers/main.yml
---
- name: (Ubuntu) restart ssh
  service: name=ssh state=restarted

- name: (CentOS) restart ssh
  service: name=sshd state=restarted

# tasks/main.yml
---
- name: test SSH restart
  lineinfile: dest=/etc/ssh/sshd_config line=#testing
  notify: ({{ ansible_distribution }}) restart ssh

This gives you this:

ERROR: change handler (({{ ansible_distribution }}) restart ssh) is not defined

Hmm.

Logic does not work in handler

You also can’t do something like this:

# handlers/main.yml
---
- name: restart ssh
  service: name=ssh state=restarted
  when: ansible_distribution == 'Ubuntu'

- name: restart ssh
  service: name=sshd state=restarted
  when: ansible_distribution == 'CentOS'

In this instance, only the first handler is looked at, and if the logic in when: does not evaluate to true, the handler will be skipped. Completely.

Defining variables within a task or handler

You cannot use a vars directive within a task or handler: you simply get this:

ERROR: vars is not a legal parameter in an Ansible task or handler

Note that facts can be set using the set_fact module but this is better used when there is a need to define host-specific variables for use later, in more of a host scope versus a role scope.

The Right Way

I mentioned that variables within a role should not be used, but that’s only half the story. Specifically, variables should not be used for parameters – if you are going to define those variables later within a playbook or via group_vars/ or host_vars/, it is better to make them defaults.

But on the other hand, they are better suited as variables if they are not intended for assignment outside of the scope of the role. This is perfect for what we are planning on using them for here.

Variables can be defined in a role pretty easily – see roles. Note that in here there is a vars/ directory, from which vars/main.yml is loaded with high priority.

This is not the file I am looking for though, as variable files are not very flexible: no additional files or logic can go into it, just values.

So how can variables be included logically then?

include_vars and with_first_found

include_vars is a module that allows you to include variables from within a task. Perfect for the kind of logic flow we are looking for, and allows you to specify a file based off either a jinja2 template handler or the special {{ item }} token.

That’s only part of it though. Logic is needed to handle different operating systems, and can get pretty messy if all that is being used is a bunch of when: clauses. Enter with_first_found:

# tasks/main.yml
---
- name: set distro-specific variables
  include_vars: '{{ item }}'
  with_first_found:
    - '{{ ansible_os_family }}.yml'
    - default.yml

Using this, files can now be placed like so:

# vars/Debian.yml
---
ssh_svc: ssh

# vars/RedHat.yml
---
ssh_svc: sshd

# vars/default.yml
---
ssh_svc: ssh

Now, if you have a handler such as:

# handlers/main.yml
- name: restart ssh
  service: name={{ ssh_svc }} state=restarted

Either ssh or sshd would be used, depending on distro.
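
For anyone who wants the selection logic spelled out, here is a rough model of it in plain Ruby. This is not Ansible code and not how with_first_found is implemented; the helper names first_found and resolve_vars_file are made up, and the files are created in a temporary directory purely for the demonstration. It walks the candidate list in order and takes the first file that actually exists.

```ruby
require 'tmpdir'

# Return the path of the first candidate that exists in vars_dir.
# Returning nil when nothing matches is a simplification for this sketch.
def first_found(vars_dir, candidates)
  candidates
    .map { |name| File.join(vars_dir, name) }
    .find { |path| File.file?(path) }
end

# Build a throwaway vars directory, then pick a file for the given OS family,
# falling back to default.yml as in the task above.
def resolve_vars_file(os_family, available_files)
  Dir.mktmpdir do |vars_dir|
    available_files.each { |name| File.write(File.join(vars_dir, name), "---\n") }
    found = first_found(vars_dir, ["#{os_family}.yml", 'default.yml'])
    found && File.basename(found)
  end
end

files = %w[Debian.yml RedHat.yml default.yml]
puts resolve_vars_file('RedHat', files)
puts resolve_vars_file('Suse', files)
```

An OS family with its own file gets that file, and anything else falls through to default.yml, which is why the order of the candidate list matters.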

Note that with_first_found can be used to set up logic for other things, not just include_vars. It can be especially useful within your task workflow, to avoid including logic that may artificially take up log space or time in a playbook. These includes could be used in the place of when: clauses, for example.

More detail on logic and templating below. Most with_ clauses can be found in the Loops page, but there are a couple more within the other pages. I wish Ansible was better at documenting this!

http://docs.ansible.com/playbooks_loops.html
http://docs.ansible.com/playbooks_lookups.html
https://docs.ansible.com/playbooks_conditionals.html#selecting-files-and-templates-based-on-variables

Configuration Management Series Part 1.5 of 3 – Ansible Follow-Up

Before we get started on working with our next configuration management system, Chef, I thought I’d post a follow-up to last week’s article covering Ansible. In this article, I will be touching up on some of the conclusions I drew last week, and my opinion on them now, and also some of the things I have learned about working with playbooks.

Programmability

First off, some things on programmability:

Advantages of sequential execution

I really do enjoy the top-down nature of Ansible. You don’t really have to set up any sort of dependency chain for classes or resources, which can really work to your advantage.

Consider the scenario where you have a file that needs to be written to or a command that needs to be run only when a specific package is installed.

In Puppet, this needs to be written:

file { '/etc/example/example.conf':
  content => template('example_mod/example.conf.erb'),
  require => Package['example']
}

package { 'example':
  ensure => installed
}

This is not so bad, and in fact, the declarative nature of Puppet makes it pretty easy to manage this stuff.

But what if you try to require a different resource, like a class? Or multiple packages?

Your child resource could end up with dependencies like this, which would then need to be repeated:

file { '/etc/example/example.conf':
  content => template('example_mod/example.conf.erb'),
  require => [ Package['example1'], Package['example2'], Package['example3'] ]
}

You may also try to chain off a class or service; however, when I have attempted these kinds of things, I have just run into dependency loops.

Then you get so far down the rabbit hole you need a grapher. Consider the following article regarding dependency issues. When I need a whole article to try to figure out how to properly structure dependencies, I honestly tune out.
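
As a rough illustration of what a dependency resolver has to do with require relationships, and how a loop appears, here is a small Python sketch using the standard library's graphlib (Python 3.9 or later). This is not how Puppet is implemented, and the resource names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

def resolve_order(dependencies):
    """Return an order in which every resource comes after its requirements."""
    return list(TopologicalSorter(dependencies).static_order())

# Each key requires the resources in its set (hypothetical resource names).
healthy = {
    "File[example.conf]": {"Package[example1]", "Package[example2]"},
    "Service[example]": {"File[example.conf]"},
}
order = resolve_order(healthy)
print(order)

# Add one requirement that points back up the chain and the graph has a loop.
looped = dict(healthy)
looped["Package[example1]"] = {"Service[example]"}
try:
    resolve_order(looped)
    loop_detected = False
except CycleError:
    loop_detected = True
print(loop_detected)
```

A single misplaced require is enough to make the whole graph unsolvable, which is the kind of situation where a grapher becomes necessary.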

In Ansible, you don’t really need all of this. You just make sure your package tasks are set up before the tasks that depend on them:

- name: Install packages
  yum: name='{{ item }}' state=present
  with_items:
    - example1
    - example2
    - example3

- name: Put the file where it needs to go now
  template: src=example.conf.j2 dest=/etc/example/example.conf mode=0644

In this instance, if the first task fails, the other ones will not execute.
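
The fail-fast behaviour can be pictured with a few lines of Python. This is only a sketch of the top-down idea, not Ansible's runner; run_tasks and the two task functions are invented for the example.

```python
def run_tasks(tasks):
    """Run (name, callable) pairs in order and stop at the first failure."""
    completed = []
    for name, action in tasks:
        try:
            action()
        except Exception as error:
            return completed, "%s failed: %s" % (name, error)
        completed.append(name)
    return completed, None

def install_packages():
    # Simulated failure of the package step.
    raise RuntimeError("package example2 not available")

def write_config():
    pass  # never reached when the install step fails

completed, failure = run_tasks([
    ("Install packages", install_packages),
    ("Put the file where it needs to go now", write_config),
])
print(completed)
print(failure)
```

Ordering alone expresses the dependency: the config step cannot run before the package step has succeeded.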

Another easy check that you can stick at the start of a playbook is an OS check or what not, such as:

- name: check for supported OS
  fail: msg="Unsupported distribution (Ubuntu 14.04 supported only)"
  when: ansible_distribution_release != 'trusty'

This will fail if the playbook is running against anything but Ubuntu 14.04, allowing you to have the rest of the playbook run without worrying about it being run against an unsupported OS.

Frustrations with bad error messages

One thing I have noticed so far with Ansible is the Python-ish error messages you get out of it, sometimes, unfortunately, out of context. I have gotten error messages with stack traces in them (ie: when processing Jinja2 stuff and also when a module fails through some sort of untrapped Python exception). I actually don’t mind this, because at least I know where to start looking.

It’s error messages like this that drive me up the wall.

msg: Error in creating instance: 'str' object has no attribute 'get' 
FATAL: all hosts have already failed -- aborting

This happened to me while trying to mess with the nova_compute module. Where exactly am I supposed to start looking for something wrong here? I have even attempted to use pdb to try and chase it down, but I had to give up because I had other stuff to do. I may try again, but I really am on my own here, barring outside help. This error message does nothing to tell me where my problem is or how to correct it.

It almost would have been better if the error was left un-handled, so I could see the Python stack trace.
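
The difference between a trapped message and a full trace is easy to reproduce in plain Python. The sketch below is not the nova_compute module's code; create_instance is a hypothetical stand-in for module code that expects a dict and is handed a string instead.

```python
import traceback

def create_instance(params):
    # Hypothetical stand-in for module code that expects a dict.
    return params.get("flavor")

def run_trapped(params):
    """Trap the error and keep only its message, losing where it came from."""
    try:
        return create_instance(params)
    except Exception as error:
        return "Error in creating instance: %s" % error

def run_with_trace(params):
    """Trap the error but keep the full traceback for debugging."""
    try:
        return create_instance(params)
    except Exception:
        return traceback.format_exc()

short_message = run_trapped("m1.small")
full_trace = run_with_trace("m1.small")
print(short_message)
print(full_trace)
```

The first result is the same kind of one-liner as the error above, while the second names the function that actually raised.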

Playbooks

And also, some thoughts and bits of knowledge on playbooks.

The playbook as a site configuration

If you looked at the previous article, I had a bit of trouble trying to figure out what a playbook really was in relation to how Puppet is laid out. At first glance, it seemed to me that a playbook was close to what a module was, but that’s not really the case at all.

Playbooks are modular, sure, but they are more like individual site configurations than anything else. Think about them as individual Puppet or Chef site installs.

According to the Ansible playbook best practices guide there is a very specific layout. This is reproduced below:

production                # inventory file for production servers
stage                     # inventory file for stage environment

group_vars/
   group1                 # here we assign variables to particular groups
   group2                 # ""
host_vars/
   hostname1              # if systems need specific variables, put them here
   hostname2              # ""

library/                  # if any custom modules, put them here (optional)
filter_plugins/           # if any custom filter plugins, put them here (optional)

site.yml                  # master playbook
webservers.yml            # playbook for webserver tier
dbservers.yml             # playbook for dbserver tier

roles/
    common/               # this hierarchy represents a "role"
        tasks/            #
            main.yml      #  <-- tasks file can include smaller files if warranted
        handlers/         #
            main.yml      #  <-- handlers file
        templates/        #  <-- files for use with the template resource
            ntp.conf.j2   #  <------- templates end in .j2
        files/            #
            bar.txt       #  <-- files for use with the copy resource
            foo.sh        #  <-- script files for use with the script resource
        vars/             #
            main.yml      #  <-- variables associated with this role
        defaults/         #
            main.yml      #  <-- default lower priority variables for this role
        meta/             #
            main.yml      #  <-- role dependencies

    webtier/              # same kind of structure as "common" was above, done for the webtier role
    monitoring/           # ""
    fooapp/               # ""

Note how everything is accounted for here. For someone like me who comes from Puppet, there are some easy analogs here that have helped me:

  • site.yml and the other playbooks are pretty similar to Puppet site manifest files. From them entire configurations can be laid out without a need for anything else, or different roles can be referenced.
  • group_vars and host_vars behave similar to Hiera for storing variable overrides.
  • Finally, the roles structure is the closest thing that Ansible has to Puppet modules.

Honourable mentions to the library and filter_plugins directories. Ansible modules are actually closer to the core than Puppet’s, and normally do not need to be written, but you may find yourself doing so if your roles require functionality that cannot be reproduced in any existing module and you want that recycled across modules. Also, jinja2 filter plugins can be written (we give a small example below).

Roles

As mentioned, roles are probably the closest thing that Ansible has to a Puppet module. These are recyclable “playbooks” that are self-contained and are then included within a proper playbook via the following means:

- hosts: webservers
  sudo: yes
  roles:
    - common
    - webtier
    - monitoring

This would ensure that webservers got any tasks related to the common, webtier, and monitoring roles, running under sudo. Imagine that this means that some basic common tasks are done (ie: maybe push out a set of SSH keys and make sure that DNS servers are set correctly), a webserver is set up, and a monitoring agent is installed.

Similar to Puppet or Chef, there is a file structure to roles that allows you to split out various parts of the roles to different directories:

  • tasks for role tasks, the main workflow of a role and Ansible itself.
  • handlers for server actions and what not (ie: restart webserver)
  • templates to store your jinja2 templates for template module tasks
  • files for your static content
  • defaults to define default variables
  • vars to define local variables (NOTE: I have decided to stay away from these and I recommend you do too. Use defaults!)

main.yml is the universal entry point for all of these. From here, you can include all you wish. Example:

- name: check for supported OS
  fail: msg="Unsupported distribution (Ubuntu 14.04 supported only)"
  when: ansible_distribution_release != 'trusty'

- include: install_pkgs.yml

This would be a tasks/main.yml file and demonstrates how you can mix includes and actual tasks in the role.

Group and host variables

This is where I would recommend you store variables if you need them. The obvious place would be group first, then host if you need specific host variables.

For example, this would apply foo=bar to all webservers for all plays:

# group_vars/webservers.yml
---
foo: bar

You can also include them in the playbook directly, like so:

- hosts: webservers
  sudo: yes
  roles:
    - common
    - webtier
    - monitoring
  vars:
    foo: bar
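
One way to picture the layering of role defaults, group variables, and host variables is as a chain of dictionaries where the more specific layer wins. The Python sketch below is a deliberate simplification: Ansible's real precedence rules have more layers than this (play vars, role vars, and command-line extra vars among them), and the variable names and values are hypothetical.

```python
from collections import ChainMap

def effective_vars(role_defaults, group_vars, host_vars):
    """Merge variable layers; earlier ChainMap arguments take priority."""
    return dict(ChainMap(host_vars, group_vars, role_defaults))

role_defaults = {"foo": "default", "http_port": 80}  # roles/<role>/defaults/main.yml
group_vars = {"foo": "bar"}                          # group_vars/webservers.yml
host_vars = {"http_port": 8080}                      # host_vars/<hostname> (hypothetical)

merged = effective_vars(role_defaults, group_vars, host_vars)
print(merged)
```

The group value replaces the role default for foo, and the host value replaces the default port, which matches the "group first, then host" recommendation.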

Secure data

ansible-vault is a very basic encryption system that encrypts an entire file (does not need to be YAML) off the command line. It hooks into ansible and ansible-playbook via the --ask-vault-pass switch that will take any encrypted files it encounters and decrypt them.

It is recommended that anything with sensitive data be encrypted. The one main difference here is that as opposed to Hiera-eyaml, for example, the entire file is encrypted, so you may wish to separate encrypted and unencrypted data through separate files, if it can be helped. This ensures non-sensitive data can still be published to source control and be reviewed.

Unfortunately, it does not look like there is a certificate or private/public key system in use here, so remembering the password is very important, as is picking a secure one. Passwords can also be stored in a file (make sure it’s set to mode 0600), and passed using the --vault-password-file option, presumably under all Ansible utilities.
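
For the password file, the permissions are the part that is easy to get wrong. Here is a minimal Python sketch, assuming a POSIX system, that creates the file with mode 0600 from the start rather than tightening it afterwards. The file name vault_pass.txt and the helper names are made up for the example.

```python
import os
import stat
import tempfile

def write_password_file(path, password):
    """Create path readable and writable by the owner only (mode 0600)."""
    # O_EXCL refuses to reuse an existing file that may have looser permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(password + "\n")

def permission_bits(path):
    """Return only the permission bits of the file's mode."""
    return stat.S_IMODE(os.stat(path).st_mode)

demo_path = os.path.join(tempfile.mkdtemp(), "vault_pass.txt")
write_password_file(demo_path, "not-a-real-password")
mode = permission_bits(demo_path)
print(oct(mode))
```

The resulting path is what would be handed to the --vault-password-file option.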

Filter Plugins for jinja2

And finally, I wrote that Jinja2 plugin I was hitting the wall with. 😉 It’s very simple and can be found below. This would go into the filter_plugins directory.

# Add string split filter functionality
# sep and maxsplit are optional so that split('.') works with one argument
def split(value, sep=None, maxsplit=-1):
    return value.split(sep, maxsplit)

class FilterModule(object):
    def filters(self):
        return {
            'split': split
        }

This now gives you the split filter that can be invoked like so:

{{ (example|split('.'))[0] }}

Assuming example.com, this would give example.
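
A filter like this can be sanity-checked as plain Python before it ever goes near a playbook. The sketch below repeats the plugin in a self-contained form, with default arguments so that a single-argument call works, and calls the function through the dictionary that filters() returns. No Ansible or jinja2 install is assumed.

```python
def split(value, sep=None, maxsplit=-1):
    """Split a string, optionally limiting the number of splits."""
    return value.split(sep, maxsplit)

class FilterModule(object):
    def filters(self):
        return {"split": split}

# Fetch the filter by name from the plugin's mapping, then call it directly.
split_filter = FilterModule().filters()["split"]
first_label = split_filter("example.com", ".")[0]
limited = split_filter("a.b.c", ".", 1)
print(first_label)
print(limited)
```

The first call mirrors the template expression above, and the second shows the optional split limit.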

“Final” Thoughts

To be honest, I think that Ansible is more of an orchestration tool than a configuration management system. The agentless push nature of it, combined with its top-down workflow, roll-your-own inventories, and loosely enforced site conventions, gives it a chaotic feel that, even after giving it a week, still does not feel like configuration management to me.

This is not to say it’s a bad tool by any means; in fact, quite the opposite. Ansible is a great addition to my toolbox, and I am looking forward to applying it in everyday engineering. The above points that may be “weaknesses” when considering configuration management can equally be strengths when you need to get something done quickly and with as little of a footprint as possible.

Further to that, Ansible can be combined with something like Chef or Puppet, with each tool playing on its strengths. Ansible can be used to bootstrap either tool, with Chef or Puppet doing things like managing SSH keys (such as one for Ansible), enforcing configuration, etc. Ansible can be used in place of something like knife or puppet kick to force updates as well.

Some tools need work: Vault, for example, could do better to be more than just a wrapper for encryption, and set up some sort of keypair structure to reduce the amount of typing you have to do. And error messaging could do with a bit of work so that engineers are not scratching their heads trying to figure out what variable caused Python to barf when errors are trapped.

All in all though, I really like Ansible. There are some other projects that I want to try it with, namely working with the nova_compute and vsphere_guest modules to help automate orchestration even further.

I know that I mentioned that Chef was going to be this Sunday. Unfortunately (or maybe fortunately) I spent a bit more time with Ansible than I thought I was going to, and the research has not been done to properly get a Chef article up yet – ie: I have not yet started. 😛 In any case, that will be happening next – so stay tuned for it next weekend!

Configuration Management Series Part 1 of 3: Ansible

This is the first of a three-part series that I am doing regarding reviewing 3 major configuration management tools: Ansible, Chef, and Puppet.

You may have seen that I have written here about Puppet before – indeed, it is the configuration management tool that I have the most experience with and probably the one that I will be sticking with personally (famous last words :p). However, I think that it’s always in an engineer’s best interest to be able to understand, support, and thrive with technologies other than the ones that he or she may personally favour as part of their own toolkit – that is what teamwork is built on, after all. So with that, take the opinions in these next few articles with a grain of Salt (which, incidentally, is the name of another tool that I will not be reviewing at this point in time, heh). I hope that you can value my opinion while still understanding that it is subjective, and your experience will vary. Honestly, I think that all 3 of these tools have value, and in the hands of competent engineers, you would be hard-pressed to find shortcomings in any of them.

One other note: I have removed the ratings from this article and the future articles will not have them either. I thought it was a good idea at first, but as time went on and my experience with the tools increased, they seemed more and more pretentious and quite misinformed.

Now that I have that disclaimer out of the way, let’s get started shall we?

As mentioned in the first part of this series, we are covering Ansible.

About Ansible

A long time ago in a technology scene not so far away (but definitely quite different than the landscape today), I used to work at NetNation (you may now know them as Hostway Canada). On our management station we had a very simple shell script that performed a very valuable task: batch administration of our shared hosting server farm over SSH.

It has now been nearly 10 years since I left Hostway, yet some concepts never die, they just get better. Ridiculously better. Magnitudes of evolution better.

Maybe about 5 or so years ago now I actually wrote a better version of this script in Python – doing some research I found a very nifty SSH library called Paramiko (http://www.paramiko.org/) and added a few things to make the experience better: namely, the ability to upload files and run shell scripts on the remote machine.

Admittedly, I haven’t really looked into Ansible recently, but imagine my surprise to find out that someone has basically taken that concept and made it leaps and bounds more awesome.

Ansible is basically this concept on – even though I loathe to use this analogy – steroids. Its main command transport is SSH, and it does not require an agent on managed nodes to run. It allows for complex configuration management through a YAML-based pseudo-DSL, one-off command execution, file upload, and also remote execution of complete shell scripts. A very powerful Swiss Army knife for both one-off and ongoing configuration management.

Installation

Installation of Ansible is probably the most minimal out of the three that we will be covering. The three major methods are: 1) git repo (https://github.com/ansible/ansible), 2) package, or 3) pip.

With any of these methods you may notice something that you may not be used to if you have used a more traditional configuration management tool such as Chef or Puppet: no management server.

This is almost true. Ansible nodes normally do not use a check-in method to get their configuration data, rather it is pushed out to various nodes, so there is no need for a server with this capability. Nonetheless, there is a central configuration structure that can be found in a directory such as /etc/ansible and needs to be edited accordingly. More on that in brief later.

You can find the installation guide at http://docs.ansible.com/intro_installation.html.

Windows Support

This was going to detail the support of both major operating systems, but I felt the need to mention Linux support was redundant, as all 3 were obviously built to manage Linux first.

Ansible was written with Linux and other open source server systems in mind. With that said, they have made a very recent push to supporting Windows, and it is looking pretty promising. Going with their agentless model again, they have decided to use WinRM instead of SSH. Some of the Windows specific features look amazing as well and definitely on par with something like Puppet.

Check out the Windows section at http://docs.ansible.com/intro_windows.html. The list of Windows-specific modules can be found at http://docs.ansible.com/list_of_windows_modules.html.

Manageability

Since Ansible does not rely on agents, there is no real host discovery in the sense that hosts would be checking in with your management server. From what I’ve seen, there is not even any real language to refer to the “master” server.

Inventory is managed in /etc/ansible/hosts in a pretty simple configuration file that is hard to mess up. Here is my sample hosts file:

# Our lab webserver
[webservers]
172.16.0.12

# Our lab database server
[dbservers]
172.16.0.13

Hosts can be referred to by IP address, hostname, and can be further abstracted with aliases. They can also be grouped together using primitive expressions if your hosts all follow a certain convention. Various plugins exist to gather inventory dynamically too – see http://docs.ansible.com/intro_dynamic_inventory.html.
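
To show how little there is to the basic format, here is a minimal Python sketch that reads an inventory like the sample above into a dictionary of groups. It is not Ansible's parser: it ignores per-host variables, child groups, and host ranges, and the "ungrouped" bucket name is an assumption made for this example.

```python
def parse_inventory(text):
    """Parse a minimal INI-style inventory into {group: [hosts]}."""
    groups = {}
    current = "ungrouped"
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            groups.setdefault(current, [])
        else:
            # Keep only the host token; anything after it is ignored here.
            groups.setdefault(current, []).append(line.split()[0])
    return groups

SAMPLE = """\
# Our lab webserver
[webservers]
172.16.0.12

# Our lab database server
[dbservers]
172.16.0.13
"""

inventory = parse_inventory(SAMPLE)
print(inventory)
```

Each bracketed header opens a group, and every non-comment line under it is a host in that group.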

The only thing that I really see missing here is some native sort of host registration and grouping. This may not be something that, by design, would be Ansible’s domain though, considering its push-based architecture. Also, depending on the environment you are using it in, this may be something that you can handle with dynamic inventory.

Best practice for storing host parameters (variables) actually follows very closely with how Puppet handles things with Hiera. See here: http://docs.ansible.com/intro_inventory.html#splitting-out-host-and-group-specific-data and you will see that data is generally stored in YAML files in a very similar fashion. There is also Ansible Vault: http://docs.ansible.com/playbooks_vault.html which allows for the storage of encrypted data.

There is also an enterprise console, of course. Check out Ansible Tower here: http://www.ansible.com/tower

Execution Model

Several options exist for you to use with Ansible’s powerful command-line structure. The ansible command line tool can be used to execute one-off commands, module runs, and even push shell scripts and files over to remote nodes. And since it’s all intended to be run over SSH via standard user accounts, existing access control schemes can be extended to configuration management as well.

Even though I don’t believe it’s a strength of the tool, there is a pull option available as well via ansible-pull: http://docs.ansible.com/playbooks_intro.html#ansible-pull

Programmability

My opinion here might be a bit polarizing or just completely out there, but I have not had the best experience writing for Ansible so far, although mind you I have only been doing it for an evening, and obviously still have a lot to learn.

Modules are actually a bit more of an advanced topic, with Playbooks taking the place of general modules like you would be used to in Puppet, or cookbooks in Chef. These are collections of YAML files that dictate orchestration and management of configuration files. Playbook documentation can be found at http://docs.ansible.com/playbooks.html.

Templating for Playbooks is done using jinja2 – a powerful template language written for Python. You can find the template reference at http://jinja.pocoo.org/docs/dev/templates/. Ansible also extends this a bit and it’s worth reading http://docs.ansible.com/playbooks_variables.html#using-variables-about-jinja2 to find out how.

Unfortunately, this is where I got a little hung up on a learning curve. My impressions were:

  • The YAML structure for Playbooks was open to a bit of interpretation, and the ambiguity of it was throwing me off at points. I found that I was trying to figure out the best way to nest, if I should be at all, how much code to recycle, etc. Ironically, there seem to be several ways to do it, which kind of flies in the face of one of the core philosophies of what Ansible is written in, Python (see https://wiki.python.org/moin/TOOWTDI).
  • I hit a straight up wall when I found out that jinja2 does not have a split filter – you need to write one of your own. An easy task in Python, but annoying nonetheless.

Lastly, when it comes to config management, I prefer to not have to write any of my own modules if I can help it. Unfortunately, briefly looking at Ansible Galaxy left me unfulfilled. Standard modules that I was used to having in Puppet, such as mainline designed modules like Apache and MySQL, were not up to par or were just flat out missing. Other highly-rated modules that purported to do one thing did others (ie: one Apache module was trying to manage SSH keys as well). Several user accounts (even one named “ihateansible”) exist with zero contributions. I hate to say it, but my first impression, at least, is that there are some quality control issues here.

Conclusion

  • Strengths:
    • Agentless push-style management
    • Powerful inventory management
  • Weaknesses
    • Core programmability features will take some getting used to if you are coming from Chef/Puppet.
    • Ansible Galaxy possibly has some quality control issues.

Ansible is great, no doubt about that. It has taken a tried and reliable concept and turned it into something so much more than I could have imagined.

Installation is dead simple, and hence it excels in environments that need to stand up some sort of configuration management fast and with as little impact to existing infrastructure as possible. Its simple yet powerful approach to host management complements this process even more.

Windows support is now available and looks very promising, and personally I am impressed by how fast they have been able to stand this side of it up.

Programmability could be worked on a bit, but again this could be my inexperience talking. At the very least, I would prefer to see more mainline created and managed Playbooks, which would then raise the overall quality of the modules seen on Galaxy.

Stay tuned for part 2: Chef, either later this week or next Sunday. There will probably be another article or two regarding Ansible as well before we wrap up!