Just add Code: Fun with Terraform Modules and AWS

Note: This article originally appeared in the 2016 AWS Advent. For the original article, click here.

Note that this article was written for Terraform v0.7.x – there have been several developments since this release that makes a number of the items covered here obsolete and they will be covered in the next article. 🙂

This article is going to show you how you can use Terraform, with a little help from Packer and Chef, to deploy a fully-functional sample web application, complete with auto-scaling and load balancing, in under 50 lines of Terraform code.

You will need the sample project to follow along, so make sure you load that up before continuing with reading this article.

The Humble Configuration

Check out the code in the terraform/main.tf file.

It might be hard to think that with this mere smattering of Terraform is setting up:

  • An AWS VPC
  • 2 subnets, each in different availability zones, fully routed
  • An AWS Application Load Balancer
  • A listener for the ALB
  • An AWS Auto Scaling group
  • An ALB target group attached to the ALB
  • Configured security groups for both the ALB and backend instances

So what’s the secret?

Terraform Modules

This example is using a powerful feature of Terraform – the modules feature, providing a semantic and repeatable way to manage AWS infrastructure. The modules hide most of the complexity of setting up a full VPC behind a relatively small set of code, and an even smaller set of changes going forward (generally, to update this application, all that is needed is to update the AMI).

Note that this example is composed entirely of modules – no root module resources exist. That’s not to say that they can’t exist – and in fact one of the secondary examples demonstrates how you can use the outputs of one of the modules to add extra resources on an as-needed basis.

The example is composed of three visible modules, and one module that operates under the hood as a dependency

  • terraform_aws_vpc, which sets up the VPC and subnets
  • terraform_aws_alb, which sets up the ALB and listener
  • terraform_aws_asg, which configures the Auto Scaling group, and ALB target group for the launched instances
  • terraform_aws_security_group, which is used by the ALB and Auto Scaling modules to set up security groups to restrict traffic flow.

These modules will be explained in detail later in the article.

How Terraform Modules Work

Terraform modules work very similar to basic Terraform configuration. In fact, each Terraform module is a standalone configuration in its own right, and depending on its pre-requisites, can run completely on its own. In fact, a top-level Terraform configuration without any modules being used is still a module – the root module. You sometimes see this mentioned in various parts of the Terraform workflow, such as in things like error messages, and the state file.

Module Sources and Versioning

Terraform supports a wide variety of remote sources for modules, such as simple, generic locations like HTTP, or Git, or well-known locations like GitHub, Bitbucket, or Amazon S3.

You don’t even need to put a module in a remote location. In fact, a good habit to get into is if you need to re-use Terraform code in a local project, put that code in a module – that way you can re-use it several times to create the same kind of resources in either the same, or even better, different, environments.

Declaring a module is simple. Let’s look at the VPC module from the example:

module "vpc" {                                                       
  source                  = "github.com/paybyphone/terraform_aws_vpc?ref=v0.1.0"
  vpc_network_address     = "${var.vpc_network_address}"             
  public_subnet_addresses = ["${var.public_subnet_addresses}"]       
  project_path            = "${var.project_path}"                    
} 

The location of the module is specified with the source parameter. The style of the parameter will dictate what kind of behaviour TF will undertake to get the module.

The rest of the options here are module parameters, which translate to variables within the module. Note that any variable that does not have a default value in the module is a required parameter, and Terraform will not start if these are not supplied.

The last item that should be mentioned is regarding versioning. Most module sources that work off of source control have a versioning parameter you can supply to get a revision or tag – with Git and GitHub sources, this is ref, which can translate to most Git references, be it a branch, or tag.

Versioning is a great way to keep things under control. You might find yourself iterating very fast on certain modules as you learn more about Terraform or your internal infrastructure design patterns change – versioning your modules ensures that you don’t need to constantly refactor otherwise stable stacks.

Module Tips and Tricks

Terraform and HCL is a work in progress, and there may be some things that seem like they may make sense that don’t necessarily work 100% – yet. There are some things that you might want to keep in mind when you are designing your modules that may reduce the complexity that ultimately gets presented to the user:

Use Data Sources

Terraform 0.7+’s data sources feature can go a long way in reducing the amount of data needs to go in to your module.

In this project, data sources are used for things such as obtaining VPC IDs from subnets (aws_subnet) and getting the security groups assigned to an ALB (using the aws_alb_listener and aws_alb data sources chained together). This allows us to create ALBs based off of subnet ID alone, and attach auto-scaling groups to ALBs with knowing only the listener ARN that we need to attach to.

Exploit Zero Values and Defaults

Terraform follows the rules of the language it was created in regarding zero values. Hence, most of the time, supplying an empty parameter is the same as supplying none at all.

This can be advantageous when designing a module to support different kinds of scenarios. For example, the alb module supports TLS via supplying a certificate ARN. Here is the variable declaration:

// The ARN of the server certificate you want to use with the listener.
// Required for HTTPS listeners.
variable "listener_certificate_arn" {
  type    = "string"
  default = ""
}

And here it is referenced in the listener block:

// alb_listener creates the listener that is then attached to the ALB supplied
// by the alb resource.
resource "aws_alb_listener" "alb_listener" {
  ...
  certificate_arn   = "${var.listener_certificate_arn}"
  ...
}

Now, when this module parameter is not supplied, its default value becomes an empty string, which is passed in to aws_alb_listener.alb_listener. This is, most times, exactly the same as if the parameter is not passed in at all. This allows you to not have to worry about this parameter when you just want to use HTTP on this endpoint (the default for the ALB module as a whole).

Pseudo-Conditional Logic

Terraform does not support conditional logic yet, but through creative use of count and interpolation, one can create semi-conditional logic in your resources.

Consider the fact that the terraform_aws_autoscaling module supports the ability to attach the ASG to an ALB, but does not explicit require it. How can you get away with that, though?

To get the answer, check one of the ALB resources in the module:

// autoscaling_alb_target_group creates the ALB target group.
resource "aws_alb_target_group" "autoscaling_alb_target_group" {
  count    = "${lookup(map("true", "1"), var.enable_alb, "0")}"
  ...
}

Here, we make use of the map interpolation function, nested in a lookup function to provide essentially an if/then/else control structure. This is used to control a resource’s instance count, adding an instance if var.enable_alb is true, and completely removing the resource from the graph otherwise.

This conditional logic does not necessarily need to be limited to count either. Let’s go back to the aws_alb_listener.alb_listener resource in the ALB module, looking at a different parameter:

// alb_listener creates the listener that is then attached to the ALB supplied
// by the alb resource.
resource "aws_alb_listener" "alb_listener" {
  ...
  ssl_policy        = "${lookup(map("HTTP", ""), var.listener_protocol, "ELBSecurityPolicy-2015-05")}"
  ...
}

Here, we are using this trick to supply the correct SSL policy to the listener if the listener protocol is not HTTP. If it is, we supply the zero value, which as mentioned before, makes it as if the value was never supplied.

Module Limitations

Terraform does have some not-necessarily-obvious limitations that you will want to keep in mind when designing both modules and Terraform code in general. Here are a couple:

Count Cannot be Computed

This is a big one that can really get you when you are writing modules. Consider the following scenario that totally did not happen to me even though I knew of of such things beforehand 😉

  • An ALB listener is created with aws_alb_listener
  • The arn of this resource is passed as an output
  • That output is used as both the ARN to attach an auto-scaling group to, and the pseudo-conditional in the ALB-related resources’ count parameter

What happens? You get this lovely message:

value of 'count' cannot be computed

Actually, it used to be worse (a strconv error was displayed instead), but luckily that changed recently.

Unfortunately, there is no nice way to work around this right now. Extra parameters need to be supplied or you need to structure your modules in way that avoids computed values being passed into countdirectives in your workflow. (This is pretty much exactly why the terraform_aws_asg module has a enable_alb parameter).

Complex Structures and Zero Values

Complex structures are not necessarily good candidates for zero values, even though it may seem like a good idea. But by defining a complex structure in a resource, you are by nature supplying it a non-zero value, even if most of the fields you supply are empty.

Most resources don’t handle this scenario gracefully, so it’s best to avoid using complex structures in a scenario where you may be designing a module for re-use, and expect that you won’t be using the functionality defined by such a structure often.

The Application in Brief

As our focus in this article is on Terraform modules, and not on other parts of the pattern such as using Packer or Chef to build an AMI, we will only touch up briefly on the non-Terraform parts of this project, so that we can focus on the Terraform code and the AWS resources that it is setting up.

The Gem

The Ruby gem in this project is a small “hello world” application running with Sinatra. This is self-contained within this project and mainly exists to give us an artifact to put on our base AMI to send to the auto-scaling group.

The server prints out the system’s hostname when fetched. This will allow us to see each node in action as we boot things up.

Packer

The built gem is loaded on to an AMI using Packer, for which the code is contained within packer/ami.json. We use chef-solo as a provisioner, which works off a self-contained cookbook named packer_payload in the cookbooks directory. This allows us a bit more of a higher-level workflow than we would have simply with shell scripts, including the ability to better integration test things and also possibly support multiple build targets.

Note that the Packer configuration takes advantage of a new Packer 0.12.0 feature that allows us to fetch an AMI to use as the base right from Packer. This is the source_ami_filter directive. Before Packer 0.12.0, you would have needed to resort to a helper, such as ubuntu_ami.sh, to get the AMI for you.

The Rakefile

The Rakefile is the build runner. It has tasks for Packer (ami), Terraform (infrastructure), and Test Kitchen (kitchen). It also has prerequisite tasks to stage cookbooks (berks_cookbooks), and Terraform modules (tf_modules). It’s necessary to pre-fetch modules when they are being used in Terraform – normally this is handled by terraform get, but the tf_modules task does this for you.

It also handles some parameterization of Terraform commands, which allows us to specify when we want to perform something else other than an apply in Terraform, or use a different configuration.

All of this is in addition to standard Bundler gem tasks like build, etc. Note that install and releasetasks have been explicitly disabled so that you don’t install or release the gem by mistake.

The Terraform Modules

Now that we have that out of the way, we can talk about the fun stuff!

As mentioned at the start of the article, This project has 4 different Terraform modules. Also as mentioned, one of them (the Security Group module) is hidden from the end user, as it is consumed by two of the parent modules to create security groups to work with. This exploits the fact that Terraform can, of course, nest modules within each other, allowing for any level of re-usability when designing a module layout.

The AWS VPC Module

The first module, terraform_aws_vpc, creates not only a VPC, but also public subnets as well, complete with route tables and internet gateway attachments.

We’ve already hidden a decent amount of complexity just by doing this, but as an added bonus, redundancy is baked right into the module by distributing any network addresses passed in as subnets to the module across all availability zones available in any particular region via the aws_availability_zones data source. This process does not require previous knowledge of the zones available to the account.

The module passes out pertinent information, such as the VPC ID, the ID of the default network ACL, the created subnet IDs, the availability zones for those subnets as a map, and the ID of the route table created.

The ALB Module

The second module, terraform_aws_alb allows for the creation of AWS Application Load Balancers. If all you need is the defaults, use of this module is extremely simple, creating an ALB that will answer requests on port 80. A default target group is also created that can be used if you don’t have anything else mapped, but we want to use this with our auto-scaling group.

The Auto Scaling Module

The third module, terraform_aws_asg, is arguably the most complex of the three that we see in the sample configuration, but even at that, its required options are very slim.

The beauty of this module is that, thanks to all the aforementioned logic, you can attach more than one ASG to the same ALB with different path patterns (mentioned below), or not attach it to an ALB at all! This allows this same module to be used for a number of scenarios. This is on top of the plethora of options available to you to tune, such as CPU thresholds, health check details, and session stickiness.

Another thing to note is how the AMI for the launch configuration is being fetched from within this module. We work off the tag that we used within Packer, which is supplied as a module variable. This is then searched for within the module via an aws_ami data source. This means that no code or variables need to change when the AMI is updated – the next Terraform run will pick up the most recent AMI with the tag.

Lastly, this module supports the rolling update mechanism laid out by Paul Hinze in this post oh so long ago now. When a new AMI is detected and the auto-scaling group needs to be updated, Terraform will bring up the new ASG, attach it, wait for it to have minimum capacity, and then bring down the old one.

The Security Group Module

The last module to be mentioned, terraform_aws_security_group, is not shown anywhere in our example, but is actually used by the ALB and ASG modules to create Security Groups.

Not only does it create security groups though – it also allows for the creation of 2 kinds of ICMP allow rules. One for all ICMP, if you so choose, but more importantly, allow rules for ICMP type 3 (host unreachable) are always created, as this is how path MTU discovery works. Without this, we might end up with unnecessarily degraded performance.

Give it a Shot

After all this talk about the internals of the project and the Terraform code, you might be eager to bring this up and see it working. Let’s do that now.

Assuming you have the project cloned and AWS credentials set appropriately, do the following:

  • Run bundle install --binstubs --path vendor/bundle to load the project’s Ruby dependencies.
  • Run bundle exec rake ami. This builds the AMI.
  • Run bundle exec rake infrastructure. This will deploy the project.

After this is done, Terraform should return a alb_hostname value to you. You can now load this up in your browser. Load it once, then wait about 1 second, then load it again! Or even better, just run the following in a prompt:

while true; do curl http://ALBHOST/; sleep 1; done

And watch the hostname change between the two hosts.

Tearing it Down

Once you are done, you can destroy the project simply by passing a TF_CMD environment variable in to rake with the destroy command:

TF_CMD=destroy bundle exec rake infrastructure

And that’s it! Note that this does not delete the AMI artifact, you will need to do that yourself.

More Fun

Finally, a few items for the road. These are things that are otherwise important to note or should prove to be helpful in realizing how powerful Terraform modules can be.

Tags

You may have noticed the modules have a project_path parameter that is filled out in the example with the path to the project in GitHub. This is something that I think is important for proper AWS resource management.

Several of our resources have machine-generated names or IDs which make them hard to track on their own. Having a easy-to-reference tag alleviates that. Having the tag reference the project that consumes the resource is even better – I don’t think it gets much clearer than that.

SSL/TLS for the ALB

Try this: create a certificate using Certificate Manager, and change the alb module to the following:

module "alb" {
  source                   = "github.com/paybyphone/terraform_aws_alb?ref=v0.1.0"
  listener_subnet_ids      = ["${module.vpc.public_subnet_ids}"]
  listener_port            = "443"
  listener_protocol        = "HTTPS"
  listener_certificate_arn = "arn:aws:acm:region:account-id:certificate/certificate-id"
  project_path             = "${var.project_path}"
}

Better yet, see the example here. This can be run with the following command:

TF_DIR=terraform/with_ssl bundle exec rake infrastructure

And destroyed with:

TF_CMD=destroy TF_DIR=terraform/with_ssl bundle exec rake infrastructure

You now have SSL for your ALB! Of course, you will need to point DNS to the ALB (either via external DNS, CNAME records, or Route 53 alias records – the example includes this), but it’s that easy to change the ALB into an SSL load balancer.

Adding a Second ASG

You can also use the ASG module to create two auto-scaling groups.

module "autoscaling_group_foo" {
  source            = "github.com/paybyphone/terraform_aws_asg?ref=v0.1.1"
  subnet_ids        = ["${module.vpc.public_subnet_ids}"]
  image_tag_value   = "vancluever_hello"
  enable_alb        = "true"
  alb_listener_arn  = "${module.alb.alb_listener_arn}"
  alb_rule_number   = "100"
  alb_path_patterns = ["/foo/*"]
  alb_service_port  = "4567"
  project_path      = "${var.project_path}"
}

module "autoscaling_group_bar" {
  source            = "github.com/paybyphone/terraform_aws_asg?ref=v0.1.1"
  subnet_ids        = ["${module.vpc.public_subnet_ids}"]
  image_tag_value   = "vancluever_hello"
  enable_alb        = "true"
  alb_listener_arn  = "${module.alb.alb_listener_arn}"
  alb_rule_number   = "101"
  alb_path_patterns = ["/bar/*"]
  alb_service_port  = "4567"
  project_path      = "${var.project_path}"
}

There is an example for the above here. Again, run it with:

TF_DIR=terraform/multi_asg bundle exec rake infrastructure

And destroy it with:

TF_CMD=destroy TF_DIR=terraform/multi_asg bundle exec rake infrastructure

You now have two auto-scaling groups, one handling requests for /foo/*, and one handling requests for /bar/*. Give it a go by reloading each URL and see the unique instances you get for each.

Acknowledgments

I would like to take a moment to thank PayByPhone for allowing me to use their existing Terraform modules as the basis for the publicly available ones at https://github.com/paybyphone. Writing this article would have been a lot more painful without them!

Also thanks to my editors, Anthony Elizondo and Andrew Langhorn for for their feedback and help with this article, and the AWS Advent Team for the chance to stand on their soapbox for my 15 minutes! 🙂

Advertisements

Configuration Management Series Part 2.5 – Chef Follow-Up

As luck would have it, I have been working a lot more with Chef in the last couple of weeks, so a supplemental article is in order.

There have been a number of things that I have found out in this period, and there is a good amount that I probably will not cover, but I am going to try and cover some of the key challenges that I have encountered upon my journey. This should give you, the reader, some insight into why I have chosen to continue with Chef, and maybe help you make some good decisions on data management and code design that will save you time on your own ramp-up.

Provisioning with Knife Plugins

First off, the reason why I have decided to keep going with Chef in the first place. Incidentally, it would seem that Chef has the best support for provisioning out of the three tools that I have looked at, especially as far as vSphere is concerned.

As it currently stands, I am in a situation where I need a tool that is going to handle the lifecycle of an instance, end-to-end. Ansible’s vsphere-guest module, unfortunately, was not satisfactory for this job at all, and in fact, working with this in particular piece of code was a lesson in frustration. The code is highly inconsistent across operations – specifically, very few features that are available in instance creation are available in template clone operations, and the available functionality in the former was not suitably portable. Puppet seemingly lacks such a feature altogether, outside of Puppet Enterprise. Both of these are show stoppers for me.

Enter knife-vsphere. An amazing tool, and 100% free. Incidentially, cloning from template is the only creation operation supported, which makes sense, considering that a brand new instance would more than likely be useless for provisioning on. Linux guest customization is fully supported, and programmable support for Windows customization is as well (see examples in the README). Finally, it can even bootstrap Chef onto the newly created node. The tool will even destroy instances and remove their entries from the Chef server if the instance label has a matching node.

Having these features in Chef, it was pretty much a no-brainer to see what wins out here.

Of course, vSphere is by far not the only knife plugin that supports stuff like this. Check out knife-ec2 and knife-openstack, both of which support similar behaviour.

Data Bags and Vault, Oh My!

I think my assessment that data bags were close to Hiera was premature. In the time that I wrote that, I have learned a few key takeaways.

First off, data bags are global. And even though I used global lookups directly in my first article about the subject, Hiera objects are designed to be used in a scoped auto-lookup mechanism, as per their documentation. Incidentally, data bags are the ones that are intended to be accessed in a fashion that is more in line with what I described in that article. Live and learn eh?

Conventions can be created that can help with this. For example, one can set up a bag/item pair like foo/bar if the cookbook’s name is foo. Also, cookbooks can still be parameterized elsewhere, and in fact Chef features like roles and environments are better suited for this, by design. (Note: Please read the last section of this article before investing too much into roles and environments, as roles in particular are frowned upon in parts of the Chef community).

Considering the latter part of the last paragraph, one would wonder: what exactly are data bags ultimately good for then? What indeed. A little bit of research for me revealed this good commentary on the subject by awesome community Chef 2015 winner Noah Kantrowitz. Incidentally, Noah recommends creating resources for this, versus using roles or environments.

As it currently stands, my general rule of thumb for using data bags is encryption. Does it need to be encrypted? Then it goes into a bag. Not a regular encrypted bag though – a regular encrypted bag requires a pre-shared key that needs to be distributed to the nodes somehow. There is no structure around controlling access to the data. How does this problem get solved then? Enter chef-vault.

Vault allows encryption via the public keys of the nodes that need the data, effectively ensuring only these nodes have access. This can explicitly be controlled by node or through a search string that allows searching on a key:data pair (ie: os:linux). This addresses both concerns mentioned in the previous paragraph. The only major issue with this setup is that the data needs to be encrypted across the key that will ultimately be supplied to the node, creating a bit of a chicken-before-the-egg problem. Luckily, it looks like Chef is catching up to this, and recently vault options for knife bootstrap were added to get past this. Now, when nodes are created, vault items can be updated by the host doing the provisioning, allowing a node access to vault items even during the initial chef-client run. This is not supported on validator setup, as the ability to do this anonymously (well, with the validator key) could possibly mean a compromise of the data.

Down the Rabbit Hole

Lastly, I thought I would share some next steps for me. The really fascinating (and frustrating) part of Chef for me is how the community has adopted its own style for using it. There are several years of social coding practices, none of which have been in particularly well documented by mainline.

First off, I am currently working on re-structuring my work to be supported by Berkshelf. This is now a mainline part of the Chef DK, and is used to manage cookbooks and their dependencies.

There are also a number of best practices for writing cookbooks, usually referred to as cookbook patterns in the community, which definitely reflects the developer-centric nature of Chef. Based on some very light and non-scientific observation, one of the more popular documents for this seems to be The Environment Cookbook Pattern by Jamie Winsor, the author of Berkshelf.

2 other articles that have helped me so far are found below as well:

You can probably expect to hear more out of me when it comes to Chef, stay tuned!

Configuration Management Series Part 2 of 3: Chef

Be sure to read the follow up to this article as well!

Back when I was looking at configuration management about 4 years ago, I explored the possibility of using Chef over Puppet. A few things threw me off at that point in time: the heavy reliance on Ruby for the setup and configuration (at least via the main documented methods), the myth that you needed to know Ruby to work with it, and ultimately, time: the ramp-up time to get started seemed so long that it was actually time prohibitive for me at a period in my life where I had very little to spare. A couple of years later, when it came to pick one to work with, I ultimately chose Puppet.

Over the years, I have heard talk every now and then about use of configuration management tools. It saddened me to hear not so much talk about Puppet, but interestingly, a lot of developers that I spoke with really like Chef, and there’s no doubt that some consider it an essential part of their toolbox. Read on, and you will probably see why.

About Chef

Chef seems to have come about as a matter of necessity, and ultimately was the manifestation of the end result of DevOps – infrastructure automation. Creator Adam Jacob probably tells it way better than I could in this video, where he explains the name, his journey in making the software, and OpsCode’s beginnings.

What I actually got from the video was the dispelling of a misconception I had – that Chef actually came from former employees of Puppet that wanted to create something better. If that was the case, it may have just been a natural evolution of the philosophies that Puppet was built on, and how Adam and company thought they could make it better to suit OpsCode’s needs.

Installation

I’m not going to lie – this was probably the most frustrating part of my teardown on Sunday.

Installation is pretty straightforward. Actually, very straightforward now, as opposed to when it tried it out so many years ago. The server install documentation now gives a much better path to installation, versus how it used to be when it involved gems and what not.

Unfortunately, the base install of Chef server is extremely resource intensive. The installer package for the server on Ubuntu is about 460 MB in size, which takes sizeable amount of time to download and install. You need about 2 GB of RAM to run the whole stack – after installation the total footprint is a little over 1 GB in RAM. I actually had to rebuild my test instance after the initial install with 512 MB failed.

From here, what to do next was a little confusing for me. This ultimately is due to the very modular nature of the Chef management lifecycle – which is great for developers (see below), but is a bit intimidating for first-time admin users or people looking to stand up with little time and minimal knowledge – only having a Sunday afternoon for review is a great example of this.

Ultimately, I pushed through and figured it out. The full path to getting a node up is basically:

This will get you set up with the server, a workstation to do your work, and a node to test with. What this does not do, apparently, is get chef-client to run on boot on the bootstrapped node. I’m not too sure why this is, but the installer package does not include any init scripts, unfortunately. With me being overdue for press time as it is, I chose not to investigate for now, and just ran chef-client manually to test the cookbook that I wrote.

Windows Support

Chef has pretty mature Windows support. Actually, one of the reasons that I chose Puppet over Chef in the first place was that Puppet’s Windows support was further along. I would imagine both are in the same place now.

Chef has also been working on support for PowerShell DSC, MS’s own approach to configuration management through PowerShell.

Head over to the Windows page to see the full feature set for Chef on Windows.

Manageability

Chef takes a very modular approach to management. Ideally, one has the server, and changes are made from workstations thru knife, save possibly organization creation and user management. This includes cookbook design, cookbook uploading, node bootstrapping, run list editing (the list of items that get run on a node during a run), and pretty much everything else about the Chef development lifecycle.

Again, hosts are bootstrapped usually via knife bootstrap – but there are other deployment options as well. See the bootstrap page for more options.

Data storage for nodes is done thru data bags – JSON-formatted data with support for encryption, similar to Hiera and hiera-eyaml for Puppet. I haven’t had much of a chance to look yet, but it looks like the features to not only automate this process and have encrypted and unencrypted data co-exist within the same data bag is a lot more automated and developer-friendly, probably better than Puppet and much better than Ansible.

Finally, there are the premium features. Check the Chef server page for more details. These features include a web interface, extra support for push jobs, analytics, and high availability. It should be noted that these features are free for up to 25 nodes.

Execution Model

Some more time needs to be taken by myself to evaluate all of these methods, but generally, chef-client is your go-to for all execution. Take a look at the run model here. This can be run as a daemon, periodically thru cron, direct on the server, or via knife ssh.

Speaking of which, there is also knife, where most of your Chef management needs will be taken care of thru.

There is also chef-solo – basically, a standalone Chef that does not require a server to be run. This kind of supplies an agentless push-execution model for Chef that can be useful for orchestration.

Programmability

Chef really is a developer’s dream come true.

The Chef DK installation model encourages a developer-centric development and deployment cycle, allowing all management and development to be done from a single developer’s workstation, with changes checked into central source control.

The DSL is pretty much pure Ruby. When Chef first started to become a thing, this was yet another excuse thrown out to not use it – ie: you would need to learn Ruby to learn Chef. But really, nearly all configuration management systems these days use some sort of language for their DSL, be it Ruby, YAML, JSON, or whatever. I would even submit that one can begin teaching themselves a specific language by taking up a configuration management system – learning Puppet actually helped me understand Ruby and ERB a bit. In addition to that, Chef has a great doc that can help you out – Just Enough Ruby for Chef.

Templating is done in ERB, just like in Puppet. See the template documentation for more details.

Terminology and actual concepts are more in line with Puppet. Cookbooks are pretty much the analog to modules or classes in Puppet, with recipes translating to individual manifests (the run data). You can use knife cookbook to create, verify, and upload cookbooks.

Speaking of development – this is actually a functional recipe:

file '/etc/motd' do
  content 'hello world!'
end

Set up within a cookbook – this would write “hello world!” to /etc/motd. Pretty simple!

I think the killer app for development for me here though has to be the testing support that Chef has built in:

  • kitchen is Chef’s built-in test suite that allows you to orchestrate a test scenario from scratch using a wide range or virtualization technologies (including Vagrant – so you don’t even need to have code leave your workstation)!
  • learn.chef.io is a great resource for learning Chef, versus testing Chef. It is a hosted training platform that will set up temporary instances for you to use, along with a guided tutorial.

For me, when people say that Chef is more developer-friendly – it’s not necessarily what you can do or what you can use, but the fact that the toolset enables developers to get code out all that faster.

Conclusion

Great for developers

Chef is awesome for developers, and it shows. The Ruby DSL is extremely easy to work with – that fact that they have made it more programmatic than other configuration management systems would seem to allow it to do more without having to extend it too much or look to third parties for extensions. Also, the Chef DK and Kitchen provide for a very pleasant development experience.

A little tough to get ramped up on

If you are in a pinch and don’t have a lot of time to set up and do not know Chef, you might want to figure something else out until you can block a bit of time to teach yourself. There are a lot of options for you to use Chef, which can make it overwhelming. Setting up the Chef server as well is a bit of a commitment of resources that you may want to consider carefully before you undertake it.

Next up in the series – an oldie but a goodie, the tool that got me started down the configuration management rabbit hole – Puppet.