NOTE: Click here to get to the sample project quickly.
So I know it’s been a while – quite a while – since I’ve posted anything. I’ve been far from inactive though, and I will touch on that in a later post.
I will be getting back on track here with my AWS World Tour series soon, but first, I want to take a bit of a detour and discuss some things that have been highly relevant to what I have been working on over the last little while – infrastructure provisioning with Terraform.
My Journey the Last 6 Months
I have had a very eventful last 6 months since I started to use AWS “in anger”. I’m not the biggest fan of that term, but I can’t really think of another way to put it right now. There have indeed been some frustrating moments, but I can’t really say I was angry.
In any case, my journey to try and find a good AWS provisioning platform took me to a few places, namely:
- Trying to use Chef – which prompted me to write this pull request. However, I quickly realized that having to write AWS resources for everything that we needed to do at PayByPhone might not be the best use of my time.
- I was also going to write my own tooling around managing CloudFormation, but stopped short when, one weekend, I asked myself if I was reinventing the wheel – and I kind of was.
Enter Terraform.
Terraform
Terraform is a configuration management tool created by HashiCorp with an emphasis on virtual infrastructure, versus, say, something like Chef or Puppet, which emphasize OS-level configuration management. In fact, the two can work in tandem – Terraform has a Chef provisioner.
What makes Terraform special is its ability to support many kinds of infrastructure platforms – in fact, this is its stated goal. Things written in Terraform should ultimately be portable from one platform to another, say if one wanted to move from AWS to Azure, or to some kind of OpenStack-hosted solution somewhere. Admittedly, we had looked at Terraform earlier last year, and its AWS support had not been fully fleshed out yet, but that seems to have changed drastically – by the end of last year its feature set was on par with, and possibly even better than, CloudFormation itself.
So what if one just wanted to stick with AWS? What’s the point in using it over, say, plain CloudFormation? The answer to that really depends on the use case, and like many things in life, boils down to the little things. Some examples:
- Support for ZIP file uploads to AWS Lambda (versus being restricted to S3 on CloudFormation).
- Support for the AWS NAT Gateway. As of this writing, about a month since the NAT Gateway’s release, it still does not appear to be supported by CloudFormation, according to my very light research (judging by the traffic on this thread).
- A non-JSON DSL that has support for several kinds of programmatic interpolation operations, including some basic forms of loop operations allowing one to create a certain number of resources with a single chunk of code (sketched below). The DSL also has support for modules, allowing re-use and distribution of common templates.
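As a quick illustration of that loop-style behaviour, here is a minimal sketch of a resource using `count` – the names and values here are illustrative, not lifted from the sample project:

```hcl
# Create three identical instances from a single chunk of code.
# var.ami_id mirrors the ami_id variable used later in this post;
# the instance type is just an example value.
resource "aws_instance" "web_servers" {
  count         = 3
  ami           = "${var.ami_id}"
  instance_type = "t2.micro"
}

# Per-instance values are available via interpolation, e.g. "${count.index}".
```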
This is all made possible by the fact that, by using Terraform, one is not beholden to the CloudFormation way of doing things. Rather, Terraform uses the AWS API (through the Go SDK), which in some instances is much more versatile than CloudFormation is, or could be, for that matter (as a hosted configuration management platform, AWS has to restrict some of its data sources to reliable services – this is more than likely why S3 is the only way to get a ZIP file read in for Lambda).
Terraform is a fast-moving target. I have submitted several pull requests myself, for bug fixes and new functionality alike, and I don’t see an end in sight for that, at least not for a few months. For example, one of my more recent feature PRs allows one to get details on an AMI for use in a template later. In order to do this in CloudFormation, one would have to undertake the tedious process of writing a custom resource, which ultimately leads to more out-of-band resources and unnecessary technical debt, in my opinion.
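For reference, here is a rough sketch of what such an AMI lookup looks like in current Terraform, via the `aws_ami` data source – the exact syntax available at the time of that PR may have differed, and the filter values are only examples:

```hcl
# Look up the most recent Ubuntu Trusty AMI published by Canonical,
# then reference it elsewhere as "${data.aws_ami.ubuntu.id}".
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical's account ID

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-*"]
  }
}
```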
Packer
I’ve been using Packer for quite some time now to build base images, namely for our VMware infrastructure at PayByPhone.
Packer is basically an image builder. Think of it this way – `gem` might build a Ruby gem to deploy on other systems, and one might build a static binary with `go build`. Packer is like this, but for system images. In fact, one might not have a custom application, and simply may need to build an image with some typical software on it and configured a certain way – in this instance, Packer and its provisioning code could live in the same repository and serve as a complete “application”.
Incidentally, the process for building AMIs is actually much simpler than with VMware. There is a lot less code, and it’s basically AMI-in, AMI-out.
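To give a rough idea of what “AMI-in, AMI-out” looks like, here is a minimal sketch of an `amazon-ebs` builder – the values are placeholders, and this is not the sample project’s actual template:

```json
{
  "builders": [
    {
      "type": "amazon-ebs",
      "region": "us-west-2",
      "source_ami": "ami-123456abcd",
      "instance_type": "t2.micro",
      "ssh_username": "ubuntu",
      "ami_name": "vancluever_hello {{timestamp}}"
    }
  ]
}
```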
Packer Provisioners
Just a note on provisioners – Packer supports several of these. I use the Chef provisioner. There are also Ansible and shell provisioners. I would recommend using one of the configuration management options – it allows for better re-use of code, and also allows for testing before moving on to the Packer build. Namely, with Chef, I am able to use Berkshelf to manage dependencies, and Test Kitchen to sort out most, if not all, errors that would happen with a cookbook before moving on to creating the Packer JSON file.
This also makes it more portable for use with something like Vagrant – generally all that’s needed is to copy the provisioner configuration from Vagrant to Packer, or vice versa, with possibly some minor modifications.
Putting it Together
With the above tools and a competent build runner, I can actually write an entire pipeline that takes an application and deploys it to AWS with relative ease. Better than that, this all can live together! This enables someone to go to a single repository to:
- Get their hands on the code, make changes, and run unit tests on it
- Deploy using the provisioner code to an EC2 instance, or locally to their machine with Vagrant, to run integration tests and experiment with the code
- Fully test the infrastructure by building the AMI and deploying the infrastructure in a uniform fashion to multiple environments (i.e. sandbox, staging, or production).
This allows for a pipeline that should ensure a near perfect deploy once the application is ready for production.
The Pattern in Action
As an example, I have taken the code from my previous articles (see here, here, here, and the code here), and applied the same idea, but with some changes. Again, I am deploying a VPC with an ELB and 3 instances, but this time, I have skipped some of the details irrelevant to a VPC of this kind, especially since I will never be logging into it – namely the private network and NAT part of things.
The code can be found here. Let’s go over it together:
The application
The application is a simple Ruby “hello world” style application running with Sinatra. The application is bundled up as a Ruby gem – this is actually a pretty easy way to create this kind of application, as it produces a single artifact that can be deployed wherever it is needed, especially on a “bare metal”, single-purpose system. This application could even be deployed using the system’s base Ruby, if it is current enough (I don’t do that though, as I will show in later sections).
The layout of the application part of the sample repo is as follows:
```
exe/
  vancluever_hello         <-- Executable binary part of gem package
lib/
  vancluever_hello.rb      <-- Application "entry point" and Sinatra code
  vancluever_hello/
    version.rb             <-- Gem version file
pkg/                       <-- Output directory (gem gets built to here)
Gemfile                    <-- Bundler dependency file (chained to gemspec)
Rakefile                   <-- Rake build runner configuration file
vancluever_hello.gemspec   <-- RubyGems package spec file
```
The bulk of the test code is in the `lib/vancluever_hello.rb` file, a simple file whose contents are shown below:
```ruby
require 'sinatra/base'
require 'socket'

module VanclueverHello
  # Run the test server.
  class Server < Sinatra::Base
    def self.run_server
      set :bind, '0.0.0.0'

      get '/' do
        "Hello from #{Socket.gethostname}!!!"
      end

      run!
    end
  end
end
```
This is the Sinatra self-hosted version of what we were doing with the index.html files, Apache, and user data in the previous version of this stack. Rather than use a static file this time, we are using Sinatra and Ruby to demonstrate how this small app can be bundled onto an image and deployed from there, without any post-creation package installation and content writing.
The `Rakefile` contains the `bundler/gem_tasks` helper that allows us to easily build this gem from the details in the `vancluever_hello.gemspec` file. By running `rake build`, the `.gem` file is dropped into the `pkg/` directory, ready for the next step.
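For reference, the gem-building portion of such a `Rakefile` can be as small as this – a minimal sketch, not the sample project’s full `Rakefile`:

```ruby
# Rakefile (gem-building portion only).
# bundler/gem_tasks provides the standard build/install/release tasks,
# driven by the details in the project's *.gemspec file.
require 'bundler/gem_tasks'

# With this in place, `rake build` drops the packaged gem into pkg/.
```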
The `packer_payload` Chef cookbook
The next piece of the puzzle is the `packer_payload` Chef cookbook, self-contained in the `cookbooks` directory. This cookbook is not like other Chef cookbooks one might see – the metadata is stripped down (no version info, description, or even version locking). This is because this cookbook is not intended to be used anywhere other than in the Packer build that I will be discussing shortly.
Why Chef then? Why not shell scripts if this is all the cookbook is going to be used for? A couple of quick reasons that come to mind:
- I’m not the biggest fan of shell scripts – I will use them when necessary, but I’m more a fan these days of writing things in a way that they can be easily re-used, and in a way that makes it easy to pull in things that make my job easier. Using Chef allows me to do that. For example, rather than having to write code to manage a non-distro Ruby, I use `poise-ruby` and `poise-ruby-build` to manage the Ruby version and gem package. Taking it further, rather than having to write scripts to template out `upstart` or `systemd`, I use `poise-service`, which supports both (a hypothetical sketch follows this list).
- Even if this cookbook is not suitable for Supermarket, or to sit on a Chef server, its re-usability is not completely diminished. Test Kitchen can still be used with this, and in fact there is a `.kitchen.cloud.yml` file in the directory. Kitchen was used to test this cookbook before putting it into Packer, ensuring that most, if not all, of the code worked before starting the process to build the AMI. This cookbook can also be used in Vagrant with minimal effort, should the need arise.
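Purely as a hypothetical illustration of what a recipe built on those helpers might look like – the resource names, properties, and paths below are assumptions, so consult the poise-ruby, poise-ruby-build, and poise-service documentation (and the sample repo) for the real thing:

```ruby
# Hypothetical sketch only - not the actual packer_payload default recipe.
app_version = node['packer_payload']['app_version']
source_path = node['packer_payload']['source_path']

# Install a non-distro Ruby (poise-ruby, built via poise-ruby-build).
ruby_runtime '2.2' do
  provider :ruby_build
end

# Install the application gem from the artifact delivered by Packer/Kitchen.
gem_package 'vancluever_hello' do
  source "#{source_path}/vancluever_hello-#{app_version}.gem"
end

# Run the Sinatra app as a service (upstart or systemd) via poise-service.
poise_service 'vancluever_hello' do
  command '/usr/local/bin/vancluever_hello'
end
```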
Getting the data to Chef
One thing that deserves mention is how I actually get the data to the Chef cookbook. The artifact does need to be delivered in some fashion to the cookbook itself. This is okay, mainly because I have Packer and Test Kitchen to help out with that.
In `attributes/default.rb`, I control the location of the artifact:
```ruby
default['packer_payload']['source_path'] = '/tmp/gem_pkg'
```
This is where the artifact is copied to with Packer (more on that soon). However, with Kitchen, things are a little different, because of how the data directory stuff works:
```yaml
provisioner:
  name: chef_zero
  data_path: ../../pkg/
```
`data_path` controls the directory that contains any non-cookbook data that I want to send to the server. After this is done, I need to change the `source_path` node attribute:
```yaml
suites:
  - name: default
    run_list:
      - recipe[packer_payload::default]
    attributes:
      packer_payload:
        source_path: /tmp/kitchen/data
        app_version: <%= ENV['KITCHEN_APP_VERSION'] %>
```
Also note the ERB in `app_version` – this is an environment variable passed in from Rake, which gets the data from the `VanclueverHello::VERSION` constant in the gem code. More on this below.
Testing
As mentioned, the cookbook’s `.kitchen.cloud.yml` is fully functional, and the cookbook can be tested using the following command:
```shell
KITCHEN_APP_VERSION=0.1.0 \
  AWS_KITCHEN_USER=ubuntu \
  AWS_KITCHEN_AMI_ID=ami-123456abcd \
  KITCHEN_YAML=.kitchen.cloud.yml \
  ../../bin/kitchen verify
```
Note the environment variables. Depending on the target being tested for, `kitchen-ec2` may have trouble finding an AMI for the target. The one I seem to have the most luck with right now is Ubuntu Trusty (14.04) – but I wanted to try this against some of the more recent versions like Wily. This necessitated supplying the login user and the AMI ID, which I do through the environment so that they can be parameterized. I also pass the app version, which helps show how I can control the version of the gem that gets installed. Incidentally, it is a popular pattern to name EC2 or other cloud Kitchen config files `.kitchen.cloud.yml` and call them by passing the `KITCHEN_YAML` environment variable.
All of this is also in the `Rakefile`, under the `kitchen` task. By doing this, I don’t have to worry about running this from the command line all the time. Further to that, it takes the work out of determining the AMI to use (more on that later).
The Packer template
The Packer template lives in the `packer/ami.json` file, sitting at a nice 43 lines, fully beautified. It has variables that are used to tell Packer what AMI to get, what region to set it up in, and also some things to add to the description, such as the distribution and application version.

The Packer template is probably the simplest part of this setup. All that is needed to kick off the AMI creation is a `packer build packer/ami.json`. Of course, the parameters need to be passed in via environment variables, but the `Rakefile` handles that.
Provisioners – artifact delivery and Chef
One thing that I will note about the Packer template is how it does its configuration work on the AMI – this is done via what are known in Packer as provisioners:
"provisioners": [ { "type": "shell", "inline": ["mkdir /tmp/gem_pkg"] }, { "type": "file", "source": "pkg/vancluever_hello-{{user `app_version`}}.gem", "destination": "/tmp/gem_pkg/vancluever_hello-{{user `app_version`}}.gem" }, { "type": "chef-solo", "cookbook_paths": ["berks-cookbooks"], "run_list": ["packer_payload"], "json": { "packer_payload": { "app_version": "{{user `app_version`}}" } } } ]
Note the first two – the `shell` and `file` provisioners, which deliver the artifact. Creation of the directory is necessary here (and Packer won’t fail if the directory does not exist – something that created about an extra 2 hours of troubleshooting work for me as I was making this example). The next one, the `chef-solo` provisioner, runs the `packer_payload` cookbook to configure things.
Note the cookbook path. It’s not `cookbooks`, but instead `berks-cookbooks`. This is because I’m staging the full, dependency-evaluated cookbook collection in the `berks-cookbooks` directory, via Berkshelf. This is handled by the `Rakefile` ahead of the execution of Packer. I haven’t dived into the `Rakefile` yet, and I won’t just yet – first off, I want to introduce the star of the show.
Tags
One last thing before I move on. Tagging the AMI is important! This allows me to search for the AMI afterwards. It also keeps me from having to parse the Packer logs for the AMI ID – even though Packer makes that easier with a machine-readable output flag, I still find tagging to be less work (no capturing output or having to save a log file). It also gets one into the habit of tagging resources, which should be done anyway.

Note that it doesn’t have to be the application ID that serves as the tag or the “artifact”. In addition to this, one could also tag the build ID – which can provide even further granularity when searching for AMIs.
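In the `amazon-ebs` builder, tagging is just a `tags` map on the builder block – the key names here are illustrative, and the sample project’s template may use different ones:

```json
{
  "type": "amazon-ebs",
  "tags": {
    "application": "vancluever_hello",
    "app_version": "{{user `app_version`}}"
  }
}
```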
Terraform
Finally, I get to the headliner: Terraform.
The Terraform configuration sits in the `terraform/main.tf` file. This is only a single file, but it can be broken up within this directory into as many files as makes sense. For example, a lot of my projects now have a `variables.tf`, `outputs.tf`, `vpc.tf`, `instance.tf`, and more, depending on how things need to look. This allows for code chunks that are easy to read. As long as they are all in the same directory, Terraform will treat them all as the same plan.
Looking at the `main.tf` file, one may notice several analogs to the CloudFormation template from my last article. The key differences, aside from the changes in infrastructure to remove things that were not necessary for this article, are that it’s not JSON, and that there is only one `aws_instance` resource, with a `count` of 3. `count` is a special Terraform DSL attribute that tells Terraform to make more than one of the specified resource. This is referenced in the `aws_elb` block too, with a splat operator:
resource "aws_elb" "elb" { ... instances = ["${aws_instance.web_servers.*.id}"] }
This basically allows me to reference all the instance IDs at once.
And of course, the Terraform file is parameterized. There are 3 parameters – `region`, `ami_id`, and `vpc_subnet`. `region` and `ami_id` are both required, as they don’t have a default, but `vpc_subnet` does not need to be supplied if the default `10.0.0.0/24` network is okay.
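A sketch of what those variable declarations look like (only `vpc_subnet` carries a default):

```hcl
variable "region" {}
variable "ami_id" {}

variable "vpc_subnet" {
  default = "10.0.0.0/24"
}
```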
Manually, if the variables were supplied as `TF_VAR_` environment variables (i.e. `TF_VAR_region` or `TF_VAR_ami_id`), one could just run `terraform apply` and watch this thing go. In reality though, I want this going through Rake, and that’s what I do.
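For example, a manual run might look something like this – the values are placeholders, and working AWS credentials are assumed:

```shell
# Variable values are examples only.
TF_VAR_region=us-west-2 \
  TF_VAR_ami_id=ami-123456abcd \
  terraform apply terraform/
```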
The Rakefile
Maybe not the star of the show, but definitely the thing keeping the lights on, is the `Rakefile`.
In addition to having a full DSL of its own to run builds with, Rake can be extended with standard Ruby. This can come in the form of simple methods within the `Rakefile`, or full-on helper libraries that can provide a suite of common tasks for a project. As mentioned, I used the `bundler/gem_tasks` helper to provide the basic RubyGems building tasks (and in the `Rakefile` I have even disabled a few of them to ensure the gem doesn’t get accidentally pushed).
Incidentally, the Rake tasks make up only a small portion of this `Rakefile`. There are 4 user-defined tasks: `berks-cookbooks`, `ami`, `infrastructure`, and `kitchen`.
`ami` calls two prerequisites, `build` and `berks-cookbooks`. The former is a gem task, which builds the gem and puts it into the `pkg/` directory. The second runs Berkshelf on the `cookbooks/packer_payload/Berksfile` file, vendoring the cookbooks into the `berks-cookbooks` directory so that Packer has all cookbooks available to it during its `chef-solo` run. After these are done, Packer can run, after getting some variables, of course. The same goes for the `infrastructure` task, which has no prerequisites, but sends some variables to the `terraform` command. Finally, the `kitchen` task allows me to easily run tests on the `packer_payload` cookbook.
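A hypothetical sketch of how that wiring might look – the real `Rakefile` in the sample repo handles the variable passing and helpers that are glossed over here:

```ruby
# Hypothetical task wiring - see the sample repo's Rakefile for the real version.
require 'bundler/gem_tasks'

desc 'Vendor packer_payload and its dependencies via Berkshelf'
task 'berks-cookbooks' do
  sh 'berks vendor berks-cookbooks --berksfile=cookbooks/packer_payload/Berksfile'
end

desc 'Build the application AMI with Packer'
task ami: ['build', 'berks-cookbooks'] do
  # Packer variables (source AMI, region, app version) are passed via the environment.
  sh 'packer build packer/ami.json'
end

desc 'Apply (or destroy, via TF_CMD=destroy) the Terraform plan'
task :infrastructure do
  sh "terraform #{ENV['TF_CMD'] || 'apply'} terraform"
end
```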
The `Rakefile` helper methods
This is where things really come together. There are a couple of kinds of helpers that I have here. The first kind are very simple and handle a few variables from the environment. Honestly, if Rake had something better to handle this, I would use that instead, but from what I’ve seen, it doesn’t (a bit more RTFM may be necessary). Also, I would rather use environment variables instead of the parameter system that Rake uses by default – it’s more in line with what the rest of our toolchain uses.
The second kind is where Rake really shines, though. These are the functions `ubuntu_ami_id`, `app_ami_id`, and `rfc3339_to_unix`:

- `ubuntu_ami_id` uses the `ubuntu_ami` gem to find the latest Ubuntu AMI for the distribution (default `trusty`) and root store type (`ebs-ssd`). This is fed to Packer.
- After Packer is done and has tagged the AMI with our application tag, `app_ami_id` can go and get the latest built image for our system. Sorting is helped by `rfc3339_to_unix`, which converts the timestamp. This AMI ID is then fed to Terraform (a hypothetical sketch of such a helper follows this list).
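As a hypothetical illustration of that second helper, something like this can find the newest AMI carrying the application tag – the tag key, value, and method shape are assumptions, and the repo’s helper may differ:

```ruby
require 'aws-sdk' # aws-sdk v2
require 'time'

# Hypothetical app_ami_id-style helper: return the ID of the most recently
# created AMI owned by this account that carries our application tag.
def app_ami_id(region)
  ec2 = Aws::EC2::Client.new(region: region)
  images = ec2.describe_images(
    owners: ['self'],
    filters: [{ name: 'tag:application', values: ['vancluever_hello'] }]
  ).images
  # creation_date is an RFC 3339 string; parsing it makes the images sortable,
  # which is the role rfc3339_to_unix plays in the real Rakefile.
  images.max_by { |image| Time.parse(image.creation_date) }.image_id
end
```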
The tasks to deploy
Finally, after all of this write-up, what is needed to make this run? Three very easy commands:

- After cloning the repo, `bundle install --binstubs --path vendor/bundle` ensures all the dependencies are gathered up within the working directory tree.
- Then, `bundle exec rake ami` will build the AMI.
- Finally, `bundle exec rake infrastructure` will deploy the infrastructure with Terraform.
And presto! A 3-node ELB cluster in AWS. The Terraform output will have the DNS name of the ELB – after a few minutes, when everything is available, connect over HTTP to port `4567` to see what has been created – refresh to see the page cycle through the IP addresses.
Destroying the infrastructure
After completing the exercise, I want to shut down these resources to ensure that they are not going to rack up a nice big bill for me. This is easily done with the setup I have:
```shell
TF_CMD=destroy bundle exec rake infrastructure
```
This will destroy all created resources. Afterwards, I delete the AMI and the snapshot that Packer made using the AWS CLI or console. And it’s like it never existed!
Final Word – the `terraform.tfstate` File
One important thing about this file: the `terraform.tfstate` file contains a working state of the infrastructure and must be treated with respect. This is the part that’s actually hidden when CloudFormation is used, as AWS handles the state for you.
The `terraform.tfstate` file can be managed in one of two ways:
- By checking it into source control (if using the example repo as a starting point, note that it is, by default, in the `.gitignore` file).
- By using remote state to store the config. There are several options, such as S3, Consul, or Atlas, and a few others not mentioned here. Remote state also has the advantage of easy retrieval for use with other projects (other Terraform configurations, for example, can access its outputs).
Final Final Word – Modules
One thing that didn’t get mentioned here at all is Terraform’s ability to use modules.
I suggest checking this out if you plan to really dive into Terraform. Much of your repeatable infrastructure code can be put into a module. In fact, the template in this example could serve as a module if it lived in its own repository – then, all the template that referenced it would need is the 3 variables defined at the top, which would ultimately turn the `main.tf` file in the referencing project into roughly a half-dozen lines of code.
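A hypothetical sketch of what that referencing project might boil down to – the module source URL is made up:

```hcl
module "hello_stack" {
  source     = "github.com/example/vancluever-hello-module"
  region     = "us-west-2"
  ami_id     = "ami-123456abcd"
  vpc_subnet = "10.0.0.0/24"
}
```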
Well, that’s all for this article! As usual, I hope you found the material informative! I hope to get back on track with my initial intention of evaluating AWS services soon, but don’t hold it against me if things are slow. I have been far from inactive though – I will mention some of the things I have been up to in my next post. Until then, take care!