How I Chose An Infrastructure As Code Tool
Background
Let me set the scene so I can explain the journey I have been on and the rationale behind some key decisions made along the way.
My journey with Infrastructure as Code (IaC) started at the end of 2018 when I was involved with a large scale data centre exit project. The goal was to move all infrastructure to a cloud provider and hit a contractual deadline, but working in a regulated business meant we had to go through a request for proposal (RFP) process to ensure we selected the right strategic partner.
Our strategy was clear: we weren’t just going to lift and shift all our virtual machines as they were. We planned to adopt modern cloud practices and techniques, so IaC was firmly on the agenda.
I took it upon myself to research the tools available in the IaC space so that once we had completed the RFP we could hit the ground running. At this time the key options were to use one of the big 3 cloud providers’ native solutions, or look for something independent. Azure had ARM templates, Amazon Web Services (AWS) had CloudFormation and Google Cloud (GCP) had Deployment Manager. All 3 of these solutions used varying combinations of JSON, YAML & Python.
Next I looked to see what independent tools were out there, which led me to research HashiCorp Terraform. This tool used HashiCorp’s own configuration language, HCL, and was cloud agnostic, so it worked across multiple providers including Azure, AWS & GCP.
Anything was on the table at this early stage, so I looked into using the native CLI options for each provider as well.
Before I continue, it’s important to point out that the engineers that would be working on this project all had traditional infrastructure backgrounds, and none of them had development skills. Servers were built using CDs/VM images, and almost all configuration work was done using the GUI. This meant I needed to select a tool with the right capabilities to deliver value, and the fewest blockers to adoption.
NOTE: In 2018 Bicep did not exist for Azure, so wasn’t included in this process.
Imperative vs Declarative: Which Approach Was Best?
Scripting languages such as PowerShell, VBScript & Python are imperative. This means they tell you HOW they are going to get to the end result, and require a good level of understanding of the syntax and functions used in order to figure out what the end result will be.
By contrast, formats such as JSON, YAML & HCL are declarative. This means they show you WHAT the end result will be, which makes them simpler to read and understand.
To illustrate this, I’ll show 2 simple examples of the same outcome using the different approaches.
Imperative Approach
- At 5pm take a steak out of the fridge to get to room temperature.
- Peel some potatoes and cut them into chips.
- Put a pan on the hob to heat and turn on the fryer/oven.
- Cook the steak in the pan until it is done.
- Take the steak out to rest.
- Cook the chips until done.
- Serve up on a plate at 6pm.
- Eat dinner.
Declarative Approach
- Cook steak and chips for dinner and eat at 6pm
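The steak-and-chips steps above can be sketched in code too. This is a toy illustration with made-up names (not from any IaC tool) to show why declarative definitions are easier to review:

```python
# Imperative: spell out HOW to reach the result, step by step.
def make_dinner_imperative():
    steps = [
        "take steak out of fridge at 5pm",
        "peel potatoes and cut into chips",
        "heat pan and fryer/oven",
        "cook steak, then rest it",
        "cook chips until done",
        "serve on a plate at 6pm",
    ]
    return steps

# Declarative: state WHAT the end result should be; the tooling
# (like an IaC engine) is responsible for working out the steps.
desired_dinner = {
    "meal": "steak and chips",
    "serve_time": "6pm",
}

print(len(make_dinner_imperative()))  # six steps to read and trace
print(desired_dinner["meal"])         # one desired end state
```

A reviewer only has to check the desired state, not trace every step, which is the property that mattered most for engineers without a development background.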
WINNER: Declarative Approach: The declarative approach was a better fit for engineers in my organisation who didn’t have a development background. This meant that using the native CLI tools to provision infrastructure was ruled out.
Which File Format Was Easiest To Understand?
With different options on the table, I wanted to investigate the pros and cons of the different file formats associated with the available tools.
JSON
JavaScript Object Notation (JSON) is a format for storing and transmitting data as key-value pairs stored in objects. It is most commonly interpreted by applications but can be read by humans too.
Each object is surrounded by curly brackets {} (also known as curly braces), each key-value pair consists of a key and a corresponding value, and key-value pairs are separated by a comma.
When elements are added to an array they are surrounded by square brackets [] and separated by commas. The order in which the key-value pairs are listed in an object is irrelevant.
If a multi-line value is required, such as commands that need to be run, each line will need to be separated by \n. Double quotes " inside a value need to be escaped using a backslash \.
An example of JSON using different data types is below:
{
  "string": "some text that can contain spaces",
  "number": 10,
  "boolean": true,
  "emptyValue": null,
  "nestedObject": {
    "firstname": "darren",
    "surname": "johnson"
  },
  "nestedArray": [
    "friday night",
    6,
    "january",
    true
  ],
  "commands": "$message = \"Hello World\"\nWrite-Host $message"
}
The JSON syntax requires good attention to detail as it is prone to validation errors, but using editors such as VS Code will help highlight any syntax issues. JSON is often collapsed into a single line of text for API calls, but can be linted to make it easier to read using VS Code or online tools such as JSONLint.
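As a quick sketch of these escaping rules (using Python’s standard json module, which is not part of the original toolchain), the multi-line commands value from the example parses back into two lines of PowerShell:

```python
import json

# The "commands" value from the example above: \n separates lines
# and inner double quotes are escaped with a backslash.
raw = '{"commands": "$message = \\"Hello World\\"\\nWrite-Host $message"}'

data = json.loads(raw)  # raises an error if the JSON is invalid
print(data["commands"])
# $message = "Hello World"
# Write-Host $message

# Re-serialising with an indent acts like a simple linter,
# expanding single-line JSON into a readable layout.
print(json.dumps(data, indent=2))
```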
YAML
YAML (originally Yet Another Markup Language, later redefined as YAML Ain’t Markup Language) is a popular human-readable format used for configuration files across many programming languages. YAML is a superset of JSON, meaning a YAML file can also contain JSON within it. It uses indentation to specify its structure, and its syntax is also prone to validation errors.
YAML is made up of maps and lists, and a document conventionally begins with three dashes ---.
A map allows you to specify key-value pairs just like in JSON, and a new map can be created by increasing the indentation level. The order of the keys does not matter. If the value of a map’s key is a multi-line string, you can use the ‘pipe’ character | to specify a literal block, which is very useful when specifying CLI commands.
A list, which is in fact an array, includes values listed in a specific order; each item starts with a dash followed by a space.
Most YAML values do not require quotes, however when a value contains special characters such as :-{}[]!#|>&%@ you will need to quote it. I use double quotes for this purpose, and if a value itself contains double quotes I escape them using a backslash.
An example of a YAML file using the same data types as the JSON file:
---
string: some text that can contain spaces
number: 10
boolean: true
emptyValue:
nestedObject:
  firstname: darren
  surname: johnson
nestedArray:
  - friday night
  - 6
  - january
  - true
command: |
  $message = "Hello World"
  Write-Host $message
Note that the empty value above could also be specified as emptyValue: null
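Because YAML is a superset of JSON, the nested structures above could equally be written using JSON-style syntax inside the YAML file. An illustrative fragment:

```yaml
---
# JSON "flow" syntax is valid YAML:
nestedObject: {"firstname": "darren", "surname": "johnson"}
nestedArray: ["friday night", 6, "january", true]
```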
HCL
HashiCorp Configuration Language (HCL) is a configuration language created by HashiCorp and is made up of Arguments & Blocks.
An argument assigns a value to a name much in the same way as a key value pair. An argument can have a list as a value and this is surrounded by square brackets.
A block is a container for other content and each block type defines how many labels must follow the block keyword. The block body is surrounded by curly brackets.
Strings need to be surrounded by double quotes, however numbers and boolean values don’t. Multi-line strings are supported using the ‘heredoc’ style of string.
An example of a fictitious resource defined in Terraform using the same data types would look as follows:
resource "provider_resource_type" "resource_name" {
  string     = "some text that can contain spaces"
  number     = 10
  boolean    = true
  emptyValue = null
  nestedObject {
    firstname = "darren"
    surname   = "johnson"
  }
  nestedArray = [
    "friday night",
    "6",
    "january",
    "true"
  ]
  command = <<POWERSHELL
$message = "Hello World"
Write-Host $message
POWERSHELL
}
A key benefit of using HCL is that Terraform is forgiving about formatting and will continue to function normally even if your indentation is inconsistent. There is also a built-in command, terraform fmt, that will automatically rewrite your code into the canonical format and style, and can even update some legacy syntax, which is cool and saves valuable time!
NO WINNER: All 3 formats had syntax checking and tools available to lint them correctly, so I couldn’t separate them.
Which Tool Met The Requirements Best?
Initially I didn’t have any hard requirements to adhere to for implementing an IaC tool, other than ensuring it was secure, scalable and worked with whichever cloud provider we ended up selecting. I wanted any tool to be extensible so I could use it to manage multiple cloud resource types, not just virtual machines.
As we still had to complete the RFP process and select a cloud provider I couldn’t rule out any tool based solely on requirements, but only Terraform worked across multiple providers including some on premises technologies such as VMware.
NO WINNER: Terraform was the leader here, but I couldn’t rule out other tools just because of extensibility.
Which Tool Was The Easiest To Troubleshoot?
I’m not going to compare all the tools here, as we ended up selecting Azure as our cloud provider via the RFP process, so I will just compare ARM Templates against Terraform.
To illustrate this, I’ll compare the content of the files required to deploy an empty resource group named ‘resourcegroupname’ in the West Europe region having a single tag of ’tagkey: tagvalue’ via both an ARM template and via a Terraform configuration.
ARM Template
2 files are required to deploy an ARM template in a reusable manner:
- template.json - the template of the resources to be deployed
- parameters.json - the parameters that are fed into the template that are unique to the deployment
template.json contains:
{
  "$schema": "https://schema.management.azure.com/schemas/2018-05-01/subscriptionDeploymentTemplate.json#",
  "contentVersion": "1.0.0.1",
  "parameters": {
    "rgName": {
      "type": "string"
    },
    "rgLocation": {
      "type": "string"
    },
    "tags": {
      "type": "object",
      "defaultValue": {}
    }
  },
  "variables": {},
  "resources": [
    {
      "type": "Microsoft.Resources/resourceGroups",
      "apiVersion": "2018-05-01",
      "location": "[parameters('rgLocation')]",
      "name": "[parameters('rgName')]",
      "properties": {},
      "tags": "[parameters('tags')]"
    }
  ],
  "outputs": {}
}
parameters.json contains:
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "rgName": {
      "value": "resourcegroupname"
    },
    "rgLocation": {
      "value": "westeurope"
    },
    "tags": {
      "value": {
        "tagkey": "tagvalue"
      }
    }
  }
}
Terraform Configuration
To deploy the same resource group via Terraform the following code is required:
resource "azurerm_resource_group" "resource_group_name" {
  name     = "resourcegroupname"
  location = "westeurope"
  tags = {
    tagkey = "tagvalue"
  }
}
I hope you agree that the Terraform configuration file is a lot simpler to read and understand. This is because Terraform uses the concept of providers, which know exactly which API has to be used and what version it needs to be, so this extra information does not need to be specified in the configuration. The result is far fewer lines of text than the JSON equivalent required for ARM template deployments, which becomes increasingly complex as more resources are deployed.
WINNER: Terraform: From simply looking at the code, Terraform was easier to understand and had a lot less text to review. Both solutions achieved the same business outcome.
What Support Was Available?
This was also key to my organisation as we were starting from scratch. I wanted to ensure when the project was complete, the engineers responsible for running the system could get access to sufficient support resources. I also wanted to minimise any support costs.
A lot of the community content around ARM templates at the time was either too basic or super advanced, and I didn’t feel like there was a sweet spot. I tried deploying a few resources with ARM templates, and when I hit issues they took me a long time to resolve. I’m sure this would have got easier in time, but it felt like a steep learning curve for someone with an automation background.
With Terraform, there was plenty of content out there including sample code I could easily understand. It might not have all been authored in line with best practices at the time, but within minutes of starting I was able to successfully deploy resources! I also checked out the GitHub repository and noticed the community were very active in helping to solve issues and bugs.
WINNER: Terraform: Cloud enables business agility, and spending too long debugging ARM templates could slow the project down and increase the time to deliver value back to the business.
What Were My Industry Peers Using For IaC?
I was fortunate enough to attend a couple of cloud conferences before we had to make a final decision, and whilst I was there I caught up with some old colleagues I hadn’t seen for years who had been deploying and managing cloud infrastructure using IaC at different companies. When I asked their preference on the available tools, there was one common answer - Terraform!
After the conferences, I set up a couple of sessions with them and they kindly walked me through how they were managing their cloud infrastructure using Terraform, and it just clicked.
Final Decision
After gathering some material from Gartner, I made a recommendation that we adopt Terraform as our IaC tool of choice. This was approved and we haven’t looked back since. I quickly built out a landing zone in Azure to enable us to prove some key design concepts, and we were able to deploy services in Azure with a consistent configuration in minutes.
After working with Terraform for over 4 years now, I know I made the right choice as its capabilities have been enhanced significantly and it supports a whole range of cloud services, not just infrastructure.
OVERALL WINNER: Terraform