Ansible Crash Course

As we dawdle till every facet of our lives is containerised, we still find ourselves in need of ways to automate the provisioning of actual servers and operating systems. My current favourite way to do this is through Ansible.

There are a few reasons why I prefer Ansible over its heavier-weight peers, like Chef or Puppet.

Ansible is not opinionated. Each framework brings a lot to the table. Ansible does too but it does not force all its features down your throat. I do not need to understand the nuances of how variables are evaluated in 20 different layers or how to deploy to a Google-size cluster when all I want to do is install a simple Tomcat server.

Ansible manages to scale based on the user needs. It is worryingly similar to The One Ring, expanding and shrinking based on its wearer. If you just want to execute a couple of commands against a bunch of remote servers, you can. If you want to leverage reusable composable scripts to automate configuration of a whole server farm, you can do that too.

Ansible has a gentle learning curve. It may be a side-effect of having-to-use-only-what-you-need, but I find Ansible a lot easier to get into. Its simple, readable scripts, which just execute in the order you write them, are as straightforward as it gets.

Ansible requires very little upfront investment. Ansible is server-less. There is no need to set up a provisioning server, bootstrap agents on each target machine, set up messaging, etc. Some of the other frameworks actually need a mini-framework just to automate the setup of the thing that is going to be setting up everything else. But not Ansible. You just point it at a bunch of servers and watch it do its magic.

After singing its praises, I do need to point out that while Ansible is suitable for a great many use cases, it is not for EVERY use case. If you are in charge of 1000 servers, first I’d like to congratulate you on your job security, but I also recommend looking at one of the heavier frameworks for your automation needs. They do tend to scale better at such large volumes.

But for the rest of us mere mortals, Ansible is plenty.


Ansible only needs to be installed on the control machine. This can be a developer’s machine, a build server or whatever. From here we remotely execute Tasks on all our target servers. The only current restriction is that the control machine has to run a Unix-based operating system, i.e. Unix/Linux/macOS, even when managing Windows servers.

Ansible does its work by SSH-ing into the target machine and making the necessary changes. The minimum requirement for the target machines is to have Python 2.4 or later installed, which pretty much any self-respecting server already does.

Ansible lets you execute ad-hoc commands against a bunch of target servers but that is not very useful in real-life scenarios so we move right along to scripting.
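For a taste before we move on, here are two ad-hoc invocations — the first uses the ping module to check connectivity to every host in the inventory, the second runs an arbitrary command on all of them (the inventory file and remote user depend on your setup):

```shell
$ ansible all -i hosts -m ping -u root
$ ansible all -i hosts -a "uptime" -u root
```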

Ansible scripts are called Playbooks and they are written in YAML syntax. Each playbook consists of one or more Plays which are a set of Tasks executed against one or more target machines.

Baby’s First Playbook

Let’s create our first playbook. If you are not familiar with YAML, I strongly recommend you stop here and crash course that too. It won’t take long. I’ll wait for you here.

Here is a minimalist playbook with a single play, that has a single task:

- hosts: node1
  remote_user: root

  tasks:
    - name: create an amazing folder
      file: path=/tmp/amazing state=directory

That’s it! Above I am running a single task as the root user, to create an amazing directory on the target machine, node1. But you did not need me to tell you this because the script is stupidly readable.

Each Task has a name which provides a heading and documentation for what that Task does. In some scenarios, you also refer to the task using this name. Each task also configures a Module, in this case file. Modules are what really provide the functionality of Ansible. For example, the file Module provides functions to interact with the file system, such as creating directories (above), setting permissions, symlinking, etc.

There are hundreds of Modules shipped with Ansible. Some other good ones to know are copy (copies files to a remote location), command (executes commands on the remote server), yum/apt (I’ll let you guess what these do), template (renders templated files and copies them to a remote location), docker (because Docker!), etc.

Something that may look a bit peculiar about our first task is that it actually defines an end state, not an action. This is intentional. The plays describe the desired outcome while the Modules make that happen. As an Ansibler, you are concerned with the outcome, not the how.

This ties into the idempotency rule of Ansible plays. This is a fancy way of saying that if I ran the same play on a server 100 times consecutively, the server’s end state should remain the same, whether I check that outcome after run #1 or run #100. In the above script, there will always be an amazing folder at that path, regardless of how many times I run the playbook.

Keep this in mind when creating your own playbooks. You should be confident that running them multiple times will not stuff up your target machines.

Speaking of running the playbook, here is how you execute the one above from the command line of the control machine:

$ ansible-playbook -i hosts my-playbook.yml

The hosts file is an Inventory. It is just a plain file that describes the servers we are targeting. Here is an example of its content:

node1 ansible_host=

We can define more hosts, one per line. We can also put them in Groups and execute playbooks against a Group, instead of single hosts. There are many more aspects to the inventory files, such as using Patterns. See Ansible’s guide on Inventory for more information.
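As a sketch (the host names are made up and 192.0.2.10 is just a documentation address), a grouped inventory could look like this:

```
node1 ansible_host=192.0.2.10
node2

[webservers]
node1
node2

[databases]
node3
```

A play with hosts: webservers would then run against node1 and node2 only.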

For cloud platforms, you may be interested in Dynamic Inventory which sources the list of hosts from a third-party inventory, like Cobbler or a cloud platform’s own inventory.

Once you run a playbook, you will receive output like this:

TASK [create an amazing folder] *******************************************************************
ok: [node1]

This means that the task was executed against node1 and the amazing folder is already there, so no changes are required. In other scenarios, you may receive changed, skipped or failed instead of ok. changed means that the folder was not found and Ansible went ahead and created it.


Variables

Static Ansible playbooks have a fairly short shelf-life. What you want is to give the users of your playbooks the ability to control some aspects of them. Ansible serves this need with Variables.

Check out the following script:

- hosts: node1
  remote_user: root
  vars:
    java_version: 1.8.0
  tasks:
    - name: install JDK
      yum: name=java-{{ java_version }}-openjdk-devel state=present
    - name: set Java version as default
      alternatives:
        name: "{{ item }}"
        link: /usr/bin/{{ item }}
        path: /usr/lib/jvm/jre-{{ java_version }}-openjdk.x86_64/bin/{{ item }}
      with_items:
        - java
        - jar

As you can probably tell, the {{ }} notation is used for variables. Here we externalised the Java version for the user to play with.

I have also thrown in an example of how you can break down the module configuration across multiple lines when it starts getting too long. This is done by using a sub-map, instead of x=a y=b ... notation.

If that was not enough, the same task also demonstrates how loops are used in Ansible. The with_items sub-element iterates through its array and executes the task for each item, assigning the array element to the item variable.

Variables are not limited to playbooks though. They can also be used in the supporting files that you would like to copy to the target machines. The following playbook copies one such template across:

- hosts: node1
  remote_user: root
  vars:
    site_name: "Joe's Rusty Nails"
  tasks:
    - name: copy home page
      template: src=index.html.j2 dest=/var/www/html/index.html

Ansible uses Jinja2 as its templating engine. Here is the index.html.j2 file referenced above:

    <h1>Welcome to {{ site_name }}.</h1>


To get our playbooks ready for primetime, we need to discuss a couple more Ansible features:


Handlers

Handlers are a way to trigger special tasks based on whether a play task made a change. The only useful use case I have found for this is restarting an application which cannot automatically pick up the changes I have made without a restart.

For example, here we restart the Mule server after updating the log configuration file:

    - name: copy in the log config
      copy: src=log4j2.xml dest={{ mule_home }}/conf/  # source file name is assumed
      notify:
        - restart mule

  handlers:
    - name: restart mule
      service: name=mule state=restarted

A handler’s name must be globally unique but besides that, handlers are just like normal tasks. They are executed at the end of the play if one or more other tasks have “notified” them.

Gathering Facts

Before Ansible executes a playbook, it runs a fact-gathering task to get some details about the target node. Depending on the privilege of the executing user, we can get info about the operating system, devices, mounts, and much more.
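You can inspect everything Ansible knows about a node by invoking the setup module ad-hoc and browsing its output:

```shell
$ ansible node1 -i hosts -m setup -u root
```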

This information can be useful if you have a mixed ecosystem and you need to execute different tasks depending on the target machine.

This, in conjunction with conditionals, lets you create very flexible playbooks. Here is a contrived example:

- hosts: all
  remote_user: root

  tasks:
    - name: yum install perl
      yum: name=perl state=present
      when: ansible_distribution == 'CentOS'
    - name: apt install perl
      apt: name=perl state=present
      when: ansible_distribution == 'Debian'

This is a contrived example since Ansible now gives you the package module to do this in an OS-agnostic manner.
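With package, the two conditional tasks above collapse into a single task that delegates to whichever package manager the target machine uses:

```yaml
- hosts: all
  remote_user: root

  tasks:
    - name: install perl
      package: name=perl state=present
```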

Getting Advanced

By now, you know enough about Ansible to be dangerous. Initially, you will write single-file playbooks. Then after a while, the little voice inside tells you to break things into more reusable components and organise the supporting files.

Ansible lets you use whatever convention you want here. You can load common and/or reusable scripts using the include command. You can refer to the supporting files, like templates, using a relative path off the playbook location.
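As a sketch (common-tasks.yml is a made-up file name), pulling a shared task file into a playbook looks like this:

```yaml
- hosts: node1
  remote_user: root

  tasks:
    - include: common-tasks.yml
```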

Ansible lets you do this because it is not opinionated and flexes to fit your style. That said, for the more submissive of us, Ansible does provide a series of conventions for creating reusable playbooks. These are called Roles.

A role is a set of tasks that can be assigned to a target host. You can slice-and-dice roles as you see fit. For example, you can have roles like “build node” vs. “esb node”; or you can split roles by product, e.g. “fuse” vs. “amq” vs. “jon-server”. It is entirely up to you.

- hosts: esb-prod
  roles:
    - fuse
    - amq

The roles themselves are just a convention for laying out your tasks. It removes the need to explicitly include other scripts. The following files get included automatically:

File(s) Description
./my-playbook.yml The playbook above - you explicitly execute this file
./roles/fuse/tasks/main.yml The tasks for the role fuse
./roles/fuse/handlers/main.yml The handlers for the role
./roles/fuse/defaults/main.yml The default variables for the role
./roles/fuse/files/* The static files that will be copied to the remote server as part of this role’s tasks
./roles/fuse/templates/* The Jinja2 templates that will be copied to the remote server as part of this role’s tasks
./roles/fuse/vars/main.yml Any additional variables that the role wants to add to the play
./roles/fuse/meta/main.yml Contains metadata about the role, including role dependencies

You do not need to have all these folders in your role. A minimal role could be only the tasks/main.yml file.
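As a sketch of that minimal case (the task inside is a made-up placeholder), the entire fuse role could be this one file:

```yaml
# roles/fuse/tasks/main.yml
- name: create the fuse user
  user: name=fuse state=present
```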

Another handy feature of roles is that any files in the files or templates folders are automatically accessible without needing to prefix them with those directory names.
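For example, a task inside the fuse role can refer to a file by its bare name and Ansible resolves it against the role’s files folder, i.e. roles/fuse/files/fuse.cfg here (the file name is made up):

```yaml
- name: copy the config across
  copy: src=fuse.cfg dest=/etc/fuse.cfg
```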

Again, you do not need to use this structure but once you start doing larger playbooks or if you want to share your reusable assets with other people, this structure becomes very useful.

Going Further

Here are some final tips to help you along the way.

Ensure Idempotency

As mentioned before, you need to ensure that you can run your playbook over and over against the same target machine without adverse effects. Ansible ensures this in most scenarios but sometimes you need to step in.

For example, the following script ensures that Fuse is only downloaded and installed if it is not already there:

- name: check if Fuse is already installed
  stat: path={{ fuse_home }}/bin/fuse
  register: fuse_binary
- block:
    - name: clean up download directory
      file:
        path: "{{ fuse_installation_path }}/{{ fuse_downloaded_file }}"
        state: absent
    - name: download Fuse zip file
      get_url:
        url: "{{ fuse_download_url }}"
        dest: "{{ fuse_installation_path }}/{{ fuse_downloaded_file }}"
    - name: explode Fuse zip
      unarchive:
        src: "{{ fuse_installation_path }}/{{ fuse_downloaded_file }}"
        dest: "{{ fuse_installation_path }}"
        copy: no
        group: "{{ fuse_shell_group }}"
        owner: "{{ fuse_shell_user }}"
  when: not fuse_binary.stat.exists

The first task captures statistics about the fuse executable and stores them in the fuse_binary variable. The next block of tasks is only executed when the binary file does not exist.

The stat module is a good way of checking the status of files and folders on the remote system.

The above example also introduces the concept of a block, new to Ansible 2. Prior to Ansible 2, you had to repeat the when element in each task. Now, it can be factored out and applied to an entire block.

Target Specific Hosts

While running a playbook over the entire ecosystem is a powerful feature, more often than not, I find myself wanting to execute a playbook against a certain set of nodes.

One solution to this problem is to use the --limit flag. The following invocation limits the execution to a specific group of hosts from the inventory file:

$ ansible-playbook -i hosts --limit preprod my-playbook.yml

However, I still find this clunky: if someone forgets to include this flag, they will essentially run the playbook over every host configured.

My preferred approach is to parameterise the target host:

- hosts: "{{ target }}"
  remote_user: root

Now I can execute the playbook against node1 only by simply invoking the following command:

$ ansible-playbook -i hosts -e target=node1 my-playbook.yml
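To also guard against someone omitting the -e flag entirely, Jinja2’s default filter can point the play at a group that matches nothing (no_hosts is a made-up name that should not exist in your inventory):

```yaml
- hosts: "{{ target | default('no_hosts') }}"
  remote_user: root
```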

Turn Off Fact Gathering

A lot of my use cases involve dealing with an ecosystem of identical nodes. In these scenarios, I hardly ever need to refer to any of the gathered facts as part of the playbook.

In these situations, I prefer to turn off fact gathering. Fact gathering is a fairly time-consuming task and disabling it significantly shortens the running time of a playbook.
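Disabling it is a one-line setting at the play level:

```yaml
- hosts: all
  remote_user: root
  gather_facts: no
```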

Follow Best Practices

Finally, I recommend a close study of Ansible’s official Best Practices to ensure you are writing the best playbooks possible.

Final Words

I hope this crash course has given you the confidence to start writing your own playbooks.

Now go ahead. Spin up a VM or an EC2 instance and start Ansible-ing! (The verb is a work-in-progress.)

