Terraforming a Nomad cluster

All the code from this blog post is available on GitHub.

At HashiConf, HashiCorp announced Nomad. According to the website, it's a cluster manager and scheduler for microservices and batch workloads that works with Docker out of the box. And it's all in just one binary! Of course that aroused my interest, and I decided to play around with it. But rather than set up Vagrant boxes, I prefer to use proper VMs in Google's cloud. And by coincidence, HashiCorp has a very neat tool to automate that. It's called Terraform!

The scripts

I started with the credentials for the provider. The actual values will be read from an accompanying variables.tf file.

 
provider "google" {
    account_file = "${var.account_file}"
    project = "${var.project}"
    region = "${var.region}"
}

Then for the instance I used the default settings for disk and network. A file provisioner copies in a server.hcl containing a basic server definition, and a remote-exec provisioner runs the install script, for which I used the contents of the Vagrantfile provided by HashiCorp.

 
resource "google_compute_instance" "nomad-node" {
    count = "${var.numberofnodes}"
    name = "nomad${count.index+1}"
    machine_type = "${var.machine_type}"
    zone = "${var.zone}"
    # tag matched by target_tags in the nomad-ssh firewall rule
    tags = ["ssh"]
    
    disk {
      image = "${var.image}"
      type = "pd-ssd"
    }
 
    # network interface
    network_interface {
      network = "${google_compute_network.nomad-net.name}"
      access_config {
        // ephemeral address
      }
    }
    
    # nomad version
    metadata {
      nomad_version = "${var.nomad_version}"
    }
 
    # define default connection for remote provisioners
    connection {
      user = "${var.gce_ssh_user}"
      key_file = "${var.gce_ssh_private_key_file}"
      agent = "false"
    }
 
    # copy files
    provisioner "file" {
      source = "resources/server.hcl"
      destination = "/home/${var.gce_ssh_user}/server.hcl"
    }
 
    # install 
    provisioner "remote-exec" {
      scripts = [
        "resources/install.sh"
      ]
    }
}
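
The server.hcl that the file provisioner copies is not shown above. As a rough sketch only, a configuration that enables both server and client mode could be written out like this. The data_dir path and the bootstrap_expect value are assumptions for illustration, not the contents of the actual file.

```shell
#!/bin/sh
# Write a hypothetical resources/server.hcl for the file provisioner to copy.
mkdir -p resources
cat > resources/server.hcl <<'EOF'
# Assumed location for Nomad's state; any writable directory works.
data_dir = "/var/lib/nomad"

# Run the agent as a server...
server {
    enabled = true
    # Assumed: the number of servers to wait for before electing a leader.
    bootstrap_expect = 3
}

# ...and as a client on the same node, so it can also run tasks.
client {
    enabled = true
}
EOF
```

With bootstrap_expect set to the number of nodes, the servers wait for each other before electing a leader, which fits a cluster that is joined by hand afterwards.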

The network definition speaks for itself:

resource "google_compute_network" "nomad-net" {
    name = "${var.name}-net"
    ipv4_range = "${var.network}"
}

And I need to open up some ports in the firewall to be able to access the servers. I set up SSH access and allow all traffic between the nodes. I also add my local public address so I can reach any port on the machines.


resource "google_compute_firewall" "nomad-ssh" {
    name = "${var.name}-nomad-ssh"
    network = "${google_compute_network.nomad-net.name}"
 
    allow {
        protocol = "tcp"
        ports = ["22"]
    }
 
    target_tags = ["ssh"]
    source_ranges = ["0.0.0.0/0"]
}
 
resource "google_compute_firewall" "nomad-internal" {
    name = "${var.name}-nomad-internal"
    network = "${google_compute_network.nomad-net.name}"
 
    allow {
        protocol = "tcp"
        ports = ["1-65535"]
    }
    allow {
        protocol = "udp"
        ports = ["1-65535"]
    }
    allow {
        protocol = "icmp"
    }
 
    source_ranges = ["${google_compute_network.nomad-net.ipv4_range}","${var.localaddress}"]
 
}

The last piece is the variables file.

 
## credential stuff
# path to the account file
variable "account_file" {
  default = "/path/to/account.json"
}
# the username to connect with
variable "gce_ssh_user" {
  default = "user"
}
# the private key of the user
variable "gce_ssh_private_key_file" {
  default = "/home/user/.ssh/google_compute_engine"
}
 
## google project stuff
# the google region where the cluster should be created
variable "region" {
  default = "europe-west1"
}
# the google zone where the cluster should be created
variable "zone" {
  default = "europe-west1-d"
}
# the name of the google project
variable "project" {
  default = "myproject"
}
# image to use for installation
variable "image" {
    default = "ubuntu-os-cloud/ubuntu-1504-vivid-v20150911"
}
variable "machine_type" {
    default = "g1-small"
}
 
## network stuff
# the address of the subnet in CIDR
variable "network" {
    default = "10.11.12.0/24"
}
# your public local address in CIDR notation, for unlimited access to the cluster (replace the placeholder default)
variable "localaddress" {
  default = "0.0.0.0"
}
 
# the name of the cluster
variable "name" {
  default = "nomad"
}
 
# the version of nomad to use
variable "nomad_version" {
  default = "0.1.0"
}
 
# the number of nodes to create (the default is an assumed example value)
variable "numberofnodes" {
  default = "3"
}

All of this is available on GitHub, so you can just pull from there, adjust the variables, and run terraform plan and terraform apply.
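
Instead of editing the defaults in variables.tf, the values can also be overridden in a terraform.tfvars file, which Terraform loads automatically from the working directory. A small sketch, in which every value is a placeholder to replace with your own:

```shell
#!/bin/sh
# Write overrides for variables declared in variables.tf.
# Every value here is a placeholder, not a real setting.
cat > terraform.tfvars <<'EOF'
account_file = "/path/to/account.json"
project = "myproject"
gce_ssh_user = "user"
gce_ssh_private_key_file = "/home/user/.ssh/google_compute_engine"
# 203.0.113.7 is a documentation address; use your own public IP.
localaddress = "203.0.113.7/32"
EOF
```

After that, terraform plan shows what would be created and terraform apply creates it.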

Starting the cluster

It should be possible to bootstrap the process as shown in the Atlas examples on GitHub. I'm currently working on implementing that, but I ran into yak-shaving issues with systemd. I also chose to run Nomad in both client and server mode on each node, which makes for a nice Terraform variable interpolation challenge. I have no clue yet whether running client and server combined on each node is recommended, but I guess I'll find out along the way.
So for now we'll start the cluster manually. Once terraform apply is done, open an SSH connection to each of the servers and start the Nomad agent. Be sure to bind to the address of the network interface; if you don't, the agent binds to the loopback interface and other agents won't be able to reach it.

 
sudo nomad agent -config server.hcl -bind=10.11.12.4
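
The address in that command differs per node. One way to look it up on the node itself is hostname -I, which on Linux prints the host's addresses without the loopback one. The sketch below assumes the first address listed is the one on the cluster network, which is worth checking on a node with more than one interface. first_address is a helper name made up for this example.

```shell
#!/bin/sh
# Return the first entry of a whitespace-separated address list.
first_address() {
    printf '%s\n' "$1" | awk '{ print $1 }'
}

# hostname -I lists the node's addresses, excluding loopback (Linux only).
BIND_ADDR=$(first_address "$(hostname -I 2>/dev/null)")

# Print the command rather than running it, so it can be reviewed first.
echo "sudo nomad agent -config server.hcl -bind=$BIND_ADDR"
```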

Once all the agents are running, open an extra connection to one of the nodes and join the other ones from there. Specify the URL of the remote server using the -address parameter. Use the local address as the argument to server-join.

 
nomad server-join -address http://$OTHER_SERVER:4646 $MYADDRESS
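
With more than a couple of nodes, typing that command for each peer gets tedious. A possible way to generate the commands from a list of addresses is sketched here. join_commands is a name made up for this example, and the addresses are assumed values from the 10.11.12.0/24 subnet, so check them against your own nodes.

```shell
#!/bin/sh
# Print one server-join command per peer, skipping our own address.
# Usage: join_commands MYADDRESS PEER...
join_commands() {
    my_address="$1"
    shift
    for other in "$@"; do
        if [ "$other" != "$my_address" ]; then
            echo "nomad server-join -address http://$other:4646 $my_address"
        fi
    done
}

# Example run from the node at 10.11.12.2 (assumed addresses).
join_commands 10.11.12.2 10.11.12.2 10.11.12.3 10.11.12.4
```

The function only prints the commands so they can be checked first; piping its output into sh runs them.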

Do this for every OTHER_SERVER in your cluster. Once they are all connected you can start a job, for instance by running nomad init, which creates an example.nomad file with a Redis task.
Before you run nomad run example.nomad, however, set the NOMAD_ADDR environment variable, or else the command will connect to localhost.

export NOMAD_ADDR="http://nomad1:4646"
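
Put together, the last steps look roughly like this. The function name run_example is made up for this sketch, and nomad1 is assumed to resolve from wherever you run it.

```shell
#!/bin/sh
# Point the Nomad CLI at a node in the cluster instead of localhost.
export NOMAD_ADDR="http://nomad1:4646"

# Generate the example job, submit it, and list the jobs.
run_example() {
    nomad init &&
    nomad run example.nomad &&
    nomad status
}
```

Calling run_example runs the three commands in order and stops at the first one that fails.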

I will be trying out some more of Nomad's features, so expect more blog posts on that. I'll also be trying to bootstrap this stuff, so watch the GitHub page for developments in that direction. As always, keep those comments and suggestions coming!
