My previous blog post described how to create a DC/OS cluster on Packet with Terraform. Even though running Terraform is simple and straightforward, it doesn't give much insight into the core installation process of DC/OS. To get a deeper understanding, let's pick one of the custom DC/OS installers. There are currently three custom DC/OS installers suitable for bare metal environments: the GUI installer, the CLI installer and the advanced installer. The GUI installer is a web application on the bootstrap node that asks you to enter the configuration for the cluster. When you hit 'Install', the app goes off and installs everything. The CLI installer performs the same steps as the GUI installer but uses a script instead of a GUI. Besides these, there are also cloud-specific DC/OS installers.
Despite its name, the advanced installer is in fact more basic than the GUI or CLI installer. In my experience it is also easier to use. Why? Because it only uses SSH, plain shell scripts and config files. The GUI and CLI installers might seem easier to use, but they are less forgiving: when an installation with the GUI or CLI fails, it's harder to figure out what went wrong and to recover from the broken installation. Another benefit of a basic installation process built on shell scripts is that it's easier to automate. Therefore I recommend using the advanced installer. Now let's create our infrastructure and kick off the installation process.
Installing a DC/OS cluster involves setting up a `bootstrap` machine, which hosts the configuration and packages DC/OS requires. The other machines are installed from the bootstrap node. To run a small cluster we need 5 machines: 1 bootstrap machine, 1 master and 3 agents, one of which will be a public agent. You can create the Packet devices from the Packet web UI or with the `packet baremetal` CLI command.
To make things easier, map `bootstrap`, `master1`, `agent1`, `agent2` and `agent3` to their private IPs in `/etc/hosts` on all devices. Alternatively, you can run your own DNS server for all nodes in the cluster. To secure the cluster you can use an iptables firewall, as I described in my previous blog post.
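If you'd rather not edit `/etc/hosts` by hand on five machines, here is a minimal sketch that prints the entries so you can append them on each device. The bootstrap and master addresses match the example `config.yaml` in this post; the agent addresses are hypothetical placeholders for your own private IPs.

```bash
#!/bin/bash
# Print /etc/hosts entries that map DC/OS node names to private IPs.
set -o nounset -o errexit

# Each argument has the form name=ip; output is "ip<TAB>name".
hosts_entries() {
  local pair
  for pair in "$@"; do
    printf '%s\t%s\n' "${pair#*=}" "${pair%%=*}"
  done
}

# The agent IPs below are hypothetical placeholders: use your own.
NODES=(
  bootstrap=10.11.12.13
  master1=11.12.13.14
  agent1=10.0.0.21
  agent2=10.0.0.22
  agent3=10.0.0.23
)

hosts_entries "${NODES[@]}"
```

Save it as, for example, `hosts.sh` and append its output on every device with `bash hosts.sh | sudo tee -a /etc/hosts`.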
The advanced installer performs six steps.
Let's look at each step in more detail.
This 700+ MB script, `dcos_generate_config.sh`, contains everything you need to configure DC/OS. You can download it from the DC/OS release page.
Plain and simple. It creates the `genconf` folder and an initial `config.yaml` file.
Now edit the generated `config.yaml`. You only have to fill in the private IP addresses of the bootstrap and master nodes and the port the nginx server on the bootstrap node will listen on (see the nginx step below).
```yaml
---
bootstrap_url: http://10.11.12.13:5000
cluster_name: 'my-cluster'
exhibitor_storage_backend: static
ip_detect_filename: /genconf/ip-detect.sh
master_discovery: static
master_list:
- 11.12.13.14
resolvers:
- 8.8.4.4
- 8.8.8.8
use_proxy: 'false'
```
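Because the advanced installer only needs plain files, you can also generate `config.yaml` from a script. A minimal sketch, assuming the bootstrap IP, nginx port and master IPs are passed as arguments; `write_config` is a hypothetical helper and the remaining values mirror the example config.

```bash
#!/bin/bash
# Write a config.yaml like the example from a few parameters.
# Arguments: output dir, bootstrap IP, nginx port, one or more master IPs.
set -o nounset -o errexit

write_config() {
  local dir=$1 bootstrap_ip=$2 port=$3
  shift 3
  mkdir -p "$dir"
  {
    cat <<EOF
---
bootstrap_url: http://${bootstrap_ip}:${port}
cluster_name: 'my-cluster'
exhibitor_storage_backend: static
ip_detect_filename: /genconf/ip-detect.sh
master_discovery: static
master_list:
EOF
    # One "- <ip>" list item per master.
    printf -- '- %s\n' "$@"
    cat <<EOF
resolvers:
- 8.8.4.4
- 8.8.8.8
use_proxy: 'false'
EOF
  } > "$dir/config.yaml"
}
```

For the cluster in this post you would call `write_config genconf 10.11.12.13 5000 11.12.13.14`.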
The purpose of `ip-detect.sh` is to return the private IPv4 address that your master or agent node will be associated with during its lifetime. This IP is meant to be stable for this node: if it changes, the node should be wiped and reinstalled. The script can either use `ip addr` commands or a cloud API to determine the IP address. Since Packet devices use a bonded network interface, the script outputs the private IP of the node on the `bond0` interface.
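```bash
#!/bin/bash
# Print the private IPv4 address of this node.
# On Packet the private address sits on the bond0:0 label of the bonded interface.
set -o nounset -o errexit
export PATH=/usr/sbin:/usr/bin:$PATH
ip -4 address show bond0 label bond0:0 | grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | head -1
```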
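DC/OS expects the ip-detect script to print exactly one IPv4 address, so it is worth checking the script on each node before installing. A minimal sketch of such a check; `check_ip_detect` is a hypothetical helper that takes the path of the script as its argument.

```bash
#!/bin/bash
# Run an ip-detect script and verify it prints exactly one IPv4 address.
set -o nounset -o errexit

check_ip_detect() {
  local script=$1 out
  # Four dot-separated groups of 1-3 digits, nothing else (no extra lines).
  local re='^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
  out="$(bash "$script")"
  if [[ $out =~ $re ]]; then
    echo "ok: $out"
  else
    echo "ip-detect returned unexpected output: '$out'" >&2
    return 1
  fi
}
```

On a node you would run `check_ip_detect genconf/ip-detect.sh`; a non-zero exit status means the output needs fixing before you continue.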
Now we are ready to run nginx, which will serve the packages and configuration under `genconf/serve`.
```bash
sudo docker run -d -p 5000:80 -v "$PWD/genconf/serve":/usr/share/nginx/html:ro nginx
```
Make sure the host port (5000 here) matches the port in `bootstrap_url` in `config.yaml` above.
Now check that you can download the DC/OS install script from `http://10.11.12.13:5000/dcos_install.sh` on one of the agents. If this works, you are ready for the next step.
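A minimal sketch for scripting this check. `bootstrap_url` reads the URL from `config.yaml`, so the check always uses the port you configured, and `check_from` asks a remote node to fetch `dcos_install.sh` over SSH. Both function names are hypothetical; the parsing assumes `bootstrap_url` sits on a single unquoted line as in the example config, and `check_from` assumes you can SSH from the bootstrap node to the agents.

```bash
#!/bin/bash
set -o nounset -o errexit

# Print the value of bootstrap_url from a config.yaml file.
bootstrap_url() {
  sed -n 's/^bootstrap_url:[[:space:]]*//p' "$1"
}

# Ask a remote node to download dcos_install.sh from the bootstrap node.
# curl -f makes HTTP errors (e.g. 404) count as failures.
check_from() {
  local host=$1 url=$2
  ssh "$host" curl -fsS -o /dev/null "$url/dcos_install.sh" && echo "$host: ok"
}
```

For example: `check_from agent1 "$(bootstrap_url genconf/config.yaml)"`.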
Everything is set up and we can now run the install script on all the master and agent machines. The `dcos_install.sh` script gives good feedback on what it installs: it lists all packages and the services it will start. If one of the required ports is already in use by a running process, the script exits with an error code and shows what is wrong. See the troubleshooting tips below for how to fix this.
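Because `dcos_install.sh` is a plain shell script, installing all nodes is easy to automate. A minimal sketch, assuming passwordless SSH and sudo from the bootstrap node to the host names in `/etc/hosts`, and assuming agent3 is the public agent. The install script takes the node role (`master`, `slave` or `slave_public`) as its argument. With the default DRY_RUN=1 the sketch only prints the SSH commands; run it with DRY_RUN=0 to install for real. The master is installed before the agents.

```bash
#!/bin/bash
set -o nounset -o errexit

BOOTSTRAP_URL=${BOOTSTRAP_URL:-http://10.11.12.13:5000}

# Run a command, or only print it when DRY_RUN=1 (the default).
run() {
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Download dcos_install.sh on a node and install it with the given role.
install_node() {
  local host=$1 role=$2
  run ssh "$host" "mkdir -p /tmp/dcos && cd /tmp/dcos && curl -fsSO ${BOOTSTRAP_URL}/dcos_install.sh && sudo bash dcos_install.sh ${role}"
}

install_cluster() {
  install_node master1 master
  install_node agent1 slave
  install_node agent2 slave
  # Assumption: agent3 is the public agent.
  install_node agent3 slave_public
}

install_cluster
```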
During installation a few things can go wrong.
To get rid of an existing partial installation, first stop all DC/OS-related systemd units. You can list them by running:
```bash
systemctl show -p Wants dcos.target
```
On an agent node this prints:
```
Wants=dcos-metrics-agent.socket dcos-logrotate-agent.service dcos-logrotate-agent.timer dcos-docker-gc.service dcos-adminrouter-agent-reload.service dcos-signal.timer dcos-gen-resolvconf.service dcos-spartan-watchdog.service dcos-3dt.service dcos-adminrouter-agent-reload.timer dcos-mesos-slave.service dcos-rexray.service dcos-epmd.service dcos-spartan.service dcos-3dt.socket dcos-gen-resolvconf.timer dcos-spartan-watchdog.timer dcos-navstar.service dcos-docker-gc.timer dcos-metrics-agent.service dcos-pkgpanda-api.socket dcos-log-agent.socket dcos-adminrouter-agent.service dcos-pkgpanda-api.service dcos-log-agent.service
```
You can stop them all with this one-liner:
```bash
systemctl stop -- $(systemctl show -p Wants dcos.target | cut -d= -f2)
```
Note that running `systemctl stop dcos.target` does not stop these units. The Unix StackExchange thread 'How to stop all systemd units belonging to the same target?' explains why.
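If you also want to verify the result, the sketch below wraps the same `systemctl show` parsing in a function and reports any unit that is still active after the stop. It is a hypothetical helper, not part of DC/OS; with the default DRY_RUN=1 it only prints the stop command.

```bash
#!/bin/bash
set -o nounset -o errexit

# Turn "Wants=a.service b.socket" (on stdin) into one unit name per line.
units_from_wants() {
  cut -d= -f2- | tr ' ' '\n' | sed '/^$/d'
}

# Stop all units wanted by dcos.target, then report any still active.
stop_dcos_units() {
  local units unit
  units="$(systemctl show -p Wants dcos.target | units_from_wants)"
  if [[ -z "$units" ]]; then
    return 0
  fi
  # $units is left unquoted on purpose so it splits into separate unit names.
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo "systemctl stop --" $units
    return 0
  fi
  systemctl stop -- $units
  for unit in $units; do
    if systemctl is-active --quiet "$unit"; then
      echo "still active: $unit"
    fi
  done
}
```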
With the DC/OS units stopped, remove the following folders and files:
Configuration folders
State folders
This has to be done manually because the installers do not yet support removing partial installations; see the corresponding issue on the DC/OS JIRA.
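To make this cleanup repeatable, here is a minimal sketch that removes whatever paths you pass to it. No DC/OS paths are hard-coded: pass the configuration and state folders yourself, and keep the default dry run (DRY_RUN=1) until the printed list looks right. `remove_leftovers` is a hypothetical helper name.

```bash
#!/bin/bash
set -o nounset -o errexit

# Remove the given files and folders left behind by a partial install.
# Only prints what it would remove unless DRY_RUN=0; missing paths are skipped.
remove_leftovers() {
  local path
  for path in "$@"; do
    if [[ ! -e "$path" ]]; then
      continue
    fi
    if [[ "${DRY_RUN:-1}" == 1 ]]; then
      echo "would remove: $path"
    else
      rm -rf -- "$path"
      echo "removed: $path"
    fi
  done
}

remove_leftovers "$@"
```

Saved as, say, `cleanup.sh`, you would run `sudo env DRY_RUN=0 bash cleanup.sh <folder>...` on each node.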
The Components view in the DC/OS Dashboard shows the health of all components. If some of them are unhealthy, run `systemctl status <dcos-service-name>` for the unhealthy service or use `journalctl -u <dcos-service-name>` to check its logs. Another problem is that machines cannot communicate with each other because of firewall rules. Add a log rule to your firewall, if you don't already have one, so you can see which packets are being dropped.
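A minimal sketch of such a log rule with iptables, assuming the INPUT chain relies on a default DROP policy rather than ending in an explicit DROP rule (if it does, insert the LOG rule just before that DROP rule instead of appending it). Dropped packets then show up in the kernel log, for example via `journalctl -k`. The log prefix and rate limit are arbitrary choices. With the default DRY_RUN=1 the sketch only prints the command.

```bash
#!/bin/bash
set -o nounset -o errexit

# Log packets that reach the end of the INPUT chain (and are then dropped
# by the default policy). The rate limit keeps the kernel log readable.
LOG_RULE=(INPUT -m limit --limit 5/min -j LOG --log-prefix "iptables-dropped: " --log-level 4)

if [[ "${DRY_RUN:-1}" == 1 ]]; then
  echo iptables -A "${LOG_RULE[@]}"
else
  # -C checks whether the rule already exists so it is only added once.
  iptables -C "${LOG_RULE[@]}" 2>/dev/null || iptables -A "${LOG_RULE[@]}"
fi
```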
When the installation is complete, you can visit `http://master1` and log in with your social account to start using DC/OS. The advanced installer gives the best insight into a DC/OS installation and can be a building block for further automation.
* DC/OS 1.9 advanced installer documentation
* Unix StackExchange 'How to stop all systemd units belonging to the same target?'
Thanks for reading! Questions? Comment on the blog or talk to us at @ContainerSoluti or to me at @Frank_Scholten.