In this blog post, I describe how we created DeployDocs to improve the continuous integration of the micro-services in the Socks Shop and the platforms it can be deployed to.
Weave's Socks Shop is a reference micro-service application, to which Container Solutions has contributed quite a bit.
The shop can be deployed on a number of platforms, including your local machine via Docker Compose, or the cloud via Kubernetes, Mesos, Docker Swarm and Nomad. I am sure that more platforms will follow over time.
My colleague Alex has written a blog post, which defined different levels of testing of a single micro-service. However, the deployment scripts of the supported services are not automatically tested.
We do have a Kubernetes cluster that we use as a staging environment. Any service running on top of this cluster is updated as soon as a PR is merged into the master branch of the respective git repository. We perform this update of the staging environment from within the Travis build. After all tests have passed and the container image has been built and pushed to Docker Hub, Travis runs a deploy step. See the .travis.yml file in the orders micro-service for an example of how this works. In this deploy step, Travis runs the deploy.sh script via SSH on a bastion host which has access to the Kubernetes cluster.
I started to repair the documentation and code used to deploy the shop to some of the platforms. This involved copying and pasting the steps from the documentation into a terminal to test whether the deployment documentation and code were up to date and working.
I realized that I could automate this laborious and error-prone process by annotating the documentation and letting a program perform the steps. I implemented exactly this, and after some thought about naming, we called the annotated documents DeployDocs.
Running all DeployDoc tests after any change in one of the git repositories is a costly affair; each of the deployment targets will create a cluster of virtual machines in EC2.
To reduce the cost, but still have continuous integration, we created a helper program that is run once a day on Travis CI via Cron Jobs. This program checks if any of the micro-services and/or deployment instructions have changed since the last Cron Job started, and if so triggers a build via the request API. In Travis, each Build can consist of multiple Jobs, which are run in parallel. The program creates a job for each deployment platform that has to be tested.
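The decision logic of such a helper can be sketched in a few lines of Python. This is a minimal illustration, not the actual helper: the function names, the policy (a changed micro-service triggers every platform, a changed deployment document triggers only its own platform) and the `PLATFORM` environment variable are all assumptions. The returned dictionary is shaped like the body of a Travis API v3 "create request" call (a POST to `/repo/{slug}/requests` with the `Travis-API-Version: 3` header); the exact config keys needed for a given repository may differ.

```python
from datetime import datetime, timezone


def changed_since(last_run, last_commit_times):
    """Return the names whose latest commit is newer than the last cron run."""
    return sorted(name for name, ts in last_commit_times.items() if ts > last_run)


def platforms_to_test(all_platforms, changed_services, changed_docs):
    """Pick the platforms to test (assumed policy, see the text above)."""
    if changed_services:
        # Any service change can affect every deployment target.
        return sorted(all_platforms)
    # Otherwise only re-test platforms whose instructions changed.
    return sorted(p for p in all_platforms if p in changed_docs)


def build_request(platforms, branch="master"):
    """Build a request body that asks Travis for one job per platform."""
    return {
        "request": {
            "branch": branch,
            "message": "DeployDoc tests for: " + ", ".join(platforms),
            "config": {
                # Hypothetical variable name; each env.matrix entry
                # is expanded by Travis into a separate job.
                "env": {"matrix": ["PLATFORM=" + p for p in platforms]},
            },
        }
    }
```

Keeping these functions free of network calls makes the selection policy easy to test; sending the body is then a single authenticated HTTP POST.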
Failures are reported to a Slack channel via a Travis add-on.
In the previous section I explained the role of the DeployDocs in the CI process, and how the CI process is implemented. Next, I'll explain how a single DeployDoc works.
I have defined four phases: pre-install, create-infrastructure, run-tests and destroy-infrastructure. The documentation writer can add multiple steps to each phase. Within each phase, these steps are executed in the order in which they are defined. The order of the phases is hard-coded.
In the pre-install phase, software needed to do the deployment is installed. The create-infrastructure, run-tests and destroy-infrastructure phases are just what their names describe. The destroy-infrastructure phase is always run, unless the pre-install phase failed to complete, since no cloud infrastructure has been created until that point.
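The phase ordering and the teardown rule described above can be sketched as follows. This is an illustrative Python sketch, not the actual DeployDoc runner: `run_phases` and the `run_step` callback are hypothetical names, and stopping a phase at its first failing step is an assumption.

```python
PHASES = ["pre-install", "create-infrastructure", "run-tests", "destroy-infrastructure"]


def run_phases(steps, run_step):
    """Run the steps phase by phase in the hard-coded order.

    steps:    dict mapping a phase name to its list of shell snippets,
              in the order they appear in the document
    run_step: callable that executes one snippet and returns True on success
    Returns (executed_phases, first_failed_phase_or_None).
    """
    executed = []
    failed_phase = None
    for phase in PHASES:
        if failed_phase is not None:
            # After a failure only the teardown may still run, and not when
            # pre-install failed, since no infrastructure exists yet.
            if phase != "destroy-infrastructure" or failed_phase == "pre-install":
                continue
        executed.append(phase)
        # all() stops at the first failing step within the phase.
        ok = all(run_step(snippet) for snippet in steps.get(phase, []))
        if not ok and failed_phase is None:
            failed_phase = phase
    return executed, failed_phase
```

In practice `run_step` could wrap something like `subprocess.run(["bash", "-c", snippet]).returncode == 0`; passing it in as a callback keeps the ordering logic testable without touching any cloud resources.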
There is an additional special phase, require-env, which makes it possible to inject environment variables that are available to all steps. These are checked to be present before any of the phases are executed.
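The up-front check can be as small as the following Python sketch. The function name `missing_env` is hypothetical, and treating an empty value the same as an unset variable is an assumption rather than documented DeployDoc behaviour.

```python
import os


def missing_env(required, environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]
```

A runner would call this once with all names collected from the document and abort before the first phase if the returned list is not empty.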
Given the constraint that the documentation should be generated via GitHub Pages, I looked at different options to add comments to Markdown files. I ended up choosing HTML comments, because they impose the least syntactical distraction.
There are a few different ways to annotate a step; see the documentation for more details.
Jekyll, which is the system that is used by GitHub Pages, requires that the top of each Markdown file contain a YAML block. A Markdown file is considered to be a DeployDoc if it has a deployDoc: true line in the YAML block.
    ---
    layout: default
    deployDoc: true
    ---

    # Deploy the socks shop with some-tool

    ## Preparation

    Before we continue, we should install some-tool and some-other-tool

    <!-- deploy-test-start pre-install -->

        apt-get install -yq some-tool some-other-tool

    <!-- deploy-test-end -->

    ## Get credentials

    Get your credentials for some-service, and export them in your environment

        export SOME_TOOL_VAR1=yadda
        export SOME_TOOL_VAR2=yadda

    <!-- deploy-test require-env SOME_TOOL_VAR1 SOME_TOOL_VAR2 -->

    ## Creating the cluster

    Now we can create the cluster with the following commands.

    <!-- deploy-test-start create-infrastructure -->

        some-tool provision-infrastructure
        some-tool create-a-cluster
        some-other-tool deploy-application-to-test

    <!-- deploy-test-end -->

    ## Experiment a bit

    Congratulations! You have deployed your application! Check out the output of this command to find the IP address of the app.

        some-other-tool get IP address of application

    <!-- deploy-start-test run-tests
        # This step is hidden in the rendered HTML
        # Test the deployment using curl
        ip=`some-other-tool get IP address of application`
        curl $ip
    -->

    ## Tearing down

    After you are done, use the following instructions to tear down the cluster.

    <!-- deploy-test-start destroy-infrastructure -->

        some-other-tool remove-application
        some-tool destroy-infrastructure

    <!-- deploy-test-end -->
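To make the annotation scheme concrete, here is a minimal Python sketch of how a program could recognise a DeployDoc and pull the annotated steps out of it. It is not the actual DeployDoc implementation: the function names are hypothetical, and the regular expressions only cover the `deploy-test-start` / `deploy-test-end` pair and the `require-env` comment as they appear in the example, not the hidden-step form or any other annotation style mentioned in the documentation.

```python
import re

# Jekyll front matter: a YAML block between two '---' lines at the top.
FRONT_MATTER = re.compile(r"\A---\s*\n(.*?)\n---\s*\n", re.DOTALL)
# A visible step: everything between a start comment and an end comment.
STEP = re.compile(
    r"<!--\s*deploy-test-start\s+(\S+)\s*-->(.*?)<!--\s*deploy-test-end\s*-->",
    re.DOTALL,
)
# The special require-env comment with a list of variable names.
REQUIRE_ENV = re.compile(r"<!--\s*deploy-test\s+require-env\s+(.*?)\s*-->", re.DOTALL)


def is_deploydoc(text):
    """True if the front matter contains a 'deployDoc: true' line."""
    match = FRONT_MATTER.match(text)
    if not match:
        return False
    return any(
        re.fullmatch(r"deployDoc:\s*true", line.strip())
        for line in match.group(1).splitlines()
    )


def extract_steps(text):
    """Map each phase name to its snippets, in document order."""
    steps = {}
    for phase, body in STEP.findall(text):
        # Drop blank lines and the Markdown code-block indentation.
        lines = [line.strip() for line in body.splitlines() if line.strip()]
        steps.setdefault(phase, []).append("\n".join(lines))
    return steps


def required_env(text):
    """Collect all variable names listed in require-env comments."""
    names = []
    for match in REQUIRE_ENV.finditer(text):
        names.extend(match.group(1).split())
    return names
```

Because the annotations are plain HTML comments, the same file still renders as ordinary documentation while a parser like this sees the executable steps.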
To conclude, we have implemented Continuous Integration for the deployment documentation and scripts. Now we just have to fix, document and/or repair the actual deployment instructions.
PS The possibilities for re-arranging the code that has to be executed are limited on purpose. I have some experience with Literate Programming tools, and allowing code to be recombined in arbitrary ways requires a lot of discipline.