The media is full of discussions of microservices and cloud-based infrastructure and for good reason. Encapsulating tasks into higher and higher abstractions is being shown by various large companies to provide a competitive advantage in any number of domains. Take the most documented example of Netflix or more recent contributions from Spotify. Both highly disruptive customer facing examples of microservices-gone-right.
Legacy IT systems are still running consolidated virtual machines (VMs) with little to no democratic process over how resources are utilised. This fundamental belief is and will prevent organisations from developing products and services that are being provided by innovators and will ultimately lead to their demise. The solution is to work with technologies that are able to free a developer from antiquated bureaucracy so that they are able to spend time on developing products and services, not work-arounds for legacy systems.
This article is about Apache Mesos, and taking advantage of the abstractions that Mesos provides. The first part will introduce the reader to Apache Mesos and then discuss how and why to use containers within such an environment. The final section discusses some of the current issues with Mesos and microservices more generally, and what could be done to solve them.
Introduction
The goal of Apache Mesos, here on referred to as Mesos, is to provide an abstraction layer for compute resources (CPU, RAM, Disk, Ports, etc.). This allows applications and developers to request arbitrary units of compute power without an IT provider having to worry about how this translates to bare-metal or VMs. Mesos democratises computational resources.
The Mesos process runs on all machines and one machine is defined as a master, which has the responsibility of managing the cluster. All other machines are slaves which work on a particular task using a set number of resources. Each slave utilises linux’s cgroups to ensure that processes are isolated and are only allowed to consume a set amount of resources. The combination of the master and a set of slaves creates a Mesos cluster.
It is possible to run applications directly on a Mesos cluster. Any application that can run on a Linux distribution can run on Mesos. Commonly, applications are started by an orchestration layer, like Marathon or Kubernetes, which monitor the state of the application to provide lifecycle management and resiliency. However, this article proposes using a Mesos framework to develop applications.
Mesos Frameworks
Developing a cluster-agnostic application will be at the forefront of the minds of developers. But with this restriction, it is common that the advantages of using a specific cluster technology will be lost. Developing for a Mesos framework (a framework, from here on) exposes a host of extra features that are not available, and sometimes not possible, when writing your own framework to overlay Mesos.
To summarise, a framework provides an application with an API to monitor the state of the tasks that the application is responsible for. Based upon the state, the application can perform some action. The management is performed by an implementation of the Scheduler interface, and the tasks are performed by implementations of the Executor interface. Mesos will communicate with the Scheduler via the API to inform it of the current state of the cluster and the state of the tasks that have been started. Using this API, it is possible to encode a level of scheduling that would not be possible using a standard orchestration tool like Marathon. (http://mesos.apache.org/documentation/latest/app-framework-development-guide/)
The key use case for a framework is to provide enhanced resilience against failure. For example, take HDFS, which requires a NameNode, DataNode and JournalNode. These need to be started in a certain order and the cluster topology is complex. For HA, two NameNodes and at least three JournalNodes on separate machines are required. Another example could be for a database like Elasticsearch, Riak, Neo4j, Cassandra, etc. For best resiliency, it would be optimal to place only a single instance of a data nodes on each physical machine.
Frameworks also provide other features like authorisation and roles, to allow applications to only use certain resources and reserve resources, and the ability to install applications on Mesos in a single request.
Another reason is to enable application-specific scaling. For example, for a database, scaling the data nodes, not controller nodes, may be important. Frameworks also enable the prospect of auto-scaling, depending on a series of sensors that are only known to Mesos (e.g. RAM/Disk usage).
Finally cluster specific business logic, like when and how backups are performed, or upgrading an installation can only occur seamlessly using the information provided by Mesos. For example, it may not be wise to perform a snapshot backup when tasks are under heavy load, it might be better to scale horizontally first, to add capacity, then backup, then scale back. This complexity would be difficult, if not impossible to capture in a traditional microservices environment.
Containerising Mesos Frameworks
Containerisation is a concept at the heart of Mesos and Linux’s cgroups. Containers provide a clean, packaged, decoupled, cohesive abstraction that yields many positives. The only drawback, along with microservices more generally, is that it does take some time to iterate towards an optimal development strategy. For example, it takes several minutes to build a docker image and many more to deploy it to a cluster for testing; this is too long. However, the positives are many.
When packaging applications, there are often many dependencies. Even then, the application usually requires cross-compiling or differing install instructions. Worse, it may be required to write distribution specific code. Using a container as a packaging tool removes all of these problems; the only requirement is an installation of a container runtime like Docker. Also, since Marathon or Mesos may start an application on any slave, all slaves require access to the binary. It is fragile and cumbersome to copy the binary to all machines, a simple solution is to host the container in a repository, that all slaves have access to. The public Docker Hub is a very simple solution.
Because the containers are stateless by default, it makes it very easy to kill and restart an application. For example, if a slave died, it is very easy to re-pull and restart a container. This simplicity will be crucial when it comes to scaling. This also adds to the resiliency of the system, since the framework can monitor for failures and restart tasks immediately.
Precise system tests should exercise the interaction between discrete components. Using containers enables sophisticated system-wide automated testing which can significantly enhance the chances of catching bugs that affect core functionality. Example system tests might include whether the containers are able to form a cluster and to test the resiliency of the application. When coupled with a continuous integration tool like Jenkins, system tests can be performed on every commit or PR, before pushing to the releasable branch. This significantly improves the confidence in the software.
System Tests with Mini-Mesos
After producing the application binaries, the next step is to upload them to a Mesos cluster and perform testing. The very aim of Mesos, an abstraction of multiple servers, makes it difficult and time consuming to test. Developers should not waste significant amounts of time and effort on deploying and performing a simple test. With this in mind, it was clear that the Mesos community was crying out for a simple, in-memory Mesos cluster for testing.
Mini-mesos (http://minimesos.org/) is a project that encapsulates a Mesos cluster in a single docker image. Inside the container there is a Master and several Slave processes. Wrapped around the container is a Java API to interact with the cluster. Tests are then written as JUnit tests and can be run as standard JUnit tests. Since these tests exercise a large system, it is recommended that these tests are ran separately from traditional unit tests. An example of using mini-mesos in a simple Mesos framework can be found here: https://github.com/ContainerSolutions/mesos-hello-world.
Scaling
When an application is under demand, it must scale to cope with the load. When the application is over-provisioned, it must scale back to optimise resource costs (by removing an on-demand instance, or provisioning a server to another framework). Scaling comes in two forms: horizontal and vertical. The direction refers to a hierarchical graph of the tasks. To scale horizontally means that more instances are added. To scale vertically means that the amount of resources provisioned to each instance is increased.
Scaling is provided at a framework level, not at a Mesos level. Therefore, it is necessary to either implement a bespoke scaling mechanism, or integrate with another scaling framework like Apache Brooklyn.
The ease with which it is possible to scale horizontally depends on the application. A simple, stateless web-app is very easy to scale. It depends only on computational resources and a record in a load balancer. A database, on the other hand, is much more difficult to scale horizontally. First, the application must decide who has responsibility for synchronisation of the data; is that the databases responsibility, or some function of a distributed file system? This decision will likely be made based upon the choice of database. Next, the re-synchronisation and sharding of the database uses limited network resources. The framework must monitor the state of the database cluster and its use of the network bandwidth; the framework shouldn’t scale without a full understanding of the consequences. Finally, database settings relating to the cluster may not match the current state of the cluster. For example, there may be some good reason to alter the number of replicas.
Scaling vertically may be required in certain instances. It is difficult to imagine a situation where it would be valuable to scale a simple application, but it may be useful to scale more complex applications like databases. For example, Elasticsearch performance is closely linked to the amount of data and the amount of RAM. At some point it may be necessary to increase the amount of RAM given to an Elasticsearch instance, but it doesn’t make much sense until the volume of data begins to impact performance.
Future challenges
Although Mesos fulfils an important niche in cluster computing, there are still some important issues to address.
Automated scaling is a complex topic, especially when it has knock-on effects like the re-synchronisation of database data. It requires the abstraction of metrics that pose as source data for a scaling mechanism and an API to actually perform the scaling. Monitoring the state of the scaling process may also be important on an application level. For example, it may not be possible to write to parts of the cluster whilst a scale operation is in progress.
It is likely that Mesos itself will not and should not tackle this problem. Mesos’ core responsibility is to manage the abstraction of resources and the state of tasks. Because scaling is complex and application dependent, it does not make sense to place the burden on Mesos. Instead, other frameworks will provide this capability. Either a new scaling framework, or a port of a current one.
Despite being an important resource, Mesos does not concern itself with addressing. There is no simple networking component in Mesos. And the efforts that have been made are not integrated with Mesos (see mesos-dns and Consul on Mesos). Mesos already manages the state of tasks, it is not unfeasible to think that this could be extended to manage network state. For example, the only time where it is possible to obtain the IP address of a running task is when an offer is made by mesos. If the scheduler fails and restarts, there is no API to interrogate each task for its IP address. This results in more boilerplate state persistence code for the frameworks. Now is the time for a fully integrated Mesos networking module that allows NAT functionality for frameworks, or at least exposes an API to interrogate tasks directly.
Testing containerised cluster technologies continues to be a burden on the developer. A significant amount of time is spent developing software to test the core software. (Aside: I estimate that more than half of our time developing the Elasticsearch framework was spent on learning and writing software to perform tests) The community will provide solutions, but the vendors needs to make testing a first class citizen. For example, official Docker and Mesos testing frameworks do not exist.
One specific complaint about Mesos is the lack of commitment to developer support. Modern technologies like Docker or any of the Hashicorp products show that good documentation, learning tools and simple getting started guides and tools are crucial for uptake by the community. Mesos has none of these. The documentation is so sparse and poorly organised that it is common for developers to have to pour through the Mesos source code. For example, the protocol buffer definition file for Mesos must be one of the most visited pages on the internet. Finally, it should be far easier to use templates, an SDK or a comprehensive getting started guide to allow developers to quickly get up to speed.
Summary
To summarise this discussion, the goal was to abstract out the worry of resources using Apache Mesos. However it has been shown that for some applications it is still necessary to access some low level information to optimally schedule tasks and services. Apache Mesos Frameworks present the beginnings to a solution for this problem for the following reasons:
- Testability: By representing the application as a framework, it becomes possible to test the system as a whole. Previously it has been difficult to test clustered systems because of lack of standards and requirement for a cluster.
- Scalability: Many applications can benefit from accessing more detailed information about the state of a task. Horizontal and vertical scaling is now possible on a cluster of machines.
- Resiliency: If a task is lost due to a failure or a crash, Mesos is able to reliably inform the framework of the inconsistent state and the framework can take action as necessary. Details about the taks allow the optimal placement of tasks.
- Democracy: By using Mesos authorisation and roles, it is possible to reserve and control how resources are used. Priorities could be attached to allow important jobs priority access to limited resources. The cluster would be under developer control and the infrastructure can be left to IT or removed entirely.