At Container Solutions, one of our ‘bread and butter’ engagements is to help a software infrastructure team move from more traditional software delivery paradigms to Cloud Native ones.
While the CNCF landscape is still as bewildering as ever, over time we have noticed a pattern forming in our solutions, one that comes up repeatedly and which doesn't seem to have been named yet. For some reason, software developers gave us the LAMP stack, but infra engineers may be even worse at branding than they are.
So we call it the ‘KATE stack’.
The KATE stack is composed of four layers, and the initials of the tools most commonly used (in our experience) at each layer spell 'KATE'.
It makes up a more or less standardised GitOps stack that can be applied to various infrastructure projects.
Using this stack (sometimes swapping individual layers for other tools), we've repeatedly cut the time it takes teams to stand up development and production environments from months to minutes, leveraging common design patterns that hold across the varied contexts and constraints our customers operate under.
Let’s take each layer in turn.
The 'K' layer is the orchestration layer, and Kubernetes is now the dominant container orchestration platform across the industry. Mesos has not had an update since 2020, and support for it is almost impossible to find.
Kubernetes provides a declarative framework for managing the infrastructure on which your applications will run, meaning that your application deployment configuration is fully automated and auditable, a key requirement for a GitOps setup.
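To make the 'declarative' point concrete, here is a minimal sketch of a Deployment manifest (the application name, namespace and image are hypothetical); once committed to Git, it becomes the auditable source of truth for what should be running:

```yaml
# A minimal, illustrative Deployment. The names and image are hypothetical;
# the point is that the desired state lives in Git, not in someone's head.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
  namespace: example
spec:
  replicas: 3                 # Kubernetes keeps three pods running at all times
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: example-app
          image: registry.example.com/example-app:1.0.0
          ports:
            - containerPort: 8080
```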
This GitOps approach can be taken further, defining the Kubernetes clusters themselves with declarative configuration stored as code. How exactly to do this is still an emerging area. It can be achieved with Terraform providers, but there is a long-standing project to standardise it with Cluster API. In other contexts, you can provision clusters in ways standard to that context. For example, in GCP, clusters can be created using Connect Gateway and ArgoCD.
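As a rough sketch of the Cluster API approach, the cluster itself can be declared as a Kubernetes resource. The names below are hypothetical, and the infrastructure-specific kinds (here an AWS-flavoured AWSCluster) vary depending on where you run:

```yaml
# Illustrative Cluster API resource: the cluster is itself declared as code.
# Names and the AWS-specific infrastructureRef are hypothetical; the provider
# kinds differ for GCP, Azure, vSphere and so on.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: example-cluster
  namespace: clusters
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: example-cluster-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: example-cluster
```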
The 'A' layer is the deployment layer, which takes your applications' configuration and ensures that what is currently deployed matches what is declaratively defined in source control.
ArgoCD is emerging as the de facto standard in the field. The main reason appears to be that its front end is richer than the competition's, but FluxCD (created by Weaveworks, who coined the term 'GitOps') remains a compelling alternative.
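As a sketch of what this layer looks like in practice, an ArgoCD Application points at a Git repository and continuously reconciles the cluster against it (the repository URL, path and namespace below are hypothetical):

```yaml
# Illustrative ArgoCD Application: it watches a Git repo and keeps the
# cluster in sync with it. Repo URL, path and namespace are hypothetical.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/example-app-config.git
    targetRevision: main
    path: deploy/production
  destination:
    server: https://kubernetes.default.svc
    namespace: example
  syncPolicy:
    automated:
      prune: true      # remove resources that have been deleted from Git
      selfHeal: true   # revert manual changes made directly on the cluster
```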
The 'T' layer takes care of defining the infrastructure the software needs to run on. Terraform compares the state of the infrastructure as defined in your Terraform code with the state of the system as it exists in the 'real world', and changes the real world wherever it is out of alignment with what is defined.
This can have unintended consequences if your application design is not fully Cloud Native. For example, if you change the machine type of an AWS EC2 instance in Terraform from (say) t2.small to t2.large, Terraform may destroy the original instance and create a new one, taking any state held on that instance with it.
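A minimal sketch in Terraform's HCL illustrates the point; the AMI ID is a placeholder, and whether a given change is applied in place or forces a replacement depends on the provider's behaviour:

```hcl
# Illustrative Terraform resource. Changing instance_type (e.g. t2.small ->
# t2.large) can mean the instance is stopped or replaced, depending on how
# the provider handles the change, so anything stored only on that instance
# is at risk.
resource "aws_instance" "app_server" {
  ami           = "ami-0123456789abcdef0"  # hypothetical AMI ID
  instance_type = "t2.small"               # change to "t2.large" and re-plan

  tags = {
    Name = "app-server"
  }
}
```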
Again, there are alternatives here, such as Pulumi, but Terraform is still the dominant player, and the one our customers are most comfortable using.
The 'E' layer handles secrets. The challenge here is that secrets obviously shouldn't be stored in plain text in your code, which means they must be retrieved from some trusted store and made available dynamically to the applications deployed within Kubernetes.
We may be biased (as we are maintainers of this project), but External Secrets Operator (ESO) is our go-to choice for managing secrets within Kubernetes. It integrates with a wide range of secret stores (AWS Secrets Manager, HashiCorp Vault, Google Secret Manager, Azure Key Vault, IBM Cloud Secrets Manager, CyberArk Conjur and so on) in a consistent way, allowing you to configure access to the secret store once for all the secrets you need. And because your cluster is deployed using GitOps, the whole flow is automated and direct access is needed only in a break-glass situation, making security requirements easier to fulfil.
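As an illustrative sketch (names, namespace and region are hypothetical), an ExternalSecret references a SecretStore and tells ESO which remote secret to project into a Kubernetes Secret; the same pattern applies to the other supported backends:

```yaml
# Illustrative External Secrets Operator resources. Store name, secret names
# and AWS region are hypothetical; swap the provider block for Vault,
# Google Secret Manager, Azure Key Vault, etc. as needed.
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
  namespace: example
spec:
  provider:
    aws:
      service: SecretsManager
      region: eu-west-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa   # assumes IRSA or similar workload identity
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: example-app-db
  namespace: example
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: example-app-db              # the Kubernetes Secret ESO will create
  data:
    - secretKey: password
      remoteRef:
        key: prod/example-app/db-password
```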
Together, these tools fulfil the requirements of a robust GitOps implementation:
Other important components you might then want to consider as part of your wider architecture include:
Most of these are becoming commoditised across the industry, and the particular tools are likely to be chosen based on the platform your infrastructure runs on. For example, if you use GitHub, you are more likely to store your Docker images as artifacts in GitHub; if you are, or until recently were, on-prem, you are more likely to use JFrog Artifactory or similar. Similar logic tends to dictate which secret store you use, and where you store your code.
If the KATE stack is put together in the right way, you get a secure, automated, auditable, reproducible, and programmable set of environments in which to build, test, and operate your software. This results in huge cost savings, as processes that used to consume time and money in toil are automated away. We have various references for our work in this area, so get in touch if you want to know more.