How Monzo’s Opinionated Platform and Tools Support their Developer Experience

An interesting aspect of a good Developer Experience (DevEx) is that as more and more features are added to a given language, platform, or tool, it becomes harder and harder to maintain. The ubiquity of something like Kubernetes (to pick just one example) seemingly inevitably leads to proliferation of features and a mushrooming of complexity. Likewise, a popular programming language such as Java or Go needs to be evolved with tremendous care to avoid introducing unpleasant rough edges in the interactions between features, or increasing the surface area of the language to the point where it becomes harder to learn and to use.

To put this another way, provided your use case fits within the constraints, highly opinionated platforms such as Ruby on Rails, Spring Boot, or Heroku can offer a better developer experience, and correspondingly higher productivity, than those that are less opinionated. As a result, as RedMonk co-founder James Governor pithily puts it:

Monzo is a fully licensed and regulated bank in the UK, serving over 6 million customers. It runs on the cloud, and has no branches. From a technology perspective, the bank’s IT systems are built from over 2000 microservices written primarily in Go. Many of these microservices are small and context bound to a specific function. These services are responsible for the entire operation of the bank, covering everything from connecting to the various payment networks, moving money, maintaining a ledger, fighting fraud and financial crime, providing customer support, providing APIs to make money management easier, and more.

The whole system is built and maintained by a relatively small team of around 200 engineers. Updates are shipped hundreds of times a day, and the firm has found that the use of effective tooling, automation, and an opinionated DevEx platform has shortened the time to delivery and increased the number of changes that can be safely released into production. This approach balances meeting all of its regulatory requirements with allowing engineers to ship early and often.

Monzo’s paved road

Core to achieving their impressive economy of scale is the idea of the paved road (also sometimes called the golden path), an idea I first heard described by Netflix’s Dianne Marsh at OSCON in 2017. Marsh’s definition was that the paved road was “a concept, formalising the expectations between the centralised teams and our engineering customers. It's an idea that if we have and publish a well-integrated set of tools and machinery, then people can focus on their own domain. They can do the things they came to Netflix to do”.

Suhail Patel, Senior Staff Engineer, told WTF that at Monzo “there is a defined paved road when you start work on a microservice, and as a result the microservices all look very uniform. You can go and contribute to another team's service because the structure is entirely familiar to you”.

Patel compared their approach to that taken by the aforementioned Ruby on Rails. A lot of the code is pre-generated, with hooks into the ORM layer, queueing system, and the like. Taking this approach means that observability can be pre-built into the microservices with logs and tracing enabled by default. “Services are emitting all of these metrics right from the time that their binary starts, and engineers haven't had to write these as custom metrics. So observability isn't an afterthought for the core system”, Patel told us.

Having all the microservices emitting standard metrics makes building generic alerting rules straightforward, and thus the monitoring system can immediately be looking out for anomalies.

Monzo’s Services Dashboard for observability across their services estate

This uniformity has other virtues: updates to core libraries and runtimes, such as security patches or new versions of Go, can be rapidly and safely rolled out. It also makes auditing and patch management more straightforward because there is a uniformity of practice: “We don't have to go, ‘For this particular system it works like this, for this other system it works like that, for this other system it works in a third way’”, Patel said.

Because of this, Monzo’s emphasis is a little different from that of, say, Spotify or Netflix: whilst you can still deviate from the paved road if you really want to, doing so at Monzo has a very high barrier to entry.

Developer tooling


Alongside the platform, the team at Monzo have made considerable investment in supporting tooling, including automation scripts, deployment pipelines, and IDE plugins. This is also an area where all developers are able to contribute, and bring in new innovations.

“Some of these tools are built as part of scheduled work within a squad that stands to benefit from the time investment, such as Security libraries or our back-office company hub. Others are built in an ad-hoc fashion, with engineers across squads getting together and working on improving their quality of life and sharing with others”, Patel told us.

Backstage

When you have such a large number of microservices, it becomes essential to centralise information about what services exist, what functionality they implement, what team owns them, how critical they are, what their dependencies are, and even how services cluster into business-specific systems. Monzo had standardised on a single repository for all their services, but it still needed a tool to provide structured metadata encoding all this information.

To solve this need, they adopted Spotify’s Backstage for the service catalogue, starting the project around September 2020. “Moving to Backstage was a really good opportunity for us to unify all of the metadata for all of our services, make sure that information on code owners and everything is all fully up to date and services have good and searchable READMEs and things like that”, Patel told WTF.

The Monzo team have also taken advantage of Backstage’s extensibility to write a number of plugins such as a UI for their system to measure software excellence, and another to show deployment history and config change events, as well as useful links to dashboards and escalation points.
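As a rough illustration of what structured service metadata buys you, the sketch below models a catalogue entry and validates it. This is not Backstage's descriptor format or Monzo's schema: the `Service` fields, the three-tier criticality scale, and the service names are hypothetical and chosen only to mirror the kinds of information listed above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Service is a hypothetical catalogue entry. The fields mirror the kinds of
// information a service catalogue holds; they are not a real schema.
type Service struct {
	Name         string
	Owner        string   // owning team
	Tier         int      // criticality: 1 is most critical (assumed scale of 1 to 3)
	System       string   // business-specific grouping
	Dependencies []string // names of services this one calls
}

// Validate reports every problem found in the entry. The known map holds
// the names of all services present in the catalogue.
func (s Service) Validate(known map[string]bool) error {
	var problems []string
	if s.Name == "" {
		problems = append(problems, "missing name")
	}
	if s.Owner == "" {
		problems = append(problems, "missing owner")
	}
	if s.Tier < 1 || s.Tier > 3 {
		problems = append(problems, "tier must be 1, 2 or 3")
	}
	for _, dep := range s.Dependencies {
		if !known[dep] {
			problems = append(problems, "unknown dependency "+dep)
		}
	}
	if len(problems) == 0 {
		return nil
	}
	return errors.New(strings.Join(problems, "; "))
}

func main() {
	// Hypothetical entries; the third one is deliberately incomplete.
	catalogue := []Service{
		{Name: "service.ledger", Owner: "core-banking", Tier: 1, System: "ledger"},
		{Name: "service.card-payments", Owner: "payments", Tier: 1, System: "payments",
			Dependencies: []string{"service.ledger"}},
		{Name: "service.statements", Tier: 2, Dependencies: []string{"service.pdf"}},
	}
	known := map[string]bool{}
	for _, s := range catalogue {
		known[s.Name] = true
	}
	for _, s := range catalogue {
		if err := s.Validate(known); err != nil {
			fmt.Printf("%s: %v\n", s.Name, err)
		}
	}
}
```

Once ownership, criticality, and dependencies are machine-readable, checks like this can run automatically instead of relying on someone noticing a stale README.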

Static Analysis

The decision to standardise on Go as the main programming language has also paid dividends, according to Patel. “Go's surface area is quite small in terms of the language, and whilst it's getting more complicated as they add generics and things like that, they've been very thoughtful about how they add these things to keep that simplicity. It is also a very predictable language”, he said.

That predictability makes it comparatively easy to add support, for example, for static analysis tooling, which Monzo has done by leveraging Semgrep, an open source tool that allows you to write static analysis checks for multiple languages without needing a deep understanding of the underlying language and compiler implementation details.
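Semgrep rules are written in YAML rather than Go, so the sketch below is not a Semgrep rule. Instead it shows the same kind of check hand-written against Go's standard `go/parser` and `go/ast` packages, to give a feel for how small such a check can be. The rule itself (flagging calls to `time.Sleep`) is a hypothetical house rule, not one Monzo is known to enforce.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// Finding is one rule violation.
type Finding struct {
	Line int
	Call string
}

// FindCalls parses Go source and reports every call written as pkg.fn(...).
// It matches on the identifier as written and does not resolve imports,
// so an aliased import would be missed.
func FindCalls(src, pkg, fn string) ([]Finding, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var findings []Finding
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true // keep walking
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		ident, ok := sel.X.(*ast.Ident)
		if ok && ident.Name == pkg && sel.Sel.Name == fn {
			findings = append(findings, Finding{
				Line: fset.Position(call.Pos()).Line,
				Call: pkg + "." + fn,
			})
		}
		return true
	})
	return findings, nil
}

// sample is a made-up snippet that violates the hypothetical rule.
const sample = `package demo

import "time"

func wait() {
	time.Sleep(time.Second)
}
`

func main() {
	findings, err := FindCalls(sample, "time", "Sleep")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, f := range findings {
		fmt.Printf("line %d: avoid %s in service code\n", f.Line, f.Call)
	}
}
```

A pattern-based tool such as Semgrep removes the need to write even this much AST-walking code, which is the appeal described above.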

Shipper

Once an engineer is ready to ship code, they use another in-house tool, called Shipper. Shipper is again opinionated, and is responsible for building the artefacts and checking whether the service has passed required checks and is ready for deployment. Once these checks are complete, it handles the management and roll-out via Kubernetes.

Shipper deploying a service

Shipper brings three key advantages:

  • Abstraction of Kubernetes
  • Auditability and policy enforcement
  • Speed to production

“For me, and for the teams that I work with, a mark of success is you can have your entire Monzo career in a product team and never have to interact with Kubernetes or write any YAML at all”, Patel told us.

From a company perspective, the deployment tooling mandates multiple layers of checks and safeguards that might, in a more traditional bank, be handled manually, as Patel explained:

“When we go through technology audits the first thing they look out for is, ‘Is your software being tested? Do you have a testing environment? Is it running through automated tests and checks? How are you making sure that your code is robust and of good quality’?

And this is something that we can encode as policies within our tooling within Shipper: ‘Has this been tested in our staging environment? Has the developer made sure that all the required checks on GitHub pass, gotten approval from the owning team, merged into the mainline’? We're going to add all of these as policies within our tooling”.

Combined with the ability to roll back rapidly in the event of an issue being found, this is the key to how Monzo is able to deploy to production so frequently, despite working in a highly regulated sector:

“All of these checks run within a minute, so no change-approval board is needed, no human is involved, and the process isn’t subjective. This is very much an objective process that runs and gives you a pass-fail result. And then, if it's a pass result, it will automatically go and build and deploy it for production release.

And that is a really powerful mechanism, because I'd be lying if I said that we didn't release code with defects. I don't think, from my experience and all the literature I’ve read, that a change approval board fundamentally prevents defects either. So for us, we've made it very, very easy, for example, to roll back using the same process. And because we can roll back again within a couple of minutes, the longevity of an incident occurring is minimised”.
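The objective pass-fail gate Patel describes can be sketched in a few lines. Shipper's internals are not public in this article, so everything below is an assumption: the `Change` fields, the `Policy` type, and the policy names are hypothetical, loosely modelled on the checks he lists (tested in staging, required checks passing, approval from the owning team, merged into the mainline).

```go
package main

import "fmt"

// Change describes a proposed deployment. The fields are hypothetical.
type Change struct {
	Service          string
	TestedInStaging  bool
	ChecksPassed     bool
	OwnerApproved    bool
	MergedToMainline bool
}

// Policy is one objective, automated check with no human judgement involved.
type Policy struct {
	Name  string
	Check func(Change) bool
}

// DefaultPolicies returns an illustrative policy set.
func DefaultPolicies() []Policy {
	return []Policy{
		{Name: "tested in staging", Check: func(c Change) bool { return c.TestedInStaging }},
		{Name: "required checks pass", Check: func(c Change) bool { return c.ChecksPassed }},
		{Name: "approved by owning team", Check: func(c Change) bool { return c.OwnerApproved }},
		{Name: "merged into mainline", Check: func(c Change) bool { return c.MergedToMainline }},
	}
}

// Evaluate runs every policy and returns the names of those that failed,
// in policy order. An empty result means the change passes.
func Evaluate(c Change, policies []Policy) []string {
	var failed []string
	for _, p := range policies {
		if !p.Check(c) {
			failed = append(failed, p.Name)
		}
	}
	return failed
}

func main() {
	// A hypothetical change that is missing approval from the owning team.
	change := Change{
		Service:          "service.example",
		TestedInStaging:  true,
		ChecksPassed:     true,
		MergedToMainline: true,
	}
	if failed := Evaluate(change, DefaultPolicies()); len(failed) > 0 {
		fmt.Println("FAIL:", failed)
		return
	}
	fmt.Println("PASS: building and deploying", change.Service)
}
```

Because a rollback is just another change, the same gate could be applied to it, which fits Patel's point about rolling back "using the same process".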

Learning from incidents

The ultimate goal of Shipper is to allow engineers to release with confidence. But alongside the tooling, there is also an important cultural aspect here. Monzo has, especially for a bank, a strong culture around learning from incidents.

“At Monzo the whole act of incidents is very routine”, Patel told us. The bank has built a lot of tooling for managing incidents, and uses blameless postmortems. It also assumes that an incident “doesn't have a root cause; rather it has contributing factors. And those contributing factors could be decisions that were made many, many moons ago by people who no longer work at Monzo”.

The engineering team also works hard to ensure that “Folks can speak up and not feel like they're under pressure or threat, no matter where they are from, no matter where they are on the spectrum. Everyone can speak up and everyone can have a voice, and everyone can take the opportunity to learn and see what happens. A key part of that is also the transparency culture that we have at Monzo”.

Alongside learning from their own incidents, Patel told us, they actively talk about incidents that happen in other institutions and companies, and try to learn from those as well.

The right tool for the job

Patel suggested that the tools you decide to build for developers should be focused on automating the practices of your organisation. Internal tools like Shipper allow Monzo to fully automate what would be a complex Change Management Process in a highly regulated banking environment. “By standardising on a small set of technology choices, and continuously improving these tools and abstractions, we enable engineers to focus on the business problem at hand—rather than on the underlying infrastructure”, Patel said.

When building tools for your developers to use, spend time thinking about making those tools as accessible as possible for everyone. Treat them as you would any other product—to drive adoption you’ll need to make sure you are advocating and promoting tools internally, talking to your customers to ensure that you are building things that are genuinely useful, and monitoring usage through metrics and tools:

“We actually see this as a product that we provide to Monzo. We do a lot of user research interviews, embedding within the squads, just being a fly on the wall and understanding how developers are using this tooling. And across the board, it's not just tenured engineers. You want to see what the onboarding process is like. Is an engineer able to get up to speed within the first four weeks of joining Monzo and able to start writing and shipping production ready code? So a lot of it is around user research and seeing how we can make things better”.

As we’ve seen, you’ll also need to reflect carefully on your company’s culture to try and ensure the things you are building naturally fit.
