A question I often hear asked is “how many microservices should I have?” or “how big should a microservice be?” So, which is better: 10 microservices or 300?
If the main motivation for Cloud Native is deploying code faster, then presumably the smaller the microservice the better. Small services are individually easier to understand, write, deploy, and debug.
Smaller microservices mean you’ll have lots of them. But surely more is better?
Small microservices are better when it comes to fast, safe deployment, but what about the physical costs? Sending a message between machines is maybe 100 times slower than passing a message internally within one process. Monolithic, internal communication is efficient; message passing between microservices is slower, and more services mean more messages.
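To make that overhead concrete, here’s a back-of-envelope sketch. The figures are purely illustrative assumptions, not measurements, but they show how the cost compounds as a single request crosses more service boundaries:

```java
// Back-of-envelope sketch with illustrative (assumed) figures: how call-chain
// latency grows when in-process calls become network hops between microservices.
public class CallChainLatency {

    static final double IN_PROCESS_CALL_MICROS = 1;   // assumed cost of an internal call
    static final double NETWORK_HOP_MICROS = 100;     // roughly 100x slower, per the figure above

    public static void main(String[] args) {
        int hops = 10; // a single user request that crosses ten service boundaries
        System.out.printf("Monolith (in-process calls):      %.0f microseconds%n",
                hops * IN_PROCESS_CALL_MICROS);
        System.out.printf("Microservices (network hops):     %.0f microseconds%n",
                hops * NETWORK_HOP_MICROS);
    }
}
```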
A complex distributed system made up of lots of microservices also has counter-intuitive failure modes. Smaller numbers are easier for everyone to grok. Have we got the tools and processes to manage a complicated system that no one can hold in their head?
Maybe less is more?
Some of the more visionary Cloud Native experts are contemplating not just 300 microservices but 3,000 or even 30,000. Serverless platforms like AWS Lambda could go there. There’s a cost to proliferation in latency and bandwidth, but some consider that a price worth paying for faster deployment.
However, the problem with very high microservice counts isn’t merely latency and expense. Supporting thousands of microservices requires a lot of investment in engineer education and in standardisation of service behaviour in areas like network communication. Some expert enterprises have been doing this for years, but the rest of us haven’t even started.
Thousands of daily deploys also mean aggressively delegating decisions on functionality. Technically and organisationally, this is a revolution.
Our judgement is that distributed systems are hard and there’s lots to learn. You can buy in expertise, but there aren’t loads of distributed-systems experts out there yet. Even if you find someone with bags of experience, it might be in an architecture that doesn’t match your needs; they might build something totally unsuited to your business.
The upshot is your team’s going to have to do loads of on-the-job learning. Start small, with a modest number of microservices, and take small steps. A common model is one microservice per team, and that’s not a bad way to start: you get the benefit of deployments that don’t cross team boundaries, but it restricts proliferation until you’ve got your heads around it. As you build field expertise you can move to a more advanced distributed architecture with more microservices. I like the model of gradually breaking services down further as needed to avoid development conflicts.
The benefit of small microservices is that they’re specialised and decoupled, which leads to faster deployment. However, there’s also a cost: the difficulty of managing a complex distributed system and of running many diverse stacks in production. Diversity is not without issues.
The big players mitigate this complexity by accepting some operational constraints and creating commonality across their microservices. Netflix use Hystrix as a common connectivity library across their microservices. Linkerd from Buoyant serves a similar purpose of providing commonality, as does Istio from Google and Lyft. Some companies who used containerisation to remove all environmental constraints from developers have begun re-introducing recommended configurations to avoid fixing the same problem in 20 different stacks.
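As a rough illustration, here’s a minimal sketch of what that commonality looks like with a library like Hystrix: every remote call goes through the same wrapper, so timeouts, circuit breaking and fallbacks behave the same way in every service. The service name, fallback value and call details below are hypothetical.

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

// Sketch only: a remote call wrapped in a Hystrix command so that failure
// handling is standardised across microservices. Names and logic are hypothetical.
public class GetRecommendations extends HystrixCommand<String> {

    public GetRecommendations() {
        super(HystrixCommandGroupKey.Factory.asKey("RecommendationService"));
    }

    @Override
    protected String run() throws Exception {
        // The real network call to the downstream service would go here.
        return callRecommendationServiceOverHttp();
    }

    @Override
    protected String getFallback() {
        // Served when the call fails, times out or the circuit is open.
        return "default-recommendations";
    }

    private String callRecommendationServiceOverHttp() throws Exception {
        throw new Exception("placeholder for the real HTTP call");
    }
}
```

Calling `new GetRecommendations().execute()` then gives every team the same timeout and fallback behaviour without each one re-implementing it.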
Our judgement is that this approach is perfectly sensible. Help your developers use common operational tools where there’s benefit from consistency. Useful constraints free us from dull interop debugging.
Moving fast means quickly assessing whether the new world is better than the old one. Devs must know what success looks like for a code deploy: better conversions, lower hosting costs or faster response times, for example.
Ideally, all key metrics would be automatically monitored for every deploy. Any change may have an unforeseen negative consequence (faster response times but lower conversions). Or an unexpected positive one (it fails to cut hosting costs but does improve conversion). We need to spot either.
If checking is manual, it becomes the bottleneck in your fast process. So assessing success is another thing that eventually needs to be encoded. At the moment, however, there’s no winning product for metric monitoring or A/B testing. Most of the folk we talk to are still developing their own tools.
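As a hypothetical sketch of the kind of check a pipeline could encode, here’s a toy comparison of one key metric before and after a deploy. The metric, figures and threshold are all assumptions; in practice the numbers would come from your monitoring or A/B testing system:

```java
// Toy sketch (assumed metric, figures and threshold): flag any significant move
// in a key metric after a deploy, whether the surprise is good or bad.
public class DeployMetricCheck {

    public static void main(String[] args) {
        double conversionBefore = 0.042;  // would come from your metrics store
        double conversionAfter  = 0.039;  // measured after the deploy

        double relativeChange = (conversionAfter - conversionBefore) / conversionBefore;

        if (Math.abs(relativeChange) > 0.05) {  // flag moves larger than 5% either way
            System.out.printf("Conversion moved by %.1f%% - investigate before the next deploy%n",
                    relativeChange * 100);
        } else {
            System.out.println("No significant change in conversion");
        }
    }
}
```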
If you want feature velocity, then a valuable engineer is one who knows your product and users and makes good judgements about changes.
At the extreme end, devs might make changes based only on very high-level direction (Bryan Dove, CTO of the UK's Skyscanner, calls this “radical autonomy”). Training existing staff is particularly important in this fast-iteration world. If you go for radical autonomy then devs will be making decisions and acting on them. They’ll need to grok your business as well as your tech.
Folk with skills in a particular tool can be bought in or hired, but you may need to change that tool. Your hard-skills requirements will alter, so you’ll need engineers with the soft skills that support acquiring new hard skills: people who can listen, learn and make their own judgements. In the Cloud Native world, thinking skills are more important than familiarity with any one tool or language. You need to feel confident that new tools can be adopted as your situation evolves.
Serverless, aka Function-as-a-Service (AWS Lambda, Google Cloud Functions or Azure Functions, for example), sounds like the ultimate destiny of the stateless microservice. If a microservice doesn’t need to talk directly to a local database (i.e. it's stateless), then it could be implemented as a function-as-a-service.
So why not just do that and let someone else worry about server scaling, backups, upgrades, patches and monitoring? You’d still need stateful products like queues or databases to handle your data, but those too could be managed services provided by your cloud provider. Then you’d have no servers to worry about. This world has a high degree of lock-in (con) but little or no ops work (pro).
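To illustrate what that looks like, here’s a minimal sketch of a stateless function on AWS Lambda using its Java handler interface. The handler name and logic are hypothetical; the point is that there is no server in the picture, and any state would live in a managed database or queue rather than in the process:

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

// Sketch of a stateless microservice as a function: nothing is kept on a server
// between invocations, so scaling, patching and monitoring of machines is someone
// else's problem. The input/output types and logic are hypothetical.
public class GreetingHandler implements RequestHandler<String, String> {

    @Override
    public String handleRequest(String name, Context context) {
        // All the state this function needs arrives in the request (or would be
        // fetched from a managed store such as a cloud database or queue).
        return "Hello, " + name;
    }
}
```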
That is pretty attractive. Most folk are trying to reduce their ops work. Serverless plus managed stateful services could do that.
However, it’s still early days for Function-as-a-Service. At the moment, I suspect there's a significant issue with this managed world: the lack of strong tooling. In the same way that western civilisation rests on the dull bedrock of effective sanitation, modern software development depends on the hygiene factors of code management, monitoring and deployment tools. With Serverless you’ll still need the plumbing of automated testing and delivery. Tools will appear for Serverless environments, but I suspect there isn’t a winning toolchain yet to save us from death by a thousand code snippets.
Modern team-based software development needs plumbing. Right now, most folk will have to create their own for Function-as-a-Service, so it’s probably still territory for creative pioneers.
Read more about our work in The Cloud Native Attitude.