Once upon a time, one of the nastiest business disasters a CTO could imagine was a fire destroying their data centre. Then along came the Cloud.
The Cloud takes care of that worst-case scenario. The data centre fire, along with many other disasters, is rendered considerably less scary by cloud computing and distributed systems. Even a direct nuclear hit on London, New York or San Francisco, inconvenient as that might be, should not bring your systems down.
However, even as the Cloud solves those old problems, Cloud Native introduces a brand new threat: your very existence.
Cloud computing aka “the Cloud” is on-demand infrastructure (hardware/servers) PLUS storage, databases, queues and all kinds of other online managed services; Cloud Native (CN) is an approach to system design that is optimized for it.
Going Cloud Native is expensive and difficult, but it’s an existential adaptation. Companies that don’t see a strong reason to make this move will simply stay safely where they are... and eventually be superseded.
BTW A “lift and shift” of existing systems onto Cloud servers is not Cloud Native. It’s often a first step but it is not the whole journey. Companies that stop there have not understood what the Cloud offers (services!) and don’t fully benefit from it.
For our blog we’ve interviewed loads of enterprises about moving to the Cloud and their experiences adopting Cloud-optimized architectures like Microservices. In every case we heard about how hard the transition was. It was painful and complex, with endless setbacks. It took years, required a fundamental cultural change to the organisation, and will probably never be fully complete. So why did they start? And, more importantly, why do they continue?
I contend that the existence of Cloud Native ushers in new menaces more scary to a corporate board than a downed data centre. But it is also the only way to battle those menaces.
Ten years ago I seldom saw an enterprise Disaster Recovery (DR) plan in which an engineer didn’t estimate a blast radius for London. Usually, their guess wasn’t based on their postgraduate studies in geopolitics - they just found a secondary data centre they liked and conveniently set the disaster zone a few miles short of it. This demonstrates how lazy Ops used to be - they couldn’t even be bothered to get a simple doctorate in military affairs to do a thorough job of their strategic planning.
Fortunately, accurately predicting the destruction of major cities is just one of the tasks the Cloud has relieved Ops of. Planning region resilience in the face of environmental, civil or military disaster is something Cloud providers can afford to hire experts for. One of the reasons to move infrastructure to the Cloud is to handle DR more effectively than a normal operations team could. Does that mean with the Cloud we never need to worry about an existential threat ever again? Unfortunately not.
What's the worst operational threat you can think of?
Nope, not an existential threat. Even a limited “lift and shift” of an existing system onto a multi-region cloud platform, although not Cloud Native, should sort that out.
Still not it. In May 2017, the UK’s largest airline, British Airways, cancelled most of their services, hundreds of flights, for days due to a simple data centre power failure. Was it terminal for them? No.
In September 2018, 50m Facebook accounts were successfully breached, opening those users up to fraud. Did everyone close their accounts and move elsewhere? No, they didn’t.
There are lots of obviously tech-related potential disasters: physical outages, catastrophic bugs or mistakes, security breaches. These are real problems that certainly need to be addressed but they’re not necessarily fatal. Both the companies mentioned earlier survived relatively unscathed.
I suspect classic DR risks alone are not existential enough to motivate enterprises to successfully adopt complex Cloud Native architectures. The perils need to be bigger than that. And they are.
One interesting early CN adopter was The Financial Times, who’ve been moving into the Cloud and adopting containers, microservices, orchestrators and Cloud services for well over five years. When they started, the FT was a leader in what many suspected was a dying sector: print newspapers. If they didn’t evolve, the FT knew they might disappear with their whole industry. No-one knew what would come next for news so they took a different bet. They wagered they couldn’t predict the future but they could massively improve their ability to react to it.
Today, the FT has used Cloud Native approaches to increase feature velocity (the time it takes to get an idea into production) by a factor of 500. They can now release new functionality in just 15 minutes when it used to take them six months. Their team built one of the earliest paywalled online news sites and were the first mainstream UK newspaper to report earning more from digital subscriptions than print sales.
Imagine you are a European retailer. In 2010 Amazon introduce a new sale event from the US - Black Friday - and your customers change their expectations of you overnight. Last decade I worked in retail and it was a slow and meticulous industry. Products, discounts and system capacity were carefully planned far in advance, which relied on accurately predicting customer demand many months ahead.
Retail is just one industry that’s being fundamentally changed by a huge player with arguably the most sophisticated IT infrastructure on the planet and the ability to move fast and radically change consumer expectations. Amazon announced 497 new services and features for their Web Services platform in one quarter of 2018 alone. To compete in any of Amazon’s markets, companies have to get functionality out far quicker than ever before.
The current threats don’t only come from huge players like Amazon or Google. It’s far faster and easier to build a radical, scalable product from scratch in the Cloud using Cloud Native techniques than to build on prem. In the UK, the challenger bank Starling created their entire banking system on AWS inside a year, and like the FT they can now release new functionality in minutes. The threat from such fast-moving cloud startups is one all incumbent industries face.
In a world that can change in minutes, the next existential business threat could literally be anything. The FT has succeeded by optimizing for reaction speed rather than powers of prediction. That is what a Cloud Native approach is about.
The modern threats described above are more subtle and far more challenging than just handling outages. Technology is vital to addressing these threats but it is only part of the solution.
Changing tech is necessary, but not sufficient.
“We do these things not because they are easy, but because
we thought they were going to be easy”
- The Programmers’ Credo
Many tech teams embark on a Cloud Native makeover because they think it’s going to be an easy win. Adopting many of the tools can indeed be straightforward to start with but it’s an incredibly tough project to complete successfully. It’s also easy to think that adopting Cloud Native is a technical move to keep your engineers from quitting in a huff to join a trendier company. If that’s your motive, there are easier options. Give them a pay rise and build a moat-shaped ball pond.
The payoff from building a full Cloud Native system is far greater than a slightly less grumpy tech team. It’s truly existential - the ability to increase the responsiveness of an organisation by 2 or 3 orders of magnitude. However, going Cloud Native is a difficult and lengthy project. In order to make such a major transformation enterprises require a significant motivation. And there is one.
The existential threat from Cloud Native is that other people are already doing it.
Image: NASA https://images.nasa.gov/details-PIA03149.html