WTF Is Cloud Native

10 Predictions for the Future of Computing or; the Inane Ramblings of our Chief Scientist

TL;DR

  • WASM will be everywhere: compile target, deploy target, IoT, plug-in ecosystems. This is already happening! (1-5 years)
  • Rust will continue to grow in popularity and will overtake Go in the next few years by the RedMonk index. (2-4 years)
  • A serious rival to Kubernetes will emerge. Bonus points if it uses WASM and encourages a GitOps style paradigm. (2-5 years)
  • The Blockchain ecosystem will implode, but who knows when. Possibly it will happen quietly and years later we’ll talk about “the blockchain winter”. Who knows? (1-10 years)
  • Supply chain security will be big. There will be more hacks on the scale of SolarWinds (there probably have been already, we just don’t know) in the next ~2 years. Supply chain tooling (I hesitate to say “solutions”) will be a big growth area, but the industry will still be slow to achieve widespread uptake (e.g. getting everyone to use SBOMs). (~2-10 years)
  • Barely a prediction, but serverless will continue to grow and will slowly become the dominant paradigm. (10 years? There’s a lot of steam in Kubernetes.) However, it will also experience more backlash and “failure” stories as people struggle to figure out how to architect systems for the new paradigm. (next 2 years)
  • We will start to see companies moving partially back to on-prem for cost savings. (2-5 years) This might be the most controversial/unlikely idea here.
  • There’s an outside chance of an AI building a multi-billion company leveraging smart contracts which enslaves the whole of humanity. (10-20 years)
  • OK, hopefully not, but there’s the possibility of mass disruption across multiple industries from AI/ML advances. I don’t believe we will develop a general artificial intelligence, but instead will make big jumps in specific fields. This may involve jobs being wiped out en masse e.g. truck driving. It may surprise us what sectors are affected. (2-20 years) (I don’t have a clue when, but changes will be sudden.)
  • On the same topic, GPT-3-style helpers - effectively autocomplete for everything - will be widespread. Artists, writers, developers, operations, composers will all be using them. (1-4 years)

Programming Languages

I’ll preface this by saying that I’m not a programming language expert. It’s one of those areas I feel I should know more about and would love to play more with!

In recent years there seems to have been a swing towards typed languages - perhaps most notably TypeScript and Rust. TypeScript is now used in the majority of JavaScript frameworks and is one of the top 10 languages according to a recent GitHub Octoverse report. Rust in particular I think will see a lot of growth, with more and more low-level software being written in, and in some cases ported to, Rust in order to achieve safety and speed. It also fits very nicely in the WebAssembly (WASM) ecosystem, as it can compile to a small WASM binary, mainly due to the lack of a runtime or garbage collection (GC). Having no GC is almost an oddity in modern languages and is due to Rust’s unusual memory model and concepts of ownership and borrowing. Looking at the RedMonk index, and considering the factors pushing Rust forward, Rust is likely to overtake Go in popularity within the next few years.

Longer-term I think we will see new languages which build on concepts in Rust (primarily the memory model and borrow checker) with higher-level features becoming popular. Taking the type system to the next level, I believe a language with dependent types (such as Idris) will make the jump from academia (or hobby language) to become a popular language used within industry.

When developing microservices, especially for Kubernetes, it’s beneficial to use a language that can produce small stand-alone binaries. Languages that compile to WASM are likely to also become more important, as they will provide access to various PaaS and edge platforms. Both these factors may limit the growth of languages such as Elixir and Gleam, which rely on the Erlang VM. (Note that projects like Lumen may prove me entirely wrong here.)

Kubernetes and Deployment Platforms

Over the next 5 years Kubernetes (also known as k8s) will continue to grow. But unless it does something to address the burgeoning complexity, we will start seeing serious competitors. We are getting to the stage where running and maintaining Kubernetes is complicated enough that users are turning to managed services like GKE or employing specialist companies like Giant Swarm and Container Solutions to take care of Kubernetes. Even companies on managed services will be looking to specialist companies for support. This isn’t necessarily a bad thing - these services empower organisations to focus on the core business - but it does mean that users who are reluctant to pay for these services will be attracted to simpler alternatives.

It’s worth noting that the complexity isn’t just hidden under the hood. It’s spilling out into the interface and impacting users. It’s still fairly easy to hack at `kubectl run` and get a demo up and running. But running production apps and figuring out how to expose them securely requires understanding a wealth of different features that inevitably result in YAML files longer than most microservice source code.

Why is there such complexity? A lot of it is evolution. We start with something simple (well, comparatively simple in the case of Kubernetes) and then add support for use case x. Then we realise it would be better if we did z and rewrite things but have to maintain backwards compatibility. This results in complexity that isn’t inherent to the problem (accidental complexity), which means a new competitor can come along and replace it, as they don’t have all the historical baggage and can learn from the achievements and mistakes of the past.

To put it another way, increasing the number of supported use cases has led to the “80/20” problem - 80% of users only use 20% of the features, but everyone uses a different 20%. Taking away features is difficult. New competitors don’t have this problem and can build a new offering around a smaller core set of features and potentially fix/avoid other issues (£100 says it doesn’t use YAML).

As ever, we will see changes first at the smaller scale. Small companies and individuals will avoid k8s in favour of simpler solutions, probably some sort of open-source PaaS and probably utilising WASM. Nomad may start to gain significant uptake over the next few years. To begin with people will say “yes but you can’t use x at scale”, but slowly the problems will be addressed and another sea-change in the industry will be upon us.

The other possibility is that Kubernetes becomes an underlying infrastructure layer that everything else is built on top of. So small projects might use what appears to be a simple, streamlined PaaS (or FaaS like Knative), but that PaaS will be k8s under the hood. I’m somewhat sceptical that this will achieve mass adoption due to the amount of resources required by Kubernetes and the tendency for Kubernetes complexities to “show through”. It may be simpler and far more efficient to distill the best bits of k8s into a new system - we are seeing a lot of exploratory work here like k3s, KCP and badidea. On a side-note, internal platforms and tooling like Humanitec, Backstage and Crossplane will become commonplace at large organisations, and this won’t go away even if Kubernetes does.

(For those of you interested in building a Kubernetes killer, it might be worth taking a look at Prolog and this discussion.)

Whatever happens, Kubernetes is staying with us in some form for a long time. It’s still evolving at a fast rate and we can see the technologies that are likely to influence the next few years. Custom operators and GitOps will become commonplace. Some innovative kubelet implementations like Krustlet (which supports running WebAssembly modules as pods) may start to get traction.

WASM

WebAssembly has been around for a few years, but may now be poised to become ubiquitous. To understand why, it’s probably easiest to think back (assuming you’re of a certain vintage!) to the original slogan of Java: "Write Once, Run Anywhere". We were told that Java would run everywhere and be completely portable. It was a big success, but nowhere near the levels claimed. Why not? Well:

  • It was (or at least was perceived to be) slow and memory hungry. This pretty much killed it at the edge in particular.
  • You needed to learn Java (there are now a lot more JVM languages, but the choice was limited before).
  • Writing JVM implementations was not trivial and differences between them led to the curse of “Write Once, Debug Everywhere”.
  • Running in the browser (applets) required installation of a plug-in.

Well, WASM addresses all of these points. It’s relatively simple, efficient and small. Many languages can be compiled to WASM. The major browsers already have mature implementations. The security story is compelling - the WASI project lets you control exactly what WASM is allowed to do, what input it can read from, what it can write to and what kernel calls it can make.

We’re already seeing multiple projects adopt WASM for their plug-in system including Envoy and Ethereum. This will only expand, as it makes so much sense; you can control what the plug-in is allowed to access at a granular level while allowing users to write the plug-in in whatever language they like.
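To make the "granular control" idea concrete, here is a minimal sketch of a deny-by-default capability model. It is plain Python with no real WASM runtime involved, and every name in it (`Capabilities`, `Host`, `word_count_plugin`, the file paths) is a hypothetical stand-in rather than the WASI or Envoy API. The point is only the shape: the plug-in can reach the outside world solely through a host object, and the host checks each request against what was explicitly granted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Capabilities:
    """What the host grants a plug-in; anything not listed is denied."""
    readable_dirs: frozenset = field(default_factory=frozenset)


class Host:
    """Hypothetical host API: the plug-in's only door to the outside world."""

    def __init__(self, caps, files):
        self._caps = caps
        self._files = files  # in-memory stand-in for a filesystem

    def read_file(self, path):
        directory = path.rsplit("/", 1)[0]
        if directory not in self._caps.readable_dirs:
            raise PermissionError(f"plug-in may not read {path}")
        return self._files[path]


def word_count_plugin(host):
    """Stand-in for guest code, which would really be a compiled WASM module."""
    return len(host.read_file("/data/input.txt").split())


files = {"/data/input.txt": "hello wasm world", "/etc/secret": "hunter2"}
host = Host(Capabilities(readable_dirs=frozenset({"/data"})), files)

word_count = word_count_plugin(host)  # allowed: /data was granted

try:
    host.read_file("/etc/secret")  # never granted, so the host refuses
    secret_blocked = False
except PermissionError:
    secret_blocked = True
```

In a real system the enforcement comes from the sandbox itself rather than from well-behaved Python code, but the granting of narrow, explicit permissions is the same idea.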

WASM replaces containers for a lot of use cases, and I expect to see more integrations with Kubernetes, building on the promise already shown by Krustlet. More interesting is the use of WASM to power new PaaS and FaaS platforms, including Fastly Compute@Edge and Cloudflare Workers.

We’ll also see it used at the edge, primarily due to portability and disk size.

That being said, there are still challenges. I wrote above about there being support for compiling multiple languages to WASM. This is true, but support is not equal. Rust seems to be the number one language by a distance, because it has good support and creates relatively small files (due to the previously mentioned lack of GC and runtime). AssemblyScript - a version of TypeScript adapted for WebAssembly - is also popular. 

Whilst there is good support for other languages including Go, file sizes tend to be bloated by garbage collector implementations or runtime features. Other language implementations tend to be in their infancy.

The same can be said for a lot of important infrastructure projects like WASI, which defines how WASM interacts with the host environment. The Bytecode Alliance will need to play an important role in quickly building out the ecosystem.

Supply Chain Security

We’ve been awful as an industry at this (partially due to broken incentives in the security industry). What’s surprising is that it hasn’t led to more attacks. We will see more and more cases where organisations are accidentally running “poisoned” versions of software, because an attacker has been able to inject their own software at some stage - whether it is during compilation, distribution or updating. In some cases this will result in embarrassing crypto-ransoms, but we will start seeing more and more “intelligent” attacks where one organisation is compromised as a stepping stone to another organisation (à la SolarWinds).

The answer to this problem is to start thinking about how we prove the provenance of components running in production. It is imperative that SBOMs and similar metadata become standard practice, and tools such as in-toto and Notary v2 become commonplace. The GitOps approach described below also has a part to play, by cleanly separating privileges between CI and deployment, as well as providing a clear trail of who changed what and why.
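As a rough illustration of what "proving provenance" means at the simplest level, the sketch below checks an artifact against the digest recorded for it in an SBOM. The SBOM structure, the component name and the artifact bytes are all made up for the example and are far simpler than real formats; only `hashlib` from the standard library is used.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Hypothetical artifact and a heavily simplified, made-up SBOM structure.
artifact = b"pretend this is a compiled binary"
sbom = {
    "components": [
        {"name": "example-lib", "version": "1.2.3", "sha256": sha256_hex(artifact)},
    ]
}


def verify(name: str, data: bytes, sbom: dict) -> bool:
    """Return True only if the SBOM lists `name` with a matching digest."""
    for component in sbom["components"]:
        if component["name"] == name:
            return component["sha256"] == sha256_hex(data)
    return False  # components the SBOM doesn't know about are rejected


genuine_ok = verify("example-lib", artifact, sbom)
tampered_ok = verify("example-lib", artifact + b" plus a backdoor", sbom)
```

A digest check like this only helps if the SBOM itself can be trusted, so in practice the metadata would also need to be signed and verified, which is the harder part of the problem.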

The potential impact of future attacks is severe enough that governments are starting to wake up and take notice - the White House has issued an order to review the US government’s software supply chain and the UK has issued a call for views on supply chain cyber security. Hopefully this is the start of a coordinated effort to improve standard practice and build an ecosystem of tooling that is effective in preventing attacks.

The optimistic prediction here is that these projects and approaches (or equivalents) get significant uptake. The pessimistic one is they don’t and we see increasingly frequent, increasingly devastating, supply chain attacks. 

Blockchain and Cryptocurrency

I’m sorry bros, but whilst I think blockchain has its uses, the vast majority of companies in the area will fail. There just aren’t enough viable use cases to justify the amount of money in the ecosystem. If you’re in that area, I hope you’re selling spades.

One area that could prove me wrong is smart contracts. Perhaps this is only because it reminds me of Accelerando - could we have AIs building an empire on the back of smart contracts? (And what will smart contracts be written in? You guessed it - WASM.)

Another potential use case is in the previously mentioned area of supply-chain security - could we use a blockchain to identify the provenance of software?

On “crypto” more generally, I would love to see a real way to do micropayments and cheap (near zero cost) international money transfers. I’m sure this was one of the promises of cryptocurrency but it hasn’t been delivered. It’s so hard to assess the myriad of projects in cryptocurrency that I have no idea if this is likely to be achieved. At the moment we have companies like Coinbase, who charge significantly higher percentages than stock brokers for similar services.

We have to stop the ridiculous wastage of resources that Proof-of-Work entails. In the short term the only real alternative seems to be Proof-of-Stake and it’s imperative that we move to such a model. I honestly hope that Bitcoin comes to an end, but the amount of money and number of backers with money mean that’s probably not going to happen in the short term.

Regarding NFTs, I'm again sceptical but enjoyed this article from earlier in the year.

GitOps and x-as-code

The idea of GitOps is fabulously clean and simple. Store the required state of the Kubernetes cluster in Git. If the actual state of the cluster deviates, reconcile (which hides a lot of different possibilities). When you need to change the state, the Git repo is updated and the cluster is “reconciled” in turn. The beneficial side effects are fantastic: we should be able to bring up an identical cluster by just cloning the repo, we have a full log of all changes, and an established mechanism for discussing and approving changes (pull requests). Implementing GitOps isn’t as easy as it sounds however, and there are already a number of competing technologies - including Kubestack, Flux and Argo CD.
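The reconcile step can be sketched in a few lines. This is a toy model under loud assumptions: both the desired state (what would live in Git) and the actual state (what the cluster reports) are plain dictionaries, and the names `plan` and `reconcile` are mine, not the API of Flux, Argo CD or Kubestack.

```python
def plan(desired, actual):
    """Compare desired state (from Git) with actual state (from the cluster)."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def reconcile(desired, actual):
    """Apply the plan to a copy of the actual state and return the result."""
    result = dict(actual)
    for action, name in plan(desired, actual):
        if action == "delete":
            del result[name]
        else:
            result[name] = desired[name]
    return result


# Hypothetical workloads: Git wants web:2 and an api; the cluster has drifted.
desired = {
    "web": {"image": "web:2", "replicas": 3},
    "api": {"image": "api:1", "replicas": 2},
}
actual = {
    "web": {"image": "web:1", "replicas": 3},
    "cron": {"image": "cron:1", "replicas": 1},
}

actions = plan(desired, actual)
converged = reconcile(desired, actual)
```

Running `plan` again on the converged state yields no actions, which is the property that lets a controller run the same loop forever. The real difficulty hides in everything this sketch leaves out: ordering, partial failures, secrets and resources that other controllers also mutate.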

We are already applying GitOps to the stack below Kubernetes e.g. using Terraform to bring up the cluster. With the rise of microservices, serverless, service mesh and SaaS components like queues and DBs, what were once application concerns - e.g. wiring functions together - have to some extent been pushed into the cluster or infrastructure layer. The obvious corollary to this is that YAML files aren’t enough to build and define clusters any more. Instead we need full-blown programming languages. Pulumi saw this early on and jumped on it, but I think we might see a lot more iterations and potential solutions. Again, WASM may have a part to play here in allowing users to bring their own programming languages. The next few years will clarify this, but I expect a lot of hand-written YAML will be replaced with CDK, Pulumi and the like, which are simpler to read and reason about - YAML and CloudFormation will effectively become compilation targets.
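Here is a small sketch of what "manifests as a compilation target" can look like. It deliberately uses no Pulumi or CDK API, just a hand-rolled function that emits a Kubernetes Deployment as a dictionary; the service names and image tags are hypothetical. The output is rendered as JSON, which Kubernetes accepts alongside YAML.

```python
import json


def deployment(name, image, replicas=1):
    """Build a Deployment manifest; labels are defined once and reused."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical services: one loop replaces two near-identical manifest files.
services = {"web": "example/web:1.0", "api": "example/api:1.0"}
manifests = [deployment(name, image, replicas=2) for name, image in services.items()]

rendered = json.dumps(manifests[0], indent=2)
```

Even at this size the benefit shows: the selector and the pod labels cannot drift apart, because they come from the same variable rather than being typed twice.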

Serverless and FaaS

The above point leads into the uptake of FaaS solutions such as Lambda. This will definitely happen, but it’s not the clean and simple change that some proponents seem to believe it is. Effectively using FaaS requires a different style of architecting applications. Queues and messaging infrastructure become essential components whose interaction must be fundamentally understood before reliable services can be built. What could previously be handled with data structures and function calls must be remodelled and thought out as a distributed system with support for error handling. It will take some time for best practices and design patterns in this space to become standardised and common knowledge.
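To show how a plain function call turns into a small distributed-systems problem, here is a toy sketch of queue-driven processing with retries and a dead-letter list. Everything is in memory and hypothetical: the queue is a `deque` rather than a real message broker, `resize` stands in for a deployed function, and the failure conditions are contrived for the example.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget before a message is parked


def process(queue, handler):
    """Drain the queue, retrying failures and parking repeat offenders."""
    done, dead_letters = [], []
    while queue:
        message = queue.popleft()
        try:
            done.append(handler(message["body"]))
        except Exception:
            message["attempts"] += 1
            if message["attempts"] >= MAX_ATTEMPTS:
                dead_letters.append(message)  # give up; needs human attention
            else:
                queue.append(message)  # put it back for another attempt
    return done, dead_letters


seen = set()


def resize(body):
    """Stand-in for a function that can fail transiently or permanently."""
    if body == "corrupt.png":
        raise ValueError("cannot decode image")  # permanent failure
    if body == "b.png" and body not in seen:
        seen.add(body)
        raise TimeoutError("storage briefly unavailable")  # transient failure
    return f"resized:{body}"


queue = deque({"body": b, "attempts": 0} for b in ["a.png", "b.png", "corrupt.png"])
done, dead_letters = process(queue, resize)
```

Note that the handler may run more than once for the same message, so it has to be safe to repeat. That requirement simply doesn't exist when the same logic is an ordinary in-process function call.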

At the same time, it’s not clear to me that Lambda will take all here. The edge computing FaaS offerings from Cloudflare and Fastly are compelling, offering impressive performance and scaling as well as language flexibility through WASM. The downside is they lack the supporting infrastructure of the cloud providers, who at the same time are building out their own CDNs to neutralise the edge providers’ advantage. All of these offerings suffer from being proprietary, which scares many companies with thoughts of “lock-in”. For this reason, open alternatives like Knative and OpenFaaS are popular, further fragmenting the market.

Serverless in the broad sense (both FaaS and SaaS apps like databases and queues) will become the dominant paradigm, but the road there may be bumpier than we expect. The next few years will see both success stories (“we saved 10k a month by moving to serverless”) and disaster stories (“we abandoned serverless after it cost us 10k a month”). 

AI and Machine Learning

This is the joker in the pack that scares me. I touched on AI companies running smart contracts, but that’s really the sci-fi fan in me rather than the pragmatist. We can get a better idea of what’s happening by looking at what GPT-3 (original paper) can do and where we are with self-driving trucks and cars. Will I be able to write a blog post with the quality of a George Orwell essay? Will all authors start using AI as a co-author and editor? Truck driving is one of the biggest sources of employment in the US - how many of those drivers will be replaced with AIs in the next decade? Just how many jobs in how many industries will be displaced? (For some more - and better researched - predictions take a look at Sam Altman’s articles and interviews.) Or is it just another hype-cycle?

In the short term, the major change seems to be that AI “helpers” and “autocomplete” based on GPT-3 and its successors will be everywhere. If you’re writing a blog, it will help complete your sentences. If you’re developing a web app it will complete your methods. If you’re writing a song, painting a picture, sketching an engineering plan, “help” is at hand. Those of us that eschew such help are likely to be left behind.

Bringing things back to concrete developments in cloud computing, this also mirrors the growth of AI Ops - where machine learning is used to analyse logs and telemetry data from a running application in order to identify issues and areas of improvement.

I don’t believe we will develop a general artificial intelligence any time soon, so drastic changes are likely to be limited to various industries and use-cases. But the changes to those sectors may still be a complete revolution. These changes are likely to happen suddenly and the benefits will go to a small number of companies that own the technology, furthering the economic splits in society.

My fear stems from knowing that I haven’t even imagined some of the possibilities, and with AI changes can happen almost overnight. Sci-fi authors often talk about the “singularity” - broadly speaking the idea that when AI crosses a certain point, change will accelerate and humans will be unable to predict or keep up with progress. Some views on this may be hyperbolic, but I absolutely believe that AI is going to have major societal impacts we haven’t foreseen. 

Rise of the Hybrids

There seems to be a lot of activity in the on-premise, bare-metal and hybrid markets at the moment, from both new players and old. This isn’t a sector I follow closely, so again this may be off-target, but I’m going to continue babbling anyway.

It might look from the outside that everything is moving inevitably towards the public cloud, but I believe we’re at the start of a swing back towards on-prem and a hybrid mix. The traditional hardware companies like Dell and HPE may have made a lot of mistakes along the way, but they seem to all be moving towards a *aaS model, where consumers pay for what they use as they go. At first this sounds incompatible with having on-premise hardware, but presumably it means vendors will ship HW with excess capacity with guarantees of fast delivery of further HW if required. An interesting thing about this model is it allows a balancing of commitment, CAPEX and OPEX. Want a lower monthly per instance cost? Agree to a 5 year contract and/or buy HW up front. Want a more flexible model as you figure out your business model? Take a 1 year contract but higher per-instance fees.

This model is exemplified by HPE’s GreenLake and Dell’s Project Apex. Given IBM’s recent acquisitions and existing products and solutions, it’s a fair guess that they will make similar moves in the market. Nutanix are also clearly in this area, providing a software control plane backing onto cloud resources and/or on-prem HW. The importance of the control plane is hard to overstate - the model will only work if it’s possible to easily integrate hybrid resources and maintain the infrastructure. Newcomer Oxide presumably also have some innovations planned in this area, by providing better integration between HW and the various software layers up to the hypervisor. It’s also worth pointing out that this isn’t a million miles away from what bare metal and data centre companies like Equinix and Scaleway currently offer and are building out - the difference perhaps being what we mean by “on premise”. Is it stuff that runs in my own data centre, or can it also be my hardware in someone else's data centre? Do I have to own the hardware, or can I rent it?

In the background, we also have an interesting set of dynamics between the cloud providers and the chip manufacturers. Cloud providers want to commoditise chips so that they’re cheap and fast to swap out every few years. Chip manufacturers want to make sure they sell as much as they can to the cloud providers whilst at the same time retaining control in the market. To retain a diverse customer base with varying needs, chip manufacturers are likely to be supportive of the HPE and Dell moves and anything which promotes diverse on-premise and edge computing platforms. In contrast, cloud providers have started building their own custom chips and pushing into the on-prem market.

The cloud providers also have a fight with CDN providers such as Cloudflare and Fastly. Both these companies have started providing serverless computing services which utilise their data centres to operate as close to the customer as possible (a form of “edge” computing). By being so much closer to the end user, there are major advantages for speed and - it seems - cost. Their big disadvantage is that they don’t have access to the massive range of functionality offered by AWS etc - typically you get a data store and compute services and little more. Whilst I expect these services to grow enormously, the cloud providers are fighting back by aggressively expanding into the CDN space.

Given the potential cost savings and “lock-in” avoidance, we will start to see some companies moving “back” to on-premise/hybrid. The cloud will continue to be dominant, especially in the start-up space, but established companies will be looking to see if they can make significant OPEX savings. Perhaps the more difficult question is who will be the biggest winner from this movement - the traditional hardware vendors, the bare metal and data centre providers, edge computing providers, cloud providers or management plane software vendors?

A Quantum Footnote

Quantum computing is another area where you could write everything I know on the head of a pin and poke it in my eye. 

Given quantum computing involves vacuums and temperatures near absolute zero, it seems unlikely that we will be getting quantum laptops anytime soon. In fact, the costs are so great that only massive corporations and governments are able to afford their own quantum computers. This doesn’t cut out the public from quantum computing however - the major cloud providers have all announced research into quantum and services for rent. Quantum computers offer potentially large breakthroughs in specific hard problems, such as molecular simulation and some logistics optimisation problems, although they are not known to solve NP-complete problems efficiently in general. This may also mean that the public-key cryptography underpinning TLS is broken for those that can afford it. At the moment it seems that quantum computing will provide important speed-ups for some classes of problem, but won’t upend computing in the short term. The real impact may be in speeding up research in scientific fields (think physics, chemistry and biology simulations) which may in turn lead to breakthroughs elsewhere.

Quantum teleportation seems more likely to lead to breakthroughs that are fundamentally important to the public - could we have secure, high-bandwidth communication between opposite ends of the earth (and beyond!)? Sadly it won’t be faster than light, as teleportation still relies on a classical channel to complete the transfer. Again, I think we are some way off the technology affecting Joe Public.
