WTF Is Cloud Native

Kubernetes is Doomed!

What does the future of hosting look like?

If a giant killer heat dome over Seattle isn’t a sign, what is? But are Google, Amazon and Microsoft listening? What are their plans and how do the rest of us fit into them?

In our earlier articles on the future direction of hosting technology (one, two), I talked about the EU commission, who reckon our industry has two problems to solve for climate change: embodied carbon in hardware (we need to make physical stuff last longer) and electricity use in DCs (we need to run on 100% carbon zero electricity).

In this article I’m going to talk about both again and take a look at a recent paper from Google about how they’re hoping to tackle these issues in their own DCs. It’s worth looking at Google because for the past 20 years, they’ve been leading the field in efficient DC operations.

But if Google is so engaged in future-proof IT and they invented Kubernetes, why am I certain it’s doomed?

Take a breath

Before we dive into the details, let’s step back and summarise the issue Google is facing up to now and the rest of us will be staring at in horror in less than 5 years:

  • Unless you’re in a country with very substantial nuclear, geothermal, or hydro power, you’ll struggle to get carbon zero electricity all the time. The sun isn’t always out and the wind isn’t always blowing. There are storage solutions, but they aren’t that great. The upshot is most countries are going to have VERY variable electricity pricing and it’ll have to be a behaviour-shifting cost difference, because that’ll be the point. (Capitalism does have some redeeming features and they are likely to be deployed, but for that, you need significant commercial motivation).
  • The expectation has been set by the big players that companies in the tech sector will be zero carbon by 2040 (for hosting, 2030).

That’s going to be tricky. What does Google have to say about it?

Before Kubernetes

Google has been doing programmable infrastructure for a long time, in search of hyper-efficient data centres. In Google DCs, workloads are packed onto machines to use as much of their physical resources as possible, at all times (CPU, bandwidth, network cards, disk space or whatever). This is sometimes described as maximising server density.

Google led the way in using cutting edge tech to accomplish this. Containers? Orchestrators? Cluster schedulers? Google has used them for decades and you can read about the stuff they do with their internal management system (Borg) in this 2013 paper (tl;dr Google’s great at using scheduling to get optimal utilisation of the two resources we’re worried about for the climate: electricity and hardware). As they say, “shifting execution of flexible workloads in time and space can decrease peak demand for resources and power. Since datacenters are planned based on peak power and resource usage, smaller peaks reduce the need for more capacity.”

A simplified version of Borg was open sourced as Kubernetes.

All of this clever stuff was wildly ahead of everyone else a decade ago, but it’s not enough to meet their current goal of being carbon zero (no carbon at all emitted by their operations) by 2030. To do that, they need to get even cleverer.

The received wisdom is that you can’t solve the green electricity problem using scheduling to avoid fossil-fueled electricity (i.e. not running your compute jobs on windless nights). The reason for that is you’ve got loads of carbon investment embodied in your infrastructure and letting it sit idle is also a major form of waste.

So, what are Google doing?

Temporal displacement

A vital part of Google’s cluster scheduling has always been an awareness of latency sensitivity - i.e. the ability to differentiate between urgent and non-urgent workloads.

For example, opening your emails is urgent. If you have to wait too long, you’ll use Outlook. However, uploading your videos to YouTube, which involves hanging around for them to be expensively transcoded, is not urgent. Sometimes that takes 10 minutes, sometimes it takes hours. Users live with it. Google has a good mix of different jobs like this and their cluster schedulers use this kind of variability in urgency to utilise their servers more efficiently.
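To make the urgent versus non-urgent split concrete, here is a minimal sketch in Python of a toy scheduler. It is not Borg or anything Google has published: the `Job` fields, the slot-based time model and the `schedule` function are all assumptions invented for illustration. Urgent jobs run in the slot they arrive in, and deferrable jobs are fitted into whatever capacity is left before their deadline.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpu: int        # CPU units needed for one time slot (hypothetical unit)
    urgent: bool    # True = latency sensitive, must run in its arrival slot
    arrival: int    # slot in which the job arrives
    deadline: int   # last slot in which a deferrable job may run


def schedule(jobs, capacity, slots):
    """Toy scheduler: returns {slot: [job names]} for a single machine pool.

    Urgent jobs are placed first, in their arrival slot. Deferrable jobs are
    then placed earliest-deadline-first into the first slot with enough room.
    """
    plan = {t: [] for t in range(slots)}
    free = {t: capacity for t in range(slots)}

    # Pass 1: urgent work cannot move, so it claims capacity first.
    for job in jobs:
        if job.urgent:
            if free[job.arrival] < job.cpu:
                raise RuntimeError(f"no room for urgent job {job.name}")
            plan[job.arrival].append(job.name)
            free[job.arrival] -= job.cpu

    # Pass 2: deferrable work fills the gaps, tightest deadline first.
    flexible = sorted((j for j in jobs if not j.urgent), key=lambda j: j.deadline)
    for job in flexible:
        last_slot = min(job.deadline, slots - 1)
        for t in range(job.arrival, last_slot + 1):
            if free[t] >= job.cpu:
                plan[t].append(job.name)
                free[t] -= job.cpu
                break
        else:
            # The risk of deferring: capacity may never free up in time.
            raise RuntimeError(f"{job.name} missed its deadline")

    return plan
```

With a capacity of 10 units per slot, an urgent 6-unit mail job in slot 0 pushes a 6-unit transcode job into slot 1, where it shares the machines with whatever urgent work turns up there. The `RuntimeError` in the second pass is the toy version of the risk that comes with putting things off.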

Hyperscale computing vendors aren’t the only folk who do this. It’s almost a cliche of physical logistics that things get cheaper for customers if they’re relaxed about delivery times. If drivers can wait for their van to fill up, they need to take fewer trips and they use less petrol. However, even in this simple case there’s a trade off - the driver and van are sitting idle and the customer is missing out on getting their stuff sooner. Every logistical decision depends on what you’ve chosen to optimise for.

Google’s willingness to delay less urgent tasks in order to use their hardware resources more efficiently is generally an effective approach, but there’s a downside to putting things off. What if, in an hour’s time, Michael Jackson dies again (less extreme examples of fluctuations in demand are available), the internet is swamped with traffic, and you don’t have any free resources to execute your parked tasks? The longer you wait, the better optimisation you can get, but the greater the odds something unexpected will happen and screw everything up.

Google’s paper on their latest logistical approach to scheduling describes “delaying temporally flexible workloads” but that’s what they’ve always done, so what’s new? Have they changed what they’re optimising for?

Carbon-intelligent compute

One thing has altered for Google - a new willingness to deliberately downgrade server density if there isn’t enough low carbon intensity electricity available to run their machines. The paper even explicitly mentions turning machines off.

According to the paper, Google has built new carbon-aware capacity prediction models (Virtual Capacity Curves, or VCCs) for its datacentre clusters. These make the availability of green electricity a limitation on DC capacity, just like hardware limitations such as lack of CPU. The models factor in predictions of both carbon-free electricity availability and demand. In other words, they are co-optimising for carbon footprint AND infrastructure efficiency (they acknowledge the embodied carbon problem).
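The idea of treating green electricity as a capacity limit can be sketched in a few lines of Python. This is emphatically not the algorithm from Google’s paper: the function name `virtual_capacity`, its inputs and the greedy “greenest hour first” rule are all assumptions made for illustration. It takes an hourly carbon intensity forecast and a forecast of inflexible load, then decides how much flexible work each hour is allowed to carry.

```python
def virtual_capacity(intensity, inflexible, flexible_total, machine_cap):
    """Return a per-hour usage limit for a cluster (toy model).

    intensity      -- forecast carbon intensity per hour (e.g. gCO2/kWh)
    inflexible     -- forecast inflexible load per hour (must always fit)
    flexible_total -- flexible work that must complete within the day
    machine_cap    -- physical capacity of the cluster per hour
    """
    hours = len(intensity)
    if len(inflexible) != hours:
        raise ValueError("forecasts must cover the same hours")

    # Inflexible load is never throttled, so it sets the floor for each hour.
    limits = list(inflexible)
    remaining = flexible_total

    # Greedy assumption: hand flexible work to the greenest hours first.
    for h in sorted(range(hours), key=lambda h: intensity[h]):
        headroom = max(machine_cap - inflexible[h], 0)
        grant = min(headroom, remaining)
        limits[h] += grant
        remaining -= grant

    if remaining > 0:
        raise ValueError("flexible work does not fit within the day")
    return limits
```

For a three-hour day with intensities of 500, 100 and 300, a constant inflexible load of 4, 8 units of flexible work and a machine capacity of 10, the limits come out as 4, 10 and 6: the dirtiest hour runs only what it must. Note that this greedy rule deliberately leaves hardware idle in dirty hours, which is exactly the embodied carbon trade-off a real system would have to weigh.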

The good news is, it worked. At least for some places. What the team behind the paper found was that the benefits of their carbon-aware scheduling varied considerably between locations, depending on the predictability of the local load and, critically, the proportion of flexible (deferrable) tasks.

This is exactly what you’d expect, so it sounds like the plan is working as designed! Having said that, the gains so far are not world-changing (1-2% improvements), but they have only just started.

Oh, and if Michael Jackson does die again, they plan to turn their prediction systems off and let the system do the best it can in real time to handle the current situation. You read it here first folks, pop stars dying is bad for the environment.

What’s the end game?

Google is aiming to actively reduce utilisation at times when there’s not much carbon zero electricity available. Hurray! Except there is then a carbon impact from under-utilised hardware (embodied carbon). Bummer! However, they seem well aware that’s the balance they need to strike.

I’m wondering if they’re doing something else to offset the hardware issue. Are they increasing the effectiveness of their packing algorithms by taking more scheduling risks? It might explain why they would be willing to take an efficiency hit. They could get away with it if their workload prediction models had improved, raising their general level of efficiency. I bet those models have improved - it’s the kind of thing ML is excellent at - and I think they hint at this in the paper.

My guess is their strategy is to have those positive and negative utilisation deltas cancel one another out, with the result being no worsening of their hardware’s embodied carbon problem, but a reduction in their carbon emissions. It would be a pragmatic approach.

The trouble with Kubernetes

So what does all this have to do with Kubernetes and why do I reckon it’s doomed?

Kubernetes as a self-managed orchestrator is by definition limited in its scope. A single company doesn’t generally have that much variety in tasks. Google itself says: “in spite of high uncertainties at the job level, Google’s flexible resource usage and daily consumption at a cluster-level and beyond have demonstrated to be quite predictable within a day-ahead forecasting horizon.” You don’t have that many different workloads, so you won’t get that kind of predictability and you’ll never get the same efficiency levels.

Is there anything at all you can do to improve things? I guess you could start working with what you have. Divide up work into inflexible and flexible jobs, so that you can use schedulers more effectively. At any decent scale of operation with Kubernetes you will reach cluster-level eventually and although what you’ll be able to do in terms of efficiency won’t be a patch on Google, it’s better than nothing.
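As a starting point for that division, here is a minimal Python sketch that tags workloads as flexible or inflexible based on how long each can wait to start. Everything in it is hypothetical: the `label_workloads` helper, the 60 minute threshold and the `example.com/...` label keys are made up for illustration, and Kubernetes’ default scheduler would do nothing with such labels unless you built or configured something to act on them.

```python
def label_workloads(workloads, threshold_minutes=60):
    """Map workload name -> label dict, given max tolerable start delay.

    workloads         -- {name: minutes the job can wait before starting}
    threshold_minutes -- assumed cut-off between inflexible and flexible
    """
    labelled = {}
    for name, delay in workloads.items():
        kind = "flexible" if delay >= threshold_minutes else "inflexible"
        labelled[name] = {
            # Hypothetical label keys; label values are kept as strings.
            "example.com/flexibility": kind,
            "example.com/max-delay-minutes": str(delay),
        }
    return labelled
```

Calling `label_workloads({"inbox-api": 0, "video-transcode": 240})` marks the inbox API as inflexible and the transcoding job as flexible. The hard part is not the code but agreeing, job by job, how much delay the business can actually live with.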

However, even if you put all that effort in and get some decent utilisation numbers, at that point you’ll hit a new problem: you’ll need a commercial service mesh. Right now that’s bad. Really bad. Service meshes like Istio use loads of layers and generate loads of processing.

There’s a reason the cloud providers use their own custom service mesh-style products - they have written them to be efficient. Publicly available service meshes are way too generic and energy intensive. We have to pray that’ll change in the near future because a service mesh is always on - it’s not a flexible task - so it has to be low overhead, but right now the commercial ones are not. If Kubernetes requires a service mesh and it’s energy costly, that’s a no-no.

Wake up, time to die

Seattle 2021 was a wake up call. There is a limited number of times we can hit the snooze button on those and, let’s face it, we’ve reached that limit. Our two tech problems of hardware longevity and zero carbon electricity must be solved and it’s not long now before your government is going to make you do it.

If you don’t have a plan, get one. Fast. And I hate to break it to you, but Kubernetes is not a plan here. Right now, it’s a guaranteed failure.

If you are betting on K8s long term, you’ll need much more effective cluster schedulers and control planes, and you’ll have to do a thorough job of breaking up and labelling your own tasks. That’s a lot of work, but Kubernetes is a community project, so what are you going to do about it?
