OK Cloud, On-Prem is Alright

As someone who has worked in software since 2001, and in the Cloud Native (containerisation and Kubernetes) space since 2013, I'm getting old enough to have seen trends come and go a few times. VMs came (and stayed), continuous integration went from a fad talked about by gurus to the mainstream of software delivery, and containers went from some changes Google made to the Linux kernel to the de facto standard for software packaging, and then on to the foundation for Kubernetes, an industry-standard software deployment platform.

But it's another thing to see a wave come, and then see it recede for a time before you expect it to rise again. Over the last few years, we at Container Solutions have observed more and more of the decision-makers in our industry shifting their stance on the cloud, to the point where it became impossible for us to ignore.

One might say they've changed their strategy, but most of them don't couch this shift in those terms. What we're seeing is that businesses are moving from a wholesale 'migration to cloud' strategy to a more piecemeal, hybrid approach. After experimentation with moving workloads to the cloud (with mixed success), companies have altered their ambitions. Instead, they are now leaving workloads on-prem (or in colos), and only moving existing workloads to cloud if there is a compelling reason to. New workloads are going to the cloud by default.

In most cases we've observed, this is quietly accepted as a new reality rather than trumpeted as a new strategy. We think this is a change that's been borne of experience and financial necessity, two very strong motivators.

To be clear, we're not talking about repatriation here. We've discussed repatriation in previous content, but we're not seeing clients exit the cloud or even move workloads back into the data centre. Famously, David Heinemeier Hansson trumpeted his company HEY's repatriation of their workloads back on-prem. It was so frequently cited and discussed that he wrote an FAQ about it to answer questions. His situation, however, was a relatively rare combination of a single-product tech company that had decided it no longer needed the agility that the cloud brings, and had the skills and confidence to bring their workloads back.

There are hints that others are seeing the same trends. James Watters of VMware Tanzu's Research and Development department talks here about how he's "seeing a lot of organisations where the trend isn't going one direction or the other. I'm seeing people benchmarking the cost and efficiency of the private cloud stack versus the public cloud stack, and people are having a lot of thoughtful conversations about that and in some cases are growing their private cloud deployments because of the results they see".

We've Been Here Before

As we all know, those who do not learn from history are doomed to repeat it. And the mainframe to PC migration of the late 20th century offers a perfect example from living memory of a similar evolution. Putting precise dates on these kinds of shifts is difficult, but my research suggests that this migration took place over a roughly 20-year period, from the early 80s to the first few years of the 21st century. The start date matches the time Bill Gates is said to have stated that his vision was 'a computer on every desk and in every home'.

This 1982 article from the Sarasota Herald Tribune suggests that in 1981, IBM moved from 'large central computers' to 'small, smart computer equipment' whose potential IBM's rivals had seen in the late 70s. Interestingly, this shift followed a 13-year antitrust suit from the Justice Department that charged IBM with a mainframe monopoly. This poses an interesting question of which came first in this decision: the market changes, or the legal pressures? It's generally believed that the market rendered the monopoly discussion irrelevant as PCs took over global compute spend.

Discussion about mainframe to PC migrations petered out around the turn of the century, as references began to consistently use the past tense when talking about it. Again, it's interesting to note that this coincided with the 'Y2K bug' spending frenzy and the dot-com boom, suggesting that these investment bumps were what tipped mainframe migration to PC discussions to the past tense.

Although it's relatively easy in retrospect to say when these trends started and finished, as it happened there was confusion about where we were in the cycle, or even whether the cycle was still in play. As 'late' as 1986 - over half a decade since IBM had decided they were late to the party - PCs were not considered rivals to mainframes, and even eight years after that in 1994, experts were still being asked whether client-server was the future of computing.

So we might say that the secular trend away from the mainframe was strong, but within that trend there were periods of growth and deceleration for the PC. What we're seeing now may be a similar reduction in the rate of growth of a cloud transformation trend that has arguably been going on since 2006, the year Amazon introduced the S3 object storage service.

Why Now?

The causes of this retreat from wholesale cloud migration are manifold. We'll look at three of them, in rising order of importance.

  • Migration exhaustion (and IT worker conservatism)
  • Cost of hardware/colos
  • Macroeconomic trends

Migration Exhaustion and Conservatism

Enterprises are wearying of transformation narratives. The last 10-15 years have seen revolving doors of CIOs proclaiming cloud transformations that have not lived up to their billing. Our founder even wrote a book called Cloud Native Transformation that captured this zeitgeist (and how to do it right with the help of the patterns we open sourced, natch).

The hardest (and most underestimated) parts of these transformations are the least technical ones. Once an API exists for a service, then moving your software to run on it is - in principle - straightforward.

What really constrains cloud transformations is the innate conservatism of the broad mass of IT employees, together with the organisational inertia of businesses whose organisational architectures were not shaped by cloud economics. In other words, the 'money flows' of businesses are designed for organisational paradigms out of kilter with the new computing paradigm (we've written extensively on this under-explored constraint in the past, having seen it as a major blocker to cloud adoption).

Natives, Converts, Hobbyists and Others

We see a strong divide in the staff we encounter. The first people we usually meet at a client are the enthusiastic and capable cloud 'converts'. These internal evangelists usually come from a specialism such as development, system administration, or networking, in which they first used - and later in their career embraced - cloud technologies. Others are career cloud natives. We often wonder why they need our help when they have such smart, committed people saying many of the right things. We find out why as we go deeper into their business and talk to the second and third groups.

The second group are feature team platform 'hobbyists', whose day job is typically to deliver features, but who are motivated to improve delivery. They often become platform engineers over time if that's the way they want to go, as their skills (programming, source control, automation) are readily transferable from development into that field. Such people are relatively rare (this was my own path).

Finally, we have the broad mass of IT employees who don't have much interest in cloud computing. They are generally either satisfied with shipping features, or maintaining complex systems they have come to know well, or are at a stage of their career where operating in a new paradigm does not fill them with excitement. I once worked at a client with a DBA tasked with aiding a cloud migration who was quite resistant even to using shell scripts and Git, let alone Terraform or Python cloud libraries. These people are like those who either clung to the mainframe in the nineties and survived, repurposed their career to a less technical path, or got out completely. There are more of them out there than many clued-up engineers might think.

This is particularly marked in mainland EU, where employment law makes it relatively difficult and costly to turn over staff compared to the US and UK. Time and again we have discussions with business leaders that are frustrated by their own inability to move the cloud needle within their own organisation, and effectively unable (or unwilling) to part with long-standing staff. This is one of several factors that results in the frozen middle we've written about addressing previously.

The last ten years have seen several waves of cloud transformation, and these waves, along with the natural conservatism of the bulk of the IT workforce, have resulted in a retreat of enthusiasm for wholesale migration. But this by itself would not stop an industry motivated to change. So we must look deeper.

Cost of Hardware

While the IT world was focussed on cloud migrations and transformations, the cost of hardware continued to drop.

It's always been known that the performance of a cloud CPU and memory is not the same as that of physical hardware, but putting exact numbers on this is tricky, as it depends on much more than the headline numbers, not to mention the fact that you're sharing your CPUs with other workloads. This study by Retailic suggests that two cloud 'vCPUs' (virtual CPUs) are roughly equivalent to one 'real' CPU on-prem. And memory performance is even more murky, as this post attests.

So looking at just the CPUs, you get 'twice' the value per CPU you pay for on-prem. So on-prem is cheaper? Not so fast. You still need to take into account the cost of running your own data centre (software maintenance, electricity costs, risk of outages, network costs...) to do a proper comparison. Not having to build all these capabilities and pay for all these 'extras' yourself is the basis of the arguments for the cloud's value in the first place.

And, of course, you have to compare the yearly rental costs of servers with the yearly depreciation costs of buying the hardware in the first place. And here, hardware has also continued to get cheaper. Traditionally, computing equipment was depreciated over three years, but anyone who has bought servers or desktops in the last ten years will know that they typically have a useful life far longer than that.

For example: at home I have a nine-year-old Dell Xeon workstation that I bought for £650 three years ago (to run virtual Kubernetes clusters on, natch) that is still going strong. Inflation adjusted, the 2015 cost of that machine has depreciated at about £180 per year. According to AWS's cost calculator, a similarly-spec'd m6g.12xlarge instance on AWS would set you back over £11,000 a year. This eye-watering sum includes a one-year reservation, and paying all upfront for a hefty discount. A three-year reservation would only bring that down to £7,000 per year.
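To make that comparison a little more concrete, here is a rough sketch of the break-even arithmetic. It uses only the three figures quoted above and rounds 'over £11,000' down to £11,000; the function names are illustrative, and the running costs you would plug in for your own site are assumptions you would have to supply yourself.

```python
# Figures quoted in the text, in GBP per year.
ON_PREM_DEPRECIATION = 180.0        # yearly depreciation of the workstation
CLOUD_ONE_YEAR_RESERVED = 11_000.0  # "over £11,000", rounded down here
CLOUD_THREE_YEAR_RESERVED = 7_000.0 # per year on a three-year reservation


def break_even_overhead(cloud_cost_per_year: float,
                        depreciation_per_year: float) -> float:
    """Yearly running costs (power, space, maintenance, network...) at which
    owning the hardware costs exactly as much as renting the instance."""
    return cloud_cost_per_year - depreciation_per_year


def on_prem_is_cheaper(cloud_cost_per_year: float,
                       depreciation_per_year: float,
                       overhead_per_year: float) -> bool:
    """True if depreciation plus running costs comes in under the rental."""
    return depreciation_per_year + overhead_per_year < cloud_cost_per_year


one_year = break_even_overhead(CLOUD_ONE_YEAR_RESERVED, ON_PREM_DEPRECIATION)
three_year = break_even_overhead(CLOUD_THREE_YEAR_RESERVED, ON_PREM_DEPRECIATION)
print(f"Break-even overhead vs one-year reservation: £{one_year:,.0f} per year")
print(f"Break-even overhead vs three-year reservation: £{three_year:,.0f} per year")
```

In other words, on these numbers you could spend several thousand pounds a year keeping that one machine running before renting the equivalent became the cheaper option. The real work is in estimating the overhead honestly.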

At the other end of the 'serious computing' spectrum, Google saved billions of dollars at the stroke of their CFO's pen when they updated their servers' lifespans, from four to six years. That's a 33% reduction in cost on a pretty large line item. (Incidentally, Google started up by saving money and increasing compute power buying up commodity PCs instead of large servers or mainframes in the early noughties.)
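The 33% figure falls out of simple arithmetic if you assume straight-line depreciation (an assumption on my part; the accounting details are not given here). The fleet cost below is a made-up number purely for illustration, as are the function names.

```python
def straight_line_annual_cost(purchase_cost: float, lifespan_years: float) -> float:
    """Yearly depreciation charge when cost is spread evenly over the lifespan."""
    if lifespan_years <= 0:
        raise ValueError("lifespan_years must be positive")
    return purchase_cost / lifespan_years


def saving_from_longer_life(old_years: float, new_years: float) -> float:
    """Fractional drop in the yearly charge when the lifespan is extended.

    The purchase cost cancels out, so the result depends only on the lifespans.
    """
    return 1 - old_years / new_years


FLEET_COST = 1_000_000.0  # hypothetical purchase cost, not a real figure

before = straight_line_annual_cost(FLEET_COST, 4)
after = straight_line_annual_cost(FLEET_COST, 6)
saving = saving_from_longer_life(4, 6)

print(f"Yearly charge over four years: {before:,.0f}")
print(f"Yearly charge over six years:  {after:,.0f}")
print(f"Reduction in yearly charge:    {saving:.0%}")
```

Because the purchase cost cancels, the same one-third reduction applies whether you are depreciating one workstation or a hyperscaler's fleet.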

Even if you don't fancy running your own data centre (the usual counter to these arguments), then you can rent colocated server space very cheaply. I quickly found a provider in London who would give me a 10U space for £119 per month. Data egress costs are cheaper, and you can scale up or down other features (such as network bandwidth) depending on your requirements.

Or you can buy 'clouds in a box' such as Oxide at the higher end or SoftIron's HyperCloud at the cheaper end, which let you self-host a cloud offering in your own data centre or in a colo, or HPE GreenLake, which offers metered usage of a similar product. If you combine these with genuinely Cloud Native workloads then you can pick the best cloud for your use case, which might be tempting as interest rates rise.

Macroeconomic Trends

Finally, we come to the most fundamental cause of this migration slowdown. Across the world, bond and interest rates have risen from historical lows in 2020 during the COVID pandemic to levels not seen since 2008. The tech boom we saw during the pandemic, where cheap money was thrown at tech in a bid for growth, became a bust as many businesses woke up to the fact that they had overspent in the previous years.

This caused a cutting-back of spending and funding as companies and investors adjusted to the new reality, and CFOs cut back spending en masse. This in turn meant that ambitions for moving to the cloud were either dropped, put on hold, or cut back significantly. The result was noticeably more requests for cost optimisation or FinOps work than we had previously seen, and noticeably fewer requests for large-scale cloud transformations.

What Next?

If the current position is one of a slowing in the growth of cloud provider take-up, then what does the history of mainframes and PCs teach us about what comes next?

There are stirrings of alternative offerings to the big providers coming to the market that leverage cheaper and less feature-rich cloud providers. Cloudfanatics is one such company, about to come to market offering cheaply provisioned packages of multi-tier infrastructure for cost-conscious SMEs that don't have the expertise to build it themselves. The packages use whichever cloud provider is most appropriate for the use case in terms of cost and features/performance.

Cloudfanatics' founder, Andrew Philp, was inspired to start this company up by his experiences with SMEs' struggles with both cost and maintenance of cloud systems. Although their needs were generally simple, he found that lack of availability of staff who could pick up a cloud setup from a departing employee, and poor practices around cloud topics such as IAM setup and secrets management meant that there was a sweet spot of functionality that could be delivered in a cost-effective way.

This may be the beginnings of the next 'mainframe vs PC' war. This time the big three cloud providers are the 'mainframe' vendors, and these smaller cloud providers are the more agile and simpler 'PC'. APIs such as Kubernetes and the AWS S3 protocol are the equivalent of the IBM PC standard, allowing customers to port their applications if needed.

Already we see alliances of cloud providers such as the Bandwidth Alliance clubbing together (facilitated by Cloudflare) to offer a looser agglomeration of cost-saving options for cash-constrained IT leaders, including free data transfers between providers. Data egress costs are often a bone of contention between customer and cloud.

What we are unlikely to see is a wholesale retreat from cloud to either on-prem or the colo. A whole generation of engineers and business owners have been raised on the cloud and Cloud Native tooling and technology. While on-prem will never go away, wholesale repatriation programs like HEY's will be relatively rare.

What's critical in this new world is to ensure that your workloads can run portably. This is what is meant by the phrase 'Cloud Native'. Portability has always been the dream of the CIO looking to cut costs and increase choice, from bash scripts to mainframe software, and a source of contention between big IT and consumer. As an enterprise architect for a bank, I once had a heated discussion with an AWS representative who told me to 'just use ECS' when our strategy involved making our workloads Kubernetes-native. Soon afterwards AWS announced EKS to the surprise of many, including the AWS rep.

What Does AI Mean for Cloud?

Opinion is divided over the effect AI workloads will have on these trends. Michael Dell cites a Barclays CIO Survey report which suggests that private cloud repatriation is now on 83% of CIOs' radars, up from a low of 43% at the height of COVID. It makes one wonder whether COVID drove a lot of necessary but costly short-term cloud spending which is now being repatriated piecemeal, on a case-by-case basis.

Michael Dell mentions another factor in repatriation: data gravity and AI inference workloads, and here there's a split in opinion. On the one hand, companies looking to get ahead in AI may want to buy the latest hardware and run it themselves; on the other hand, renting cloud compute for short term agility may make more sense than buying AI hardware that will be out of date in the blink of a Moore's Law cycle.

Portability More Important than Ever

With bond rates continuing to rise, expect more and more pressure on costs to arise. But this doesn't necessarily mean the end of cloud computing. In fact, cloud spend is continuing to rise, just at a slower rate than before, according to a recent CIO survey.

Just as with the mainframe-to-PC computing paradigm shift, we are likely to continue to move to a cloud-first world. The big three cloud players will have their moats eaten away by smaller, cheaper and interoperable solutions. Some workloads will return to on-prem from the cloud, and a few specialised workloads will remain there until well past the time where the cloud is the default choice.

In such a world, it will become more important than ever that your software workloads are portable, and properly implemented cloud native build and delivery methods will help you ensure that you have that portability. It just won't matter that much whether it's actually running in the cloud or on touchable tin.

