The Structure of Day 2 Problems

Companies that adopt Cloud Native technologies and principles bump into Day 2 problems sooner or later (often sooner). This is not because the tooling is bad but rather the opposite - the tooling is excellent. That makes it easy to get started with and therefore easy to get into trouble with. In this blog, we’ll look at the dynamics that propel our customers forward on their Cloud Native journeys. We’ll also see that there is a structure for ‘getting into trouble’. To understand that, we’ll take a look at the Hero's Journey, focussing specifically on Perseus and his fight with the Kraken, before we look at the Cloud Native Journey and the lessons it teaches us.

 

Image 1 - The Hero's Journey in mythology.

 

We know, nowadays, that we recall information more easily when it's organised as a story. This quirk of our brains is why the number of narrative patterns is limited, according to one writer, to only seven. One such pattern is the Hero’s Journey. We can see how this works by looking at the Grecian myth of Perseus.

Perseus received a call to action: to fight the Gorgon Medusa. Whilst still in the realm of the living, he received his supernatural aids. From Zeus he got a sword. From Hermes, a pair of winged sandals, so he could fly. From Athena, he received a polished shield. From the Hesperides, he got a sack to put Medusa’s head in. These tools were exactly what he needed to overcome what he thought his real challenge was - to kill Medusa.

Only when Medusa was dead, and Poseidon was sufficiently pissed off, did Perseus come face to face with his real challenge: he had to kill the Kraken. This was where the great revelation came; Perseus couldn’t kill the Kraken with his supernatural aids. For that, he needed to muster both courage and ingenuity. That was the point of transformation and the moment his naive self died and his adult self was born.

Armed this time not with supernatural aids but with courage and ingenuity, Perseus killed the Kraken by showing it Medusa’s head and turning it instantly to stone. He then married Andromeda, thus symbolically passing into maturity as the masculine was finally married to the feminine. Perseus then returned to the overworld, armed with newfound wisdom and courage.

The Cloud Native Journey
In Cloud Native, killing Medusa is getting a proof of concept up and running using supernatural aids like AWS, Docker and Kubernetes. Killing the Kraken is getting Cloud Native technologies and principles rolled out across your whole organisation, complete with logging, monitoring, network segmentation, governance, training and management that suits the strategic goals you’re hoping to achieve.

 

 

Image 2 - Complexity increases over time.

The Structure of the Problem - Securing Approval Based on Limited Information
At stage 1, organisations experiment with things like containers and orchestrators. At stage 2, organisations have moved into the proof of concept phase. Here we see auto scaling, load balancing, automated builds and microservices. At stage 2 we also see the birth of Legacy AWS - that is, AWS setups that work for a small number of users but do not scale. The ‘mistake’ at this stage is to allocate budget and resources for a full migration to Cloud Native based on limited information. At stage 3, complexity starts to rise, security and scale issues appear, and the ‘mortal’ engineers in your organisation get confused and scared about this new stack they are forced to work with. The false confidence provided by the supernatural aids gives way to, as Andy Grove put it, the reign of chaos.

Mapping the Hero’s Journey to the Cloud Native Journey
We can hack the Hero’s Journey image to see what the Cloud Native journey looks like.

Image 3 - The Cloud Native Journey. A repeatable pattern that is emerging across the industry.

 

Stage 1 - The Call to Adventure
In recent times, companies like ING in the Netherlands, Bank of America in the US and HolidayCheck in Germany have started to succeed with Cloud Native. Other companies see this success and conclude that they’re not much different from the ones that have succeeded.

At the same time, with every passing year, systems get harder and harder to maintain, time to market gets longer and longer, and the internet gets louder and louder with stories about how the cloud can help with all these things.

Cloud envy combines with day-to-day stresses to create a powerful call to action.

Stage 2 - The Supernatural Aids
It stands to reason that the tools available to us today are much better than those of 20 years ago. They are easy to get started with, have outstanding online tutorials and great communities. These tools are as close to supernatural aids as we have ever seen. They can, very clearly, be used to utterly transform an organisation. At stage 2, however, the focus is all on the technology. The scale of organisational change that is required has yet to be revealed. That lies in the underworld.

Stage 3 - Threshold
Budgets have been allocated. We have committed to getting to the cloud before 2020 (because it’s a cool number). There’s no going back. We have crossed the threshold.

It is at this point in the journey, in my role as the CEO of Container Solutions, that I try to slow customers down. Sometimes my warnings are heeded. Often they are not - but what hero ever listened to warnings of the underworld from those who have been there? None. Ever. (It wouldn’t be much of a story, would it, if the hero actually listened to the naysayer? Imagine if Luke had listened to Uncle Owen.)

Stage 4 - Challenges and Temptations
Stage 4 is an unsettling time for organisations. Just as Perseus passed into the underworld, so too do organisations. Many existing mental models don’t hold and so infighting becomes rampant. Alignment becomes difficult as the temptation to return to the old ways of working kicks in. Fear rises as those whose jobs are at risk become aware of the coming danger. On the technology side, nothing seems to work perfectly, if at all. The autonomy of developers leaves gaping security holes in production code. The latent hope that there will be a technical solution to what is essentially a problem of management gives way to despair. People resign. It’s horrible.

Stage 5 - Revelation
Take a look at the videos posted by ING’s managers about their transformation to Cloud Native. They don’t talk too much about technology. Take a look at how Intel made the switch from memory to microchips. They also don’t talk too much about technology. The migration to Cloud Native is a management problem and not a technology problem. This is the great revelation.

We have discovered what we call the Skunk Works anti-pattern. Skunk Works was the name for Lockheed’s R & D department. The idea was simple enough. When you want to get research and development done, you grab your best people, stick them in a room, and leave them alone. You remove all organisational constraints, like time sheets and annual reviews, and you let them innovate.

Many companies that want to get into Cloud Native have created Cloud Native Skunk Works. These teams are often called things like ‘The DevOps Seals’, ‘The Cloud Native Squad’ or ‘The Lions’. They often succeed in creating reference architectures but fail miserably when it comes to rolling their ideas out across their organisations. That’s because innovation and adoption require different skill sets.

The great revelation, then, is centred around strategy. You can think of strategy as including three things: the strategy itself (what are we going to do?), the implementation (how are we going to do it?) and, finally, the execution. In other words, the great revelation is the moment companies realise that the tooling won't help but courage and ingenuity might.

Stage 6 - The Transformation
This is the point where companies start to forget about technology and focus on managing the transition properly.

Stage 7 - Atonement
OK. That was really hard.

Stage 8 - Back to the Overworld
After a pretty harrowing journey into the underworld of distributed systems, organisations return more whole, wiser and more mature. Their hard-fought lessons are embedded in their mental models, their processes and their new workforce, which will not have the same people in it as the original workforce. The benefits of Cloud Native kick in and competitive advantage, for the time being, has been achieved.

Conclusion
When something kicks off, it takes a while for patterns to emerge. Three years ago it wasn’t at all clear how companies succeed with Cloud Native or if indeed there were any recurring patterns. But now there are some emerging. In this blog we spoke about a few.

  • Legacy AWS. A few years ago, many companies hacked together Cloud Native solutions using AWS. For them, this was not a bad choice, as it let them capture new markets, for example. However, it created almost instant legacy. Many of our customers now want help leaving AWS for Google’s Cloud, mainly because the migration itself offers an opportunity to tidy things up. Google’s support suits a lot of our customers, too.*
  • The Skunk Works. Even though a centralised platform team is a pattern for success, especially when the platform team understands that a large part of its job is dissemination, the Skunk Works team is not. It’s an anti-pattern that actually creates the false sense of security that causes so many Day 2 problems.
  • The Cloud Native Journey. The details are different for all companies, but the structure of the transformation is starting to emerge. It looks something like the Hero’s Journey, which is not surprising considering how our minds organise information. This is not to say it’s not real, but it is to say that Grecian myths, like Cloud Native myths, must be used as inspiration and nearly always taken with a pinch of salt. In the real world, this stuff is hard.

* Container Solutions are vendor neutral, which is one of our strongest selling points. I mention AWS and Google here as an example of what we’re hearing right now.

Acknowledgement
I first heard the term ‘Day 2 problem’ at a talk given by Ben Hindman from Mesosphere. He called them ‘Day 2 ops problems’. You can jump over to Mesosphere’s blog where there’s loads of good stuff about this.


 
