Cloud Native Blog - Container Solutions

These Companies Keep Running Thousands of Failed Experiments, and You Should Too

Written by Charles Humble | Jan 9, 2023 3:04:33 PM

Twenty-odd years ago most big companies would run just a handful of experiments each year. These days, firms like Amazon, Google, Intuit and Netflix run thousands. But how are they able to do this? And why do they do so when so many of their experiments fail?

We’ve said multiple times on WTF is Cloud Native that a Cloud Transformation is as much about culture as it is about technology, perhaps more so. In essence what we mean when we talk about a Cloud Native organisation is that it is free to experiment, able to move quickly without breaking everything, and capable of rapidly getting ideas to market and in front of customers.

Broadly speaking, the cloud can be seen as an enabler in this context. The first step was to free us from procurement. Pre-cloud, new projects needed compute resources to be purchased and installed in a data centre, a process that was often convoluted and time-consuming. I remember, when working on a project for a large DIY retailer, being kept awake at night worrying that I’d sized a system wrong, and that it would fail as soon as it was placed under greater-than-anticipated load. The elasticity of cloud removed the need for much of this up-front work, and is particularly useful for any sort of “bursty” workload, such as those in eCommerce.

The next step was the ability to deploy code quickly. Right now, the DORA research suggests that a medium-performing organisation is able to release code to production between once per week and once per month, while the higher performers can do this on demand, as often as required (multiple deploys per day).

Speed of deployment is typically achieved through a combination of high levels of automation and by designing and building systems as small, independently deployable units. These are often microservices, FaaS functions, or some combination of the two, but they don’t necessarily have to be: indeed some organisations are even choosing to return to a monolithic architecture having considered the trade-offs.

The true benefit of being Cloud Native, though, comes not from a technology transformation, but rather from an organisational one. But organisational transformations turn out to be really hard.

Over the course of my career I’ve been involved, either directly or indirectly, in organisational transformations at five separate companies, of which three didn’t work as intended.

One was properly disastrous, failing because senior leadership imposed a top-down matrix structure that bore little resemblance to reality, and taking up so much time and resources that the company was unable to continue day-to-day operations. The second failed because the organisation had an innate elasticity, returning like a spring to its original shape when management attention was elsewhere. The last one failed because too few people were persuaded of the benefits of the change, and those who weren’t actively worked to sabotage it. People are hard!

So what about the successful transformations?

Moving in lock-step

I’m extrapolating from very limited data here, but both started by having the whole organisation shift to the same planning rhythm, with short iterations and no exceptions.

Individual departments will often object to this—“we naturally move in a different rhythm to the rest of the organisation because we’re unique”—but in truth what the rhythm is turns out to be less important than having one that everyone adheres to.

Typically you’ll have three rhythms. The first will be an annual cycle which is broadly set around financial and budgetary planning. The second will likely be a quarterly planning cycle where the various teams come together and make commitments as to what they are going to deliver. You may then also have shorter cycles, something like a Scrum sprint planning rhythm of timeboxed cycles of one to four weeks’ duration.

Having everyone moving in the same, short, rhythm means that no-one has to wait too long for a task to be carried out, and shorter cycles enable more rapid feedback.

But a third, and perhaps more subtle, benefit is that it can liberate you from being tied to the annual budgetary cycle. This matters because being free to experiment requires that product teams can openly try out new ideas and technology without long delays for approval and funding. One Container Solutions client, for example, works in a three-month cycle, with funding for teams set on the same cadence. This means that if they need more resources in a particular area they can spin up a new team or expand an existing one at short notice.

As Jez Humble, Joanne Molesky, and Barry O'Reilly state in their book “Lean Enterprise”, “The use of the traditional annual fiscal cycle to determine resource allocation encourages a culture that thwarts our ability to experiment and innovate. It perpetuates spending on wasteful activities and ideas that are unlikely to deliver value.”

Being able to run many experiments requires that they are cheap to execute. As Jeff Bezos told “The Innovator’s DNA” coauthors Jeff Dyer, Hal Gregersen, and Clayton M. Christensen, “We’ve tried to reduce the cost of doing experiments so that we can do more of them. If you can increase the number of experiments you try from a hundred to a thousand, you dramatically increase the number of innovations you produce.”
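To make the arithmetic behind that quote concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function name, the budget, the per-experiment costs, and the 25% success rate are hypothetical figures, not numbers from Amazon or any other company. It simply shows that, for a fixed budget and a fixed success rate, cutting the cost of each experiment tenfold lets you run ten times as many and so expect ten times as many successes.

```python
def expected_successes(budget, cost_per_experiment, success_rate):
    """Return (experiments affordable, expected number of successes).

    All inputs are hypothetical planning figures, not real data.
    """
    if cost_per_experiment <= 0:
        raise ValueError("cost_per_experiment must be positive")
    if not 0 <= success_rate <= 1:
        raise ValueError("success_rate must be between 0 and 1")
    # Only whole experiments can be run within the budget.
    experiments = int(budget // cost_per_experiment)
    return experiments, experiments * success_rate


if __name__ == "__main__":
    budget = 100_000      # hypothetical experimentation budget
    success_rate = 0.25   # hypothetical: one experiment in four pays off
    for cost in (1_000, 100):
        count, wins = expected_successes(budget, cost, success_rate)
        print(f"cost {cost}: {count} experiments, about {wins:.0f} expected successes")
```

The model is deliberately crude: it treats every experiment as independent and equally likely to succeed, which real experiments rarely are. The point is only that the number of experiments you can afford, rather than the success rate, is the lever that cheaper experimentation moves.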

But reducing cost, though critical, is only part of the equation.

Give employees autonomy

After budgetary cycles, the second most common reason I’ve seen for a lack of innovation in an organisation is a lack of autonomy, whether perceived or actual. In Drive, Dan Pink noted that autonomy, along with mastery and purpose, was one of the three main factors in employee motivation. The importance of autonomy has been backed up by more recent research by Daniel Wheatley and his team, who found, after studying data from 20,000 UK employees, that those who reported higher levels of autonomy in their work or workplace culture were happier with their jobs.

It’s perhaps worth saying that this works in the small as well, meaning that as individual managers we can push company culture in this direction from below. Early on in my management career I had a young report who had proven to be unreliable and, as a result, more and more responsibility and autonomy had been taken away from him. I decided instead to give him responsibility for a piece of work that actually mattered, and was genuinely complex. The result was that he became more engaged, started turning up to work on time, solved the problem I had given him, and then began to deliver work of a consistently high standard. I’ve repeated the trick many times since, and it is yet to fail me.

A common anti-pattern, particularly in smaller organisations, is where every decision has to be referred to the founder. This is a particularly difficult problem to overcome. Founders often get very attached to their companies, as the use of parenting language like “my baby” attests. In this context, it is perhaps not entirely surprising that there is a certain reluctance to let go.

If you find yourself in this situation there are a couple of techniques that can help. One method is simply to convince the founder that your idea was their idea in the first place. This does get wearing over time, but as a short-term approach it can pay dividends and let you make progress. Another, particularly if experiments can be designed to be extremely low cost, is just to run them and come back with data when you’ve got evidence to support what you want to do; most people can be persuaded with evidence, and it is often easier to ask for forgiveness than gain permission.

For a larger bet, I’ve found that using Wardley Mapping can depersonalise the discussion, because the argument shifts to “the map is wrong” rather than “the individual is wrong”. I’m a big fan of mapping for this reason.

Create an environment where it is safe to fail

Thomas Edison took so many missteps that he once reportedly said, “I have not failed 10,000 times—I’ve successfully found 10,000 ways that will not work.”

One paper that reviewed experiments’ success rates found that less than 50% of those conducted at Amazon, Microsoft, and other software companies actually improved the metrics they were designed to improve. My own personal experience is broadly in line with this: about 50% of all the experiments I’ve tried at work haven’t done what I expected. Quite often, though, they’ve told me something useful, even if it wasn’t the result I initially wanted.

This in turn means that as a learning organisation you have to be relaxed about failure, perhaps even celebrate it. As a small example, at Container Solutions we have an internal Slack channel called “failure-announcements” where everyone in the company can post examples of times when they’ve messed up. We also invest heavily in psychological safety.

But, to be clear, failure isn’t helpful if you don’t learn. Learning needs to be at the core of every Cloud Native organisation. This means that you need to strongly encourage employees to be curious—talk to customers, explore the impact of the work you do, ask why something isn’t doing what you expected.

So many organisations fail slowly. They either stifle innovation by creating an environment in which it isn’t safe to fail, or they don’t encourage curiosity, never checking to see if their products do anything meaningful. It's a very expensive way to kill a business. What we want is an environment where people can fail quickly and quietly at a low cost.

Seek feedback

It is likewise critical to create an environment in which it is safe for employees to provide constructive and effective feedback.

Feedback doesn’t happen naturally without facilitation and encouragement, so team leads should be trained on how to give and receive feedback, and should facilitate it regularly with team members. Feedback then needs either to be acted on or, if it isn’t, to be answered with the reasons why not.

As well as collecting ongoing employee input, the reporting and analysis of customer feedback should be part of a regular rhythm. Teresa Torres' definition of continuous discovery is that the entire product team has a touchpoint with the customer at least once a week. Verne Harnish, in Mastering the Rockefeller Habits, extends that to executives, recommending that:

  • "All executives (and middle managers) have a 4Q conversation with at least one end user weekly.
  • The insights from customer conversations are shared at the weekly executive team meeting.
  • All employees are involved in collecting customer data.
  • A mid-management team is responsible for the process of closing the loop on all customer feedback."

My own view is that, as far as possible, everyone in the company should be encouraged to have customer conversations and share feedback.

Finally, in a recession, training is often the first thing that gets cut, but it shouldn’t be. Leaders and managers must invest in employees’ development, creating conditions to support people working together to continuously improve processes, knowledge, and the value delivered to customers.

In high-performing organisations, employees take pride in their work, with managers and leaders supporting employees in pursuit of the organisation’s goals. Change, improvement, and development are habitual in a truly Cloud Native organisation.