Cloud Native Blog - Container Solutions

Podcast: Sarah Wells on The Role of a Tech Director, Internal Tech Conferences, and Developer Enablement at the FT

Written by Charles Humble | Aug 7, 2022 3:08:57 PM

Charles Humble talks to Sarah Wells, Former Tech Director for Engineering Enablement at the Financial Times. They discuss the role of the tech director, explore what Developer Enablement (also known as Developer Experience/DevEx and Developer Productivity) is, and cover examples of DevEx tooling, running an internal tech conference, her upcoming book and reading recommendations.

Subscribe: Amazon Music | Apple Podcasts | Google Podcasts | Spotify

About the interviewee

Sarah is a technology leader, consultant and conference speaker with a focus on microservices, engineering enablement, observability and DevOps. She has over 20 years' experience as a developer, principal engineer and tech director across product, platform, SRE and DevOps teams.

Sarah spent over a decade at the Financial Times, leading as it transformed into a true cloud native organisation, releasing code 250 times as often and embracing autonomous empowered teams.

She is currently writing a book for O'Reilly, on Enabling Microservice Success. This covers the technical, cultural and organisational challenges you need to meet to get the most out of a microservice-based architecture.

Resources mentioned

Team Topologies by Matthew Skelton and Manuel Pais
WTF is Alert Fatigue by Sarah Wells
Building Microservices, 2nd Edition by Sam Newman
Accelerate by Nicole Forsgren, Jez Humble and Gene Kim
The Manager's Path by Camille Fournier
The Checklist Manifesto by Atul Gawande

Full transcript

Introduction

Charles Humble:

Hello, and welcome to Hacking the Org, the podcast from the WTF is Cloud Native? team here at Container Solutions. I'm Charles Humble, Container Solutions Editor in Chief, and today I'm joined by Sarah Wells. Sarah is a technology leader, consultant and conference speaker, with a focus on microservices, engineering enablement, observability and DevOps. She has over 20 years' experience as a developer, principal engineer and tech director across product, platform, SRE and DevOps teams. She spent over a decade at the Financial Times, leading as it transformed into a truly Cloud Native organisation, releasing code some 250 times as often and embracing autonomous empowered teams. And she's currently writing a book for O'Reilly on enabling microservice success. Sarah, welcome to the show.

Sarah Wells:

Hi, it's great to be on the show. Thanks for inviting me.

Charles Humble:

I have to ask you since you were at the FT, which is obviously a newspaper, when you left, did they do a front page for you?

Sarah Wells:

They did. And it is honestly one of the best perks for when you leave the FT is that someone will write a lot of stories, and actually mine was printed on proper FT paper, it actually got printed at the presses. So I have a couple of sheets of that. Yeah.

Charles Humble:

Oh, that's really, really nice.

Sarah Wells:

There are two things that are really nice about working for a newspaper in terms of just things you don't expect. One is the homepage. The other is you never buy any wrapping paper because every present that's ever given to anybody's just wrapped in FT paper.

How did you get the role of tech director and how did you find it when you got there?

Charles Humble:

Absolutely fantastic. So I said in my intro that you'd moved from developer to principal engineer, to tech director, and I really wanted to focus on that last transition, the principal engineer to tech director one, partly because it's quite a big step. And I think it's one of those where the nature of the role maybe changes quite a lot. And I'm just curious to know, I think lots of people who listen to this podcast are probably on that sort of manager's promotion path. So how did you get the role and how did you find it when you got there?

Sarah Wells:

I agree that it's quite a different thing to be doing. And I think generally speaking, principal engineer is an individual contributor role. At the FT, it wasn't quite. So in the role that I had as a principal engineer, I was leading the content publishing platform, so I did have some line management responsibility and I was doing something that probably looked like a miniature tech director role. So I was extremely lucky there that the way that part of the organisation worked, I got a lot of that exposure towards things that would help me with that step up. Because in a sort of standard principal engineer role, you are mostly thinking about things like technical strategy. And the step is to think more about the organisational side of things as well, I think, when you move to tech director. I had already got a bit of exposure to that.

Sarah Wells:

But the way that it happened for me was that there was a proposal that was sent around for how we were going to do out of hours support at the FT, by one of the existing directors. And my team had been doing out of hours support and working out how best to do it for a while. And I saw the proposal and I had quite a lot of feedback. I had so much feedback that I stopped putting comments on the Google doc and just created my own proposal. What I didn't realise was that there was already some thinking about having a role as a tech director for operations and reliability. And so almost accidentally, I definitely put my hat in the ring for that by just being very obviously interested in some of the problems that were looking to be solved in that area. And the learning how to operate and run a team that's got a new system built out of microservices, put me in a really good position to step up to be a tech director, that was looking to solve some of these problems across the FT.

What was it like to introduce a DevOps culture into the FT?

Charles Humble:

It's a bit of a sort of sidebar in the conversation, but I remember you talking about some of the problems of shifting to a kind of DevOps, "you build it, you run it" culture in an established organisation. So in an organisation with an established IT team. And I remember you making the point that it's not necessarily practical for everybody to be on call. I might have young children, or I might be a carer for an elderly relative, or there might be other things going on in my life. And I think it's one thing if I've joined Netflix or something, and I know in advance that's the culture, but I was interested in how you found introducing that culture into an existing organisation like the FT.

Sarah Wells:

Yeah. I think if you're going to expect developers to be doing some level of escalation of problems with their system, and I think you should. If you're building microservices in particular, but I think in general, you build differently if you know that you ultimately are going to have to support it. I think if you're going to do that, you have to work out a way to allow people to step away from it either permanently or temporarily, depending on their own needs. You have a problem if you can't muster a rota. At the FT, we generally managed it, but sometimes you'd have to step back and say, we don't have a lot of people that are willing to do support for this. Does that mean there is a more significant problem? And certainly with my own group within content publishing, when we were looking to start doing out of hours, we asked all of the developers what made them hesitant to sign up to do this on a best endeavours basis.

Sarah Wells:

And a lot of the things that were worrying people were architectural. So there was a sense of we're using this particular data store that is really hard to support and that we don't think is the right solution for us. So in fact, we did make a change in what data store we used so that people felt more comfortable to be able to support it in case things went wrong. So I think it helps to not make it mandatory because it helps you identify where you've got something that people do not want to support. And I think you can only do a best endeavours type out of hours rota if people don't get called very often.

Can you describe the FT’s architecture?

Charles Humble:

Yeah. I really agree with that. And actually think it's a really important thing to keep an eye on. Now you mentioned the data store there, which kind of gives us a little bit of a hint of the FT's architecture. I tend to think of the FT as being very Java oriented, but I presume as you became more Cloud Native, it became more sort of architecturally diverse. Can you tell us a bit about that?

Sarah Wells:

Well, it's very diverse. So programming language, there are lots of teams working in Node. There are people using Go. So Go tends to be used more for backend services, so APIs. There's a fair amount of Python that tends to be for teams that are doing more infrastructure stuff. There's very little Java. I don't think there's any real active Java development, but there may be a little, but there's still Java running in production. So some of our microservices architectures, some of the FT's microservices architectures were originally written in Java. So some of that still exists. It's generally AWS based, but there are some things on other clouds. There are lots of applications at the FT that are running on Heroku. So handing off a lot of responsibility to Heroku rather than having to build everything from scratch. So it's pretty diverse, which was an interesting challenge when you're trying to support and build tools for all of those teams.

What prompted you to move into Developer Enablement?

Charles Humble:

And that actually gives us a really useful segue into the next thing that I wanted to talk about, which is you moved from your tech director role into the area of developer enablement, engineering enablement. What was it that prompted that move? Was it something you wanted to get into?

Sarah Wells:

It was more that I already had several teams that were in my group who built things for engineering teams. We just added some other teams so that ultimately all of the teams that built things for other engineers were part of one group that I was the tech director of. Because if you're going to try, I think, to do developer productivity tooling, to do engineering enablement, it helps to have a common approach to everything. So having everybody as part of one group meant we could decide where we should focus efforts and how we were going to do it. So for example, we could say, we want to put all the documentation in a single place so that everyone can find it. I know that sounds obvious, but actually over time, you end up with documentation all over the place. You'll find some people are using one format and some people are using something different. We wanted to come up with something that was common. So you could go and run one search and one set of documents and find all the information that you needed.

Sarah Wells:

So that was the sense behind that. So there was an existing group at the FT that were a mixture of teams that did things for engineers and teams that did things for general FT staff. So we split those out, and I think that gave both groups a real sense of who their stakeholders were.

Can you define what Developer Enablement is?

Charles Humble:

We should maybe pause a little bit and just clarify something on the terminology, because we're talking about engineering enablement, but there are other terms that get used a lot, like sort of developer productivity or developer experience/DevEx. Are those terms interchangeable? Do they mean the same thing? And also, what do you mean when you talk about engineering enablement? What actually is that?

Sarah Wells:

So I think of them as being pretty similar, I think I would use them pretty interchangeably. I'm thinking of it in terms of where you are writing tools and platforms for product development teams to do things. Certainly I think if you're within an organisation, I would probably be talking about engineering enablement. Whereas maybe if I was a company building tools that I would then sell to people, I might say developer productivity, but I do think it's all the same thing. It's a recognition that while we can move quite fast with something like a microservices architecture, you want to try and provide a foundation for everybody that's building services so that you're not having to repeat the same work in lots of places.

Charles Humble:

Yes. And I think also, maybe reducing the sort of small friction points. The, oh, it takes me two minutes to spin up a cluster and if we could speed that up, I wouldn't keep getting broken out of flow state, or, I mean, you mentioned documentation earlier, which is a classic. Going and hunting for the, where's the API documentation for this. And have I found the right one? And is it up to date? And all of that sort of stuff. So I think there's definitely a part of it, which is just removing those sort of little friction points. It's not terribly glamorous work, but it is kind of important, I think.

Sarah Wells:

Yeah. I think it's basically, if you want your product development teams to be delivering business value, you want there to be a flow of new feature work, then you want to make most of the stuff that they do, something they can do without having to coordinate with an external team. So if you look at say Team Topologies, which I think is a great book, talking about how teams work with each other, you'd really want to make most things self-service, well documented, build interesting tools. And actually that can be quite cool to think about how do I solve this problem so that other teams can do what they need while we still maintain the level of control over risk and cost that as a company we want? So how far do you let people go ahead with things?

Sarah Wells:

So, one thing I really liked at the FT was what the team that supports DNS did when we migrated to a new DNS supplier three or four years ago. They went for an Infrastructure as Code solution. So to make changes to DNS, you basically make changes in a repo, and you raise a PR and that team approves the pull request. But they talked to their customers and said, what do you like? What don't you like? And people said, well, that's really great, but I'm quite often waiting for someone to approve that and it's a relatively low risk change. So they analysed all of the pull requests that they'd had, like 150, and they identified certain patterns where things would always get approved without any kind of discussion. And they looked at creating rules that would automatically approve those, so that it really sped up the process because they looked at things that were very low risk.

Sarah Wells:

You're deleting the whole of a section of DNS configuration, or you're adding a whole new section, or they could identify things that might have a security implication, and immediately route it to someone in the security team so that they could immediately approve it. And I love that because it's really going beyond Infrastructure as Code into something where you're thinking about how do you speed people up without incurring too much risk. So 80% of the time, it's fine. 20% of the time, you're going to have to talk to the team to explain what you're trying to do and make sure that it's not risky.
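To make the idea of rule-based pull request approval more concrete, here is a minimal Python sketch. It is not the FT's implementation: the `DnsChange` shape, the list of security-sensitive record types and the definition of a "low risk" change are all assumptions chosen for illustration. The point is only the structure: a few explicit rules, checked in order, with human review as the default.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto-approve"
    SECURITY_REVIEW = "security-review"
    TEAM_REVIEW = "team-review"


@dataclass(frozen=True)
class DnsChange:
    """One record change in a pull request (hypothetical shape)."""
    action: str       # "add", "update" or "delete"
    record_type: str  # e.g. "A", "CNAME", "TXT", "MX", "NS"
    name: str


# Assumption for this sketch: these record types may have security implications.
SECURITY_SENSITIVE_TYPES = {"MX", "NS", "TXT"}
# Assumption for this sketch: adding or updating these record types is low risk.
LOW_RISK_TYPES = {"A", "CNAME"}
LOW_RISK_ACTIONS = {"add", "update"}


def triage(changes: list[DnsChange]) -> Decision:
    """Decide how a pull request containing these changes should be handled."""
    # Rule 1: anything security-sensitive goes straight to the security team.
    if any(c.record_type in SECURITY_SENSITIVE_TYPES for c in changes):
        return Decision.SECURITY_REVIEW
    # Rule 2: a non-empty set of purely low-risk changes is approved automatically.
    if changes and all(
        c.action in LOW_RISK_ACTIONS and c.record_type in LOW_RISK_TYPES
        for c in changes
    ):
        return Decision.AUTO_APPROVE
    # Default: everything else (including deletions) needs a conversation.
    return Decision.TEAM_REVIEW
```

Defaulting to team review means that a change nobody anticipated is never approved by accident; rules are only added for patterns that history shows are always waved through.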

How did you go about setting up the developer enablement team at the FT?

Charles Humble:

How did you go about setting up the developer enablement team at the FT?

Sarah Wells:

We had loads of teams that were working in that area. And when we put the group together, we took the chance to look at some of those things where I think in any organisation, after a while, you have things that are not quite in the right place because of historical reasons. So when you try and say to each team, what is it that you do? You might find that they say, well, we do this, but we've also got these three other services that, for some reason, we own. So we did a bit of moving things around, but we tried to make sure that every team that we had had clear responsibility for something that you could explain why that is a single thing. So there was a bit of that, quite a few different people that worked with me did some analysis of what you need to do to provide a platform to people. And we compared what everyone had put together and thought, are there any gaps? Are there any gaps where we're not solving particular problems as well?

Sarah Wells:

And then once we'd formed those teams, we gave them a lot of freedom to work out what problem they wanted to solve. So we encouraged teams to speak to their customers and work out where there was a problem. I was very keen, because we'd formed this new group, to try and tackle some of the things that had been difficult previously because they weren't in the same part of the organisation. Sometimes you need lots of teams to work together to solve a problem. And if they're in different groups, they may not have the same priorities. That was where I could say, well, this is going to be a priority for us. So we were looking, for example, at key rotation. We had a policy of rotating keys automatically every so often. It felt for a lot of product teams that that was a chore. And also it was surprisingly often linked to production issues. Because you would discover that someone had rotated a key, but they didn't know the key was also being used somewhere else and they hadn't rotated that version of the key. And when they disabled the old key, something else broke.

Sarah Wells:

So we looked at how can we make that into a much smoother process where people are notified that this is happening, where we can maybe put a message on a queue that they can then consume and automate that whole process.
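As a rough illustration of that "message on a queue" approach, the Python sketch below models it in memory. Everything here is hypothetical (the `KeyRotated` event, the `Service` class, the per-service inbox standing in for a real message broker); it simply shows how publishing a rotation event to every consumer makes it possible to see which services still use the old key before it is disabled.

```python
import queue
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KeyRotated:
    """Event announcing that a key has a new version (hypothetical shape)."""
    key_id: str
    new_version: int


@dataclass
class Service:
    name: str
    keys_in_use: dict  # key_id -> version this service is currently configured with
    # In-memory stand-in for a subscription on a real message broker.
    inbox: queue.Queue = field(default_factory=queue.Queue)

    def process_inbox(self) -> None:
        """Consume pending events and switch to the new version of any key we use."""
        while not self.inbox.empty():
            event = self.inbox.get()
            if event.key_id in self.keys_in_use:
                self.keys_in_use[event.key_id] = event.new_version


def publish(event: KeyRotated, services: list) -> None:
    """Fan the event out to every service, whether or not it is known to use the key."""
    for service in services:
        service.inbox.put(event)


def lagging_services(event: KeyRotated, services: list) -> list:
    """Names of services that use the key but have not yet moved to the new version."""
    return [
        s.name
        for s in services
        if event.key_id in s.keys_in_use
        and s.keys_in_use[event.key_id] != event.new_version
    ]
```

In this sketch the old key would only be disabled once `lagging_services` returns an empty list, which guards against the failure described above, where a second, unknown user of the key breaks when the old version is switched off.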

Can you give me some other examples of the tools that you built as part of dev enablement?

Charles Humble:

Can you give me some other examples of maybe some of the tools that you built as part of dev enablement? Just kind of give us a picture of the kind of things you were doing.

Sarah Wells:

I would say the one thing you do is you want to give people the ability to see how they're doing on something. So you want to give them insight. And where you have guardrails that you want all the teams to follow, it's good if you can show people how well they're doing on that. So one relatively early tool that we built, which has been around for a few years now, was called SOS, the System Operability Score. Basically we were looking to improve the quality of our runbooks. So when something goes wrong on a system, is there a decent runbook that explains to people where they can find the code, where they can find the logs, what kind of troubleshooting they could do? So this was something where we scored the various fields that were available. And we already had a system called BizOps that was effectively a system registry and a graph of information about all of our systems, which is a fundamental thing where you have microservices: you need to know what you have and who owns it.

Sarah Wells:

But SOS would look at the runbook information and score it automatically. And we could aggregate that so we could show how well a team was doing for all the systems they owned and how well different groups of developers were doing. So we thought this was a little bit of gamification and that that could be quite good. And we did find that some teams got very competitive and they liked it, they wanted to get up to 100% coverage and they wanted to tell other teams they were doing better than them. So that was one aspect.
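For readers who want a feel for how a score like this might be computed, here is a minimal Python sketch. It is not the FT's SOS implementation: the field names, the weights and the shape of the system records are assumptions made up for the example. It scores each runbook by which fields are filled in, then averages the scores per owning team.

```python
from collections import defaultdict

# Hypothetical runbook fields and weights; the real fields and scoring are not
# described in detail in this conversation.
RUNBOOK_FIELD_WEIGHTS = {
    "code_location": 2,
    "log_location": 2,
    "troubleshooting": 3,
    "owner": 3,
}


def runbook_score(runbook: dict) -> float:
    """Percentage of the weighted runbook fields that are filled in."""
    total = sum(RUNBOOK_FIELD_WEIGHTS.values())
    earned = sum(
        weight
        for field_name, weight in RUNBOOK_FIELD_WEIGHTS.items()
        # Treat missing, None and whitespace-only values as not filled in.
        if str(runbook.get(field_name) or "").strip()
    )
    return 100 * earned / total


def team_scores(systems: list) -> dict:
    """Average runbook score across all the systems each team owns."""
    scores_by_team = defaultdict(list)
    for system in systems:
        scores_by_team[system["team"]].append(runbook_score(system["runbook"]))
    return {team: sum(s) / len(s) for team, s in scores_by_team.items()}
```

A per-team number like this is easy to track over time, although, as with any roll-up, averaging hides which individual systems are dragging the score down.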

Sarah Wells:

But the other thing we hadn't realised until we'd created this system was that if you have an organisation, as the FT was doing, that was using objectives and key results to plan work, key results need to be measurable. So we had just given everyone an easy way to measure that they were improving an aspect of being able to operate their systems. So product teams would say, we're going to improve our SOS score by 25% this quarter. So it showed people where they could make an impact and encouraged them to do that. So I think insight is generally really useful, but I think you have to be very cautious not to try and roll up too much data into one score because it may not make sense. It may not make sense to try and aggregate too much information, but I think being able to show lots of different aspects so people can see how they're doing is really good. So that's one example.

Sarah Wells:

So I think things that nudge you to comply with the things we expect you to do and things that allow you as a team to see how well you're doing on something are really useful. And then self-service tooling for solving the common problems that you have is I think a really important aspect. And I like the idea that you take a Unix style philosophy of lots of small tools that you can compose together to solve a problem. Because it means that people don't have to use everything, but they can use lots of different parts as they need it.

What do you think is important when you are building a paved road?

Charles Humble:

And then with that, I think you are sort of getting towards this idea of a paved road or a golden path. These are terms, I think that Netflix kind of popularised. What do you think is important when you are building a paved road?

Sarah Wells:

Yeah. So I think the thing for me about doing a paved road or a golden path is it's different from the older approach of thinking about building a platform in that it's not, here is your big platform that you're going to deploy your code to. It's more, here is a whole bunch of stuff that will help you to move fast, but it's not mandatory, and we have an approach if you want to go off road. So the idea is that you pave this road, it's very easy to walk along the paved road, it will do everything for you, but you can go off road. It's just you're going to have to hack through the jungle, which means it will be harder work for you. And you might choose to do that, but you should understand that there are things you're going to do.

Sarah Wells:

So the idea is that if you're on the paved road, it should just do everything that's needed. But if you decide I'm not going to deploy to AWS using the tools that exist, you're going to have to make sure that there is log aggregation, and that there's monitoring, and that we are conscious of security and any other guardrail that might exist. So I like the idea with a paved road that you've defined what it means to go off road. And I like that it isn't mandatory because I think it makes the teams that are paving the road, think about whether they are building something customers want to use.

How do you drive adoption of the tools that you're building if adoption isn't being mandated?

Charles Humble:

But then how do you drive adoption of the tools that you're building if adoption isn't being mandated?

Sarah Wells:

I think by building something that solves people's problems. So if what you're building doesn't make it easier for people to make progress in delivering business value and they don't use it, then there is a question of why you're building it. Within a company, you should be able to build something that is truly useful for your customers, because they're right there for you to talk to and you only need to solve a problem for that company and that organisation. So if you think about all those teams that are out there, building developer tooling that they're selling to people, they've got to try and find market fit across a range of companies. But if I'm building something at the FT, I can build something that's very custom to what I already know exists at the FT. Trivial example, I would only need to consider supporting source control using GitHub because that's what they happen to have in place.

Sarah Wells:

So I think you should be able to build something that is useful for your customers, but you do have to have a product mindset in those teams that means you understand where the problems are, and you need to also have the appreciation that people can say, oh yeah, that sounds good. But unless you have a commitment that they're actually going to use the thing you build, then you are maybe not solving their problem. I think it's easy to say, yeah, that sounds okay. What you really want is for someone to say, that sounds great, and we're committing to spending a month adopting it next quarter or this quarter, even better.

What's the actual process when you are deciding what tool to build?

Charles Humble:

What's the actual process when you are deciding what tool to build? Do the developers raise that up to you? Do you sit down and have interviews with the various developers who are your customers and try and figure out where their pain points are and what you build? How do you go about that?

Sarah Wells:

So it's a mixture of things. So sometimes the principal engineers or people on the team within my group would say, I think this is a problem and we should look into it. And we also could do it. There's always an element of how long would it take us to solve a particular problem? Is there something we think is valuable that's actually not also going to take us that long? We should maybe look at that. We talk to our customer teams a lot, at lots of different levels. So quite a lot of the other teams at the FT would have developer huddles or some kind of place where developers are talking about problems. If we could go along to those periodically, we could hear the things people were struggling with and think, oh, will that be something that we could solve? When you start to get a reputation for being willing to listen, people will come and tell you this problem is big.

Sarah Wells:

Also you can look at what people are doing within their individual development teams and see whether that's something that you could adopt and make more available, more widespread because a team might build a really good tool. It's extremely hard for that team to then try and make it generic so everyone could use it. But you as a central engineering enablement team could do that. So we'd look for things people liked, and that they had built, and that they thought were good and we could also look at making those more general.

What do you think is the right time to introduce an engineering enablement function into an organisation?

Charles Humble:

If we could step up a level, what do you think is the right time to introduce an engineering enablement function into an organisation?

Sarah Wells:

I think it's about where you see lots of people trying to solve the same problems. So I think even within a group of people working on a particular system, you'll find that there are some people trying to build the more infrastructure and supportive stuff underneath it. And so maybe if you're building a monolith and you've got a few teams, there'll be a few people building the tooling for the build pipelines and various interactions with vendors, et cetera. As that starts to grow, you'll have some things where you don't want to have to solve it in every single group. So you don't want to have to solve interacting with DNS in every group. So you naturally will have some teams that exist that support other parts of the organisation. And at that point, those teams are doing something that enables engineering.

Sarah Wells:

I've only really got experience of this in a 250 developer organisation. But I just think once you get to the point where you have enough people working, that it makes sense to abstract out some of the requirements and give it to some other team. You're starting to think about engineering enablement, even if you don't have a full blown group that are working on it. But I think once you get to a couple of hundred people, once you are building microservices and you are big enough that people don't know what other teams are doing, it's useful to have those common supporting functions.

Why did you decide to introduce an internal developer conference?

Charles Humble:

Just thinking back over your time at the FT, another thing that you did was you helped to introduce their internal developer conference. So I was interested to know how you got started with that. What prompted you to set it up and what that experience has been like? What have you found from doing it?

Sarah Wells:

Probably seven or eight years ago, I went to a conference with a few other principal engineers from the FT. And it was really interesting because when I first started going to conferences from the FT, we were not ahead of the game. So we often went to a conference and thought, wow, this is really interesting. I wonder if we could use this technology. But for the first time we went to a conference and people were talking about stuff and we thought, oh, well actually it's about microservices and we sort of know that already. And then also I realised that I got the most value from talking to other principal engineers at the FT that I didn't speak to day to day. So the hallway track, but only other FT engineers was so valuable. So myself and a colleague, Rob Godfrey said, we should just have an internal conference.

Sarah Wells:

It would be good because we could share what we all know. And also it probably would help teams to understand the constraints that other teams are under. And we suggested it to the CTO at the time, who just said, yeah, fine. Let's do it in three weeks time. So we said, could we have a bit longer to organise that? And then we had a few months and we formed a group of mostly principal engineers and delivery people. And we did our first internal conference and the FT's had one every year since then, they've been very different, they are quite reflective of what the CIO or the CTO are looking to focus on. And so the first one was very much panel based. We didn't think we had enough people who would want to stand up and talk, panels are less intimidating, we had them on various topics, and would speak about a particular topic and discuss it.

Sarah Wells:

Into the latter day conferences, it's been more of a mix of lightning talks, longer talks, we've done unconference sessions. They've been very popular. That's where we have an open slot where anyone can suggest a topic. Anyone can choose to go and join the discussion on that topic. It's done under Chatham House rules. So you are not going to quote anyone in particular, but you get some really interesting discussions about things.

Sarah Wells:

What I like about internal conferences is it's a chance to recognise that you've got lots of people who've really interesting things to say. It's also a chance for me as a leader to set a tone about things. So I would definitely have things in mind. So a couple of years ago, we ran our first fully online version because of the pandemic, which actually worked really well. The first hour was people telling stories about things going wrong in production. And I wanted to tell those kinds of stories, because I think they're really high energy, they're fun, and they get everyone started. But also, as the head of operations at the FT, with responsibility for that, I wanted people to understand that it's fine. Things go wrong. And generally speaking, people will help out and it will all get fixed. So it's sending the message that this is normal. It's normal. I want you to tell me if you think something's broken.

Charles Humble:

I'm a big fan of that actually. I'm a big fan of sort of celebrating failure effectively because if you want to be a learning organisation, if you want to have people that are experimenting and trying things that might be amazing. But there's no point in running experiments if they don't sometimes blow up. Sometimes it's not going to work the way you think, and we all make stupid mistakes. We've all done the, typing the wrong thing into the wrong Unix command line window or something and accidentally shutting a cluster down or deleting something. I mean, I've done many of those in my career and I'm sure you have. It's not uncommon, but it's something I've noticed at public conferences, not so much internal ones, that people don't talk about that stuff. They don't say, and it all went horribly wrong and I did this stupid thing. And I think it's a shame. It's like we give the impression that we're all sort of flawless and perfect and set unreasonable expectations. And actually, many of the things that I've learned the most from in my career were where I messed up in some way.

Sarah Wells:

The standard thing really is that, operationally, no one's doing this deliberately. If someone drops a table in production, that's because you don't have enough guardrails in place to stop them from making that mistake. If you are a junior or mid-level engineer and you drop a production database, what you want is a senior developer to go "Great, this is now going to be a great example to show you how we restore our data from backup."

Sarah Wells:

So first of all, just to wrap up the internal conference, I think an internal conference is a really good place to demonstrate the culture that you want people to have and to believe. On the operational side of things, I feel really strongly that you want people to understand that everyone does this and that we're not about trying to blame someone for a problem. We're about fixing the problem and then working out how we can protect people so that it's not possible to make that mistake again.

Sarah Wells:

When I took over operations at the FT, I wanted really to stop having any metrics that were about the number of incidents opened. I want people to say there's a potential problem. I don't even know if it's a problem yet, but I'll tell you about it because I know that it's not going to impact our bonus if we go over a certain number of incidents or anything like that. I think it's about convincing people that you can tell us when something's gone wrong and that everyone makes mistakes, and that actually production incidents, initially at least, are in a lot of cases people going, "not really sure what's going on here, but I'm going to go and have a look at this thing and see whether it's related."

Charles Humble:

That's a fantastic insight, I think. When you are measuring something and using that as a way of measuring team performance or maybe bonuses or something, it's so easy to end up with unintended consequences or perverse incentives. If you measure the number of incidents in production, for example, and then something happens in production, it's very tempting for the engineer to go, "I'll just quietly fix it. And I won't log it, because that way it won't show on the production incident metrics board" or whatever it is.

Sarah Wells:

I also think that we're not very good at thinking about the statistics of it. I just never thought at the FT that one month having two more incidents than the previous month is anything more than statistical fluctuation generally. So I don't really see the value. When people said, well, there should be some metric, you should have something other than a gut feel that says we're in a worse state in terms of our ability to run systems in production, the one that I thought was interesting was how many times have we had to phone someone when they're not at work. Because I think that means that something went wrong, and the first line support couldn't fix it based on the information that they've got, and we had to phone someone up and disturb them. I think that's an interesting thing to track.
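To put a rough number on that point about statistical fluctuation, here is a minimal sketch. It assumes, purely for illustration, that incidents arrive independently at a steady average of five per month (a Poisson model). Neither the model nor the figure of five comes from the FT; both are hypothetical. Under that assumption, a month with at least two more incidents than the one before turns up by chance alone in roughly three out of every ten month-to-month comparisons.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k incidents in a month with the given average."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def prob_increase_at_least(delta, mean, max_count=60):
    """Chance that this month has at least `delta` more incidents than last month.

    Assumes both months are independent Poisson counts with the same average,
    i.e. nothing about the system actually changed. `max_count` truncates the
    sums; the tail beyond it is negligible for small averages.
    """
    pmf = [poisson_pmf(k, mean) for k in range(max_count)]
    total = 0.0
    for last_month in range(max_count):
        for this_month in range(last_month + delta, max_count):
            total += pmf[last_month] * pmf[this_month]
    return total


if __name__ == "__main__":
    # Hypothetical average of five incidents a month.
    print(round(prob_increase_at_least(2, 5.0), 2))
```

The design choice here is deliberately simple: compare two months under a "nothing changed" model and see how often the alarming-looking jump appears anyway.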

Sarah Wells:

I would prefer not to have a real sense of "we must fix it within X minutes" either, because I don't think it necessarily helps you to have that sense of pressure of, "oh my God, if I don't fix this in half an hour, this incident will have counted as a big failure." I think everyone's always trying to do it. What I think you want is that someone is managing the incident, doing communication and encouraging developers to think about what they could do that might mitigate, because my experience is developers like to understand what's gone wrong and fix it. But sometimes there's a mitigation you could do without having understood the whole of everything. Maybe you go, "Okay. We think maybe there's a problem in the EU region, let's fail over to run out of the US." And sometimes that will fix your problem while you work out what's going on. And sometimes it won't. But it probably won't harm it, and there's things like that.

Sarah Wells:

So I could talk a lot about operational stuff, because I just found it really interesting to try and focus on what behaviour do we want and how do we want people to feel? One thing that Anna Shipman's group at the FT introduced, which I really liked was the idea of shadowing. So people who were not yet confident that they would run out of hours support, like they didn't want to be escalated to, if something went wrong, could join a shadow rota so that when people were contacted out of hours, they would also contact someone on the shadow rota and they could join with no pressure to be the person trying to fix a problem. And people really found that reassuring to watch how other people approached solving the problem without feeling any pressure that they needed to do it. So we'd open incident calls and channels to anyone who wanted to join in so they could see what was being discussed, so they could see what was happening and they could learn about it. And I think that's really reassuring for people.

What was it like running the conference online?

Charles Humble:

Yeah, absolutely. I want to quickly jump back to something you said earlier about the fact that you'd moved the tech conference online as part of the pandemic. What was that experience like? How did that go for you?

Sarah Wells:

Well, I really liked it. I really liked doing it online. And I think we tried when we were putting that one together to think about how we used it to give people a chance to talk to people who are not in their direct teams. Because one of the things for the pandemic is I still saw everybody in my team, but I didn't really see the people outside of my team so much. And I thought it'd be really nice to allow people to do that kind of interaction. So that was, I think ... Certainly we went back to having the unconference sessions that particular time.

Sarah Wells:

So what was good about it was the lack of space constraints. So you can run multiple sessions in parallel because you don't have to book the conference suite at the FT and put people in that. There was a nice culture that had developed in product and technology where any time that someone was doing a product and tech briefing, there would be a lot of chat in the sidebar and you could expect that. And that actually felt really nice. And we encouraged that, encouraged people to kind of comment and chip in as well. I think it was nice.

Charles Humble:

It's really interesting you say this about interactivity. So as you would know, Container Solutions runs webinars, and indeed, you did one on alert fatigue, which we'll link to in the show notes because it's a fantastic episode. And we also run a couple of conferences. WTF is SRE? And WTF is Cloud Native? And so far, all of those events, at least in my time have been run online and we get really good interactivity at those, but I've also done other online events where that hasn't been true. And it's sort of like sitting in a room, talking on your own. I don't know what the magic is that my boss, Carla, and her team on the events side are doing in terms of making that interactivity happen. But it would be lovely to know. Do you have any thoughts?

Sarah Wells:

Yes. I think that it helps when you're in an organisation where people know each other. And also it helps where you have some people who will always take the lead. There are people that I work with at the FT, who I could guarantee they will chip into a chat, they'll encourage other people to talk and they just bring that engagement. So if other people were thinking, they wanted to say something, but they're a bit intimidated, they'll do it because someone else has now already spoken about something. And I think a few people incorporated stuff in their talks that encouraged people to chip in as well.

Sarah Wells:

I've been on several online talks where it is sometimes difficult to get a lot of interaction. And it's hard to get that feeling, as someone who's speaking, of whether people are engaged with what you're saying or not. It's particularly difficult, I think, where you're not seeing anybody's face. And sometimes it's fine for people to be in an online conference and not to turn on their video, but it can be really difficult when you're speaking to know whether you are just saying something that everyone's finding really boring. So it's nice to get that feedback and yeah, I like that. And that's definitely the case.

Charles Humble:

Yeah. With the small number of online talks that I've given, I find it really hard because I will adapt what I'm saying based on the response I'm getting in the room. Like, okay, I think I've lost them there. Right. I better give you a bit more context. Or oh, okay. That got a laugh. Maybe we'll dial the humour level up a little bit. So whatever it is, you sort of feed off the audience a bit. And I found, certainly for me, I got none of that doing online and I found that really quite hard actually.

Sarah Wells:

I think it is really hard. I really appreciate it when there are a few people who turn their camera on and are visibly reacting. That is really lovely. And I think if you're attending an online conference, and you are comfortable to put your video on, and you want to nod, you can make a big difference to the energy of the talk. But I do this in real life conferences. If someone I know is giving a talk for the first time, I'll be in the front row, smiling and nodding the whole way through, because I think it just makes it feel less intimidating when you know there's someone there interacting and looking at you and going, yeah, this is great.

Charles Humble:

Yes, absolutely. One of the first sort of big talks that I gave, it wasn't a keynote, but it was on the keynote stage in front of 1,000 people. And I had someone I knew from the conference circuit sitting in the front row who has one of those very sort of loud laughs. And I said something and she kind of guffawed and, you feel your confidence level going up. It's like, okay, at least somebody found it funny, and it just helps.

Sarah Wells:

Yeah. I've done quite a lot of speaking, but you're always nervous. You're always nervous when you start. I generally find I can tell when I'm about five minutes in and my shoulders relax because I've found my flow, and I'm usually getting enough feedback from people to think, oh yes, someone's finding this interesting. And that's tough for an online conference I think. So, yeah, I think it's about finding a way to encourage people to interact in some way. And also if you think about it, that whole idea of chatting while people are talking to you, it feels weird because in a real life conference, you wait and you ask questions at the end.

Charles Humble:

There's a whole other topic here, which actually I'd love to dive into because I think it's fascinating, but we don't have time. But just really briefly, so when we shifted to online, obviously in a hurry, forced on by the pandemic, we basically looked at what happened at an in-person conference, grabbed the talky bit, the talks, because that's sort of relatively easy to do, and shifted that online with relatively little change. And I think we are just starting to think about how the interactivity models for online and in person might be different, but it's really just sort of the very, very beginnings of that. It's so interesting.

What are your plans for your book?

Charles Humble:

Unfortunately, if we tried to talk about it now we would end up with, I suspect the world's longest ever podcast episode, which would probably be a mistake. Before we close though, I do want to ask you about your book. So you're writing a book for O'Reilly on microservices. Can you give us an idea about when that's coming out, and also why another book on microservices? Don't we already have lots of those? What is it that you are going to be talking about that we haven't talked about before?

Sarah Wells:

It's going to come out probably first quarter next year. And I'm writing it at the moment. It's called Enabling Microservice Success. It's really about how, if you want to make a success of building a microservice architecture, it's about more than technology. It's about your organisational structure and your culture. And there are some excellent books about microservices out there. I think there's a lot of focus on how do you approach it and the architecture and everything. But my focus, from my experience, is: when you are running them, what's it like when you're several years into this? Because we're getting to the point where there are now systems that have been around for five, six years that are built on microservices. How do you continue to keep building on them, and maintain them and keep them in a good state? Because one of the things you're hoping to have with microservices is that you don't have to stop and completely start from scratch. You're able to replace components of your system and improve them and just keep the whole thing going.

Sarah Wells:

So I think there's a lot that I learned at the FT about building and operating microservices that I want to write up and share with people. Because I think that it can save you a lot of time if someone says, "Do you know what you need to do really early on? This thing." Like, literally your ability to find all of the information related to an event, whether that's through log aggregation that has a transaction ID on all the logs, or whether you're doing tracing, you need that. You need the ability to be able to trace what's going on. Things like that.
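As a rough illustration of what "a transaction ID on all the logs" can look like in practice, here is a minimal sketch using only the Python standard library. It is not how the FT implemented it; the logger name `example-service`, the `tid_` prefix, and the `handle_request` helper are all hypothetical. The idea is simply that a service reuses the ID it was handed (or mints one if it is first in the chain) and stamps that ID onto every log line, so a log aggregator can later pull together everything related to one event.

```python
import contextvars
import logging
import sys
import uuid

# Holds the transaction ID for the request currently being handled.
transaction_id = contextvars.ContextVar("transaction_id", default="-")


class TransactionIdFilter(logging.Filter):
    """Copy the current transaction ID onto every log record."""

    def filter(self, record):
        record.transaction_id = transaction_id.get()
        return True


def build_logger(stream=sys.stderr):
    """Create a logger whose every line carries transaction_id=<id>."""
    logger = logging.getLogger("example-service")  # hypothetical service name
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s transaction_id=%(transaction_id)s %(message)s")
    )
    handler.addFilter(TransactionIdFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger, incoming_id=None):
    """Hypothetical request handler: reuse the caller's ID or mint a new one."""
    tid = incoming_id or f"tid_{uuid.uuid4().hex[:12]}"
    token = transaction_id.set(tid)
    try:
        logger.info("request received")
        # ... call downstream services here, passing `tid` along with the call ...
        logger.info("request completed")
    finally:
        transaction_id.reset(token)
    return tid


if __name__ == "__main__":
    handle_request(build_logger(), incoming_id="tid_abc123")
```

The part that matters is passing the same ID on to every downstream call; distributed tracing tools automate that propagation, but the principle is the same.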

What would you recommend in terms of reading material for listeners who are maybe on a similar career path to yours?

Charles Humble:

That sounds great actually. I was thinking, there are lots of books: Sam Newman's Building Microservices, which is in its second edition now, is a really fantastic book on the sort of foundational things that you need to know if you're thinking about microservices architecture, including information hiding, ubiquitous language, all of that. It's a tremendous book, really recommend it, but you are right. I don't think there really is anything on, "now you've got a microservices architecture and you're running it for the long term. What does that look like?" And I'm really looking forward to reading your book when it comes out, hopefully early next year. In the meantime, what would you recommend in terms of reading material for listeners who are maybe on a similar career path to yours?

Sarah Wells:

Ah, so there are three books that I recommend a lot to people. I'm afraid it's going to sound, just like they're common books everyone recommends. So first of all, the Accelerate book is really good just on what does it mean to be high performing as a software development organisation? What does it look like?

Sarah Wells:

Team Topologies. When I read it, I just had this real sense of, yeah, this makes sense. This is something we've found our way towards, the idea that there are different types of teams, the idea that how you interact is important, and that you want to interact in a way that doesn't constrain everyone to stop and work together for a long period. You want to be able to cooperate without having to do stuff in lockstep. So those are both really good books.

Sarah Wells:

If you are someone who is getting involved in managing people, The Manager's Path by Camille Fournier is really good just to help you. It's really great because it starts off with now you're managing someone. Okay, now you're managing someone who's senior, and you're managing a manager. So it has this step through of what it takes to be a manager. They're books that I have a copy of that I can loan out to people.

Sarah Wells:

Outside of sort of software development, I think there are a couple of books I'd recommend. Checklist Manifesto by Atul Gawande. And he's talking about how in the medical profession, they were trying to work out how you could have a relatively low cost intervention that made it much more likely people would do well in medical situations, particularly surgery. And they looked to aviation and the fact that in aviation for a long time, there's been the idea of checklists. And the checklists don't tell you everything you need to do, but they remind you of the things that you might forget and that are important. And they did this in medicine and it's something that really can apply in software as well.

Sarah Wells:

So if a surgical team introduce themselves to each other by name, there are better outcomes for the people being operated on. And the reason is about power differentials really. So it's literally the nurse is far more likely to say "you are about to cut off the wrong leg" to the surgeon, if they know the surgeon's name. I know that sounds ... This happens. So basically it's about what can you do, that means that you're more likely to have the right checks and balances in place? So is there something on the checklist that says we do a checkpoint that everyone is happy, we all understand the special circumstances of this operation. And so I think you can learn a lot from that for software development too. How do you just provide things that remind people of the things they might forget? Every developer knows how to do the build, test, deploy cycle. What might they forget that is important that we could then put on a checklist? I think that's good. And we had an engineering checklist at the FT.

Charles Humble:

That's a really interesting choice. Thank you, Sarah. I will include links to all of those resources in the show notes for this episode. And it just remains for me to say a huge thank you to Sarah Wells for joining me this month for this episode of the "Hacking the Org" podcast from the WTF is Cloud Native? Team at Container Solutions.

Sarah Wells:

Brilliant. Thank you so much. I've enjoyed it.