19 min read

    What SRE Teams Can Learn from Business Continuity and Vice Versa

    Keeping our software up and running isn’t so different from keeping our organisations functional. We can learn from each other and use the same techniques.

    Almost 20 Years In, SREs Are Still Finding Their Place

    The field of site reliability engineering originated at Google with Ben Treynor Sloss, who founded a site reliability team after joining the company in 2003, but the practice has spread across most or...

    DevOps - The Sec is Silent

    There are two hard problems in tech: cache invalidation, naming things, and off-by-one errors. We have proven this over and over again through a multitude of poorly named things. Whether it’s AWS Serv...

    16 min read

    A Beginner's Guide to Using the Prometheus Operator

    Prometheus is a simple and effective open-source monitoring system. In the years after we published the article Monitoring Microservices with Prometheus, the system has graduated from the Cloud Native...

    13 min read

    Why Should We Care about AIOps?

    Those of us who make a living producing software or managing software teams are ultimately getting paid to improve business processes, be it making cars autonomous to improve safety, save people time ...

    Why You Need Chaos Engineering Now More Than Ever

    About a year ago, brick-and-mortar businesses like restaurants and grocery stores were scrambling to set up delivery and curbside pickup. A lot of them used chaos engineering, in production, to hunt for failure...

    Fire Drills: a Guide to Preparing for Your Next Incident

    Supporting Cloud Native applications is no easy task. Through offering Customer Reliability Engineering (CRE) support—essentially, Site Reliability Engineering (SRE) as a service—for multiple customer...

    WTF Is Continuous Improvement?

    When you’re offered a Covid-19 vaccine this year, which sort would you like? One that’s been through animal and human trials, received government approval, is made on a standardised production line, a...

    Isn't SRE Just DevOps?

    The truly Cloud Native way to work in teams, according to the Maturity Matrix, means SRE and DevOps. But what does that mean? You might be wondering, Isn’t SRE basically just DevOps?