Site Reliability Engineering, or SRE, an engineering practice formalised and named by Google, has helped many organisations maintain their platforms and ensure application performance and reliability,...
As Site Reliability Engineers, or SREs, we spend our days (and sometimes nights and weekends) making sure the platforms we oversee run smoothly. We also follow careful protocols for responding when so...
This is the conclusion of a three-part blog series. For more information, request our free e-book, SRE: The Cloud Native Approach to Operations. If you’ve been following parts 1 and 2 of this blog ser...
This is part 2 of a three-part blog series on Site Reliability Engineering. You can read Part 1 here. Part 3 is here. To learn more, request a free copy of the e-book on SRE from which this was excerp...
This is the start of a three-part blog series on Site Reliability Engineering. To learn more, request a free copy of the e-book on SRE from which this was excerpted. Almost all enterprises nowadays lo...
Back in 2017, I wrote on my personal blog about Things I Learned Managing Site Reliability for Some of the World’s Busiest Gambling Sites. A lot of it focussed on runbooks, or checklists, or whatever ...
Have you ever wondered how effective Site Reliability Engineering (SRE) teams manage complex applications successfully? In the Kubernetes ecosystem, there is only one answer: Kubernetes Operators! In ...
This blog post is the conclusion of a series. In Part 1 of this blog series about unikernels, I explained what unikernels are, and their role in reducing resource usage within operating systems and ma...
This blog post is part of a two-part series. In this blog post I’d like to provide an overview of what unikernels are, how they fit in the cloud computing landscape and what projects are driving the t...