The history (and prehistory) of Cloud Native has been characterised by the gradual encroachment of automation on more and more pieces of the software engineering lifecycle. In the dark ages before the year 2000, automation was limited to tools like make and shell scripts, and was typically directed towards building software artifacts, such as tar files or CD-ROMs.
As time went on, more tooling built up to support the automation of the software lifecycle. Git helped automate the management of software code, Jenkins helped automate and manage software build processes, Terraform the automation of infrastructure delivery (leveraging the Cloud’s development of APIs for infrastructure provisioning), Docker the encapsulation of software environments, and so on.
This diagram from our GitOps e-book was put together to show the antecedents of GitOps, culminating in tools like FluxCD and ArgoCD that are centred around Kubernetes. Kubernetes itself might be considered the apotheosis of the Cloud Native philosophy, emphasising the importance of software platform delivery as code that can be stored and managed in auditable source control.
One area that has remained relatively impervious to automation has been the Governance, Risk and Compliance (GRC) areas of IT service management. As technology has increasingly automated the operation of the financial sector, little has changed in GRC, except perhaps the amount of money spent (up) and the number of people involved (up). As a result, GRC in finance has scaled poorly, still relying on regular manual point-in-time checks on whether controls are in place and working. Container Solutions has written about this challenge previously here.
So far, engineers have shown little interest in tackling this problem, perhaps because controls are seen as stifling rather than enabling, and ‘managing risk’ via controls is less intellectually challenging to master than security issues such as supply chain management or vulnerability detection.
However, that may change soon.
Like any other part of industry, regulators are subject to trends, and these trends can be shaped by events. In the financial sector, the biggest event this century was the 2008 global financial crisis, which precipitated a marked shift in regulatory frameworks and priorities. This shift brought focus to financial risk, both systemic and to the consumer. Banks and other financial institutions were required to hold more capital in reserve to guard against potential losses, and new measures were implemented to detect and prevent fraud and financial misconduct. The overall aim was to reduce risk and increase resilience in the financial positions that banks took.
While systemic risk never goes away, after 15 years this focus on financial regulation is receding, and there is increasingly a focus on operational risk and resilience (as opposed to financial).
In this context, the Digital Operational Resilience Act (DORA) has been put forward by the EU. This act seeks to “enable and support the potential of digital finance in terms of innovation and competition while mitigating the risks arising from it”. The initial memo on the act covers various areas of IT risk, centring on risk management and reporting. The regulatory wind is blowing in a similar direction in the UK, with a resilience policy paper preceding the enactment of the UK Financial Services and Markets Act 2023.
The published documents on DORA are somewhat vague and generic. Although the act is slated to become law from January 17, 2025, the technical standards are expected to be published ‘in tranches from January 17, 2024’, so the detail isn’t known yet. This doesn’t give much time for financial institutions to allocate budget to deliver any changes they need to make to be compliant, especially as budgets are still typically decided on a yearly cadence in banking.
Some ‘reading between the lines’ of the DORA documents might make its intentions clearer, however. It seems that regulators have been frustrated with the lack of transparency around risk, and around incidents relating to risk. This would explain the strong focus in the act on standards around ‘incident reporting, reducing administrative burdens and strengthen[ing] supervisory effectiveness’. To put it plainly, regulators are going to expect compliance and audit functions to have the ability to report on their controls regularly, efficiently and clearly.
There is also an emphasis on system testing, again suggesting that regulators are keen to see a reduction in operational risk through better deployment and monitoring of new technology and systems.
Of particular interest to us at Container Solutions are these quotes from the act:
‘financial entities shall establish, maintain and review, with due consideration to their size, business and risk profiles, a sound and comprehensive digital operational resilience testing programme as an integral part of the ICT risk management framework’
‘financial entities need to have in place comprehensive capabilities enabling a strong and effective ICT risk management, alongside specific mechanisms and policies for ICT-related incident reporting, testing of ICT systems, controls and processes, as well as for managing ICT third-party risk.’
These quotes point to an increasing focus on GRC from the regulators, and specifically on the testing and reporting of risk and ICT-related incidents. In short, less check-box controls and more real-time observability.
One of the frustrations we think regulators have is the lack of visibility on the effectiveness of controls. At the moment, audits of controls take place on a cadence measured in years, and are carried out ‘by hand’ by auditors whose job it is to seek out evidence of the adherence to, and effectiveness of, controls.
At Container Solutions, we know this problem is ripe for automation and standardisation, just as other areas of software (such as CI/CD) have been over the last ten years. The first steps toward this have been taken in the industry, as FINOS has announced their Common Cloud Controls project. This project seeks to ‘develop a unified set of cybersecurity, resiliency, and compliance controls for common services across the major cloud service providers (CSPs)’.
However, this only deals with shared standards around the description of controls, and not the implementation of the controls. For example, a control description might be ‘S3 buckets must not be available across the Internet’. Control implementations come in three classes: preventative, detective, and reactive. For this control the implementations might be:
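As a rough illustration of how the three classes differ, the Python sketch below models the example control against a simplified, in-memory description of a bucket. Every name in it (BucketConfig, violates_control, preventative, detective, reactive) is hypothetical and chosen for illustration; none comes from a cloud provider's API, from the FINOS project, or from our own tooling. A real implementation would query the provider's APIs or a policy engine rather than a local list.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class BucketConfig:
    """Hypothetical, simplified view of a storage bucket's settings."""
    name: str
    public_access: bool


def violates_control(bucket: BucketConfig) -> bool:
    """The control: buckets must not be available across the Internet."""
    return bucket.public_access


def preventative(proposed: BucketConfig) -> None:
    """Preventative: refuse a non-compliant change before it is applied."""
    if violates_control(proposed):
        raise PermissionError(f"{proposed.name}: public access is not allowed")


def detective(inventory: list[BucketConfig]) -> list[dict]:
    """Detective: scan what already exists and record timestamped findings."""
    checked_at = datetime.now(timezone.utc).isoformat()
    return [
        {
            "bucket": b.name,
            "compliant": not violates_control(b),
            "checked_at": checked_at,
        }
        for b in inventory
    ]


def reactive(bucket: BucketConfig) -> BucketConfig:
    """Reactive: remediate a violation once it has been found."""
    if violates_control(bucket):
        return replace(bucket, public_access=False)
    return bucket


if __name__ == "__main__":
    inventory = [BucketConfig("reports", False), BucketConfig("scratch", True)]
    for finding in detective(inventory):
        print(finding)
    remediated = [reactive(b) for b in inventory]
    print(all(not violates_control(b) for b in remediated))
```

The detective function returns timestamped records rather than a bare pass or fail, on the assumption that such records are the kind of evidence an auditor would otherwise gather by hand.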
At Container Solutions, we’re working on an open source solution that seeks to help companies automate their controls and auditing efforts. Just as with other Cloud Native transformation efforts, we see this as precipitating a step change not just in the pace of auditing, but also the effectiveness, transparency, and scalability of auditing processes.
Therefore we can begin to see a rebalancing of the equation on the GRC side, where disproportionate costs and infrequent manual checks are replaced by automated, real-time observability, which provides the opportunity to take immediate action and so significantly mitigate risk.
Stay tuned for more updates on this effort, as we seek to make Governance Engineering a first-class citizen in the Cloud Native landscape.