
How the World’s Most Valuable Insurance Firm Takes a Cloud Native Approach

Allianz Direct is the European direct-to-customer arm of the world’s most valuable insurance brand. Regulated by the German, Dutch, Spanish and Italian authorities, it has taken advantage of a step change in how IS and IT Risk functions collaborate with Engineering teams to rapidly build a greenfield insurance platform, scale it to all four countries, and migrate away from legacy solutions. It runs its platform in the public cloud and makes extensive use of Agile and DevSecOps practices to deploy thousands of times a year, all whilst complying with strict regulations, policies, and processes.

The platform uses Allianz’s in-house core insurance system, Allianz Business System (ABS), which is enhanced with around 60 microservices written in Kotlin, Java and Node.js. We take advantage of Domain-Driven Design (DDD) to design both the platform and our organisation, creating independent teams around defined interaction points, with the services usually communicating via Kafka. We also make extensive use of Kubernetes for container orchestration and the Cloud Native approach to building software. This modern approach has enabled us to quickly build scalable software with a flexible architecture. However, it is the revolution in compliance processes that enabled us to go live.

Trust us, the regulator will love it

The insurance industry is all about trust. Customers pay premiums trusting that in the future, should the insured event happen, the insurance company will compensate them. Insurance companies usually have a deep understanding of risk, especially any risk to their reputation. They have a healthy fear of getting on the wrong side of regulators and the press, and perhaps hundreds of years of experience in creating processes to ensure this doesn’t happen. However, the current race to the cloud and the push for digital transformation have forced even conservative industries to find a way to move fast whilst not risking the trust of their customers.

It must be better

Change is never easy: there can be a high price to pay in both effort and organisational stress. Luckily, advocating for change in IS and IT Risk is easier if that change offers not only benefits in productivity and costs but also better outcomes. Legacy processes tended to focus more on proving that a control was implemented than on the risk being mitigated. To convince the various functions, you need to show that you are not just trying to build software faster but that the result will actually be more reliable.

A great example is penetration testing. Most IS processes call for penetration tests only on a scheduled basis or on large releases, which leaves everything deployed in between untested. By integrating a continuous pen testing solution into our DevSecOps process we closed that gap and proved that a modern approach, even one that deploys more frequently, actually holds less risk.

Incorporating continuous vulnerability scanning is another example of how a new approach is less risky. Previously vulnerability scanning only happened at build time, and as older applications are generally updated infrequently, it could be months or years since a running application was last checked. We enhanced a commercial tool to continuously scan all running containers for vulnerabilities, immediately reporting new vulnerabilities directly back to the engineering teams.
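As a rough illustration of the idea, and not a description of the commercial tool mentioned above, the sketch below rescans an inventory of running containers against the latest advisories and reports only the findings that appeared since the previous pass, grouped by owning team. The container names, team names, package names and advisory identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    container: str  # hypothetical running container name
    team: str       # hypothetical owning engineering team
    package: str
    advisory: str   # hypothetical advisory identifier


def scan(running, advisories):
    """Return every finding where an installed package matches a known advisory.

    running:    list of dicts with 'name', 'team' and 'packages' ({name: version})
    advisories: dict mapping (package, version) -> advisory identifier
    """
    findings = set()
    for container in running:
        for package, version in container["packages"].items():
            advisory = advisories.get((package, version))
            if advisory is not None:
                findings.add(
                    Finding(container["name"], container["team"], package, advisory)
                )
    return findings


def new_findings_by_team(previous, current):
    """Group the findings that appeared since the last pass by owning team."""
    report = {}
    ordered = sorted(current - previous, key=lambda f: (f.team, f.container, f.package))
    for finding in ordered:
        report.setdefault(finding.team, []).append(finding)
    return report


# Hypothetical inventory: nothing is redeployed between the two passes.
running = [
    {"name": "quote-service", "team": "sales", "packages": {"libfoo": "1.2.0"}},
    {"name": "claims-service", "team": "claims", "packages": {"libbar": "3.1.4"}},
]
first_pass = scan(running, {})
# A new advisory is published; the same running containers are scanned again.
second_pass = scan(running, {("libfoo", "1.2.0"): "ADVISORY-0001"})
report = new_findings_by_team(first_pass, second_pass)
```

The point of the second pass is that the containers did not change, only the advisory data did, which is exactly the case a build-time scan misses.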

During the recent Log4Shell security event, Allianz Direct’s new approach to security was tested. As information first emerged about the Remote Code Execution (RCE) issue, our IS teams alerted our engineering teams to the threat. Whilst investigations continued, we were able to take advantage of our Infrastructure as Code philosophy to rapidly deploy WAF Filters and regularly update them. Our IS tools scanned all running containers for the vulnerable library versions, including dependencies of dependencies, and highlighted a list of microservices to be updated. As our engineers raced to upgrade the services, our continuous penetration testing tool was updated to probe using Log4Shell exploits, and our logging platform was continuously scanned for suspicious activity. Within 48 hours of the incident starting, our CI/CD pipelines redeployed nearly all 60 services while complying with all change management processes, and we were able to prove that all services had been found, scanned, and updated.
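Finding a vulnerable library among "dependencies of dependencies" amounts to walking each service's dependency graph. The following is a minimal sketch under stated assumptions, not the tooling used in the incident: the graph is a hypothetical in-memory mapping, versions are assumed to be plain dotted numbers, and the check is simplified to flag log4j-core 2.x releases below 2.15.0 (later releases addressed follow-up issues, which this sketch ignores).

```python
def parse_version(text):
    """Turn '2.14.1' into (2, 14, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Simplifying assumption: only log4j-core 2.x below 2.15.0 is flagged.
FLAGGED_ARTIFACT = "log4j-core"
FIRST_FIXED = parse_version("2.15.0")


def is_vulnerable(artifact, version):
    if artifact != FLAGGED_ARTIFACT:
        return False
    parsed = parse_version(version)
    return parsed[0] == 2 and parsed < FIRST_FIXED


def vulnerable_paths(root, graph):
    """Depth-first walk returning each dependency path that ends in a flagged library.

    graph maps (artifact, version) -> list of (artifact, version) dependencies.
    """
    paths = []
    stack = [(root, [root])]
    while stack:
        node, path = stack.pop()
        if is_vulnerable(*node):
            paths.append(path)
        for dependency in graph.get(node, []):
            if dependency not in path:  # guard against dependency cycles
                stack.append((dependency, path + [dependency]))
    return paths


# Hypothetical service with two transitive routes to log4j-core.
graph = {
    ("policy-service", "1.0.0"): [("http-client", "4.2.0"), ("metrics-lib", "0.9.0")],
    ("http-client", "4.2.0"): [("log4j-core", "2.14.1")],
    ("metrics-lib", "0.9.0"): [("log4j-core", "2.17.1")],
}
paths = vulnerable_paths(("policy-service", "1.0.0"), graph)
```

Returning the whole path rather than just the service name tells the engineers which intermediate dependency has to be upgraded or overridden.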

Focus on mitigating risks, not checklists

Compliance with IS and IT Risk policies often takes the form of documenting policies and can sometimes feel like a checklist filling exercise. Embarking on a change program gives an excellent opportunity to make sure your policies are actually making your company safer, rather than just providing a false sense of safety.

When approaching a risk control, it is important to remember that the process is not the point; mitigating the risk is. Take your time to understand why the process is there, what it was trying to prevent, and how you can mitigate the risk in another way that is compatible with your new way of working.

At Allianz Direct we had a change management policy that all releases were signed off by authorised personnel before being deployed, and even new features are considered changes. We had ambitions to deploy thousands of times a year and could see no way to comply with this requirement. However, after talking with our IT Risk colleagues, we understood that this process was there to prevent new functionality being deployed without the business knowing. We were able to mitigate that risk by adding a step to our CI/CD platform that required the Product Owners, as representatives of the business, to approve deployments in GitHub before they were picked up by the production pipeline.
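The control described here boils down to a gate: a change only moves to production once someone representing the business has approved it. The sketch below shows that rule in isolation. It does not call GitHub or any pipeline API, and the service name, user names and the `ReleaseRequest` structure are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseRequest:
    service: str
    commit: str
    author: str
    approvals: set = field(default_factory=set)  # user names that approved the change


def may_deploy(request, product_owners):
    """Allow a production deployment only if a Product Owner of the service approved it.

    product_owners maps a service name to the set of its Product Owner user names.
    """
    owners = product_owners.get(request.service, set())
    return bool(request.approvals & owners)


# Hypothetical register of who represents the business for each service.
PRODUCT_OWNERS = {"quote-service": {"po.alice"}}

pending = ReleaseRequest("quote-service", "abc123", "dev.bob", approvals={"dev.carol"})
approved = ReleaseRequest("quote-service", "abc123", "dev.bob", approvals={"po.alice"})

pending_allowed = may_deploy(pending, PRODUCT_OWNERS)    # a peer review alone is not enough
approved_allowed = may_deploy(approved, PRODUCT_OWNERS)  # the Product Owner has signed off
```

Because the check is automated and evaluated on every change, it can keep pace with frequent deployments while still addressing the original risk of functionality reaching production without the business knowing.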

Our CI/CD platform uses Tekton and Argo CD, providing a smooth deployment experience for our developers. They also enable us to know exactly what is running in production and to trace changes in services or infrastructure back to the Engineer who wrote them and the Product Owner who approved them. This, combined with the ability to roll back deployments to a guaranteed previous state, is a powerful advantage when discussing change management with auditors.
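To make the audit argument concrete, here is a small sketch of the kind of record that supports it: an append-only deployment history from which you can read what is currently running, who wrote and approved it, and which revision a rollback returns to. This is an illustrative model only, not how Tekton or Argo CD store their state, and every name in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    service: str
    revision: str     # for example a Git commit hash
    author: str       # engineer who wrote the change
    approved_by: str  # Product Owner who approved it


class DeploymentLog:
    """Append-only history; the latest entry per service is what runs in production."""

    def __init__(self):
        self._entries = []

    def record(self, deployment):
        self._entries.append(deployment)

    def history(self, service):
        return [d for d in self._entries if d.service == service]

    def current(self, service):
        history = self.history(service)
        return history[-1] if history else None

    def rollback(self, service):
        """Redeploy the previous revision by appending it again, keeping the trail intact."""
        history = self.history(service)
        if len(history) < 2:
            raise ValueError(f"no earlier revision recorded for {service}")
        previous = history[-2]
        self.record(previous)
        return previous


log = DeploymentLog()
log.record(Deployment("quote-service", "a1", "dev.bob", "po.alice"))
log.record(Deployment("quote-service", "b2", "dev.carol", "po.alice"))
restored = log.rollback("quote-service")
running_now = log.current("quote-service")
```

Recording a rollback as a new entry, rather than deleting the faulty one, means the history an auditor sees is never rewritten.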

IS and Risk colleagues are a fantastic source of knowledge and help in finding new approaches. In fact, we found that they and our Audit colleagues were much more aware of the challenges posed by Agile and DevOps methodologies than we expected. They were often more than happy to act as trusted advisors.

Own the rules

Allianz Direct is an Operating Entity (OE) within Allianz, meaning we need to formally adopt our own rules and policies. Audit checks are regularly carried out by internal and external auditors to ensure these policies exist, are well documented, and are followed.

As with most Allianz OEs, our policies are based on the Allianz Group policies. However, being our own OE enabled us to change them where they conflicted with our new methodology. All changes needed to be approved by both the risk functions and the board of management, and to be able to withstand scrutiny during an audit.

Moreover, being able to formally make these changes and design proper controls meant that when audits happened, we could show we were following our documented processes. Had we been an innovative team within a larger entity using more traditional policies, audits would have flagged us as not being compliant. Insurance companies do not like being non-compliant and generally the Group board of management is informed of issues with audits. Owning the rules enabled us to innovate and be compliant.

Challenge the status quo

Many objections to Agile and DevOps processes assume that these approaches are incompatible with tried and tested compliance policies. Oftentimes these assumptions are based on some kind of long-held corporate mythology, or a misunderstanding of the benefits that these new methodologies can bring.

Insurance companies are very conservative organisations, and that conservatism has been good for us and our customers. However, it can lead to rules being interpreted very narrowly. As the conversations around compliance continued, we gradually built up an understanding as to which opinions could be challenged. Again, understanding the risks the rules are trying to mitigate helped guide us.

It helps to have a big brother

While being a lean organisation enabled us to transform rapidly, in several cases it also meant we needed the assistance of other parts of the Allianz Group. As a strategically important part of the German economy, Allianz has regular touch points with the German regulator BaFin. This includes discussions on our IT strategy, including moving to the cloud. The outcome of these discussions enabled us to gauge the openness of the regulator to hosting data in the cloud and to identify topics to be addressed.

In recent years BaFin has increased its focus on IT security, including in its special regulations for IT in insurance companies (VAIT). Awareness of changes is critical, as is having the resources to understand and interpret them. As a small OE, Allianz Direct relies on the Group IS and IT Risk functions to guide us.

We were also able to take advantage of Allianz Germany’s efforts in moving to the cloud. In 2020 Allianz Germany notified BaFin that it intended to move some IT to cloud providers. The preparatory work that goes into such a notification is huge and Allianz Direct was able to piggyback on the announcement.

It’s not just you

While getting your own house in order can be challenging enough, you also need to ensure your supply chains are compliant. The move to Cloud Native approaches has increased reliance on managed services and SaaS solutions from vendors. Each one needs to be assessed not just for IS and IT risk but also for data privacy compliance.

When working with SaaS vendors we try to avoid use cases that require the sharing of customer-identifiable data or health information. Where that cannot be avoided, there is no real way to minimise the compliance requirements and checks, which can be extensive and can cause smaller vendors to be excluded. In these cases, we want to understand how the data is stored, who has access, what IS policies the service provider has and whether they are compliant with all our policies.

DevOps Culture

A key success factor for us has been the engagement of our ISO, Anton Göbel, with the engineers. Previously, ISOs may have been seen as gatekeepers to a production deployment, and thus as people to keep at arm’s length. At Allianz Direct we took the complete opposite approach. Anton and his team are fully integrated into the engineering teams: they are seen by the engineers as a source of knowledge and as sparring partners rather than enforcers.

Whether it’s a side effect of the DevOps mentality or just that developers are now included in our IS and IT risk processes, we have seen that the organisation is much more aware of IS and IT risk. We follow a Security & Privacy by Design approach, so risk is part of every service’s architecture discussion, but we also see engineers actively engaged with our IS function. To take full advantage of this we have a Security Champion role in each engineering team: we offer engineers additional training in security topics in exchange for them acting as the eyes and ears of IS in their team.

This tight collaboration enabled Allianz Direct to react quickly to the recent SpringShell security event. An engineer was the first to notice a tweet that started the discussion as to whether this actually was an RCE vulnerability. They informed the Security Champions and Anton’s team. Another engineer took it upon themselves to analyse the Spring source code and to flag the threat as a possible issue well before it was acknowledged by Spring. Allianz Direct was able to use this analysis to start a wider discussion in Allianz that allowed the global Allianz IS community to react appropriately to the threat.

Historically, the only time compliance excited an engineer was when they felt hamstrung by it. However, as with all the “Ops” changes in methodology (DevOps, DevSecOps, AIOps), integrating and empowering people in a process usually has profound effects. Building IS and Risk processes into our engineering process has given our engineers a deeper understanding of risk and, more importantly, made our organisation more secure.
