Now is the Time to Act: Here’s What Companies Can Learn from the CrowdStrike Outage

By Andersen Cheng, Founder & Chairman of Post-Quantum.

Wednesday, 11th September 2024. Posted by Phil Alsop.

On July 19th this year, a faulty update to a single piece of software led to unprecedented levels of disruption across the globe. CrowdStrike, a US cybersecurity company founded in 2011, pushed a routine update to its Falcon software. The update caused Windows systems to get stuck in a boot loop, rendering them unusable until a fix was applied.

 

The chaos spanned industries. Cancelled flights left thousands of passengers stranded at airports, emergency call centres were unable to function in some US states, and payment systems went down, leaving people unable to access their bank accounts. The financial impact was massive, but more importantly, the outage also put people’s lives at risk.

 

The event shows that we are more dependent on technology than ever before. We rely on it to buy essential goods, transport us around the world, and communicate with friends and colleagues.

 

But the outage also teaches us another, more practical lesson. The severity of the impact was only made possible by the fragility of the global technology ecosystem and our over-reliance on a small number of technology providers. The vulnerabilities of our system leave open the possibility of a major global outage. In this case, the outage was a technical issue, but imagine the chaos that could be caused if these vulnerabilities were exploited through an organised and sustained cyber-attack.

 

An important wake-up call

 

The interconnectivity of our digital infrastructure has brought about many benefits, allowing us to streamline processes and communicate more effectively. But it has also left us exposed to the impact of systemic failures in the technology economy. As we’ve just seen, when a major technology provider is affected, our world comes to a standstill, and even the largest and most reputable companies are not immune to outages and breaches.

 

Microsoft has revealed that 8.5 million Windows devices were affected by the botched CrowdStrike update, which it said accounts for less than 1% of all Windows machines worldwide. When the next incident takes place, this number could be even higher, and the effects even more catastrophic.

 

For IT decision-makers, the incident is a sign that they need to increase their operational resilience. This means gaining greater visibility and control over their technology supply chain. The question now is how to go about this.

 

One option is for businesses to adopt a multi-cloud approach, investing in multiple cloud subscription models in tandem. This would allow them to switch over operations to a secondary provider if their primary partner is experiencing issues. For example, if Microsoft goes down, businesses could shift to using Google.

 

However, this solution will be unattainable for most organisations. The expense of subscribing to and running two differently architected cloud services simultaneously puts it out of reach for all but the largest corporations.

 

The importance of independent infrastructure

 

Instead, companies should explore shifting their legacy technology ecosystem to an independent, ground-up digital infrastructure to future-proof their operations and reduce their reliance on major technology providers.

 

Firms should view this as an opportunity to run a new low-cost infrastructure in parallel to the platform they use for day-to-day operations. This new infrastructure should be independent of their primary cloud provider, meaning that in the case of a major outage or any other issue, they can switch over to it and prevent large-scale disruption to operations. For this to work, the secondary infrastructure would have to be quick to bring online and already hold the data needed to operate.
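
For technical teams, the switch-over idea can be pictured as a simple priority list with health checks. The sketch below is purely illustrative and is not a description of any particular product: the names `Provider`, `is_healthy` and `choose_provider` are hypothetical, and a real deployment would replace the health probes with checks against its own systems.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Provider:
    """A hypothetical infrastructure option, e.g. the primary cloud or an independent fallback."""
    name: str
    is_healthy: Callable[[], bool]  # assumed health probe supplied by the operator


def choose_provider(providers: Sequence[Provider]) -> Provider:
    """Return the first provider, in priority order, whose health probe passes."""
    for provider in providers:
        try:
            if provider.is_healthy():
                return provider
        except Exception:
            # A probe that raises is treated the same as an unhealthy provider.
            continue
    raise RuntimeError("no healthy provider available")


# Illustrative usage: the primary is down, so operations move to the fallback.
primary = Provider("primary-cloud", lambda: False)
fallback = Provider("independent-fallback", lambda: True)
active = choose_provider([primary, fallback])
print(f"Operating on: {active.name}")
```

The point of the ordering is that the fallback is only used when the primary fails its check, which matches the idea of a secondary infrastructure held in reserve rather than a second full-time platform.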

 

In terms of functionality, at a minimum, this platform would provide an essential communications service for senior decision-makers and upper management, allowing them to relay instructions to staff and reassure customers and clients. It would also be able to initiate a contingency command and control process and facilitate simple tasks until the main systems are brought back online.

  

Preparation is key

 

Whether it is a cyber-attack or a technical mishap, the next cyber disaster will be impossible to predict. Firms must act sooner rather than later if they want to protect themselves and ensure that business can continue as usual in times of crisis. We’ve seen that the technology ecosystem is more fragile than we think, and we must prepare for when things go wrong.

 

With some of our mission-critical clients already beginning to invest in ground-up digital infrastructure, it’s clear that the time for delay has passed. Making this shift is a significant and much-needed step towards reducing over-reliance on a small number of technology providers and increasing cyber-resilience.
