The incident serves as a stark reminder of the fragility of our digital infrastructure. By adopting a diversified, resilient approach to cybersecurity, we can mitigate the risks and build a more secure digital future.
On July 19, 2024, the world experienced one of the largest IT outages in history, affecting millions of users globally, and systems and people will be reeling from its impact for weeks. The cause? A faulty update on CrowdStrike's Falcon platform. This seemingly minor error in code cascaded into a major outage that hit critical infrastructure worldwide. Airports, hospital systems, and other large enterprises relying on CrowdStrike were brought to a standstill, highlighting the vulnerabilities inherent in our increasingly digital world.
Falcon, a cloud-based security solution, functions like an advanced antivirus, updating threat intelligence and protecting systems automatically without user intervention. This automation is a boon for large enterprises, which can ensure all endpoints are protected and up to date without manual oversight. While efficient, this centralized system also introduces a fundamental risk: a single point of failure. When the update failed, it didn't just affect a few computers, but millions, all at once. The very feature that made Falcon attractive — its cloud-based, seamless, automated updates — became its Achilles' heel.
The Falcon failure exposed another fundamental flaw in our approach to cybersecurity and IT infrastructure. We tend to focus on protecting the most critical systems — flight control systems, cardiac machines in hospitals — while neglecting the everyday, mundane systems that are equally vital. In this case, it wasn't the high-stakes technology that failed but the routine systems like accounting, billing, and ticketing. These systems, often taken for granted, are the backbone of our daily operations, and their disruption can lead to chaos.
This is not a new phenomenon. In 2021, the Colonial Pipeline hack highlighted a similar vulnerability. The attack targeted the pipeline's accounting system, not the refinery or processing plant. Without the ability to track and bill customers, operations came to a halt. Our reliance on digital solutions, coupled with the assumption that technology will always function flawlessly, leaves us unprepared for such disruptions.
Finally, full recovery will take a while, even though CrowdStrike has already released mitigation guidance. Affected systems need to be reset, and most endpoint users either lack the permissions to do so (because IT has locked down systems by default) or don't know how to reset or revert them. This is the third reason the problem is persisting despite the guidance.
Such issues will only get worse as artificial intelligence (AI) gets integrated into systems. AI will centralize control further, automate complex tasks, and strip power and autonomy from users at the endpoint. Imagine a hospital where AI manages patient data, schedules, and even treatment plans. If such a system fails, frontline healthcare workers might find themselves unable to access crucial information or perform essential tasks, leading to potentially life-threatening delays. As AI becomes more integrated into our systems, the potential for large-scale disruptions increases. Our reliance on silicon-based systems will only deepen, making it imperative to address these vulnerabilities now.
Fortunately, carbon-based systems in nature provide a blueprint for resilience. In the early 1900s, Buffalo, N.Y., where I live, had thousands of tree-lined streets designed by Frederick Law Olmsted. Many of these trees were the same species, with streets named for the trees that lined them. But this uniformity created a single point of failure. When Dutch elm disease struck in the 1950s, it wiped out most of the elm trees because they were planted too closely together, allowing the disease to spread rapidly. This lesson teaches us the importance of diversity, which in computing means heterogeneous systems. Organizations must implement diverse IT systems, especially for their core functions. Just as a monoculture of trees can be decimated by a single disease, a uniform IT infrastructure can be crippled by a single point of failure. Introducing variety in hardware and software solutions can create a more resilient digital environment.
Nature also offers insights into protecting core functions. Just as the human body employs multiple layers of defense to protect vital organs, organizations should use a variety of software and operating systems to handle critical functions. For example, a hospital's patient management system could run on one platform while its diagnostic tools operate on another, ensuring that a failure in one system doesn't compromise the entire operation. This is akin to how different species of trees in a forest can prevent the spread of disease; if one species is affected, others can continue to thrive. Similarly, deploying diverse cybersecurity measures and segregating core functions can provide a buffer against widespread failure, enhancing overall system resilience.
Finally, to prevent future meltdowns like the CrowdStrike incident, we also need to invest in training and preparedness drills to equip IT teams to respond swiftly and effectively to emerging threats. This is not a minor issue. Fixing the current problem required either reverting computers to their pre-update state or waiting for a corrected patch to be deployed. As technology is centralized, more of the core functionality is being centrally administered or locked down. While this approach aims to prevent disruption, it also makes it harder for staff to get the administrative access that recovery demands, such as rebooting a system in safe mode or reverting it to an older state.
The issue is that people are neither given the access nor equipped to carry out these recovery steps, even as more of the technological functionality is being centrally administered and removed from the hands of users at the endpoint. People remain the weakest link in cybersecurity, whether it's the coders creating patches or the individuals installing or reverting systems. Thus, our solutions must also include comprehensive training and a focus on the human element to ensure robust security measures.
The CrowdStrike meltdown serves as a stark reminder of the fragility of our digital infrastructure. By learning from nature and adopting a diversified, resilient approach to cybersecurity, we can mitigate the risks and build a more secure digital future. As the saying goes, "Those who can't remember the past are condemned to repeat it." Let us collaborate, innovate, and learn from our mistakes to ensure that such a disruption never happens again. The future of our digital world depends on the lessons we learn from the past and the actions we take today.
*A version of this article also appeared at: The CrowdStrike Meltdown: A Wake-up Call for Cybersecurity (darkreading.com)