We woke up one Friday morning to a world in chaos: airplanes couldn’t fly, live video streams were down, bank accounts were inaccessible, and hospitals were paralyzed. The cause was Windows PCs stuck on the dreaded “Blue Screen of Death,” unable to boot.
The Culprit
A security firm, CrowdStrike, had sent out a single software update the night before. The impact was immediate, and the recovery plan required a rollback to a previous version of the CrowdStrike software. The day after the interruption, Microsoft estimated that roughly 8.5 million Windows devices had been affected.
So, what exactly caused this fault? According to public information from CrowdStrike and Microsoft, CrowdStrike sent a routine update to its security software running within a highly protected part of the Windows Operating System (OS). This update, however, clobbered the system, rendering the OS non-functional.
And while there are plenty of detailed explanations of what failed and why it had such an impact, I want to focus on the broader implications — specifically, the reliability of the IT ecosystem and its far-reaching impact on society and the economy. Below are the critical questions that arise from this incident.
New Questions for the Industry
- Reputational risk for Microsoft: While CrowdStrike’s security software caused the problem, many blame Microsoft. What does this mean for Microsoft’s reputation, and what risk does it present for the company?
- System vulnerability: How did Microsoft allow a single organization’s update to corrupt the Windows Operating System so dramatically? What steps will Microsoft take to ensure this type of failure never happens again? Are there other third-party software programs with similar potential impact?
- Liability + accountability: Are CrowdStrike and Microsoft liable for the additional costs and disruption caused by this incident? Will governments around the world hold these companies accountable?
- Value of diversity in technology: Since Macs, Linux, and Chrome-based computers were not impacted, this incident puts a spotlight on the value of diversity in technology ecosystems. Should organizations mandate multiple operating systems to avoid single points of failure?
- Approval processes for updates: There was a time when organizations carefully vetted updates and patches through a Change Advisory Board (CAB). With the rise of SaaS applications, this process is often bypassed. Should we return to a more formal approval process?
- Security strategy: Security companies are compelled to neutralize novel malware risks as quickly as possible. Does this strategy need re-evaluation in light of this incident?
Fixing the Problem
One recommendation to fix the problem was to revert to a previous version of the CrowdStrike software. Windows has a feature called System Restore that returns a computer’s system files to an earlier point in time. Unfortunately, because the affected machines could not start normally, using it would require an alternative method to load the operating system. This is not an automated process, and it requires someone to be at the computer, which makes it an impractical solution given the scale of affected devices.
There is a familiar quote, “What doesn’t kill you, makes you stronger.” The hope is that we’ve now identified an entirely new risk category. This incident provides us all with another risk vector that we need to safeguard against.
In the aftermath of this significant disruption, we must reflect on these critical questions and take actionable steps to fortify our IT infrastructure. Our resilience as a global economy depends on it.
Learn more about the crucial importance of recovery when an IT disaster strikes in our recent article from Compugen’s Salman Alibhai.
Take the Next Step
Our resilience as a global economy depends on our ability to reflect on these incidents and take actionable steps to fortify our IT infrastructure. We must identify new risk categories and safeguard against them, ensuring that our recovery strategies are not just about IT but about sustaining business operations and protecting societal well-being.
If you want to learn more about enhancing your security and recovery strategies, book a Discovery Call with Compugen. As your Technology Ally, our Dream Team will assess your existing fleet of security and recovery software and review your preparedness plan to ensure you are well-equipped to handle any IT disaster.
This article is republished from LinkedIn.