Lessons to Take Away from the CrowdStrike Crisis

Thank you for clicking on our newsletter, Arbisoft Next. Before we dive into the topic, if you haven't already subscribed, please do so to stay updated on the latest tech and Arbisoft news.

From handwritten boarding passes to cash-only transactions at grocery stores, last week the world got a firsthand lesson in just how vulnerable our IT infrastructure is.

The recent global IT outage, caused by a faulty software update from CrowdStrike, is a grim reminder of the risks of over-relying on a single system and of the importance of robust safeguards. On the morning of July 19, a content update pushed to CrowdStrike's Falcon sensor, a third-party security agent that runs on Windows, crashed the sensor and with it an estimated 8.5 million Windows devices. CrowdStrike counts over half of the Fortune 500 and many U.S. government agencies among its customers, so the resulting outage disrupted everything from banking and finance to travel, healthcare, and commerce worldwide.

But what should we learn from this incident? Let’s take a look at 7 key takeaways.


1. Not All Clouds Are Created Equal

Cloud reliance is the new normal, but it is crucial to understand the risk that comes with it. Relying on a single cloud provider, or on a single vendor of any kind, creates a monoculture: all your eggs are in one basket. If that one provider has a hiccup, like the one we saw last week, it can cripple the entire network, and the bigger the provider's footprint, the wider the damage. Businesses should consider a multi-cloud strategy so that if one cloud suffers a technical malfunction, the others can keep things running smoothly.
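To make the multi-cloud idea concrete, here is a minimal Python sketch of failover across providers. Everything in it is an assumption for illustration: the provider names (`cloud-a`, `cloud-b`), the `call_with_failover` helper, and the stand-in functions are hypothetical, and a real setup would also need health checks, data replication, and DNS or load-balancer changes.

```python
from __future__ import annotations

from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every configured provider fails."""


def call_with_failover(
    providers: Sequence[tuple[str, Callable[[], str]]],
) -> tuple[str, str]:
    """Try each provider in order and return (provider_name, result).

    `providers` is a list of (name, callable) pairs; both are hypothetical
    stand-ins for real service clients.
    """
    errors = []
    for name, fetch in providers:
        try:
            return name, fetch()
        except Exception as exc:  # a sketch: real code would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def primary_down() -> str:
    # Simulates the primary provider being unavailable.
    raise ConnectionError("primary region unavailable")


def secondary_ok() -> str:
    # Simulates a healthy secondary provider.
    return "served from secondary"


if __name__ == "__main__":
    used, result = call_with_failover(
        [("cloud-a", primary_down), ("cloud-b", secondary_ok)]
    )
    print(used, result)
```

The ordering of the list is the whole policy here: the first healthy provider wins, and the caller only sees an error when every option has failed.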

Mark Boost, CEO of Civo, emphasizes the dangers of monoculture: 

"The outage highlights the over-reliance risk on a single system or provider. Even established giants aren't invincible."

2. The Code Flaws

The exact cause of the problematic CrowdStrike update remains under debate (kind of!). One theory points to a null pointer error, a common C++ bug in which code reads or writes through a pointer that was never set to a valid memory location. CrowdStrike denies this, while security researchers such as Tavis Ormandy (Google) and Patrick Wardle (Objective-See) suspect a logic error instead. Regardless of the specifics, the faulty code should never have reached production.

3. QA is Necessary! 

CrowdStrike's quality assurance (QA) team is under scrutiny for letting this update slip through the cracks. This raises the question of how such a critical security patch bypassed client-side controls and rolled out to everyone. Konstantin Klyagin of Redwerk and QAwerk highlights the importance of automated testing, especially for large-scale updates, where manual testing might miss crucial issues.
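As a rough illustration of what an automated check in front of a release could look like, here is a small Python sketch of a validation gate. It does not describe CrowdStrike's actual pipeline or file format: the update structure (a `version` string plus a list of `rules`), the `EXPECTED_FIELDS` constant, and the function names are all hypothetical.

```python
from __future__ import annotations

# Hypothetical: every rule in an update must carry exactly this many fields.
EXPECTED_FIELDS = 3


def validate_update(update: dict) -> list[str]:
    """Return a list of problems found in a (hypothetical) update payload."""
    problems = []
    version = update.get("version")
    if not isinstance(version, str) or not version:
        problems.append("missing or empty version")
    rules = update.get("rules")
    if not isinstance(rules, list) or not rules:
        problems.append("rules must be a non-empty list")
        return problems
    for index, rule in enumerate(rules):
        if not isinstance(rule, list) or len(rule) != EXPECTED_FIELDS:
            problems.append(f"rule {index}: expected {EXPECTED_FIELDS} fields")
    return problems


def release_gate(update: dict) -> bool:
    """Allow the release only when validation finds no problems."""
    return not validate_update(update)


if __name__ == "__main__":
    good = {"version": "1.0.1", "rules": [["proc", "match", "block"]]}
    bad = {"version": "1.0.2", "rules": [["proc", "match"]]}  # one field short
    print(release_gate(good), release_gate(bad))
```

A gate like this runs on every build without anyone having to remember it, which is exactly where manual testing tends to fall short at scale.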

4. Communication is Key in Chaos

The outage showcased the importance of clear and timely communication during a crisis. Businesses need to be prepared to inform stakeholders — employees, customers, and partners — about the situation, what's being done to fix it, and when normal operations are expected to resume. Regular updates, even if just to acknowledge ongoing investigations, help maintain trust and prevent confusion.

Both CrowdStrike and Microsoft demonstrated the importance of swift action in mitigating such crises. Their collaborative efforts to provide manual solutions and reroute traffic highlight the need for robust incident response plans.

5. Phased Rollouts Prevent Crisis

The update reached systems across many organizations at essentially the same time, and that is another critical lesson. While staged rollouts might seem time-consuming, they are essential for mission-critical systems. Techniques like blue/green deployments, canary deployments, and A/B testing allow for controlled rollouts, so a bad change hits a small group first instead of everyone. Additionally, robust rollback procedures are crucial for reverting to a stable version if problems arise.
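The canary-plus-rollback idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a production deployment tool: `apply_update` and `rollback` are hypothetical callbacks you would supply, and the stage fractions and failure threshold are made-up defaults.

```python
from __future__ import annotations

from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], bool],
    rollback: Callable[[str], None],
    stages: Sequence[float] = (0.05, 0.5, 1.0),
    max_failure_rate: float = 0.05,
) -> tuple[bool, list[str]]:
    """Update hosts in growing batches; stop and roll back on a bad batch.

    Each entry in `stages` is the cumulative fraction of hosts that should
    have been updated once that stage finishes. Returns (success, touched).
    """
    touched: list[str] = []
    for fraction in stages:
        target = max(1, round(len(hosts) * fraction))
        batch = hosts[len(touched):target]
        failures = 0
        for host in batch:
            ok = apply_update(host)
            touched.append(host)
            if not ok:
                failures += 1
        if batch and failures / len(batch) > max_failure_rate:
            # Revert every host touched so far, newest first.
            for host in reversed(touched):
                rollback(host)
            return False, touched
    return True, touched


if __name__ == "__main__":
    fleet = [f"host-{i}" for i in range(20)]
    reverted: list[str] = []
    ok, touched = staged_rollout(fleet, lambda host: False, reverted.append)
    print(ok, touched, reverted)
```

With a broken update, only the first small batch is ever touched, which is the whole point of staging: the blast radius is one canary group rather than the entire fleet.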

6. Test, Refine, Repeat

The importance of disaster recovery plans and reliable backups cannot be overstated. As cyber threats and technological complexities evolve, disaster recovery plans need to adapt. Turns out, businesses need ‘fire drills’ too, but for tech emergencies! Regularly practicing these "dry runs" by testing backups and recovery procedures ensures everything works smoothly when things go south. After all, a tech meltdown shouldn't turn into a full-blown crisis!
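One simple form of such a drill is to restore a backup into a scratch location and confirm it matches the original. The Python sketch below assumes a single-file backup and uses a plain file copy as a stand-in for your real restore procedure; the `restore_drill` name and the file names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_drill(source: Path, backup: Path) -> bool:
    """Restore `backup` into a scratch directory and compare it to `source`.

    The copy below is only a placeholder for a real restore step.
    """
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / backup.name
        shutil.copy2(backup, restored)
        return sha256_of(restored) == sha256_of(source)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        source = Path(workdir) / "data.db"
        backup = Path(workdir) / "data.db.bak"
        source.write_bytes(b"customer records")
        shutil.copy2(source, backup)
        print("drill passed:", restore_drill(source, backup))
```

Run on a schedule, a check like this turns "we have backups" into "we have backups that actually restore".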

There are many instances of organizations struggling because they lacked rapid backup solutions, such as the Hollywood Presbyterian Medical Center ransomware attack in 2016 and the 2019 cyberattack on the city government of Riviera Beach, Florida. Cloud backups, while convenient, introduce complexity. Traditional disaster recovery methods and backups would have proven invaluable in this situation.

7. Monitoring and Response

The global reach of the outage emphasizes the need for advanced monitoring tools and well-defined incident response plans. Real-time monitoring can detect issues early on, while proper incident response plans ensure fast identification, isolation, and resolution. Continuous monitoring, root-cause analysis, and post-incident reviews are all crucial for building resilience.
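Early detection can be as simple as watching the failure rate over a sliding window of recent health checks. The Python sketch below is a toy model: the `CrashRateMonitor` class, its window size, and its thresholds are hypothetical choices, and a real monitoring stack would feed it from actual device heartbeats and route alerts to an on-call system.

```python
from collections import deque


class CrashRateMonitor:
    """Track recent health checks and flag when too many have failed."""

    def __init__(self, window: int = 100, threshold: float = 0.2,
                 min_samples: int = 10) -> None:
        self.events = deque(maxlen=window)  # keeps only the newest `window` results
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, healthy: bool) -> bool:
        """Record one health check; return True if an alert should fire."""
        self.events.append(healthy)
        if len(self.events) < self.min_samples:
            return False  # not enough data yet to judge
        failure_rate = self.events.count(False) / len(self.events)
        return failure_rate >= self.threshold


if __name__ == "__main__":
    monitor = CrashRateMonitor(window=10, threshold=0.3, min_samples=5)
    for healthy in [True] * 5 + [False] * 3:
        if monitor.record(healthy):
            print("alert: failure rate above threshold")
```

Paired with a staged rollout, a signal like this is what tells you to stop pushing an update before it reaches everyone.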


The CrowdStrike incident is a wake-up call: even a routine update can be hugely disruptive if it is not managed and assessed properly. It highlights the interconnectedness of modern IT systems and the cascading effects of failures in widely used software. By implementing robust risk management strategies and learning from past events, IT teams can be better prepared to weather the next storm.
