Lessons to Take Away from the CrowdStrike Crisis

Arbisoft

Imagine . Build . Test . Repeat

Published Jul 23, 2024

Thank you for clicking on our newsletter, Arbisoft Next. Before we dive into the topic, if you haven't already subscribed, please do so to stay updated on the latest tech and Arbisoft news.

If you're interested in partnering with us, contact us here. Our team of over 900 members across five global offices specializes in Artificial Intelligence, Traveltech, and Edtech. Our partner platforms serve millions of users daily.

We’re always excited to connect with people who are changing the world. Get in touch!

From handwritten boarding passes to cash-only transactions at grocery stores, last week the world got a firsthand education in just how vulnerable our IT infrastructure is.

The recent global IT outage, caused by a faulty software update from CrowdStrike, is a grim reminder of the risks of over-relying on a single system and of the importance of robust safeguards. On the morning of July 19, a security update pushed to CrowdStrike's Falcon Sensor caused the third-party software to crash, taking an estimated 8.5 million Windows devices down with it. Because over half of Fortune 500 companies, as well as U.S. government agencies, use CrowdStrike as their primary cybersecurity provider, the outage disrupted everything from banking and finance to travel, healthcare, and commerce worldwide.

But what should we learn from this incident? Let’s take a look at 7 key takeaways.

1. Not All Clouds Are Created Equal

Cloud reliance is the new normal, but it's crucial to understand the risk that comes with it. Relying on a single cloud provider creates a monoculture: it's like putting all your eggs in one basket. If that provider has a hiccup, like the one we saw recently, it can cripple the entire network! Businesses should consider a multi-cloud strategy. That way, if one cloud suffers a technical malfunction, the rest can keep things running smoothly, which matters most when a single provider's reach is gigantic.

Mark Boost, CEO of Civo, emphasizes the dangers of monoculture:

"The outage highlights the over-reliance risk on a single system or provider. Even established giants aren't invincible."

2. The Code Flaws

The exact cause of the problematic CrowdStrike update remains under debate (kind of!). One theory points to a null pointer error, a common C++ bug in which the program dereferences a pointer that does not refer to a valid memory location. CrowdStrike denies this, and security researchers such as Tavis Ormandy (Google) and Patrick Wardle (Objective-See) suspect a logic error instead. Regardless of the specifics, the faulty code should never have reached production.

3. QA is Necessary!

CrowdStrike's quality assurance (QA) team is under scrutiny for letting this update slip through the cracks. This raises the question of how such a critical security patch bypassed client-side controls and rolled out to everyone. Konstantin Klyagin of Redwerk and QAwerk highlights the importance of automated testing, especially for large-scale updates, where manual testing might miss crucial issues.

4. Communication is Key in Chaos

The outage showcased the importance of clear and timely communication during a crisis. Businesses need to be prepared to inform stakeholders — employees, customers, and partners — about the situation, what's being done to fix it, and when normal operations are expected to resume. Regular updates, even if just to acknowledge ongoing investigations, help maintain trust and prevent confusion.

Both CrowdStrike and Microsoft demonstrated the importance of swift action in mitigating such crises. Their collaborative efforts to provide manual solutions and reroute traffic highlight the need for robust incident response plans.

5. Phased Rollouts Prevent Crisis

Another critical lesson is that the update rolled out to all systems simultaneously across many organizations. Staged rollouts might seem time-consuming, but they are essential for mission-critical systems. Techniques like blue/green deployments, canary deployments, and A/B testing allow for controlled rollouts that minimize risk. Robust rollback procedures are equally crucial for reverting to a stable version if problems arise.
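To make the idea concrete, here is a minimal Python sketch of a canary-style staged rollout with automatic rollback. It is an illustration under stated assumptions, not a description of how CrowdStrike or any particular vendor ships updates: the `deploy`, `is_healthy`, and `rollback` callables, the stage fractions, and the 2% failure threshold are all hypothetical placeholders you would replace with your own tooling and tolerances.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    stages: Sequence[float] = (0.01, 0.10, 0.50, 1.0),  # assumed fractions
    max_failure_rate: float = 0.02,  # assumed tolerance
) -> bool:
    """Deploy to growing fractions of the fleet; revert if a stage looks bad.

    Returns True if every stage passed, False if the rollout was halted.
    """
    updated = []
    for fraction in stages:
        # Total number of hosts that should be updated once this stage is done.
        target = max(1, int(len(hosts) * fraction))
        for host in hosts[len(updated):target]:
            deploy(host)
            updated.append(host)

        # Health-check everything updated so far before widening the rollout.
        failures = sum(1 for host in updated if not is_healthy(host))
        if updated and failures / len(updated) > max_failure_rate:
            # Halt and revert every host touched so far, newest first.
            for host in reversed(updated):
                rollback(host)
            return False
    return True
```

With 100 hosts and the default stages, a bad update would reach only the first host before the health check fails and the rollout stops, instead of reaching the whole fleet at once.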

6. Test, Refine, Repeat

The importance of disaster recovery plans and reliable backups cannot be overstated. As cyber threats and technological complexities evolve, disaster recovery plans need to adapt. Turns out, businesses need ‘fire drills’ too, but for tech emergencies! Regularly practicing these "dry runs" by testing backups and recovery procedures ensures everything works smoothly when things go south. After all, a tech meltdown shouldn't turn into a full-blown crisis!
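One way to picture such a drill is the small Python sketch below. It restores a backup archive into a scratch directory and compares SHA-256 checksums against the source files. The function names and the approach are illustrative assumptions: it presumes the source directory has not changed since the backup was taken, and that the archive is in a format `shutil` can unpack (such as zip or tar). A production drill would more likely compare against a checksum manifest recorded at backup time.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_drill(source_dir: Path, backup_archive: Path) -> list:
    """Restore the archive to a scratch directory and list any mismatches.

    An empty list means every source file was restored byte-for-byte.
    """
    problems = []
    with tempfile.TemporaryDirectory() as scratch:
        shutil.unpack_archive(str(backup_archive), scratch)
        for original in sorted(source_dir.rglob("*")):
            if not original.is_file():
                continue
            relative = original.relative_to(source_dir)
            restored = Path(scratch) / relative
            if not restored.is_file():
                problems.append(f"missing: {relative}")
            elif sha256_of(restored) != sha256_of(original):
                problems.append(f"changed: {relative}")
    return problems
```

Running a check like this on a schedule turns "we have backups" into "we have backups that restore", which is the point of the drill.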

Many organizations have struggled because they lacked rapid backup solutions, such as Hollywood Presbyterian Medical Center during its 2016 ransomware attack and the city government of Riviera Beach, Florida, during its 2019 cyberattack. Cloud backups, while convenient, introduce complexity. Traditional disaster recovery methods and backups would have proven invaluable in this situation.

7. Monitoring and Response

The global reach of the outage emphasizes the need for advanced monitoring tools and well-defined incident response plans. Real-time monitoring can detect issues early on, while proper incident response plans ensure fast identification, isolation, and resolution. Continuous monitoring, root-cause analysis, and post-incident reviews are all crucial for building resilience.
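As a rough illustration of early detection, the Python sketch below tracks the most recent health-check results in a sliding window and signals an alert when the failure rate crosses a threshold. The class name, window size, and 5% threshold are assumptions chosen for the example. Real monitoring stacks offer far richer alerting rules, but the underlying idea is similar.

```python
from collections import deque


class ErrorRateMonitor:
    """Signal an alert when recent failures exceed a threshold (a sketch)."""

    def __init__(self, window_size: int = 100, threshold: float = 0.05) -> None:
        # deque with maxlen drops the oldest result once the window is full.
        self.results = deque(maxlen=window_size)
        self.threshold = threshold

    def record(self, succeeded: bool) -> bool:
        """Record one check result; return True when an alert should fire."""
        self.results.append(succeeded)
        # Wait for a full window so a single early failure does not alert.
        window_full = len(self.results) == self.results.maxlen
        failure_rate = self.results.count(False) / len(self.results)
        return window_full and failure_rate > self.threshold
```

Feeding each health-check result into `record` and paging someone when it returns `True` gives the incident response plan a clear trigger.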

The CrowdStrike incident is a wake-up call: even routine maintenance can be disruptive if it is not managed and assessed properly. It highlights the interconnectedness of modern IT systems and the cascading effects of failures in widely used software. By implementing robust risk management strategies and learning from past events, IT teams can be better prepared to weather the next storm.


