A review of major ICT failures and recovery strategies: Strengthening digital resilience

  • Amr Adel
  • Noor H.S. Alani
  • Tony Jan
  • Mukesh Prasad

Research output: Contribution to journal, Review article, peer-review

Abstract

This paper presents a comprehensive, cross-sector analysis of large-scale ICT failures to address the persistent gap in understanding how systemic digital breakdowns occur and propagate across platforms and industries. Through a comparative study of seven major global outages (2019–2024) — selected based on scale, technical transparency, and platform diversity — we identify recurring vulnerabilities in automation governance, configuration management, centralized infrastructure, and incident response. Using a custom analytical framework grounded in socio-technical and resilience engineering theory, the paper maps failure propagation patterns and derives a taxonomy of technical and organizational failure modes. We empirically validate a suite of resilience strategies — including rollback automation, configuration-as-code, SOAR-enabled response orchestration, and chaos engineering — and demonstrate how they address failure propagation pathways observed in real-world incidents. A conceptual model for decentralized system upgrade planning is introduced, incorporating microservice segmentation, dependency mapping, and AI-assisted fault containment. The paper culminates in a forward-looking digital resilience roadmap that integrates predictive analytics, secure software supply chains, and adaptive human–machine collaboration. Core contributions include: (1) a cross-case classification of failure archetypes, (2) evidence-based design patterns for resilience, and (3) actionable frameworks for infrastructure operators and researchers working towards next-generation ICT robustness.

Original language: English
Article number: 104678
Journal: Computers and Security
Volume: 159
Publication status: Published - Dec 2025

Keywords

  • AI-driven recovery
  • Automation failures
  • Comparative review
  • Cybersecurity infrastructure
  • Digital resilience
  • ICT outages
  • Incident response
