A comprehensive survey of machine learning and deep learning approaches for anomaly detection in high-performance computing systems

  • Cibin Ki
  • Ramah Sivakumar
  • Jaison Mulerikkal
  • A. Binu
  • Manish Gupta
  • Tony Jan

Research output: Contribution to journal, Article, peer-review

1 Citation (Scopus)

Abstract

Anomaly detection is crucial in high-performance computing (HPC) systems for maintaining effective, efficient, and secure operations. This survey reviews the current state of machine learning and deep learning as applied to HPC systems for detecting performance, operational, and security anomalies. It examines current approaches built on a range of machine learning and deep learning techniques, the significance and challenges of anomaly detection in HPC systems, and the factors that should be considered when assessing the performance of these systems, according to the research surveyed. It also explores tools and frameworks built with these techniques and tailored specifically to HPC systems. Finally, it identifies the limitations of existing models and, based on them, suggests directions for further research. The findings should be helpful to researchers and professionals specializing in anomaly detection within HPC systems.

Original language: English
Article number: 1032
Journal: Journal of Supercomputing
Volume: 81
Issue number: 8
Publication status: Published - Jun 2025

Keywords

  • Anomaly detection
  • Deep learning
  • High-performance computing
  • HPC systems
  • Machine learning
