Abstract
Anomaly detection is crucial in high-performance computing (HPC) systems for maintaining effective, efficient, and secure operations. This survey reviews the current state of machine learning and deep learning as applied to detecting anomalies in HPC systems, including performance, operational, and security anomalies. It examines existing approaches built on a range of machine learning and deep learning techniques, the significance and challenges of anomaly detection in HPC systems, and the factors that, according to the research reviewed, should be considered when assessing system performance. It also explores tools and frameworks built with these techniques and tailored specifically to HPC systems. Finally, it identifies limitations of existing models and, based on them, suggests directions for further research. The findings are intended to help researchers and practitioners who specialize in anomaly detection in HPC systems.
| Original language | English |
|---|---|
| Article number | 1032 |
| Journal | Journal of Supercomputing |
| Volume | 81 |
| Issue number | 8 |
| Publication status | Published - Jun 2025 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 9 Industry, Innovation, and Infrastructure
Keywords
- Anomaly detection
- Deep learning
- High-performance computing
- HPC systems
- Machine learning
Fingerprint
Dive into the research topics of 'A comprehensive survey of machine learning and deep learning approaches for anomaly detection in high-performance computing systems'. Together they form a unique fingerprint.