Abstract
This study integrates Data Envelopment Analysis (DEA) with Machine Learning (ML) to address key limitations of traditional DEA in identifying reference sets for inefficient Decision-Making Units (DMUs). In DEA, inefficient units are evaluated against benchmark units; however, some benchmarks may be inappropriate or even outliers, which can distort the efficiency frontier. Moreover, when a new DMU is added, the entire model must be recalculated, resulting in high computational costs for large datasets. To overcome these issues, we propose a hybrid approach that combines Fuzzy C-Means (FCM) and Possibilistic Fuzzy C-Means (PFCM) clustering. By leveraging Euclidean distance and membership degrees, the method identifies closer and more relevant reference units, while a sensitivity threshold is introduced to control the number of benchmarks according to practical requirements. The effectiveness of the proposed method is validated on two datasets: a banking dataset and a banknote authentication dataset with 1,372 samples. Results show that the reference sets derived from this ML-based framework achieve 71.6%–98.3% agreement with DEA, while overcoming two major drawbacks: (1) sensitivity to dataset size and (2) inclusion of inappropriate reference units. Furthermore, statistical analyses, including confidence intervals and McNemar’s test, confirm the robustness and practical significance of the findings. © 2025 The Author(s).
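The abstract does not spell out the selection procedure, so the following is only a minimal sketch of the general idea it describes: Euclidean distances are turned into fuzzy membership degrees, and a sensitivity threshold decides how many benchmarks are retained. It uses the standard FCM membership formula and, as a simplifying assumption, treats candidate benchmark units directly as cluster centers. The names `fcm_memberships`, `reference_set`, and `threshold`, as well as the data, are hypothetical and are not taken from the paper.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Fuzzy C-Means membership degrees of `point` in each center (m > 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point that coincides with one or more centers belongs entirely to them.
    hits = [d == 0.0 for d in dists]
    if any(hits):
        return [1.0 / sum(hits) if h else 0.0 for h in hits]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)); the degrees sum to 1.
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]


def reference_set(point, candidates, threshold=0.1, m=2.0):
    """Names of candidates with membership >= threshold, strongest first."""
    names = list(candidates)
    degrees = fcm_memberships(point, [candidates[n] for n in names], m)
    ranked = sorted(zip(names, degrees), key=lambda pair: pair[1], reverse=True)
    return [name for name, u in ranked if u >= threshold]


# Hypothetical data: three candidate benchmark units and one inefficient unit.
benchmarks = {"A": (0.0, 0.0), "B": (3.0, 0.0), "C": (0.0, 10.0)}
dmu = (1.0, 0.0)

loose = reference_set(dmu, benchmarks, threshold=0.1)   # keeps more benchmarks
strict = reference_set(dmu, benchmarks, threshold=0.5)  # keeps fewer benchmarks
print(loose, strict)
```

Raising the threshold shrinks the reference set toward the nearest units, which is one way a sensitivity parameter could control the number of benchmarks; the published method combines FCM with PFCM and may differ in detail.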
| Original language | English |
|---|---|
| Article number | 100818 |
| Journal | Machine Learning with Applications |
| Volume | 23 |
| DOIs | |
| Publication status | Published - Mar 2026 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
- SDG 7 Affordable and Clean Energy
- SDG 9 Industry, Innovation, and Infrastructure
- SDG 11 Sustainable Cities and Communities
- SDG 12 Responsible Consumption and Production
- SDG 13 Climate Action
- SDG 17 Partnerships for the Goals
Keywords
- Data Envelopment Analysis (DEA)
- Efficiency Measurement
- Fuzzy C-Means (FCM)
- Machine Learning
- Possibilistic Fuzzy C-Means (PFCM)
- Reference Sets
- Clustering algorithms
- Decision making
- Efficiency
- Fuzzy clustering
- Identification (control systems)
- Large datasets
- Learning systems
- C-means
- Possibilistic
Fingerprint
Dive into the research topics of 'A hybrid DEA–fuzzy clustering approach for accurate reference set identification'. Together they form a unique fingerprint.