
A hybrid DEA–fuzzy clustering approach for accurate reference set identification

  • S. Fanati Rashidi
  • M. Olfati
  • S. Mirjalili
  • C. Grosan
  • J. Platoš
  • V. Snášel

Research output: Contribution to journal › Article › peer-review

Abstract

This study integrates Data Envelopment Analysis (DEA) with Machine Learning (ML) to address key limitations of traditional DEA in identifying reference sets for inefficient Decision-Making Units (DMUs). In DEA, inefficient units are evaluated against benchmark units; however, some benchmarks may be inappropriate or even outliers, which can distort the efficiency frontier. Moreover, when a new DMU is added, the entire model must be recalculated, resulting in high computational costs for large datasets. To overcome these issues, we propose a hybrid approach that combines Fuzzy C-Means (FCM) and Possibilistic Fuzzy C-Means (PFCM) clustering. By leveraging Euclidean distance and membership degrees, the method identifies closer and more relevant reference units, while a sensitivity threshold is introduced to control the number of benchmarks according to practical requirements. The effectiveness of the proposed method is validated on two datasets: a banking dataset and a banknote authentication dataset with 1,372 samples. Results show that the reference sets derived from this ML-based framework achieve 71.6%–98.3% agreement with DEA, while overcoming two major drawbacks: (1) sensitivity to dataset size and (2) inclusion of inappropriate reference units. Furthermore, statistical analyses, including confidence intervals and McNemar’s test, confirm the robustness and practical significance of the findings. © 2025 The Author(s).
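The abstract describes the pipeline only at a high level, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It runs standard Fuzzy C-Means on the DMUs' input/output vectors. Only plain FCM is shown; PFCM, which additionally estimates typicality values, is not. The efficient/inefficient labels are taken as given from a separate DEA run. The sketch then applies a hypothetical selection rule: the reference set of an inefficient DMU is every efficient DMU whose membership in that DMU's dominant cluster is at least `threshold`, ordered by Euclidean distance. The functions `fuzzy_c_means` and `reference_set`, the `threshold` parameter, and the toy data are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(data: List[Vector], c: int, m: float = 2.0,
                  tol: float = 1e-6, max_iter: int = 300, seed: int = 0
                  ) -> Tuple[List[List[float]], List[List[float]]]:
    """Standard (Bezdek) FCM. Returns (centers, u), where u[i][j] is the
    membership of point i in cluster j and every row of u sums to 1."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships, normalised row by row.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])
    centers: List[List[float]] = []
    for _ in range(max_iter):
        # Centre update: mean of all points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            wsum = max(sum(w), 1e-12)  # guard against an empty cluster
            centers.append([sum(w[i] * data[i][k] for i in range(n)) / wsum
                            for k in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        p = 2.0 / (m - 1.0)
        new_u = []
        for i in range(n):
            d = [euclidean(data[i], v) for v in centers]
            hits = [j for j in range(c) if d[j] == 0.0]
            if hits:
                # Point sits exactly on a centre: split membership among those centres.
                row = [1.0 / len(hits) if j in hits else 0.0 for j in range(c)]
            else:
                row = [1.0 / sum((d[j] / d[k]) ** p for k in range(c))
                       for j in range(c)]
            new_u.append(row)
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


def reference_set(target: int, data: List[Vector], efficient: List[int],
                  u: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Hypothetical rule: efficient DMUs whose membership in the target's
    dominant cluster is >= threshold, nearest (Euclidean) first."""
    cluster = max(range(len(u[target])), key=lambda j: u[target][j])
    candidates = [i for i in efficient
                  if i != target and u[i][cluster] >= threshold]
    return sorted(candidates, key=lambda i: euclidean(data[i], data[target]))


# Toy data: two well-separated groups of DMUs (e.g. normalised input/output pairs).
dmus = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2],
        [5.0, 5.0], [5.2, 4.8], [4.9, 5.1], [5.1, 5.3]]
efficient = [0, 3, 4, 7]  # assumed to come from a separate DEA run
centers, u = fuzzy_c_means(dmus, c=2)
refs = reference_set(1, dmus, efficient, u, threshold=0.5)
print(refs)  # expected: [0, 3]
```

Raising `threshold` shrinks the candidate set, and lowering it admits more efficient DMUs. This is one plausible reading of the sensitivity threshold described in the abstract. Once the centres are fixed, a newly added DMU can be given memberships with the same update formula instead of re-solving a DEA model for every unit. The abstract does not state whether the authors handle new DMUs this way.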
Original language: English
Article number: 100818
Journal: Machine Learning with Applications
Volume: 23
Publication status: Published - Mar 2026

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs):

  1. SDG 3 - Good Health and Well-being
  2. SDG 7 - Affordable and Clean Energy
  3. SDG 9 - Industry, Innovation, and Infrastructure
  4. SDG 11 - Sustainable Cities and Communities
  5. SDG 12 - Responsible Consumption and Production
  6. SDG 13 - Climate Action
  7. SDG 17 - Partnerships for the Goals

Keywords

  • Data Envelopment Analysis (DEA)
  • Efficiency Measurement
  • Fuzzy C-Means (FCM)
  • Machine Learning
  • Possibilistic Fuzzy C-Means (PFCM)
  • Reference Sets
  • Clustering algorithms
  • Decision making
  • Efficiency
  • Fuzzy clustering
  • Identification (control systems)
  • Large datasets
  • Learning systems
