TY - JOUR
T1 - Late acceptance hill climbing aided chaotic harmony search for feature selection
T2 - An empirical analysis on medical data
AU - Naskar, Anurup
AU - Pramanik, Rishav
AU - Hossain, S. K. Sabbir
AU - Mirjalili, Seyedali
AU - Sarkar, Ram
N1 - Funding Information:
The authors would like to thank the Center for Microprocessor Applications for Training Education and Research (CMATER) research laboratory of the Department of Computer Science and Engineering, Jadavpur University, Kolkata, India for providing infrastructural support to this work.
Publisher Copyright:
© 2023
PY - 2023/7/1
Y1 - 2023/7/1
AB - In today's data-driven digital society, there is a huge demand for optimized solutions that reduce operating costs and thereby increase productivity. Processing large amounts of data, such as microarray-based gene expression data, with machine learning and data mining algorithms is limited by memory and time requirements. This is even more of a concern when a dataset contains redundant and unimportant information. For example, many report-based medical datasets have several non-informative attributes that can mislead classification algorithms. To address this, researchers have developed feature selection algorithms that try to discard redundant information from raw datasets before they are fed to machine learning algorithms. Metaheuristic-based optimization algorithms are an excellent option for solving feature selection problems. In this paper, we propose a wrapper feature selection method based on the music-inspired harmony search (HS) algorithm. First, we use a chaotic map to initialize the population of the HS algorithm in order to achieve better coverage of the search space. Then, to compensate for the weak exploitation of the HS algorithm, we integrate it with the Late Acceptance Hill Climbing (LAHC) method. The combination of these two algorithms thus provides a good balance between exploration and exploitation. We evaluate the proposed feature selection method on 15 UCI datasets, and the results are better than those of many state-of-the-art methods in terms of both classification accuracy and the number of selected features. To evaluate the effectiveness of our algorithm, we use precision, recall, F1 score, fitness value, and execution time as performance indicators. Together, these metrics provide a comprehensive assessment of the algorithm's strengths and limitations.
We also apply our method to 3 microarray-based gene expression datasets used for cancer prediction in order to verify its scalability and robustness as a feature selection method in real-life scenarios. In addition, we test our approach on a COVID-19 dataset, where it performs better than several metaheuristic-based optimization techniques.
KW - Algorithm
KW - COVID-19 data
KW - Feature selection
KW - Harmony search
KW - Late Acceptance Hill Climbing
KW - Metaheuristics
KW - Microarray data
KW - Optimization
UR - http://www.scopus.com/inward/record.url?scp=85149010090&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2023.119745
DO - 10.1016/j.eswa.2023.119745
M3 - Article
AN - SCOPUS:85149010090
SN - 0957-4174
VL - 221
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 119745
ER -