TY - JOUR
T1 - Improved intelligent water drop-based hybrid feature selection method for microarray data processing
AU - Alhenawi, Esra'a
AU - Al-Sayyed, Rizik
AU - Hudaib, Amjad
AU - Mirjalili, Seyedali
N1 - Funding Information:
We would like to thank everyone who provided technical help and assisted in reviewing and editing the language of the manuscript.
Publisher Copyright:
© 2023 Elsevier Ltd
PY - 2023/4
Y1 - 2023/4
N2 - Classifying microarray datasets, which usually contain many noise genes that degrade classifier performance and lower classification accuracy, is a competitive research topic. Feature selection (FS) is one of the most practical ways to find the optimal subset of genes that increases classification accuracy for diagnostic and prognostic prediction of cancer from microarray datasets. More efficient FS methods are therefore always needed, ones that select only an optimal or close-to-optimal subset of features to improve classification performance. In this paper, we propose a hybrid FS method for microarray data processing that combines an ensemble filter with an Improved Intelligent Water Drop (IIWD) algorithm as a wrapper. IIWD modifies the original IWD algorithm in two ways: it runs one of three local search (LS) algorithms, Tabu Search (TS), a Novel LS Algorithm (NLSA), or Hill Climbing (HC), in each IWD iteration, and it uses a correlation coefficient filter as the heuristic undesirability (HUD) for next-node selection. The effect of adding each LS algorithm is evaluated by comparing the proposed ensemble filter and IIWD-based wrapper without any LS algorithm (PHFS-IWD) against the same method with TS, NLSA, or HC added (PHFS-IWDTS, PHFS-IWDNLSA, and PHFS-IWDHC, respectively). A Naïve Bayes (NB) classifier and five microarray datasets are used to evaluate and compare the proposed hybrid FS methods. Results show that using an LS algorithm in each IWD iteration improves the F-score by 5% on average compared with PHFS-IWD. PHFS-IWDNLSA improves the F-score by an average of 4.15% over PHFS-IWDTS and 5.67% over PHFS-IWDHC, while PHFS-IWDTS outperforms PHFS-IWDHC by an average of 1.6%. Compared with six of the most recent state-of-the-art FS methods, the proposed hybrid FS methods improve accuracy by an average of 8.92% on three of the five datasets and reduce the number of genes by 58.5% on all five datasets.
AB - Classifying microarray datasets, which usually contain many noise genes that degrade classifier performance and lower classification accuracy, is a competitive research topic. Feature selection (FS) is one of the most practical ways to find the optimal subset of genes that increases classification accuracy for diagnostic and prognostic prediction of cancer from microarray datasets. More efficient FS methods are therefore always needed, ones that select only an optimal or close-to-optimal subset of features to improve classification performance. In this paper, we propose a hybrid FS method for microarray data processing that combines an ensemble filter with an Improved Intelligent Water Drop (IIWD) algorithm as a wrapper. IIWD modifies the original IWD algorithm in two ways: it runs one of three local search (LS) algorithms, Tabu Search (TS), a Novel LS Algorithm (NLSA), or Hill Climbing (HC), in each IWD iteration, and it uses a correlation coefficient filter as the heuristic undesirability (HUD) for next-node selection. The effect of adding each LS algorithm is evaluated by comparing the proposed ensemble filter and IIWD-based wrapper without any LS algorithm (PHFS-IWD) against the same method with TS, NLSA, or HC added (PHFS-IWDTS, PHFS-IWDNLSA, and PHFS-IWDHC, respectively). A Naïve Bayes (NB) classifier and five microarray datasets are used to evaluate and compare the proposed hybrid FS methods. Results show that using an LS algorithm in each IWD iteration improves the F-score by 5% on average compared with PHFS-IWD. PHFS-IWDNLSA improves the F-score by an average of 4.15% over PHFS-IWDTS and 5.67% over PHFS-IWDHC, while PHFS-IWDTS outperforms PHFS-IWDHC by an average of 1.6%. Compared with six of the most recent state-of-the-art FS methods, the proposed hybrid FS methods improve accuracy by an average of 8.92% on three of the five datasets and reduce the number of genes by 58.5% on all five datasets.
KW - High dimensional datasets
KW - Hybrid feature selection
KW - Intelligent water drop algorithm
KW - Machine learning
KW - Medical applications
UR - http://www.scopus.com/inward/record.url?scp=85146563131&partnerID=8YFLogxK
U2 - 10.1016/j.compbiolchem.2022.107809
DO - 10.1016/j.compbiolchem.2022.107809
M3 - Article
AN - SCOPUS:85146563131
SN - 1476-9271
VL - 103
JO - Computational Biology and Chemistry
JF - Computational Biology and Chemistry
M1 - 107809
ER -