Classifying microarray datasets, which usually contains many noise genes that degrade the performance of classifiers and decrease classification accuracy rate, is a competitive research topic. Feature selection (FS) is one of the most practical ways for finding the most optimal subset of genes that increases classification's accuracy for diagnostic and prognostic prediction of tumor cancer from the microarray datasets. This means that we always need to develop more efficient FS methods, that select only optimal or close-to-optimal subset of features to improve classification performance. In this paper, we propose a hybrid FS method for microarray data processing, that combines an ensemble filter with an Improved Intelligent Water Drop (IIWD) algorithm as a wrapper by adding one of three local search (LS) algorithms: Tabu search (TS), Novel LS algorithm (NLSA), or Hill Climbing (HC) in each iteration from IWD, and using a correlation coefficient filter as a heuristic undesirability (HUD) for next node selection in the original IWD algorithm. The effects of adding three different LS algorithms to the proposed IIWD algorithm have been evaluated through comparing the performance of the proposed ensemble filter-IIWD-based wrapper without adding any LS algorithms named (PHFS-IWD) FS method versus its performance when adding a specific LS algorithm from (TS, NLSA or HC) in FS methods named, (PHFS-IWDTS, PHFS-IWDNLSA, and PHFS-IWDHC), respectively. Naïve Bayes(NB) classifier with five microarray datasets have been deployed for evaluating and comparing the proposed hybrid FS methods. Results show that using LS algorithms in each iteration from the IWD algorithm improves F-score value with an average equal to 5% compared with PHFS-IWD. Also, PHFS-IWDNLSA improves the F-score value with an average of 4.15% over PHFS-IWDTS, and 5.67% over PHFS-IWDHC while PHFS-IWDTS outperformed PHFS-IWDHC with an average of increment equal to 1.6%. On the other hand, the proposed hybrid-based FS methods improve accuracy with an average equal to 8.92% in three out of five datasets and decrease the number of genes with a percentage of 58.5% in all five datasets compared with six of the most recent state-of-the-art FS methods.
|Journal||Computational Biology and Chemistry|
|Publication status||Published - Apr 2023|
- High dimensional datasets
- Hybrid feature selection
- Intelligent water drop algorithm
- Machine learning
- Medical applications