In today's data-driven digital society, there is a huge demand for optimized solutions that reduce operating costs and thereby increase productivity. Processing large amounts of data, such as microarray-based gene expression data, with machine learning and data mining algorithms is limited by memory and time requirements. The problem becomes more serious when a dataset contains redundant or irrelevant information. For example, many report-based medical datasets have several non-informative attributes that mislead classification algorithms. To address this, researchers have developed several feature selection algorithms that try to discard redundant information from raw datasets before they are fed to machine learning algorithms. Metaheuristic optimization algorithms are an excellent option for solving feature selection problems. In this paper, we propose a wrapper feature selection method based on the music-inspired harmony search (HS) algorithm. First, we use a chaotic map to initialize the HS population so that it covers the search space better. Second, to compensate for the weak exploitation ability of HS, we integrate it with the Late Acceptance Hill Climbing (LAHC) method. The combination of these two algorithms provides a good balance between exploration and exploitation. We evaluate the proposed feature selection method on 15 UCI datasets, and the results are better than those of many state-of-the-art methods in terms of both classification accuracy and the number of selected features. To evaluate the effectiveness of our algorithm more fully, we also use precision, recall, F1 score, fitness value, and execution time as performance indicators, which together give a comprehensive assessment of the algorithm's strengths and limitations.
We also apply our method to 3 microarray-based gene expression datasets used for cancer prediction to assess its scalability and robustness as a feature selection method in real-life scenarios. In addition, we test our approach on a COVID-19 dataset, where it performs better than several metaheuristic optimization techniques.
- COVID-19 data
- Feature selection
- Harmony search
- Late Acceptance Hill Climbing
- Microarray data
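To make the overall pipeline described above concrete (chaotic initialization, harmony search, and LAHC refinement for wrapper feature selection), here is a minimal, illustrative sketch. It is not the authors' implementation, and several pieces are assumptions: the logistic map `x <- 4x(1-x)` stands in for the chaotic map, bits are obtained by thresholding at 0.5, pitch adjustment is modeled as a bit flip (a common binary HS variant), LAHC is applied to every newly improvised harmony, and the weighted fitness (`alpha = 0.99`) and `toy_error` function are hypothetical stand-ins for a real classifier's validation error. All parameter values (`hmcr`, `par`, `history_len`, and so on) are illustrative only.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]


def chaotic_population(pop_size: int, n_features: int, x0: float = 0.7, r: float = 4.0) -> List[Mask]:
    """Binary population from a logistic map x <- r*x*(1-x) (assumed chaotic map)."""
    x, population = x0, []
    for _ in range(pop_size):
        mask = []
        for _ in range(n_features):
            x = r * x * (1.0 - x)
            mask.append(1 if x > 0.5 else 0)  # threshold the chaotic value to a bit
        population.append(mask)
    return population


def wrapper_fitness(mask: Mask, error_rate: Callable[[Mask], float], alpha: float = 0.99) -> float:
    """Hypothetical fitness (lower is better): weighted error plus selected-feature ratio."""
    if sum(mask) == 0:
        return float("inf")  # an empty subset cannot train a classifier
    return alpha * error_rate(mask) + (1 - alpha) * sum(mask) / len(mask)


def lahc(mask: Mask, fit: Callable[[Mask], float], rng: random.Random,
         steps: int = 50, history_len: int = 5) -> Tuple[Mask, float]:
    """Late Acceptance Hill Climbing with single-bit-flip moves; returns the best mask seen."""
    current, current_fit = mask[:], fit(mask)
    best, best_fit = current[:], current_fit
    history = [current_fit] * history_len
    for step in range(steps):
        candidate = current[:]
        candidate[rng.randrange(len(candidate))] ^= 1  # add or drop one feature
        cand_fit = fit(candidate)
        v = step % history_len
        # Accept if no worse than the cost history_len steps ago, or than the current cost.
        if cand_fit <= history[v] or cand_fit <= current_fit:
            current, current_fit = candidate, cand_fit
            if current_fit < best_fit:
                best, best_fit = current[:], current_fit
        history[v] = current_fit
    return best, best_fit


def hs_lahc(n_features: int, fit: Callable[[Mask], float], pop_size: int = 10, iters: int = 200,
            hmcr: float = 0.9, par: float = 0.1, seed: int = 1) -> Tuple[Mask, float]:
    """Binary harmony search with chaotic initialization and LAHC refinement (sketch)."""
    rng = random.Random(seed)
    memory = chaotic_population(pop_size, n_features)
    scores = [fit(m) for m in memory]
    for _ in range(iters):
        new = []
        for j in range(n_features):
            if rng.random() < hmcr:  # memory consideration
                bit = memory[rng.randrange(pop_size)][j]
                if rng.random() < par:  # pitch adjustment, modeled as a bit flip
                    bit ^= 1
            else:  # random selection
                bit = rng.randint(0, 1)
            new.append(bit)
        new, new_fit = lahc(new, fit, rng)  # assumed integration point for LAHC
        worst = max(range(pop_size), key=scores.__getitem__)
        if new_fit < scores[worst]:  # replace the worst harmony if improved
            memory[worst], scores[worst] = new, new_fit
    best = min(range(pop_size), key=scores.__getitem__)
    return memory[best], scores[best]


INFORMATIVE = {0, 1, 2}  # hypothetical ground truth for the toy problem


def toy_error(mask: Mask) -> float:
    """Stand-in for a classifier's validation error: 0.1 per missed informative feature."""
    return 0.1 * sum(1 for i in INFORMATIVE if mask[i] == 0)


def fitness(mask: Mask) -> float:
    return wrapper_fitness(mask, toy_error)


best_mask, best_fit = hs_lahc(10, fitness)
print(best_mask, round(best_fit, 4))
```

In a real wrapper setting, `toy_error` would be replaced by the cross-validated error of a classifier trained only on the columns where the mask is 1, which is where most of the run time goes; the LAHC history length controls how far back the acceptance criterion looks and hence how readily it accepts temporarily worse subsets.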