Mining Significant Features of Diabetes through Employing Various Classification Methods

Nurjahan, Mohammad Abu Tareq Rony, Md Shahriare Satu, Md Whaiduzzaman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Citations (Scopus)

Abstract

Diabetes is a chronic disease that occurs when blood glucose becomes very high. It is responsible for a number of serious complications in an affected patient's body. However, early detection of this disease can reduce critical outcomes such as death and minimize the chance of losing valuable organs. The aim of this study is to construct a predictive model by examining several machine learning techniques, namely Decision Tree, K-Nearest Neighbour, Naive Bayes, Support Vector Machine, Logistic Regression, eXtreme Gradient Boosting, Multi-Layer Perceptron and Random Forest, on two datasets of diabetes patients: the Pima Indian diabetes dataset and the Sylhet Diabetes Hospital dataset. Several popular and effective feature subset selection procedures were also used to eliminate unnecessary attributes. The results show that, on the Sylhet hospital dataset, Random Forest delivers the highest accuracy (97.5%), F-measure (97.5%) and Area under the Receiver Operating Characteristic Curve (99.80%) with the Gain Ratio Attribute Evaluation feature subset selection technique. On the Pima Indian dataset, Logistic Regression delivers the highest accuracy (77.7%) and F-measure (77%) with Information Gain Attribute Evaluation, and the highest Area under the Receiver Operating Characteristic Curve (83%) with both Correlation-based Feature Selection Subset Evaluation and Correlation Attribute Evaluation. Performance was measured using 10-fold cross-validation.
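The abstract names Information Gain and Gain Ratio attribute evaluation among the feature selection techniques. As a rough illustration only, the sketch below ranks categorical attributes by gain ratio (information gain divided by the entropy of the attribute itself). It is not the authors' implementation; the function names, the attribute names and the toy records are all hypothetical and are not drawn from either dataset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in class entropy after splitting on a categorical attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(values, labels):
    """Information gain normalised by the entropy of the attribute itself."""
    split_info = entropy(values)
    if split_info == 0:  # attribute takes a single value, so it cannot split
        return 0.0
    return information_gain(values, labels) / split_info


# Hypothetical toy records, invented for illustration only.
toy = {
    "polyuria": ["yes", "yes", "no", "no", "yes", "no", "yes", "no"],
    "gender": ["m", "f", "m", "f", "m", "f", "f", "m"],
}
diabetic = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]

# Rank attributes from most to least informative under gain ratio.
ranking = sorted(toy, key=lambda name: gain_ratio(toy[name], diabetic), reverse=True)
print(ranking)
```

In this toy data the first attribute separates the classes perfectly and the second carries no class information, so a selector that keeps only the top-ranked attributes would drop the second one before any classifier is trained.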

Original language: English
Title of host publication: 2021 International Conference on Information and Communication Technology for Sustainable Development, ICICT4SD 2021 - Proceedings
Editors: Brig Gen Golam Faruque, Kazi Abu Taher, M. Shamim Kaiser, Mohammed Nasir Uddin
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 240-244
Number of pages: 5
ISBN (Electronic): 9781665414609
Publication status: Published - 27 Feb 2021
Event: 2021 International Conference on Information and Communication Technology for Sustainable Development, ICICT4SD 2021 - Dhaka, Bangladesh
Duration: 27 Feb 2021 to 28 Feb 2021

Publication series

Name: 2021 International Conference on Information and Communication Technology for Sustainable Development, ICICT4SD 2021 - Proceedings

Conference

Conference: 2021 International Conference on Information and Communication Technology for Sustainable Development, ICICT4SD 2021
Country/Territory: Bangladesh
City: Dhaka
Period: 27/02/21 to 28/02/21

Keywords

  • Classifier
  • Cross Validation
  • Diabetes
  • Feature Selection Technique
  • Machine Learning
