TY - GEN
T1 - Mining Significant Features of Diabetes through Employing Various Classification Methods
AU - Nurjahan,
AU - Rony, Mohammad Abu Tareq
AU - Satu, Md Shahriare
AU - Whaiduzzaman, Md
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/2/27
Y1 - 2021/2/27
N2 - Diabetes is a chronic disease that occurs when blood glucose levels become very high. It is responsible for a number of serious complications in an affected patient's body. However, early detection of the disease can reduce critical outcomes such as death and minimize the chance of losing valuable organs. The aim of this study is to construct a predictive model by examining several machine learning techniques, namely Decision Tree, K-Nearest Neighbour, Naive Bayes, Support Vector Machine, Logistic Regression, Extreme Gradient Boosting, Multi-Layer Perceptron and Random Forest, on two datasets of diabetes patients: the Pima Indian diabetes dataset and the Sylhet Diabetes Hospital dataset. Several popular and effective feature subset selection procedures were also used to eliminate unnecessary attributes. On the Sylhet hospital dataset, Random Forest delivers the highest accuracy (97.5%), F-measure (97.5%) and Area under the Receiver Operating Characteristic Curve (99.80%) with the Gain Ratio Attribute Evaluation feature subset selection technique. On the Pima Indian dataset, Logistic Regression delivers the highest accuracy (77.7%) and F-measure (77%) with Information Gain Attribute Evaluation, and the highest Area under the Receiver Operating Characteristic Curve (83%) with both Correlation-based Feature Selection Subset Evaluation and Correlation Attribute Evaluation. Performance was measured using 10-fold cross-validation.
KW - Classifier
KW - Cross Validation
KW - Diabetes
KW - Feature Selection Technique
KW - Machine Learning
UR - http://www.scopus.com/inward/record.url?scp=85104612386&partnerID=8YFLogxK
U2 - 10.1109/ICICT4SD50815.2021.9397006
DO - 10.1109/ICICT4SD50815.2021.9397006
M3 - Conference contribution
AN - SCOPUS:85104612386
T3 - 2021 International Conference on Information and Communication Technology for Sustainable Development, ICICT4SD 2021 - Proceedings
SP - 240
EP - 244
BT - 2021 International Conference on Information and Communication Technology for Sustainable Development, ICICT4SD 2021 - Proceedings
A2 - Faruque, Brig Gen Golam
A2 - Taher, Kazi Abu
A2 - Kaiser, M. Shamim
A2 - Uddin, Mohammed Nasir
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 International Conference on Information and Communication Technology for Sustainable Development, ICICT4SD 2021
Y2 - 27 February 2021 through 28 February 2021
ER -