TY - JOUR
T1 - Relevance-diversity algorithm for feature selection and modified Bayes for prediction
AU - Shaheen, M.
AU - Naheed, N.
AU - Ahsan, A.
N1 - Publisher Copyright: © 2022 THE AUTHORS
PY - 2023
Y1 - 2023
N2 - Big data analytics uncovers hidden patterns through classification, prediction and reinforcement of big datasets. In these datasets, some features have a negligible connection with other features, and some may be insignificant because their presence does not affect the results of big data analytics. Big data analytics algorithms generate better classification models when supplied with a dataset consisting of relevant, important and informative features. Features can thus be classified as important or unimportant. Different filtration techniques are used to select important features. These techniques filter features on different bases, such as information gain, information dispersion and Gini index, and have a few drawbacks that are reviewed in this paper. The first contribution of this paper is a new feature selection technique, named the “Relevance-diversity algorithm”, that selects important features based on two measures, relevance and diversity, in order to keep the number of selected features as low as possible and to reduce the search time used in feature selection. The second contribution is a new supervised classification algorithm based on Naive Bayes classification. The naive assumption, i.e. feature independence, is discarded from the Naive Bayes classification algorithm. Instead, the features are considered to be dependent on each other, and their combined impact on the class value is evaluated. The newly proposed classification algorithm is then applied to the features selected through the relevance-diversity based feature selection technique. The Weather, Tic-Tac-Toe, Lenses, Balance-scale and CarEval datasets are used to evaluate both techniques. The results of the proposed feature selection method are compared with existing methods, and the results of Modified-Bayes are compared with the existing Naive Bayes algorithm. Analysis revealed that the proposed method performed better in terms of the number of features, accuracy and time complexity.
AB - Big data analytics uncovers hidden patterns through classification, prediction and reinforcement of big datasets. In these datasets, some features have a negligible connection with other features, and some may be insignificant because their presence does not affect the results of big data analytics. Big data analytics algorithms generate better classification models when supplied with a dataset consisting of relevant, important and informative features. Features can thus be classified as important or unimportant. Different filtration techniques are used to select important features. These techniques filter features on different bases, such as information gain, information dispersion and Gini index, and have a few drawbacks that are reviewed in this paper. The first contribution of this paper is a new feature selection technique, named the “Relevance-diversity algorithm”, that selects important features based on two measures, relevance and diversity, in order to keep the number of selected features as low as possible and to reduce the search time used in feature selection. The second contribution is a new supervised classification algorithm based on Naive Bayes classification. The naive assumption, i.e. feature independence, is discarded from the Naive Bayes classification algorithm. Instead, the features are considered to be dependent on each other, and their combined impact on the class value is evaluated. The newly proposed classification algorithm is then applied to the features selected through the relevance-diversity based feature selection technique. The Weather, Tic-Tac-Toe, Lenses, Balance-scale and CarEval datasets are used to evaluate both techniques. The results of the proposed feature selection method are compared with existing methods, and the results of Modified-Bayes are compared with the existing Naive Bayes algorithm. Analysis revealed that the proposed method performed better in terms of the number of features, accuracy and time complexity.
KW - Attributes Selection
KW - Classification
KW - Feature Selection
KW - Naive Bayes
KW - Relevance
UR - http://www.scopus.com/inward/record.url?scp=85142473233&partnerID=8YFLogxK
U2 - 10.1016/j.aej.2022.11.002
DO - 10.1016/j.aej.2022.11.002
M3 - Article
AN - SCOPUS:85142473233
SN - 1110-0168
JO - Alexandria Engineering Journal
JF - Alexandria Engineering Journal
ER -