Investigating the Impact of Data Analysis and Classification on Parametric and Nonparametric Machine Learning Techniques: A Proof of Concept
S. Khire, P. Ganorkar, A. Apastamb,
Published in Springer Science and Business Media Deutschland GmbH
Volume: 58
Pages: 211 - 227
Supervised algorithms rely on labeled training data to perform classification. In the present work, we used both parametric and nonparametric classifiers. We compare and demonstrate the performance of four popular machine learning classification algorithms (Naïve Bayes, decision trees, logistic regression, and random forest) on two well-known benchmark datasets: the wine quality dataset and the glass identification dataset. To obtain a broad view of how these algorithms perform, we included both binary and multi-class classification, which also addressed the class imbalance in the datasets. Performance was measured using accuracy, recall, precision, and F1-score. Nonparametric algorithms such as the random forest and decision tree classifiers outperformed parametric algorithms such as logistic regression and Naïve Bayes. Moreover, because the datasets were imbalanced, we identified which algorithm performs better under which circumstances. In particular, random forest achieved the best performance on all considered metrics, with accuracies of 82% and 83% on the wine datasets and 79% on the glass identification dataset. © 2021, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
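The four evaluation metrics named in the abstract can be illustrated with a minimal, standard-library sketch. It computes accuracy plus macro-averaged precision, recall, and F1 from true and predicted labels. The abstract does not say which averaging scheme the authors used; macro-averaging is an assumption here, chosen because it weights every class equally, which makes weak performance on minority classes visible in imbalanced multi-class data. The function name `classification_metrics` and the toy labels are hypothetical and for illustration only.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumption: macro-averaging (unweighted mean over classes); a class
    with no predicted or no true samples contributes 0.0 (0/0 -> 0.0).
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for c in labels:
        # One-vs-rest counts for class c.
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Hypothetical toy labels: class 2 is a rare class the model never predicts.
    y_true = [0, 0, 0, 1, 1, 2]
    y_pred = [0, 0, 1, 1, 1, 0]
    metrics = classification_metrics(y_true, y_pred)
    print({k: round(v, 3) for k, v in metrics.items()})
    # -> {'accuracy': 0.667, 'precision': 0.444, 'recall': 0.556, 'f1': 0.489}
```

In this toy case accuracy stays at 0.667 even though the rare class is never predicted correctly, while the macro-averaged scores drop noticeably. This illustrates why precision, recall, and F1 are reported alongside accuracy when classes are imbalanced.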
About the journal
Journal: Lecture Notes on Data Engineering and Communications Technologies
Publisher: Springer Science and Business Media Deutschland GmbH