Investigating the Impact of Data Analysis and Classification on Parametric and Nonparametric Machine Learning Techniques: A Proof of Concept

S. Khire; P. Ganorkar; A. Apastamb; Suja Sreejith Panicker

doi:10.1007/978-981-15-9647-6_17


Book Chapter

Published in Springer Science and Business Media Deutschland GmbH

2021

DOI: 10.1007/978-981-15-9647-6_17

Volume: 58

Pages: 211 - 227

Abstract

Supervised algorithms depend on the given data for classification. In the present work, we use both parametric and nonparametric classifiers and compare the performance of four popular machine learning classification algorithms (naïve Bayes, decision trees, logistic regression, and random forest) on two popular benchmark datasets: the wine quality dataset and the glass identification dataset. To get a broad view of the performance of these algorithms, we incorporated both binary and multi-class classification, which also addressed the problem of imbalance in the datasets. The performance of the algorithms was measured using accuracy, recall, precision, and F1-score. It was observed that nonparametric algorithms such as the random forest and decision tree classifiers outperformed parametric algorithms such as logistic regression and naïve Bayes. Moreover, as the datasets were imbalanced, we determined which algorithm performs better under which circumstances. In particular, random forest achieved the best performance on all considered metrics, with accuracies of 82% and 83% on the wine datasets and 79% on the glass identification dataset. © 2021, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
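The four metrics named in the abstract can be illustrated with a minimal sketch. It assumes macro averaging across classes, which the abstract does not specify. The function name `classification_metrics` and the toy labels are hypothetical and are not taken from the chapter. The example shows why accuracy alone can mislead on an imbalanced dataset.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption made for this sketch; the chapter's
    abstract does not state which averaging scheme was used.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[pred] += 1       # predicted class gains a false positive
            fn[truth] += 1      # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        # A class that is never predicted (or never present) scores 0.0 here.
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy imbalanced binary problem: 8 samples of class 0, 2 of class 1,
    # scored against a classifier that always predicts the majority class.
    y_true = [0] * 8 + [1] * 2
    y_pred = [0] * 10
    print(classification_metrics(y_true, y_pred))
```

On this toy input the majority-class predictor reaches an accuracy of 0.8 while its macro-averaged recall is only 0.5, which is one reason to report precision, recall, and F1-score alongside accuracy when classes are imbalanced.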

Topics: Naive Bayes classifier (62%), Random forest (59%), Statistical classification (58%), Decision tree learning (56%), and Decision tree (53%)

About the journal

Journal	Lecture Notes on Data Engineering and Communications Technologies
Publisher	Springer Science and Business Media Deutschland GmbH
ISSN	2367-4512

Authors (1)

Suja Sreejith Panicker
- School of Computer Engineering & Technology
- Engineering and Technology
