Effective utilization of Machine Learning Techniques to Classify Breast Cancer Tumors

Kamath G.P.; Anuradha Chetan Phadke

Breast cancer develops when alterations in genes, called mutations, lead to abnormal cell growth in the breast. One way to make progress against this cancer is to apply machine learning techniques that improve diagnosis and support efforts to treat it. This paper aims to identify breast cancer tumors quickly and efficiently. The proposed system uses the Wisconsin Breast Cancer Dataset, downloaded from the UCI repository, and performs binary classification of tumors as malignant or benign. Classification is implemented with Support Vector Machines (SVM) and Random Forest. A thorough visualization of the Wisconsin Breast Cancer Dataset is carried out to understand its trends and patterns. The system applies data preprocessing techniques to retrieve useful data, followed by Principal Component Analysis (PCA) for feature extraction. For SVM, GridSearchCV is used to iterate through a predefined set of hyperparameters; for Random Forest, k-fold cross-validation is applied so that the model is evaluated over distinct data splits. The highest accuracy achieved is 99.7% with Random Forest and 98.2% with SVM. These two algorithms are highlighted because their implementations yielded high accuracy. The models are evaluated using precision, recall, F1 score, and the confusion matrix, and are also compared using the true positive rate, true negative rate, false positive rate, and false negative rate. © 2022 IEEE.
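The abstract names the pipeline stages but not their settings, so the following is only a minimal sketch of how they might be wired together in scikit-learn, not the authors' code. It assumes the copy of the UCI Wisconsin Diagnostic Breast Cancer data bundled with scikit-learn (`load_breast_cancer`), which may differ from the exact version used in the paper; standardization as the preprocessing step; 10 principal components; a small hypothetical SVM grid; and 10 folds for the Random Forest. None of these values come from the paper, and the sketch is not expected to reproduce its accuracy figures.

```python
# Minimal sketch (assumptions): scikit-learn's bundled UCI Wisconsin Diagnostic
# Breast Cancer data, StandardScaler as preprocessing, 10 principal components,
# and an illustrative hyperparameter grid. Not the paper's configuration.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_score, train_test_split)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def pca_pipeline(classifier, n_components=10):
    """Scale features, project onto principal components, then classify."""
    return Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=n_components)),
        ("clf", classifier),
    ])


def tune_svm(X_train, y_train):
    """Exhaustively search a hypothetical SVM grid with 5-fold GridSearchCV."""
    grid = {
        "clf__kernel": ["linear", "rbf"],
        "clf__C": [0.1, 1, 10],
    }
    search = GridSearchCV(pca_pipeline(SVC()), grid, cv=5, scoring="accuracy")
    return search.fit(X_train, y_train)  # fit() returns the fitted search


def random_forest_kfold(X, y, k=10, seed=0):
    """Accuracy of a PCA + Random Forest pipeline on each of k stratified folds."""
    model = pca_pipeline(RandomForestClassifier(n_estimators=100, random_state=seed))
    folds = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=folds, scoring="accuracy")


# Usage: in this copy of the data, label 0 = malignant and 1 = benign.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

svm_search = tune_svm(X_train, y_train)
print("best SVM params:", svm_search.best_params_)
print("SVM held-out accuracy: %.3f" % svm_search.score(X_test, y_test))

rf_scores = random_forest_kfold(X, y)
print("RF %d-fold accuracy: %.3f +/- %.3f"
      % (len(rf_scores), rf_scores.mean(), rf_scores.std()))
```

Placing the scaler and PCA inside the pipeline means they are refitted on each training fold, so no cross-validation fold sees statistics computed from its own held-out split.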
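All of the evaluation measures listed in the abstract follow from the four counts of a binary confusion matrix. The dependency-free sketch below shows the relationships, assuming the malignant class is the positive label (encoded as 1 here, a hypothetical choice) and using made-up counts purely for illustration, not results from the paper.

```python
# Sketch: precision, recall, F1 and the four rates from a binary confusion matrix.
# The positive label (assumed here to mean "malignant") is a parameter.
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # positive (malignant) predicted positive
    fp: int  # negative (benign) predicted positive
    fn: int  # positive predicted negative
    tn: int  # negative predicted negative


def confusion_from_labels(y_true, y_pred, positive=1):
    """Count TP/FP/FN/TN, treating `positive` as the positive class."""
    c = Confusion(0, 0, 0, 0)
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            c.tp += 1
        elif t != positive and p == positive:
            c.fp += 1
        elif t == positive:
            c.fn += 1
        else:
            c.tn += 1
    return c


def _ratio(num, den):
    # Return 0.0 instead of raising when a denominator is zero.
    return num / den if den else 0.0


def evaluate(c):
    tpr = _ratio(c.tp, c.tp + c.fn)  # recall / sensitivity
    tnr = _ratio(c.tn, c.tn + c.fp)  # specificity
    precision = _ratio(c.tp, c.tp + c.fp)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn),
        "precision": precision,
        "recall": tpr,
        "f1": _ratio(2 * precision * tpr, precision + tpr),  # harmonic mean
        "tpr": tpr,
        "tnr": tnr,
        "fpr": _ratio(c.fp, c.fp + c.tn),  # complement of TNR
        "fnr": _ratio(c.fn, c.fn + c.tp),  # complement of TPR
    }


# Hypothetical counts for a 114-sample test split (not results from the paper).
metrics = evaluate(Confusion(tp=40, fp=2, fn=3, tn=69))
for name, value in metrics.items():
    print(f"{name:>9}: {value:.3f}")
```

Because F1 is the harmonic mean of precision and recall, it can also be written as 2TP / (2TP + FP + FN), which gives 80/85 for the hypothetical counts above.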

Journal	2022 IEEE Pune Section International Conference, PuneCon 2022
Publisher	Institute of Electrical and Electronics Engineers Inc.
Open Access	No