International Journal of Innovative Research in Engineering & Multidisciplinary Physical Sciences
E-ISSN: 2349-7300
Impact Factor: 9.907

A Widely Indexed Open Access Peer Reviewed Online Scholarly International Journal

SMOTE in Predictive Modeling: A Comprehensive Evaluation of Synthetic Oversampling for Class Imbalance

Authors: Sandeep Yadav

DOI: https://doi.org/10.5281/zenodo.14259555

Short DOI: https://doi.org/g8s4gf

Country: USA


Abstract: Class imbalance is a pervasive challenge in predictive modeling: when minority class instances are significantly underrepresented, the result is models biased toward the majority class and suboptimal performance. The Synthetic Minority Over-sampling Technique (SMOTE) is one of the most widely used remedies; it addresses the imbalance by generating synthetic samples for the minority class. This study provides a comprehensive evaluation of SMOTE and its variants in handling class imbalance across diverse datasets and model types. We assess SMOTE’s impact on predictive performance, model generalizability, and stability under different imbalance ratios. Key SMOTE variants, including Borderline-SMOTE, SMOTE-ENN, and SMOTE-Tomek, are applied and compared on precision, recall, F1-score, and area under the precision-recall curve (AUC-PR). Experimental results show that SMOTE improves model performance by supplying additional minority class samples that help the classifier shape its decision boundary, and that certain variants, such as SMOTE-ENN, excel in high-dimensional spaces by removing noisy samples after oversampling. However, SMOTE can introduce synthetic noise when applied without regard to data density and class boundaries. Overall, the findings confirm the effectiveness of SMOTE and its variants in improving minority class prediction but underscore the importance of selecting a variant based on dataset characteristics and the target performance metrics. The study offers practical guidance for data scientists and researchers on using SMOTE with imbalanced datasets, in support of robust and fair predictive models in diverse real-world applications.
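
The abstract describes SMOTE as generating synthetic samples for the minority class. As a rough illustration of that core idea only, the sketch below interpolates between a minority point and one of its nearest minority neighbours. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and are not taken from the paper; the variants named above (Borderline-SMOTE, SMOTE-ENN, SMOTE-Tomek) add selection or cleaning steps that are not shown here.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority points by interpolating toward near neighbours.

    Illustrative only: `minority` is a list of equal-length numeric tuples.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_synthetic=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority points, it stays inside the region the minority class already occupies. The same property explains the caveat in the abstract: if the chosen neighbours straddle a class boundary or a sparse region, the interpolated points can land where the majority class dominates and act as noise.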

Keywords: Class Imbalance, Noise Reduction, Minority Class, Undersampling, Oversampling, SMOTE (Synthetic Minority Over-sampling Technique), Data Augmentation Techniques, Precision-Recall Curve (AUC-PR), Model Generalizability


Paper Id: 231706

Published On: 2020-08-04

Published In: Volume 8, Issue 4, July-August 2020

Cite This: SMOTE in Predictive Modeling: A Comprehensive Evaluation of Synthetic Oversampling for Class Imbalance - Sandeep Yadav - IJIRMPS Volume 8, Issue 4, July-August 2020. DOI 10.5281/zenodo.14259555
