SMOTE in Predictive Modeling: A Comprehensive Evaluation of Synthetic Oversampling for Class Imbalance
Authors: Sandeep Yadav
DOI: https://doi.org/10.5281/zenodo.14259555
Short DOI: https://doi.org/g8s4gf
Country: USA
Abstract: Class imbalance is a pervasive challenge in predictive modeling: when minority class instances are significantly underrepresented, models become biased toward the majority class and perform poorly on the cases that often matter most. The Synthetic Minority Over-sampling Technique (SMOTE) is one of the most widely used remedies; it addresses the imbalance by generating synthetic samples for the minority class. This study provides a comprehensive evaluation of SMOTE and its variants in handling class imbalance across diverse datasets and model types. We assess SMOTE’s impact on predictive performance, model generalizability, and stability under different imbalance ratios. Key SMOTE variants, including Borderline-SMOTE, SMOTE-ENN, and SMOTE-Tomek, are applied and compared on precision, recall, F1-score, and area under the precision-recall curve (AUC-PR). Experimental results show that SMOTE improves model performance by broadening the decision region learned for the minority class, and that certain variants, such as SMOTE-ENN, excel in high-dimensional spaces because they remove noisy samples after oversampling. However, SMOTE can introduce synthetic noise when applied without regard to data density and class boundaries. Overall, the findings confirm the effectiveness of SMOTE and its variants in improving minority class prediction, and they underscore the importance of selecting a variant based on dataset characteristics and the performance metrics of interest. The study offers practical guidance for data scientists and researchers on using SMOTE with imbalanced datasets, in support of robust and fair predictive models in real-world applications.
Keywords: Class Imbalance, Noise Reduction, Minority Class, Undersampling, Oversampling, SMOTE (Synthetic Minority Over-sampling Technique), Data Augmentation Techniques, Precision-Recall Curve (AUC-PR), Model Generalizability
Paper Id: 231706
Published On: 2020-08-04
Published In: Volume 8, Issue 4, July-August 2020
Cite This: SMOTE in Predictive Modeling: A Comprehensive Evaluation of Synthetic Oversampling for Class Imbalance - Sandeep Yadav - IJIRMPS Volume 8, Issue 4, July-August 2020. DOI 10.5281/zenodo.14259555
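Illustrative sketch: the abstract describes SMOTE as generating synthetic samples for the minority class. The snippet below is a minimal, standard-library illustration of that interpolation idea, not the code used in the study. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and chosen only for this example; it assumes numeric features and Euclidean distance.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours.

    Hypothetical helper for illustration; assumes numeric feature vectors.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square (made-up data).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic point lies on a line segment between two existing minority samples, the new points stay inside the region the minority class already occupies. This is also why, as the abstract cautions, interpolation can produce noisy samples when neighbours straddle a class boundary or a sparse region.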