Automated Data Science Workflow: Enhancing Data Cleaning & Preprocessing
Authors: Parag Patil, Akshat Patil, Ojas Ambekar, Aayush Bairagi, Prof. Shubhangi Nirgide
Country: India
Abstract: In machine learning, data preprocessing plays a crucial role in building efficient and accurate models. However, preprocessing can be time-consuming and error-prone, especially for developers handling large numerical datasets. This project proposes an intelligent tool for machine learning practitioners that lets them upload numerical CSV datasets and automatically performs the essential preprocessing tasks. On upload, the system detects and classifies the type of each column (e.g., continuous, categorical, binary), identifies and handles missing values, normalizes or scales numerical data, encodes categorical variables where necessary, and removes irrelevant or duplicate features. By automating these basic yet critical steps, the tool aims to reduce manual effort, minimize human error, and accelerate the overall machine learning pipeline. It is intended to assist both novice and experienced developers by providing a clean, ready-to-use dataset, so that they can focus on model building and evaluation.
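The steps listed in the abstract can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`parse_column`, `classify_column`, `preprocess`), the set of missing-value tokens, the type-detection heuristic (two distinct values means binary, more than two numeric values means continuous), median/mode imputation, min-max scaling, one-hot encoding, and treating constant columns as "irrelevant" are all hypothetical choices made for the example. It also assumes a rectangular CSV with a header row.

```python
import csv
import io
from collections import Counter
from statistics import median

# Assumed markers for a missing cell; the paper does not specify them.
MISSING_TOKENS = {"", "na", "n/a", "nan", "null", "none"}


def parse_column(raw):
    """Map missing tokens to None; use floats if every present cell is numeric."""
    cells = [None if c.strip().lower() in MISSING_TOKENS else c.strip() for c in raw]
    try:
        return [None if c is None else float(c) for c in cells]
    except ValueError:
        return cells  # at least one non-numeric cell: keep the column as strings


def classify_column(values):
    """Heuristic column type: constant, binary, continuous, or categorical."""
    present = [v for v in values if v is not None]
    distinct = set(present)
    if len(distinct) <= 1:
        return "constant"
    if len(distinct) == 2:
        return "binary"
    return "continuous" if isinstance(present[0], float) else "categorical"


def preprocess(csv_text):
    """Return {column name: list of floats} for a CSV given as text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    out = {}
    for i, name in enumerate(header):
        values = parse_column([row[i] for row in body])
        kind = classify_column(values)
        if kind == "constant":
            continue  # assumption: a constant column counts as irrelevant
        present = [v for v in values if v is not None]
        if kind == "continuous":
            fill = median(present)  # impute missing values with the median
            filled = [fill if v is None else v for v in values]
            lo, hi = min(filled), max(filled)
            out[name] = [(v - lo) / (hi - lo) for v in filled]  # min-max scale
        else:
            fill = Counter(present).most_common(1)[0][0]  # impute with the mode
            filled = [fill if v is None else v for v in values]
            levels = sorted(set(filled))
            if kind == "binary":
                out[name] = [float(v == levels[1]) for v in filled]
            else:  # one-hot encode: one output column per category level
                for level in levels:
                    out[f"{name}={level}"] = [float(v == level) for v in filled]
    # Drop duplicate features: keep the first of any identical output columns.
    seen, unique = set(), {}
    for name, col in out.items():
        if tuple(col) not in seen:
            seen.add(tuple(col))
            unique[name] = col
    return unique
```

Calling `preprocess` on the text of an uploaded CSV yields a dictionary of fully numeric columns. A production tool would likely need more robust type detection (for example, integer-coded categories) and user-selectable imputation and scaling strategies.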
Keywords:
Paper Id: 232413
Published On: 2025-04-24
Published In: Volume 13, Issue 2, March-April 2025