International Journal of Innovative Research in Engineering & Multidisciplinary Physical Sciences
E-ISSN: 2349-7300
Impact Factor: 9.907

A Widely Indexed Open Access Peer Reviewed Online Scholarly International Journal


Automated Data Science Workflow: Enhancing Data Cleaning & Preprocessing

Authors: Parag Patil, Akshat Patil, Ojas Ambekar, Aayush Bairagi, Prof. Shubhangi Nirgide

Country: India



Abstract: Data preprocessing plays a crucial role in building efficient and accurate machine learning models. However, it is time-consuming and error-prone when done by hand, especially for developers handling large numerical datasets. This project proposes a tool for machine learning practitioners that accepts an uploaded numerical CSV dataset and automatically performs the essential preprocessing tasks. Upon upload, the system detects and classifies the type of each column (e.g., continuous, categorical, binary), identifies and handles missing values, normalizes or scales numerical data, encodes categorical variables where necessary, and removes irrelevant or duplicate features. By automating these basic yet critical steps, the tool aims to reduce manual effort, minimize human error, and accelerate the overall machine learning pipeline. It is intended to assist both novice and experienced developers by providing a clean, ready-to-use dataset, so that they can focus on model building and evaluation.
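The preprocessing steps listed in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`load_numeric_csv`, `classify`, `preprocess`), the `max_categories` threshold, the choice of median and most-frequent-value imputation, min-max scaling, one-hot encoding, and the sample data are all assumptions made for illustration. It uses only the Python standard library; a real tool would more likely build on a data-frame library.

```python
import csv
import io
from statistics import median


def load_numeric_csv(text):
    """Parse CSV text into {column: [float or None]}; blank cells become None."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in columns:
            cell = (row[name] or "").strip()
            columns[name].append(float(cell) if cell else None)
    return columns


def classify(values, max_categories):
    """Heuristic column type based on the number of distinct observed values."""
    distinct = {v for v in values if v is not None}
    if len(distinct) <= 1:
        return "constant"
    if len(distinct) == 2:
        return "binary"
    if len(distinct) <= max_categories and all(v.is_integer() for v in distinct):
        return "categorical"
    return "continuous"


def preprocess(columns, max_categories=10):
    """Return (cleaned columns, report of what was detected for each column)."""
    cleaned, report, seen = {}, {}, set()
    for name, values in columns.items():
        kind = classify(values, max_categories)
        report[name] = kind
        if kind == "constant":
            continue  # a constant (or empty) column carries no information
        # Impute missing cells: median for continuous, most frequent otherwise.
        present = [v for v in values if v is not None]
        if kind == "continuous":
            fill = median(present)
        else:
            fill = max(sorted(set(present)), key=present.count)
        filled = [fill if v is None else v for v in values]
        # Drop a column that exactly repeats an earlier one.
        if tuple(filled) in seen:
            report[name] = "duplicate"
            continue
        seen.add(tuple(filled))
        if kind == "continuous":
            # Min-max scale to the range [0, 1].
            lo, hi = min(filled), max(filled)
            cleaned[name] = [(v - lo) / (hi - lo) for v in filled]
        elif kind == "categorical":
            # One-hot encode: one 0/1 column per distinct level.
            for level in sorted(set(filled)):
                cleaned[f"{name}_{level:g}"] = [float(v == level) for v in filled]
        else:
            cleaned[name] = filled  # binary columns are kept unchanged
    return cleaned, report


# Hypothetical sample data, invented for this example.
SAMPLE = """\
age,income,smoker,region,age_copy,site
25,50000,0,1,25,7
35,,1,2,35,7
45,70000,0,3,45,7
55,90000,1,1,55,7
65,60000,0,2,65,7
"""

result, report = preprocess(load_numeric_csv(SAMPLE), max_categories=3)
for column, values in result.items():
    print(column, values)
print(report)
```

On the sample data, the missing `income` cell is filled with the median of the observed values, `age` and `income` are scaled, `region` is expanded into one indicator column per level, and `age_copy` and `site` are dropped as a duplicate and a constant column respectively. The distinct-value threshold is the most fragile part of the heuristic, since a small dataset can make a continuous column look categorical, so a practical tool would probably let the user override the detected type.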

Keywords:


Paper Id: 232413

Published On: 2025-04-24

Published In: Volume 13, Issue 2, March-April 2025
