Beyond Human Scrutiny: Unleashing Machine Learning on the Data Quality Frontier
Authors: Sainath Muvva
DOI: https://doi.org/10.5281/zenodo.14565883
Short DOI: https://doi.org/g8w63k
Country: USA
Abstract: High-quality data is essential in data-driven work: flawed datasets undermine analysis, lead to misguided decisions, and can destabilize the systems that depend on them. Machine learning (ML) offers methods for identifying and correcting a wide range of data quality problems, including inaccuracies, missing values, inconsistent entries, and outliers. This paper examines how supervised, unsupervised, and hybrid ML approaches can be applied to improve data quality. We cover four key strategies: classification algorithms for error identification, regression techniques for imputing missing values, clustering methods for detecting anomalies, and deep learning for data transformation and enhancement. We also address practical challenges and present real-world implementations in which ML has significantly improved data quality management.
Keywords: Data Quality, Machine Learning, Data Imputation, Anomaly Detection, Data Cleaning, Supervised Learning, Unsupervised Learning, Regression Models, Clustering, Data Transformation, Record Deduplication, Missing Data, Entity Resolution, Data Consistency, Deep Learning, Data Quality Assurance, Data Quality Dimensions, Model Interpretability, Data Pipelines, Active Learning
Paper Id: 231927
Published On: 2024-10-08
Published In: Volume 12, Issue 5, September-October 2024