International Journal of Innovative Research in Engineering & Multidisciplinary Physical Sciences
E-ISSN: 2349-7300 | Impact Factor: 9.907

A Widely Indexed Open Access Peer Reviewed Online Scholarly International Journal

Integrating Data Versioning and Management into CI/CD Pipelines for Machine Learning

Authors: Swamy Prasadarao Velaga

DOI: https://doi.org/10.5281/zenodo.12805518

Short DOI: https://doi.org/gt442h

Country: India


Abstract: Machine learning (ML) applications have evolved rapidly and been adopted widely, underscoring the need for data management practices that ensure reproducibility, reliability, and scalability in model development and deployment. Integrating data versioning and management into Continuous Integration and Continuous Deployment (CI/CD) pipelines for ML is a key strategy for addressing these challenges. This survey paper examines the role of data versioning in CI/CD pipelines and its main benefits: reproducible experimental results, effective management of data drift, and compliance with regulatory standards. We examine the challenges of integrating data versioning, including handling large datasets, managing dynamic data sources, and ensuring compatibility across diverse data formats. The paper also discusses best practices and implementation strategies for adopting data versioning in CI/CD pipelines, emphasizing automation, scalability, and integration with Machine Learning Operations (MLOps). Finally, we outline future research directions in data versioning, including advances in automation, security, and cross-domain collaboration, aimed at further improving the reliability and transparency of ML workflows. Together, these discussions give an overview of current trends, challenges, and opportunities in using data versioning to optimize CI/CD pipelines for ML applications.

Keywords: Continuous Deployment, AI Systems, Machine Learning Models, Data Versioning


Paper Id: 230795

Published On: 2021-02-03

Published In: Volume 9, Issue 1, January-February 2021

Cite This: Integrating Data Versioning and Management into CI/CD Pipelines for Machine Learning - Swamy Prasadarao Velaga - IJIRMPS Volume 9, Issue 1, January-February 2021. DOI https://doi.org/10.5281/zenodo.12805518
