International Journal of Innovative Research in Engineering & Multidisciplinary Physical Sciences
E-ISSN: 2349-7300 | Impact Factor: 9.907



Dual ETL – Hadoop Cluster Auto Failover

Authors: Sainath Muvva

DOI: https://doi.org/10.5281/zenodo.14280339

Short DOI: https://doi.org/g8ttk9

Country: USA



Abstract: This paper examines the design of data infrastructure for high-speed delivery, focusing on the four V's of big data and the importance of geographically separated primary and disaster recovery (DR) clusters. It explores the complexities of the Hadoop cluster failover process, identifying challenges such as manual metadata updates and data quality checks. The research proposes automation solutions, including the use of DistCp for data replication and Hive commands for metadata updates, aiming to enhance data infrastructure resilience and reduce manual intervention during critical events.
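The following Python sketch illustrates the kind of automation the abstract describes: copying HDFS data from the primary cluster to the DR cluster with DistCp and then refreshing Hive metadata so the DR cluster can serve queries after failover. The NameNode URIs, paths, table name, and the choice of MSCK REPAIR as the metadata-update command are illustrative assumptions, not details taken from the paper.

"""
Illustrative sketch of automated DR preparation for a Hadoop cluster:
replicate data with DistCp, then update Hive metadata on the DR side.
Hostnames, paths, and table names below are hypothetical.
"""
import subprocess

PRIMARY_NN = "hdfs://primary-nn:8020"   # assumed primary NameNode URI
DR_NN = "hdfs://dr-nn:8020"             # assumed DR NameNode URI


def replicate_path(path: str) -> None:
    """Copy an HDFS path from the primary to the DR cluster with DistCp.

    -update copies only files that have changed; -p preserves file attributes.
    """
    subprocess.run(
        ["hadoop", "distcp", "-update", "-p",
         f"{PRIMARY_NN}{path}", f"{DR_NN}{path}"],
        check=True,
    )


def refresh_hive_metadata(table: str) -> None:
    """Register newly copied partitions in the DR cluster's Hive metastore."""
    subprocess.run(
        ["hive", "-e", f"MSCK REPAIR TABLE {table};"],
        check=True,
    )


if __name__ == "__main__":
    # Example failover preparation for a hypothetical warehouse table.
    replicate_path("/warehouse/sales_db/sales_fact")
    refresh_hive_metadata("sales_db.sales_fact")

In practice such a script would run on a schedule (or be triggered by the failover event itself), with data quality checks comparing row counts or checksums between clusters before the DR side is declared active.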

Keywords: ETL, Hadoop, DistCp, Data Quality


Paper Id: 231748

Published On: 2019-09-03

Published In: Volume 7, Issue 5, September-October 2019

Cite This: Sainath Muvva, "Dual ETL – Hadoop Cluster Auto Failover," IJIRMPS, Volume 7, Issue 5, September-October 2019. DOI: 10.5281/zenodo.14280339
