Dual ETL – Hadoop Cluster Auto Failover
Authors: Sainath Muvva
DOI: https://doi.org/10.5281/zenodo.14280339
Short DOI: https://doi.org/g8ttk9
Country: USA
Abstract: This paper examines the design of data infrastructure for high-speed delivery, focusing on the four V's of big data and the importance of geographically separated primary and Disaster Recovery (DR) clusters. It explores the complexities of the Hadoop cluster failover process, identifying challenges such as manual metadata updates and data quality checks. The research proposes automation solutions, including DistCp for data replication and Hive commands for metadata updates, aiming to strengthen data infrastructure resilience and reduce manual intervention during critical events.
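To make the proposed automation concrete, the sketch below shows how a DistCp replication step and a Hive metadata update might be chained together during a failover run. This is a minimal illustration under stated assumptions, not the paper's implementation: the cluster URIs, warehouse path, and table name are hypothetical, and a production failover would wrap these steps with the data quality checks the abstract mentions.

```python
#!/usr/bin/env python3
"""Minimal sketch of a DistCp + Hive failover step.

All cluster endpoints, paths, and table names below are illustrative
assumptions, not values taken from the paper.
"""
import subprocess

# Hypothetical NameNode endpoints and dataset layout.
PRIMARY_NN = "hdfs://primary-nn:8020"
DR_NN = "hdfs://dr-nn:8020"
DATA_PATH = "/warehouse/sales/orders"
HIVE_TABLE = "sales.orders"


def replicate_with_distcp() -> None:
    """Mirror the primary dataset onto the DR cluster.

    -update copies only files that differ from the destination;
    -delete removes DR-side files that no longer exist on the primary,
    keeping the two copies in sync.
    """
    subprocess.run(
        ["hadoop", "distcp", "-update", "-delete",
         f"{PRIMARY_NN}{DATA_PATH}", f"{DR_NN}{DATA_PATH}"],
        check=True,
    )


def repoint_hive_metadata() -> None:
    """Update Hive metadata so queries resolve against the DR copy,
    then refresh partition metadata for the relocated table."""
    ddl = (
        f"ALTER TABLE {HIVE_TABLE} SET LOCATION '{DR_NN}{DATA_PATH}'; "
        f"MSCK REPAIR TABLE {HIVE_TABLE};"
    )
    subprocess.run(["hive", "-e", ddl], check=True)


if __name__ == "__main__":
    replicate_with_distcp()
    repoint_hive_metadata()
```

Automating both steps in one script removes the two manual touchpoints the abstract identifies: the data copy itself and the metadata update that would otherwise be issued by hand during a failover event.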
Keywords: ETL, Hadoop, DistCp, Data Quality
Paper Id: 231748
Published On: 2019-09-03
Published In: Volume 7, Issue 5, September-October 2019
Cite This: Dual ETL – Hadoop Cluster Auto Failover - Sainath Muvva - IJIRMPS Volume 7, Issue 5, September-October 2019. DOI 10.5281/zenodo.14280339