Hadoop high availability through metadata replication

Feng Wang; Jie Qiu; Jie Yang; Bo Dong; Xinhui Li; Ying Li

doi:10.1145/1651263.1651271

CloudDB - CIKM 2009

Conference paper

01 Dec 2009

Abstract

Hadoop is widely adopted to support data-intensive distributed applications. Many of these applications are mission critical and require Hadoop itself to be highly available. Unfortunately, Hadoop does not yet support high availability, and adding it is not trivial. Based on a thorough investigation of Hadoop, this paper proposes a metadata-replication-based solution that enables high availability by removing the single point of failure in Hadoop. The solution involves three major phases. In the initialization phase, each standby/slave node registers with the active/primary node, and its initial metadata (such as the version file and file system image) is brought up to date with that of the active/primary node. In the replication phase, the runtime metadata needed for a future failover (such as outstanding operations and lease states) is replicated. In the failover phase, the standby or newly elected primary node takes over all communications. The solution offers several unique features for Hadoop, such as a runtime-configurable synchronization mode. The experiments demonstrate the feasibility and efficiency of the solution. Copyright 2009 ACM.
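
The three phases described in the abstract can be illustrated with a small in-memory model. This is only a toy sketch under stated assumptions, not the paper's implementation and not a Hadoop API: the names `Node`, `Cluster`, `register`, `write`, `drain`, and `failover` are hypothetical, metadata is reduced to plain dictionaries, and failover simply promotes the first registered standby instead of running an election.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical metadata node; fields loosely mirror the abstract's terms."""
    name: str
    version: str = ""
    fs_image: dict = field(default_factory=dict)  # path -> file metadata
    leases: dict = field(default_factory=dict)    # path -> lease holder
    pending: list = field(default_factory=list)   # received but not yet applied

    def apply(self, path, meta, holder):
        self.fs_image[path] = meta
        self.leases[path] = holder

    def drain(self):
        # Apply queued updates in arrival order, then empty the queue.
        for update in self.pending:
            self.apply(*update)
        self.pending.clear()


class Cluster:
    def __init__(self, primary, sync=True):
        self.primary = primary
        self.standbys = []
        self.sync = sync  # assumption: a plain flag that can be flipped at runtime

    def register(self, standby):
        # Initialization phase: register the standby and copy initial metadata.
        standby.version = self.primary.version
        standby.fs_image = dict(self.primary.fs_image)
        standby.leases = dict(self.primary.leases)
        self.standbys.append(standby)

    def write(self, path, meta, holder):
        # Replication phase: forward runtime metadata to every standby.
        self.primary.apply(path, meta, holder)
        for standby in self.standbys:
            if self.sync:
                standby.drain()  # keep ordering if the mode was just switched
                standby.apply(path, meta, holder)
            else:
                standby.pending.append((path, meta, holder))

    def failover(self):
        # Failover phase: promote a standby after it applies queued updates.
        new_primary = self.standbys.pop(0)
        new_primary.drain()
        self.primary = new_primary
        return new_primary
```

In this sketch, setting `cluster.sync = False` between writes switches from applying each update on the standby immediately to queuing it, which is one simple way to picture a synchronization mode that is configurable at runtime.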
