HBase Replication

A vendor-independent comparison of NoSQL databases: Cassandra, HBase, MongoDB, Riak

Optimizing HBase I/O for Large Scale Hadoop Implementations

Leverage HBase Cache and Improve Read Performance

Cloudera Engineering Blog

History of HBase. Google published a paper on Bigtable, after which HBase development started. An initial HBase prototype was created as a Hadoop contrib module, and the first usable HBase release followed.

Write Ahead Log (WAL). The WAL is a log file that records all changes to data until the data is successfully written to disk (that is, until the MemStore is flushed). This protects against data loss in the event of a failure before MemStore contents are written to disk.

HBase Architecture (cont.)
• Based on Log-Structured Merge-Trees (LSM-Trees)
• Inserts are written to the write-ahead log first
• Data is stored in memory and flushed to disk at regular intervals or based on size
• Small flushes are merged in the background to keep the number of files small
• Reads check the memory stores first and then the files on disk
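The write path described above can be sketched as a toy model in Python. This is not HBase code: the class `ToyRegionStore`, its methods, and the `flush_threshold` and `max_files` parameters are hypothetical names chosen for illustration, and the "files" are plain in-memory dictionaries. It only shows the ordering of steps: log first, then memory, then a size-based flush, a background-style merge, and a replay of the log after a simulated crash.

```python
class ToyRegionStore:
    """Toy LSM-style store (illustrative only, not the HBase implementation)."""

    def __init__(self, flush_threshold=3, max_files=2):
        self.wal = []            # stand-in for the write-ahead log
        self.memstore = {}       # in-memory edits that are not yet flushed
        self.store_files = []    # flushed, immutable "files", oldest first
        self.flush_threshold = flush_threshold  # hypothetical size trigger
        self.max_files = max_files              # hypothetical merge trigger

    def put(self, key, value):
        self.wal.append((key, value))   # 1. record the edit in the log first
        self.memstore[key] = value      # 2. then apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                # 3. size-based flush

    def flush(self):
        if not self.memstore:
            return
        self.store_files.append(dict(self.memstore))
        self.memstore.clear()
        self.wal.clear()                # flushed edits no longer need the log
        if len(self.store_files) > self.max_files:
            self.compact()

    def compact(self):
        merged = {}
        for store_file in self.store_files:  # oldest first, so newer values win
            merged.update(store_file)
        self.store_files = [merged]

    def get(self, key):
        if key in self.memstore:             # reads check memory first
            return self.memstore[key]
        for store_file in reversed(self.store_files):  # then newest file first
            if key in store_file:
                return store_file[key]
        return None

    def crash_and_recover(self):
        self.memstore = {}                   # memory contents are lost
        for key, value in self.wal:          # replay unflushed edits from the log
            self.memstore[key] = value


store = ToyRegionStore(flush_threshold=3, max_files=2)
for i in range(7):
    store.put(f"row{i}", i)
store.crash_and_recover()
print(store.get("row6"), store.get("row1"))
```

In this sketch the seventh write is still only in memory and in the log when the crash is simulated, so the replay is what makes it readable again; the earlier rows are served from the flushed files.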

In the recent blog post about the Apache HBase Write Path, we talked about the write-ahead-log (WAL), which plays an important role in preventing data loss should an HBase region server failure occur. This blog post describes how HBase prevents data loss after a region server crashes, using an especially critical process for recovering lost updates called log.

If your data is already in an HBase cluster, replication is useful for getting the data into additional HBase clusters. In HBase, cluster replication refers to keeping one cluster's state synchronized with that of another cluster, using the write-ahead log (WAL) of the source cluster to propagate the changes.
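The idea of propagating changes through the source cluster's WAL can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the HBase replication code: `ToyCluster`, `add_peer`, and `ship_edits` are hypothetical names, shipping is triggered by hand rather than by a background process, and a newly added peer is assumed to receive only edits logged after it was added.

```python
class ToyCluster:
    """Toy cluster that replicates by shipping WAL entries (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.table = {}   # current cluster state
        self.wal = []     # ordered log of (key, value) edits
        self.peers = []   # [peer_cluster, next_wal_offset] pairs

    def add_peer(self, peer):
        # Assumption of this sketch: a new peer starts at the current end of the log.
        self.peers.append([peer, len(self.wal)])

    def put(self, key, value):
        self.wal.append((key, value))  # the edit is logged on the source
        self.table[key] = value        # and applied locally

    def ship_edits(self):
        # Send each peer the log entries it has not seen yet, in log order.
        for entry in self.peers:
            peer, offset = entry
            for key, value in self.wal[offset:]:
                peer.table[key] = value
            entry[1] = len(self.wal)


source = ToyCluster("source")
sink = ToyCluster("sink")
source.add_peer(sink)

source.put("row1", "a")
source.put("row2", "b")
lag_before_shipping = dict(sink.table)   # the sink has not caught up yet

source.ship_edits()
print(lag_before_shipping, sink.table)
```

Because edits are shipped after the local write completes, the sink briefly lags the source in this model; tracking a per-peer offset into the log is what lets shipping resume without resending older entries.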

How does HBase write performance differ from write performance in Cassandra with consistency level ALL?

Apache HBase

server responds with an ack as soon as it updates its in-memory data structure and flushes the update to its write-ahead commit log. In older versions of HBase, the log was configured in a similar manner to Cassandra to flush .

hadoop - Poor write Performance by HBase client - Stack Overflow