
Contents

Overview of HBASE
Column Oriented Database
HBASE Architecture
HBASE Features
HBASE COMPONENTS

Overview of HBASE
What is Apache HBase?

[Figure: HBase in the Hadoop stack. Labels: App, MR (MapReduce), ZK (ZooKeeper), HDFS]

Apache HBase is an open source, distributed, column-oriented, scalable, consistent, low-latency, random-access, non-relational database built on Apache Hadoop.

Overview of HBASE
Production Apache HBase Applications
Inbox
Storage
Web
Search
Analytics
Monitoring

More case studies at http://www.hbasecon.com/agenda/

Overview of HBASE
Why HBase?
HBase is a Bigtable clone.
It is open source.
It has a good community and promise for the future.
It is developed on top of, and has good integration with, the Hadoop platform.
Linear scalability.
Automatic failover.

Overview of HBASE
Why HBase?
Consistent reads and writes.
Sharding of tables.
Failover support.
Classes for backing Hadoop MapReduce jobs.
Java API for client access.
Thrift gateway and a RESTful Web service.
Shell support.


Column Oriented Databases
A column-oriented DBMS is a database management system (DBMS) that stores data tables as sections of columns of data rather than as rows of data.

The goal of a columnar database is to write and read data to and from hard disk storage efficiently, in order to reduce the time it takes to return a query result.

A column-oriented architecture looks the same on the surface as a legacy row-based database, but stores data differently.

Column Oriented Databases
Column vs. row orientation
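
The difference between the two orientations can be sketched with a few records held in memory. This is only a toy illustration: the sample records and the helper names `to_row_layout` and `to_column_layout` are assumptions made for the example, and a real column store adds encoding, compression and on-disk structures on top of this idea.

```python
# Hypothetical sample data, used only to contrast the two layouts.
rows = [
    {"id": 1, "name": "alice", "age": 30},
    {"id": 2, "name": "bob", "age": 25},
    {"id": 3, "name": "carol", "age": 35},
]


def to_row_layout(records):
    """Row orientation: all values of one record are stored next to each other."""
    return [tuple(record.values()) for record in records]


def to_column_layout(records):
    """Column orientation: all values of one column are stored next to each other."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


row_store = to_row_layout(rows)
column_store = to_column_layout(rows)

# An aggregate over one column reads a single list in the column layout,
# while the row layout would have to walk every full record.
average_age = sum(column_store["age"]) / len(column_store["age"])
```

In the column layout the `name` and `id` values are never touched when only `age` is queried, which is the property the following slide lists as an advantage.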

Column Oriented Databases
Advantages of Column Database

One of the main benefits of a columnar database is that data can be highly compressed. The compression permits columnar operations like MIN, MAX, SUM, COUNT and AVG to be performed very rapidly.

Another benefit is that because a column-based DBMS is self-indexing, it uses less disk space than a relational database management system (RDBMS) containing the same data.

Column architecture doesn't read unnecessary columns.
It avoids decompression costs and performs operations faster.
Compression schemes lower disk space requirements.
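
One way to see how aggregates can run without decompressing a column is run-length encoding (RLE). The slides do not name a specific compression scheme, so RLE is an assumption chosen here because it is easy to follow; the function names and the sample column are likewise made up for the sketch.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a column into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_sum(runs):
    """SUM computed directly on the compressed runs."""
    return sum(value * count for value, count in runs)


def rle_count(runs):
    """COUNT computed directly on the compressed runs."""
    return sum(count for _, count in runs)


def rle_max(runs):
    """MAX only needs one look at each distinct run value."""
    return max(value for value, _ in runs)


# Hypothetical column with long runs of repeated values.
status_codes = [200, 200, 200, 200, 404, 404, 200, 200, 500]
runs = rle_encode(status_codes)
```

Each aggregate visits one entry per run rather than one entry per row, so a column with long runs of repeated values is both smaller on disk and cheaper to aggregate.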


HBASE Architecture


HBase Features

Auto sharding


HBase Features
Distribution


HBase Features

Auto sharding & Distribution

The unit of scalability in HBase is the region.
A region is a sorted, contiguous range of rows.
Regions are spread randomly across region servers.
Regions are moved around for load balancing and failover.
Regions are split automatically or manually to scale with growing data.
Capacity is solely a factor of cluster nodes vs. regions per node.
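
The idea of regions as sorted, contiguous row ranges can be sketched with a sorted list of region start keys and a binary search. This is a simplified model, not HBase's actual region lookup: the class `RegionMap`, its methods, and the server names are hypothetical.

```python
import bisect


class RegionMap:
    """Toy model: each region covers [start_key, next_start_key) and lives on one server."""

    def __init__(self, server):
        self.start_keys = [""]   # sorted region start keys; "" means "from the first row"
        self.servers = [server]  # servers[i] hosts the region starting at start_keys[i]

    def locate(self, row_key):
        """Return (region_index, server) for the region whose range contains row_key."""
        index = bisect.bisect_right(self.start_keys, row_key) - 1
        return index, self.servers[index]

    def split(self, split_key, new_server):
        """Split the region containing split_key; the upper half goes to new_server."""
        if split_key in self.start_keys:
            raise ValueError("split key is already a region boundary")
        index = bisect.bisect_right(self.start_keys, split_key)
        self.start_keys.insert(index, split_key)
        self.servers.insert(index, new_server)


regions = RegionMap("rs1")
regions.split("m", "rs2")  # rows < "m" stay on rs1, rows >= "m" move to rs2
regions.split("g", "rs3")  # rows in ["g", "m") move to rs3
```

Because the ranges are sorted and do not overlap, finding the server for a row key is a single binary search, and a split only inserts one new boundary.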


HBase Features

Storage Separation


HBase Features

Storage Separation

Column families allow for separation of data.
Used by columnar databases for fast analytical queries, but on the column level only.
Allow different or no compression depending on the content type.
Segregate information based on access pattern.
Data is stored in one or more storage files, called HFiles.
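
Storage separation by column family can be sketched as one independent store per family, each with its own compression setting. This is a toy in-memory model, not the HFile format: the class `ColumnFamilyStore`, the family names `meta` and `body`, and the use of zlib are assumptions made for illustration.

```python
import zlib


class ColumnFamilyStore:
    """Toy model of one column family: its own storage and its own compression setting."""

    def __init__(self, compress):
        self.compress = compress
        self.cells = {}  # (row_key, qualifier) -> stored bytes

    def put(self, row_key, qualifier, value):
        data = value.encode("utf-8")
        self.cells[(row_key, qualifier)] = zlib.compress(data) if self.compress else data

    def get(self, row_key, qualifier):
        data = self.cells[(row_key, qualifier)]
        return (zlib.decompress(data) if self.compress else data).decode("utf-8")


# Hypothetical table with two families: small, frequently read metadata kept
# uncompressed, and large text bodies kept compressed.
table = {
    "meta": ColumnFamilyStore(compress=False),
    "body": ColumnFamilyStore(compress=True),
}
table["meta"].put("row1", "title", "HBase notes")
table["body"].put("row1", "text", "lorem ipsum " * 100)
```

Reading a title only touches the `meta` store, so the large compressed bodies are never read or decompressed for that access pattern.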


HBase Components
HMaster
Responsible for monitoring region servers.
Redirects clients to the correct region servers.
The Master controls critical functions such as RegionServer failover and completing region splits. So while the cluster can still run for a time without the Master, the Master should be restarted as soon as possible.
It is the interface for all metadata changes, and it typically runs on the server that hosts the NameNode.


HBase Components
RegionServers
Responsible for serving and managing regions; a RegionServer is to HBase what a DataNode is to a Hadoop cluster.
It serves client requests for data.
It handles the actual data storage and requests.
It sends heartbeats to the Master.
It hosts regions, which are portions of tables.
RegionServers are usually configured to run on the same servers as HDFS DataNodes, which also gives the advantage of data locality.
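
The interplay of heartbeats and failover described on these two slides can be sketched as a master that records the last heartbeat per server and reassigns the regions of any server that has gone silent. This is a rough model under stated assumptions: the class `Master`, the fixed timeout, and the round-robin reassignment are all hypothetical and do not reflect how HBase actually detects failures or balances regions.

```python
class Master:
    """Toy master: tracks heartbeats and reassigns regions of servers that go silent."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_heartbeat = {}  # server name -> time of last heartbeat
        self.assignments = {}     # region name -> server name

    def heartbeat(self, server, now):
        self.last_heartbeat[server] = now

    def assign(self, region, server):
        self.assignments[region] = server

    def check_failover(self, now):
        """Move regions away from servers whose heartbeat is older than the timeout."""
        live = sorted(
            server
            for server, seen in self.last_heartbeat.items()
            if now - seen <= self.timeout
        )
        if not live:
            return []
        moved = []
        for region, server in sorted(self.assignments.items()):
            if server not in live:
                # Round-robin over the live servers, purely for illustration.
                self.assignments[region] = live[len(moved) % len(live)]
                moved.append(region)
        return moved


master = Master(timeout=10)
master.heartbeat("rs1", now=0)
master.heartbeat("rs2", now=0)
master.assign("region-a", "rs1")
master.assign("region-b", "rs2")
master.heartbeat("rs1", now=20)          # rs2 stays silent
moved = master.check_failover(now=25)    # region-b is moved to rs1
```

The sketch also shows why the cluster keeps serving for a while without the Master: existing assignments stay valid, and only failover and reassignment stop.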


HBase Components
ZooKeeper
ZooKeeper is open source software providing a highly reliable, distributed coordination service.
It is the entry point for an HBase system.
Its duties include tracking region servers and where the root region is hosted.
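
The tracking role can be sketched with a small registry in which some entries belong to a session and disappear when that session ends, similar in spirit to ZooKeeper's ephemeral nodes. This is an in-memory stand-in, not the ZooKeeper client API: the class `CoordinationService`, the paths, and the host names are illustrative assumptions.

```python
class CoordinationService:
    """Toy stand-in for a coordination service with session-bound (ephemeral) entries."""

    def __init__(self):
        self.nodes = {}  # path -> (value, owning session or None for persistent entries)

    def create(self, path, value, session=None):
        self.nodes[path] = (value, session)

    def get(self, path):
        return self.nodes[path][0]

    def children(self, prefix):
        return sorted(path for path in self.nodes if path.startswith(prefix + "/"))

    def expire_session(self, session):
        """Drop every entry owned by a session, as happens when a server dies."""
        self.nodes = {
            path: entry for path, entry in self.nodes.items() if entry[1] != session
        }


zk = CoordinationService()
# Illustrative paths only; they are not taken from the slides.
zk.create("/hbase/rs/rs1", "host1", session="session-1")
zk.create("/hbase/rs/rs2", "host2", session="session-2")
zk.create("/hbase/root-region-server", "host1")

live_before = zk.children("/hbase/rs")
zk.expire_session("session-2")  # rs2 dies, its entry vanishes
live_after = zk.children("/hbase/rs")
```

A client that starts from this registry can learn which region servers are alive and where the root region is hosted, which is why the slide calls the service the entry point.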


HBase Components
API
Interface to HBase.
Using these APIs we can access HBase and perform read/write and other operations on HBase.
REST, Thrift, and Avro.
The Thrift API framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi and other languages.
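
The kind of read/write operations these interfaces expose can be sketched with an in-memory table offering put, get, scan and delete. This is a toy model and explicitly not the HBase client API or any of its gateways: the class `ToyTable`, its method signatures, and the `family:qualifier` column strings are assumptions for illustration.

```python
class ToyTable:
    """In-memory stand-in for a table; not the HBase client API."""

    def __init__(self):
        self.rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        return dict(self.rows.get(row_key, {}))

    def scan(self, start_row, stop_row):
        """Yield (row_key, columns) in sorted key order for start_row <= key < stop_row."""
        for key in sorted(self.rows):
            if start_row <= key < stop_row:
                yield key, dict(self.rows[key])

    def delete(self, row_key):
        self.rows.pop(row_key, None)


users = ToyTable()
users.put("user#001", "info:name", "alice")
users.put("user#002", "info:name", "bob")
users.put("user#010", "info:name", "carol")
users.delete("user#002")

first_nine = list(users.scan("user#001", "user#010"))
```

Scans return rows in sorted key order over a half-open key range, which ties back to regions being sorted, contiguous ranges of rows.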


Thank You