
HBase – Coprocessors

Mingjie Lai, Trend Micro


HUG NYC,
Oct. 11, 2010
What are Coprocessors
● Inspired by Google Bigtable Coprocessors (Jeff
Dean's keynote talk at LADIS 09)
● Arbitrary code that runs at each tablet in the tablet server
● High-level call interface for clients
– Calls are addressed to rows or ranges of rows; the coprocessor client
library resolves them to actual locations
– Calls across multiple rows are automatically split into multiple
parallelized RPCs
● Very flexible model for building distributed services
– automatic scaling, load balancing, request routing for app
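The call model above (a client addresses a row range, the library resolves it to locations and issues one call per location in parallel) can be pictured with a small in-memory sketch. Nothing here is Bigtable or HBase API: `RegionSketch`, `RangeCallSketch`, `locate` and `exec` are hypothetical names, regions are plain sorted maps, and the "RPC" is a local asynchronous task. The sketch also assumes a first region with start key "" exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical stand-in for one region (tablet): a sorted slice of the table.
final class RegionSketch {
    final String startKey;                                  // inclusive; "" marks the first region
    final TreeMap<String, String> rows = new TreeMap<>();
    RegionSketch(String startKey) { this.startKey = startKey; }
}

// Hypothetical client library: resolves a row range to regions and fans the call out.
final class RangeCallSketch {
    private final TreeMap<String, RegionSketch> regionsByStartKey = new TreeMap<>();

    void addRegion(RegionSketch region) { regionsByStartKey.put(region.startKey, region); }

    // Stores a row in the region whose start key is the greatest one <= row.
    void put(String row, String value) {
        regionsByStartKey.floorEntry(row).getValue().rows.put(row, value);
    }

    // Regions that may hold rows in [startRow, endRow).
    List<RegionSketch> locate(String startRow, String endRow) {
        String first = regionsByStartKey.floorKey(startRow);
        if (first == null) first = startRow;                // no region starts before startRow
        return new ArrayList<>(regionsByStartKey.subMap(first, true, endRow, false).values());
    }

    // Runs `call` once per region, in parallel, over that region's rows in the range.
    // The result is keyed by region start key.
    <R> Map<String, R> exec(String startRow, String endRow,
                            Function<Map<String, String>, R> call) {
        List<RegionSketch> targets = locate(startRow, endRow);
        List<CompletableFuture<R>> pending = new ArrayList<>();
        for (RegionSketch region : targets) {
            Map<String, String> slice = region.rows.subMap(startRow, endRow);
            pending.add(CompletableFuture.supplyAsync(() -> call.apply(slice)));
        }
        Map<String, R> results = new TreeMap<>();
        for (int i = 0; i < targets.size(); i++) {
            results.put(targets.get(i).startKey, pending.get(i).join());
        }
        return results;
    }
}
```

A caller would write something like `client.exec("a", "zz", rows -> rows.size())` and receive one partial count per region; combining the partial results is left to the caller.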
Current Status
● Umbrella case: HBASE-2000
● Coprocessor framework – HBASE-2001:
● Includes RegionObserver, CommandTarget, CP class
loading
● Code submitted for review
● Will be committed to TRUNK very soon, for the 0.92 release
● Client side support – HBASE-2002:
● Dynamic RPC, between clients and region servers
● Code submitted for review
● Will be committed to TRUNK soon, for the 0.92 release
● The first coprocessor application – HBASE-3025 and
HBASE-3045: Coprocessor-based access control
● Code complete
RegionObserver
● If a coprocessor implements this interface, it will be
interposed in all region actions via upcalls
● Provides hooks for client side requests: HTable.get(), put(),
exists(), delete(), scannerOpen(), checkAndPut(), etc.
● Chaining of multiple observers (by priority)
● The first coprocessor application – HBase access control –
is built on top of it
● More extensions can be built on top of RegionObserver
– Secondary indexes
– Filters
● How to develop a RegionObserver
● No new client API defined for RegionObserver
● Need to implement RegionObserver interface and override
upcall methods: preGet(), postGet(), prePut(), postPut(), etc.
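The pre/post upcall pattern and priority-ordered chaining described on this slide can be sketched without any HBase dependency. The interface below is a deliberately reduced, hypothetical stand-in (`ObserverSketch`, with only `prePut`/`postPut`); the real RegionObserver has more hooks and different signatures. Two assumptions are made for illustration only: a lower priority value runs first, and a `prePut` returning false vetoes the operation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified observer interface (not the HBase RegionObserver).
interface ObserverSketch {
    int priority();                                          // assumption: lower value runs first
    default boolean prePut(String row, String value) { return true; }  // false vetoes the put
    default void postPut(String row, String value) { }
}

// Hypothetical region that interposes registered observers around put().
final class ObservedRegionSketch {
    private final List<ObserverSketch> observers = new ArrayList<>();
    private final Map<String, String> store = new TreeMap<>();

    void register(ObserverSketch observer) {
        observers.add(observer);
        observers.sort(Comparator.comparingInt(ObserverSketch::priority));
    }

    boolean put(String row, String value) {
        for (ObserverSketch o : observers) {
            if (!o.prePut(row, value)) return false;         // chain stops at the first veto
        }
        store.put(row, value);
        for (ObserverSketch o : observers) {
            o.postPut(row, value);
        }
        return true;
    }

    String get(String row) { return store.get(row); }
}

// Example observer in the spirit of access control: rejects writes under a prefix.
final class PrefixGuardObserver implements ObserverSketch {
    private final String protectedPrefix;
    PrefixGuardObserver(String protectedPrefix) { this.protectedPrefix = protectedPrefix; }
    @Override public int priority() { return 0; }
    @Override public boolean prePut(String row, String value) {
        return !row.startsWith(protectedPrefix);
    }
}

// Example observer that records every successful put.
final class AuditObserver implements ObserverSketch {
    final List<String> log = new ArrayList<>();
    @Override public int priority() { return 10; }
    @Override public void postPut(String row, String value) { log.add(row); }
}
```

Note that the client code calling `put` does not change when observers are added, which matches the point that RegionObserver needs no new client API.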
RegionObserver

[Figure: diagram with the components "Client requests", "Region server", "CP framework" and "RegionObserver"]


CommandTarget
● CommandTarget with Dynamic RPC provides a way to
define one's own protocol communicated between
client and region server, and execute arbitrary code at
region server
● CommandTarget methods are triggered by calling the
client-side dynamic RPC methods –
HTable.coprocessorExec(...), etc.
● How to develop
● Define a protocol interface (extends CoprocessorProtocol)
● Implement this protocol interface
● Extend BaseCommandTarget: the protocol will be automatically
registered at coprocessor load
● On client side, the CommandTarget can be triggered by:
– HTable.coprocessorProxy() - single region
– HTable.coprocessorExec() - region range
Dynamic RPC: a sample
Given a CoprocessorProtocol:
public interface CountProtocol extends CoprocessorProtocol {
    int getRowCount();
}
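One way to picture how a call on such a protocol interface could reach a region-side implementation is with a standard Java dynamic proxy. The sketch below is an in-memory simulation, not the HBase implementation: `ProtocolSketch` stands in for CoprocessorProtocol, `RegionServerSketch.dispatch` stands in for the network hop, and every class name ending in `Sketch` is hypothetical. Only `java.lang.reflect.Proxy` and `Method` are real library APIs.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Stand-in marker interface; in HBase this role is played by CoprocessorProtocol.
interface ProtocolSketch { }

// Mirrors the CountProtocol sample from the slide.
interface CountProtocolSketch extends ProtocolSketch {
    int getRowCount();
}

// Region-side implementation of the protocol.
final class CountEndpointSketch implements CountProtocolSketch {
    private final Map<String, String> rows;
    CountEndpointSketch(Map<String, String> rows) { this.rows = rows; }
    @Override public int getRowCount() { return rows.size(); }
}

// Hypothetical region server: keeps protocol -> implementation and dispatches by method name.
final class RegionServerSketch {
    private final Map<Class<?>, Object> endpoints = new HashMap<>();

    <T extends ProtocolSketch> void register(Class<T> protocol, T implementation) {
        endpoints.put(protocol, implementation);
    }

    // In a real system the arguments would arrive serialized over the wire.
    Object dispatch(Class<?> protocol, String methodName, Class<?>[] paramTypes, Object[] args) {
        Object implementation = endpoints.get(protocol);
        if (implementation == null) {
            throw new IllegalStateException("protocol not registered: " + protocol.getName());
        }
        try {
            Method method = protocol.getMethod(methodName, paramTypes);
            return method.invoke(implementation, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("dispatch failed for " + methodName, e);
        }
    }
}

// Hypothetical client side: builds a stub whose method calls are forwarded to the server.
final class DynamicRpcClientSketch {
    static <T extends ProtocolSketch> T proxy(Class<T> protocol, RegionServerSketch server) {
        Object stub = Proxy.newProxyInstance(
                protocol.getClassLoader(),
                new Class<?>[] { protocol },
                (self, method, args) -> server.dispatch(
                        protocol, method.getName(), method.getParameterTypes(), args));
        return protocol.cast(stub);
    }
}
```

With this in place, client code only sees the protocol interface: it obtains a stub via `DynamicRpcClientSketch.proxy(CountProtocolSketch.class, server)` and calls `getRowCount()` on it as if it were local.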
Coprocessors Class Loading
● Load from configuration: set coprocessor class
names in the HBase configuration
● hbase.coprocessor.default.classes
● Class names are comma separated
● They will be picked up when region is opened, as default
coprocessors
● Load from table attributes
● Utilize table attribute: a path (e.g. HDFS URI) to jar file
● Loaded when region is opened
● We can utilize CommandTarget to have a way to load
coprocessors on demand
● Security is the biggest concern
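The configuration-driven path (a comma-separated list of class names, instantiated when a region opens) can be sketched with plain Java reflection. The property name comes from the slide; everything else, including `CoprocessorLoaderSketch`, `loadFromConfig` and the use of `java.util.Properties` in place of the HBase configuration object, is a hypothetical illustration rather than the framework's loader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical loader; not the HBase coprocessor class-loading code.
final class CoprocessorLoaderSketch {
    // Property name as given on the slide.
    static final String DEFAULT_CLASSES_KEY = "hbase.coprocessor.default.classes";

    // Instantiates every class named in the comma-separated property value.
    // Each class is assumed to have a no-argument constructor.
    static List<Object> loadFromConfig(Properties conf, ClassLoader loader) {
        List<Object> loaded = new ArrayList<>();
        String classNames = conf.getProperty(DEFAULT_CLASSES_KEY);
        if (classNames == null) {
            return loaded;                                   // nothing configured
        }
        for (String name : classNames.split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue;                                    // tolerate stray commas
            }
            try {
                Class<?> cls = Class.forName(trimmed, true, loader);
                loaded.add(cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("cannot load coprocessor class " + trimmed, e);
            }
        }
        return loaded;
    }
}
```

The `ClassLoader` parameter is where the table-attribute variant would differ: a loader pointed at the jar named in the attribute (for example a `java.net.URLClassLoader`) could be passed instead of the default one. Running arbitrary loaded code this way is exactly why the slide flags security as the main concern.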
Next Steps
● See HBASE-2000 and subtasks
● Framework
● MapReduce
– Runs concurrently on all regions of the table
– Like Hadoop MapReduce: Mappers, reducers, partitioners,
intermediates
– Not table MapReduce, parallel region MapReduce
● Code weaving
– Arbitrary code execution is allowed right now
– Use a rewriting framework like ASM to weave in policies at load time
– Improve fault isolation and system integrity protections
– Wrap heap allocations to enforce limits
– Monitor CPU time
– Reject APIs considered unsafe
● On-demand coprocessor class loading
Next Steps
● Applications
● HBase access control: HBASE-3025, HBASE-3045.
● Aggregate: HBASE-1512
● Region level indexing: HBASE-2038
● Table metacolumns: HBASE-2893
● Secondary indexing?
● New Filtering?
Q&A
