HUG NYC, Oct. 11, 2010

What are Coprocessors
● Inspired by Google Bigtable Coprocessors (Jeff Dean's keynote talk at LADIS '09)
● Arbitrary code that runs at each tablet in the table server
● High-level call interface for clients
  – Calls are addressed to rows or ranges of rows; the coprocessor client library resolves them to actual locations
  – Calls across multiple rows are automatically split into multiple parallelized RPCs
● Very flexible model for building distributed services
  – Automatic scaling, load balancing, and request routing for the app

Current Status
● Umbrella case: HBASE-2000
● Coprocessor framework – HBASE-2001:
  ● Includes RegionObserver, CommandTarget, CP class loading
  ● Code submitted for review
  ● Will be committed to TRUNK very soon, and to the 0.92 release
● Client side support – HBASE-2002:
  ● Dynamic RPC between clients and region servers
  ● Code submitted for review
  ● Will be committed to TRUNK soon, and to the 0.92 release
● The first Coprocessor application – HBASE-3025 and 3045: Coprocessor based access control
  ● Code complete

RegionObserver
● If a coprocessor implements this interface, it will be interposed in all region actions via upcalls
● Provides hooks for client side requests: HTable.get(), put(), exists(), delete(), scannerOpen(), checkAndPut(), etc.
● Chaining of multiple observers (by priority)
● The first coprocessor application – HBase access control – is built on top of it
● More extensions can be built on top of RegionObserver
  – Secondary indexes
  – Filters
● How to develop a RegionObserver
  ● No new client API is defined for RegionObserver
  ● Implement the RegionObserver interface and override the upcall methods: preGet(), postGet(), prePut(), postPut(), etc.

RegionObserver
[Diagram labels: client requests, region server, CP framework, RegionObserver]
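
The pre/post upcalls and priority chaining described on the RegionObserver slides can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `GetObserver`, `Region`, `register()` and the "lower number runs first" priority rule are hypothetical names and choices made for illustration, and none of it is the HBase `RegionObserver` API, whose method signatures are not shown in these slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of observer upcalls. NOT the HBase API.
public class ObserverChainSketch {

    /** Stand-in for an observer interface: hooks that run before and after a get. */
    interface GetObserver {
        /** Return false to veto the request (for example, access denied). */
        default boolean preGet(String row) { return true; }

        /** May replace the value that is handed back to the client. */
        default String postGet(String row, String value) { return value; }
    }

    /** Pairs an observer with the priority it was registered under. */
    private static final class Registered {
        final int priority;
        final GetObserver observer;

        Registered(int priority, GetObserver observer) {
            this.priority = priority;
            this.observer = observer;
        }
    }

    /** Stand-in for a region that interposes registered observers on get(). */
    static class Region {
        private final Map<String, String> store = new HashMap<>();
        private final List<Registered> chain = new ArrayList<>();

        void put(String row, String value) { store.put(row, value); }

        /** Assumption: a lower number means higher priority, so it runs first. */
        void register(int priority, GetObserver observer) {
            chain.add(new Registered(priority, observer));
            chain.sort((a, b) -> Integer.compare(a.priority, b.priority));
        }

        /** Runs every preGet hook, then the read, then every postGet hook. */
        String get(String row) {
            for (Registered r : chain) {
                if (!r.observer.preGet(row)) {
                    return null; // vetoed: later observers and the read are skipped
                }
            }
            String value = store.get(row);
            for (Registered r : chain) {
                value = r.observer.postGet(row, value);
            }
            return value;
        }
    }

    public static void main(String[] args) {
        Region region = new Region();
        region.put("public:1", "hello");
        region.put("secret:1", "classified");

        List<String> audited = new ArrayList<>();
        // Registered first but with a larger number, so it runs second.
        region.register(10, new GetObserver() {
            @Override public boolean preGet(String row) { audited.add(row); return true; }
        });
        // Access-control style observer: runs first and can veto the request.
        region.register(0, new GetObserver() {
            @Override public boolean preGet(String row) { return !row.startsWith("secret:"); }
        });

        System.out.println(region.get("public:1"));
        System.out.println(region.get("secret:1"));
        System.out.println(audited);
    }
}
```

In this model the client still just calls `get()`, mirroring the slide's point that RegionObserver needs no new client API: the hooks run on the server side of the call.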
CommandTarget
● CommandTarget with Dynamic RPC provides a way to define one's own protocol between client and region server, and to execute arbitrary code at the region server
● CommandTarget methods are triggered by calling a dynamic RPC client side method
  – HTable.coprocessorExec(...), etc.
● How to develop
  ● Define a protocol interface (extends CoprocessorProtocol)
  ● Implement this protocol interface
  ● Extend BaseCommandTarget: the protocol will be automatically registered at coprocessor load
● On the client side, the CommandTarget can be triggered by:
  – HTable.coprocessorProxy() - single region
  – HTable.coprocessorExec() - region range

Dynamic RPC: a sample
Given CoprocessorProtocol:

    public interface CountProtocol extends CoprocessorProtocol {
        int getRowCount();
    }

Coprocessors Class Loading
● Load from configuration: set coprocessor class names in the HBase configuration
  ● hbase.coprocessor.default.classes
  ● Class names are comma separated
  ● They will be picked up when a region is opened, as default coprocessors
● Load from table attributes
  ● Utilize a table attribute: a path (e.g. HDFS URI) to a jar file
  ● Loaded when the region is opened
● CommandTarget could be used as a way to load coprocessors on demand
● Security is the biggest concern

Next Steps
● See HBASE-2000 and subtasks
● Framework
  ● MapReduce
    – Runs concurrently on all regions of the table
    – Like Hadoop MapReduce: mappers, reducers, partitioners, intermediates
    – Not table MapReduce, parallel region MapReduce
  ● Code weaving
    – Arbitrary code execution is allowed right now
    – Use a rewriting framework like ASM to weave in policies at load time
    – Improve fault isolation and system integrity protections
    – Wrap heap allocations to enforce limits
    – Monitor CPU time
    – Reject APIs considered unsafe
  ● On demand Coprocessors class loading

Next Steps
● Applications
  ● HBase access control: HBASE-3025, HBASE-3045
  ● Aggregate: HBASE-1512
  ● Region level indexing: HBASE-2038
  ● Table metacolumns: HBASE-2893
  ● Secondary indexing?
  ● New Filtering?

Q&A
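
As a follow-up to the Dynamic RPC sample, the difference between a single-region call and a region-range call can be sketched with an in-memory model. Everything here apart from the `CountProtocol` interface from the slide is an assumption made for illustration: the local `CoprocessorProtocol` marker, `Region`, `Table`, `proxy()` and `exec()` are hypothetical stand-ins and do not reproduce the signatures of `HTable.coprocessorProxy()` or `HTable.coprocessorExec()`, which these slides do not show.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical in-memory model of dynamic RPC fan-out. NOT the HBase client API.
public class DynamicRpcSketch {

    /** Local stand-in for the marker interface named on the slide. */
    interface CoprocessorProtocol {}

    /** The protocol from the slide. */
    interface CountProtocol extends CoprocessorProtocol {
        int getRowCount();
    }

    /** Hypothetical region: holds the rows from its start key up to the next region. */
    static class Region implements CountProtocol {
        final String startKey;
        final List<String> rows;

        Region(String startKey, String... rows) {
            this.startKey = startKey;
            this.rows = Arrays.asList(rows);
        }

        @Override public int getRowCount() { return rows.size(); }
    }

    /** Hypothetical table: regions indexed by start key; the first region starts at "". */
    static class Table {
        private final TreeMap<String, Region> regions = new TreeMap<>();

        void addRegion(Region region) { regions.put(region.startKey, region); }

        /** Single region: resolve the row to the region whose key range contains it. */
        CountProtocol proxy(String row) {
            return regions.floorEntry(row).getValue();
        }

        /** Region range: run the call on every region overlapping [startRow, endRow], in parallel. */
        <R> List<R> exec(String startRow, String endRow, Function<CountProtocol, R> call) {
            String first = regions.floorKey(startRow);
            if (first == null) {
                first = startRow;
            }
            Collection<Region> targets = regions.subMap(first, true, endRow, true).values();
            ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, targets.size()));
            try {
                List<Future<R>> futures = new ArrayList<>();
                for (Region region : targets) {
                    Callable<R> task = () -> call.apply(region);
                    futures.add(pool.submit(task));
                }
                List<R> results = new ArrayList<>();
                for (Future<R> future : futures) {
                    results.add(future.get()); // results stay in region order
                }
                return results;
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException("region call failed", e);
            } finally {
                pool.shutdown();
            }
        }
    }

    /** Client-side aggregation: sum the per-region counts. */
    static int totalRowCount(Table table, String startRow, String endRow) {
        int total = 0;
        for (int count : table.exec(startRow, endRow, CountProtocol::getRowCount)) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        Table table = new Table();
        table.addRegion(new Region("", "a", "b"));
        table.addRegion(new Region("m", "m", "n", "o"));
        table.addRegion(new Region("t", "t"));

        System.out.println(table.proxy("n").getRowCount());
        System.out.println(totalRowCount(table, "a", "z"));
    }
}
```

The per-region results are combined on the client, which is one plausible reading of how a row count over a region range would be assembled from a protocol method that only knows about its own region.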