Short Bio
Research interests:
Collaborators
Outline
Proposed work
                  | Capacity   | Single unit size limit | Latency     | Scalability | Resilience       | Query & indexing
File systems      | Very large | Large                  | O(10) ms    | Low         | Medium           | No
SQL databases     | Large      | Various small          | O(10) ms    | Very low    | Very high (ACID) | Best
NoSQL data stores | Large      | Various small          | O(0.1~1) ms | High        | High             | Good
Limitations of Current NoSQL DBs
Performance
Portability
High Performance
Low Latency
High Throughput
1 Journal paper
A Convergence of Distributed Key-Value Storage in Cloud Computing and Supercomputing, Journal of Concurrency and Computation Practice and Experience (CCPE) 2015
5 Conference papers
A Flexible QoS Fortified Distributed Key-Value Storage System for the Cloud, IEEE Big Data 2015
Distributed NoSQL Storage for Extreme-Scale System Services, SC 15, PhD showcase
A Dynamically Scalable Cloud Data infrastructure for Sensor Networks, ScienceCloud 2015
Scalable State Management for Scientific Applications in the Cloud, IEEE BigData 2014
ZHT: A Light-weight Reliable Persistent Dynamic Scalable Zero-hop Distributed Hash Table, IEEE IPDPS 2013
3 research posters
GRAPH/Z: A Key-Value Store Based Scalable Graph Processing System, IEEE Cluster 2015
A Cloud-based Interactive Data Infrastructure for Sensor Networks, IEEE/ACM SC 2014
Exploring Distributed Hash Tables in High-End Computing, ACM SIGMETRICS PER 2011
Co-author peer-reviewed Publications
2 Journal papers
4 Conference papers
Overcoming Hadoop scaling limitations through distributed task execution, IEEE Cluster 2015
FaBRiQ: Leveraging Distributed Hash Tables towards Distributed Publish-Subscribe Message Queues, IEEE/ACM BDC 2015
FusionFS: Towards Supporting Data-Intensive Scientific Applications on Extreme-Scale High-Performance Computing Systems, IEEE Big Data 2014
Optimizing Load Balancing and Data-Locality with Data-aware Scheduling, IEEE Big Data 2014
ZHT Features
Unconventional operations
Append, Compare_swap, Change_callback
Fault tolerance
Replication
Strong or eventual consistency
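To make the unconventional operations concrete, here is a minimal single-node sketch in Python. It is not ZHT's API: the class name ToyKVStore, the method signatures, and the exact semantics (in particular, that Change_callback fires on every change to a key) are assumptions made for illustration. Replication and consistency are left out.

```python
from typing import Callable, Dict, List, Optional


class ToyKVStore:
    """In-memory stand-in for one key-value server (hypothetical, not ZHT's API)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._callbacks: Dict[str, List[Callable[[str, str], None]]] = {}

    def insert(self, key: str, value: str) -> None:
        self._data[key] = value
        self._notify(key)

    def lookup(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def append(self, key: str, fragment: str) -> None:
        """Concatenate onto the stored value in one server-side step."""
        self._data[key] = self._data.get(key, "") + fragment
        self._notify(key)

    def compare_swap(self, key: str, expected: Optional[str], new: str) -> bool:
        """Store `new` only if the current value equals `expected`."""
        if self._data.get(key) != expected:
            return False
        self._data[key] = new
        self._notify(key)
        return True

    def change_callback(self, key: str, fn: Callable[[str, str], None]) -> None:
        """Register `fn(key, value)` to run whenever `key` changes (assumed semantics)."""
        self._callbacks.setdefault(key, []).append(fn)

    def _notify(self, key: str) -> None:
        for fn in self._callbacks.get(key, []):
            fn(key, self._data[key])
```

In this sketch, append avoids a client-side read-modify-write, and compare_swap gives a client a way to detect that another writer got there first.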
[Figure: clients 1 to n hash keys j and k onto nodes 1, 2, ..., n-1, n; Value j and Value k are each stored as Replica 1, Replica 2, and Replica 3 on different nodes]
ZHT architecture
Topology
Server architecture
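The zero-hop idea can be sketched as follows: a client that knows the full membership list hashes the key and addresses the owning node directly, with no intermediate routing. The details here are assumptions for illustration only: SHA-256 as the hash, modulo placement, three copies in total, and replicas placed on the next nodes in index order. The function names are hypothetical.

```python
import hashlib
from typing import List


def node_index(key: str, num_nodes: int) -> int:
    """Map a key straight to a node index; no routing hops are involved."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def replica_nodes(key: str, num_nodes: int, replicas: int = 3) -> List[int]:
    """Return the primary node followed by its successors (assumed placement)."""
    primary = node_index(key, num_nodes)
    count = min(replicas, num_nodes)  # cannot place more copies than nodes
    return [(primary + i) % num_nodes for i in range(count)]


if __name__ == "__main__":
    for key in ("key-j", "key-k"):
        print(key, "->", replica_nodes(key, num_nodes=8))
```

Because every client computes the same mapping locally, a lookup costs one network round trip to the primary as long as the membership list is current.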
Name      | Impl. | Routing Time | Persistence | Dynamic membership | Additional Operation
Cassandra | Java  | Log(N)       | Yes         | Yes                | No
C-MPI     | C/MPI | Log(N)       | No          | No                 | No
Dynamo    | Java  | 0 to Log(N)  | Yes         | Yes                | No
Memcached |       |              | No          | No                 | No
ZHT       | C++   | 0 to 2       | Yes         | Yes                | Yes
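A quick calculation shows what the routing-time column means at scale. This sketch assumes that "Log(N)" is a base-2 logarithm of the node count with a constant factor of 1, which the table does not state, and takes 2 as the worst case of the "0 to 2" range. The function name is hypothetical.

```python
def log_routing_hops(num_nodes: int) -> int:
    """Ceiling of log2(num_nodes), a stand-in for 'Log(N)' routing (assumed base 2)."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be positive")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)) without float error.
    return (num_nodes - 1).bit_length()


ZERO_HOP_WORST_CASE = 2  # upper end of the "0 to 2" range in the table

if __name__ == "__main__":
    for n in (64, 1_000_000):
        print(f"{n} nodes: Log(N) routing ~{log_routing_hops(n)} hops, "
              f"zero-hop at most {ZERO_HOP_WORST_CASE}")
```

Under these assumptions, logarithmic routing needs about 6 hops at 64 nodes and about 20 at 1M nodes, while the zero-hop bound stays constant.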
Evaluation
Test beds
Commodity Cluster
Up to 64 nodes
Amazon EC2
Performance on supercomputer: Intrepid, IBM BG/P
[Figure: latency (ms) vs. number of nodes]
Performance on a cloud: Amazon EC2
[Figure: latency (ms) vs. number of nodes (1 to 96); the legend includes DynamoDB]
Performance on a commodity cluster: HEC
[Figure: latency (ms) vs. scale (1 to 64 nodes) for ZHT, Cassandra, and Memcached]
Efficiency: Simulation with 1M nodes
[Figure: efficiency (0% to 100%); series: Efficiency (ZHT), Efficiency (Simulation), Difference, Efficiency (Memcached); labeled values range from 0% to 15%]
[Figure: clients send requests through client proxies to K/V store servers running on physical servers in a data center. Inside a proxy, the Request Handler passes key-value pairs to the Batching Strategy Engine, which chooses a strategy (strategies are plugins) and fills batch buckets B1 to Bn. The Condition Monitor & Sender checks the sending condition and sends batches; servers unpack and insert them. Returned batch results go to the Response buffer, where results are checked, and the Result Service supplies latency feedback.]
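The batching path in the figure can be sketched roughly as below. This is an illustration under stated assumptions, not the system's implementation: it assumes one bucket per destination server and a single fixed strategy that sends a bucket when it is full or when its oldest request has waited too long. The names BatchingProxy, max_batch, max_wait_s, and the send callable are all hypothetical.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]


class BatchingProxy:
    """Client-side proxy that groups key-value requests into per-server batches."""

    def __init__(self, send: Callable[[int, List[Pair]], None],
                 max_batch: int = 4, max_wait_s: float = 0.005,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send              # delivers one batch to one server
        self._max_batch = max_batch    # size condition
        self._max_wait_s = max_wait_s  # latency condition
        self._clock = clock
        self._buckets: Dict[int, List[Pair]] = defaultdict(list)
        self._opened_at: Dict[int, float] = {}

    def put(self, server_id: int, key: str, value: str) -> None:
        """Request handler: place the pair in its bucket, then check conditions."""
        bucket = self._buckets[server_id]
        if not bucket:
            self._opened_at[server_id] = self._clock()
        bucket.append((key, value))
        self.check_conditions()

    def check_conditions(self) -> None:
        """Condition monitor: send every bucket that is full or overdue."""
        now = self._clock()
        for server_id in list(self._buckets):
            bucket = self._buckets[server_id]
            if not bucket:
                continue
            full = len(bucket) >= self._max_batch
            overdue = now - self._opened_at[server_id] >= self._max_wait_s
            if full or overdue:
                self._flush(server_id)

    def _flush(self, server_id: int) -> None:
        batch = self._buckets.pop(server_id)
        self._opened_at.pop(server_id, None)
        self._send(server_id, batch)
```

A real proxy would also call check_conditions from a timer so that a partly filled bucket is sent even when no new request arrives, and would pick max_batch and max_wait_s from the latency feedback rather than fixing them.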
[Figure: throughput in ops/s, shown as percentages (0% to 16,000%), for no batching and workload QoS patterns 1 to 4; labeled values: 1,519%, 1,940%, 3,997%, 10,847%, 13,417%. A second panel plots workload patterns against request latency in ms (0.1, 0.5, 5.0, 50.0).]
Metadata performance:
Weak scaling: creating 10k empty files in 1 directory on each node
Matrix
Falkon
WaggleDB Architecture
DOE DE-AC02-05CH11231
NSF No. 0910812
FRIEDA-State architecture
Conclusion
Lessons Learned
Decentralized architecture
High Performance
Excellent scalability
Improved fault tolerance
Simplicity
Light-weight design
Little reliance on complex software stack
Easy adoption as building block for more complex systems
Scalability
Fault tolerance
Thank you!
Q&A