Fatehjeet Sra
CSE, IIT Mandi
TEAM:
Khushpreet Singh Ritish Rana
CSE, IIT Mandi
Aim
Build a prototype of a cost-effective cluster of distributed GraphChi nodes that
can perform large-scale graph computations within performance and latency
bounds.
Objectives
Configure Neo4j (also disk-based) on a single machine and run a multi-hop query on
a large data set.
Run the same query on a configured GraphChi node.
Compare the performance of Neo4j and GraphChi.
Develop a cluster of GraphChi nodes and port it to small-form-factor nodes such as
Raspberry Pis (or other ARM-based devices). A Raspberry Pi is a low-cost,
credit-card-sized computer.
Finally, test the performance of the cluster by running the same multi-hop query
and report the best results obtained.
Configuring Neo4j
Download the latest release from http://neo4j.com/download
Linux Service:
sudo ./bin/neo4j-installer install
sudo service neo4j-service status
sudo service neo4j-service start
sudo service neo4j-service stop
Server Start:
./bin/neo4j console
Neo4j Shell:
./bin/neo4j-shell -readonly -path path/to/neo4j-db
Configuring GraphChi
Header-only library (no installation required)
Example applications were built with make apps
Compiled executables are placed in bin/apps/
Graph Datasets
Reference: http://snap.stanford.edu/data/
Format Conversion
GraphChi reads graphs in two formats:
EdgeListFormat: <src> <dst> <value>
AdjListFormat: <src> <listcount> <d1> <d2> <d3> ...
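The relationship between the two formats can be illustrated with a small conversion sketch. This is not part of GraphChi; the function name edgelist_to_adjlist is hypothetical, and the handling of '#' comment lines is an assumption made because SNAP data files begin with such header lines. Edge values are dropped, and vertices without outgoing edges get no line of their own.

```python
from collections import defaultdict


def edgelist_to_adjlist(lines):
    """Convert edge-list lines ("<src> <dst> [<value>]") to adjacency-list lines.

    Hypothetical helper for illustration only; it is not a GraphChi API.
    """
    neighbours = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and '#' comments (assumed SNAP-style headers).
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        src, dst = fields[0], fields[1]  # an optional edge value is ignored
        neighbours[src].append(dst)
    # One output line per source vertex: <src> <listcount> <d1> <d2> ...
    return [f"{src} {len(dsts)} {' '.join(dsts)}" for src, dsts in neighbours.items()]


if __name__ == "__main__":
    sample = ["# toy graph", "1 2", "1 3", "2 3 0.5"]
    for row in edgelist_to_adjlist(sample):
        print(row)
```

For the toy input, vertex 1 has two out-neighbours and vertex 2 has one, so the output lines are "1 2 2 3" and "2 1 3".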
Execution on GraphChi
Build and Run
bin/apps/pagerank file GRAPH-NAME
If the graph has not been preprocessed, the program will ask for the
format of the graph (edgelist or adjlist)
Algorithms Run
PageRank
The application prints the IDs of the top 20 vertices with the highest PageRank.
Connected Components
The application produces the output file GRAPHNAME_components.txt, in which each line has
<Component ID>, <No_of_Vertices>
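To make the two outputs concrete, here is a minimal in-memory sketch of both computations. It is not GraphChi's implementation (GraphChi processes the graph from disk); the function names, the damping factor of 0.85, the fixed iteration count, and the choice of the smallest vertex ID as the component ID are all assumptions for illustration.

```python
from collections import Counter


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (src, dst) pairs."""
    vertices = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices without out-edges is spread evenly over all vertices.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in vertices}
        for v in vertices:
            for dst in out_links[v]:
                new_rank[dst] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


def top_k(rank, k=20):
    """Return the k vertex IDs with the highest rank, best first."""
    return sorted(rank, key=rank.get, reverse=True)[:k]


def component_sizes(edges):
    """Union-find over undirected edges; returns {component_id: vertex_count}.

    The component ID is the smallest vertex ID in the component (an assumption).
    """
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # keep the smaller ID as root
    return dict(Counter(find(v) for v in list(parent)))


if __name__ == "__main__":
    toy_edges = [(1, 3), (2, 3), (3, 1), (4, 5)]
    print("top vertices:", top_k(pagerank(toy_edges), k=20))
    for component_id, count in component_sizes(toy_edges).items():
        print(f"{component_id}, {count}")
```

The last loop prints one "<Component ID>, <No_of_Vertices>" pair per line, mirroring the layout described above.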
Execution on Neo4j
Interaction with the database is done using Cypher, which can be
used from the Neo4j Shell or the browser-based interface.
Sample Query:
MATCH (n) RETURN (n) LIMIT 500 ;
Performance Comparison
Data: Twitter Social NW [81,306 nodes, 1,768,149 edges] <Directed>
GraphChi
Query: PageRank
Format: EdgeList
Neo4j
Performance Comparison
Data: Twitter Social NW [81,306 nodes, 1,768,149 edges] <Directed>
GraphChi
Query: Connected Components
Format: EdgeList
Neo4j
Performance Comparison
Data: CA Road NW [1,965,206 nodes, 5,533,214 edges] <Undirected>
GraphChi
Neo4j
Figure annotations: Pretty Fast! / Fast Enough! / Crashed!
Inference
GraphChi has a slight edge over Neo4j in terms of performance.
The Neo4j GUI cannot handle a large number of vertices (>1000).
Neo4j computation is inefficient for relatively large graphs.
Cypher queries are relatively slow.
Future Plan
Even Semester (Feb-May 2015)
Develop a cluster of networked GraphChi nodes.
Port the cluster to Raspberry Pis.
Run the same query on the cluster.
Analyze and report the results.