
Location-based Schema Caching for Client-peer Query Routing in Super-peer Networks

Rozlina Mohamed 1, 2 and Christopher D. Buckingham 1

1 Aston University, Computer Science Department, B4 7ET, UK
2 University Malaysia Pahang, Software Engineering Department, 15150 Malaysia

In this paper, we present the construction of location-based schema caching for query routing at the client-peer in super-peer
networks. This cached information is used for directly routing subsequent repeated queries towards their actual resource locations
without going via the super peer. Instead of caching the previous query and its result, our proposed approach caches the query with
the routing direction, which is the resource location of the previous queries. This means outdated results are avoided. The paper
describes the main processes, including details of the algorithm required for constructing the cached information.

Index Terms— query caching, query rewriting, query routing, super-peer network, peer-to-peer systems

I. INTRODUCTION

Peer-to-peer (P2P) networks are most popularly used as file sharing applications, with Gnutella, Napster and Kazaa among the most famous. Alternative P2P systems include instant messaging, collaborative computing, distributed computing, and platform applications. In P2P systems, a huge number of computers are typically interconnected using Internet Protocol (IP) network numbering, lying on top of physical computer networks. Networking between these computers determines the structure (also known as network topology) of all the computers in a system. In a broad sense, there are three types of P2P systems: pure, centralized and super-peer. This classification is based on the degree of decentralization in processing tasks and sharing resources among participants in the network. We focus on the super-peer network model, which is classified as an optimized decentralized P2P system.

The super-peer network consists of super-peer and client-peer nodes, with nodes decomposed into clusters. For each cluster, one or more nodes are selected to be the super-peer that manages a group of client-peers. Our discussion in this paper focuses on the responsibility of super-peers for query routing, although we are aware that they have additional roles.

Typically, if a new client-peer, pi, wants to join the network, it first has to send its information to a super-peer spi. Then, the super-peer inserts the peer’s information into its routing management for peer information and query routing. Suppose the client-peer pi wants to find some target data. It sends a query to the super-peer spi to request the target data. The super-peer then forwards this query message to another client-peer pj if the super-peer knows that the target data is obtained and shared by pj. Otherwise, the super-peer forwards the query request to the other super-peers connected to it, which contributes to the potential for flooding the system with query messages that consume the CPU resources and bandwidth of peers.

The query forwarding process is widely known as query routing, which is reduced in the super-peer network model such as in , because the super-peer can manage its client-peers and index information; the query routing would go to super-peers that have indexed the target data even though the target data are actually obtained from the client-peers. In this case, the number of routing messages transmitted in the network is reduced.

The authors have proposed additional pre-processing for query routing in order to further improve query routing in super-peer networks. The pre-processing uses a schema cached list (SCL) at the client-peer that enables it to associate the query with data locations and thus route the query directly to the peers without going through the super-peer. In this paper, we discuss the maintenance of our SCL as part of the pre-processing mechanism for query routing. In addition, we identify several key elements that are required for query caching.

II. BACKGROUND & RELATED WORKS

The super-peer network is considered an efficient P2P network model for searching query results. One of the key reasons is the routing index and query routing facilities provided by the super-peer for queries posted by its client-peers. In super-peer network systems, client-peers are connected to their local super-peer to upload their shared data and request query routing directives. Figure 1 illustrates the scenario, where the client-peer p2 queries for data ‘a’ and ‘b’. Data ‘a’ is located at client-peers p1, p3 and p4 while data ‘b’ is located at client-peer p5. Client-peers p3 and p4 are connected to super-peer sp2 while client-peer p5 is connected to super-peer sp3.

Figure 1. Example of P2P network scenario

Suppose a query is given to client-peer p2 for information about ‘a’ and ‘b’. In a conventional super-peer network system, the routing process is started by a request message sent by p2 to its super-peer sp1 as illustrated in Figure 2. Then sp1 consults its index to find peers with the required information and, in this example, sends p2 the routing directions for ‘a’ and ‘b’. Based on these, the message is routed to p1 and sp2, then re-routed to p3, p4 and sp3. The message request for ‘b’ is then re-routed by sp3 to its client-peer p5. This sequence of routing requests is shown in Figure 2. In Figure 3, we illustrate acknowledgement messages sent by the owners of data to p2. These are followed by query messages sent by p2 to the data owners. Remember that the original ‘ab’ query has been decomposed into sub-queries ‘a’ and ‘b’. Once the data owner has processed the sub-query, the results are returned to p2 for further query result manipulation.

Figure 2. Sequence of routing request messages in conventional super-peer networks (legend: request message for ‘a’; request message for ‘b’; consult super-peers’ index)
Figure 3. Acknowledge message, query and result retrieval in conventional super-peer network (legend: acknowledge for ‘a’; query & retrieval for ‘a’; acknowledge for ‘b’)

From the super-peer network scenario presented above, the client-peer will send its query to the locations that have been determined by the super-peer. The routing request message sent by p2 is restricted by the time-to-live (TTL) value that has been set. If the TTL is exceeded before reaching sp3, its descendant peer that possesses data ‘b’ will be excluded from the message routing / propagation. However, setting a high TTL value is not worthwhile because it increases the query routing and risks causing network congestion. This risk is reduced if the amount of message routing in the network is decreased, with a concomitant reduction in the number of querying peers and transmitted messages. Furthermore, query mistakes can be avoided.

Assisted query routing has been widely accepted since the introduction of super-peer networks and, more recently, caching strategies have received significant attention. We separate caching strategies into two approaches: (i) caching the actual data, and (ii) caching the routing direction towards the location of that data.

Caching the actual data is similar to the materialized views approach that has been implemented in federated database systems. In P2P, this concept is adapted in , and Brunkhorst & Dhraief have proposed a model of semantic caching. Semantic caching is used to cache the query and its result based on a schema. This schema-based cached information is located at the super-peer. Thus, the client-peers would be able to retrieve the query result just by sending a query to the super-peer.

Instead of implementing schema-based caching for the previous query and its result at the super-peer, He et al. and Fegaras et al. have proposed an actual data-based indexed model in . Rather than having a routing index, the super-peer in  obtains not only the schema but also the data that belongs to its client-peers.

In brief, the super-peer in the above-mentioned approaches has the ability to answer the query based on existing query results that have been cached. Furthermore, the super-peer is responsible not only for obtaining the indexes of data locations but also for processing the query result. Meanwhile, the super-peer node itself is not very scalable and is likely to be a single point of failure for the client-peers within its cluster.

A fault tolerance module for super-peer failure has been considered in the P2P-DIET project as presented in . In P2P-DIET, a fast encoding peer profile is indexed by the client-peer for ad-hoc query processing. Therefore, client-peers would be able to get their query routing directives without being fully dependent on the super-peer. However, we believe that replicating the routing index from the super-peer is not worthwhile, due to the limited capabilities of client-peers. Additionally, the probability of the whole index being used is not reported by the authors.

If the resources of the database are updated, the use of cached query results may lead to retrieval of outdated information for subsequent queries. Thus, results produced by the super-peer may be obsolete. Therefore, with each peer an autonomous data provider in the P2P environment, the most up-to-date results should come from the actual peers’ resources. These resources can be retrieved by requesting their owners directly. Thus, instead of caching the query and its result, Quan et al. have proposed a query hit message caching approach. The query hit message is cached for subsequent query requests. Therefore, a subsequent query request for the same schema is re-directed to the replicated data instead of every query request being routed to a single resource. On the other hand, schema-based query caching is proposed by Doulkeridis et al. for assisting query forwarding. In , the schema of content located at remote peers is
cached. Thus, the subsequent query can be directly routed to the resource location. However, these approaches are for pure P2P networks, which route queries differently from super-peer networks. In the context of super-peer networks, query caching at the client-peer has so far not been instigated. There is some work on caching in super-peer networks, but it is mainly associated with query caching on super-peers, not peers.

III. SCHEMA CACHED LIST (SCL) FOR QUERY PRE-PROCESSING

Super-peer network systems are usually based on two-phase routing. Firstly, the query is routed within super-peers. Then, for each super-peer, the query is routed within its respective client-peers. Our proposed SCL is piggy-backed on the above-mentioned routing strategies. Super-peers employ the usual routing indices and two-phase routing. However, in our approach, the client-peer is able to determine the routing direction locally, which is part of its pre-processing activities. The SCL is used to obtain the routing direction for client-peer usage because, in contrast to the routing indices held by super-peers, the SCL keeps the routing direction from previous queries that have been performed by the client-peer. Since the query routing direction is locally determined, the existing two-phase routing is shortened into a single routing. Consequently, the query is routed directly to the location of the required data for further retrieval. Based on the same scenario as in Figure 1, Figure 4 illustrates the routing request message from the queried peer using our SCL at client-peer p2. Here, we assume that the routing direction for ‘a’ and ‘b’ has been previously obtained by client-peer p2.

Figure 4. Illustration of our proposed routing request (legend: request message for ‘a’; request message for ‘b’; consult SCL)

A. SCL Locality

In order to appreciate our location-based SCL as the pre-processing for query routing, it is expedient to look at different peer types and their characteristics. Here, we distinguish two types of peer nodes in super-peer networks.

1) Super-peer
The super-peer node is a hub for several peer nodes in a cluster. In this research we assume that only one node is assigned as a super-peer for a cluster of peers, even though some research work has been done on having multiple super-peers. The super-peer is used to maintain the backbone of the P2P network and to facilitate the network of P2P connections. In addition, the super-peer is a dedicated server in the cluster for processing messages. Thus, the index of resource locations, which is used for assisting the routing of messages, is maintained by the super-peer. Therefore, the query message is routed according to this index. Hence, the super-peer is also responsible for query processing on behalf of its respective client-peers.

2) Client-peer
The client-peer can be seen as either a consumer that searches for resources or a data provider. The client-peer is fully dependent on the super-peer for query routing. The fact that the super-peer may fail or suddenly leave the network means that a fault tolerance module for super-peer failure is required to be obtained by the client-peer. In our adapted super-peer network, we are proposing that the client-peer
obtains the capability of locally deciding the query routing direction, instead of being fully dependent on the super-peer. Moreover, local query routing facilities would be able to reduce the super-peer workload, as well as greatly reducing the number of messages being routed in the network and the number of queried peers needed for retrieving the query result.

B. SCL Data Structures

The role of the SCL is to provide the routing direction for every query request from a local client-peer. In order to achieve the correct and efficient execution of requests, each SCL manages two data structures. One stores the schema that it supports and the other stores the resource location information for a particular schema; they are called schemaTable and resourceTable respectively. A view of these data structures is shown in Figure 5.

Figure 5. Data structures of schemaTable and resourceTable (schemaTable records: key, list of pointers; resourceTable records: key, resource)

The data structure for schemaTable and resourceTable is a Java LinkedHashMap. The ‘key’ for each record in schemaTable is a schema supported by the SCL while a ‘List of pointers’ points to the related resourceTable record(s). Each record of resourceTable contains information on one resource, such as the IP address, port number, filename and path for a particular schema. The use of a Java LinkedHashMap is well-suited for Least Recently Used (LRU) cached data. LRU is one of the most commonly used cache replacement policies. Three common cache policies are based on access order, recency, and frequency. LRU is based on the temporal locality principle, where items which have not been used for the longest time are replaced when the initial space that has been specified is exceeded. LRU is chosen because of its good and stable performance for file access times.

C. Answering Queries Using SCL

Location-based schema caching simply stores resource locations that have been extracted from previous queries for routing subsequent local queries. The stored resource location is decomposed into several attributes, which are the IP address, port number, filename and path, that are linked to a particular schema supported by the local SCL. For our approach, we assume that the query is only conjunctive, that is, For-Let-Where-Return (FLWR) clauses in XQuery grammar. In order to answer the incoming query, the client-peer has to do some routing pre-processing.

The local routing pre-processing consists of decomposing a conjunctive query that consists of several expressions into single expressions. Here, we assume that each expression is required for one schema or a sub-goal of the query. Then, each expression is matched against the schema cached in the SCL. If the matched schema is found, the resource location is identified. Based on the resource location, a single-location sub-query is then generated. Thus, this re-written sub-query is routed to a specified location.

IV. QUERY EXECUTION

In the previous sections, we have presented a general discussion about query routing in typical super-peer networks, what happens when the super-peer obtains the query and its result, and some effects of our proposed adaptation where a peer is able to obtain the SCL. In this section, we analyze the functionalities of our system prototype and what will happen when a client-peer forms a query and when the SCL resource is required to be updated. Although query caching in super-peer networks has been proposed previously, a detailed description of how a query acquires routing directions from the cached information and how the cached information is updated has not been investigated or, at least, not been fully described.
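The two SCL tables and the schema lookup they support can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the authors' prototype: the class names SclSketch, LruTable and ResourceLocation, the integer keys used as pointers, the maxRecords capacity parameter and the sample addresses are all hypothetical. LinkedHashMap only behaves as an LRU cache when it is constructed in access order and removeEldestEntry is overridden, so the sketch does both.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SclSketch {
    /** One cached resource location with the attributes named in the text. */
    static final class ResourceLocation {
        final String ip; final int port; final String filename; final String path;
        ResourceLocation(String ip, int port, String filename, String path) {
            this.ip = ip; this.port = port; this.filename = filename; this.path = path;
        }
    }

    /** LinkedHashMap kept in access order; drops the least recently used entry when full. */
    static final class LruTable<K, V> extends LinkedHashMap<K, V> {
        private final int maxRecords; // hypothetical capacity parameter
        LruTable(int maxRecords) {
            super(16, 0.75f, true); // third argument true = access order
            this.maxRecords = maxRecords;
        }
        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxRecords; // consulted by put after each insertion
        }
    }

    // schemaTable: schema -> list of pointers (keys) into resourceTable
    private final LruTable<String, List<Integer>> schemaTable;
    // resourceTable: key -> one resource location
    private final LruTable<Integer, ResourceLocation> resourceTable;
    private int nextKey = 0;

    SclSketch(int maxRecords) {
        schemaTable = new LruTable<>(maxRecords);
        resourceTable = new LruTable<>(maxRecords);
    }

    /** Caches one resource location for a schema. */
    void insert(String schema, ResourceLocation location) {
        int key = nextKey++;
        resourceTable.put(key, location);
        List<Integer> pointers = schemaTable.get(schema);
        if (pointers == null) {
            pointers = new ArrayList<>();
            schemaTable.put(schema, pointers);
        }
        pointers.add(key);
    }

    /** Returns the cached locations for a schema, or null when the super-peer must be asked. */
    List<ResourceLocation> match(String schema) {
        List<Integer> pointers = schemaTable.get(schema); // also marks the schema as recently used
        if (pointers == null) return null;
        List<ResourceLocation> rt = new ArrayList<>();
        for (Integer key : pointers) {
            ResourceLocation location = resourceTable.get(key);
            if (location != null) rt.add(location); // skip pointers whose record was evicted
        }
        return rt.isEmpty() ? null : rt;
    }

    public static void main(String[] args) {
        SclSketch scl = new SclSketch(2);
        scl.insert("a", new ResourceLocation("10.0.0.1", 9000, "a.xml", "//a"));
        scl.insert("b", new ResourceLocation("10.0.0.5", 9000, "b.xml", "//b"));
        scl.match("a"); // touching 'a' leaves 'b' as the least recently used schema
        scl.insert("c", new ResourceLocation("10.0.0.3", 9000, "c.xml", "//c"));
        System.out.println("b cached: " + (scl.match("b") != null));
        System.out.println("a cached: " + (scl.match("a") != null));
    }
}
```

With a capacity of two records per table, inserting ‘c’ after ‘a’ has been looked up evicts ‘b’, so a later sub-query for ‘b’ would fall back to the super-peer. Keeping both tables at the same capacity is a simplification of this sketch; the text only says that a maximum number of records is defined for both cached lists.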
In our research, we consider the process at the client-peer that has been queried and then conducts the pre-processing necessary for query routing. The following routing pre-processing takes place for a query Q:
1. The FLWR query is created, Q
2. Q is decomposed into multiple single-expressions, each expression qi being a sub-goal of the query
3. For each qi, qi is matched against the schema that belongs to schemaTable in SCL
   3.1 If match is found
       Assign the resource location
   3.2 If match is not found
       Send this expression to the super-peer
       Exit from this routine
   3.3 Rewrite the qi according to attributes from the assigned resource location(s) as a single-location sub-query.

Based on the above pre-processing steps, we show the algorithm for query matching in Algorithm 1. The algorithm for query re-writing has been presented in .

Algorithm 1. Query Matching
Input: Sub-query, qi
Output: A list of resource locations where the schema of qi and the schema of the resource location is matched, rt
1. Identify the schema of qi, sq
2. Get the schema from schemaTable that matches with sq, st
3. If st != null
   //matched schema is found
   3.1 Get the list of pointers to the resourceTable record(s), rt
   3.2 Return rt
4. Return null

V. QUERY CACHING AT CLIENT-PEER

We have discussed the pre-processing involved in a query with all required schema found in the client-peer’s SCL. However, there is a possibility that a required schema is not found, as in step 3.2. When this scenario happens, we consider the query routing as in a typical super-peer network, where the unfulfilled schema location request is forwarded to the super-peer. As in the typical super-peer network, the client-peer will re-write the query based on the routing location provided by the super-peer for a further routing process. In our project, we consider the re-written query or sub-query that is used in the routing information from the super-peer as the input for updating our SCL. The following steps take place when a client-peer re-writes a query / sub-query based on the routing information from the super-peer, Q’:
4. Get the generated query / sub-query that was used in the routing information from super-peer, Q’
5. Q’ is decomposed into single-expressions, each expression obtaining one schema, si
6. SCL updates,
   For each si
   6.1. Get the resource location attributes, ri
   6.2. For each ri, create a record of resource location in resourceTable
   6.3. Insert si as the ‘key’ for schemaTable
   6.4. Insert the pointer to each ri as a ‘list of pointers’ for si in schemaTable

In order to fulfill the main steps mentioned above, there are three sub-processes: query path capturing, insertion of the resource, and identifying where the resource path overlaps with the cached path. These are shown in a data-flow diagram in Figure 6. The query path capturing and insertion is described in Algorithm 2.

The resource insertion is simply the Java method put in LinkedHashMap, which has been used to obtain our schemaTable and resourceTable. For every insertion, the put method will check the maximum number of records that has been defined for both cached lists. If the new insertion is done while the number of records is exceeded, then the least recently used record will automatically be deleted from the particular cached list.

After each resource insertion, the resource path overlap is identified. Normally, there is a possibility that the cached path belongs to the same overlapping resource location. For example, paths ‘//book/author’, ‘//book/author/firstname’ and ‘//book/author/lastname’ that belong to the
resource location are identified as ‘overlapping’. Thus, we are creating a resource eliminating function in order to identify the overlapped path. Therefore, the path ‘//book/author’ and its related resource information are deleted. Algorithm 3 describes the pseudo-code for eliminating the overlapping path within cached resources.

Figure 6. Data flow diagram (Query Generator, schemaTable and resourceTable; processes: 1 get query, 2 resource capturing, 3 insertion, 4 identify overlap)

Algorithm 2. Query path capturing and resource insertion
Input: Generated query / sub-query that used the routing information from the super-peer, Q’
Output: Updated SCL
1. Identify the schema of Q’ denoted as si
2. For each si
   2.1 Get the location information of Q’: ip, port, filename & path
   2.2 Create a record for ‘2.1’ in resourceTable
   2.3 Generate a key for ‘2.2’
   2.4 Create a record for si in schemaTable.
       2.4.1 Let si be a key for ‘2.4’
       2.4.2 Let key of ‘2.3’ that belongs to the same si be a part of list in schemaTable object of ‘2.4.1’
3. Identify path overlapping

Algorithm 3. Identify path overlapping
Input: schemaTable, resourceTable and insertion method invocation
Output: Updated schemaTable and resourceTable
For each record in resourceTable, rti
1. Get the path of rti
2. Compare the path of rti and rti+1
   If match is found
   2.1 Delete the record of rti
   2.2 Delete the link of pointer for ‘2.1’ in schemaTable

VI. CONCLUSION

In this paper we have highlighted the significance of using data from previous queries in P2P query processing. We are proposing the use of query caching in super-peer networks for reducing the number of queried peers and messages routed to obtain the query result. In addition, our proposed approach can also be seen as a super-peer fault-tolerance mechanism. In contrast to the query and result caching approach, where the super-peer is responsible for caching the query and its result, our proposed caching approach is to cache the routing direction of previous queries at the client-peer. Thus, query routing can be done locally without depending on the super-peers’ routing index. Even though caching the query and its result would allow super-peers to answer the query without further routing, we are concerned about the frequency of data being updated by the resource owner and the desirability of avoiding generation of obsolete results.

Since caching the routing direction of previous queries is central to our aims, in this paper we have described the main process for obtaining this cached information, which we have named the schema cached list, SCL. Our vision is to provide evidence for reducing the number of queried peers and the number of messages routed in the network, thereby reducing the entire network load through the SCL. Currently, we have shown that our proposed architecture reduces the number of queried peers and messages by testing it on an extension of the existing simulator used in . The result has been presented in . For future work, we will compare the cost of caching at super-peers with the effect of caching at the client-peer in super-peer networks.
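The path-overlap elimination of Algorithm 3 can be read in more than one way, so the following Java sketch fixes one interpretation as an assumption: for the cached paths of a single resource location, a path is treated as overlapped when it is a proper ancestor of another cached path, and overlapped paths are dropped. This matches the ‘//book/author’ example in Section V. The names PathOverlapSketch, isAncestor and eliminateOverlaps are hypothetical, and the removal of the matching schemaTable pointer is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PathOverlapSketch {
    /** True when 'ancestor' is a proper path prefix of 'path'. */
    static boolean isAncestor(String ancestor, String path) {
        return path.startsWith(ancestor + "/");
    }

    /** Returns the cached paths of one resource location without the overlapped ancestors. */
    static List<String> eliminateOverlaps(List<String> paths) {
        List<String> kept = new ArrayList<>();
        for (String candidate : paths) {
            boolean overlapped = false;
            for (String other : paths) {
                if (isAncestor(candidate, other)) { // a more specific path is already cached
                    overlapped = true;
                    break;
                }
            }
            if (!overlapped) kept.add(candidate);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> cached = Arrays.asList(
                "//book/author", "//book/author/firstname", "//book/author/lastname");
        // prints [//book/author/firstname, //book/author/lastname]
        System.out.println(eliminateOverlaps(cached));
    }
}
```

The comparison here is all-pairs rather than the adjacent-record comparison (rti against rti+1) written in Algorithm 3; the adjacent form would only give the same result if the records were ordered so that overlapping paths sit next to each other, which the text does not state.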

REFERENCES
