Location-Based Schema Caching For Client-Peer Query Routing in Super-Peer Networks

Rozlina Mohamed (1, 2) and Christopher D. Buckingham (1)
(1) Aston University, Computer Science Department, B4 7ET, UK
(2) University Malaysia Pahang, Software Engineering Department, 15150 Malaysia
In this paper, we present the construction of location-based schema caching for query routing at the client-peer in super-peer
networks. This cached information is used for directly routing subsequent repeated queries towards their actual resource locations
without going via the super peer. Instead of caching the previous query and its result, our proposed approach caches the query with
the routing direction, which is the resource location of the previous queries. This means outdated results are avoided. The paper
describes the main processes, including details of the algorithm required for constructing the cached information.
Index Terms— query caching, query rewriting, query routing, super-peer network, peer-to-peer systems
I. INTRODUCTION

Peer-to-peer (P2P) networks are most popularly used as file sharing applications, with Gnutella, Napster and Kazaa among the most famous. Alternative P2P systems include instant messaging, collaborative computing, distributed computing, and platform applications. In P2P systems, a huge number of computers are typically interconnected using Internet Protocol (IP) network numbering, lying on top of physical computer networks. Networking between these computers determines the structure (also known as the network topology) of the whole system. In a broad sense, there are three types of P2P systems: pure, centralized and super-peer. This classification is based on the degree of decentralization in processing tasks and sharing resources among participants in the network. We focus on the super-peer network model, which is classified as an optimized decentralized P2P system.

The super-peer network consists of super-peer and client-peer nodes, with nodes decomposed into clusters. For each cluster, one or more nodes are selected to be the super-peer for managing a group of client-peers. Our discussion in this paper focuses on the responsibility of super-peers for query routing, although we are aware that they do have additional roles.

Typically, if a new client-peer, pi, wants to join the network, it first has to send its information to a super-peer spi. Then, the super-peer inserts the peer’s information into its routing management peer information and query routing. Suppose the client-peer pi wants to find some target data. It sends a query to the super-peer, spi, to request the target data. Then the super-peer forwards this query message to another client-peer pj if the super-peer knows the target data is obtained and shared by pj. Otherwise, the super-peer forwards the query request to other super-peers connected to it, which contributes to the potential for flooding the system with query messages that consume the CPU resources and bandwidth of peers.

The query forwarding process is widely known as query routing, which is reduced in the super-peer network model such as in , because the super-peer can manage its client-peers and index information; the query routing would go to super-peers that have indexed the target data even though the target data are actually obtained from the client-peers. In this case, the number of routing messages transmitted in the network is reduced.

The authors have proposed additional pre-processing for query routing in order to further improve query routing in super-peer networks. The pre-processing uses a schema cached list (SCL) at the client-peer that enables it to associate the query with data locations and thus route the query directly to the peers without going through the super-peer. In this paper, we discuss the maintenance of our SCL as part of the pre-processing mechanism for query routing.
In addition, we identify several key elements that are required for query caching.

II. BACKGROUND & RELATED WORKS

The super-peer network is considered an efficient P2P network model for searching query results. One of the key reasons is the routing index and query routing facilities provided by the super-peer for queries posted by its client-peers. In super-peer network systems, client-peers are connected to their local super-peer to upload their shared data and request query routing directives. Figure 1 illustrates the scenario, where the client-peer, p2, queries for data ‘a’ and ‘b’. Data ‘a’ is located at client-peers p1, p3 and p4 while data ‘b’ is located at client-peer p5. Client-peers p3 and p4 are connected to super-peer sp2 while client-peer p5 is connected to super-peer sp3.

Figure 1. Example of P2P network scenario (figure not reproduced; labels: query ‘ab’ at p2; data ‘a’ at p1, p3 and p4; data ‘b’ at p5; super-peers sp1, sp2 and sp3)

Suppose a query is given to client-peer p2 for information about ‘a’ and ‘b’. In a conventional super-peer network system, the routing process is started by a request message sent by p2 to its super-peer sp1, as illustrated in Figure 2. Then sp1 consults its index to find peers with the required information and, in this example, sends p2 the routing directions for ‘a’ and ‘b’. Based on these, the message is routed to p1 and sp2, then rerouted to p3, p4 and sp3. The message request for ‘b’ is then rerouted by sp3 to its client-peer p5. This sequence of routing requests is shown in Figure 2. In Figure 3, we illustrate acknowledgement messages sent by the owners of data to p2. These are followed by query messages sent by p2 to the data owners. Remember that the original ‘ab’ query has been decomposed into sub-queries ‘a’ and ‘b’. Once the data owner has processed the sub-query, the results are returned to p2 for further query result manipulation.

Figure 2. Sequence of routing request messages in conventional super-peer networks (figure not reproduced; legend: request message for ‘a’, request message for ‘b’, consult super-peers’ index)
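
The conventional routing walk described above can be sketched in a few lines of Java. This is only an illustrative model, not the simulator used by the authors: the class and method names are hypothetical, the sp1 to sp2 to sp3 chain and the attachment of p1 to sp1 are assumptions read off the scenario, and the TTL is assumed to count hops between super-peers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of conventional super-peer routing; not the authors' simulator.
class SuperPeerRoutingSketch {

    static final class SuperPeer {
        final String name;
        // Routing index: data item -> client-peers in this cluster that share it.
        final Map<String, List<String>> index = new HashMap<String, List<String>>();
        final List<SuperPeer> neighbours = new ArrayList<SuperPeer>();

        SuperPeer(String name) {
            this.name = name;
        }
    }

    // Returns the owners of `data` reachable from `start` within `ttl` super-peer hops.
    static List<String> locate(SuperPeer start, String data, int ttl) {
        List<String> owners = new ArrayList<String>();
        forward(start, data, ttl, new HashSet<SuperPeer>(), owners);
        return owners;
    }

    private static void forward(SuperPeer sp, String data, int ttl,
                                Set<SuperPeer> visited, List<String> owners) {
        if (!visited.add(sp)) {
            return; // this super-peer has already consulted its index
        }
        owners.addAll(sp.index.getOrDefault(data, Collections.<String>emptyList()));
        if (ttl == 0) {
            return; // TTL exhausted: peers further away are excluded
        }
        for (SuperPeer next : sp.neighbours) {
            forward(next, data, ttl - 1, visited, owners);
        }
    }

    // Builds the Figure 1 scenario; the sp1 -> sp2 -> sp3 chain is an assumption.
    static SuperPeer figureOneScenario() {
        SuperPeer sp1 = new SuperPeer("sp1");
        SuperPeer sp2 = new SuperPeer("sp2");
        SuperPeer sp3 = new SuperPeer("sp3");
        sp1.index.put("a", Arrays.asList("p1"));
        sp2.index.put("a", Arrays.asList("p3", "p4"));
        sp3.index.put("b", Arrays.asList("p5"));
        sp1.neighbours.add(sp2);
        sp2.neighbours.add(sp3);
        return sp1;
    }

    public static void main(String[] args) {
        SuperPeer sp1 = figureOneScenario();
        System.out.println("owners of a: " + locate(sp1, "a", 2));
        System.out.println("owners of b: " + locate(sp1, "b", 2));
        System.out.println("owners of b with TTL 1: " + locate(sp1, "b", 1));
    }
}
```

Under these assumptions, a request for ‘b’ issued with a TTL of 1 never reaches sp3, so p5 is left out of the result, which is the exclusion effect that a low TTL can cause.
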
Figure 3. Acknowledge message, query and result retrieval in conventional super-peer network (figure not reproduced; legend: acknowledge for ‘a’, query & retrieval for ‘a’, acknowledge for ‘b’)

From the above presented super-peer network scenario, the client-peer will send its query to the locations that have been determined by the super-peer. The routing request message by p2 is restricted by its time-to-live (TTL) value that has been set. If the TTL is exceeded before reaching sp3, its descendant peer that possesses data ‘b’ will be excluded from the message routing / propagation. However, setting a high TTL value is not worthwhile because it increases query routing and risks causing network congestion. This risk is reduced if the amount of message routing in the network is decreased, with a concomitant reduction in the number of querying peers and transmitted messages. Furthermore, query mistakes can be avoided.

Assisted query routing has been widely accepted since the introduction of super-peer networks and, more recently, caching strategies have received significant attention. We separate caching strategies into two approaches: (i) caching the actual data, and (ii) caching the routing direction towards the location of that data.

Caching the actual data is similar to the materialized views approach that has been implemented in federated database systems. In P2P, this concept is adapted in , and Brunkhorst & Dhraief have proposed a model of semantic caching. Semantic caching is used to cache the query and its result based on a schema. This schema-based cached information is located at the super-peer. Thus, the client-peers would be able to retrieve the query result just by sending a query to the super-peer.

Instead of implementing schema-based caching for the previous query and its result at the super-peer, He et al. and Fegaras et al. have proposed an actual data-based indexed model in . Rather than have a routing index, the super-peer in not only obtains the schema but also the data that belongs to its client-peers.

In brief, the super-peer in the above-mentioned approaches has the ability to answer the query based on existing query results that have been cached. Furthermore, the super-peer is not only responsible for obtaining the indexes of data locations but also for processing the query result. Meanwhile, the super-peer node itself is not so scalable and is likely to be a single point of failure for the client-peers within its cluster.

A fault tolerance module for super-peer failure has been considered in the P2P-DIET project as presented in . In P2P-DIET, a fast encoding peer profile is indexed by the client-peer for ad-hoc query processing. Therefore, client-peers would be able to get their query routing directives without being fully dependent on the super-peer. However, we believe that replicating the routing index from the super-peer is not worthwhile, due to the limited capabilities of client-peers. Additionally, the probability of the whole index usage is not reported by the authors.

If the resources of the database are updated, the use of cached query results may lead to retrieval of outdated information for subsequent queries. Thus, results produced by the super-peer may be obsolete. Therefore, as an autonomous data provider in the P2P environment, the most up-to-date results should come from the actual peers’ resources. These resources can be retrieved by requesting their owners directly. Thus, instead of caching the query and its result, Quan et al. have proposed a query hit message caching approach. The query hit message is cached for subsequent query requests. Therefore, a subsequent query request for the same schema is redirected to the replicated data instead of every query request being routed to a single resource. On the other hand, schema-based query caching is proposed by Doulkeridis et al. for assisting query forwarding. In , the schema of content located at remote peers is
cached. Thus, the subsequent query can be directly routed to the resource location. However, these approaches are for pure P2P networks, which route queries differently to super-peer networks. In the context of super-peer networks, query caching at the client-peer has so far not been instigated. There is some work on caching in super-peer networks, but it is mainly associated with query caching on super-peers, not peers.

III. SCHEMA CACHED LIST (SCL) FOR QUERY PRE-PROCESSING

Super-peer network systems are usually based on two-phase routing. Firstly, the query is routed within super-peers. Then, for each super-peer, the query is routed within their respective client-peers. Our proposed SCL is piggybacked on the above-mentioned routing strategies. Super-peers employ the usual routing indices and two-phase routing. However, in our approach, the client-peer is able to determine the routing direction locally, which is part of its pre-processing activities. The SCL is used to obtain the routing direction for client-peer usage because, in contrast to routing indices by super-peers, the SCL keeps the routing direction from previous queries that have been performed by the client-peer. Since the query routing direction is locally determined, the existing two-phase routing is shortened into a single routing. Consequently, the query is routed directly to the location of required data for further retrieval. Based on the same scenario in Figure 1, Figure 4 illustrates the routing request message from the queried peer using our SCL at client-peer p2. Here, we assume that the routing direction for ‘a’ and ‘b’ has been previously obtained by client-peer p2.

A. SCL Locality

In order to appreciate our location-based SCL as the pre-processing for query routing, it is expedient to look at different peer types and their characteristics. Here, we distinguish two types of peer nodes in super-peer networks.

1) Super-peer

The super-peer node is a hub for several peer nodes in a cluster. In this research we assume that only one node is assigned as a super-peer for a cluster of peers, even though some research work has been done on having multiple super-peers. The super-peer is used to maintain the backbone of the P2P network and to facilitate the network of P2P connections. In addition, the super-peer is a dedicated server in the cluster for processing messages. Thus, the index of resource locations which is used for assisting the routing of messages is maintained by the super-peer. Therefore, the query message is routed according to this index. Hence, the super-peer is also responsible for query processing on behalf of its respective client-peers.

Figure 4. Illustration of our proposed routing request (figure not reproduced; legend: request message for ‘a’, request message for ‘b’, consult SCL)

2) Client-peer

The client-peer can be seen as either a consumer that searches for resources or a data provider. The client-peer is fully dependent on the super-peer for query routing. The fact that the super-peer may fail or suddenly leave the network means that a fault tolerance module for super-peer failure is required to be obtained by the client-peer. In our adapted super-peer network, we are proposing that the client-peer
obtains the capability of locally deciding the query routing direction, instead of being fully dependent on the super-peer. Moreover, local query routing facilities would be able to reduce the super-peer workload, as well as greatly reducing the number of messages being routed in the network and the number of queried peers needed for retrieving the query result.

B. SCL Data Structures

The role of the SCL is to provide the routing direction for every query request from a local client-peer. In order to achieve the correct and efficient execution of requests, each SCL manages two data structures. One stores the schema that it supports and the other stores resource location information for a particular schema, called schemaTable and resourceTable respectively. A view of these data structures is shown in Figure 5.

Figure 5. Data structures of schemaTable and resourceTable (figure not reproduced; schemaTable: key, list of pointers; resourceTable: key, resource location)

The data structure for schemaTable and resourceTable is a Java LinkedHashMap. The ‘key’ for each record in schemaTable is a schema supported by the SCL while a ‘List of pointers’ is pointing to the related resourceTable record(s). Each record of resourceTable contains information on one resource such as the IP address, port number, filename and path for a particular schema. The use of a Java LinkedHashMap is well-suited for Least Recently Used (LRU) cached data. LRU is one of the most commonly used cache replacement policies. Three common cache policies are based on access order, recency, and frequency. LRU is based on the temporal locality principle, where items which have not been used for the longest time are replaced when the initial space that has been specified is exceeded. LRU is chosen because of good and stable performance of file access times.

C. Answering Queries Using SCL

Location-based schema caching simply stores resource locations that have been extracted from previous queries for routing local subsequent queries. The stored resource location is decomposed into several attributes, which are the IP address, port number, filename and path, that are linked to the particular schema supported by the local SCL. For our approach, we assume that the query is only conjunctive, that is, For-Let-Where-Return (FLWR) clauses in XQuery grammar. In order to answer the incoming query, the client-peer has to do some routing pre-processing.

The local routing pre-processing consists of decomposing a conjunctive query that consists of several expressions into single expressions. Here, we assume that each expression is required for one schema or a sub-goal of the query. Then, each expression is matched against the schema cached in the SCL. If the matched schema is found, the resource location is identified. Based on the resource location, a single sub-query is then generated. Thus, this rewritten sub-query is routed to a specified location.

IV. QUERY EXECUTION

In the previous sections, we have presented a general discussion about query routing in typical super-peer networks, what happens when the super-peer obtains the query and its result, and some effects of our proposed adaptation where a peer is able to obtain the SCL. In this section, we analyze the functionalities of our system prototype and what will happen when a client-peer forms a query and when the SCL resource is required to be updated. Although query caching in super-peer networks has been proposed previously, a detailed description about how a query acquires routing directions from the cached information and how the cached information is updated has not been investigated or, at least, not been fully described.
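
To make the schemaTable and resourceTable of Section III-B more concrete, the following is a minimal Java sketch of an SCL built on LinkedHashMap with LRU eviction, together with the schema lookup outlined in Section III-C. It is not the authors' prototype: the class names, the integer resource keys and the two capacity parameters are assumptions made for illustration. The one library fact it relies on is that a LinkedHashMap constructed in access order calls removeEldestEntry after every put, which is where the capacity check is placed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a client-peer SCL; class and method names are illustrative.
class SchemaCachedListSketch {

    // One resourceTable record: where a resource for a schema can be found.
    static final class ResourceLocation {
        final String ip;
        final int port;
        final String filename;
        final String path;

        ResourceLocation(String ip, int port, String filename, String path) {
            this.ip = ip;
            this.port = port;
            this.filename = filename;
            this.path = path;
        }
    }

    // LinkedHashMap kept in access order; put() drops the least recently used
    // entry once the configured maximum number of records is exceeded.
    static final class LruTable<K, V> extends LinkedHashMap<K, V> {
        private final int maxRecords;

        LruTable(int maxRecords) {
            super(16, 0.75f, true); // true = access order, required for LRU
            this.maxRecords = maxRecords;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxRecords;
        }
    }

    // schemaTable: schema -> list of pointers (keys) into resourceTable.
    private final Map<String, List<Integer>> schemaTable;
    // resourceTable: key -> resource location record.
    private final Map<Integer, ResourceLocation> resourceTable;
    private int nextKey = 0; // assumed key scheme: a simple counter

    SchemaCachedListSketch(int maxSchemas, int maxResources) {
        this.schemaTable = new LruTable<String, List<Integer>>(maxSchemas);
        this.resourceTable = new LruTable<Integer, ResourceLocation>(maxResources);
    }

    // Caches one resource location for a schema.
    void insert(String schema, ResourceLocation location) {
        int key = nextKey++;
        resourceTable.put(key, location);
        List<Integer> pointers = schemaTable.get(schema);
        if (pointers == null) {
            pointers = new ArrayList<Integer>();
            schemaTable.put(schema, pointers);
        }
        pointers.add(key);
    }

    // Returns the cached locations for a schema, or an empty list when the
    // schema is not cached and the expression must go to the super-peer.
    List<ResourceLocation> match(String schema) {
        List<ResourceLocation> found = new ArrayList<ResourceLocation>();
        List<Integer> pointers = schemaTable.get(schema); // also refreshes recency
        if (pointers == null) {
            return found;
        }
        for (Integer key : pointers) {
            ResourceLocation location = resourceTable.get(key);
            if (location != null) { // skip pointers whose record was evicted
                found.add(location);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        SchemaCachedListSketch scl = new SchemaCachedListSketch(2, 2);
        scl.insert("//book/title", new ResourceLocation("10.0.0.1", 9000, "books.xml", "//book/title"));
        scl.insert("//book/author", new ResourceLocation("10.0.0.2", 9000, "books.xml", "//book/author"));
        scl.match("//book/title"); // touch the title entry so it becomes most recent
        scl.insert("//book/price", new ResourceLocation("10.0.0.3", 9000, "books.xml", "//book/price"));
        System.out.println("author still cached: " + !scl.match("//book/author").isEmpty());
        System.out.println("title still cached: " + !scl.match("//book/title").isEmpty());
    }
}
```

One design point the sketch makes visible is that the two tables evict independently, so a schemaTable pointer can outlive its resourceTable record; here the lookup simply skips such pointers, which is one possible choice among several.
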
In our research, we consider the process at the client-peer that has been queried and then conducts the pre-processing necessary for query routing. The following routing pre-processing takes place for a query Q:

1. The FLWR query is created, Q
2. Q is decomposed into multiple single expressions, each expression being a sub-goal of query qi
3. For each qi, qi is matched against the schema that belongs to schemaTable in SCL
   3.1 If match is found
       Assign the resource location
   3.2 If match is not found
       Send this expression to the super-peer
       Exit from this routine
   3.3 Rewrite the qi according to attributes from the assigned resource location(s) as a single-location sub-query.

Based on the above pre-processing steps, we show the algorithm for query matching in Algorithm 1. The algorithm for query rewriting has been presented in .

V. QUERY CACHING AT CLIENT-PEER

We have discussed the pre-processing involved in a query with all required schema found in the client-peer’s SCL. However, there is a possibility that a required schema is not found, as in step 3.2. When this scenario happens, we consider the query routing as in a typical super-peer network, where the unfulfilled schema location request is forwarded to the super-peer. As in the typical super-peer network, the client-peer will rewrite the query based on the routing location informed by the super-peer for a further routing process. In our project, we consider the rewritten query or sub-query that is used in the routing information from the super-peer as the input for updating our SCL. The following steps take place when a client-peer rewrites a query / sub-query based on the routing information from the super-peer, Q’:

4. Get the generated query / sub-query that was used in the routing information from the super-peer, Q’
5. Q’ is decomposed into single expressions, each expression obtaining one schema, si
6. SCL updates. For each si:
   6.1 Get the resource location attributes, ri
   6.2 For each ri, create a record of resource location in resourceTable
   6.3 Insert si as the ‘key’ for schemaTable
   6.4 Insert the pointer to each ri as a ‘list of pointers’ for si in schemaTable

In order to fulfill the main steps mentioned above, there are three sub-processes: query path capturing, insertion of the resource, and identifying where the resource path overlaps with the cached path. These are shown in a data flow diagram in Figure 6. The query path capturing and insertion is described in Algorithm 2.

Algorithm 1. Query Matching
Input: Sub-query, qi
Output: A list of resource locations where the schema of qi and the schema of the resource location is matched, rt

1. Identify the schema of qi, sq
2. Get the schema from schemaTable that matches with sq, st
3. If st != null
   //matched schema is found
   3.1 Get the list of pointers to the resourceTable record(s), rt
   3.2 Return rt
4. Return null

The resource insertion is simply the Java method put in LinkedHashMap, which has been used to obtain our schemaTable and resourceTable. For every insertion, the put method will check the maximum number of records that has been defined for both cached lists. If the new insertion is done while the number of records is exceeded, then the least recently used record will automatically be deleted from the particular cached list.

After each resource insertion, the resource path overlap is identified. Normally, there is a possibility that the cached path belongs to the same overlapping resource location. For example, the paths ‘//book/author’, ‘//book/author/firstname’ and ‘//book/author/lastname’ that belong to the
resource location are identified as ‘overlapping’. Thus, we are creating a resource eliminating function in order to identify the overlapped path. Therefore, the path ‘//book/author’ and its related resource information are deleted. Algorithm 3 describes the pseudocode for eliminating the overlapping path within cached resources.

Figure 6. Data flow diagram (figure not reproduced; labels: Query Generator, Get query, Resource capturing, Insertion, Identify overlap, schemaTable, resourceTable)

Algorithm 2. Query path capturing and resource insertion
Input: Generated query / sub-query that used the routing information from the super-peer, Q’
Output: Updated SCL

1. Identify the schema of Q’ denoted as si
2. For each si
   2.1 Get the location information of Q’: ip, port, filename & path
   2.2 Create a record for ‘2.1’ in resourceTable
   2.3 Generate a key for ‘2.2’
   2.4 Create a record for si in schemaTable.
       2.4.1 Let si be a key for ‘2.4’
       2.4.2 Let key of ‘2.3’ that belongs to the same si be a part of list in schemaTable object of ‘2.4.1’
3. Identify path overlapping

Algorithm 3. Identify path overlapping
Input: schemaTable, resourceTable and insertion method invocation
Output: Updated schemaTable and resourceTable

For each record in resourceTable, rti
1. Get the path of rti
2. Compare the path of rti and rti+1
   If match is found
   2.1 Delete the record of rti
   2.2 Delete the link of pointer for ‘2.1’ in schemaTable

VI. CONCLUSION

In this paper we have highlighted the significance of using data from previous queries in P2P query processing. We propose the use of query caching in super-peer networks for reducing the number of queried peers and messages routed to obtain the query result. In addition, our proposed approach can also be seen as a super-peer fault-tolerance mechanism. In contrast to the query and result caching approach, where the super-peer is responsible for caching the query and its result, our proposed caching approach is to cache the routing direction of previous queries at the client-peer. Thus, query routing can be done locally without depending on the super-peers’ routing index. Even though caching the query and its result would allow super-peers to answer the query without further routing, we are concerned about the frequency of data being updated by the resource owner and the desirability of avoiding generation of obsolete results.

Since caching the routing direction of previous queries is central to our aims, in this paper we have described the main process for obtaining this cached information, which we have named the schema cached list, SCL. Our vision is to provide evidence for reducing the number of queried peers and the number of messages routed in the network, thereby reducing the entire network load through the SCL. Currently, we have shown that our proposed architecture reduces the number of queried peers and messages by testing it on an extension of the existing simulator used in . The result has been presented in . For future work, we will compare the cost of caching at super-peers with the effect of caching at the client-peer in super-peer networks.
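
As a supplementary illustration of the path overlap elimination in Algorithm 3, the Java sketch below removes a cached path when a longer cached path at the same resource location extends it, matching the ‘//book/author’ example. It is an assumption-laden sketch rather than the authors' function: the CachedPath class, the single-string location field and the all-pairs comparison (Algorithm 3 compares neighbouring records rti and rti+1) are choices made here for clarity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of overlap elimination; names are illustrative.
class PathOverlapSketch {

    // A cached path together with the resource location it belongs to.
    static final class CachedPath {
        final String location; // assumed form, e.g. "ip:port/filename"
        final String path;

        CachedPath(String location, String path) {
            this.location = location;
            this.path = path;
        }
    }

    // True when `longer` extends `shorter` by at least one more path step.
    static boolean extendsPath(String shorter, String longer) {
        return longer.startsWith(shorter + "/");
    }

    // Keeps only the paths that no other path at the same location extends.
    static List<CachedPath> eliminateOverlaps(List<CachedPath> cached) {
        List<CachedPath> kept = new ArrayList<CachedPath>();
        for (CachedPath candidate : cached) {
            boolean overlapped = false;
            for (CachedPath other : cached) {
                if (other.location.equals(candidate.location)
                        && extendsPath(candidate.path, other.path)) {
                    overlapped = true; // a more specific path is already cached
                    break;
                }
            }
            if (!overlapped) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<CachedPath> cached = new ArrayList<CachedPath>();
        cached.add(new CachedPath("10.0.0.1:9000/books.xml", "//book/author"));
        cached.add(new CachedPath("10.0.0.1:9000/books.xml", "//book/author/firstname"));
        cached.add(new CachedPath("10.0.0.1:9000/books.xml", "//book/author/lastname"));
        for (CachedPath p : eliminateOverlaps(cached)) {
            System.out.println(p.location + " " + p.path);
        }
    }
}
```

Exact duplicates (same location and same path) are not treated as overlapping in this sketch, and paths at different locations never eliminate one another.
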
REFERENCES
