SDBSs [11, 25, 19] were introduced with the arrival of the new generation of sensors. Thanks to their increasing computing, storage and wireless communication capacities, each sensor can be seen as an autonomous database containing data about its environment (temperature, pressure, geographic location, etc.). Sensors form a wireless sensor network [6], in which queries are distributed by a gateway (or base station) in a multi-hop manner. Continuous queries are evaluated on the sensors (or in the network for some aggregation operators [24]) and the results are collected by the gateway. Sensor data is not materialized until the query is evaluated on the sensor.

DSMSs [3, 12, 7] are systems for continuous query processing over data streams. They are designed not only for sensor applications, but also for monitoring applications of financial, telecommunication, or network data. Unlike SDBSs, most DSMSs are centralized systems, i.e. stream sources send their data to the DSMS, and continuous queries [10] are evaluated on a centralized server. Sensor data is materialized as a data stream.

SDBSs and DSMSs are two strongly related domains. In fact, we can speak of a sensor stream management system (SSMS) as a sub-domain of DSMS (see Figure 1). Several recent works in the literature [23, 17, 4, 15] fall into this domain. Below we give some common sensor data and query representation aspects introduced by DSMSs

Query models

In both DSMSs and SDBSs, the most popular way of formulating queries is to use an SQL-like language [8, 12, 25, 11]. However, the underlying query models are not always formalized. The most serious formalization effort is that of the STREAM system [7], namely CQL [8]. Our approach differs from their model principally in three aspects. First, we base our model of stream query evaluation on a timely tuple-by-tuple execution basis (our arguments for this choice are discussed later), while they use relation granularity for query execution. Second, our model includes, in addition to a timestamp ordering, a linear position ordering, by which we aim to take advantage of some positional operators seen in sequential databases (see the overview of sequential databases below). Third, we give a more general definition of windows in order to define various types of windows, whereas CQL is limited to sliding windows. In addition, unlike their model, we also define a sliding distance and a sliding rate for windows, as well as a more flexible management of window edges. Another DSMS, namely TelegraphCQ [12], also provides a general definition of windows by using a for-loop construct. However, it is restricted to temporal ordering and thus only defines temporal windows. Besides, its sliding window definition does not include a sliding rate parameter.
Figure 1. Relations between DSMS and SDBS
to three different parts of sensor data: meta-information of the sensor (identification, location, type, unit of measure, etc.), the sensor's measurement (temperature, pressure, GPS coordinates, etc.), and a timestamp representing the time at which the measurement is made. Continuous query operators execute on the measurements of sensors (e.g. sensors measuring less than 10). However, in order to locate the sensors whose data will be queried, a part of the query is executed on the sensors' meta-information (e.g. temperature sensors in room A measuring less than 10 Celsius). Lastly, most queries on sensor data also involve time (e.g. temperature sensors in room A measuring on average less than 10 Celsius in a sliding window of 5 minutes). Hence, in our general sensor data schema definition, we differentiate three types of attributes. The first type is property attributes. They contain meta-information of sensors. We note that not all property attributes are known by the sensors. Some of them would be added to the tuples by intermediary units such as proxies, gateways, or servers. The second type is the attribute that changes continuously over time, and is represented by the measurement field. The semantics of the measurement and the properties can vary depending on application contexts. The semantic interpretation would be done at the application level. For instance, measurement can represent a temperature reading for one application and an RFID tag number for another. In this way, our objective is to allow the coexistence of different types of sensors while giving a sufficiently general schema definition. Finally, the last type is the timestamp attribute that represents the time at which a measurement is made by the sensor. We assume that timestamps are assigned to tuples according to a global time (by sensors or by some other intermediary units). We argue that this representation is sufficiently generic that most sensor data types can be represented (see Table 1 for some examples).

We can therefore give a formal tuple definition as follows:

Definition 1 A tuple $s$ is a list of several property attributes $a_i$, one measurement attribute $m$, and one timestamp attribute $tmstmp$, i.e. $s = \langle a_1, a_2, \ldots, a_n, m, tmstmp \rangle$. Each attribute $a_i$ belongs to a particular domain $D_i$, the attribute $m$ to the measurement domain $M$, and $tmstmp$ to the time domain $T$. $T$ is a totally ordered set containing discrete points which represent different moments in time, $tmstmp \in T = \{t_0, t_1, t_2, \ldots\}$, $T \subseteq \mathbb{N}_0$.

Sequences of tuples form data streams. The next section gives basic stream definitions and notations in order to formalize the stream data model.

3.1.2 Sensor data stream

Definition 2 A stream $S = \{s_0, s_1, \ldots, s_n, \ldots\}$ is a set of tuples $s_i$ ordered by their $tmstmp$ value. In addition, tuples also have a linear positional ordering, i.e. the tuple $s_n$ is the $n$th element of the stream $S$.

We note that the set $S$ may contain distinct tuples which have the same value for the timestamp attribute, and an element of $T$ is not necessarily present among the timestamp attributes of the $s_i$. More formally:

Property 1 Let $\tau : S \to T$ be a function that gives the value of the timestamp attribute of a tuple $s_i$ (i.e. $\tau(s_i) = s_i.tmstmp$). This function is neither injective nor surjective from $S$ to $T$.

Conceptually, streams can be unbounded. However, only a bounded part of a stream is materialized for query processing. Therefore, we differentiate three parts in a stream: the past, the present and the future (see Figure 3).

Definition 3 The present part of a stream is the currently materialized part of the stream at an instant $t$. We denote this part as $S^t \subseteq S$, and as in Definition 2, the tuple $s^t_n$ is the $n$th element of $S^t$. Thus, the first element of $S^t$ is $s^t_0$, $|S^t|$ is the cardinality of $S^t$, the last element of $S^t$ is $s^t_{|S^t|-1}$, and $\forall i \; s^t_i.tmstmp \le t$.

The present part of the stream contains the data currently available for query evaluation. Mostly, this part is materialized in the form of a queue structure whose size is limited by the memory capacity of the query processing unit. Beyond this limit, the data expires from the queue and therefore becomes past data.

Definition 4 The past of a stream $S$, at an instant $t$, is composed of the $s_i \in S$ such that $s_i.tmstmp < s^t_0.tmstmp$.

The past can be stored in a persistent disk-based storage system. Queries over histories of data can be evaluated on this part.

Definition 5 The future of a stream $S$, at an instant $t$, is composed of the $s_i \in S$ such that $s_i.tmstmp > t$.
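Definitions 1 to 5 can be illustrated with a small Python sketch. The names `SensorTuple` and `split_stream` and the `horizon` parameter are assumptions made for this illustration only: `horizon` is a hypothetical stand-in for the memory limit of the queue that holds the present part, and is not part of the model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SensorTuple:
    """A tuple <a_1, ..., a_n, m, tmstmp> in the sense of Definition 1."""
    properties: Tuple[str, ...]  # property attributes a_1 .. a_n
    measurement: float           # measurement attribute m
    tmstmp: int                  # timestamp, an element of T (non-negative integers)


def split_stream(stream: List[SensorTuple], t: int, horizon: int):
    """Split a timestamp-ordered stream into (past, present, future) at instant t.

    Assumption: the present part keeps the tuples whose timestamp lies in
    [t - horizon, t]; older tuples have expired and belong to the past.
    """
    future = [s for s in stream if s.tmstmp > t]                 # Definition 5
    present = [s for s in stream if t - horizon <= s.tmstmp <= t]  # Definition 3
    past = [s for s in stream if s.tmstmp < t - horizon]         # Definition 4
    return past, present, future


# Two tuples share timestamp 2, and no tuple carries timestamp 3 or 4:
# the timestamp function is neither injective nor surjective (Property 1).
stream = [SensorTuple(("s1", "roomA", "temperature"), 20.0 + i, ts)
          for i, ts in enumerate([1, 2, 2, 5, 8, 9])]
past, present, future = split_stream(stream, t=6, horizon=4)
```

With `t = 6` and `horizon = 4`, the present part holds the tuples with timestamps 2, 2 and 5, the past holds the tuple with timestamp 1, and the future holds those with timestamps 8 and 9.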
Figure 3. The subset $S^t$ of the infinite stream $S$ represents data
currently available for processing (e.g. as a queue in memory).
contains the results of operations over input streams, i.e. $Sout = Op(Sin_1, Sin_2, \ldots, Sin_n)$ where $Sout$ denotes the output stream and $Sin_i$ an input stream.

However, two types of operators are mostly used: unary operators (i.e. $Sout = UnOp(Sin)$) and binary operators (i.e. $Sout = BinOp(Sin_1, Sin_2)$). Although numerous unary operators could be defined, in practice the selection and projection operators are general enough to answer a large number of different kinds of queries. Similarly, the join operator can be given as an example of a binary operator. However, join operations are generally executed on windows, as a result of the blocking nature of this operation. Therefore, in this section we will only define unary operators. In addition, we will deal in particular with the materialized part of streams, which represents present values; hence the following definition:

Definition 7 $UnOp^t$ is a unary stream operator which represents the execution of the operator $UnOp$ at time $t$ over the input stream $Sin^t$. $UnOp^t$ takes the first element of $Sin^t$ and executes the operation defined by the operator. The result forms the last element of $Sout^{t'}$, where $t' = t +$

1 Operators producing several streams are not considered.

where $t_0$ represents the time of the first execution of the operator, $n \in \mathbb{N}$ the $n$th execution of the operator, and $\Delta t_n$ the accumulated duration until the operator's $n$th execution (i.e. $\Delta t_n = \sum_{i=1}^{n} \delta t_i$). Note that $\delta t_i$ is the duration of the operator's $i$th execution, $|Sout^{t_0}| = 0$, and $\Delta t_0 = 0$.

Since tuples are added to the stream continuously and possibly at a high rate, the temporal dimension of the query operators, which reflects the real-time nature of sensor queries, gains importance. Typically, in real-time databases, $\delta t$ is the constraint on the execution of the operator. These systems require $\delta t$ to be less than a certain threshold in order to keep temporal consistency in the system. Although this subject is out of the scope of this paper, we note that adding a temporal dimension to the operators would make it easier to take the real-time aspects of these systems into account. See the perspectives section for more details.

In the sensor context, periodic execution of operators is very common: periodic filters over the data periodically sent by sensors, operators over periodically sliding windows, etc. In order to represent these cases, it suffices to replace, in the preceding formula, $\Delta t_n$ with $rate \times n$, where $rate$ represents the execution periodicity of the operator.
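As a small illustration of the accumulated duration $\Delta t_n$, the Python sketch below computes the instants reached after each execution of an operator. It assumes, as the surrounding text suggests, that the $n$th execution ends at $t_0 + \Delta t_n$; the function names `execution_instants` and `periodic_instants` are hypothetical.

```python
from itertools import accumulate
from typing import List


def execution_instants(t0: int, durations: List[int]) -> List[int]:
    """Return [t0 + dt_0, t0 + dt_1, ...] where dt_n is the sum of the first
    n execution durations (dt_0 = 0)."""
    return [t0 + dt for dt in accumulate(durations, initial=0)]


def periodic_instants(t0: int, rate: int, executions: int) -> List[int]:
    """Periodic case: dt_n is replaced by rate * n."""
    return [t0 + rate * n for n in range(executions + 1)]


aperiodic = execution_instants(10, [2, 3, 1])   # durations of executions 1..3
periodic = periodic_instants(10, 5, 3)          # three executions, rate = 5
```

When every execution lasts exactly `rate` time units, both functions return the same instants, which is the substitution described above.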
3.2.2 Window Creation Operators
Windows are finite subsets of streams. From a general point
of view, a window is bounded by two parameters: start
and end. We differentiate two types of windows: temporal
windows and position based windows. In case of temporal
windows, window edges are time points (start, end ∈ T );
in case of position based windows, window edges are po-
sitions of the tuples in the stream (start, end ∈ N). Note
that, in both cases start ≤ end, and start = end implies
an empty window.
Window creation operators create windows from
streams. Formally:
Definition 8 Let $W$ be a window creation operator over a data stream $S$; it returns a window $R$ bounded by the $start$ and $end$ parameters. For position based windows: $W_{(start,end)}(S) = R = \{s_i \in S \mid start \le i \le end\}$, $start \ge 0$. For temporal windows: $W_{(start,end)}(S) = R = \{s_i \in S \mid start \le s_i.tmstmp \le end\}$, $start \ge s_0.tmstmp$.

As mentioned earlier, we will mostly deal with the present values of a stream. Thus, we define an instantaneous window creation operator $W^t$ which creates a window from the stream $S^t$. We will use temporal windows to illustrate the rest of the section. The reasoning would be similar for position based windows.

Definition 9 $W^t$ is an instantaneous temporal window creation operator which, at instant $t$, takes as input a stream $S^t$ and returns a window $R$, i.e. $R = W^t_{(start,end)}(S^t) = \{s^t_i \in S^t \mid start \le s^t_i.tmstmp \le end\}$, $start \ge s^t_0.tmstmp$ and $end \le s^t_{|S^t|-1}.tmstmp$.

In the sensor querying context, windows are generally not fixed, i.e. the edges of windows vary continuously as a function of time. In order to include this kind of window, we give a window description definition below:

Definition 10 A window description $desc$ is a 5-tuple containing the parameters $start$, $end$, $rate$, $start\_adv$, and $end\_adv$. $start$ (resp. $end$) is the initial value of the first (resp. second) edge of the window, $rate$ represents the periodicity of the window, and $start\_adv$ and $end\_adv$ determine the sliding distance (i.e. how much the window edges will advance) in the case of moving windows (see Figure 5 for an example).

Figure 5. Position based sliding window with $rate = 2$, $start\_adv = end\_adv = 3$. Window width is 4 units

Therefore, we can generalize the window creation operator definition given above:

Definition 11 Let $W'$ be a window creation operator; it takes as input a stream $S$ and a description $desc$, and returns a set of windows created according to the behaviour description given in $desc$. Formally:

$$W'_{desc}(S) = \bigcup_{n=1}^{\infty} R_n \qquad (4)$$

$R_n$ is the window created during the $n$th execution of the operator $W^t$ over the stream $S^t$ (see Figure 6). Formally:

$$R_n = W^{t_0+(n-1)\times rate}_{(start(n),\,end(n))}\left(S^{t_0+(n-1)\times rate}\right)$$

where $start(n) = start + (n-1) \times start\_adv$ and $end(n) = end + (n-1) \times end\_adv$.

With this general definition, it is possible to define different types of windows: fixed windows ($start\_adv = end\_adv = 0$), landmark windows (either $start\_adv = 0$ or $end\_adv = 0$), tumbling windows ($start\_adv = end\_adv = end - start$), etc.

In general, the window width is constant for sliding windows (i.e. $start\_adv = end\_adv = cnst$). However, if we want windows of different sizes at each sliding period, we can define $start\_adv(n)$ and $end\_adv(n)$, which can take different values at the $n$th execution of the window creation operator. Similarly, a non-constant rate parameter implies an aperiodic window. Therefore, a variable parameter $rate(n)$ defines a different rate for the operator's $n$th execution (e.g. every time that a new tuple arrives). In addition, in some cases, the window edges may go beyond the present part of a stream (the part where tuples are currently present). For instance, this can happen when a sliding window advances so fast that the window's $end$ parameter falls into the future part of the stream (see Figure 7). One solution to this problem is to evaluate the windowed operator (see next section) over the window including only the present values; hence, the $end$ parameter of the window becomes the timestamp of the last element of $S^t$, i.e. if $end > s_{|S^t|-1}.tmstmp$ then $end = s_{|S^t|-1}.tmstmp$. This solution could be used for periodic operators in order to give at least one result at the end of each period. Another solution
could be to wait until the window fills with the required number of tuples before executing the windowed operator. This solution could be adopted when no rate is specified for the operator. Similarly, if $start < s_0.tmstmp$, we can either take $start = s_0.tmstmp$ or take the tuples from the history, if the latter is available.

Figure 6. W creates windows from the input stream. WUnOp is executed on such windows and creates the output stream

Figure 7. If $end > s_{|S^t|-1}.tmstmp$ then $end = s_{|S^t|-1}.tmstmp$

3.2.3 Windowed Operators

This section introduces windowed operators, i.e. operators executing over windows. They are represented by the symbol $WOp$ (Windowed Operator). As in the case of stream operators, we consider two types of windowed operators: unary ($WUnOp$) and binary ($WBinOp$). As examples of windowed unary operators, we can give traditional aggregation operators such as average, count, sum, min, and max. Similarly, a windowed join is a binary operator (see Section 4 for operator examples). In this section, we only define unary windowed aggregation operators due to size restrictions. However, other operators can be defined in an analogous way.

Similarly to Definition 7, an aggregation $WUnOp$ operator takes as input a window $R$ and returns the result tuple to the output stream:

$$WUnOp^t(R) = sout^{\,t+\delta t}_{|Sout^{t}|} \qquad (5)$$

where $R = W^t_{(start,end)}(Sin^t)$.

As in formulas 2 and 3, we can recover the output stream in the case of a periodically sliding window as follows:

$$Sout = \bigcup_{n=1}^{\infty} WUnOp^{\,t_0+(n-1)\times rate}(R_n) \qquad (6)$$

$$\phantom{Sout} = \bigcup_{n=1}^{\infty} sout^{\,t_0+n\times rate}_{|Sout^{\,t_0+(n-1)\times rate}|} \qquad (7)$$

where $R_n$ has been introduced in formula 4.

Note that for some windowed unary aggregation operators (e.g. average, count, sum) and binary operators (e.g. join), the timestamp value that the output tuple will take is not obvious. There are several possibilities to handle this problem: i) to choose as the output's timestamp one of the timestamp values of the input tuples which contributed to the output tuple (e.g. the minimum [16], the maximum [9], or the one indicated by the query [10]); ii) to assign a new timestamp (e.g. the operator's execution time [10]); iii) or alternatively to have a time interval $[min\_ts, max\_ts]$ instead of a unique timestamp [18]. In order to maintain the temporal order of the output tuples, we choose to take the maximum of the input timestamp values. It is also the value nearest to the one that the operator would assign if the second solution were chosen.

4 Query example

This section illustrates several aspects of SStreaM. The example is based on a hybrid multi-level architecture defined to query distributed heterogeneous sensors [17]. The architecture (see Figure 8) is composed of three main levels: control sites, gateways and sensors. Control sites are the entry points of the system. Users or applications send their queries to the control site, and the control site decomposes the query in order to send the sub-queries to the gateways concerned by the query. Gateways are distributed according to an attribute (mostly the location attribute). They group different kinds of sensors, more precisely their proxies. A proxy is the software controlling one or more sensors. On the gateway, there is also one adapter per proxy, which is the interface between the sensor-specific proxy and our sensor querying system. Adapters are responsible for translating between our query language and the proxy's sensor-specific control commands. Sensors are physically distributed in an environment and send their measurements to their proxies in a periodic or aperiodic manner. There are different kinds of sensors (temperature, pressure, localization, etc.) with different capabilities, such as query operator processing and storage capacities, which can be used for query optimization purposes (e.g. if a sensor can execute a selection operator, then push the selection operator to the sensor). With this architecture in mind, consider the following scenario, which will be used to illustrate a query example:
Figure 8. Architecture and query example
In a factory, each product passes successively through a certain number of sections during its production lifecycle. The product stays for one minute in each section, where some operations are performed on it, and then passes to the next section. Each section has a gateway containing different kinds of sensor proxies. For our example, we will take two types of sensors: temperature sensors and RFID readers (sensors). We assume that there are several temperature sensors placed at different locations of a section, and one RFID reader per section detecting product tags (see Figure 8).

According to the general operator definitions given in Section 3, we introduce the following operators, which will be used for the query example:

Stream operators:
$SelOp_P$ takes as input a tuple $s_i$ and returns $s_i$ if the tuple conforms to the predicate condition defined in $P$.
$ProjOp_L$ takes as input a tuple $s_i$ and returns the tuple $s'_i$ which only contains the attributes of $s_i$ listed in $L$.

Windowed operators:
$WAVG_{attr}$ takes as input a window and returns the average value of the $attr$ attributes of the tuples present in the window.
$WJoin_{JC}$ takes two windows as input, and returns the concatenation of the tuples satisfying the join condition specified in $JC$.

According to our objective of representing different kinds of sensor data, we define a global common schema sensor_stream:

< SId, location, type, measurement, timestamp >

This schema is actually a view over different distributed databases located at different levels of the architecture (control sites, gateways, proxies) and over the stream data of sensors. Note that the first three attributes form the property attributes.

Let us consider the following query Q: Which products in the production chain underwent an average temperature of more than 40 °C during their presence in a section?

This query will be executed on the gateway of each section. The partial results from gateways will then be collected by the control site. We illustrate the part of the query which will be executed at one section.

Let $S1$ be the stream created by the RFID reader, and $S2$ the stream created by the temperature sensors of the section; then Q can be represented in algebraic form as follows:

$$ProjOp_L(SelOp_P(WAVG_{attr}(W'_{desc3}(WJoin_{JC}(W'_{desc1}(S1), W'_{desc2}(S2))))))$$

where $L = \langle S1.measurement \rangle$, $P = (S1.type = RFID \wedge S2.type = Temperature \wedge average > 40)$, $attr = S2.measurement$, and $JC = (S1.location = S2.location)$.

There is a join between the RFID readers' data stream ($S1$) and a sliding window over the temperature sensors' data stream ($S2$). It is an equi-join over the location property. However, according to our assumption that there is one gateway per section, this condition will always be true. As a result, this join operation will only couple each product with the temperature readings made during its presence in the section. Knowing that each product stays in one section for one minute, the width of the window will be 60 seconds 2. The window is aperiodic; its sliding rate is determined by the products' arrival rate. The sliding distance for both edges of the window is the difference between the arrival times of two consecutive products. Therefore, the join is calculated between the tuples of $S1$ and the windows created on stream $S2$. Such a window creation operator uses the following description:

$desc2 = \langle t_0, t_0 + 60, rate(n), dist(n), dist(n) \rangle$

where $t_0$ is the timestamp of the first tuple in stream $S1$; $rate(n) = dist(n) = S1[n+1].timestamp - S1[n].timestamp$; and $S1[n]$ gives the current tuple in progress.

We note that, although we did not define joins between a stream and a window in our model, we can consider the former as a position based tumbling window whose start parameter is 0, end parameter is 1, rate is aperiodic,

2 The smallest time unit is a second
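The window description of Definition 10 and the windowed average used in the query example can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SStreaM implementation: the names `Reading`, `WindowDesc`, `window_edges`, `temporal_window` and `windowed_avg` are hypothetical, the temperature data is synthetic, and a periodic tumbling window of 60 seconds replaces the aperiodic, product-driven window of the example for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Reading:
    tmstmp: int         # seconds
    measurement: float  # e.g. degrees Celsius


@dataclass(frozen=True)
class WindowDesc:
    """The 5-tuple <start, end, rate, start_adv, end_adv> of Definition 10."""
    start: int
    end: int
    rate: int
    start_adv: int
    end_adv: int


def window_edges(desc: WindowDesc, n: int) -> Tuple[int, int]:
    """Edges start(n), end(n) of the n-th window, n >= 1."""
    return (desc.start + (n - 1) * desc.start_adv,
            desc.end + (n - 1) * desc.end_adv)


def temporal_window(stream: List[Reading], start: int, end: int) -> List[Reading]:
    """Temporal window: tuples with start <= tmstmp <= end (Definition 8)."""
    return [s for s in stream if start <= s.tmstmp <= end]


def windowed_avg(window: List[Reading]) -> Optional[Reading]:
    """Average of the measurements; the output tuple takes the maximum
    input timestamp, the choice made in Section 3.2.3."""
    if not window:
        return None
    avg = sum(s.measurement for s in window) / len(window)
    return Reading(max(s.tmstmp for s in window), avg)


# Synthetic temperature stream: one reading every 10 s, slowly heating up.
temps = [Reading(t, 30.0 + t / 4) for t in range(0, 180, 10)]

# Tumbling window of 60 s: start_adv = end_adv = end - start.
desc = WindowDesc(start=0, end=60, rate=60, start_adv=60, end_adv=60)

hot = []  # indices n of the windows whose average exceeds 40
for n in (1, 2, 3):
    lo, hi = window_edges(desc, n)
    out = windowed_avg(temporal_window(temps, lo, hi))
    if out is not None and out.measurement > 40:
        hot.append(n)
```

Setting `start_adv = end_adv = 0` in `WindowDesc` yields a fixed window, and setting only one of them to 0 yields a landmark window, as listed after Definition 11.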
uses the OSGI platform [2], thus adopts a service-oriented
approach. Data is collected by sensor services, and aggre-
gated by distributed query services on the gateways. Global
sensor stream query services at control sites discover and
query sensor stream data by intermediary of query services
and sensor services.
Our ongoing research concerns the management of sensor
farms. We have found that continuous queries will
be executed simultaneously with update transactions
modifying sensor properties. This will require specific
transaction management. We believe that the temporal dimension of
operators introduced in SStreaM will lead to a finer
management of continuous queries.
References