
Environmental Modelling & Software 21 (2006) 1579–1586

www.elsevier.com/locate/envsoft

Database architectures: Current trends and their relationships to environmental data management
Jaroslav Pokorny*
Charles University, Faculty of Mathematics and Physics, Department of Software Engineering, Malostranske nam. 25, 118 00 Praha, Czech Republic
Received 11 November 2005
Available online 27 June 2006

Abstract

In response to ever-increasing demands for environmental information from customers, authorities, and governmental organizations, new business control functions are being implemented and integrated into environmental information management systems (EIMSs). These systems are often based on traditional file techniques or, more recently, on commercial database management systems (DBMSs). With the production of huge data sets and their processing in real-time applications, the needs for environmental data management have grown significantly. Numerous practical examples of EIMSs show that the DBMS architecture should be open to permanent evolution. Current trends in database development and the associated research meet these challenges. New information and communication technologies and techniques influence today's DBMSs. They include, among other things, sensor networks, stream processing, processing of uncertain and imprecise data, knowledge discovery and intelligent data analysis, as well as wireless broadcast and mobile computing. Both research and practice indicate that the traditional universal DBMS architecture hardly satisfies these trends and that new solutions are needed. Instead, separate specialized engines connected into networks are beneficial. The paper discusses recent advances in database technologies and attempts to highlight them with respect to the requirements of EIMSs.
© 2006 Elsevier Ltd. All rights reserved.

Keywords: Environmental management system; Database management system; Sensor; Sensor network; Stream processing; Uncertain and imprecise data; Knowledge discovery and intelligent data analysis; Wireless broadcast; Mobile computing

* Fax: 420 221914323.
E-mail address: pokorny@ksi.ms.mff.cuni.cz

1364-8152/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.envsoft.2006.05.004

1. Introduction

Without doubt the world of data is changing, particularly the nature and sources of information. All these changes have a significant influence on database needs and, consequently, on the questions of where the database field is and where it should be going. In their report, Abiteboul et al. (2005) emphasize two main driving forces today: the Internet and particular sciences, like the physical sciences, biological sciences, medicine, and engineering. These sciences produce large and complex data sets that require more advanced database support than current products provide.

Another trend, existing since the 1960s, concerns industries that have faced ever-increasing environmental demands from customers, authorities and governmental organizations. Recently, reflecting these demands, new business control functions have been integrated into environmental management systems¹ (EMS). For their computerized part we can use the term environmental information system (EIS) if we address public environmental information systems, or environmental management information system (EMIS) if we deal with industrial environmental information systems. As data or information processing is primarily what we focus on, we will use the term EIS throughout the paper.

¹ According to LCA (2005), an EMS is a part of the overall management system that includes organisational structure, planning activities, responsibilities, practices, procedures, processes and resources for developing, implementing, achieving, reviewing, and maintaining the environmental policy.

An important observation is that, similarly to the sciences mentioned, EISs also process huge data sets, often continually

and while triggering various control actions. Consequently, the needs for environmental data management have grown significantly.

Considering environmental data sets combined with, e.g., business data, emails, documentation etc., adequate information integration mechanisms are needed. Since their beginning, databases have had an integrative role in the world of data. Reuter (2005) argues that the technological evolution of database technology makes database systems even the ideal candidate for integrating all types of objects that need persistence, as well as for supporting all the different types of execution that are characteristic of the various application classes.

The most important part of each management system deals with data through querying. When users want to search and use environmental information, the following problems occur (Tomasic and Simon, 1997):

(1) Data do not exist or are insufficient; sometimes this may require synthesis or reproduction of data.
(2) Data are not referenced by data suppliers and are therefore hard to locate, or data are referenced under classification criteria that are domain-specific.
(3) Data are hard to access; they are either private or too costly, or they require costly pre-processing (e.g., data must be re-entered manually from paper documentation) or format translation.
(4) Accessed data sets are hard to use because they are inconsistent or non-compatible; for example, long time series may be accessible, but standard data collection techniques have not been applied, thereby making adjacent time series incompatible.
(5) The quality of retrieved data is hard to assess; it is often hard to compare data produced using different scientific models because of a lack of documentation about the underlying computational processes.

The database community focuses on information storage, organization, management, and access in software architectures called database management systems (DBMSs). It is always driven by new applications, technology trends, new synergies with related fields, and innovation within the field itself. The problems (1)–(5) are a natural part of today's database research and development. A natural idea is that EISs based on advanced database technologies could help to deal with these issues.

Several technological aspects influence DBMS development. Scientific data, in particular, often arrive in streams. The sensor networks producing the data consist of very large numbers of low-cost devices, each of which is a data source measuring some quantity, e.g. an object's location or the ambient temperature. Processing such data is usually completely different from processing the data stored in enterprise databases. Data arrive in high-speed streams, and queries over those streams need to be processed in an online fashion to enable real-time responses. Moreover, in comparison to enterprise data processing, these data are uncertain or imprecise. Another aspect of such data processing is that queries cannot be clearly formulated with the common techniques used, for example, in classical databases. Often we are not able to formulate a query, e.g. in SQL, despite believing that something interesting is hidden in our data. In such situations a lack of semantics is apparent. To describe data semantics, metadata and its formal description are necessary.

The data collections considered create an ideal platform for using knowledge discovery methods and/or intelligent data analysis. Online analytical processing (OLAP), data warehouse (DW), and data mining (DM) techniques can also help in this context.

The purpose of the paper is to present the main challenges influencing today's database development with respect to processing environmental data. First, in Section 2 we discuss properties of new data sources. In Section 3 we recall the concepts of the classical centralized DBMS architecture as it has existed since the early 1980s. According to Harder and Reuter (1983), the architecture models a five-level abstraction hierarchy. Its implementation has five technological layers that allow some problems and their solutions to be separated in a relatively independent way. The main part of the paper, Section 4, presents five new technologies influencing database architectures. They include sensor data and sensor networks, stream processing, approaching uncertain and imprecise data, knowledge discovery methods and intelligent data analysis, and wireless broadcast and mobile computing. In Section 5, we argue that new DBMS architectures are needed, briefly describe some of their proposals, and give several examples of their occurrence in practice. In the conclusions we summarize the basic ideas given in the paper and add a number of other issues that can influence the processing of environmental data.

2. New data sources

Usual enterprise data stored in databases are structured and can be described by a so-called (database) schema. Such a schema is almost fixed or is changed only rarely. This is not the case for collections of scientific and environmental data. According to Reuter (2005), the key properties of these data collections (irrespective of their many differences) are the following:

• The raw data is written once and never changes again. As a matter of fact, some scientific organizations require, for all projects they support, that any data that influences the published results of the project be kept available for an extended period of time, typically around 15 years.
• Raw data comes in as streams with high throughput (hundreds of MB/s), depending on the sensor devices. The streams have to be recorded as they come in, because in most cases there is no way of repeating the measurement.
• For the majority of applications, the raw data is not interesting. What the users need are aggregates, derived values, or, in the case of text fields, some kind of abstract of what the text says.
• In many cases, the schema has hundreds or thousands of attribute types, whereas each instance has only tens of attribute values.
• The schema of the structured part of the database is not fixed in many cases. As the discipline progresses, new phenomena are discovered, new types of measurements are made, units and dimensions are changed, and once in a while whole new concepts are introduced and/or older concepts are redefined. All those schema changes have to be accommodated dynamically.

We will consider mainly data representing environmental objects and their relationships. Both objects and their relationships are characterized by attributes. Spatial environmental objects (such as lakes, bridges, buildings, clouds, whales, trees and cars) have, e.g., a shape, and other attributes that can change over time, e.g. the water temperature in a lake, the position of a whale, etc. That is why time and space are important components of an environmental system. Recently, environmental data transmission has been supported by wireless network technology.

It is not surprising that in many cases only traditional file-oriented solutions are available for the collections of objects considered. For example, the CORIE (Columbia River Estuary) system, based on three forms of data (scientific data, catalogue data, and task data), produces 5 GB of forecast data each day in its simulations (Bright and Maier, 2005). Nevertheless, its Metadata Repository is schema-less: no file formats, database access libraries, or XML schemas need be agreed upon. In connection with the Internet, Web services, and EISs, such solutions seem to be unsustainable.

3. Layered architecture of DBMS

Everybody using, e.g., a relational database is aware of the fact that the tables of data occurring at the top of a database system are virtual in some sense. More specifically, they provide a logical data structure suitable for user-oriented processing of the data in the database. In the early 1980s, Harder and Reuter (1983) proposed a mapping model consisting of five layers. Table 1, adopted from Harder (2005), shows these five layers in detail. We can observe the objects dealt with at each abstraction level and the particular functions implementing mappings between two consecutive layers. For example, the non-procedural access at level L5 provides tables and statements for their manipulation, usually formulated in the SQL language. Among the objects at level L3 we can find data structures supporting indexing, e.g. the well-known B-trees for character strings and numbers, or R-trees for spatial data. In general, going upwards, the objects and associated operations become more complex and some additional integrity constraints can occur.

Table 1
Description of the five-layer DBMS mapping hierarchy

Level of abstraction | Objects | Auxiliary mapping data
L5 Non-procedural access | Tables, views, rows | Logical schema description
L4 Record-oriented, navigational access | Records, sets, hierarchies, networks | Logical and physical schema description
L3 Record and access path management | Physical records, access paths | Free space tables, DB-key translation tables
L2 Propagation control | Segments, pages | Buffers, page tables
L1 File management | File, blocks | Directories

The concept of a multi-layered architecture considers its ideal implementation as a set of abstract machines, where the machine of layer k + 1 is implemented via the machine of layer k. Although five layers are a good compromise in the architecture considered, performance problems occur in practice. Simplifying the individual layers on the one hand increases the run-time overhead on the other. Consequently, various ways to optimize DBMS performance have been developed, and the number of layers is reduced for some system functions.

The development of layer L5 during the last 10 years resulted in the specification of the so-called object-relational (OR) data model. Part of it is standardized in SQL:1999 (ISO, 1999) and, more recently, in SQL:2003 (ISO, 2003). In the OR model, table rows can have structured components, and columns can even be of a user-defined type. Spatial data, time series or texts belong to this category. Such an extensible approach resulted in the so-called universal DBMSs of the late 1990s. The core of these engines has been extended by loosely coupled additional modules (components) for each new data type. The vendors of leading DBMSs call these components extenders (IBM DB2), DataBlades (Informix), and cartridges (Oracle). Spatial and text components, for example, are among the most successful of them.

The possibility of user-defined types has introduced a lot of serious problems into the implementation of the DBMS architecture. For some such types, e.g. video, image, text, and audio, there are standardized sets of predicates and functions for manipulating their instances, but how to integrate these types into a common framework in the DBMS architecture remains an open problem. The implementation of new access paths, like new types of indexes, usually results in modifications of the DBMS kernel, e.g. the SQL compiler, the query optimizer, etc. Such changes are very expensive: implementing and testing new access methods and user-defined types, e.g. for contiguous data flow from streaming data sources, is time-consuming and error-prone.

Each vendor uses a different approach to open the host system architecture to a certain degree. Oracle cartridges are restricted to secondary index integration. In the IBM DB2 extender, there is a framework for indexing data of new data types, restricted only to B-trees. This means that the evaluation of only some types of queries can be improved with such indexing. In other words, new functionality is supported, but only for a limited class of user requirements.

It seems that the contribution of such software lies mainly in cases of requirements that can be decomposed into relatively independent parts, evaluated separately in the DBMS core and in the module implementing a particular data type. These frameworks are thus either too complex or not flexible enough to cope with the wide range of requirements in domain-specific access methods. A real, seamless integration could hardly be achieved with most of these attempts. Consequently, current implementations of the layered DBMS architecture are not sufficient and fail in the case of universal DBMSs.
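To make the abstract-machine view concrete, the following minimal Python sketch chains hypothetical layers loosely modelled on Table 1, where each layer only calls the interface of the layer directly below it. The class names, the JSON encoding of pages and rows, and the tiny page capacity are illustrative assumptions, and layer L4 (navigational access) is omitted for brevity; this is not how any particular DBMS is implemented.

```python
import json
from collections.abc import Callable


class BlockStore:
    """L1, file management: numbered blocks of raw bytes (kept in memory here)."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}

    def read(self, block_no: int) -> bytes:
        return self.blocks.get(block_no, b"[]")  # an unused block decodes as an empty page

    def write(self, block_no: int, data: bytes) -> None:
        self.blocks[block_no] = data


class PageBuffer:
    """L2, propagation control: pages are cached in buffer frames and flushed to blocks."""

    def __init__(self, store: BlockStore) -> None:
        self.store = store
        self.frames: dict[int, list[str]] = {}  # page table: page number -> frame

    def fix(self, page_no: int) -> list[str]:
        if page_no not in self.frames:
            self.frames[page_no] = json.loads(self.store.read(page_no))
        return self.frames[page_no]

    def flush(self) -> None:
        for page_no, frame in self.frames.items():
            self.store.write(page_no, json.dumps(frame).encode())


class RecordManager:
    """L3, record management: records addressed through a DB-key translation table."""

    RECORDS_PER_PAGE = 2  # assumed tiny capacity so that several pages get used

    def __init__(self, buffer: PageBuffer) -> None:
        self.buffer = buffer
        self.key_table: dict[int, tuple[int, int]] = {}  # DB-key -> (page, slot)
        self.current_page = 0

    def insert(self, record: str) -> int:
        page = self.buffer.fix(self.current_page)
        if len(page) >= self.RECORDS_PER_PAGE:
            self.current_page += 1
            page = self.buffer.fix(self.current_page)
        page.append(record)
        key = len(self.key_table)
        self.key_table[key] = (self.current_page, len(page) - 1)
        return key

    def fetch(self, key: int) -> str:
        page_no, slot = self.key_table[key]
        return self.buffer.fix(page_no)[slot]


class Table:
    """L5, non-procedural access: rows of one table, stored as L3 records."""

    def __init__(self, records: RecordManager) -> None:
        self.records = records
        self.keys: list[int] = []

    def insert(self, row: dict) -> None:
        self.keys.append(self.records.insert(json.dumps(row)))

    def select(self, predicate: Callable[[dict], bool]) -> list[dict]:
        rows = (json.loads(self.records.fetch(key)) for key in self.keys)
        return [row for row in rows if predicate(row)]


# Usage with made-up sample rows. Every select() goes through L3 and L2;
# L2 reads from L1 only for pages that are not yet buffered.
store = BlockStore()
buffer = PageBuffer(store)
lakes = Table(RecordManager(buffer))
for name, temp in [("Lake A", 18.5), ("Lake B", 21.0), ("Lake C", 19.2)]:
    lakes.insert({"lake": name, "water_temp": temp})
buffer.flush()
warm = lakes.select(lambda row: row["water_temp"] > 19.0)
print(warm)  # rows for Lake B and Lake C
```

Each additional layer adds a call and a data conversion to every access, which is a small-scale picture of the run-time overhead mentioned above.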
The other issue with traditional solutions is that they are available mainly for static applications. Because of the inherent space and time components of environmental data, an environmental system can be implemented on top of a spatio-temporal DBMS. Unfortunately, such database software is also still under development today.

[Fig. 1. A multi-layered sensor network architecture (components: Sensor Nodes, two Base Stations, Sensor Network Server).]

4. New technologies influencing database architectures

For data representing environmental objects a number of new technologies are relevant. Some of them, e.g. sensor data and sensor networks, stream processing, approaching uncertain and imprecise data, knowledge discovery methods and
intelligent data analysis, and wireless broadcast and mobile computing, are the same as those influencing database development in general. We discuss them briefly in the following subsections.

4.1. Sensor data and sensor networks

A sensor network is designed to transmit data from an array of sensors to a data repository on a server. Sensor networks are based on inexpensive micro-sensor technology, which will enable most environmental objects to report their temperature, pressure, state or location, e.g. via a global positioning system, in real time. These small, battery-powered devices are placed in areas of interest, e.g. in the soil, across rain forests, or even in a glacial area to track global warming and climate change. Each sensor node collects environmental data primarily about its immediate surroundings. These data can support applications whose main purpose is to monitor the objects' attributes. Various environmental data can be collected and analyzed to forecast upcoming phenomena and to send prompt warnings.

The sensors are generally self-powered, wireless devices with limited processing speed, storage capacity, and communication bandwidth. Such a device draws far more power when communicating than when computing. Thus, when querying the information in the network as a whole, it is often preferable to distribute as much of the computation as possible to the individual nodes. Some system architectures for environmental monitoring include a base station equipped with a (relational) DBMS that communicates with wireless sensor networks. The base station provides access and control for remote users. With a server behind it, a many-layered architecture can look similar to that in Fig. 1. In fact, the network becomes a new kind of database machine, whose optimal use requires operations to be pushed as close to the data as possible. In a more complicated case, sensors and/or users of the sensor networks can even be mobile.

Sensor networks provide important data sources and create new data management requirements. For example, they do not necessarily use a simple one-way data stream over a communications network. Elements of the network architecture will make decisions about what data to pass on, such as local area summaries and filtering, in order to minimize power use while maximizing information content.

Sensor information processing raises many of the most interesting database issues in a new environment, with a new set of constraints and opportunities. Huge sets of environmental data generated by sensors will be distributed throughout the world, and can come and go dynamically. For example, NASA's Earth Observing System (EOS) is a collection of satellites producing data regarding the atmosphere, oceans, and land, about 1/3 of a petabyte of information per year. Since terabytes of data from individual nodes will soon be the norm, new requirements on computational and data management infrastructures appear. For example, DataDirect's enterprise-class S2A 6000 Silicon Storage Appliance directly supports up to 512 workstations and 180 terabytes of storage.

From one perspective, sensor networks are similar to distributed databases, but with inherent real-time properties. One important difference is that the rate at which data are produced in a sensor network is much higher than typically considered in distributed DBMSs. This breaks the traditional information integration paradigm, since there is no practical way to extract and load data into a common database for each such occurrence. Strategies of query optimization and query processing must also be redefined.

There are many examples of such networks in practice. For example, Mainwaring et al. (2002) describe experiments with environmental monitoring in the context of two wildlife habitats: Great Duck Island and James Reserve. Based on the requirements of the researchers studying these habitats, they propose a sensor network architecture for this class of applications. On a much larger scale, the development of the Environmental Observations and Forecasting System (EOFS) combines real-time in situ monitoring with distribution networks that carry data to centralized processing sites. One example of this is the CORIE project cited above. The FLOODNET project (Envisense, 2004) plans to provide flood warning in the UK.
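The advice to push computation as close to the data as possible, forwarding local summaries rather than raw readings, can be sketched as in-network aggregation over a routing tree. The tree, the names `Partial`, `SensorNode`, `in_network_messages` and `raw_forwarding_messages`, and the cost model of one message per hop are illustrative assumptions; lost messages, sampling epochs, and energy budgets are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Partial:
    """Fixed-size partial aggregate forwarded instead of raw readings."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "Partial") -> None:
        self.count += other.count
        self.total += other.total
        self.minimum = min(self.minimum, other.minimum)
        self.maximum = max(self.maximum, other.maximum)

    @property
    def mean(self) -> float:
        return self.total / self.count


@dataclass
class SensorNode:
    """Hypothetical node in a routing tree whose root talks to the base station."""
    name: str
    reading: float
    children: list["SensorNode"] = field(default_factory=list)

    def aggregate(self) -> Partial:
        # Combine the local reading with the partial aggregates of the subtree.
        partial = Partial()
        partial.add(self.reading)
        for child in self.children:
            partial.merge(child.aggregate())
        return partial


def in_network_messages(node: SensorNode) -> int:
    # With aggregation, every non-root node sends exactly one partial to its parent.
    return sum(1 + in_network_messages(child) for child in node.children)


def raw_forwarding_messages(node: SensorNode, depth: int = 0) -> int:
    # Without aggregation, every reading is relayed hop by hop up to the root.
    return depth + sum(raw_forwarding_messages(c, depth + 1) for c in node.children)


# Usage with made-up temperature readings (degrees Celsius).
root = SensorNode("root", 10.0, [
    SensorNode("a", 12.0, [SensorNode("c", 11.0), SensorNode("d", 13.0)]),
    SensorNode("b", 9.0, [SensorNode("e", 14.0)]),
])
result = root.aggregate()
print(result.count, result.mean, result.minimum, result.maximum)  # 6 11.5 9.0 14.0
print(in_network_messages(root), raw_forwarding_messages(root))   # 5 8
```

Only aggregates whose partial results can be merged in fixed space (count, sum, minimum, maximum, and the mean derived from count and sum) fit this pattern directly; an exact median, for example, cannot be computed from such partials.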

In general, there is a need to design flexible, lightweight database abstractions that are optimized for data movement as opposed to data storage (Stonebraker and Cetintemel, 2005).

4.2. Stream processing

As mentioned in Section 4.1, sensors can produce continuous, possibly infinite, streams of data. EISs based exclusively on the traditional store-and-query model cannot handle the volume and velocity of streaming data, whose values might exist only for a moment. There is a growing number of applications, e.g. environmental monitoring, where DBMSs are used to create a near-real-time image of some critical parts of the environment. Compared to streams of usual scientific data, which typically run at a constant speed, monitoring applications must be able to accommodate significant fluctuations in the data.

Traditional DBMSs are unsuited to deal with such streams for various reasons (Amato et al., 2004):

• sensor nodes produce and deliver data continuously without receiving requests for that data,
• queries over collected data can be less frequent than data insertions,
• produced data often has to be processed in real time because it can represent events that need a rapid answer,
• queries run continuously because data streams never terminate, so that they can see system conditions change during their execution,
• because of storage constraints, an entire stream cannot be stored on disk,
• because data streams are possibly infinite, only non-blocking operators can be used, and
• if the data to be processed is not available, then operators must process data only when nodes make it available.

Consequently, stream processing is not a data management task; it is a data-filtering task. New architectures, so-called data stream processing systems (DSPS), have emerged; see e.g. Carney et al. (2002). A rather restricted solution, the stream-processing engine (SPE), is an example of a new database architecture that enables the execution of queries, computations, and actions on streaming data in real time. Such an SPE should accept SQL-like, stream-oriented, continuous queries and execute them over live event streams, outputting results in real time. In SPEs, most of the data processing is performed in main memory; read or write operations to storage are optional and can be handled asynchronously in many cases.

For example, in a recent pilot program, StreamBase, developed by Stonebraker (StreamBase Systems, Inc., 2005), should be able to analyze 140,000 messages/s, while a leading relational DBMS could handle only 900 messages/s.

4.3. Approaching uncertain and imprecise data

In addition to the data management issues of environmental data in data streams, many other problems arise. There are different sources that cause information to be uncertain: incompleteness, inconsistency, vagueness, imprecision, and error. Worboys (1998) associates these notions typically with spatial data. Incompleteness is related to totally or partly missing data: the prototypical situation of this kind is when a data set is obtained from digitizing paper maps and pieces of lines are missing. Inconsistency arises when several versions of the same object exist, due either to different time snapshots, data sets from different sources, or different abstraction levels. Vagueness is an intrinsic property of many natural geographic features that do not have crisp or well-defined boundaries. Imprecision is due to a finite representation of spatial entities: the basic example of this kind is the regular tessellation used in raster data, where the element of the tessellation is the smallest unit that represents space. Scientific measurements have standard errors. Error is everything that is introduced by the limited means of taking measurements. For example, location data for moving objects involve uncertainty in the current position.

Individual sensors are not reliable, and wireless communication is also unreliable. Thus, various approaches are used to provide a more accurate estimation of the environment. In multisensor data fusion, approaches like fuzzy sets or Dempster-Shafer evidential theory are sometimes used (Ramamritham et al., 2004). Sequences and images require approximate processing based on similarities, metrics, etc. Another source of techniques based on similarities is the information retrieval area. Considering environmental data equipped with metadata expressed as text strings, we can also take these methods into account. An excellent survey of similarity measures applicable to environmental data is presented in Nunez et al. (2004).

Traditional DBMSs were applied to business data processing, which typically focused on numbers and character strings. In those application areas, data elements are precise quantities like address, quantity on hand, balance, status, and delivery date. As a result, current DBMSs have no facilities for either approximate data or imprecise queries.

Huge data sets and their imperfect nature produce a number of direct consequences for computing in general. They include (Cohen, 2005):

• the notion of practical complexity must be revised in the sense that any above-linear algorithm might be too time-consuming; one may even avoid algorithms with large linear coefficients,
• the processed results should reflect existing data imperfections,
• the ability to perform pre-processing and use incremental algorithms will become an essential approach to reducing computing times, and
• approximate solutions may be the only resort for solving large complex problems.

4.4. Knowledge discovery and intelligent data analysis

Environmental data often need to be analyzed in order to obtain information necessary for environmental management

decisions. Environmental Decision Support Systems (EDSS) are often mentioned in this context. EIS and EDSS are major building blocks in environmental management and environmental science today. EIS and EDSS are usually said to have certain characteristics that distinguish them from standard information systems, e.g. information complexity in time and space, or incompleteness or fuzziness of data items (Denzer, 2005). The authors of GESCONDA, a project specially oriented to environmental databases, point out in Gibert et al. (2005) the high quantity of information and knowledge patterns that are implicit in large databases coming from environmental domains.

DM methods are particularly suitable for this purpose. Historically, DM has focused on efficient ways to discover models of existing data sets. These models must expose some useful aspects of the data, while obscuring details not useful for the intended application. In comparison to the simple forms of regularities/dependencies treated by statistical methods, DM methods can find more complex hypotheses that include both numerical and logical conditions. Algorithms have been developed by many research communities to perform such operations as classification, clustering, association-rule discovery, and summarization. These techniques are now part of mainstream products from the major DBMS vendors, and most of them are applicable in EISs.

OLAP or DW techniques are often sufficient. For example, temperature and pressure trends may be required for an environment. Deriving such information typically requires past temperatures and pressures stored in a database and processed along the time dimension. Multidimensional data structures are often used in the context of such applications. Data in OLAP and DW systems are processed by columns rather than by rows. Data processing also uses special indexing techniques like bitmap indexes and various trees. Although a lot of suitable data structures have been developed during the last 20 years, only a few of them, e.g. UB-trees and M-trees, are integrated into commercial DBMSs. Rather, specialized engines absorb them. An architecture with two engines united by a common parser occurs in practice: the classical transaction database and the DW database are stored separately and viewed as one database.

Recent interest in combining DM technology with DBMSs requires new approaches to storing the data sets to be mined and to optimizing DM processing. New research directions include:

(1) multi-dimensional OLAP for discovering unusual patterns in stream data;
(2) mining clusters and outliers in stream data for discovering unusual patterns; and
(3) single-pass classification methods for stream DM.

4.5. Wireless broadcast and mobile computing

Data broadcast is an attractive alternative to on-demand access because it can deliver data simultaneously to a large number of clients at a fixed cost. It is suitable for location-based services, which exhibit strong temporal and spatial locality in that clients within a neighbourhood and a certain time period tend to seek the same kind of information (Zheng and Lee, 2005).

The data to be broadcast also include sensor data. Sensors deployed in the environment can broadcast their data periodically or when interesting events happen. Unlike in traditional computing, client devices cannot make requests to sensors for the data. Instead, client devices just listen to the broadcast channels passively. Thus, the sensors have the initiative in communication. Sensors may broadcast data periodically if they are measuring a continuous phenomenon producing environmental data, or may broadcast data only when a particular event occurs, e.g. if they are detecting whether an RFID tag² has just come into range.

² RFID (radio-frequency identification) is the latest technology based on radio waves that is useful for precisely identifying objects.

Higher-level sensors in a sensor network can pre-process low-level sensor data and then broadcast this derived information to client devices. Such processing can require modified database techniques to be successful.

Since environmental data often need to be disseminated to the user in a timely way, anytime and anywhere, a mobile environment is of increasing importance in this context. In periodic broadcast, in particular, data are broadcast periodically on a wireless channel. A mobile client listens to the broadcast channel and downloads the desired data from the channel according to a query issued by the user or a stored profile of interest on the client. Of course, these networks should also be able to respond to aperiodic queries.

Besides this, mobile devices introduce yet another category of application (Seltzer, 2005): caching relevant portions of a larger data set on a smaller, low-functionality device. One can think of a mobile device as a cache of a global data set. This model has attractive properties, in particular the ability to augment the local data set with entries as they are used or needed. Mobile telephony infrastructure requires similar caching capabilities to maintain communication channels to the devices. The access pattern observed in these caches is also read-mostly, and the data itself is completely transitory; it can be lost and regenerated if necessary.

We observe that location becomes a very important property of data and introduces a new dimension to data access methods. Traditional data access methods are not suitable for such computing, and new research redefines some well-known techniques, e.g. spatial queries, in the mobile environment with a particular emphasis on broadcast data.

5. Towards new database architectures

Database technology seems to be fundamental for deploying the technologies presented in Section 4 in the context of EISs. Some attempts by database specialists to influence the development of EISs exist even from the past. For example, the Sequoia 2000 (Stonebraker, 1994) project speaks about collaboration

between computer scientists and environmental researchers to design a next-generation information system for managing data for global change research. The primary challenges of a database approach are flexibility without complexity and ease of use. Moreover, a database approach offers the opportunity to link all data together at the user level, and it makes all analysis of the data easier, e.g. via database technologies such as DM.

A common theme of the issues mentioned in Section 4 is the DBMS architecture. Today's DBMSs provide a universal architecture applicable to many different types of tasks; in the words of Stonebraker and Cetintemel (2005), they follow a "one size fits all" approach. New DBMS architectures instead envisage separate engines made to measure for the requirements of various applications. Besides rather traditional applications (OLAP, data warehouses, and text retrieval), other candidates for a separate engine are:

• stream processing,
• sensor networks,
• scientific databases,
• native XML databases.

We have tried to highlight some characteristics of the first three technologies with respect to their association with environmental data management.

Considering native XML databases, solutions with separate engines are popular today. Härder (2005) presents the XTC (XML Transaction Controller) architecture, which demonstrates that a native XML DBMS can be implemented along the lines of the five-layer architecture (see Table 1). A hybrid engine is another possibility. To integrate relational and XML data, IBM is developing a new hybrid DB2 DBMS that works on a truly native XML store sitting side by side with DB2's relational data repository. On top of both data stores (relational and XML) sits one hybrid database engine. A similar solution is used by most vendors that combine a data warehouse DBMS with a conventional online transaction processing DBMS, united by a common parser. Such an architecture could also inspire the implementation of other data types.

Another approach evolves the original idea of DBMS extensibility. Acker et al. (2005) developed an Access Manager specification, a new programming interface to several layers of a DBMS kernel. This enables the programmer to add new data structures to the DBMS with a minimum of effort.

There is also a third approach to achieving flexibility in processing data in a database way: producing a storage engine that is more configurable so that it can be tuned to the requirements of individual applications (Seltzer, 2005). There are fundamentally two properties that a solution must possess to address the wide range of application needs emerging today: modularity and configurability. A modular DBMS engine must allow the developer to use or exclude major subsystems depending on whether the application needs them. The DBMS must also be configurable to its operating environment: the specific hardware, operating system, and application using it.

6. Conclusions

Environmental data management, analysis, and communication are essential components of environmental characterization and decision making. DBMSs, the Internet, and associated Web technologies have become an integrating force for these components.

According to Selinger (2005), the data research challenges for the next decade include, among other things, the following tasks:

• re-examine DBMS architecture and invent ways to scale more and better, without sacrificing user-visible availability or performance,
• learn what managing content is all about, what is needed, and create new models,
• treat metadata as a first-class research topic.

We have focused mainly on the first issue. The others can also improve the accessibility and availability of environmental data. Approaching uncertain and imprecise data, as well as knowledge discovery and intelligent data analysis, requires new models and semantic annotations of the data. Some specific approaches already exist. For example, to increase environmental data quality, new information processing techniques preserve and retrieve the origins and processing history, that is, the lineage, of objects and processes (Bose and Frew, 2005). To ensure that the greatest use is made of environmental data, data producers should include data lineage (and authenticity information) in the metadata. At the database level, this requires more sophisticated techniques for metadata processing.

Everything indicates that the development of new database technologies has had, and will continue to have, consequences for the EISs of the future.

Acknowledgement

This research was supported in part by the National programme of research (Information society project 1ET100300419).

References

Abiteboul, S., Agrawal, R., Bernstein, P.A., Carey, M.J., Ceri, S., Croft, W.B., DeWitt, D.J., Franklin, M.J., Garcia-Molina, H., Gawlick, D., Gray, J., Haas, L.M., Halevy, A.Y., Hellerstein, J., Ioannidis, Y.E., Kersten, M.L., Pazzani, M.J., Lesk, M., Maier, D., Naughton, J.F., Schek, H.-J., Sellis, T.K., Silberschatz, A., Stonebraker, M., Snodgrass, R.T., Ullman, J.D., Weikum, G., Widom, J., Zdonik, S.B., May 2005. The Lowell Database Research self-assessment. Communications of the ACM 48 (5), 111–118.
Acker, R., Pieringer, R., Bayer, R., 2005. Towards truly extensible database systems. In: Proceedings of DEXA 2005 Conference, LNCS 3588. Springer-Verlag, pp. 596–605.
Amato, G., Caruso, A., Chessa, S., Masi, V., Urpi, A., 2004. State of the art and future directions in wireless sensor networks data management, 2004-TR-16, ISTI.
Bose, R., Frew, J., 2005. Lineage retrieval for scientific data processing: a survey. ACM Computing Surveys 37 (1), 1–28.

Bright, L., Maier, D., 2005. Deriving and managing data products in an environmental observation and forecasting system. In: Proceedings of Conference on Innovative Data Systems Research (CIDR), January 2005, pp. 162–173.
Carney, D., Cetintemel, U., Cherniack, M., Convey, C., Lee, S., Seidman, G., Stonebraker, M., Tatbul, N., Zdonik, S., 2002. Monitoring streams – a new class of data management applications. In: Proceedings of the 28th International Conference on Very Large Data Bases. Morgan Kaufmann Publishers, pp. 215–226.
Cohen, J., 2005. Updating computer science education. Communications of the ACM 48 (6), 29–31.
Denzer, R., 2005. Generic integration of environmental decision support systems – state-of-the-art. Environmental Modelling & Software 20 (10), 1217–1223.
Envisense, 2004. FloodNet: pervasive computing in the environment. Available at: <http://envisense.org/floodnet/floodnet.htm>.
Gibert, K., Sanchez-Marre, M., Rodríguez-Roda, I., 2005. GESCONDA: an intelligent data analysis system for knowledge discovery and management in environmental databases. Environmental Modelling & Software 21 (1), 115–120.
Härder, T., Reuter, A., 1983. Concepts for implementing a centralized database management system. In: Proceedings of International Computing Symposium on Application Systems Development, March 1983. B.G. Teubner-Verlag, Nürnberg, pp. 28–104.
Härder, T., 2005. DBMS architecture – still an open problem. In: Proceedings of BTW, Karlsruhe, March 2005, pp. 2–28.
ISO, 1999. Information technology – Database languages – SQL – Part 1: Framework (SQL/Framework). ISO/IEC 9075-1:1999.
ISO, 2003. Information technology – Database languages – SQL – Part 2: Foundation (SQL/Foundation). ISO/IEC 9075-2:2003.
LCA. Glossary. Available at: <http://www.lineadecreditoambiental.org/html/glossary.html>.
Mainwaring, A., Polastre, J., Szewczyk, R., Culler, D., 2002. Wireless sensor networks for habitat monitoring. Intel Research Berkeley, IRB-TR-02-006.
Nunez, H., Sanchez-Marre, M., Cortes, U., Comas, J., Martínez, M., Rodríguez-Roda, I., Poch, M., 2004. A comparative study on the use of similarity measures in case-based reasoning to improve the classification of environmental system situations. Environmental Modelling & Software 19 (9), 809–819.
Ramamritham, K., Son, S.H., Dipippo, L.C., 2004. Real-time databases and data services. Real-Time Systems 28, 179–215.
Reuter, A., 2005. Databases: the integrative force in cyberspace. In: Data Management in a Connected World, LNCS 3551. Springer-Verlag, pp. 3–16.
Selinger, P., 2005. Five data challenges for the next decade. Keynote at the ICDE Conference, April 2005, Tokyo, Japan.
Seltzer, M.I., 2005. Beyond relational databases. Databases 3 (3), 50–58.
Stonebraker, M., 1994. Sequoia 2000 – a reflection on the first three years. Sequoia technical report S2K-94-58, Berkeley, CA. Available at: <http://epoch.cs.berkeley.edu:8000/sequoia/techreports/s2k-93-23/>.
Stonebraker, M., Cetintemel, U., 2005. "One size fits all": an idea whose time has come and gone. In: Proceedings of the Conference ICDE, April 2005, Tokyo, Japan, pp. 2–11.
StreamBase Systems, Inc., 2005. StreamBase 2.0. Available at: <http://www.streambase.com/index.html>.
Tomasic, A., Simon, E., 1997. Improving access to environmental data using context information. ACM SIGMOD Record 26 (1), 11–15.
Worboys, M.F., 1998. Imprecision in finite resolution spatial data. GeoInformatica 2, 257–279.
Zheng, B., Lee, D.L., May 2005. Information dissemination via wireless broadcast. Communications of the ACM 48 (5), 105–110.
