1 Introduction
It is an unquestionable fact that data is of crucial importance for the geospatial
sector, with emphasis on data acquisition, distribution, processing, analysis and
visualization. The amount of geospatial information available in highly
heterogeneous formats is increasing exponentially. The geospatial community in
all its segments, such as industry, governmental institutes, organizations,
foundations, authorities, and academia, has to react to this development, from
data storage needs to the implementation of new approaches for the automatic
extraction of information from raw datasets.
1.1 Predictive modelling
www.ogrs-community.org
A location's suitability for human habitation depends, among other factors, on
land fit for agricultural activity, water, and sources of material for everyday
use. It is assumed that:
1. Human colonization of the landscape is not random,
2. The scattering of known archaeological sites can be modelled by certain
characteristics of the natural and cultural environment (Verhagen 2007).
It follows that, using GIS and geostatistical analysis, portions of the
landscape can be identified where the probability of site occurrence is high.
1.2 Recent solutions
Both academic research in archaeology and cultural resource management (CRM)
need methods to predict and explain the locations of various archaeological
sites. Predictive modelling applied in archaeology was one of the first more
complex GIS operations to be researched. CRM and academic use traditionally
require two different approaches. In CRM the primary focus is to be able to
predict, within a certain time, the probability of the presence of
archaeological sites; the need to understand why a site is there is secondary.
In academic research the central issue is to understand the reasons and
mechanisms behind the choice of a certain portion of the landscape. The
predictive approach can be expressed as looking for locational factors, and the
interpretive approach as understanding locational choice factors. Two approaches
are likewise distinguished in the creation of predictive models.
Inductive (CRM) and deductive (academic) models mostly differ in whether they
use known site locations; the difference can also be viewed as deduction from
theory versus induction from observations. They are most adequately described as
data driven and theory driven (Wheatley & Gillings 2002). Data-driven models are
applied to see whether a relationship can be found between a sample of known
archaeological sites and a selection of landscape characteristics (environmental
factors); if a correlation is found, it is extrapolated to a larger area (where
possible). Theory-driven modelling starts by formulating a hypothesis on
location preferences, then selecting and weighting the appropriate landscape
parameters (Verhagen et al. 2009).
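A theory-driven model of this kind can be sketched as a weighted overlay:
normalized landscape parameters are combined with expert-chosen weights. The
rasters, factor choice and weights below are purely illustrative, not taken from
any of the cited studies.

```python
import numpy as np

# Invented 4x4 rasters of two landscape parameters.
slope = np.array([[2, 5, 12, 30],        # degrees
                  [3, 4, 10, 25],
                  [1, 2, 8, 20],
                  [0, 1, 6, 15]], dtype=float)
dist_water = np.array([[100, 200, 400, 800],   # metres
                       [150, 250, 450, 850],
                       [50, 120, 300, 600],
                       [20, 80, 250, 500]], dtype=float)

def normalize(raster, invert=False):
    """Rescale a raster to [0, 1]; invert when low values mean high suitability."""
    scaled = (raster - raster.min()) / (raster.max() - raster.min())
    return 1.0 - scaled if invert else scaled

# Hypothesis: flat terrain near water is preferred, so both factors are inverted.
weights = {"slope": 0.6, "dist_water": 0.4}  # expert-chosen, sum to 1
suitability = (weights["slope"] * normalize(slope, invert=True)
               + weights["dist_water"] * normalize(dist_water, invert=True))
print(suitability.round(2))
```

The resulting surface ranks each cell between 0 and 1; in a data-driven model
the weights would instead be estimated from known site locations.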
As input data, all models use environmental factors, either as thematic layers
or as geostatistically derived data, such as: elevation, slope, aspect, solar
radiation, distance to water, vertical distance to water, cost distance to
water, distance to water confluences, topographic variation (DEM, DSM), soils
(proximity to pottery clay), land use, distance to historic trading paths,
proximity to firewood, and proximity to transport routes (Wescott & Mehrer 2007).
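Several of the listed factors can be derived from a DEM with standard raster
operations. A minimal NumPy sketch; the DEM values, cell size and water level
are invented for illustration:

```python
import numpy as np

cell_size = 10.0  # assumed raster resolution in metres
dem = np.array([[100, 102, 105, 110],
                [101, 103, 107, 112],
                [102, 105, 109, 115],
                [104, 107, 112, 118]], dtype=float)

# Slope: magnitude of the elevation gradient, converted to degrees.
dz_dy, dz_dx = np.gradient(dem, cell_size)
slope_deg = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

# Aspect: downslope direction in degrees clockwise from north.
aspect_deg = np.degrees(np.arctan2(-dz_dx, dz_dy)) % 360

# Vertical distance to water: elevation above an assumed water level.
water_level = 100.0
vertical_dist_water = dem - water_level
print(slope_deg.round(1))
```

In practice such derivations would be done with a GIS toolchain (e.g. GDAL's
slope and aspect tools) on full-resolution rasters; the arithmetic is the same.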
In multi-criteria analysis, decisions can then be made by comparing objectives
or, at the lowest level of the objective hierarchy, by comparing attributes.
Attributes are measurable quantities or qualities of a geographical entity or of
a relationship between geographical entities.
OGRS 2014
The widely used modelling methods continue to develop with the technical
progress of data-mining techniques. The modelling methods are based on Bayesian
statistics (weights of evidence), regression analysis, multi-criteria analysis,
supervised and unsupervised classification (or cluster analysis), fuzzy logic,
artificial neural networks, and, recently, random forest modelling.
In the modelling process, the statistical correspondence, whether negative or
positive, between the input data (locations of the archaeological sites) and
different types of raster data (digital elevation model, soil map) is evaluated.
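The weights-of-evidence measure mentioned above quantifies exactly this kind of
positive or negative correspondence for a binary predictor (e.g. "within 200 m
of water"): the positive weight W+ compares the predictor's frequency at site
cells against non-site cells. The counts below are invented for illustration:

```python
import math

# Invented counts from a hypothetical study area of 1000 raster cells.
sites_in_zone = 40        # site cells where the factor is present
sites_total = 50          # all site cells
nonsites_in_zone = 200    # non-site cells where the factor is present
nonsites_total = 950      # all non-site cells

# W+ = ln( P(factor | site) / P(factor | non-site) ).
# W+ > 0 indicates positive correspondence, W+ < 0 negative.
w_plus = math.log((sites_in_zone / sites_total) /
                  (nonsites_in_zone / nonsites_total))

# W- uses the complement (factor absent).
w_minus = math.log(((sites_total - sites_in_zone) / sites_total) /
                   ((nonsites_total - nonsites_in_zone) / nonsites_total))
print(round(w_plus, 2), round(w_minus, 2))
```

A large positive W+ together with a negative W- marks the factor as a useful
predictor; summing the weights of several factors yields a relative probability
surface.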
2 Distributed approach
Data may arrive independently from different sources in different formats,
resulting in a big heterogeneous set of data. Relational data stores are not
prepared for such cases. However, most of the time we want to know whether there
is any relation or bond between these data; hence we look for interrelationships
among the data sets. A further characteristic of these data is that they are
stored permanently and seldom modified.
Hadoop (http://hadoop.apache.org/) is completely modular. Its core components,
HDFS and MapReduce, are required; the others are optional depending on your
needs. Some of the widely used components are HBase, Hive, Pig, and Sqoop.
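The MapReduce model that Hadoop's core implements can be illustrated without a
cluster: a mapper emits key-value pairs and a reducer aggregates per key, here
counting hypothetical site records per grid cell. The record format, cell size
and local `run` harness are assumptions for the sketch; with Hadoop Streaming
the same two functions would read stdin and write stdout.

```python
from collections import defaultdict

def mapper(record):
    """Emit (grid_cell, 1) for one 'x,y' site record (100 m cells assumed)."""
    x, y = map(float, record.split(","))
    yield (int(x // 100), int(y // 100)), 1

def reducer(key, values):
    """Sum the counts emitted for one grid cell."""
    return key, sum(values)

def run(records):
    """Local stand-in for the shuffle phase between map and reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())

print(run(["10,20", "90,30", "150,40"]))  # → {(0, 0): 2, (1, 0): 1}
```

Because each mapper call touches only one record and each reducer call only one
key, Hadoop can run both in parallel across the cluster.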
3 IQmulus
The objective of the IQmulus research project (https://www.iqmulus.eu/) is to
enable the use of large, heterogeneous geospatial data sets (Big Geospatial
Data) for better decision making through a high-volume fusion and analysis
information management platform. The project, funded by the 7th Framework
Programme of the European Union, is now one year into its four-year duration. To
ensure user-driven development, it implements two test scenarios: Maritime
Spatial Planning, and Land Applications for Rapid Response and Territorial
Management. The contribution of a large number of users from different
geospatial segments, application areas, institutions and countries has already
been secured. The project partners represent numerous facets of the geospatial
world, ensuring a value-creating process that requires collaboration among
academia, authorities, national research institutes and industry. The
improvements provided to researchers, developers, academic staff and students
allow them to share knowledge and to implement new solutions, methods and ideas
in the field of geoinformation through the development and dissemination of a
new platform. Furthermore, the research project offers new practices for end
users, following the continuous evolution of big geospatial data management
requirements. As end users are among those providing geospatial support for
decision makers, the whole decision-making process can benefit from these
improvements and new practices.
User requirements are driving the development of the IQmulus solutions through
discussions of basic questions and problems, expectations and visions about how
a high-volume fusion and analysis information management platform would be
implemented and how it would provide better solutions than the users' current
working processes. Users were interviewed on their current workflows as well as
on their vision of how these could be handled in an ideal or even futuristic
geospatial world. For example, applications related to Rapid Response and
Territorial Management have been discussed with municipalities, disaster
management, water management and spatial planning authorities, and university
research groups. In addition, the users were asked to assess their existing
approaches (an in-depth analysis of strengths and weaknesses, e.g., lack of
data, lack of computational resources, lack of a high-capacity operational
system, lack of faster modelling approaches, lack of free and open-source
geospatial technology, etc.) to find out where the IQmulus platform can yield
significant improvements.
Following the identification of user requirements, the creation of concrete
processing services has started in the current development phase. Services
consist of algorithms and workflows focusing on the following aspects:
spatio-temporal data fusion; feature extraction, classification and correlation;
multivariate surface generation; change detection and dynamics. In the following
phases, system integration and testing will be done, and the IQmulus system will
be deployed as a federation of these services based on the two scenarios
mentioned above. Actual deployment will be based on a distributed architecture,
with platform-level services and basic processing services invoked from clusters
and conventional computing environments. Concerning workflow definition and
execution, the concept is to develop a set of domain-specific languages that
make the definition of several types of spatial data processing and analysis
tasks simpler. These can be compiled to run on parallel processing environments
such as GPGPUs, cloud stacks, or MapReduce implementations such as Hadoop. The
compiled algorithms can be packaged as declarative services to run on the
respective environments they have been compiled for. The Apache Hadoop
open-source framework has already been chosen as a basis for architecture
development. This framework offers the processing of large data sets across
clusters of computers using simple programming models. It supports
next-generation cloud-based thinking and is designed to scale from single
servers to hundreds of workstations offering local computation and storage
capacity. A Hadoop Distributed File System (HDFS) has already been installed in
a cloud environment, providing high-throughput access to application data. The
Hadoop MapReduce system for parallel processing of large data sets is currently
under implementation. Thanks to this solution, users can obtain rapid responses
to their needs, processing their own datasets in the cloud and exploiting the
system's capacities.
Despite all the features provided by the Hadoop ecosystem, storing and
processing large raster-based data is not efficient on HDFS. Solving this
problem requires further research and investigation into applying segmentation
and clustering algorithms to raster images on top of HDFS and MapReduce.
4 Conclusion
In conclusion, by working on the above-mentioned open issues, IQmulus aims at
the development and integration of an infrastructure and platform that will
support critical decision-making processes in geospatial applications, using
open-source solutions wherever possible. It is obvious that IQmulus is not
applicable to every decision support problem. Achievements and further
development directions, with benchmark data, will be presented in the paper
through a concrete application example arising from user requirements.
Acknowledgments
This research is funded by the FP7 project IQmulus (FP7-ICT-2011-318787), a
high-volume fusion and analysis platform for geospatial point clouds, coverages
and volumetric data sets. The research was also supported by the Institute of
Geodesy, Cartography and Remote Sensing (FÖMI), Hungary, Department of
Geoinformation; special thanks to Dr. Dániel Kristóf, head of the department.
References
Hasenstab, R.J. & Resnick, B. (1990), GIS in historical predictive modelling:
The Fort Drum Project. In: K.M.S. Allen, S.W. Green & E.B.W. Zubrow (eds.),
Interpreting Space: GIS and Archaeology. London: Taylor & Francis, pp. 284-306.
Kohler, T.A. & Parker, S.C. (1986), Predictive Models for Archaeological
Resource Location. In: M.B. Schiffer (ed.), Advances in Archaeological Method
and Theory 9. Orlando: Academic Press, pp. 397-452.
Soonius, C.M. & Ankum, L.A. (1990), Ede II: Archeologische Potentiekaart, RAAP
rapport 49, Stichting RAAP, Amsterdam.
Verhagen P. (2007), Case Studies in archaeological predictive modelling. Doctoral
Thesis, Leiden University Press,
URL: https://openaccess.leidenuniv.nl/handle/1887/11863.
Kamermans, H., van Leusen, M. & Verhagen, P. (Eds.) (2009), Archaeological
prediction and risk management. Leiden: Leiden University Press.
Wescott, L.K. & Mehrer, W.M. (2007), GIS and Archaeological Site Location
Modeling. FL: Taylor & Francis, pp. 58-89.
Wheatley, D. & Gillings, M. (2002), Spatial Technology and Archaeology: the
archaeological applications of GIS. London: Taylor & Francis.