This paper provides an assessment of compatibilities and differences between the CORINE2000 and GLC2000 datasets. Efforts are underway to link the new European global land cover map (GLOBCOVER) with the existing global land cover 2000 map (GLC2000) and European CORINE mapping initiative.
K. Neumann a,*, M. Herold b, A. Hartley c, C. Schmullius d

a Wageningen University, P.O. Box 47, 6700 AA Wageningen, The Netherlands
b Friedrich-Schiller-University Jena, Loebdergraben 32, 07743 Jena, Germany
c Joint Research Centre of the European Commission, Via Enrico Fermi, 1, Ispra, Italy
d Friedrich-Schiller-University Jena, Loebdergraben 32, 07743 Jena, Germany

* Corresponding author. Tel.: +31 317 482430; fax: +31 317 482419.
E-mail addresses: kathleen.neumann@wur.nl (K. Neumann), m.h@uni-jena.de (M. Herold), andrew.hartley@jrc.it (A. Hartley), c.schmullius@uni-jena.de (C. Schmullius).

Received 11 November 2005; accepted 19 February 2007

International Journal of Applied Earth Observation and Geoinformation 9 (2007) 425–437
www.elsevier.com/locate/jag
doi:10.1016/j.jag.2007.02.004
0303-2434/$ see front matter © 2007 Elsevier B.V. All rights reserved.

Abstract

Given the current lack of interoperability between global and regional land cover products, efforts are underway to link the new European global land cover map (GLOBCOVER) with the existing global land cover 2000 map (GLC2000) and the European CORINE mapping initiative. Since both datasets apply different mapping standards, a key to successful implementation is a thorough understanding of the heterogeneities among both datasets. Thus, this paper provides an assessment of compatibilities and differences between the CORINE2000 and GLC2000 datasets. The comparative assessment considers inconsistencies between the thematic legends (using the UN land cover classification system, LCCS), class specific accuracies, and the spatial resolution and heterogeneity of the datasets. The results are summarized with implications for the development of the new GLOBCOVER datasets.

Keywords: Land cover; Interoperability; Harmonisation; GLC2000; CORINE2000; GLOBCOVER

1. Introduction

The increasing need for comprehensive and reliable information on land cover and land cover dynamics has led to the development of several global land cover datasets derived from satellite Earth observation. Their development was driven by different national or international initiatives and programmes, and the variety of mapping standards reflects the wide range of interests, requirements and methodologies of the originating programmes (Herold et al., 2006a). Available data products include the global land cover for the year 2000 (GLC2000, Bartholome and Belward, 2005) and the European dataset CORINE2000 (JRC, 2004; EEA, 2005a). A new global land cover dataset will be GLOBCOVER, derived from ENVISAT-MERIS data for the year 2005.

So far, there is only limited compatibility and comparability between different land cover maps and their thematic legends; they rather exist as independent datasets. In general, heterogeneity in land cover maps results from different underlying methods and standards and has multiple facets. These include syntactic issues (e.g. logical data models: vector/raster), schematic heterogeneity (e.g. database models, spatial reference systems, cartographic standards including variable minimum mapping units and mixed units) and semantic aspects (Bishr, 1998). Different mapping methodologies make it difficult to separate land changes themselves from changes that are the result of a different methodology used to create the map. Semantic inconsistencies may be a problem for time series analysis of land cover or land use to monitor environmental change and for initiatives that aim to react to environmental change (Comber et al., 2004). The problem of semantic discrepancies has been addressed by several authors.
While some of them seek harmonization guidelines to facilitate future mapping efforts (Bennett, 2001; Jansen, 2005; Herold et al., 2006a), others show how to tackle inconsistencies between already existing maps to make them comparable (Comber et al., 2005; Hagen, 2003; Visser, 2004).

Since the 1980s, international efforts have aimed at harmonizing existing and future land cover datasets to support operational Earth observation of land, with the goal of overcoming current limitations in land cover dataset compatibility and comparability. One of the key drivers in the land cover harmonization process is the Global Observation of Forest Cover and Land Dynamics (GOFC-GOLD), which acts as a platform for international communication and coordination. GOFC-GOLD, in cooperation with the Global Terrestrial Observing System (GTOS), develops and provides methodological and organizational resources for joint international progress in this arena (Herold and Schmullius, 2004). A major objective is to assist ongoing and upcoming mapping initiatives in fostering consistent and comparable ways of creating land cover maps (Herold et al., 2006a).

GLOBCOVER is such a new land cover initiative and is expected to open a new era of consistent global land cover assessment. Keys to GLOBCOVER's success are the improved spatial resolution of 300 m (compared to 1 km in existing maps), the premise of developing the map product based on a thorough understanding of existing datasets, a harmonized legend development based on common land cover classifiers, and the consideration of known challenges in coarse scale land cover mapping. This will improve the quality of the GLOBCOVER products and extend the potential field of applications.

From a European perspective, GLOBCOVER is expected to complement and extend the two major coarse scale land cover mapping efforts: CORINE and GLC2000. Both initiatives have followed rigorous internal mapping standards.
For example, CORINE is a regional programme whose legend reflects thematic definitions and a level of detail covering both land cover and land use characteristics. In contrast, the GLC2000 map covers the whole globe and uses the UN land cover classification system (LCCS), developed for describing land cover characteristics. The aim of making GLOBCOVER as interoperable as possible with both existing map products is ambitious and has to start from a thorough understanding of the characteristics and inconsistencies of the related land cover information. The causes of the heterogeneities among existing datasets, and the factors responsible for them, have to be understood.

This study focuses on the comparison of the CORINE2000 and GLC2000 datasets. We describe their agreement and assess the factors driving the disagreement. The investigations consider several aspects: land cover heterogeneity and spatial resolution effects, semantic land cover definitions and thematic similarities, and known map accuracy measurements. These are assessed in relation to the measured disagreement between GLC2000 and CORINE2000 to gain understanding and to develop strategies for GLOBCOVER.

2. Data and methods

2.1. CORINE2000 dataset

The CORINE programme (Coordination of Information on the Environment) was established in 1985 by the European Commission. One of its major tasks has been the establishment of the CORINE Land Cover Project. The two main objectives are (1) to provide quantitative data on land cover, consistent and comparable across Europe, for all those interested in European environmental policy, and (2) to prepare one comprehensive digital land cover database covering the 25 EU member states and other European and North African countries. The mapping is based on the CORINE nomenclature and interpretation methods at an original scale of 1:100,000. The nomenclature comprises 44 land cover classes on three levels at a minimum mapping unit of 25 ha.
The datasets have been created under the responsibility of each EU member state by on-screen interpretation and digitizing of Landsat images in a GIS environment. The European wide product was produced by merging the consistent national products into one dataset (EEA, 2005b). Fig. 1 provides a detailed view of the CORINE2000 dataset for the Berlin region and shows the full legend for the third level.

The first CORINE1990 land cover dataset was released in 1999; an updated and extended version followed in 2000. The new CORINE2000 dataset, representing land cover for the period from 1999 to 2001 as well as land cover changes, became available in 2005. The dataset with a spatial resolution of 250 m can be downloaded free of charge from http://dataservice.eea.eu.int/dataservice/metadetails.asp?id=678.

2.2. Global land cover of the year 2000 (GLC2000)

The global land cover for the year 2000 (GLC2000) project, coordinated by the European Joint Research Centre (JRC), provides consistent global land cover information for the year 2000. In contrast to former global mapping initiatives, the GLC2000 map development followed a bottom-up approach. Eighteen regional land cover map products were derived by regional experts and merged into a global map. To ensure consistency of all regional land cover classifications, each regional product was derived by classifying the SPOT-4 VEGETATION dataset. Furthermore, each regional partner applied the land cover classification system (LCCS) produced by the United Nations (UN). The global GLC2000 product was produced by harmonizing and merging the individual regional products into one global product with a generalized legend (Bartholome and Belward, 2005). Different parts of Europe were covered by five different regional map products, usually with more detailed or regionally specific legends than the global one. Fig.
1 provides a detailed view of the GLC2000 dataset for the Berlin region together with the full legend (JRC, 2004). The regional and global GLC2000 land cover datasets are available free of charge from http://www-gvm.jrc.it/glc2000/ProductGLC2000.htm.

Since GLC2000 provides global coverage, some classes rarely appear in Europe. Nine of the 22 global classes in GLC2000 represent only 0.2% or less of the area covered by CORINE2000 (Table 1). They are considered to be of minor importance when global scale data are studied in the context of Europe and thus will be disregarded in this analysis. Furthermore, the class water bodies will be excluded. The dominance of the GLC2000 class Cultivated and managed areas throughout the European continent is striking. Forests are limited to broadleaf deciduous, evergreen needle-leaved and mixed wooded areas. Europe also shows a significantly higher degree of urbanization than the rest of the world.

Fig. 1. Detail of the CORINE2000 and GLC2000 datasets for the Berlin region with complete legends.

Table 1
GLC2000 classes with less than 0.2% spatial coverage of the CORINE mapping area

| GLC2000 class | Area (%) |
| 1 Tree cover, broadleaved, evergreen | 0.03 |
| 5 Tree cover, needle-leaved, deciduous | 0.00 |
| 7 Tree cover, regularly flooded, fresh water | 0.01 |
| 8 Tree cover, regularly flooded, saline water | 0.00 |
| 9 Mosaic: tree cover/other natural vegetation | 0.15 |
| 10 Burnt area | 0.00 |
| 21 Snow and ice (natural and artificial) | 0.04 |

2.3. Land cover classification system (LCCS)

The land cover classification system (LCCS), developed by the United Nations, provides an appropriate framework for land cover legend development and translation using a flexible but standardized set of classifiers and thresholds.
LCCS is an a priori classification system that is based on independent and universally valid land cover diagnostic criteria, rather than on predefined specific land cover classes. It can be used to describe land cover features all over the world at any scale or level of detail with an absolute level of standardization of class definitions between different users (Di Gregorio, 2005). LCCS is currently evolving as an internationally agreed standard for land cover characterization (Herold et al., 2006b).

As part of its harmonization activities, the ESA GOFC-GOLD land cover project office, in cooperation with the JRC, used LCCS as the key tool for the comparative assessment of GLC2000 and CORINE2000. LCCS was used to reveal and explain thematic similarities and inconsistencies between the class definitions of both datasets. The CORINE legend was translated to LCCS, which re-described the classes from a land cover perspective. Based on this translation, the GLC2000 and CORINE class definitions could be directly compared in terms of their land cover descriptions. While some classes translate easily, others have complex land cover definitions with a variety of different thematic and cartographic mixtures (Herold and Schmullius, 2004). This again emphasizes the CORINE mapping approach of visual satellite data interpretation (30 m × 30 m pixels and 25 ha minimum mapping unit) and the delineation of many categories with a thematic focus on land use. Land use categories usually aggregate a mix of different land cover types. Furthermore, several CORINE classes contain specific extensions that add to the complexity of the LCCS definitions and complicate the translation/comparison process.

A key component in using LCCS is a set of common land cover classifiers. Since it has been challenging to find general agreement on defining land cover classes, the approach is to agree on the terminology and common classifiers rather than categories.
The common classifiers defined by LCCS and used in this study were:

(1) Vegetation/non-vegetation.
(2) Terrestrial/aquatic and regularly flooded.
(3) Natural and semi-natural/managed and artificial.
(4) Life form/surface type (trees, shrubs, herbaceous, bare, water, snow/ice).
(5) Leaf type (broadleaf, needleleaf, mixed, none).
(6) Vegetation density (0–15%, 15–40%, 40–65%, 65–100%, none).
(7) Land use types (urban, agriculture, none).

An additional classifier that was not used is phenology or leaf longevity. It was excluded since CORINE did not consider this information. By translating both legends into LCCS, each classifier can be determined for each class. This information is used to assess the thematic similarity between the different categories (Section 2.4.2).

2.4. Comparing GLC2000 and CORINE2000

The comparative assessment of GLC2000 and CORINE2000 focused on highlighting and explaining thematic and spatial differences between both datasets. The starting point was the determination of areas of spatial agreement between GLC2000 and the respective CORINE classes. Given the LCCS legend translation, a crosswalk between the CORINE and GLC2000 classes was developed (Table 2). This comparison does not imply a direct correspondence between the classes. It rather considers common thematic agreement and similarity among the categories, i.e. it provides the obvious choice if a GLC2000 category is to be related to a specific CORINE class. In other words, the correspondence of each GLC2000 class to a certain CORINE2000 class is based solely on the class definitions. Table 2 considers both class agreement and similarity, with the latter referring to significant thematic overlap between two classes. For a more detailed comparison, three different measures are introduced in this study: correspondence based on definition, thematic similarities between all land cover classes, and a confusion matrix.

An important issue is the understanding of the differences between both datasets.
Both GLC2000 and CORINE2000 are based on satellite images. These images, however, were received from different sensors (CORINE2000: Landsat ETM+; GLC2000: SPOT VEGETATION) which were differently calibrated and which detect the spectral characteristics of the Earth surface using different wavelengths. Both satellite data have different spatial resolutions, which naturally influence the presentation of spatial detail, especially in areas with heterogeneous land cover types such as CORINE classes 324 (Transitional woodland-scrub), 241 (Annual crops associated with permanent crops), 242 (Complex agricultural pattern), and 243 (Land principally occupied by agriculture, with significant areas of natural vegetation).

Table 2
Summary of correspondence of GLC2000 classes to CORINE classes

| GLC2000 global class | CORINE class (classes in italics show similarity not agreement) |
| 1: Tree cover, broadleaved evergreen, closed to open (>15%) | 311 broad-leaved forests |
| 2: Tree cover, broadleaved deciduous, closed (>40%) | 311 broad-leaved forests |
| 3: Tree cover, broadleaved deciduous, open (15–40%) | 311 broad-leaved forests |
| 4: Tree cover, needleleaved evergreen, closed to open (>15%) | 312 coniferous forests |
| 5: Tree cover, needleleaved deciduous, closed to open (>15%) | 312 coniferous forests |
| 6: Tree cover, mixed leaf type, closed to open (>15%) | 313 mixed forests |
| 7: Tree cover, closed to open (>15%), regularly flooded, fresh or brackish: swamp forests | 31 forests; 411 inland marshes |
| 8: Tree cover, closed to open (>15%), regularly flooded, saline water: mangrove forests | 31 forests |
| 9: Mosaic of tree cover and other natural vegetation (incl. crop component) | 324 transitional woodland-scrub; 31 forests |
| 10: Tree cover, burnt (boreal forests) | 334 burnt areas |
| 11: Shrub cover, closed to open (>15%), evergreen (broadleaved or needleleaved) | 322 moors and heathland; 323 sclerophyllous vegetation; 324 transitional woodland-scrub |
| 12: Shrub cover, closed to open (>15%), deciduous (broadleaved) | 322 moors and heathland; 324 transitional woodland-scrub |
| 13: Herbaceous cover, closed to open (>15%) | 231 pastures; 321 natural grasslands |
| 14: Sparse herbaceous or shrub cover (0–15%) | 333 sparsely vegetated areas; 322 moors and heathland; 332 bare rocks |
| 15: Regularly flooded (>2 month) shrub and/or herbaceous cover, closed to open | 411 inland marshes; 412 peat bogs; 421 salt marshes |
| 16: Cultivated and managed areas | 21 arable land; 22 permanent crops; 241 annual crops associated with permanent crops; 242 complex agricultural pattern; 244 agro-forestry areas; 231 pastures |
| 17: Mosaic of cropland/tree cover/other natural vegetation | 243 land principally occupied by agriculture, with significant areas of natural vegetation; 231 pastures |
| 18: Mosaic of cropland/shrub cover/herbaceous cover | 243 land principally occupied by agriculture, with significant areas of natural vegetation; 231 pastures |
| 19: Bare areas | 331 beaches, dunes, and sand plains; 332 bare rocks; 333 sparsely vegetated areas |
| 20: Water bodies (natural and artificial) | 5 water bodies; 423 intertidal flats |
| 21: Snow and ice (natural and artificial) | 335 glaciers and perpetual snow |
| 22: Artificial surfaces and associated areas | 1 artificial surfaces; 422 salines |

Corresponding CORINE classes in standard letters indicate an agreement, classes marked in italics indicate a similarity (see also Table 3).

The spatial resolution of both satellite image sources differs significantly.
SPOT VEGETATION provides data with a resolution of 1 km, whereas the 30 m resolution of Landsat allows for the detection of more spatial detail (EEA, 2005a; JRC, 2004). Different interpretation methods, such as manual interpretation of the Landsat data versus a (semi-)automatic classification of the SPOT data, as well as the use of different classification systems, lead to further differences between the GLC2000 and CORINE2000 maps. Because of these crucial differences the classes of the two databases cannot fully match. The following sections explore in more detail the reasons for the differences between both datasets.

2.4.1. Spatial agreement between GLC2000 and CORINE2000

The spatial agreement between GLC2000 and CORINE2000 classes was determined through a spatial overlay at 250 m pixel resolution, resulting in a confusion matrix that labels class combinations as agreement, similarity or disagreement (Tables 2 and 3). Class labelling was performed subjectively, based on the comparison of the descriptors of the two legends. The matrix reflects the area percentage of each GLC2000 class covered by the individual CORINE categories. For example, the GLC2000 class Artificial surfaces can be assigned to the CORINE class group Artificial surfaces, consisting of eleven subclasses. The agreement between these GLC2000/CORINE classes is 70.5% (Table 3). Thus, more than two-thirds of the GLC2000 class Artificial surfaces overlap with the respective CORINE class Built up area. The remaining 29% of the GLC2000 class Artificial surfaces cover other CORINE classes, for example Non-irrigated arable land (Table 3). The total agreement between GLC2000 and CORINE2000 is 57%.

2.4.2. Thematic similarities between GLC2000 and CORINE classes

A quantitative approach to defining affinities between both legends can be provided based on the LCCS translations. The thematic similarities were defined for each GLC2000/CORINE land cover class combination considering the seven LCCS classifiers given above.
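The overlay of Section 2.4.1 can be illustrated with a minimal sketch. It assumes both maps have already been resampled to a common 250 m grid and are held as integer class-code arrays; the function name `overlay_confusion` and the toy grids are hypothetical and are not taken from the study.

```python
import numpy as np

def overlay_confusion(glc, corine, glc_classes, corine_classes):
    """Percent of each GLC2000 class's pixels that fall in each CORINE class.

    Rows follow glc_classes, columns follow corine_classes. Rows sum to 100
    when every CORINE code present under that GLC class is listed.
    """
    glc = np.asarray(glc).ravel()
    corine = np.asarray(corine).ravel()
    matrix = np.zeros((len(glc_classes), len(corine_classes)))
    for i, g in enumerate(glc_classes):
        in_class = glc == g
        n = in_class.sum()
        if n == 0:
            continue  # class absent from the grid: leave the row at zero
        for j, c in enumerate(corine_classes):
            matrix[i, j] = 100.0 * np.sum(in_class & (corine == c)) / n
    return matrix

# Toy co-registered grids (hypothetical values, for illustration only).
glc = np.array([[16, 16, 22],
                [16, 22, 22],
                [16, 16, 22]])
corine = np.array([[211, 211, 112],
                   [211, 112, 211],
                   [231, 211, 112]])
matrix = overlay_confusion(glc, corine, [16, 22], [112, 211, 231])
```

Each row of `matrix` is normalised by the GLC2000 class area, matching the description that the matrix gives the area percentage of each GLC2000 class covered by the individual CORINE categories.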
Thematic similarities were defined in addition to correspondence (Table 2), since corresponding GLC2000/CORINE2000 classes do not necessarily have a high thematic similarity, or vice versa. A similarity matrix is derived for each of the seven common LCCS classifiers used in this study (see Section 2.3). Each classifier matrix contains three values: zero for no agreement, 0.5 for partial agreement, and 1 for full agreement. For the classifier leaf type, for example, if both classes strictly have the same leaf type (e.g. broadleaf), the score is one; a mixed leaf type would get a score of 0.5; and a needle-leaf category, like all categories without broadleaf character, would receive a score of zero. The overall thematic similarity is the sum of the matrices for all classifiers (vegetation/non-vegetation, terrestrial/aquatic and regularly flooded, natural and semi-natural/managed and artificial, life form/surface type, leaf type, vegetation density, land use types) divided by the number of classifiers (seven). Thus, thematic similarity can range between 0 (absolute disagreement) and 1 (complete similarity); the similarity matrix is shown in Table 4.

The similarity scores in Table 4 show that, except for some of the urban classes, there is no complete agreement between the classes of both datasets. This emphasizes the difference in thematic definitions between the two maps. Classes with a similarity score of zero do not agree in any of the studied classifiers, such as GLC2000 bare areas versus CORINE rice fields. As expected, there is a general thematic affinity between the agreeing classes (Tables 2 and 3) and the thematic similarity scores (Table 4), i.e. a CORINE class and its corresponding GLC2000 class usually share the largest amount of thematic similarity.
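The averaging rule described above can be sketched as follows. The classifier keys and the example scores are hypothetical placeholders chosen for illustration; only the scoring values (0, 0.5, 1) and the division by seven come from the text.

```python
# Keys are hypothetical labels for the seven LCCS classifiers of Section 2.3.
CLASSIFIERS = (
    "vegetation",
    "terrestrial_aquatic",
    "natural_managed",
    "life_form",
    "leaf_type",
    "vegetation_density",
    "land_use",
)

def thematic_similarity(scores):
    """Mean of the per-classifier scores for one GLC2000/CORINE class pair.

    scores maps each classifier name to 0 (no agreement), 0.5 (partial
    agreement) or 1 (full agreement).
    """
    missing = set(CLASSIFIERS) - set(scores)
    if missing:
        raise ValueError(f"missing classifier scores: {sorted(missing)}")
    for name in CLASSIFIERS:
        if scores[name] not in (0, 0.5, 1):
            raise ValueError(f"invalid score for {name}: {scores[name]}")
    return sum(scores[name] for name in CLASSIFIERS) / len(CLASSIFIERS)

# Hypothetical pair: full agreement except partial agreement on leaf type
# and vegetation density.
example_scores = dict.fromkeys(CLASSIFIERS, 1)
example_scores["leaf_type"] = 0.5
example_scores["vegetation_density"] = 0.5
example_similarity = thematic_similarity(example_scores)  # (5 + 0.5 + 0.5) / 7
```

Applying the function to every class pair would fill a matrix of the kind summarized in Table 4.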
The most prominent exception is the CORINE class green urban areas which, in terms of thematic agreement, reaches at most 64% with a number of GLC2000 classes, although the class corresponds to the GLC2000 class Artificial surfaces and associated areas.

Further investigations used an aggregate measure of similarity for each GLC2000 category. In general, it would be important to study the amount of thematic agreement among corresponding classes (Table 4). For a better understanding of the spatial disagreement between the two datasets, we investigate the thematic similarity with classes that do not agree. The median value of thematic similarity of all CORINE classes that do not correspond to a specific GLC2000 class was calculated to represent the amount of thematic confusion. For example, GLC2000 class 2 (tree cover, broadleaved, deciduous, dense) corresponds to one specific CORINE class (broadleaf forest) with a thematic similarity of 0.96. The median of the similarity values of all other (non-corresponding) CORINE classes is 0.54 and is considered an aggregate measure of the confusion of this class from a semantic perspective. The semantic confusion for GLC2000 class 22 (artificial surfaces and associated areas) is almost half as much as for the GLC2000 classes 11 (shrub cover, closed-open, evergreen) and 13 (herbaceous cover, closed-open). While the spatial agreement between GLC2000 and CORINE2000 considers spatial aspects, the thematic similarities do not consider the spatial distribution of corresponding classes. The values for each class are presented in Table 5.

Table 3
Confusion matrix between GLC2000 and CORINE2000 categories considered in this study
The individual values are in area percent for each GLC2000 class. Fields with green background show class agreement, orange background shows class similarity according to Table 2. A pink background highlights class combinations with no agreement but a large amount of confusion, i.e. spatial agreement with a land class not comparable in terms of their thematic class similarities (10%).

Table 4
Thematic similarity matrix between GLC2000 and CORINE2000 categories considered in this study
The individual values are calculated from an aggregate value of seven common land cover classifiers determined in LCCS. Fields with bold border show agreement classes according to Table 3.

2.4.3. Spatial homogeneity

Local neighbourhood analyses are a valuable tool for determining the heterogeneity of a dataset by analysing the variability of pixels with respect to their neighbourhood. Such analyses were performed on the GLC2000 dataset to show which land cover classes are spatially more homogeneously structured than others. The spatial heterogeneity or local class diversity was calculated using a local 3 × 3 kernel. The heterogeneity value can range between one, i.e. none of the eight neighbour pixels differs from the core pixel (maximum local homogeneity), and eight, i.e. all neighbour pixels differ from the core pixel (maximum local heterogeneity). This information is used to show whether the spatial agreement between GLC2000 and CORINE2000 depends on the heterogeneity of the land cover classes. For this, the percentage of the class pixels located in homogeneous areas (local homogeneity value = 1) was calculated (Table 4).
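A minimal sketch of such a 3 × 3 neighbourhood analysis is given below. The exact diversity measure used in the study is not fully specified in the text, so this sketch assumes one common variant: the number of distinct classes in the 3 × 3 window, where a value of one means the window is homogeneous. Edge pixels are handled by replicating the border, which is also an assumption, and the function names and toy grid are hypothetical.

```python
import numpy as np

def local_variety(grid):
    """Number of distinct class codes in the 3 x 3 window around each pixel.

    Assumed measure: 1 means all pixels in the window share the core pixel's
    class. Borders are padded by replicating the edge values.
    """
    grid = np.asarray(grid)
    padded = np.pad(grid, 1, mode="edge")
    rows, cols = grid.shape
    variety = np.empty((rows, cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            variety[r, c] = len(np.unique(padded[r:r + 3, c:c + 3]))
    return variety

def percent_homogeneous(grid, class_code):
    """Percent of the pixels of one class whose 3 x 3 window is homogeneous."""
    grid = np.asarray(grid)
    mask = grid == class_code
    if not mask.any():
        return 0.0
    return 100.0 * float(np.mean(local_variety(grid)[mask] == 1))

# Toy grid (hypothetical): a block of class 16 bordered by a strip of class 19.
toy = np.array([[16, 16, 16, 19],
                [16, 16, 16, 19],
                [16, 16, 16, 19],
                [16, 16, 16, 19]])
variety = local_variety(toy)
share_16 = percent_homogeneous(toy, 16)
share_19 = percent_homogeneous(toy, 19)
```

In the toy grid only the two left-hand columns of class 16 have homogeneous windows, while every pixel of the one-pixel-wide strip of class 19 touches class 16, so narrow or fragmented classes receive low homogeneity shares under this measure.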
Classes representing a homogeneous pattern are bare areas, sparsely vegetated areas, and cultivated and managed areas. Most heterogeneous are the mixed agricultural classes and open broadleaf deciduous forests.

2.4.4. Classification accuracy of GLC2000

The validation of GLC2000 has provided statistically robust accuracy information for the global classes. Validation results show an overall accuracy of 68.6%, with a very low accuracy for the mosaic classes (Mayaux et al., 2006). The GLC2000 confusion matrix was analysed with respect to the producer's accuracy of each GLC2000 land cover class. The producer's accuracy was derived by dividing the total number of correct sample units in a land cover class by the total number of sample units of this class in the reference data. The producer's accuracy therefore indicates the probability of a reference sample unit being correctly classified (Congalton, 2003), and provides a good indicator of general mapping performance for each class from the map producer's perspective. Both the producer's and user's accuracies are reported in Table 5. Three classes have no robust class specific accuracies reported for them (Mayaux et al., 2006). It should be noted that the GLC2000 accuracy estimates were derived for the global dataset. The specific mapping errors may differ if only Europe is considered. However, these are the only statistically robust validation data available.

Table 5
Characteristics for each GLC2000 class considered in this study

| GLC2000 class | Code | Spatial agreement (%) | Class area (%) | Thematic similarity | Spatial homogeneity (%) | Semantic confusion (median) | Producer's accuracy (%) | User's accuracy (%) |
| Tree cover, broadleaved, deciduous, dense | 2 | 39.2 | 9.35 | 0.50 | 32.8 | 0.54 | 96.7 | 35.4 |
| Tree cover, broadleaved, deciduous, open | 3 | 26.1 | 0.45 | 0.50 | 2.7 | 0.50 | 15.2 | 19.8 |
| Tree cover, needle-leaved, evergreen | 4 | 53.8 | 20.01 | 0.43 | 43.1 | 0.43 | 92.8 | 47.0 |
| Tree cover, mixed leaf type | 6 | 17.8 | 9.80 | 0.50 | 14.4 | 0.50 | 37.1 | 94.0 |
| Shrub cover, closed-open, evergreen | 11 | 32.6 | 0.52 | 0.46 | 31.4 | 0.57 | | |
| Shrub cover, closed-open, deciduous | 12 | 25.8 | 5.20 | 0.48 | 38.3 | 0.50 | 25.4 | 47.1 |
| Herbaceous cover, closed-open | 13 | 49.4 | 7.95 | 0.57 | 27.9 | 0.57 | 45.9 | 33.4 |
| Sparse herbaceous or sparse shrub cover | 14 | 46.6 | 1.77 | 0.46 | 46.3 | 0.46 | 62.0 | 50.5 |
| Regularly flooded shrub and/or herbaceous cover | 15 | 23.0 | 1.34 | 0.43 | 10.3 | 0.5 | 35.9 | 77.1 |
| Cultivated and managed areas | 16 | 76.9 | 38.20 | 0.21 | 58.3 | 0.46 | 76.5 | 73.0 |
| Mosaic: cropland/tree cover/other natural vegetation | 17 | 11.0 | 1.65 | 0.54 | 0.5 | 0.50 | 38.8 | 81.5 |
| Mosaic: cropland/shrub and/or grass cover | 18 | 47.6 | 1.70 | 0.50 | 11.3 | 0.50 | | |
| Bare areas | 19 | 60.4 | 0.35 | 0.43 | 80.4 | 0.43 | 95.2 | 93.8 |
| Artificial surfaces and associated areas | 22 | 70.5 | 1.49 | 0.29 | 27.4 | 0.29 | | |

2.4.5. Statistical analysis

Regression analyses were performed to assess the contribution of each factor (spatial heterogeneity, thematic similarity, mapping accuracy) to the amount of spatial agreement between GLC2000 and CORINE2000. Bivariate linear regression helped to investigate the strength of the relationship between the amount of agreement and each individual factor shown in Table 5. Before applying multivariate regression models, the joint principal components of four factors (thematic similarity, spatial heterogeneity, producer's and user's accuracy) were derived to avoid correlation between the regression predictor variables.
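The regression workflow can be sketched with plain NumPy. This is an illustrative reconstruction under stated assumptions, not the authors' procedure: it uses only the eleven Table 5 classes for which both accuracies are reported, takes the "Thematic similarity" and "Spatial homogeneity (%)" columns as the corresponding predictors, standardizes the predictors before the principal component step, and fits ordinary least squares with an intercept. All variable and function names are hypothetical.

```python
import numpy as np

# Table 5 rows with both accuracies reported (codes 2, 3, 4, 6, 12, 13, 14,
# 15, 16, 17, 19). Choice of rows and columns is an assumption of this sketch.
agreement = np.array([39.2, 26.1, 53.8, 17.8, 25.8, 49.4,
                      46.6, 23.0, 76.9, 11.0, 60.4])
names = ["thematic_similarity", "spatial_homogeneity",
         "producers_accuracy", "users_accuracy"]
predictors = np.array([
    [0.50, 32.8, 96.7, 35.4],
    [0.50,  2.7, 15.2, 19.8],
    [0.43, 43.1, 92.8, 47.0],
    [0.50, 14.4, 37.1, 94.0],
    [0.48, 38.3, 25.4, 47.1],
    [0.57, 27.9, 45.9, 33.4],
    [0.46, 46.3, 62.0, 50.5],
    [0.43, 10.3, 35.9, 77.1],
    [0.21, 58.3, 76.5, 73.0],
    [0.54,  0.5, 38.8, 81.5],
    [0.43, 80.4, 95.2, 93.8],
])

def r_squared(y, x):
    """Coefficient of determination of an OLS fit of y on x plus intercept."""
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)

# Bivariate regressions: agreement against each factor separately.
bivariate_r2 = {name: r_squared(agreement, predictors[:, [k]])
                for k, name in enumerate(names)}

# Principal components of the standardized predictors via SVD.
z = (predictors - predictors.mean(axis=0)) / predictors.std(axis=0, ddof=1)
_, singular_values, vt = np.linalg.svd(z, full_matrices=False)
explained = singular_values ** 2 / np.sum(singular_values ** 2)
pc_scores = z @ vt.T  # columns are mutually uncorrelated

# Multiple regression on the first two principal components.
r2_two_pcs = r_squared(agreement, pc_scores[:, :2])
```

Because the component scores are uncorrelated by construction, regressing on them avoids the collinearity between the original factors that the text mentions. The numbers produced by this sketch need not match the published values, since the study's exact inputs and preprocessing are not given here.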
Principal components one and two explained more than 88% of the variance of the variables and were used in the multiple linear regression analysis.

3. Results

3.1. Spatial agreement between GLC2000 and CORINE2000

Recognizing the amount of spatial agreement (Table 3), all areas defined as urban areas in CORINE can be aggregated to the GLC2000 class artificial surfaces. Half of this category corresponds to CORINE class 112 (discontinuous urban fabric). This CORINE category is also confused with other non-urban GLC2000 classes, showing that a fair amount of CORINE urban land is not represented as such in GLC2000. Smaller settlements or the rural/urban interface zones considered in CORINE are possibly not reflected in the coarser scale GLC2000 map. There is also some amount of CORINE agriculture and forest classes that is committed to the GLC2000 artificial surfaces. In general, the mapping of urban areas at coarse scales is known to be a challenging task due to their small, fragmented spatial extent, the spectral heterogeneity of the urban environment, and the challenges in discriminating urban and rural land.

The detailed CORINE agricultural land use categories basically aggregate to one major agriculture GLC2000 category (class 16). CORINE non-irrigated arable land (class 211), despite some agreement with the corresponding GLC2000 class, shows significant confusion with several GLC2000 categories, i.e. herbaceous cover, sparse vegetation, and the agriculture mosaic categories. The latter classes may indicate spatial resolution effects, where agricultural areas appear as mixed units in coarser resolution datasets. Both mosaic classes generally correspond to the CORINE class Land principally occupied by agriculture, with significant areas of natural vegetation. However, this category is confused with a whole range of different GLC2000 categories, and the overall amount of agreement is very low for most of the mixed agricultural classes. This also includes the CORINE class pasture.
This class allows up to 50% tree cover (wooded meadows) and thus mixes with GLC2000 forests and agriculture mosaics. Furthermore, GLC2000 does not consider any land use or management practices within pastures or grasslands, and therefore no clear discrimination between these two classes is possible in the context of CORINE.

The CORINE classes with a strong woody vegetation component (311–324) are heavily confused with different GLC2000 classes. This may reflect the different crown cover density thresholds used in the forest definitions (GLC2000: 15% and CORINE: 30%) and the fact that CORINE does not contain a distinct shrubland category. A prominent disagreement of this kind occurs between GLC2000 class 3 (open broadleaf deciduous forest) and CORINE class 322 (moors and heathland). Also, parts of the CORINE woody vegetation classes (311–324) are assigned to the GLC2000 herbaceous categories (classes 13–15) and to Cultivated and managed areas.

There is limited overlap between the wetland areas indicated in CORINE and the respective wetland areas of GLC2000. In particular, peat bogs appear to be mixed with a variety of GLC2000 categories. Bare areas and sparsely vegetated areas are highly confused with each other. Many sparsely vegetated areas in CORINE appear as bare areas in GLC2000 and vice versa.

3.2. Key drivers for the spatial (dis)agreement

Given the limited amount of spatial agreement between both datasets, the question arises which factors drive the (dis)agreement. This study has considered several of these factors. Thematic similarity, spatial heterogeneity, and classification accuracy of GLC2000 were analysed with respect to their influence on the spatial agreement between GLC2000 and CORINE2000 (Tables 5 and 6). All of the considered factors except the user's accuracy explain some amount of the agreement between both datasets. The spatial heterogeneity affects the joint dataset disagreement, with more homogeneous classes showing more agreement.
This factor strongly reflects the issue of spatial resolution, which differs between the two datasets. Mapping heterogeneous landscapes is strongly dependent on the minimum mapping unit (Smith et al., 2003) and thus drives the heterogeneity between GLC2000 and CORINE. The thematic similarity measure describes the affinity of each GLC2000 class with the non-agreeing CORINE classes (Table 4), i.e. the lower the similarity with dissimilar categories, the higher the amount of agreement. Urban areas and cultivated and managed areas show the lowest thematic similarity values and the largest amount of agreement. Both GLC2000 classes comprise a number of aggregated CORINE classes and are thematically rather distinct. The other GLC2000 categories are thematically more similar to other CORINE categories. One noticeable exception is GLC2000 class 13 (herbaceous cover, closed-open). This class shows the highest thematic similarity value together with a rather high amount of agreement compared to the general trend. The third factor (GLC2000 producer's accuracy) shows a direct relationship with the GLC2000/CORINE agreement: the more accurately a class was mapped from a producer's perspective, the higher the amount of agreement with CORINE. No linear trend is found between the GLC2000 user's accuracy and the class agreement values.

The R² values for the significant regressions are rather similar, at around 50% of explained variance. This suggests that all factors contribute to the disagreement to some degree and that no single factor alone accounts for it. To test whether the interaction of multiple factors may instead be responsible for the disagreement, a principal component analysis and a multivariate regression were performed.
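
The bivariate regressions summarised above can be illustrated with a minimal sketch. It assumes that the regression for the spatial factor used the 14 "Spatial agreement" and "Spatial homogeneity" values listed in Table 5; the function and variable names are hypothetical, and the paper does not state which software was used. The t statistic is derived from r² through the standard relation t = r·sqrt(n − 2)/sqrt(1 − r²).

```python
import math

# Table 5 values for the 14 GLC2000 classes (codes 2, 3, 4, 6, 11-19, 22).
AGREEMENT = [39.2, 26.1, 53.8, 17.8, 32.6, 25.8, 49.4,
             46.6, 23.0, 76.9, 11.0, 47.6, 60.4, 70.5]
HOMOGENEITY = [32.8, 2.7, 43.1, 14.4, 31.4, 38.3, 27.9,
               46.3, 10.3, 58.3, 0.5, 11.3, 80.4, 27.4]


def bivariate_regression(x, y):
    """Ordinary least squares of y on x.

    Returns slope, intercept, r squared and the t statistic of the slope
    (n - 2 degrees of freedom).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    syy = sum((v - mean_y) ** 2 for v in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    # t carries the sign of the slope; its magnitude follows from r2 alone.
    t = math.copysign(math.sqrt(r2 * (n - 2) / (1.0 - r2)), sxy)
    return slope, intercept, r2, t


slope, intercept, r2, t = bivariate_regression(HOMOGENEITY, AGREEMENT)
# Table 6 reports r2 = 0.46 and t = 3.19 for this factor; under the
# assumption above the sketch should land close to those values.
print(f"slope={slope:.3f} intercept={intercept:.1f} r2={r2:.2f} t={t:.2f}")
```

The same function can be applied to the thematic similarity column, and to the producer's and user's accuracy columns after dropping the three classes without accuracy values (which leaves n = 11).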
Principal components one and two explained more than 88% of the variance of these variables and were used in the multiple linear regression analysis. Principal component one reflects the information from the producer's accuracy (R = 0.75), the spatial heterogeneity (R = 0.60), and the thematic similarity (R = 0.30). The loadings of the first principal component suggest that the spatial heterogeneity and the producer's accuracy may be more important than the thematic similarity, or at least carry somewhat similar, though negatively correlated, information. The variance of the thematic similarity strongly depends on two classes (urban and agriculture) and may be of less importance for other categories. The second component represents the user's accuracy (R = 0.97). The regression results for the joint contribution of all factors, using the first two principal components of the four factors, are shown in Table 6. This multiple regression model using both principal components explains the majority of the agreement between both datasets (R² = 0.81, R = 0.90). Both principal components contribute significantly to this regression model, the first component being significant at the 99% confidence level and the second at the 95% confidence level. Thus, the considered factors are responsible for most of the disagreement between GLC2000 and CORINE2000.

3.3. Implications for GLOBCOVER

The comparative assessment of GLC2000 and CORINE2000 highlights several reasons for the differences between both datasets. The landscape heterogeneity, which strongly relates to the spatial resolution, significantly influences the spatial agreement between both datasets. Since GLOBCOVER is based on 300 m MERIS data, this source of disagreement should be reduced if the goal is integration with CORINE data. However, it has to be considered that a finer spatial resolution has a direct impact on the class definitions, in particular for mixed unit classes.
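
The principal component regression described in Section 3.2 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the 11 classes with complete accuracy values in Table 5 were used, that the components were taken from the correlation matrix of the four factors, and it substitutes a plain power iteration with deflation for whatever eigen-solver was actually applied. All names are hypothetical, and the output need not reproduce Table 6 exactly.

```python
import math

# Table 5 rows with both accuracy values (classes 11, 18 and 22 lack them).
# Tuple order: spatial agreement (%), thematic similarity,
# spatial homogeneity (%), producer's accuracy (%), user's accuracy (%).
TABLE5 = [
    (39.2, 0.50, 32.8, 96.7, 35.4),  # class 2
    (26.1, 0.50, 2.7, 15.2, 19.8),   # class 3
    (53.8, 0.43, 43.1, 92.8, 47.0),  # class 4
    (17.8, 0.50, 14.4, 37.1, 94.0),  # class 6
    (25.8, 0.48, 38.3, 25.4, 47.1),  # class 12
    (49.4, 0.57, 27.9, 45.9, 33.4),  # class 13
    (46.6, 0.46, 46.3, 62.0, 50.5),  # class 14
    (23.0, 0.43, 10.3, 35.9, 77.1),  # class 15
    (76.9, 0.21, 58.3, 76.5, 73.0),  # class 16
    (11.0, 0.54, 0.5, 38.8, 81.5),   # class 17
    (60.4, 0.43, 80.4, 95.2, 93.8),  # class 19
]


def standardize(column):
    """Return z-scores using the sample standard deviation (n - 1)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]


def correlation(a, b):
    """Pearson correlation of two equally long sequences."""
    za, zb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / (len(a) - 1)


def leading_eigenpair(matrix, iterations=2000):
    """Power iteration: dominant eigenvalue and unit eigenvector."""
    size = len(matrix)
    vec = [1.0 / (i + 1) for i in range(size)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(size))
               for i in range(size)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    value = sum(vec[i] * matrix[i][j] * vec[j]
                for i in range(size) for j in range(size))
    return value, vec


def principal_components(columns, count=2):
    """Top `count` components of the correlation matrix, via deflation."""
    z = [standardize(c) for c in columns]
    k, n = len(z), len(z[0])
    corr = [[sum(x * y for x, y in zip(ci, cj)) / (n - 1) for cj in z]
            for ci in z]
    components = []
    for _ in range(count):
        value, vec = leading_eigenpair(corr)
        scores = [sum(vec[c] * z[c][row] for c in range(k))
                  for row in range(n)]
        # The eigenvalues of a correlation matrix sum to k.
        components.append({"share": value / k, "weights": vec,
                           "scores": scores})
        corr = [[corr[i][j] - value * vec[i] * vec[j] for j in range(k)]
                for i in range(k)]
    return components


agreement = [row[0] for row in TABLE5]
factors = [[row[c] for row in TABLE5] for c in range(1, 5)]
pcs = principal_components(factors)

# Component scores are (numerically) uncorrelated, so the R^2 of regressing
# agreement on PC1 and PC2 is the sum of the squared single correlations.
r_squared = sum(correlation(agreement, pc["scores"]) ** 2 for pc in pcs)

print("variance explained by PC1 + PC2:",
      round(pcs[0]["share"] + pcs[1]["share"], 2))
print("R^2 of agreement on PC1 and PC2:", round(r_squared, 2))
```

Working on component scores rather than on the raw factors is what removes the correlation between predictors; it also makes the contribution of each component to the model's R² simply additive.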
One of the main issues remaining for GLOBCOVER development is mapping accuracy. This study has emphasized that mapping accuracy is one of the main drivers of the disagreement between coarse scale land cover datasets. This has been noted before in the context of land change assessment (Townshend and Justice, 2002), and points to the need for the most accurate mapping approaches, accompanied by comprehensive and comparative validation efforts (Herold et al., 2006a).

Table 6. Regression results explaining the relationships between the agreement of GLC2000/CORINE2000 and the different factors that affect the agreement presented in Table 5

| Independent variable to predict agreement | Correlation with amount of spatial agreement (r²) | T-test | p-value |
|---|---|---|---|
| Spatial heterogeneity | 0.46 | 3.19** | 0.0078 |
| Thematic similarity | 0.50 | 3.49** | 0.0044 |
| Producer's accuracy (GLC2000) | 0.50 | 3.00** | 0.0149 |
| User's accuracy (GLC2000) | | | |
| Multiple regression: PC1 | 0.81 | 5.32** | 0.0007 |
| Multiple regression: PC2 | | 2.44* | 0.0404 |

Note: p-values are Pr(>|t|). ** indicates a dependence at the 99% confidence level and * a dependence at the 95% confidence level.

GLOBCOVER has to consider the known challenges in deriving specific land cover classes at global scales. Given the GLC2000 experience, mixed unit and mosaic classes, shrublands, herbaceous covers and wetland classes are among those usually derived with rather low mapping quality. GLOBCOVER should not start from scratch but should take advantage of the comparison of existing global land cover products. If there is general agreement between different products, it is certain that these areas represent known land cover characteristics. Mapping error may be reduced if such an approach is used.

In terms of thematic definitions, the results indicated a clear need for harmonizing existing definitions. So far, all CORINE categories have been translated into the LCCS language used by GLC2000.
The use of LCCS and a set of common land cover classifiers has to be adopted for the development and interoperability of the GLOBCOVER product. Given the thematic translation, CORINE contains detailed land use categorizations for artificial surfaces and agricultural areas in several categories. They aggregate to two land cover types in GLC2000: cultivated and managed areas (all types of crop agriculture), and artificial surfaces and associated areas (most types of urban land use). Both classes showed a distinct thematic character and should be treated as such in the mapping process. One problematic class is pasture, which is an agricultural category in CORINE but refers to a land cover class in GLC2000.

A further problematic issue for thematic definitions is the different crown cover densities used to discriminate forests from other vegetation types (GLC2000: 15% and CORINE: 30%). This may be one of the reasons for the significant confusion between different GLC2000 and CORINE woody vegetation categories. The integration of additional coarse scale mapping information (e.g. forest continuous fields) might be useful in this context. Also, CORINE has no specific class for shrubs, but some CORINE categories might correspond to the two GLC2000 Shrub Cover classes; this issue needs to be resolved. Furthermore, the vegetation density threshold separating bare and sparsely vegetated areas differs between the two datasets and thus resulted in significant disagreements. An adjustment of density thresholds is essential if CORINE and GLOBCOVER data are to become comparable.

4. Conclusions

The study has emphasized the heterogeneity between the GLC2000 and CORINE2000 datasets and the driving factors of disagreement. In the presented study, thematic similarities, the spatial heterogeneity and the classification accuracy of GLC2000 were analysed to assess the comparability of both datasets.
It has been statistically shown that no single factor, but rather the joint contribution of all of them, is responsible for the observed disagreements between both land cover maps. Consequently, all of these drivers need to be considered to make a robust map comparison. Given that land cover and land use maps are developed for a specific purpose and are based on different (remote sensing) data and classification methodologies, it would be a hindrance to define a rigid set of parameters merely to allow map comparison by default. Flexible standardisation, which allows on the one hand free map generation according to an individual mapping request and on the other hand full comparability with other land cover datasets, is increasingly demanded. LCCS is a promising tool to facilitate this development. Finally, all mapping efforts have to be accompanied by robust and comparative accuracy assessment against existing datasets to improve their inter-comparison.

Linking the new GLOBCOVER with GLC2000 and CORINE2000 seems challenging and has to consider their specific characteristics and definitions. Nevertheless, there is potential for interoperable development of GLOBCOVER. The finer spatial resolution of 300 m is expected to reduce some of the disagreement. The strong need for harmonized land cover definitions and their accurate mapping points to generic land cover definitions (common LCCS classifiers) as common ground and as a basis for flexible map product generation. GLOBCOVER should consider the known land cover characteristics that are available worldwide from harmonizing existing land cover datasets.

References

Bartholome, E., Belward, A.S., 2005. GLC2000: a new approach to global land cover mapping from Earth observation data. Int. J. Remote Sens. 26 (9), 1959–1977.
Bennett, B., 2001. What is a Forest? On the vagueness of certain geographic concepts. Topoi 20, 189–201.
Bishr, Y., 1998. Overcoming the semantic and other barriers to GIS interoperability. Int. J. Geograph. Inform. Sci. 12 (4), 299–314.
Comber, A., Fisher, P., Wadsworth, R., 2004. Integrating land-cover data with different ontologies: identifying change from inconsistency. Int. J. Geograph. Inform. Sci. 18 (7), 691–708.
Comber, A., Fisher, P., Wadsworth, R., 2005. What is land cover? Environ. Plan. B: Plan. Design 32, 199–209.
Congalton, R.G., 2003. Putting the map back in map accuracy assessment. In: Lunetta, R.S., Lyon, J.G. (Eds.), Remote Sensing and GIS Accuracy Assessment. CRC Press, Boca Raton, FL, pp. 1–11.
Di Gregorio, A., 2005. Land Cover Classification System (LCCS): Classification Concepts and User Manual. FAO, Italy.
EEA, 2005a. IMAGE2000 and CLC2000. Products and Methods CORINE Land Cover Updating for the Year 2000. Ispra, Italy.
EEA, 2005b. EEA data service, Corine land cover 2000 vector by country (CLC2000): http://dataservice.eea.eu.int/dataservice/metadetails.asp?id=667.
Hagen, A., 2003. Fuzzy set approach to assessing similarities of categorical maps. Int. J. Geograph. Inform. Sci. 17 (3), 235–249.
Herold, M., Schmullius, C., 2004. Report on Harmonization of Global and Regional Land Cover Products, Workshop report at FAO, Rome, Italy, 14–16 July 2004, GOFC-GOLD report series 20. URL: http://www.fao.org/gtos/gofc-gold/series.html.
Herold, M., Woodcock, C., Di Gregorio, A., Mayaux, P., Belward, A., Latham, J., Schmullius, C.C., 2006a. A joint initiative for harmonization and validation of land cover datasets. IEEE Trans. Geosci. Remote Sens. 44 (7), 1719–1727.
Herold, M., Latham, J.S., Di Gregorio, A., Schmullius, C.C., 2006b. Evolving standards on land cover characterization. J. Land Use Sci. 1 (2–4), 157–168.
Jansen, L.J.M., 2005. Harmonisation of land-use class sets to facilitate compatibility and comparability of data across space and time. In: Proceedings of the 12th CEReS International Symposium, Chiba, Japan.
JRC, 2004. GLC2000 homepage: http://www-gvm.jrc.it/glc2000/defaultGLC2000.htm.
Mayaux, P., Strahler, A., Eva, H., Herold, M., Shefali, A., Naumov, S., Dorado, A., Di Bella, C., Johansson, D., Ordoyne, C., Kopin, I., Boschetti, L., Belward, A., 2006. Validation of the global land cover 2000 map. IEEE Trans. Geosci. Remote Sens. 44 (7), 1728–1739.
Smith, J.H., Wickham, J.D., Stehman, S.V., Yang, L., 2003. Effects of landscape characteristics on land-cover class accuracy. Remote Sens. Environ. 84 (3), 342–349.
Townshend, J.R.G., Justice, C.O., 2002. Towards operational monitoring of terrestrial systems by moderate-resolution remote sensing. Remote Sens. Environ. 83 (1–2), 351–359.
Visser, H. (Ed.), 2004. The Map Comparison Kit: methods, software and applications. RIVM report 550002005/2004, Bilthoven, The Netherlands.