Ongoing climate changes, especially their extreme manifestations such as heat waves, cold periods, heavy rains or snowfalls, storms, floods and droughts, have an increasing impact on economic, political and social processes (IPCC 2012; IPCC 2013; Sillmann et al. 2014). Reliable assessments of their trends and of their impacts on these processes are critical for developing adequate local and/or regional strategies for adapting to and mitigating the negative effects of climate change, for example, in sustainable agriculture and forestry or in infrastructure planning. However, such assessments are still lacking for many parts of the world. This is an essential driver for developing the monitoring of climatic characteristics and climate modeling to assess possible future trends. Local and remote observations, as well as numerical modeling of climatic processes, have resulted in an unprecedented growth of data archives. For example, the Copernicus programme, the European Union’s flagship programme for monitoring the Earth’s environment using satellite and in-situ observations, anticipates a massive increase in satellite data volume. It is estimated that the Sentinel missions alone, Copernicus’ space component, will produce 4 TB of processed data each day (Copernicus 2016). The volume of the climate simulation experiments that formed the basis of the 5th Assessment Report of the Intergovernmental Panel on Climate Change (IPCC) reaches 2 PB (WCRP 2017). The meteorological data archive of the European Centre for Medium-Range Weather Forecasts (ECMWF) currently holds more than 90 PB and continues to grow by an additional 3 PB every month.
The advent of the big data era has forced experts to think about developing tools for transforming raw data into information and knowledge (Rowley 2007). At present, far more effort is spent on data management and pre-processing than on actual data evaluation. Such growth in data archive volume calls the traditional approach to climate information analysis into question and requires new approaches based on distributed networks and modern information technologies. That the emergence of large data archives would create problems for the analysis of current and future environmental processes was apparent by the end of the last century. As a result, several large programs have been launched worldwide to address these problems. For example, in the USA, efforts in this direction are united by the general concept of Cyberinfrastructure, which, according to the National Science Foundation (NSF), is an information-computational infrastructure that supports distributed science (e-Science), including data, people and computers, and exploits Internet technologies (Web 2.0, Grid, etc.).
Currently, the main efforts and resources worldwide are focused on creating a sustainable distributed cyberinfrastructure for open, permanent, reliable and secure access to high-quality Earth observation data and the corresponding metadata. The Data Observation Network for Earth (https://www.dataone.org/) is one such project. It combines multiple federated network repositories to provide search, retrieval and uniform access to the data. Within the framework of international cooperation, Earth System Grid Federation (ESGF) portals have been created to integrate geographically distributed scientific datasets. As part of this collaboration, the Earth System Grid (ESG) was developed to facilitate the analysis of global climate change. It provides access to climate projection data. In particular, the historical climatic data and the modeling results for climate scenarios prepared for the most recent IPCC report were disseminated through ESGF. Currently, about 2 PB of data are stored in ESGF nodes distributed around the world. The next required step is to convert generic climate analysis tools into services (Schnase et al. 2014). For example, the Climate Analytics-as-a-Service (CAaaS) approach combines high-performance computing and data analytics with scalable data management, cloud virtualization, adaptive analytics and domain-specific Application Programming Interfaces (APIs) to improve access to large climate data collections. Several years ago, the World Meteorological Organization (WMO) initiated a large program to develop a climate service system based on thematic web services (http://gfcs-climate.org/), intended to support climate change adaptation (WMO 2016).
According to Candela, Castelli & Pagano (2013), all these efforts can be described as the development, for the climate domain, of a Virtual Research Environment (VRE), that is, a system with the following major features: (i) it is a web-based working environment; (ii) it is tailored to serve the needs of a targeted community; (iii) it is expected to provide the community with the whole array of products needed to accomplish the community’s goal(s); and (iv) it promotes the sharing of research results.
One of the leading directions of VRE development, which emerged in response to the challenge of big data and their intensive use, was the creation of thematic computing systems accessed via web portals (Gordov & Fazliev 2004). The main idea of this approach is to develop a system comprising data, tools for processing and visualizing results, and computing resources that researchers use to perform the selected data analysis, and to provide Internet access to the system through a unified user-friendly interface. An example for the atmospheric domain is the ATMOS web portal (Gordov, Lykosov & Fazliev 2006). Because climate science deals with georeferenced data, it requires a combination of web and GIS techniques. In response, the web GIS platform ‘Climate’ (http://climate.scert.ru/) was developed (Gordov et al. 2016; Okladnikov et al. 2015). It is a free, cross-platform, composite application with an open-source computational backend that complies with common data format conventions for climate data and provides the functionality of common desktop software in an Internet browser window. As a thematic VRE, the web GIS ‘Climate’ provides interdisciplinary distributed research groups of non-experts in information technologies (climatologists, ecologists, biologists and decision makers) with easily accessible, reliable online tools for rapid analysis and visualization of multidimensional heterogeneous climatological datasets obtained from various sources.
However, reliable analysis of climate change and of the responses of nature and society to it requires skills in handling big datasets, the ability to interact with powerful computing resources and complex numerical models, knowledge of modern methods of statistical analysis, and the use of high-level programming languages. These skills are not typical of specialists in the economic, political and social sciences and, unfortunately, are entirely uncharacteristic of decision-makers. In addition, each thematic domain has its own terminology, datasets with their own standards, and models (Lutz et al. 2009). Therefore, all of these should be integrated into an Internet-accessible research environment that provides specialists and decision-makers with reliable tools for studying the economic, political and social consequences of climate change. Moreover, to develop a scientific foundation for practical climate change adaptation and mitigation measures, one must deal with three types of domains: climatology, applied domains and related decision support systems (DSS).
A general approach to this type of transdisciplinary problem should rely upon a VRE with three layers of data processing services: Data, Information and Knowledge (De Roure, Jennings & Shadbolt 2001). Here, Data are understood as uninterpreted bits and bytes of raw data, Information is data equipped with meaning, and Knowledge is information applied to achieve a goal, solve a problem or enact a decision. We refer to the desired result of such integration as the web-GIS platform ‘Climate+’. The ‘Climate+’ platform should integrate three sets of thematic domains: climatic and meteorological processes; applied impact problems that use calculated characteristics of climatic and meteorological processes; and decision support systems that use the solutions of applied problems. An important prerequisite for such integration is an ontological description, since each domain type has its own knowledge representation, and their integration can be controlled on the basis of these ontological descriptions. The role and necessity of ontologies in the geophysical sciences have been demonstrated by Athanasis et al. (2009), Bogdanović, Stanimirović & Stoimenov (2015), Brodaric, Fox & McGuinness (2009), Husain et al. (2011) and Lutz et al. (2009).
In this paper we describe the first steps in the development of the ‘Climate+’ platform, namely, the applications and ontologies that refer to the three layers (Data, Metadata (Information) and Ontology (Knowledge)) of data processing services of the VRE under development. Since the ‘Climate’ platform forms the backbone of ‘Climate+’, we first present it in more detail, describing its architecture, major components and workflow. We then describe the Ontology layer and the applied ontologies developed for the ‘Climate+’ platform. Special attention is paid to overcoming semantic heterogeneity and to formalizing the thematic domains related to applied tasks, in particular, to solving a reduction problem. In the conclusion, the next steps of platform development are discussed.
Some examples of successful applications of the web GIS platform ‘Climate’ in studies of ongoing climate change in Siberia and its impacts can be found in Riazanova et al. (2016), Ryazanova & Voropay (2017) and Shulgina, Gordov & Genina (2011).
The ‘Climate’ platform architecture is shown in Figure 1. It has a typical client-server structure, in which, in the general case, the server may be a set of geographically distributed standalone nodes providing a common (federated) interface (API), and the clients are client applications (basically, a Web-GIS client).
The server part of the architecture includes a high-performance computing system with attached data storage. It consists of two tiers:
- resources tier, including data and metadata;
- server applications (middleware) tier.
The client part of the architecture is based on a modern graphical web browser and consists of a single ‘Client applications’ tier.
The ‘Resources’ tier of the ‘Climate’ platform employs two basic layers: Data and Metadata. The Data layer contains datasets stored on the data storage system either as collections of netCDF files or as PostGIS databases. The Metadata layer is represented by the Metadata database (MDDB), which describes geospatial datasets and their processing routines and ensures effective system functioning (Okladnikov, Gordov & Titov 2016). The database contains structured spatial and temporal characteristics of the available geospatial datasets, their locations, and configurations of software components for data analysis. Its structure has two levels, ‘Dataset’ and ‘Dataset collection’. A dataset is defined as a set of data that a) is given on a single spatial and temporal grid; b) covers a single time range; and c) was obtained under a single simulation scenario. According to the chosen data storage model (Okladnikov, Gordov & Titov 2016), spatial datasets are mostly represented by collections of netCDF files grouped by spatio-temporal features and placed in a hierarchy of directories on data storage systems. Each netCDF file stores one or more variables containing values of meteorological parameters on a given spatio-temporal domain. The files in a dataset are usually named according to a common pattern to enable automatic search. Along with data variables, netCDF files contain horizontal, vertical and time grids. A dataset collection is defined as a collection of datasets created by an organization within a single research project but specified on different spatial and/or temporal grids or for different scenarios. A collection may consist of a single dataset.
The MDDB has two major parts. The first part describes all datasets available for analysis: their spatio-temporal domains, lists of meteorological parameters and locations of data files on the storage system. It is used to search for data files and to provide metadata on requests from the backend. The second part describes processing routines, represented by various pipelined call sequences of dedicated computational modules, and their configuration options. Some data analysis routines can be applied only to specific meteorological parameters; therefore, the links between computational modules and data arrays are specified in the MDDB. Some MDDB tables contain multilingual, human-readable descriptions of datasets and processing routines, which are used to populate elements of the graphical user interface. As a result, the lists of available datasets, parameters and processing routines in the graphical user interface are updated as soon as these are integrated into the MDDB.
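The dataset and collection definitions above amount to grouping files by a composite key. The following Python sketch illustrates this under stated assumptions: the `FileRecord` fields, grid labels, organization and project names and file paths are hypothetical, and the real MDDB keeps these attributes in database tables rather than in Python objects.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical per-file metadata, e.g. read from netCDF headers and paths."""
    path: str
    organization: str
    project: str
    grid: str          # label of the spatial and temporal grid
    time_range: tuple  # (first year, last year)
    scenario: str      # simulation scenario


def group_into_datasets(files):
    """Dataset = same collection, same grid, same time range, same scenario."""
    datasets = defaultdict(list)
    for f in files:
        key = (f.organization, f.project, f.grid, f.time_range, f.scenario)
        datasets[key].append(f.path)
    return dict(datasets)


def group_into_collections(datasets):
    """Collection = datasets created by one organization within one project."""
    collections = defaultdict(list)
    for (org, project, *rest), _paths in datasets.items():
        collections[(org, project)].append(tuple(rest))
    return dict(collections)


files = [
    FileRecord("projA/hist/tas_1951.nc", "OrgA", "ProjA", "1deg/daily", (1951, 2005), "historical"),
    FileRecord("projA/hist/pr_1951.nc", "OrgA", "ProjA", "1deg/daily", (1951, 2005), "historical"),
    FileRecord("projA/rcp45/tas_2006.nc", "OrgA", "ProjA", "1deg/daily", (2006, 2100), "rcp45"),
]
datasets = group_into_datasets(files)
collections = group_into_collections(datasets)
```

Because the dataset key includes the organization and project, files from different collections never merge into one dataset, mirroring the rule that each dataset belongs to exactly one collection.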
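As an illustration of how a routine can be stored as a pipelined call sequence of modules together with the parameters it may be applied to, here is a minimal Python sketch. The module functions, routine names and parameter names are hypothetical; neither the actual MDDB schema nor the platform's GDL/Python modules are reproduced here.

```python
def to_celsius(values):
    """Convert kelvin to degrees Celsius."""
    return [v - 273.15 for v in values]


def anomalies(values):
    """Subtract the series mean from each value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def running_mean3(values):
    """Three-point running mean; the output is two values shorter."""
    return [sum(values[i:i + 3]) / 3 for i in range(len(values) - 2)]


# Hypothetical stand-in for the "processing routines" part of the MDDB:
# each routine is an ordered call sequence of modules plus the parameters
# it may be applied to.
ROUTINES = {
    "temperature_anomaly": {
        "pipeline": [to_celsius, anomalies],
        "parameters": {"air_temperature"},
    },
    "smoothed_precipitation": {
        "pipeline": [running_mean3],
        "parameters": {"precipitation"},
    },
}


def run_routine(name, parameter, values):
    """Check that the routine applies to the parameter, then run its modules in order."""
    routine = ROUTINES[name]
    if parameter not in routine["parameters"]:
        raise ValueError(f"routine {name!r} is not applicable to {parameter!r}")
    for module in routine["pipeline"]:
        values = module(values)
    return values


result = run_routine("temperature_anomaly", "air_temperature", [273.15, 275.15, 277.15])
```

Keeping the module-to-parameter links as data rather than code is what allows a new routine to appear in the interface once it is registered, which is the behavior the paragraph describes for the MDDB.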
The ‘Server applications’ middleware tier consists of two basic software components: computational backend and geoportal.
The computational backend contains data processing and visualization software components. The data processing component is the key software component: it contains computational modules based on GNU Data Language (GDL, http://gnudatalanguage.sourceforge.net/) and Python, provides integrated statistical processing of geospatial data, and offers an API for working with netCDF, Hierarchical Data Format (HDF) and ESRI Shapefile data files and with PostGIS databases. Depending on the required result type, the visualization component of the backend generates files in the following formats: GeoTIFF, float GeoTIFF, ESRI Shapefile, Encapsulated PostScript, CSV, XML and netCDF.
The Spatial Data Infrastructure (SDI) geoportal contains two basic components: a web portal and GeoServer (http://geoserver.org). GeoServer provides cartographic web services such as Web Map Service (WMS), Web Feature Service (WFS) and Web Processing Service (WPS). In general, WPS provides a standard HTTP interface for remotely configuring and launching data processing software modules and presenting their results in generic formats. These services can be used by either standard GIS environments or web applications.
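A client such as the Web-GIS application talks to WPS through HTTP requests. The sketch below builds key-value-pair (KVP) GET request URLs following the OGC WPS 1.0.0 encoding; it assumes that version, and the endpoint URL, the process identifier `climate:seasonalMean` and its input names are hypothetical placeholders, not the platform's actual services.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; a real GeoServer instance exposes WPS under its own base URL.
WPS_ENDPOINT = "https://example.org/geoserver/wps"


def wps_url(request, **params):
    """Build a WPS 1.0.0 KVP GET request URL."""
    query = {"service": "WPS", "version": "1.0.0", "request": request}
    query.update(params)
    return f"{WPS_ENDPOINT}?{urlencode(query)}"


def describe_process_url(identifier):
    """Ask the server which inputs and outputs a process expects."""
    return wps_url("DescribeProcess", identifier=identifier)


def execute_url(identifier, inputs):
    """KVP encoding packs all inputs into one DataInputs value: name=value;name=value."""
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    return wps_url("Execute", identifier=identifier, DataInputs=data_inputs)


# "climate:seasonalMean" and its inputs are hypothetical process names.
describe = describe_process_url("climate:seasonalMean")
run = execute_url("climate:seasonalMean", {"parameter": "air_temperature", "season": "JJA"})
```

`urlencode` percent-encodes the `=` and `;` separators inside `DataInputs`; the server decodes the query string before splitting the inputs, so the request is equivalent to the unencoded form.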
The web portal serves as a connection point between different SDI elements (geospatial data, metadata, services and client applications). Its main feature is a unified API for client web applications that comply with the conventional Boundless/OpenGeo architecture (Becirspahic & Karabegovic 2015). The web portal provides the server-side part of the Web-GIS client application, which complies with the general INSPIRE (INfrastructure for SPatial InfoRmation in Europe, https://inspire.ec.europa.eu) requirements for geospatial data visualization and launches computational processing services to support climate monitoring tasks.
The ‘Climate’ platform described above is aimed at basic and applied climatology. Since adaptation to climate change and mitigation of its negative consequences are now urgently required, the thematic VRE should deal with applied tasks in different domains in which the impact of climate change must be considered. To meet these challenges and to make a step from climate science to climate services, we are developing the ‘Climate+’ platform, whose architecture is shown in Figure 3. The ‘Climate’ platform works with resources related only to the climatic data and metadata layers; subsequent use and re-use of the results produced by procedures of the Computational backend are left entirely to the user. However, when dealing with applied regional problems caused by climate change, a special role is assigned to the decision support system (DSS), which relies upon solutions of computational problems (Applied Tasks) describing changes in the states of spatio-temporal objects (rivers, lakes, roads, etc.) over different time intervals. The resulting information resources (results of calculations) should be saved, since they may be requested in different decision-making tasks. In the ‘Climate+’ platform, the results of the calculations performed in the Applied Tasks box are added to the data and metadata layers and can be used later.
Applied computational problems are highly varied, and the input data required to solve them come from different subject domains. Therefore, the descriptions of the intensions of the input data of computational applications and of data collections are semantically heterogeneous, which can lead to inadequate decisions in the interpretation, integration and exchange of data, as well as in retrieving and using relevant information. Overcoming semantic heterogeneity in spatial data infrastructures was described by Lutz et al. (2009). To solve the problem of semantic heterogeneity in the ‘Climate+’ platform, an ontology layer characterizing the properties of the data collections (Reanalysis, Observations, Modeling Data) has been created. This ontology is used to select input data for Applied Tasks applications.
The 156 formalized climatic and meteorological characteristics currently present in the collections are named in accordance with the WMO taxonomy (WMO 2016). They form a common shared vocabulary of the domain and are represented in the ‘Climate+’ platform as an OWL ontology, described in the next section. At present, contradictions in the definitions of physical quantities in the platform are resolved by an expert. The ‘Meta+’ application is used to automatically create individuals of the ontology of climate and meteorological data collections (Data collection ontology, DCO). The ‘Expert System’ application allows one to match the intensions of the input data of applied problems with the corresponding collections of climatic data. The ‘Applied Tasks Facts Mining’ set of applications creates ontology individuals for each specific application. Examples of such individuals for the ontology of climate data collection properties and for the ontology of the input data of the problem of freezing and thawing of a river are given in the next section.
Formalizing the properties of data and metadata and representing them as ontologies in the ‘Climate+’ platform is useful not only for achieving semantic homogeneity, but also for building a knowledge base in the subject domains for decision support tasks. This is because the decision-making system uses knowledge representations from a significant number of subject areas, in which modeling is performed with different degrees of granularity. Without an explicit formal representation of terms, statements and concepts within a single specification language (OWL 2 QL in our work), the interpretation of the adopted decision may be incorrect.
Ontology layer of Climate+ platform
The ontology layer is a part of the knowledge layer and is used to solve the following tasks:
- Semantic search for collections of meteorological and climatic resources, for properties of applied problem solutions, and for decisions made available in the ‘Climate+’ platform;
- Detection of contradictions between definitions of physical quantities in data collections, and matching of these quantities between collections and the input data of applied problems;
- Formalization of the results of applied problem solutions;
- Building ontological knowledge bases for IDSS.
The ontology layer contains three groups of OWL ontologies. The first group includes the ontology of collections of climatic and meteorological data (DCO) (Alipova et al. 2017). To construct it, we created the ontology of climatic and meteorological characteristics (WMOO) (Bart, Privezentsev & Fazliev 2017). Key individuals of the DCO describe the properties of data collections and are intended to help users select input data for their applied tasks. The second group consists of applied ontologies of the input and output data of applied tasks (OIODAT) (Bart et al. 2017). Solutions of applied tasks can be used in an intelligent decision support system (IDSS). The third group is an ontological knowledge base (IDSS OKB) (Kaklauskas 2015). The ontologies of the first two groups are described in more detail below; the third group is currently under development and is not discussed in detail in this paper.
The ontologies are associated with the applications shown in Figure 3. The ‘Meta+’ application supports the automatic creation of individuals of the ontology of climate and meteorological data collections. ‘Meta+’ is written in Python 3 using the Owlready library (https://pypi.python.org/pypi/Owlready). The ‘Climate+’ platform is aimed at presenting data using GIS technologies (a human interface). Its further development is aimed at enabling researchers to select and use climate or meteorological datasets, or parts of them, as input data for their applied tasks via an agent interface. Most collections contain data that do not cover all spatiotemporal objects on the Earth, and different data collections often contain different sets of physical quantities. To find spatiotemporal objects and their meteorological and climatic parameters, it is necessary to create an ‘Expert System’ application for selecting the necessary objects and their characteristics. This expert system is based on a knowledge base describing the spatial objects of data collections and their parameters.
The third group of applications, ‘Applied Tasks Facts Mining’, is designed to extract facts from numerical solutions of specific applied problems and to present them as individuals of the corresponding OWL ontologies. As an example, the individuals characterizing such solutions are given below for the problem of describing the freezing of a river. The demonstration case study, ‘The decision support task’, determines the timeframe for closing navigation on the Ob River and opening ice crossings on it.
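The following sketch shows the kind of transformation ‘Meta+’ performs, turning collection metadata into ontology individuals. To stay self-contained it does not call Owlready; it emits plain (subject, predicate, object) triples using the iao: abbreviations from this paper. The input record layout and individual names are hypothetical, and the direction used for iao:dda (data array to data set, following the wording "belongs (iao:dda) to") is an assumption.

```python
def make_individuals(collection):
    """Turn one collection's metadata record into (subject, predicate, object) triples."""
    c = f"iao:{collection['id']}"
    triples = [
        (c, "rdf:type", "iao:C"),
        (c, "iao:do", f"iao:{collection['organization']}"),  # collection belongs to organization
    ]
    for ds in collection["datasets"]:
        d = f"iao:{ds['id']}"
        triples += [(d, "rdf:type", "iao:DS"), (c, "iao:dds", d)]
        for da in ds["arrays"]:
            a = f"iao:{da['id']}"
            values = da["values"]
            triples += [
                (a, "rdf:type", "iao:DA"),
                (a, "iao:dda", d),              # assumed direction: data array -> data set
                (a, "iao:dnv", len(values)),    # number of array components
                (a, "iao:dmiv", min(values)),   # physical quantity minimum
                (a, "iao:dmav", max(values)),   # physical quantity maximum
            ]
    return triples


# Hypothetical metadata record for a tiny collection.
record = {
    "id": "C1",
    "organization": "O1",
    "datasets": [{"id": "DS1", "arrays": [{"id": "DA1", "values": [1.0, 3.0, 2.0]}]}],
}
triples = make_individuals(record)
```

With Owlready, the same loop would instantiate ontology classes instead of appending tuples; the triple form is used here only to keep the example free of external dependencies.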
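A minimal sketch of the fact-extraction step follows, assuming a hypothetical criterion: freeze-up onset is taken to be the first day on which the modeled water-surface temperature is at or below 0 °C. The property name `t2:freezeUpOnset` and the reach name are hypothetical, the actual criteria of the freeze-up model (Bart et al. 2018) are not given here, and the input series is illustrative rather than model output.

```python
from datetime import date, timedelta


def first_day_at_or_below(start, daily_values, threshold=0.0):
    """Date of the first value <= threshold in a daily series, or None."""
    for offset, value in enumerate(daily_values):
        if value <= threshold:
            return start + timedelta(days=offset)
    return None


def freeze_up_fact(reach, start, water_surface_temps):
    """Package the extracted onset date as one (subject, predicate, object) triple."""
    onset = first_day_at_or_below(start, water_surface_temps)
    if onset is None:
        return None
    # t2:freezeUpOnset is a hypothetical property name used for illustration.
    return (f"t2:{reach}", "t2:freezeUpOnset", onset.isoformat())


# Illustrative daily water-surface temperatures in degC, not model output.
series = [2.1, 1.4, 0.6, 0.2, -0.1, -0.3]
fact = freeze_up_fact("ObReach1", date(2016, 10, 25), series)
```

The point of the sketch is the shape of the output: a numerical solution is reduced to a small, named fact that a decision support component can query without rereading the full model output.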
Climate+ applied ontologies
The key elements of OWL ontologies are classes, properties and individuals. The individuals, as well as most of the properties, are defined while solving a reduction problem. Some of the ontology classes are used to specify the domains and ranges of ontology properties.
Three applied ontologies are described below in some detail:
- meteorological and climate information ontology of the ‘Climate+’ platform’s web portal (iao:);
- WMO taxonomy of physical quantities (wmo:);
- ontology for description of web portal applied tasks (tsu:).
Table 1 presents the main classes of the ontologies (Alipova et al. 2017; Bart et al. 2017; Bart, Privezentsev & Fazliev 2017). The namespace prefix is placed before the class name; the class name abbreviation used in the text below is given after the class name in parentheses.
Properties of the applied ontologies are presented in Tables 2 and 3. The first three columns contain the domain, object property and range, respectively; the fourth column is the property abbreviation, used in the text and in Figure 4 (Bart et al. 2017).
Ontology of climate and meteorological information collections. Classes and properties
Collections of climate and meteorological data form the basis for the operation of ‘Climate+’ platform services. The facts included in the ontology represent the properties of the 19 numerical climate data collections of the ‘Climate’ platform, which include 40 data sets and 793 numerical data arrays. The climate information ontology includes descriptions of 170 spatiotemporal objects characterized by 156 physical quantities. The resulting ontology of climate data collections is a formal OWL 2 QL description of the properties of these data. Its components are specified in the namespace iao:.
The ontology model of numerical arrays of the ‘Climate’ web portal uses the following information model for storage and presentation. Numerical data are represented as data arrays stored in netCDF files. The data arrays are grouped into data sets. All data arrays in a set must satisfy the following conditions: (a) they are obtained on the same spatial and temporal grid; (b) they are collected under the same simulation or observation conditions; and (c) they are stored in netCDF files that include the same physical quantities. The data sets are grouped into data collections. A data collection is an ensemble of data sets obtained by an organization within a project but represented on different spatial or temporal grids or for different model scenarios. A collection can consist of one or several data sets.
The ontology model for describing numerical arrays is based on the representation of a real spatiotemporal 3D object that exists in a certain geographical territory during a certain time interval. Numerical values of the physical properties of this object are estimated in a model specified in the form of a spatiotemporal grid; this model is called the spatiotemporal system. The spatiotemporal system (iao:SS) is a 4D grid defined by lists of numerical values of longitudes (iao:Lol), latitudes (iao:Lal), heights (iao:HLl) and time (iao:Tl). These lists are subclasses of the class of physical data (iao:PD); therefore, they contain numerical values of a physical quantity (iao:PQ) in certain units (iao:U) and can be described by the number of values of the array of this physical quantity (iao:dn), the proportional step of the physical quantity values (iao:dpq), the physical quantity minimum (iao:dmiv) and maximum (iao:dmav), or the numerical values of the physical quantity (iao:dv). The list of time labels (iao:Tl) is additionally described by the initial (iao:dit) and final (iao:dft) time labels.
A data array (iao:DA) is an ordered list of numerical values of a physical quantity (iao:PQ); it is a property of a spatiotemporal system (iao:osts) at each 4D point (longitude, latitude, height and time) of the spatiotemporal system (iao:STS). In terms of OWL, the data array (iao:DA) is a subclass of the class of physical data (iao:PD); therefore, it is a numerical array of values of one physical quantity (iao:PQ) in certain measurement units (iao:U) and is described by the number of array components (iao:dnv) and the physical quantity maximum (iao:dmav) and minimum (iao:dmiv). The set of data arrays (iao:DA) belongs (iao:dda) to a data set (iao:DS), which differs from other data sets in its model scenario (iao:S), spatial resolution (iao:SR), time step (iao:Ts) and membership in a collection (iao:C). A data collection (iao:C) consists of a set (iao:dds) of data sets (iao:DS) and belongs (iao:do) to one organization (iao:O).
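The descriptors listed above can be derived directly from a coordinate list. The sketch below computes them for numeric axes and adds the initial and final labels for a time axis. The dictionary keys reuse the iao: abbreviations; the rounding tolerance and the representation of time as hours since a reference datetime are assumptions, not the platform's storage format.

```python
from datetime import datetime, timedelta


def describe_axis(values):
    """iao:dn, iao:dpq, iao:dmiv and iao:dmav for one numeric coordinate list."""
    # Rounding guards against floating-point noise when checking for a constant step.
    steps = {round(b - a, 9) for a, b in zip(values, values[1:])}
    return {
        "iao:dn": len(values),
        "iao:dpq": steps.pop() if len(steps) == 1 else None,  # None: irregular grid
        "iao:dmiv": min(values),
        "iao:dmav": max(values),
    }


def describe_time_axis(hours, reference):
    """Time axis given as hours since a reference datetime; adds iao:dit and iao:dft."""
    descriptors = describe_axis(hours)
    descriptors["iao:dit"] = (reference + timedelta(hours=min(hours))).isoformat()
    descriptors["iao:dft"] = (reference + timedelta(hours=max(hours))).isoformat()
    return descriptors


lon = describe_axis([0.0, 2.5, 5.0, 7.5])
time_axis = describe_time_axis([0, 6, 12, 18], datetime(2000, 1, 1))
```

When the step is not constant the sketch returns `None` for iao:dpq; that is the case in which the text's alternative, listing the values themselves (iao:dv), would apply.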
Short version of WMO ontology of physical quantities
The main classes of the WMO ontology, which generalizes the WMO taxonomy (STFC 2016), are listed in Table 1. The structure of this table is described below. The top level of the taxonomy of WMO physical quantities distinguishes several groups of physical quantities: hydrological, land surface (wmo:LSP), air (wmo:MP) and space quantities; each of these groups includes subgroups. Figure 4 shows individuals of the WMO ontology (Bart, Privezentsev & Fazliev 2017). They represent the classes of physical quantities of the soil category (wmo:SC), long-wave radiation (wmo:LWRC) and temperature category (wmo:TC). The ontology individuals wmo:ST and wmo:SM are instances of the wmo:SCy class, the individual wmo:WMO_DLWRFlux is an instance of the wmo:LWRC class, and the individual wmo:ST is also an instance of the wmo:TC class.
Figure 4 shows examples of individuals of the OWL ontologies characterizing the INM CM4 data collection, the WMO taxonomy of physical quantities and the applied task of freeze-up on a river (T2). The latter problem is formulated by Shen & Chiang (1984), Shen (2016) and Bart et al. (2018). Data representing the solution of task T2 are located in the data layer (Applied Tasks Data).
In Figure 4, individuals of the OWL ontology are shown as ovals and literal values as dashed rectangles; arrows denote properties, whose unique identifiers from Tables 2 and 3 are given in small empty rectangles above the arrows. As in an RDF graph, each arrow represents a ‘subject–predicate–object’ triple.
Figure 4 schematically shows the individuals describing the properties of the data collections and the tasks that can be solved with them. For simplicity, Figure 4 considers only one of the four types of structures of individuals of the ‘freeze-up on a river’ task. The individual shown in Figure 4 characterizes the temperature of the water surface in the river during the autumn period. The IAO namespace contains the properties of the collections and data sets of the ‘Climate+’ platform. Each data array is characterized by its space-time resolution, as well as by the name of the physical quantity and its dimensionality. The namespace T2 contains individuals describing the properties of the arrays that are input and output data of the river freezing model (Bart et al. 2018). To solve the applied task, the model must be provided with these input data. In practice, the same variable may have different names in climatic data collections and in applied models. Comparing these names can be simplified and partially automated by using the WMO taxonomy. Using rules stored in the knowledge base that map the names of climatic data in the collections to those accepted by WMO, and using WMO terms in the model description, one can determine whether the data required for calculations are available in the collections. The characteristics that determine the location and/or time of the onset of ice formation on a river are of interest for the semantic evaluation of the results.
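Because each arrow is a triple, the content of such a graph can be queried by pattern matching. The sketch below stores as triples the class memberships of the WMO individuals listed in the previous section (wmo:ST, wmo:SM, wmo:WMO_DLWRFlux) and implements a minimal wildcard match. It is an illustration only; a real deployment would use an RDF store and SPARQL rather than this hand-written function.

```python
# Class memberships of WMO individuals as stated in the text.
TRIPLES = {
    ("wmo:ST", "rdf:type", "wmo:SCy"),
    ("wmo:SM", "rdf:type", "wmo:SCy"),
    ("wmo:WMO_DLWRFlux", "rdf:type", "wmo:LWRC"),
    ("wmo:ST", "rdf:type", "wmo:TC"),
}


def match(triples, s=None, p=None, o=None):
    """Return matching triples in sorted order; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )


soil_individuals = [s for s, _, _ in match(TRIPLES, p="rdf:type", o="wmo:SCy")]
classes_of_st = [o for _, _, o in match(TRIPLES, s="wmo:ST", p="rdf:type")]
```

The second query shows that one individual may belong to several classes (wmo:ST appears under both the soil and the temperature categories), which a graph of triples represents without any special handling.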
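The name-matching step described above can be sketched as a lookup through WMO terms. In the following example, the per-collection mapping rules, collection names and term strings are hypothetical stand-ins for the rules stored in the knowledge base; they are not actual WMO identifiers.

```python
# Hypothetical mapping rules: collection-specific variable name -> WMO term.
COLLECTION_TO_WMO = {
    "collection_A": {"tas": "air temperature", "pr": "precipitation"},
    "collection_B": {"air": "air temperature", "skt": "skin temperature"},
}


def availability(required_terms, rules):
    """For each required WMO term, list the (collection, local name) pairs providing it."""
    found = {term: [] for term in sorted(required_terms)}
    for collection, mapping in sorted(rules.items()):
        for local_name, wmo_term in sorted(mapping.items()):
            if wmo_term in found:
                found[wmo_term].append((collection, local_name))
    return found


def missing(required_terms, rules):
    """WMO terms that no collection can supply."""
    return sorted(t for t, hits in availability(required_terms, rules).items() if not hits)


# Hypothetical model inputs, already described with WMO terms.
model_inputs = {"air temperature", "precipitation", "snow depth"}
avail = availability(model_inputs, COLLECTION_TO_WMO)
gaps = missing(model_inputs, COLLECTION_TO_WMO)
```

Routing both sides through the shared WMO vocabulary means each collection and each model needs only one mapping to the common terms, rather than a pairwise mapping to every other source.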
IDSS ontological knowledge base
Individuals critical for decision-making, such as the time intervals of ice formation on the river, the ice thicknesses that permit vehicles of different weights to travel over the ice, the stage of ice melting at which traffic on the river ice is prohibited, and the time interval during which not all river stretches are accessible for shipping, are exported to the ontological knowledge base (OKB) used by the ‘IDSS’ application. The description of this IDSS OKB, of the ‘IDSS’ application and of the interfaces for working with the ‘Expert System’ and ‘IDSS’ applications is omitted here. Figure 5 shows the structure of the individual characterizing the time intervals that are critical for decision-making in the autumn season.