Iftikhar U. Sikder
University of Maryland, Baltimore, USA
Copyright © 2003, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of Idea Group Inc. is prohibited.
This chapter introduces research issues in spatial decision-making in the context of a distributed geospatial data warehouse. Spatial decision-making in a distributed environment involves accessing data and models from heterogeneous sources and composing disparate services into a meaningful whole. The chapter reviews system integration and interoperability issues of spatial data and models in a distributed computing environment. We present a prototype system to illustrate collaborative access to data and models for supporting spatial decision-making.
Distributed access to data and services brings different communities into closer involvement, regardless of their geographic locations and social orientations. Diverse tools and Web services are now available to extract data and models from online repositories. The vision of a geospatial data warehouse challenges a fundamental criticism directed against Geographic Information Systems (GIS): that they are an "elitist" tool that perpetuates the gap between system users and non-users (Pickles, 1995). A unique advantage of a distributed geospatial data warehouse is that giving decision makers and planners access to geospatial information and services will promote Collaborative Spatial Decision Making (CSDM; NCGIA, 1995). Hence, many complex environmental problems that would otherwise be difficult to resolve can be resolved through collaboration. This chapter reviews the research issues of distributed GIS services and geospatial data warehouses in the context of collaborative decision-making. In the following sections, we identify some essential features of a distributed spatial data warehouse relevant to spatial decision-making, with respect to different standards and protocols. Then, we discuss system integration and interoperability issues of spatial data and models. A prototype collaborative decision support system is presented to illustrate collaborative access to data and models for supporting spatial decisions. Finally, we identify future research directions for distributed data warehouses within the framework of the emerging Semantic Web.
It has been reported that as much as 80% of general information contains spatial components (OGC, 2001). Typically, these include spatial data or geo-referenced information, such as digital or analog maps, networks, GPS data and satellite imagery. As spatial data and services become increasingly available, there is a growing demand for robust information processing for exploratory analysis, in which a user is empowered to extract multiple services from different repositories. The use of spatial data cuts across many disciplines, ranging from ecosystem modeling to location-based commerce. This has resulted in many stand-alone native spatial data structures and domain models. The challenge, however, is to make these heterogeneous systems interoperable so that they can communicate in a distributed computing environment.
Serving spatial data from disparate sources and disseminating them to target users is also the vision of the NSDI (National Spatial Data Infrastructure), as expressed by the Mapping Science Committee (NRC, 1993). However, the NSDI vision did not anticipate the enormous growth of the Internet and therefore underemphasized the importance of effective processes for dissemination to users (NRC, 1999). At the object level, the Open GIS Consortium (OGC) is developing a number of specifications. The Open Geodata Interoperability Specification (OGIS) defines the types and methods required for an interoperable system. The notable feature of OGIS is the Abstract Specification. It includes the Essential Model and the Abstract Model, which form the basis for developing Implementation Specifications (OGC, 2002a). The specification promises interoperability of geospatial data and modeling systems through a common language for sharing geodata and standardized definitions of interfaces. However, OGIS is an operational model, not a data standard (Gardels, 1995): systems developed in compliance with the operational model will be interoperable, and systems that are not will not be. The specification does not explicitly provide any interoperability mechanism for linking with legacy systems to allow geo-processing functions. In addition, at the conceptual level, semantic conversion or mapping between existing systems and OGIS is still very difficult (Camara et al., 1999). A higher level of semantic modeling is still required before the actual mapping between OpenGIS and existing systems can take place (Yuan, 1997).
Conventional approaches to interoperating different data warehouses rely on metadata or data dictionaries (Inmon, 2001). This method essentially depends on catalog interoperability achieved through standardization. In the case of geospatial data, the metadata standard developed by the Federal Geographic Data Committee (FGDC, 1999) is a content standard for documenting geospatial databases rather than a standard for the databases themselves. For example, the standard does not specify how two different geometric representations of a real-world feature can be semantically mapped; rather, it specifies how to document each representation in a standard protocol. Therefore, we still need an additional semantic translation layer to allow them to communicate.
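A small sketch can make this gap concrete. It assumes two hypothetical road layers, each fully documented by a metadata record: the record tells a client where the road-class value is stored, but a separate, hand-built translation table is still needed to say what each local code means in a shared vocabulary. The source names, codes and vocabulary are illustrative only and are not part of the FGDC standard.

```python
from dataclasses import dataclass

# Hypothetical metadata records: both sources are documented, yet nothing
# in the records says how their vocabularies correspond to each other.
METADATA = {
    "county_gis": {"theme": "roads", "attribute": "RD_CLASS", "geometry": "line"},
    "state_dot": {"theme": "roads", "attribute": "highway_type", "geometry": "line"},
}

# The extra semantic-translation layer: per-source mappings from local codes
# to a shared (hypothetical) vocabulary.
TRANSLATION = {
    "county_gis": {"1": "interstate", "2": "arterial", "3": "local"},
    "state_dot": {"I": "interstate", "ART": "arterial", "LOC": "local"},
}

@dataclass
class Feature:
    source: str
    feature_id: str
    attributes: dict

def to_shared_class(feature: Feature) -> str:
    """Translate a feature's local road class into the shared vocabulary."""
    attr = METADATA[feature.source]["attribute"]  # metadata: where the value lives
    local_value = feature.attributes[attr]
    return TRANSLATION[feature.source][local_value]  # translator: what it means

a = Feature("county_gis", "c-17", {"RD_CLASS": "1"})
b = Feature("state_dot", "s-902", {"highway_type": "I"})
print(to_shared_class(a), to_shared_class(b))  # interstate interstate
```

The translation tables are exactly the knowledge that the metadata records do not carry.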
One of the main functionalities of a distributed data warehouse is to provide the building blocks of decision support systems. The system should be able to combine heterogeneous spatial and non-spatial components provided by the data warehouse. The level of support may extend to:
In order to support these features from a distributed data warehouse, a DSS should have access to scalable and reusable components that can be assembled in a modular fashion at different levels of representation. This leads to the fundamental problems of system integration and interoperability in GIS.
A distributed spatial data warehouse framework for decision support needs to address various complex research issues spanning technical, social and institutional aspects. In this chapter, we mainly focus on the technical issues of how distributed geospatial services can be effectively utilized in decision-making or planning. Needless to say, the institutional and social aspects are no less important in realizing spatial services; these aspects have been dealt with separately (Sikder & Gangopadhyay, 2002a). We need to explore how the classical decision support framework fits with the emerging distributed autonomous services, and whether these new developments can cope with new standards, protocols, institutional regulations and so forth. The emerging emphasis on the decentralization of resources and services may require a novel approach to tailoring decision models from disparate sources and customizing them for user groups.
Unlike purely alphanumeric data, spatial data require a complex organization of geometry, features, themes, topology, projections and referencing systems (Adam & Gangopadhyay, 1997; Worboys, 1995). Different levels of abstraction of real-world objects and geographic entities give rise to different representation schemes. At the implementation level, domain-specific data models can rarely communicate with one another to fulfill a user's request. The proliferation of different data types and models limits common geo-processing services, such as spatial selection, intersect, union, buffer and overlay, which are essential modules for selecting subsets from the data space of a warehouse. Some of the desirable characteristics of a distributed geospatial data warehouse are:
These characteristics call for an interoperable architecture where application domains, such as environmental modeling or resource optimization tools, become a part of the data processing services of the warehouse. Distributed data warehouse research should be directed towards the spatial interoperability aspect of distributed data brokering and spatial decision-making.
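The common geo-processing services named above can be illustrated with a minimal planar sketch. It assumes simple polygons given as vertex lists and covers spatial selection (a bounding-box filter followed by a ray-casting point-in-polygon test), a bounding-box intersect test and a deliberately crude buffer that only grows a bounding box; union and overlay are omitted. A production warehouse would rely on a full geometry engine implementing the OGC Simple Features operations rather than these hypothetical helpers.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # simple polygon, vertices in order, not explicitly closed
BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray casting: count how often a horizontal ray from pt crosses the edges."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's y-coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def bbox(poly: Polygon) -> BBox:
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return min(xs), min(ys), max(xs), max(ys)

def bboxes_intersect(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def buffer_bbox(box: BBox, d: float) -> BBox:
    """Crude buffer: expand a bounding box by distance d on every side."""
    return box[0] - d, box[1] - d, box[2] + d, box[3] + d

def spatial_select(points: List[Point], poly: Polygon) -> List[Point]:
    """Spatial selection: keep only the points that fall inside poly."""
    box = bbox(poly)
    return [p for p in points
            if box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]  # cheap filter
            and point_in_polygon(p, poly)]                             # exact test

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
wells = [(1, 1), (5, 5), (3, 2)]
print(spatial_select(wells, square))  # [(1, 1), (3, 2)]
```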
An essential feature of a spatial decision support system is the integration of geographic data and geo-processing functions or services in a distributed environment. Often, GIS models are developed for a specialized purpose and tightly woven into the data model, without regard to system interoperability. In the past, many integration frameworks have been proposed (Chou & Ding, 1992; Nyerges, 1993; Abel & Kilby, 1994). However, these approaches do not provide a model management system to support dynamic processing at different spatio-temporal scales. Additionally, these approaches range from low-level simple data transfer to high-level complex coupling. As far as software reusability is concerned, in a distributed environment where there is little agreement on components and modeling paradigms, low-level integration adds little benefit when diverse models must communicate at a higher level. Moreover, there is an inherent dichotomy between GIS models and environmental models. While the former focus on the representation of space-time relationships and spatial features, the latter are concerned with dynamic processes. This space-process dichotomy accounts for the differences in the abstract models and languages used by GIS and by environmental models (Maidment, 1996). To get around this problem, a unified conceptual model of the problem domain is essential prior to system development. In high-performance distributed computing environments, modularizing model components calls for an object-oriented approach that can accommodate flexible and iterative model processes, incorporating prototyping, the use of class libraries, the reuse and re-engineering of other application code, and late configuration in response to changing requirements.
Such object-oriented modeling tools can be regarded as a generic decision model for a certain class of problems, which can be customized through an instantiation process. From the user's point of view, model formulation becomes a simple process of choosing and applying a set of special-purpose, domain-oriented concepts to describe the problem domain. As far as the semantic content of the models is concerned, a meta-model ontology would be necessary to describe spatial data and services in a modeling language. This aspect is discussed in detail later.
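The idea of a generic decision model customized through instantiation can be sketched with an abstract base class. Here the hypothetical SpatialModel interface stands for the generic model of a problem class, ExportCoefficientModel is one domain specialization (annual load equals a per-land-use coefficient times area), and instantiation with watershed-specific coefficients is the customization step. The class names and coefficient values are illustrative assumptions, not measured data or any published model API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict

class SpatialModel(ABC):
    """Generic decision model for a class of problems (hypothetical interface).

    Concrete models plug in domain behaviour; callers only ever see run()."""

    @abstractmethod
    def run(self, inputs: Dict[str, float]) -> Dict[str, float]:
        ...

@dataclass
class ExportCoefficientModel(SpatialModel):
    """Domain specialization: annual load = coefficient x area, per land use."""
    coefficients: Dict[str, float] = field(default_factory=dict)  # kg/ha/yr

    def run(self, inputs: Dict[str, float]) -> Dict[str, float]:
        # inputs maps land-use class -> area in hectares
        return {lu: self.coefficients[lu] * area for lu, area in inputs.items()}

# Customization by instantiation: same model class, watershed-specific values.
phosphorus = ExportCoefficientModel(coefficients={"forest": 0.25, "urban": 1.5})
print(phosphorus.run({"forest": 40.0, "urban": 10.0}))  # {'forest': 10.0, 'urban': 15.0}
```

Because callers depend only on run(), another model of the same class could replace this one without changing the calling code.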
The middleware approach to accessing spatial services through a broker relies on a standard definition of "interfaces." The interface allows communication with legacy systems, regardless of their native implementation language. Broker-based solutions, such as COM (Component Object Model) or CORBA (Common Object Request Broker Architecture), offer an Interface Definition Language (IDL) for communication among the different tiers of a solution. CORBA's naming service provides a mechanism to register and locate objects in an ORB (Object Request Broker)-based system. Through an encapsulating interface, the requesters of services (clients) are separated from the providers of services (servers). The ORB core delivers a client's request to the object implementation and returns the result from the server to the client. The request is initiated in the client's IDL stub, and the server's IDL skeleton delivers it to the object implementation. This allows the client not only to access remote methods (e.g., geo-processing functions) but also to instantiate spatial objects as if the system were running on a local machine. On the Internet, these services can communicate through IIOP (Internet Inter-ORB Protocol). A distributed geospatial data warehouse can be created from the building blocks of CORBA rather than developed from scratch. Typical problems of distributed computing, such as data replication, synchronization of operations at multiple sites, transaction management and query optimization, can be handled efficiently in CORBA. Wang (2000) confirms this with an experimental CORBA-based system that supports geo-processing functions as well as data retrieval.
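The request path described above (naming service, client stub, ORB core, server skeleton, object implementation) can be mimicked in a few lines. This is a toy, in-process sketch, not CORBA: there is no IDL compiler, no marshalling and no IIOP, and every class name, including the hypothetical GeoProcessor servant and its area_of_rectangle operation, is invented for illustration.

```python
from typing import Any, Callable, Dict

class NamingService:
    """Toy stand-in for a CORBA naming service: maps names to servants."""
    def __init__(self) -> None:
        self._registry: Dict[str, Any] = {}

    def bind(self, name: str, servant: Any) -> None:
        self._registry[name] = servant

    def resolve(self, name: str) -> Any:
        return self._registry[name]

class Skeleton:
    """Server side: unpacks a request and calls the object implementation."""
    def __init__(self, servant: Any) -> None:
        self.servant = servant

    def dispatch(self, operation: str, args: tuple) -> Any:
        return getattr(self.servant, operation)(*args)

class Orb:
    """Toy broker core: routes a request from a stub to the named skeleton."""
    def __init__(self, naming: NamingService) -> None:
        self.naming = naming

    def invoke(self, name: str, operation: str, args: tuple) -> Any:
        skeleton = Skeleton(self.naming.resolve(name))
        return skeleton.dispatch(operation, args)

class Stub:
    """Client-side proxy: calls look local but are routed through the ORB."""
    def __init__(self, orb: Orb, name: str) -> None:
        self._orb = orb
        self._name = name

    def __getattr__(self, operation: str) -> Callable[..., Any]:
        # Only reached for attributes the stub does not define itself.
        return lambda *args: self._orb.invoke(self._name, operation, args)

class GeoProcessor:
    """Hypothetical object implementation offering one geo-processing operation."""
    def area_of_rectangle(self, xmin, ymin, xmax, ymax):
        return (xmax - xmin) * (ymax - ymin)

naming = NamingService()
naming.bind("GeoServices/Processor", GeoProcessor())
proxy = Stub(Orb(naming), "GeoServices/Processor")
print(proxy.area_of_rectangle(0, 0, 3, 2))  # 6
```

The client holds only a name and a stub; the servant could move behind the broker without any change to the calling code.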
Similarly, DCOM (Distributed Component Object Model) serves as the basis of Microsoft's solution for component interoperability across networks. In a DSS, a COM-compliant GIS can be used to create custom-tailored applications that make use of components from different applications and return the processing result to the native GIS. For instance, a user can create an instance of a component of a statistical package, use it to process local GIS data, and return the result to the native GIS (Ungerer & Goodchild, 2002; Zhang & Griffith, 2000). These embeddable components allow a move away from proprietary languages. In a distributed environment, coupling COM with GIS in an Internet map server allows clients to invoke various geo-processing functions through a Web browser; the map server processes the request and returns the result to the client browser through a Web server. Such tools are now commercially available, e.g., ESRI's MapObjects IMS and ArcIMS. In a relatively homogeneous environment (e.g., Java-to-Java), remote methods can be invoked from another Java Virtual Machine (JVM). RMI (Remote Method Invocation) provides a simpler implementation model than CORBA and DCOM, and a full-fledged distributed GIS built on RMI is plausible; however, RMI has been reported to be less scalable than RPC (Wang, 2000).
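The pattern of a client invoking a server-side geo-processing function over HTTP can be sketched with Python's standard-library XML-RPC modules. XML-RPC is swapped in here purely as a stand-in for DCOM, RMI or an Internet map server; it is not what GEO-ELCA or ArcIMS use, and buffer_extent is a hypothetical function that simply grows an extent by a distance.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def buffer_extent(xmin, ymin, xmax, ymax, distance):
    """Server-side geo-processing function: grow an extent by `distance`."""
    return [xmin - distance, ymin - distance, xmax + distance, ymax + distance]

# Port 0 asks the OS for any free port, so the sketch avoids port collisions.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(buffer_extent, "buffer_extent")
port = server.server_address[1]
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

# Client side: the call reads like a local function but travels over HTTP.
with ServerProxy(f"http://127.0.0.1:{port}/") as proxy:
    result = proxy.buffer_extent(0, 0, 10, 5, 1)
print(result)  # [-1, -1, 11, 6]

server.shutdown()      # stop serve_forever() in the background thread
server.server_close()  # release the listening socket
```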
The OpenGIS Simple Features Specification (OGC, 2002b) outlines spatial data access for COM, CORBA and SQL with respect to the "feature" (the basic unit of geospatial data) and its "geometry." The specification emphasizes data access rather than geo-processing (Cuthbert, 1999). From a decision support point of view, data access at the client's end without robust geo-processing capabilities is of little help. Moreover, in a middleware-based solution, the client has to pull a massive amount of data to his or her end and manage it locally. Such approaches assume that the client can explicitly manage server connections and invoke remote objects. Thus, frequent spatial processes, such as a spatial join between data from two different servers, need to be coordinated at the client's end. Object-level manipulation of spatial processes often fails to provide high-level views to the application developer. From the viewpoint of a DSS user or decision maker, spatial features or geometry need to be realized at a higher level of abstraction while the system processes remain transparent. Such systems have yet to be realized within the decision support framework and geospatial interoperability of data and models.
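The burden of a client-side spatial join can be seen in a short sketch. It assumes both feature sets have already been pulled in full from two hypothetical servers and joins them on bounding boxes only (the filter step of a spatial join), using a naive nested loop. Both the data transfer and the O(n·m) comparison work fall on the client. Layer names and coordinates are illustrative.

```python
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def intersects(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def client_side_join(parcels: Dict[str, BBox],
                     floodzones: Dict[str, BBox]) -> List[Tuple[str, str]]:
    """Naive nested-loop join: every feature from both servers is already local."""
    return [(p_id, f_id)
            for p_id, p_box in parcels.items()
            for f_id, f_box in floodzones.items()
            if intersects(p_box, f_box)]

# Hypothetical payloads "pulled" from two different servers.
parcels_from_server_a = {"P1": (0, 0, 2, 2), "P2": (5, 5, 6, 6), "P3": (1, 3, 2, 4)}
floodzones_from_server_b = {"F1": (1, 1, 3, 3.5)}

pairs = client_side_join(parcels_from_server_a, floodzones_from_server_b)
print(pairs)  # [('P1', 'F1'), ('P3', 'F1')]
```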
Collaborative decision-making and public GIS, often called GIS2 (Densham et al., 1995; Sheppard, 1995), involve "bottom-up" planning that reflects users' perspectives in exploring projected planning scenarios. Decision-making for environmental planning is inherently distributed over space and time (Agnew, 1993; Craig & Elwood, 1998). In real-life situations, it is often difficult to capture decision makers' views or to achieve an effective pattern of social interaction because of heterogeneous group behavior and undefined agendas (Mosvick & Nelson, 1987). Also, spatial decision conflicts among different users often need to be resolved in real time. Equipped with transparent computing resources and visualization, decision makers can explore multiple decision scenarios in a collaborative environment and converge to consensus through mediation.
There is growing interest in collaborative decision-making over the Internet. Rao & Jarvenpaa (1991) outlined the theoretical aspects of the effectiveness of group decision support systems with regard to theories of communication and human information-processing capabilities. Dillenburg & Grimshaw (1992) put forward the idea of "distributed cognition," which acknowledges that group decision-making can be supported by tools that allow explicit representation and manipulation (visualization) of shared information. Lotov et al. (2001) illustrated the implementation of the Point Associated Trade-Off (PAT), Feasible Goals Method (FGM) and Interactive Decision Maps (IDM) techniques for visualizing the variety of feasible criterion vectors in group decision-making. In collaborative decision-making, a distributed data warehouse will provide users with these basic building blocks of model components and interactive simulation interfaces, which are informative and responsive enough to accommodate new approaches to effective participation in decision-making.
GEO-ELCA (Exploratory Land Use Change Analysis) is a collaborative spatial decision support system that helps users assess the non-point pollution impact of land use changes. Built within the framework of a COM coupling strategy, the system makes use of local GIS data and geo-processing facilities, along with an external simulation model component, the non-point pollution model. GEO-ELCA implements a middleware component that receives a client's actions as HTTP requests from an ActiveX-enabled browser, thereby supporting GIS functionality through a customized Web browser. User requests are received and managed by a middleware component or Web server, which administers requests and the transmission of responses between the client and middleware tiers. The request is processed through an Application Programming Interface (API), and the result is sent back to the Web browser as images supported by the browser. Typical requests include changing the land use type, identifying a feature, zooming in, querying the database and displaying different themes. The Web server is developed with MapObjects components and ESRI's Internet Map Server (IMS; ESRI, 2002). MapObjects is a set of lightweight COM components, comprising ActiveX controls and automation objects designed specifically for mapping, which can be used within any ActiveX container.
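The request cycle described above (a browser sends an HTTP request, the middleware parses it and dispatches it to a GIS operation) can be sketched as a small dispatcher. GEO-ELCA itself does this through MapObjects and IMS; the Python function, the URL path and the query parameters cmd, id, class and factor are all hypothetical, and plain strings stand in for the map images the real system returns.

```python
from urllib.parse import parse_qs, urlparse

def handle_request(url: str, land_use: dict) -> str:
    """Parse a browser GET request and dispatch it to a GIS operation (hypothetical API)."""
    query = parse_qs(urlparse(url).query)  # e.g. {"cmd": ["identify"], "id": ["101"]}
    command = query.get("cmd", [""])[0]
    if command == "identify":
        fid = query["id"][0]
        return f"feature {fid}: {land_use[fid]}"
    if command == "change_landuse":
        fid, new_class = query["id"][0], query["class"][0]
        land_use[fid] = new_class  # update the in-memory "database"
        return f"feature {fid} changed to {new_class}; rerun pollution model"
    if command == "zoom":
        factor = float(query["factor"][0])
        return f"redraw map at zoom factor {factor}"
    return f"unsupported command: {command}"

land_use_db = {"101": "forest", "102": "cropland"}
print(handle_request("http://mapserver.example/geoelca?cmd=identify&id=101", land_use_db))
print(handle_request(
    "http://mapserver.example/geoelca?cmd=change_landuse&id=101&class=residential",
    land_use_db))
print(land_use_db["101"])  # residential
```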
The Internet Map Server provides a mechanism for receiving and dispatching requests from a client Web browser to a MapObjects application, eliminating the need for any Web server programming. Currently, the application implements the so-called "Simple Method" (Schueler, 1999) for estimating the export of various pollutants in runoff from different land uses. The open-ended architecture of GEO-ELCA can be further customized using embeddable, COM-compliant external simulation models.
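The chapter does not spell out the Simple Method's equations. In the form commonly published for it, the annual load L (lb) = 0.226 × R × C × A, where R = P × Pj × Rv is the annual runoff depth (in), P is annual rainfall (in), Pj is the fraction of rainfall events that produce runoff (usually taken as 0.9), Rv = 0.05 + 0.9 × Ia is the runoff coefficient for impervious fraction Ia, C is the flow-weighted mean pollutant concentration (mg/L), A is the drainage area (acres), and 0.226 is a unit conversion factor. The sketch below assumes that form; it is not GEO-ELCA's code, and the input values are illustrative.

```python
def runoff_coefficient(impervious_fraction: float) -> float:
    """Rv = 0.05 + 0.9 * Ia, with Ia the impervious fraction (0 to 1)."""
    return 0.05 + 0.9 * impervious_fraction

def simple_method_load(rain_in: float, impervious_fraction: float,
                       conc_mg_per_l: float, area_acres: float,
                       pj: float = 0.9) -> float:
    """Annual pollutant load in pounds: L = 0.226 * R * C * A, with R = P * Pj * Rv."""
    runoff_in = rain_in * pj * runoff_coefficient(impervious_fraction)  # R, inches/year
    return 0.226 * runoff_in * conc_mg_per_l * area_acres               # 0.226: unit conversion

# Illustrative inputs only: 40 in/yr rainfall, 50% impervious, 0.3 mg/L, 100 acres.
load = simple_method_load(40.0, 0.5, 0.3, 100.0)
print(f"{load:.1f} lb/yr")  # 122.0 lb/yr
```

Changing a polygon's land use in such a model amounts to changing its impervious fraction and concentration, after which the load is simply recomputed.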
Figure 1: GEO-ELCA architecture for distributed access to process model and spatial data
A key feature of GEO-ELCA is that it provides users with an exploratory tool for accessing the appropriate environmental data sets and model base to visualize the consequences of their decisions. For example, a user can employ a visual query to find a particular land use category. The query output can then be processed further to execute a simulation model.
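The query-then-simulate flow can be sketched as two steps: select every polygon of one land-use category, then pass the selection to a model. The Parcel record, the parcel data and the simple export-coefficient model (load equals a coefficient times total area) are hypothetical stand-ins for GEO-ELCA's database and its pollution model.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Parcel:
    parcel_id: str
    land_use: str
    area_ha: float

def query_by_land_use(parcels: List[Parcel], category: str) -> List[Parcel]:
    """The 'visual query': select every polygon of one land-use category."""
    return [p for p in parcels if p.land_use == category]

def annual_export(selection: List[Parcel], coefficient_kg_per_ha: float) -> float:
    """Feed the query output to a (hypothetical) export-coefficient model."""
    return coefficient_kg_per_ha * sum(p.area_ha for p in selection)

parcels = [Parcel("A", "cropland", 12.0), Parcel("B", "forest", 30.0),
           Parcel("C", "cropland", 8.0)]
cropland = query_by_land_use(parcels, "cropland")
print([p.parcel_id for p in cropland], annual_export(cropland, 0.5))  # ['A', 'C'] 10.0
```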
Since the interaction takes place in visual mode, a user need not have prior knowledge of model components or parameters. A user can initiate a change in land use type by graphically selecting a polygon. The server-side application processes the request and updates the database to reflect the corresponding changes in the pollutant coefficients. Every request to change a land use type triggers a recalculation of the mass export of pollutants and the corresponding statistics. The processed result is sent back to the Web server and then to the client. GEO-ELCA thus makes various GIS services available on the Web. The system allows a feature type (i.e., polygon, land use class) to be selected dynamically and interactively, so that a user can change attribute items and identify a feature's properties. The consequences of user decisions trigger the simulation model to estimate the yearly pollution load. The output can be visualized as a pollutant distribution under different classification schemes (e.g., standard deviation, plain break, quantile), with a modified map legend (for both continuous and unique data types). The resulting pollution map can be visualized with multiple overlaid themes. At its current stage of development, the system does not offer any mediating algorithm to resolve multiple scenarios. Such an algorithm should involve a group consensus-building mechanism through conflict resolution. Tools such as the Analytic Hierarchy Process (Saaty, 1990) or evolutionary search (e.g., genetic algorithms; Bennett et al., 1999) can readily be plugged into the system.
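The edit-recompute-classify cycle described above can be sketched as follows. A land-use change on one polygon triggers recomputation of every polygon's load and the total, and the new loads are then grouped into quantile classes for the map legend. The export coefficients, polygon data and helper names are hypothetical, and the quantile rule (equal counts per class by rank) is one simple variant of that scheme.

```python
from typing import Dict, List, Tuple

# Illustrative export coefficients (kg/ha/yr); not calibrated values.
COEFFICIENTS = {"forest": 0.25, "cropland": 1.0, "residential": 1.5}

def loads(land_use: Dict[str, str], area_ha: Dict[str, float]) -> Dict[str, float]:
    """Recompute the mass export for every polygon."""
    return {pid: COEFFICIENTS[lu] * area_ha[pid] for pid, lu in land_use.items()}

def change_land_use(land_use: Dict[str, str], area_ha: Dict[str, float],
                    pid: str, new_class: str) -> Tuple[Dict[str, float], float]:
    """A user edit: reclassify one polygon, then rerun the model and its statistics."""
    land_use[pid] = new_class
    result = loads(land_use, area_ha)
    return result, sum(result.values())

def quantile_classes(values: List[float], k: int) -> List[int]:
    """Assign each value to one of k classes holding (roughly) equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, i in enumerate(order):
        classes[i] = rank * k // len(values)
    return classes

land_use = {"p1": "forest", "p2": "cropland", "p3": "forest", "p4": "residential"}
area_ha = {"p1": 10.0, "p2": 10.0, "p3": 20.0, "p4": 4.0}
print(sum(loads(land_use, area_ha).values()))  # 23.5

new_loads, total = change_land_use(land_use, area_ha, "p1", "residential")
print(total, quantile_classes(list(new_loads.values()), 2))  # 36.0 [1, 1, 0, 0]
```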
Figure 2: Visualization of collaborative explorative scenario
One shortcoming of the CORBA/DCOM and middleware-based approaches is that the client application must have some prior understanding of the metadata of each object implementation's responses. Any change made to the services or to the data source requires a corresponding change in the object implementation. In a volatile Web environment, it is often difficult to keep track of each service and reflect the corresponding variations in the object implementation. This has necessitated the use of autonomous and reflexive objects, so-called "agents," to proxy different services. Agents communicate through different levels of ontology, or well-specified domain vocabularies (Guarino, 1997). Agents can register services through a URI (Uniform Resource Identifier) or a namespace. Since namespaces are uniquely determined, resources that share a domain label but belong to different namespaces cannot be mistaken for one another. These resources are formally described in RDF (Resource Description Framework) (Lassila & Swick, 1999), which provides machine-readable metadata describing Web resources in the form of (resource, property, value) triples.
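The role of namespaces and (resource, property, value) triples can be sketched with plain Python tuples rather than an RDF library. Two resources share the local label "Channel" but live in different hypothetical namespaces (example.org URIs), so a service registered against the hydrological one is never returned for the transport one. The Dublin Core element namespace is real; the service URI and the registration convention are assumptions for illustration, and no RDF/XML serialization is shown.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (resource, property, value)

# Hypothetical domain namespaces; full URIs keep identical local labels apart.
HYDRO = "http://example.org/hydrology#"
TRANS = "http://example.org/transport#"
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set

triples: List[Triple] = [
    (HYDRO + "Channel", DC + "title", "Stream channel"),
    (TRANS + "Channel", DC + "title", "Shipping channel"),
    # A (hypothetical) agent registers a runoff service about hydrological channels.
    ("http://example.org/services/runoff", DC + "subject", HYDRO + "Channel"),
]

def objects(subject: str, prop: str) -> List[str]:
    """Return every value asserted for (subject, property)."""
    return [o for s, p, o in triples if s == subject and p == prop]

def services_about(resource: str) -> List[str]:
    """Find registered services whose subject is the given resource."""
    return [s for s, p, o in triples if p == DC + "subject" and o == resource]

print(objects(TRANS + "Channel", DC + "title"))  # ['Shipping channel']
print(services_about(HYDRO + "Channel"))         # ['http://example.org/services/runoff']
print(services_about(TRANS + "Channel"))         # []
```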
A distributed warehouse needs to express geometric primitives in a language that agents can understand. The OpenGIS Abstract Specification provides simple features in which geometric properties are defined. These features are encoded in a markup language released as GML (Geography Markup Language) (OGC, 2002c). Since GML has its roots in RDF, one can use GML's core schemas to define application or domain schemas following RDF. For example, users can create new feature types or collection types from abstract feature types or collection types, representing real-world categories such as "road" and "bridge." However, RDF lacks inference and logic layers. To describe complex domain models, we need to specify data types and consistent expressions for enumerations. Built on top of RDF, DAML (DARPA Agent Markup Language) offers semantics for higher-order relationships by including modeling constructs from description logic. A DAML-based semantic translation and reasoning engine can be used in the local process to invoke generic procedures. DAML provides a declarative representation of Web services, model objects and user constraints in a Web markup ontology, enabling automated reasoning about declarative APIs (McIlraith et al., 2001). In a collaborative environment, multiple agents (human or software) can specify different modeling parameters and constraints with different degrees of preference, so that the resulting models conform to the semantics of the model developer, expressible in a universally accepted ontology. Sikder & Yoon (2002b) have proposed a multi-agent-mediated approach, OSIRIS (Ontology-based Spatial Information and Resource Integration Scheme), to provide semantic interoperability and model composition.
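A domain feature type such as "road" can be sketched as an XML element that carries a GML geometry. The sketch below uses the standard-library ElementTree module and a GML 2-style encoding (a gml:LineString with a gml:coordinates string of space-separated "x,y" pairs and an unqualified fid attribute); GML 3 encodes geometry differently, for example with gml:posList and gml:id. The app namespace and the app:Road, app:name and app:centerLine elements are hypothetical. In practice they would be declared in an application schema derived from GML's abstract feature type, which is not shown.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"    # GML namespace URI
APP = "http://example.org/watershed"  # hypothetical application-schema namespace
ET.register_namespace("gml", GML)
ET.register_namespace("app", APP)

def road_feature(fid, name, coords):
    """Encode a hypothetical app:Road feature with a GML 2-style LineString geometry."""
    road = ET.Element(f"{{{APP}}}Road", fid=fid)
    ET.SubElement(road, f"{{{APP}}}name").text = name
    geom = ET.SubElement(road, f"{{{APP}}}centerLine")  # geometry property
    line = ET.SubElement(geom, f"{{{GML}}}LineString")
    # GML 2 <coordinates>: "x,y" tuples separated by spaces
    ET.SubElement(line, f"{{{GML}}}coordinates").text = " ".join(
        f"{x},{y}" for x, y in coords)
    return road

xml_text = ET.tostring(road_feature("r1", "Route 40", [(0, 0), (10, 5)]),
                       encoding="unicode")
print(xml_text)
```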
We have addressed some of the issues relevant to distributed data warehouses and collaborative spatial decision-making. The heterogeneity of spatial objects and geo-processing model components was discussed with respect to system integration frameworks. We have noted that there is currently no standard for the re-use specification of existing spatial models at higher levels of abstraction, which would allow effective communication in a distributed spatial data warehouse architecture. There is a strong need for a generic model formalism that links models to domain-specific knowledge. A formal description of spatial data components and process models will allow better interoperation among heterogeneous systems. Users' and decision makers' views of spatial features or geometry need to be realized at a higher level of abstraction without sacrificing object-level interaction. The design of GEO-ELCA integrates GIS application models with a component-based framework and serves complex analysis and simulation models by providing a mechanism for exploring decision scenarios. We have also noted that future distributed systems should comply with the emerging Semantic Web framework in an agent-assisted environment. For collaborative decision-making, the added advantage is that community-based geospatial vocabularies, and the corresponding modeling semantics, can be communicated effectively. This will eventually allow multiple representations of spatial features at different levels of hierarchy, as the decision issues require.
Abel, D.J., & Kilby, P.J. (1994). The systems integration problem. International Journal of Geographical Information Systems, 8 (1), 1–12.
Adam, N., & Gangopadhyay, A. (1997). Database issues in geographic information systems. Kluwer International.
Agnew, J. (1993). Representing space: Space, scale and culture in social science. In J. Duncan & D. Ley (Eds.), Place/Culture/Representation (pp. 251–271). London and New York: Routledge.
Bennett, D.A., Wade, G.A., & Armstrong, M.P. (1999). Exploring the solution space of semi-structured geographical problems using genetic algorithms. Transactions in GIS, 3 (1), 89–109.
Camara, G., Thome, R., Freitas, U., & Miguel, A. (1999). Interoperability in practice: Problem in semantic conversion from current technology to OpenGIS. In A. Vckovski, K. Brassel & H. Schek (Eds.), Interoperating Geographic Information Systems: Second International Conference.
Chou, H. C., & Ding, Y. (1992). Methodology of integrating spatial analysis/modeling and GIS. Proceedings, 5th International Symposium on Spatial Data Handling, (pp. 514–523). Charleston, SC.
Craig, W., & Elwood, S. (1998). How and why community groups use maps and geographic information. Cartography and Geographic Information Systems, 25 (2), 95–104.
Cuthbert, A. (1999). OpenGIS: Tales from a small market town. In A. Vckovski, K. Brassel & H. Schek (Eds.), Interoperating Geographic Information Systems: Second International Conference.
Densham, P. J., Armstrong, M. P., & Kemp, K. (1995, September 17–21). Report from the specialist meeting on collaborative spatial decision making, Initiative 17. University of California, Santa Barbara: National Center for Geographic Information and Analysis.
Dillenburg, P., & Grimshaw, J.A. (1992). A computational approach to socially distributed cognition: Interaction learning situations with computers. European Journal of Psychology of Education, 7 (4), 353–372.
ESRI. (2002). http://www.esri.com.
FGDC. (1999). http://www.fgdc.gov/.
Gardels, K. (1995, April). Open Geodata — CERES and ELIB. Geo Info Systems.
Guarino, N. (1997). Semantic matching: Formal ontological distinctions for information organization, extraction, and integration. In M. Pazienza, (Ed.), Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology (pp. 139–170). Frascati, Italy: International Summer School, SCIE-97.
Inmon, W. (2001). A brief history of metadata: From master files to distributed metadata. Retrieved from http://www.billinmon.com/library/whiteprs/Metadata.pdf.
Lassila, O., & Swick, R. (1999). Resource Description Framework (RDF) Model and Syntax Specification, W3C (World-Wide Web Consortium). Retrieved from http://www.w3.org/TR/REC-rdf-syntax/.
Lotov, A., Bushenkov, V., Chernov, A., & Kistanov, A. (2001). Feasible goals method. Search for smart decisions. Retrieved from http://www.ccas.ru/mmes/mmeda/book5.htm.
Maidment, D.R. (1996). Environmental modeling with GIS. In M. F. Goodchild et al. (Eds.), GIS and Environmental Modeling: Progress and Research Issues (pp. 315–323). Fort Collins, Colorado: GIS World, Inc.
McIlraith, S., Son, T. C., & Zeng, H. (2001, March/April). Semantic web services. IEEE Intelligent Systems, 16 (2), 46–53.
Mosvick, R., & Nelson, R. (1987). We've got to start meeting like this: A guide to successful business meeting management. Glenview, IL: Scott, Foresman.
National Research Council. (1999). Distributed geolibraries: Spatial information resources, workshop summary. Commission on Geosciences, Environment and Resources. Washington, D.C.: National Academy Press.
National Research Council. (1993). Towards a coordinated spatial data infrastructure for the nation. Washington, D.C.: Mapping Science Committee, National Academy Press.
NCGIA. (1995, September). Collaborative Spatial Decision Support. Specialist Meeting of NCGIA Research Initiative 17. Santa Barbara, CA.
Nyerges, T. (1993). Understanding the scope of GIS: Its relationship to environmental modeling. In M. F. Goodchild, B. O. Parks & L. T. Steyaert (Eds.), Environmental Modeling with GIS (pp. 75–93). New York: Oxford University Press.
Open GIS Consortium. (2001). http://www.opengis.org.
Open GIS Consortium. (2002a). OpenGIS Abstract Specification. Retrieved from http://www.opengis.org/public/abstract.html.
Open GIS Consortium. (2002b). OpenGIS Simple Features Specification for CORBA/SQL/OLE/COM. Retrieved from http://www.opengis.org/techno/specs.htm.
Open GIS Consortium. (2002c). OpenGIS Geography Markup Language (GML) Implementation Specification. Retrieved from http://www.opengis.org/techno/specs.htm.
Pickles, J. (1995). Ground truth: The social implications of Geographic Information Systems. New York: Guilford Press.
Rao, V.S., & Jarvenpaa, S.L. (1991). Computer support of groups: Theory-based models for GDSS research. Management Science, 37 (10), 1347–1362.
Saaty, T. L. (1990). Multicriteria Decision Making: The Analytic Hierarchy Process. Pittsburgh: Expert Choice Inc/RWS Publications.
Schueler, T. (1999). Microbes and urban watersheds. Watershed Protection Techniques, 3 (1), 551–596.
Sheppard, E. (1995). GIS and society: Towards a research agenda. Cartography and Geographic Information Systems, 22 (1), 251–317.
Sikder, I., & Gangopadhyay, A. (2002a, October–December). Design and implementation of a web-based collaborative spatial decision support system: Organizational and managerial implications. Information Resources Management Journal, 15 (4).
Sikder, I., & Yoon, V. (2002b). Software agent oriented framework for ontology driven interoperability of geo-spatial models. Proceedings of the Decision Sciences Institute 2002 Annual Meeting. San Diego.
Ungerer, M., & Goodchild, M. (2002). Integrating spatial data analysis and GIS: A new implementation using the COM. International Journal of Geographical Information Science, 16 (1), 44–53.
Wang, F. (2000). A distributed geographic information system on the Common Object Request Broker Architecture (CORBA). GeoInformatica, pp. 89–115.
Worboys, M. (1995). GIS: A computing perspective. Taylor and Francis.
Yuan, M. (1997). Development of a global conceptual schema for interoperable geographic information. Interop'97. Santa Barbara: UCSB.
Zhang, Z., & Griffith, D. (2000). Integrating GIS components and spatial statistical analysis in DBMSs. International Journal of Geographical Information Science, 14, 543–556.