Distributed Data Access and Replication

     

Distributed data forms a bottleneck for most grid systems. We discussed several such grid applications earlier (e.g., the EUROGRID project). The complexity of data access and management on a grid arises from the scale, dynamism, autonomy, and geographical distribution of the data sources. These complexities should be made transparent to grid applications through a layer of grid data virtualization services, which provide location transparency and easier data management.

Data virtualization services, such as federated access to distributed data, dynamic discovery of data sources by their content, dynamic migration of data for workload balancing, and schema management, all help to provide transparencies (e.g., location, naming, distribution, and ownership) to the data. These virtualization services need to address various types of data, including flat files, relational data, objects, and streaming data. To derive a common data management and interface solution for data access and integration, the OGSA initiative has started a working group called DAIS (Data Access and Integration Service). The DAIS group is responsible for data virtualization services and standard interfaces for data access and integration.

We will begin this discussion with OGSA platform requirements on data management, and then we will spend time reviewing the work done by the DAIS group on data access, integration interfaces, and the framework definition.

Some OGSA requirements on data management [9] are:

  • Data access service. Exposes interfaces that help the clients to access the data in a uniform manner from heterogeneous data sources.

  • Data replication. Allows local compute resources to have local data, and thereby improves performance. Services that may be applied to data replication functions include:

    • Group services for clustering and failover

    • Utility computing for dynamic provisioning

    • Policy services for QoS requirements

    • Metering and accounting services

    • Higher-level services such as workload management and disaster recovery services

The OGSA still needs to address these data replication requirements by providing common interfaces to replication services. The implementation has to provide an adapter that can move data in and out of the heterogeneous physical and logical environments, without any changes to the local data access systems.

  • Data caching service. Utilized to improve data access performance.

  • Metadata catalog and services. Allow us to search for a multitude of services, based upon the object metadata attributes.

  • Schema transformation services. Allow for the conversion of schema from one form to another. An example of this transformation is the conversion of XML data using XSL transformation engines.

  • Storage services. Storage activities are treated as resources and can be modeled as CMM services.
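The data access and data caching requirements above can be illustrated with a small sketch. It is not an OGSA or DAIS interface: the class names (DataAccessService, RelationalSource, FlatFileSource, CachingSource) and the fetch method are assumptions made for this example. It only shows how adapters can give clients one uniform call over heterogeneous sources without changing the underlying systems.

```python
import csv
import io
import sqlite3
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Record = Dict[str, Any]


class DataAccessService(ABC):
    """Hypothetical uniform interface; not an OGSA-defined API."""

    @abstractmethod
    def fetch(self, collection: str) -> List[Record]:
        """Return every record of a named collection as a dictionary."""


class RelationalSource(DataAccessService):
    """Adapter over a relational source (an SQLite database here)."""

    def __init__(self, connection: sqlite3.Connection):
        self._conn = connection
        self._conn.row_factory = sqlite3.Row  # rows become name-addressable

    def fetch(self, collection: str) -> List[Record]:
        # Table names cannot be bound as SQL parameters, so validate first.
        if not collection.isidentifier():
            raise ValueError(f"invalid collection name: {collection!r}")
        rows = self._conn.execute(f"SELECT * FROM {collection}").fetchall()
        return [dict(row) for row in rows]


class FlatFileSource(DataAccessService):
    """Adapter over flat files, modeled here as named CSV texts."""

    def __init__(self, files: Dict[str, str]):
        self._files = files

    def fetch(self, collection: str) -> List[Record]:
        reader = csv.DictReader(io.StringIO(self._files[collection]))
        return [dict(row) for row in reader]


class CachingSource(DataAccessService):
    """Naive caching wrapper; it never invalidates, which a real service must."""

    def __init__(self, inner: DataAccessService):
        self._inner = inner
        self._cache: Dict[str, List[Record]] = {}

    def fetch(self, collection: str) -> List[Record]:
        if collection not in self._cache:
            self._cache[collection] = self._inner.fetch(collection)
        return self._cache[collection]


# Usage: the client issues the same call against both kinds of source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER, owner TEXT)")
conn.execute("INSERT INTO jobs VALUES (1, 'alice')")
sources = [RelationalSource(conn), FlatFileSource({"jobs": "id,owner\n2,bob\n"})]
records = [record for source in sources for record in source.fetch("jobs")]
cached = CachingSource(sources[0])
```

Note that the flat-file adapter returns every value as a string, while the relational adapter preserves column types. Reconciling such differences is the kind of work the schema transformation services are meant to cover.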

With the required OGSA platform services in view, let us now explore some of the initial work in this area.

Conceptual Model

The conceptual model captures the data management systems, the data resources they contain, and the data sets resulting from the data requests performed on these data resources.

Based upon the principle of keeping the existing data management systems and their interfaces intact, this model attempts to expose the underlying data model and native language/API of the resource manager.

There are two types of resources: resources that are "external" to the OGSI-compliant grid, and their OGSI resource service logical counterparts.

Figure 10.12 represents both the external resources and their logical counterparts.

Figure 10.12. The external resources and Logical resources of a database management system.


Let us review this conceptual relationship model in more detail:

  • External Data Resource Manager (EDRM) and Data Resource Manager (DRM). The EDRM represents a data management system, such as a relational database management system or a file system. The Data Resource Manager is a Grid service that represents the external data resource manager and binds to an existing EDRM. It provides management operations, including start and stop. These management functions are handled by specific vendors and, hence, may be out of scope for DAIS. Figure 10.13 shows the relation between the DRM and the EDRM.

    Figure 10.13. The conceptual model for the Data Resource Manager Grid service.


  • External Data Resource (EDR) and Data Resource (DR). The external data resource is the data managed by the EDRM. This can be a database in a DBMS or a directory in a file system. The Data Resource is a Grid service that binds to an EDR. It is the contact point to the data, and it exposes metadata about the external data resource. It must provide data management (access and update) and query capabilities. Figure 10.14 shows the relation between the DR, the EDR, and the EDRM.

    Figure 10.14. A logical data resource.


  • External Data Set (EDS) and Data Set (DS). The external data set is logical data, similar to a relational database view or a file cache, that is separate from the EDRM and the EDR; however, it can be retrieved from and stored in the EDR (see Figure 10.15). The Data Set forms a service wrapper for the EDS. These logical views make the interfaces for managing the data challenging to define.

    Figure 10.15. A logical data set.


In addition to the above external and logical models, DAIS proposes logical resources for the following:

  • Data Activity Session. This is a logical data session for all data access operations. It must maintain and manage the context of the data operations. Figure 10.15 illustrates this activity session for a requester.

  • Data Request. This is the logical information regarding a request submitted by a requester to the data activity session. There is currently no grid service to manage this activity. A request can be a query, a data manipulation operation, or another related activity.
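The relationships in this conceptual model can be sketched in a few lines of code. This is only an illustration under stated assumptions, not the DAIS interfaces: SQLite stands in for the external data resource manager, one SQLite database for an external data resource, and the class and method names (bind, metadata, perform, submit) are invented for this example.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class DataSet:
    """Service wrapper for an external data set: the rows a request produced."""
    columns: List[str]
    rows: List[Tuple[Any, ...]]


class DataResource:
    """Binds to an external data resource (here, one SQLite database)."""

    def __init__(self, name: str, connection: sqlite3.Connection):
        self.name = name
        self._conn = connection

    def metadata(self) -> Dict[str, Any]:
        # Expose metadata about the external resource: its table names.
        rows = self._conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return {"resource": self.name, "tables": [row[0] for row in rows]}

    def perform(self, statement: str, params: Tuple[Any, ...] = ()) -> DataSet:
        # The native language (SQL) is passed through unchanged.
        cursor = self._conn.execute(statement, params)
        columns = [column[0] for column in cursor.description or []]
        return DataSet(columns, cursor.fetchall())


class DataResourceManager:
    """Stands in for the service that represents the external manager."""

    def __init__(self) -> None:
        self._resources: Dict[str, DataResource] = {}

    def bind(self, name: str, database: str = ":memory:") -> DataResource:
        resource = DataResource(name, sqlite3.connect(database))
        self._resources[name] = resource
        return resource


@dataclass
class DataActivitySession:
    """Keeps the context of one requester's data operations."""
    requester: str
    resource: DataResource
    history: List[str] = field(default_factory=list)

    def submit(self, statement: str, params: Tuple[Any, ...] = ()) -> DataSet:
        # Each history entry stands in for a Data Request, which is plain
        # information here because no service manages it.
        self.history.append(statement)
        return self.resource.perform(statement, params)


# Usage: manager -> resource -> session -> data set.
drm = DataResourceManager()
dr = drm.bind("experiments")
session = DataActivitySession("alice", dr)
session.submit("CREATE TABLE runs (id INTEGER, status TEXT)")
session.submit("INSERT INTO runs VALUES (?, ?)", (1, "done"))
result = session.submit("SELECT id, status FROM runs")
```

The sketch keeps the existing system intact: the resource only forwards statements in the native language and wraps whatever comes back as a data set.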

Service Implementation

There have been discussions within the GGF on how to model service implementations from the above grid service portTypes. Two models have been proposed, each with its own advantages and disadvantages.

  1. Each portType becomes a Grid service. The complexity falls on the client side, which has to deal with a number of grid service instances (and their GSHs and GSRs) in order to manage a database.

  2. All portTypes are implemented in a single service implementation. This simple case provides an aggregated service view. The complexity lies with the service implementer, who must maintain state information on the client's discrete activities. However, the client has to manage only one GSH and its corresponding GSRs.
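The trade-off between the two models can be made concrete with a toy sketch. Everything in it is hypothetical: the portType names, the handle strings (which merely stand in for GSHs), and the helper functions are assumptions for illustration and do not come from the DAIS specification.

```python
from typing import List

# Hypothetical portType names, used only for illustration.
PORT_TYPES = ["DataDescription", "DataAccess", "DataManagement"]


class GridService:
    """A service instance identified by a handle (standing in for a GSH)."""

    def __init__(self, handle: str, port_types: List[str]):
        self.handle = handle
        self.port_types = port_types


def one_service_per_port_type(base: str) -> List[GridService]:
    # Model 1: each portType becomes its own Grid service instance.
    return [GridService(f"{base}/{name}", [name]) for name in PORT_TYPES]


def single_aggregated_service(base: str) -> List[GridService]:
    # Model 2: one service implementation exposes every portType.
    return [GridService(base, list(PORT_TYPES))]


def handles(services: List[GridService]) -> List[str]:
    # The handles a client must track in order to manage one database.
    return [service.handle for service in services]


def covered(services: List[GridService]) -> List[str]:
    # The portTypes reachable through the given services.
    return sorted(name for service in services for name in service.port_types)


# Usage: the same functionality, but a different bookkeeping burden.
model_1 = one_service_per_port_type("gsh://example.org/db")
model_2 = single_aggregated_service("gsh://example.org/db")
```

In this sketch, model 1 leaves the client with one handle per portType, while model 2 leaves it with a single handle and moves the bookkeeping into the service.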

The DAIS portTypes are designed with the following principles in mind:

  • OGSI compliant

  • Extensible and pluggable with new storage systems and access mechanisms

  • Easy to understand and use by the client

  • Satisfies both the Web service and grid service communities

  • Easier integration with the existing data managers and resources

Based on the above design goals, the DAIS group constructs the following logical portType hierarchy, with a clear separation of interface functionality (Figure 10.16).

Figure 10.16. A logical portType functionality separation.


For more details on this topic, refer to the DAIS specifications.

Summary

The DAIS specification is a work in progress in the GGF. It has not yet matured; hence, we can expect a number of changes to it, especially with the involvement of major database vendors. Some reference implementations exist today. The OGSA-DAI [10] project is one such major project, concerned with the construction of middleware to assist with access to, and integration of, data from separate data sources via the grid.



Grid Computing (IBM Press On Demand Series)
ISBN: 131456601
Year: 2002
Pages: 118
