The Grid Problem

     

Grid Computing has evolved into an important field in the computer industry, differentiating itself from distributed computing through an increased focus on resource sharing, coordination, and high-performance orientation. Grid Computing tries to solve the problems associated with resource sharing among a set of individuals or groups.

These Grid Computing resources include computing power, data storage, hardware instruments, on-demand software, and applications. In this context, the real problems involved with resource sharing are resource discovery, event correlation, authentication, authorization, and access mechanisms. These problems become considerably more complicated when Grid Computing is introduced as a solution for utility computing, where industrial applications and resources become sharable. The best example of this is the IBM Corporation's Business On Demand resource implementations in Grid Computing.

This commercial on-demand utility concept, spanning Grid Computing services, has introduced a number of challenging problems to the already complicated grid problem domain. These problems include service-level management, complex accounting, utilization metering, flexible pricing, federated security, scalability, open-ended integration, and a multitude of networking services that are difficult to sustain. It is key to understand that networking services can no longer be taken for granted, as these services now become the central nervous system enabling all worldwide Grid Computing environments.

The Concept of Virtual Organizations

The concept of a virtual organization is the key to Grid Computing. A virtual organization is a dynamic set of individuals and/or institutions defined around a set of resource-sharing rules and conditions (Foster, Kesselman, & Tuecke). All virtual organizations share some commonality, including common concerns and requirements, but may vary in size, scope, duration, sociology, and structure.

The members of a virtual organization negotiate resource sharing based on the defined rules and conditions, and then share the resources from the resource pool that is thereby automatically constructed. Assigning users, resources, and organizations from different domains across multiple, worldwide geographic territories to a virtual organization is one of the fundamental technical challenges in Grid Computing. This challenge includes defining the resource discovery mechanism, the resource sharing methods, the rules and conditions under which sharing can be achieved, security federation and/or delegation, and access controls among the participants of the virtual organization. It is complex across several dimensions.

Let us explore two examples of virtual organizations in order to better understand their common characteristics. The following describes these two examples in simple-to-understand terms.

  1. Thousands of physicists from different laboratories join together to create, design, and analyze the products of a major detector at CERN, the European high energy physics laboratory. This group forms a "data grid," with intensive computing, storage, and network services resource sharing, in order to analyze petabytes of data created by the detector at CERN. This is one example of a virtual organization.

  2. A company performs financial modeling for a customer, based on data collected from various sources both internal and external to the company. This customer may need financial forecasting and advisory capabilities for its investment portfolio, based on actual historic and current real-time financial market data. The financial institution can respond by forming a dynamic virtual organization within the enterprise to draw on massive computational power (i.e., an application service provider) and on data (i.e., a data access and integration provider). This dynamic, financially oriented virtual organization can reduce undesirable customer wait time while increasing the reliability of forecasts by using real-time data and financial modeling techniques. This is another example of a virtual organization.

Looking closely at these two virtual organizations, we can infer that the number and type of participants, the resources being shared, the duration, the scale, and the interaction pattern between participants all vary from one virtual organization to another. At the same time, we can also infer that there are common characteristics among the competing and sometimes distrustful participants that contributed to the formation of their virtual organization. These may include (Foster, Kesselman, & Tuecke) some of the following:

  1. Common concerns and requirements on resource sharing. A virtual organization is a well-defined collection of individuals and/or institutions that share a common set of concerns and requirements. For example, the members of a virtual organization created to provide financial forecast modeling share the same concerns about security, data usage, computing requirements, resource usage, and interaction pattern.

  2. Conditional, time-bound, and rules-driven resource sharing. Resource sharing is conditional, and each resource owner has full control over whether its resource is made available to the sharable resource pool. These conditions are defined based on mutually understandable policies and access control requirements (authentication and authorization). The number of resources involved in the sharing may vary dynamically over time based on the policies defined.

  3. Dynamic collection of individuals and/or institutions. Over time, a virtual organization should allow individuals and/or groups to join and leave the collection, provided they all share the same concerns and requirements on resource sharing.

  4. Sharing relationship among participants is peer-to-peer in nature. The sharing relation among the participants in a virtual organization is peer-to-peer, which emphasizes that the resource provider can become a consumer to another resource. This introduces a number of security challenges including mutual authentication, federation, and delegation of credentials among participants.

  5. Resource sharing based on an open and well-defined set of interaction and access rules. Open definition and access information must exist for each sharable resource for better interoperability among the participants.
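The characteristics above can be made concrete with a small model. The following is only a minimal sketch, not an implementation from any grid toolkit: the class names `SharedResource` and `VirtualOrganization`, the member names, and the dates are all hypothetical, chosen to mirror the financial modeling example. It shows owner-controlled, time-bound sharing, dynamic membership, and a peer-to-peer relationship in which each member is both a provider and a consumer.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, Set


@dataclass
class SharedResource:
    """A resource offered to the pool under its owner's conditions (hypothetical model)."""
    name: str
    owner: str
    available_from: datetime
    available_until: datetime
    policy: Callable[[str], bool]  # owner-defined rule: member name -> allowed?

    def usable_by(self, member: str, when: datetime) -> bool:
        # Sharing is time-bound and conditional on the owner's policy.
        in_window = self.available_from <= when < self.available_until
        return in_window and self.policy(member)


@dataclass
class VirtualOrganization:
    """A dynamic set of members built around resource-sharing rules (hypothetical model)."""
    name: str
    members: Set[str] = field(default_factory=set)
    pool: Dict[str, SharedResource] = field(default_factory=dict)

    def join(self, member: str) -> None:
        self.members.add(member)

    def leave(self, member: str) -> None:
        self.members.discard(member)
        # A departing member withdraws the resources it owns from the pool.
        self.pool = {n: r for n, r in self.pool.items() if r.owner != member}

    def share(self, resource: SharedResource) -> None:
        if resource.owner not in self.members:
            raise PermissionError("only members may contribute resources")
        self.pool[resource.name] = resource

    def can_use(self, member: str, resource_name: str, when: datetime) -> bool:
        resource = self.pool.get(resource_name)
        return (member in self.members
                and resource is not None
                and resource.usable_by(member, when))


# Illustrative data only: two peers, each providing one resource to the other.
start, end = datetime(2024, 1, 1), datetime(2024, 2, 1)
vo = VirtualOrganization("financial-forecasting")
vo.join("bank")
vo.join("data-provider")
vo.share(SharedResource("market-feed", "data-provider", start, end,
                        policy=lambda member: member == "bank"))
vo.share(SharedResource("compute-cluster", "bank", start, end,
                        policy=lambda member: True))
```

In this sketch the pool is never configured centrally; it is simply whatever the current members have chosen to share, which is one way to read the "automatically constructed resource pool" described earlier.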

The above characteristics and nonfunctional requirements of a virtual organization lead to the definition of an architecture for establishment, management, and resource sharing among participants. As we will see in the next section, the focus of the grid architecture is to define an interoperable and extensible solution for resource sharing within the virtual organization.

Grid Architecture

A new architecture model and technology was developed for the establishment, management, and cross-organizational resource sharing within a virtual organization. This new architecture, called grid architecture, identifies the basic components of a grid system, defines the purpose and functions of such components and indicates how each of these components interacts with one another (Foster, Kesselman, & Tuecke). The main attention of the architecture is on the interoperability among resource providers and users to establish the sharing relationships. This interoperability means common protocols at each layer of the architecture model, which leads to the definition of a grid protocol architecture as shown in Figure 3.1. This protocol architecture defines common mechanisms, interfaces, schema, and protocols at each layer, by which users and resources can negotiate, establish, manage, and share resources.

Figure 3.1. This illustrates a layered grid architecture and its relationship to the Internet protocol architecture (Foster, Kesselman, & Tuecke).


Figure 3.1 illustrates the component layers of the architecture with specific capabilities at each layer. Each layer shares the behavior of the component layers described in the next discussion. As we can see in this illustration, each of these component layers is compared with its corresponding Internet protocol layers, for purposes of providing more clarity about its capabilities.

Now let us explore each of these layers in more detail.

Fabric Layer: Interface to Local Resources

The Fabric layer defines the resources that can be shared. This could include computational resources, data storage, networks, catalogs, and other system resources. These resources can be physical resources or logical resources by nature.

Typical examples of the logical resources found in a Grid Computing environment are distributed file systems, computer clusters, distributed computer pools, software applications, and advanced forms of networking services. Each of these logical resources is implemented by its own internal protocol (e.g., network file systems [NFS] for distributed file systems, and clusters using logical file systems [LFS]). These logical resources are in turn built on their own network of physical resources.

Although a resource does not have to meet any specific requirements to become part of a grid system, it is recommended that every integrated resource offer two basic capabilities. These capabilities should be considered "best practices" in Grid Computing disciplines. They are as follows:

  1. Provide an "inquiry" mechanism that allows the discovery of the resource's own capabilities, structure, and state of operations. These are value-added features for resource discovery and monitoring.

  2. Provide appropriate "resource management" capabilities to control the QoS the grid solution promises, or has been contracted to deliver. This enables the service provider to control a resource for optimal manageability, such as (but not limited to) start and stop activations, problem resolution, configuration management, load balancing, workflow, complex event correlation, and scheduling.
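The two best practices above can be sketched as a small interface. This is an illustrative assumption rather than the API of any real grid toolkit: `FabricResource`, `ComputeNode`, and the method names `inquire`, `start`, `stop`, and `reserve` are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class FabricResource(ABC):
    """Hypothetical contract for a resource that integrates with a grid."""

    @abstractmethod
    def inquire(self) -> Dict[str, Any]:
        """Inquiry mechanism: report capabilities, structure, and state."""

    @abstractmethod
    def start(self) -> None:
        """Resource management: activate the resource."""

    @abstractmethod
    def stop(self) -> None:
        """Resource management: deactivate the resource."""


class ComputeNode(FabricResource):
    """A toy compute resource with a fixed number of CPUs."""

    def __init__(self, name: str, cpus: int) -> None:
        self.name = name
        self.cpus = cpus
        self.running = False
        self.reserved_cpus = 0

    def inquire(self) -> Dict[str, Any]:
        return {
            "name": self.name,
            "structure": {"cpus": self.cpus},
            "state": "running" if self.running else "stopped",
            "free_cpus": self.cpus - self.reserved_cpus if self.running else 0,
        }

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False
        self.reserved_cpus = 0  # stopping releases all reservations

    def reserve(self, cpus: int) -> bool:
        """Grant a reservation only if enough capacity remains (a simple QoS control)."""
        if not self.running or cpus > self.cpus - self.reserved_cpus:
            return False
        self.reserved_cpus += cpus
        return True


node = ComputeNode("node-1", cpus=8)
node.start()
node.reserve(6)
```

Higher layers would call `inquire` for discovery and monitoring, and the management methods to keep the promised QoS; a production resource manager would of course cover far more (configuration, load balancing, scheduling, and so on).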

Connectivity Layer: Manages Communications

The Connectivity layer defines the core communication and authentication protocols required for grid-specific networking services transactions. Communications protocols, which include aspects of networking transport, routing, and naming, assist in the exchange of data between fabric layers of respective resources. The authentication protocol builds on top of the networking communication services in order to provide secure authentication and data exchange between users and respective resources.

The communication protocol can work with any of the networking layer protocols that provide the transport, routing, and naming capabilities in networking services solutions. The most commonly used Network layer protocol is the TCP/IP Internet protocol stack; however, this concept and discussion is not limited to that protocol. The authentication solution for virtual organization environments requires significantly more complex characteristics. The following describes the characteristics for consideration:

Single sign-on

This allows an entity in the grid fabric to be authenticated once; the user can then access any available resource in the grid Fabric layer without further authentication.

Delegation

This provides the ability to access a resource under the current user's permission set; the resource should be able to relay the same user credentials (or a subset of those credentials) to other resources along the chain of access.

Integration with local resource specific security solutions

Each resource and hosting environment has specific security requirements and security solutions that match the local environment. These may include (for example) Kerberos security methods, Windows security methods, Linux security methods, and UNIX security methods. Therefore, in order to provide proper security in the grid fabric model, all grid solutions must integrate with the local environment and with the security mechanisms of the respective resources.

User-based trust relationships

In Grid Computing, establishing a trust relationship between a user and multiple service providers is critical. With such a relationship in place, the providers do not need to interact with one another in order for the user to access the resources that each of them provides.

Data security

Data security is important in order to provide data integrity and confidentiality. The data passing through a Grid Computing solution, no matter what complications may exist, should be secured using cryptographic and data encryption mechanisms. These mechanisms are well known and widely used across industries.
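To illustrate single sign-on and delegation from the list above, the sketch below issues a signed credential once and lets a resource relay a subset of it further down the chain of access. It is deliberately simplified and rests on assumptions that are not in the original text: a single trusted issuer with a shared secret (`SECRET`), and hypothetical function names `issue_credential`, `verify`, and `delegate`. Real grid deployments typically rely on public-key credentials rather than a shared secret.

```python
import hashlib
import hmac
import json
import time
from typing import Iterable, Optional

SECRET = b"demo-only-secret"  # assumption: one trusted issuer holds this key


def _sign(body: dict) -> str:
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def _make(user: str, rights: Iterable[str], expires: float) -> dict:
    body = {"user": user, "rights": sorted(rights), "expires": expires}
    return {"body": body, "sig": _sign(body)}


def issue_credential(user: str, rights: Iterable[str], ttl_seconds: float,
                     now: Optional[float] = None) -> dict:
    """Single sign-on: authenticate once, then present this credential everywhere."""
    current = time.time() if now is None else now
    return _make(user, rights, current + ttl_seconds)


def _is_valid(cred: dict, now: Optional[float]) -> bool:
    current = time.time() if now is None else now
    untampered = hmac.compare_digest(cred["sig"], _sign(cred["body"]))
    return untampered and current < cred["body"]["expires"]


def verify(cred: dict, required_right: str, now: Optional[float] = None) -> bool:
    """A resource checks the credential without asking the user to log in again."""
    return _is_valid(cred, now) and required_right in cred["body"]["rights"]


def delegate(cred: dict, subset: Iterable[str], now: Optional[float] = None) -> dict:
    """Relay a subset of the user's rights; never more rights, never a later expiry."""
    wanted = set(subset)
    if not _is_valid(cred, now) or not wanted <= set(cred["body"]["rights"]):
        raise PermissionError("cannot delegate rights the credential does not carry")
    return _make(cred["body"]["user"], wanted, cred["body"]["expires"])


cred = issue_credential("alice", {"read", "submit"}, ttl_seconds=60, now=1000.0)
delegated = delegate(cred, {"read"}, now=1010.0)
```

The `now` parameter exists only to make the example deterministic; callers would normally omit it.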

Resource Layer: Sharing of a Single Resource

The Resource layer utilizes the communication and security protocols defined by the Connectivity layer to control the secure negotiation, initiation, monitoring, metering, accounting, and payment involved in sharing operations on individual resources.

The Resource layer calls Fabric layer functions in order to access and control local resources. This layer handles only individual resources and, hence, ignores global state and atomic actions across a collection of resources, which in the operational context are the responsibility of the Collective layer.

There are two primary classes of resource layer protocols. These protocols are key to the operations and integrity of any single resource. These protocols are as follows:

Information Protocols

These protocols are used to get information about the structure and the operational state of a single resource, including configuration, usage policies, service-level agreements, and the state of the resource. In most situations, this information is used to monitor the resource capabilities and availability constraints.

Management Protocols

The important functionalities provided by the management protocols are:

  • Negotiating access to a shared resource. These negotiations can include requirements on quality of service, advanced reservation, scheduling, and other key operational factors.

  • Performing operation(s) on the resource, such as process creation or data access.

  • Acting as the service/resource policy enforcement point for policy validation between a user and a resource.

  • Providing accounting and payment management functions for resource sharing.

  • Monitoring the status of an operation, controlling the operation (including terminating it), and providing asynchronous notifications of operation status.

It is recommended that these resource-level protocols be minimal in functional overhead and that each focus on the functionality it provides as a utility.
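The two classes of Resource layer protocols can be pictured as two groups of methods on a single resource. The sketch below is a hypothetical model, not a real protocol binding: the class `ManagedResource` and its method names are assumptions, and notifications are delivered through plain synchronous callbacks where a real system would deliver them asynchronously.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class ManagedResource:
    """One resource exposing information and management operations (hypothetical)."""
    name: str
    allowed_users: Set[str]
    price_per_unit: float
    operations: Dict[int, str] = field(default_factory=dict)   # operation id -> status
    usage: Dict[str, float] = field(default_factory=dict)      # user -> units consumed
    listeners: List[Callable[[int, str], None]] = field(default_factory=list)
    next_id: int = 1

    # --- Information protocol -------------------------------------------
    def info(self) -> dict:
        running = sum(1 for s in self.operations.values() if s == "running")
        return {"name": self.name,
                "running_operations": running,
                "price_per_unit": self.price_per_unit}

    # --- Management protocol --------------------------------------------
    def start_operation(self, user: str, units: float) -> int:
        # Policy enforcement point: validate the user before doing any work.
        if user not in self.allowed_users:
            raise PermissionError(f"{user} may not use {self.name}")
        op_id = self.next_id
        self.next_id += 1
        self.usage[user] = self.usage.get(user, 0.0) + units   # metering
        self._set_status(op_id, "running")
        return op_id

    def status(self, op_id: int) -> str:
        return self.operations[op_id]

    def terminate(self, op_id: int) -> None:
        self._set_status(op_id, "terminated")

    def bill(self, user: str) -> float:
        return self.usage.get(user, 0.0) * self.price_per_unit  # accounting

    def _set_status(self, op_id: int, status: str) -> None:
        self.operations[op_id] = status
        for notify in self.listeners:   # notification of status changes
            notify(op_id, status)


events = []
resource = ManagedResource("cluster-a", allowed_users={"alice"}, price_per_unit=0.5)
resource.listeners.append(lambda op_id, status: events.append((op_id, status)))
op = resource.start_operation("alice", units=3)
resource.terminate(op)
```

Note that the resource knows nothing about other resources; anything that spans several resources belongs to the Collective layer.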

The Collective Layer: Coordinating Multiple Resources

While the Resource layer manages an individual resource, the Collective layer is responsible for all global resource management and interaction with a collection of resources. This layer of protocol implements a wide variety of sharing behaviors (protocols) utilizing a small number of Resource layer and Connectivity layer protocols.

Some key examples of the common, more visible collective services in a Grid Computing system are as follows:

Discovery Services

These services enable virtual organization participants to discover the existence and/or properties of the virtual organization's available resources.

Coallocation, Scheduling, and Brokering Services

These services allow virtual organization participants to request the allocation of one or more resources for a specific task, during a specific period of time, and to schedule those tasks on the appropriate resources.
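Discovery and coallocation can be sketched together with a small in-memory broker. Everything here is an assumption made for illustration: the class `CollectiveBroker`, its methods, the property names, and the use of integer time slots. The point of the sketch is that coallocation is all-or-nothing across several resources, which no single resource could guarantee on its own.

```python
from typing import Dict, List, Tuple


class CollectiveBroker:
    """Toy discovery and coallocation service over named resources (hypothetical)."""

    def __init__(self) -> None:
        self._properties: Dict[str, Dict[str, object]] = {}
        self._bookings: Dict[str, List[Tuple[int, int]]] = {}

    def register(self, name: str, **properties: object) -> None:
        self._properties[name] = properties
        self._bookings[name] = []

    def discover(self, **required: object) -> List[str]:
        """Return the names of resources whose properties match every requirement."""
        return sorted(
            name for name, props in self._properties.items()
            if all(props.get(key) == value for key, value in required.items())
        )

    def _is_free(self, name: str, start: int, end: int) -> bool:
        # Half-open intervals [start, end): touching intervals do not overlap.
        return all(end <= s or start >= e for s, e in self._bookings[name])

    def coallocate(self, names: List[str], start: int, end: int) -> bool:
        """Reserve all named resources for [start, end), or none of them."""
        if not all(self._is_free(name, start, end) for name in names):
            return False
        for name in names:
            self._bookings[name].append((start, end))
        return True


broker = CollectiveBroker()
broker.register("cluster-a", kind="compute", site="geneva")
broker.register("cluster-b", kind="compute", site="chicago")
broker.register("archive-1", kind="storage", site="geneva")

geneva = broker.discover(site="geneva")
first = broker.coallocate(["cluster-a", "archive-1"], start=10, end=20)
second = broker.coallocate(["cluster-a", "cluster-b"], start=15, end=25)
```

The second request fails because `cluster-a` is already booked for part of the period, and as a result `cluster-b` is left untouched as well.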

Monitoring and Diagnostic Services

These services give the virtual organization resource failure recovery capabilities, monitoring of the networking and device services, and diagnostic services that include common event logging and intrusion detection. Another important aspect relates to the partial failure of any portion of a Grid Computing environment: it is critical that all business impacts of a partial failure be known immediately, from the moment the failure begins to occur all the way through its corrective healing stages.

Data Replication Services

These services support the management aspects of the virtual organization's storage resources in order to maximize data access performance with respect to response time, reliability, and costs.
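One task such a service faces is choosing which copy of a data set to read. The sketch below scores each replica on the three factors named above: response time, reliability, and cost. The `Replica` fields, the function `choose_replica`, the weights, and the sample numbers are all hypothetical; a real replication service would measure these values rather than hard-code them.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Replica:
    site: str
    latency_ms: float     # expected response time
    reliability: float    # probability of a successful read, between 0 and 1
    cost_per_gb: float    # price charged by the storage provider


def choose_replica(replicas: Iterable[Replica],
                   w_latency: float = 1.0,
                   w_failure: float = 100.0,
                   w_cost: float = 10.0) -> Replica:
    """Pick the replica with the lowest weighted penalty (weights are assumptions)."""
    def penalty(r: Replica) -> float:
        return (w_latency * r.latency_ms
                + w_failure * (1.0 - r.reliability)
                + w_cost * r.cost_per_gb)
    return min(replicas, key=penalty)


replicas = [
    Replica("geneva", latency_ms=20.0, reliability=0.99, cost_per_gb=0.5),
    Replica("chicago", latency_ms=5.0, reliability=0.90, cost_per_gb=2.0),
    Replica("tokyo", latency_ms=200.0, reliability=0.999, cost_per_gb=0.1),
]
balanced = choose_replica(replicas)
ignore_cost = choose_replica(replicas, w_cost=0.0)
```

Changing the weights changes the answer, which reflects the trade-off the text describes: the fastest replica is not necessarily the most reliable or the cheapest.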

Grid-Enabled Programming Systems

These systems allow familiar programming models to be utilized in the Grid Computing environments, while sustaining various Grid Computing networking services. These networking services are integral to the environment in order to address resource discovery, resource allocation, problem resolution, event correlation, network provisioning, and other very critical operational concerns related to the grid networks.

Workload Management Systems and Collaborative Frameworks

These provide multistep, asynchronous, multicomponent workflow management. This is a complex topic across several dimensions, yet a fundamental area of concern for enabling optimal performance and functional integrity.
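At its core, a multistep workflow is a set of steps with dependencies between them. As a minimal sketch, the example below orders such steps with `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The step names and the helper `plan` are hypothetical, and the sketch covers only ordering; it does not model asynchronous execution or failure handling.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def plan(workflow: Dict[str, Set[str]]) -> List[str]:
    """Return the steps in an order that respects every dependency.

    `workflow` maps each step to the set of steps that must finish before it.
    """
    return list(TopologicalSorter(workflow).static_order())


# Hypothetical forecasting workflow: two analyses depend on the same staged data.
workflow = {
    "stage-data": set(),
    "run-model": {"stage-data"},
    "run-risk-checks": {"stage-data"},
    "publish-report": {"run-model", "run-risk-checks"},
}
order = plan(workflow)
```

The two middle steps have no dependency on each other, so a workload manager would be free to dispatch them to different resources at the same time. If the dependencies contain a cycle, `static_order` raises `graphlib.CycleError`.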

Software Discovery Services

This provides the mechanisms to discover and select the best software implementation(s) available in the grid environment, and those available to the platform based on the problem being solved.

Community Authorization Servers

These servers control resource access by enforcing community utilization policies and providing these respective access capabilities by acting as policy enforcement agents.
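A community authorization server can be thought of as a lookup from community roles to permitted actions. The sketch below is a hypothetical role-based model: the class `CommunityAuthorizationServer`, its methods, and the role and resource names are assumptions for illustration and do not reproduce the interface of any existing authorization service.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple


class CommunityAuthorizationServer:
    """Toy policy enforcement agent for one community (hypothetical)."""

    def __init__(self) -> None:
        self._roles: DefaultDict[str, Set[str]] = defaultdict(set)
        self._policy: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)

    def grant_role(self, member: str, role: str) -> None:
        self._roles[member].add(role)

    def allow(self, role: str, resource: str, action: str) -> None:
        """Record a community policy: holders of `role` may perform `action`."""
        self._policy[(role, resource)].add(action)

    def authorize(self, member: str, resource: str, action: str) -> bool:
        """Permit the request if any of the member's roles allows it."""
        return any(
            action in self._policy.get((role, resource), ())
            for role in self._roles.get(member, ())
        )


cas = CommunityAuthorizationServer()
cas.grant_role("alice", "analyst")
cas.grant_role("bob", "operator")
cas.allow("analyst", "detector-data", "read")
cas.allow("operator", "detector-data", "read")
cas.allow("operator", "detector-data", "replicate")
```

Keeping the policy at the community level means that resource owners need to trust the server's decisions rather than maintain an entry for every individual user.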

Community Accounting and Payment Services

These services provide resource utilization metrics, while at the same time generating payment requirements for members of any community.
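As a minimal sketch of this idea, the function below turns raw utilization records into an amount owed per member. The record layout, the rate table, and the name `invoice` are hypothetical, and a flat per-unit rate is assumed; the flexible pricing mentioned earlier in this chapter would need a richer model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def invoice(records: Iterable[Tuple[str, str, float]],
            rates: Dict[str, float]) -> Dict[str, float]:
    """Aggregate (member, resource, units) records into an amount owed per member."""
    totals: Dict[str, float] = defaultdict(float)
    for member, resource, units in records:
        totals[member] += units * rates[resource]
    return dict(totals)


# Illustrative data only.
records = [
    ("alice", "cpu-hours", 10),
    ("bob", "storage-gb", 4),
    ("alice", "storage-gb", 2),
]
rates = {"cpu-hours": 0.5, "storage-gb": 0.25}
owed = invoice(records, rates)
```

The utilization records would come from the metering performed at the Resource layer; the collective service only aggregates them across resources.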

As we can observe from the previous discussion, the capabilities and efficiencies of these Collective layer services depend on the underlying layers of the protocol stack. These collective services can range from general-purpose Grid Computing solutions to narrow-domain and application-specific solutions. As an example, one such service is accounting and payment, which is most often very specific to the domain or application. Other notable and very specialized Collective layer services include schedulers, resource brokers, and workload managers (to name a few).

Application Layer: User-Defined Grid Applications

These are user applications, which are constructed by utilizing the services defined at each lower layer. Such an application can access a resource directly, or can access it through the Collective Service interface APIs (application programming interfaces).

Each layer in the grid architecture provides a set of APIs and SDKs (software development kits) for the higher layers of integration. It is up to the application developers whether they use the collective services for general-purpose discovery and other high-level services across a set of resources, or work directly with the exposed resources. These user-defined grid applications are (in most cases) domain specific and provide specific solutions.
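The layering rule described in this section, that each layer offers its services to the layers above it, can be written down in a few lines. The list order follows the layer descriptions in this chapter; the names `LAYERS` and `may_use` are hypothetical helpers introduced only for this sketch.

```python
from typing import List

# Layers of the grid protocol architecture, from bottom to top.
LAYERS: List[str] = ["Fabric", "Connectivity", "Resource", "Collective", "Application"]


def may_use(caller: str, callee: str) -> bool:
    """A layer may build on any layer below it, never on one above it."""
    return LAYERS.index(callee) < LAYERS.index(caller)


# An application may go through the Collective layer or address a resource directly.
via_collective = may_use("Application", "Collective")
direct_access = may_use("Application", "Resource")
upward_call = may_use("Fabric", "Resource")
```

Both `via_collective` and `direct_access` are allowed, which matches the choice left to application developers, while a lower layer depending on a higher one is not.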

Grid Architecture and Relationship to Other Distributed Technologies

Numerous well-defined and well-established technologies and standards have been developed for distributed computing. This foundation was a huge success (to some extent) until we entered the domain of heterogeneous resource sharing and the formation of virtual organizations.

Based on our previous discussions, grid architectures are defined around the coordinated, highly automated, and dynamic sharing of resources for a virtual organization. It is appropriate at this stage to turn our attention to how this architectural approach differs from the prior art of distributed technologies, how the two approaches complement each other, and how we can leverage the best practices from both.

Our discussion will now begin to explore notions of the widely implemented distributed systems, including World Wide Web environments, application and storage service providers, distributed computing systems, peer-to-peer computing systems, and clustering types of systems.

World Wide Web

A number of open and ubiquitous technologies are defined for the World Wide Web (TCP, HTTP, SOAP, XML), which in turn make the Web a suitable candidate for the construction of virtual organizations. However, as of now, the Web is defined as a browser-server messaging exchange model, and lacks the more complex interaction models required for a realistic virtual organization.

As an example, some of these areas of concern include single-sign-on, delegation of authority, complex authentication mechanisms, and event correlation mechanisms. Once this browser-to-server interaction matures, the Web will be suitable for the construction of grid portals to support multiple virtual organizations. This will be possible because the basic platforms, fabric layers, and networking connectivity layers of technologies will remain the same.

Distributed Computing Systems

The major distributed technologies, including CORBA, J2EE, and DCOM, are well suited for distributed computing applications; however, they do not provide a suitable platform for sharing resources among the members of a virtual organization. Notable drawbacks include the lack of support for resource discovery across virtual participants, collaborative and declarative security, dynamic construction of a virtual organization, and the scale involved in potential resource-sharing environments.

Another major drawback in distributed computing systems involves the lack of interoperability among these technology protocols. However, even with these perceived drawbacks, some of these distributed technologies have attracted considerable Grid Computing research attention toward the construction of grid systems, the most notable of which is Java JINI. [1] This system, JINI, is focused on a platform-independent infrastructure to deliver services and mobile code in order to enable easier interaction with clients through service discovery, negotiation, and leasing.

Application and Storage Service Providers

Application and storage service providers normally outsource their business and scientific applications and services, as well as very high-speed storage solutions, to customers outside their organizations. Customers negotiate with these highly effective service providers on QoS requirements (i.e., hardware, software, and network combinations) and pricing (i.e., utility-based, fixed, or other pricing options).

Normally, these types of service arrangements are executed over some type of virtual private network (VPN) or dedicated line, which narrows the domain of security and event interactions. Such arrangements are often limited in scope, and the VPN or private line is very static in nature. This, in turn, reduces the visibility of the service provider to a lower and fixed scale, without complex resource sharing among heterogeneous systems or interdomain networking service interactions.

This being said, the introduction of the Grid Computing principles related to resource sharing across virtual organizations, along with the construction of virtual organizations yielding interdomain participation, will alter this situation. Specifically, this will enhance this utility model of application service providers and storage service providers (ASP/SSP) to a more flexible and mature value proposition.

Peer-to-Peer Computing Systems

Similar to Grid Computing, peer-to-peer (P2P) computing is a relatively new computing discipline in the realm of distributed computing. Both P2P and Grid Computing are focused on resource sharing, and are now widely utilized throughout the world by home, commercial, and scientific markets. Some of the major P2P systems are SETI@home [2] and file sharing system environments (e.g., Napster, Kazaa, Morpheus, and Gnutella).

The major differences between Grid Computing and P2P computing center on the following notable points:

  1. They differ in their target communities. Grid communities can be small with regard to number of users, yet will yield a greater applications focus with a higher level of security requirements and application integrity. On the other hand, the P2P systems define collaboration among a larger number of individuals and/or organizations, with a limited set of security requirements and a less complex resource-sharing topology.

  2. Grid systems deal with a more complex, more powerful, more diverse, and more highly interconnected set of resources than P2P environments do.

The convergence of these areas toward Grid Computing is highly probable, since each of the disciplines is dealing with the same problem of resource sharing among the participants in a virtual organization. There has been some work, to date, in the Global Grid Forum (GGF) focused on the merger of these complementary technologies in the interest of integrating the larger audience.

Cluster Computing

Clusters are local to a domain and are constructed to address inadequate computing power. Clustering pools computational resources to provide more computing power through parallel execution of the workload. Clusters are limited in scope, with dedicated functionality, and are not suitable for resource sharing among participants from different domains. The nodes in a cluster are centrally controlled, and the cluster manager is aware of the state of each node. This forms only a subset of the grid principle of more widely available intradomain and interdomain communication and resource sharing.



Grid Computing (IBM Press On Demand Series)