Organizations Building and Using Grid-Based Solutions to Solve Computing, Data, and Network Requirements


These organizations and individuals are the real users of Grid Computing, and they benefit from resource sharing and virtualization. To date, these projects have mostly been in scientific areas. We will discuss some of the major grid projects and infrastructures around the world. In general, these grid users need:

  • On-demand construction of virtual computing systems with the capabilities to solve the problems at hand, including scarcity of computing power, data storage, and real-time processing

  • Collaborative visualization of the results of the above process

  • Dynamic construction of virtual organizations to solve specific problems at hand

United States Department of Energy: Science Grid (DOE)

The DOE Science Grid [13] aims to provide an advanced distributed computing infrastructure based on Grid Computing middleware and tools to enable a high degree of scalability in scientific computing. The vision is to revolutionize the use of computing in science by making the construction and use of large-scale systems of diverse resources as easy as using today's desktop environments.

The following describes characteristics of DOE projects:

  • Most DOE projects are widely distributed among collaborators and non-collaborators, and they require a cyberinfrastructure that supports the process of distributed science with sharable resources, including expensive and complex scientific instruments.

  • All of the science areas need high-speed networks and advanced middleware to discover, manage, and access computing and storage systems.

The DOE Science Grid is an integrated and advanced infrastructure that delivers:

  • Computing capacity adequate for the tasks at hand

  • Data capacity sufficient for scientific tasks with location independence and manageability

  • Communication power sufficient for the above tasks

  • Software services with rich environments that let scientists focus on the science simulation and analysis aspects rather than on management of computing, data, and communication resources

The construction of grids across five major DOE facilities provides the computing and data resources. To date, major accomplishments include the following:

  • Integration of DOE's Office of Science supercomputing center, with its large-scale storage systems, into the grid

  • Design and deployment of a grid security infrastructure for collaboration with U.S. and European High Energy Physics projects, helping to create a single-sign-on solution within the grid environment

The following work supports DOE's Particle Physics Data Grid, Earth Systems Grid, and Fusion Grid projects:

  • A resource monitoring and debugging infrastructure for managing these widely distributed resources

  • Support for several DOE applications, including computational chemistry, ground water transport, climate modeling, bioinformatics, and so on

European Union: EUROGRID Project

The EUROGRID [14] project is a shared-cost Research and Technology Development (RTD) project funded by the European Commission, with the participation of 11 partners from 6 European Union countries, to create an international network of high performance computing centers. The project will demonstrate the use of GRIDs in selected scientific and industrial communities, address the specific requirements of those communities, and highlight the benefits of using GRIDs.

The major objectives of the EUROGRID project are:

  • To establish a European GRID network of leading high performance computing centers from different European countries

  • To operate and support the EUROGRID software infrastructure

  • To develop important GRID software components and to integrate them into EUROGRID (fast file transfer, resource broker, interface for coupled applications, and interactive access)

  • To demonstrate distributed simulation codes from different application areas (biomolecular simulations, weather prediction, coupled CAE simulations, structural analysis, real-time data processing, etc.)

  • To contribute to the international GRID development and work with the leading international GRID projects

The application-specific work packages identified for the EUROGRID project are described in the following areas:

Bio Grid. The BioGRID project develops interfaces that enable chemists and biologists to submit work to high performance computing facilities via a uniform interface from their workstations, without having to worry about the details of how to run particular packages on different architectures.

Metro Grid. The main goal of the Metro Grid project is the development of an application service provider (ASP) solution, which allows anyone to run a high resolution numerical weather prediction model on demand.

Computer-Aided Engineering (CAE) Grid. This work package focuses on industrial CAE applications, including the automobile and aerospace industries. It aims at providing services to high performance computing (HPC) customers who require huge computing power to solve their engineering problems.

The major partners in this work package are Debis SystemHaus and EADS Corporate Research Center. They are working to exploit the CAE features like code coupling (to improve system design by reducing the prototyping and testing costs) and ASP-type services (designing application-specific user interfaces for job submission).

High Performance Computing (HPC) Research Grid. This HPC research grid is used as a test-bed for the development of distributed applications, and as an arena for cooperative work on major scientific challenges, using computational resources distributed on a European scale. The major partners in this work package are the HPC centers.

The EUROGRID software is based on the UNICORE system developed and used by the leading German HPC centers.

European Union: Data Grid Project

DataGrid [15] is a project funded by the European Union that aims to enable access to geographically distributed computing power and storage facilities belonging to different institutions. This will provide the necessary resources to process huge amounts of data coming from scientific experiments in different disciplines.

The three data-intensive computing application areas covered by the project are:

  • High Energy Physics

  • Biology and Medical Image Processing

  • Earth Observations

High Energy Physics (led by CERN, Switzerland)

One of the main challenges for High Energy Physics is to answer longstanding questions about the fundamental particles of matter and the forces acting between them. In particular, the goal is to explain why some particles are much heavier than others, and why particles have mass at all. To that end, CERN is building the Large Hadron Collider (LHC), one of the most powerful particle accelerators.

Research at the LHC will generate huge amounts of data. The DataGrid Project is providing the solution for storing and processing this data. A multitiered, hierarchical computing model will be adopted to share data and computing power among multiple institutions. The Tier-0 center is located at CERN and is linked by high-speed networks to approximately 10 major Tier-1 data-processing centers, which in turn fan the data out to a large number of smaller Tier-2 centers.
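
The fan-out pattern can be pictured with a small sketch. The tier structure below follows the text, but the specific center names and the recursive distribution routine are invented here for illustration only.

    # A toy sketch of the multitiered model: Tier-0 (CERN) fans data out to
    # Tier-1 centers, each of which fans out to its own Tier-2 sites.
    # The center names are hypothetical.
    tiers = {
        "Tier-0 (CERN)": ["Tier-1a", "Tier-1b"],   # ~10 Tier-1 centers in practice
        "Tier-1a": ["Tier-2a1", "Tier-2a2"],
        "Tier-1b": ["Tier-2b1"],
    }

    def fan_out(center, dataset, depth=0):
        """Recursively distribute a dataset down the hierarchy."""
        print("  " * depth + f"{center} receives {dataset}")
        for child in tiers.get(center, []):
            fan_out(child, dataset, depth + 1)

    fan_out("Tier-0 (CERN)", "raw-event-data")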

Biology and Medical Image Processing (led by CNRS, France)

The storage and exploitation of genomes, and the huge flux of data coming from post-genomics, put growing pressure on the computing and storage resources of existing physical laboratories. Medical images are currently distributed over medical image production sites (radiology departments, hospitals).

Although no standard yet exists for sharing such data between sites, there is an increasing need for remote medical data access and processing.

The DataGrid project's biology test-bed is providing the platform for the development of new algorithms on data mining, databases, code management, and graphical interface tools. It is facilitating the sharing of genomic and medical imaging databases for the benefit of international cooperation and health care.

Earth Observations (led by ESA/ESRIN, Italy)

The European Space Agency missions download 100 gigabytes of raw images per day from space. Dedicated ground infrastructures have been set up to handle the data produced by the instruments onboard the satellites. The analysis of atmospheric ozone data has been selected as a specific test-bed for the DataGrid. Moreover, the project will demonstrate an improved way to access and process large volumes of data stored in distributed European-wide archives.

TeraGrid

The TeraGrid [16] project was launched by the NSF as a multiyear effort to build and deploy the world's largest, fastest distributed infrastructure for open scientific research. The TeraGrid includes 20 teraflops [17] of computing power distributed across five sites, facilities capable of managing and storing nearly 1 petabyte of data, high-resolution visualization environments, and toolkits for Grid Computing. These components are tightly integrated and connected through a network that operates at 40 gigabits per second, the fastest research network on the planet today.

The major objectives of this project include the creation of a high-speed network; grid services that provide data sharing, computing power, and collaborative visualization; and facilities that meet the technology requirements (e.g., data storage, bandwidth, etc.).

The five sites in the project are:

  • National Center for Supercomputing Applications (NCSA) at the University of Illinois

  • San Diego Supercomputer Center (SDSC) at the University of California

  • Argonne National Laboratory in Argonne, Illinois

  • Center for Advanced Computing Research (CACR) at the California Institute of Technology in Pasadena

  • Pittsburgh Supercomputer Center (PSC)

The TeraGrid project is sometimes called a "cyberinfrastructure" that brings together distributed scientific instruments, terascale and petascale data archives, and gigabit networks. Figure 2.8 shows different layers of the TeraGrid architecture.

Figure 2.8. TeraGrid architecture.


Base Grid Services Layer (Resource Layer)

Some of the base services required for the TeraGrid are authentication and access management, resource allocation and management, data access and management, resource information service, and accounting. This layer forms the building block for the other high-level services.

Core Grid Services (Collective Layer)

With a main focus on coordination of multiple resources, core grid services include functionalities for data movement, job scheduling, monitoring, and resource discovery.

Advanced Grid Services

These are high-level application services, which provide super schedulers, repositories, categorization, resource discovery, and distributed accounting.

Based on the above architecture, the TeraGrid defines protocols, schema, and interfaces at each layer, but not implementation-specific details. These interfaces provide interoperability between the sites implementing the TeraGrid project. The sketch below illustrates this layering.
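
As an illustration only, the following sketch shows how a collective-layer service might coordinate multiple sites strictly through the resource-layer interface. The class names, the allocate call, and the scheduling policy are all invented here; TeraGrid specifies protocols and interfaces, not this code.

    class BaseGridServices:
        """Resource layer: primitives exposed by a single site."""
        def __init__(self, name, free_cpus):
            self.name, self.free_cpus = name, free_cpus

        def allocate(self, cpus):
            """Grant an allocation if this site has capacity."""
            if cpus <= self.free_cpus:
                self.free_cpus -= cpus
                return True
            return False

    class CoreGridServices:
        """Collective layer: coordinates sites only via the layer below."""
        def __init__(self, sites):
            self.sites = sites

        def schedule(self, job_cpus):
            """Place a job on the first site that grants the allocation."""
            for site in self.sites:
                if site.allocate(job_cpus):
                    return site.name
            return None

    grid = CoreGridServices([BaseGridServices("NCSA", 64),
                             BaseGridServices("SDSC", 128)])
    print(grid.schedule(96))   # -> SDSC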

NASA Information Power Grid (IPG)

NASA's Information Power Grid [18] (IPG) is a high-performance computational and data grid. Grid users can access widely distributed heterogeneous resources from any location, with IPG middleware adding security, uniformity, and control.

Some of the major projects undertaken by IPG are:

Resource Broker

A grid user must select from a large number and variety of resources that could be used for an application. For each potential resource, the resource selection system considers the following factors (a selection sketch follows the list):

  • Computer system characteristics, such as amount of memory, amount of disk space, CPU speed, number of CPUs, type of operating system, available software, and so on

  • The time required for the execution of the job

  • The cost to use that resource or computer system
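
The following sketch shows one way such a selection could work: filter out infeasible resources, then rank the rest. It is a minimal illustration assuming a simple cost-based policy; the Resource fields and the select_resource function are hypothetical, not the IPG broker's API.

    from dataclasses import dataclass

    @dataclass
    class Resource:
        """Hypothetical description of one candidate compute resource."""
        name: str
        memory_gb: float
        disk_gb: float
        cpu_count: int
        os_type: str
        software: set
        est_runtime_hours: float    # predicted execution time for this job
        cost_per_cpu_hour: float

    def select_resource(candidates, job):
        """Drop resources that cannot run the job, then pick the cheapest
        one that also meets the job's deadline (an illustrative policy)."""
        feasible = [
            r for r in candidates
            if r.memory_gb >= job["memory_gb"]
            and r.disk_gb >= job["disk_gb"]
            and r.cpu_count >= job["cpu_count"]
            and r.os_type == job["os_type"]
            and job["software"] <= r.software           # required packages present
            and r.est_runtime_hours <= job["deadline_hours"]
        ]
        if not feasible:
            return None
        return min(feasible, key=lambda r: r.est_runtime_hours
                   * job["cpu_count"] * r.cost_per_cpu_hour)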

Performance Prediction

There are several types of predictions that are useful when deciding where to run applications. These include job/application execution time on different computer systems, wait time in scheduling queues before the job begins executing, and the time to transfer files between computer systems.
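
A back-of-the-envelope sketch, assuming the three predictions above are available as inputs, shows how they combine into a turnaround estimate that can be compared across systems. The function and the numbers are illustrative only.

    def predict_turnaround(transfer_gb, bandwidth_gbps, queue_wait_h, exec_h):
        """Estimate total turnaround (hours): file stage-in + queue wait +
        execution. All inputs are assumed to come from predictors."""
        transfer_h = (transfer_gb * 8) / bandwidth_gbps / 3600   # GB -> Gb -> s -> h
        return transfer_h + queue_wait_h + exec_h

    # Compare two hypothetical systems for the same job
    site_a = predict_turnaround(50, bandwidth_gbps=1.0, queue_wait_h=4.0, exec_h=2.0)
    site_b = predict_turnaround(50, bandwidth_gbps=0.1, queue_wait_h=0.5, exec_h=3.0)
    print(f"site A: {site_a:.2f} h, site B: {site_b:.2f} h")   # B wins despite slower CPU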

Job Manager

Job Manager is used to reliably execute jobs and maintain information about jobs. These jobs consist of file operations (i.e., copy a file between machines, create a directory, delete a file or directory, and so on) and execution operations (i.e., execute an application on a specific computer system).
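
A minimal sketch of this idea, assuming a job is an ordered list of such operations, is shown below. The run_job function, the retry policy, and the operation encoding are invented for illustration and are not the IPG Job Manager interface.

    import shutil
    import subprocess
    import time

    def run_job(operations, retries=3):
        """Execute a job's operations in order, retrying transient failures
        and recording per-operation status for later queries."""
        status = []
        for op in operations:
            for attempt in range(1, retries + 1):
                try:
                    if op[0] == "copy":             # file operation
                        shutil.copy(op[1], op[2])
                    elif op[0] == "exec":           # execution operation
                        subprocess.run(op[1], check=True)
                    status.append((op, "done"))
                    break
                except Exception as err:
                    if attempt == retries:
                        status.append((op, f"failed: {err}"))
                        return status               # stop the job on hard failure
                    time.sleep(2 ** attempt)        # back off, then retry
        return status

    # Example job: stage in data, run the application, stage out results
    job = [("copy", "/archive/input.dat", "/scratch/input.dat"),
           ("exec", ["./simulate", "/scratch/input.dat"]),
           ("copy", "/scratch/output.dat", "/archive/output.dat")]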

Portability Manager (PM)

Portability is a key issue in the grid environment. PM is responsible for establishing a suitable environment for the execution of the user application by automatically identifying the dependencies of each user program.
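
As one hedged illustration of automatic dependency identification, the sketch below scans a Python program for the top-level modules it imports. The function name is invented, and a real portability manager would also have to handle libraries, data files, and programs in other languages.

    import ast

    def find_dependencies(source_path):
        """Return the top-level modules imported by a user's Python program."""
        with open(source_path) as f:
            tree = ast.parse(f.read())
        deps = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                deps.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module.split(".")[0])
        return deps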

Framework for Control and Observation in Distributed Environments (CODE)

The CODE project provides a secure, scalable, and extensible framework for making observations on remote computer systems, transmitting this observational data to where it is needed, performing actions on remote computer systems, and analyzing observational data to determine what actions should be taken. Observational data is transmitted using a distributed event service.
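
The observe-transmit-analyze-act loop can be sketched as follows. This is a local, single-process stand-in: the in-memory queue plays the role of the distributed event service, and the metric, threshold, and names are assumptions for illustration, not the CODE API.

    import os
    import queue
    import threading

    event_bus = queue.Queue()      # stands in for the distributed event service

    def observer(host, samples=3):
        """Observe a metric on a (here, local) system and publish events."""
        for _ in range(samples):
            load = os.getloadavg()[0]              # 1-minute load average (Unix)
            event_bus.put({"host": host, "metric": "load1", "value": load})

    def manager():
        """Analyze observations and decide whether an action is needed."""
        while True:
            event = event_bus.get()
            if event is None:                      # shutdown sentinel
                break
            if event["value"] > 8.0:               # arbitrary threshold
                print(f"{event['host']}: high load, dispatching action")

    t = threading.Thread(target=manager)
    t.start()
    observer("nodeA")
    event_bus.put(None)
    t.join()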

Test and Monitoring Service

The IPG Test and Monitoring Service will provide a framework for examining the health of the grid, so that problems with, or degradation of, grid resources are promptly detected; the appropriate organization, system administrator, or user is notified; and solutions are dispatched in a timely manner.

Dynamic Accounting System (DAS)

DAS provides the following enhanced categories of accounting functionality to the IPG community (a simplified sketch follows the list):

  • Allows a grid user to request access to a local resource via the presentation of grid credentials

  • Determines and grants the appropriate authorizations for a user to access a local resource without requiring a preexisting account on the resource to govern local authorizations

  • Exchanges allocation data between sites to manage allocations in a grid-wide manner instead of a site-specific manner

  • Provides resource pricing information on the grid

  • Collects and reports the necessary data to ensure accountability of grid users for the use of resources and to enable resource providers to better manage their grid resources
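
A simplified sketch of the grid-wide (rather than site-local) allocation idea from the list above: the class, its fields, and the charge method are invented for illustration and do not reflect the actual DAS interfaces.

    class Allocation:
        """One grid-wide allocation, identified by grid credentials rather
        than by a preexisting local account at each site."""
        def __init__(self, grid_user, cpu_hours):
            self.grid_user = grid_user
            self.remaining = cpu_hours

        def charge(self, site, cpu_hours, price_per_cpu_hour):
            """Debit usage at any site against the single grid-wide balance
            and return an accounting record for reporting."""
            cost = cpu_hours * price_per_cpu_hour
            if cost > self.remaining:
                raise RuntimeError(f"allocation exhausted at {site}")
            self.remaining -= cost
            return {"site": site, "user": self.grid_user,
                    "cpu_hours": cpu_hours, "charged": cost}

    account = Allocation("/O=IPG/CN=Jane Scientist", cpu_hours=1000)
    record = account.charge("site-X", cpu_hours=40, price_per_cpu_hour=1.5)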

CORBA-IPG Infrastructure

The CORBA-IPG infrastructure gives CORBA-enabled applications, such as object-oriented propulsion systems being developed at NASA Glenn Research Center, the ability to utilize the widely distributed resources made available by the NASA IPG.


