The N1 Grid vision provides resources for services in an efficient and flexible manner. In many ways, the grid technologies that have been used over the last decade are in line with the N1 Grid vision for the data center. Grid-based computing has been around for a number of years and has grown quite popular. In fact, it is measurably more successful than UC in the number of implementations, and it performs some impressive tasks. The N1 Grid software can be used to enable grid computing, just as it can be used to enable other business services, such as web services and data warehouses. This section discusses the relationship between what is historically known as the grid and the N1 Grid.
A grid is a collection of computers that are available to perform various tasks as part of an autonomous system. The grid has a workload management system and specially written applications that enable it to parcel out tasks and gather the results. This distributed system enables careful control over the quality of a service, enabling grids to perform critical functions. Grids can span hundreds or thousands of nodes and require little ongoing care. A node can leave the system and re-enter it with little effect. The central workload management system simply stops forwarding tasks to the node if it is not active. Grid computing usually includes geographically dispersed systems, often involving several different platforms and vendors. Systems might, or might not, be dedicated to the grid. A system that is not dedicated might run other applications and be enabled to run grid applications only when utilization policies allow it.
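The node-handling behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not the N1 Grid Engine implementation: the `Node` and `WorkloadManager` names, the `active` flag, and the round-robin order are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical grid node; 'active' stands in for a real health check."""
    name: str
    active: bool = True
    tasks: list = field(default_factory=list)


class WorkloadManager:
    """Hypothetical central dispatcher that skips inactive nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next = 0  # index of the node to try first on the next dispatch

    def dispatch(self, task):
        # Try each node at most once, in round-robin order (an assumption).
        for offset in range(len(self.nodes)):
            index = (self._next + offset) % len(self.nodes)
            node = self.nodes[index]
            if node.active:
                node.tasks.append(task)
                self._next = (index + 1) % len(self.nodes)
                return node.name
        raise RuntimeError("no active nodes available")
```

A node that sets `active` back to `True` starts receiving tasks again on the next pass, with no other change to the dispatcher.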
One of these grid implementations is Sun's own compute farm, appropriately called "the ranch." The ranch enables UltraSPARC® processor engineers to harness the power of tens of thousands of CPUs across three Sun data centers to run compute-intensive processes for UltraSPARC chip design and verification. These systems have an amazing utilization rate of 98 percent (365 days a year). Other examples of grid systems include weather forecasting and computer graphics generation. SETI@Home is an implementation of grid technologies and is an example of grid-based peer-to-peer computing working to accomplish a daunting task.
The advantages of a grid should be obvious. In fact, grids provide similar benefits to UC, but with less complexity. The challenge is finding the right applications for the grid. Not all applications fit into a grid model, at least as the grid is defined today. The grid usually requires middleware that enables compute tasks to be run in a distributed manner. An example of middleware is the Globus Toolkit. The middleware handles authentication and authorization, as well as service and workload management.
The N1 Grid Engine provides automated resource management (distributed resource management, or DRM) and works along with Sun hardware and the middleware toolkits. DRM provides the interface layer or container for the various management tools, such as the Sun MC software, the computing portal interfaces, and the Globus framework. DRM runs on top of an operating system such as the Solaris OS. FIGURE 11-2 shows the N1 Grid Engine architecture.
Figure 11-2. N1 Grid Engine Architecture
To use a grid system, applications need to be rewritten so that they are "grid-aware." Typically this includes using a toolkit such as Globus to enable integration of the distributed computing APIs in Java, C, or C++.
Sun Systems and the Grid
No two grids are alike, and one size does not fit all situations. There are three key classes of grids that scale from single systems to supercomputer-class compute farms that utilize thousands of processors:
The N1 Grid Engine software provides the power and flexibility required for cluster grids. N1 Grid Engine software orchestrates the delivery of computational power based on enterprise resource policies set by the organization's technical and management staff. The system uses these policies to examine the available computational resources within the campus grid, gather these resources, and then allocate and deliver them automatically in a way that optimizes usage across the grid.
To enable cooperation within the campus grid, project owners need to negotiate policies, have flexibility in the policies for manual overrides for unique project requirements, and have the policies automatically monitored and enforced.
FIGURE 11-3 shows how a grid system works. A user submits some type of batch job into the queue. The system can consult various prioritization policies to determine the run level or priority of the job. The grid engine software finds suitable resources, schedules the job on them, and executes it. Results are sent back to the user or parent process.
Figure 11-3. Grid Engine Software
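The submit, prioritize, schedule, and return flow can be sketched as a small priority queue. This is an illustrative sketch only: `BatchQueue`, the job dictionaries, and the example `policy` function are hypothetical and do not reflect the grid engine's actual interfaces.

```python
import heapq
from itertools import count


class BatchQueue:
    """Hypothetical batch queue ordered by a pluggable priority policy."""

    def __init__(self, policy):
        self.policy = policy   # function: job dict -> number (higher runs first)
        self._heap = []
        self._seq = count()    # tie-breaker that keeps submission order

    def submit(self, job):
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-self.policy(job), next(self._seq), job))

    def run_next(self, slots):
        """Run up to 'slots' queued jobs, highest priority first."""
        results = []
        while self._heap and len(results) < slots:
            _, _, job = heapq.heappop(self._heap)
            results.append((job["name"], job["run"]()))
        return results


def policy(job):
    # Assumed example policy: a fixed boost for one urgent project.
    boost = 10 if job.get("project") == "tapeout" else 0
    return job.get("priority", 0) + boost
```

Here `slots` stands in for the number of free resources the scheduler found; a real system would match each job's requirements against individual hosts.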
A grid system is very much aligned with the goals of the N1 Grid. It enables the system to maintain its resources in a way that requires little administration. A few administrators can manage the Sun "ranch," an installation that might otherwise require tens, if not hundreds, of administrators. Server utilization is not a problem. The grid "control plane" ensures that each system is adequately utilized.
Today, the grid does not fit every type of application or service. It is very much a batch-oriented system that favors large jobs that require large amounts of CPU resources, storage, and I/O. Large business applications, like ERP, are not suitable for the grid. However, changes are underway. Oracle is moving many of its applications toward grid-like resource management, where workload clustering is not necessary to increase scalability of the database service.
Web Services and the Grid
Web services are becoming popular enterprise standards for applications. They enable a standards-based approach to accessing networked applications. The grid architecture has previously not handled web-based applications, such as those defined as web services.
A proposed set of standards, called Open Grid Services Architecture (OGSA), will introduce web services into the grid architecture. OGSA builds on the Globus Toolkit. It enables the grid to handle web services by using standard web service interface descriptions, along with grid-based directory, registration, and discovery. OGSA has the potential to change the deployment model of web services across the industry. Some of the built-in resource management features in the grid might simplify ongoing resource management concerns, which is in alignment with the N1 Grid vision. But even without OGSA, grids are very powerful and are being managed by the N1 Grid software today.
Grid Applications Using the N1 Grid Software
Grid systems require various tools and agents to be installed to provide the grid capabilities. Operating systems, across various platforms, are also necessary. In addition, the network and storage must be configured. The N1 Grid PS software can be used to provision the operating system, the network, and the storage for grid systems. The Solaris 9 OS Resource Manager and N1 Grid Containers can be used to provide resource metering and control and to enable grid systems to be scheduled or "dialed up or down" based on system needs. The N1 Grid SPS can also be used to update grid applications and to handle any other end-to-end provisioning tasks necessary for the grid to be enabled. As with any new technology, pilot projects provide excellent opportunities to test drive the technology. The ultimate goal for all of these technologies is to simplify, reduce cost, and provide high flexibility.
Grids and Policies
The N1 Grid Engine Enterprise Edition (N1 Grid EEE) software is an excellent example of how policy has emerged within the data center. The N1 Grid EEE uses policies to determine how to deliver resources to jobs so that highly optimized resource usage is achieved across the campus grid. The policy module makes the N1 Grid EEE substantially different from the N1 Grid Engine and similar competing products, which schedule jobs based on priority alone. The N1 Grid EEE software policy module measures the resources used over a period of time (week, month, or quarter), compares the usage to the amount of resources assigned in the policies, and adjusts priorities accordingly.
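The measure, compare, and adjust cycle can be illustrated with a short Python sketch. The formula below is an assumption chosen for clarity; it is not the algorithm used by the N1 Grid EEE policy module, and the function and parameter names are hypothetical.

```python
def adjust_priorities(assigned_share, usage, base_priority=100.0):
    """Return a priority per project from assigned shares and measured usage.

    assigned_share: project -> share granted by policy (any positive units)
    usage:          project -> resources consumed over the accounting period
    """
    total_share = sum(assigned_share.values())
    total_usage = sum(usage.values())
    priorities = {}
    for project, share in assigned_share.items():
        target = share / total_share
        # With no recorded usage, every project counts as having used nothing.
        actual = usage.get(project, 0.0) / total_usage if total_usage else 0.0
        # Assumed rule: boost projects below their target, damp those above it.
        priorities[project] = base_priority * (1.0 + target - actual)
    return priorities
```

For example, a project assigned 60 percent of the grid that consumed only 30 percent during the period ends up with a higher priority than a project assigned 40 percent that consumed 70 percent.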
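To enable cooperation within the campus grid, project owners using the grid need to have several capabilities: they need to be able to negotiate policies, to override the policies manually for unique project requirements, and to have the policies automatically monitored and enforced.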
The N1 Grid EEE provides these capabilities. For example, the N1 Grid EEE enables administrators to allocate compute cycles to a project with an immediate deadline, while a project with a lower priority might get fewer resources.
Policies are determined by the particular needs of the organization at any given moment and are implemented by the system administrator. The N1 Grid EEE provides four policy systems that are described in "Policy-Based Computing" on page 236. These four policies enable an organization to create and enforce resource usage policies that fit the organization's needs and culture. Each organization can decide the level of enforcement required to protect each user.
The N1 Grid EEE policy management automatically controls the use of shared resources in the cluster to best achieve the goals of the administration. High-priority jobs are dispatched preferentially and receive better access to resources. The administration of an N1 Grid EEE cluster can define high-level utilization policies. Along with the routine policies, jobs can be submitted with an initiation deadline. Deadline jobs perturb routine scheduling. Administrators can also temporarily override share-based, functional, and initiation deadline scheduling. An override can be applied to an individual job or to all jobs associated with a user, a department, a project, or a job class.
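The override scopes listed above (individual job, user, department, project, job class) can be modeled as a lookup that replaces the routinely scheduled priority. This is a sketch under stated assumptions: the `Job` fields, the `overrides` mapping, and the precedence order among scopes are all hypothetical, because the text does not specify how conflicting overrides are resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical job record carrying the attributes an override can match."""
    job_id: str
    user: str
    department: str
    project: str
    job_class: str
    scheduled_priority: float  # result of routine (policy-based) scheduling


def effective_priority(job, overrides):
    """Apply a temporary administrator override, if one matches the job.

    overrides maps (scope, value) -> priority, e.g. ("user", "alice") -> 500.0
    """
    # Assumed precedence: the most specific scope wins.
    for scope, value in (
        ("job", job.job_id),
        ("user", job.user),
        ("project", job.project),
        ("department", job.department),
        ("job_class", job.job_class),
    ):
        if (scope, value) in overrides:
            return overrides[(scope, value)]
    return job.scheduled_priority
```

Removing an entry from `overrides` returns the affected jobs to their routinely scheduled priority, which matches the temporary nature of the overrides described above.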