Services is a term you will hear used quite often in Oracle Database 10g and in the realm of high availability. While the concept of a service is nothing new to the computing world, services in Oracle Database 10g have been completely redefined. So, what exactly are services?
Services can best be thought of as being associated with an application on the front end that needs to connect to the database on the back end. In the grid computing world, customers (end users) do not care where the application is going when they run it. They do not think in terms of the database at the back end, behind it all. Customers think of computing in terms of services, or applications at the front end: e-mail, calendaring applications, order entry, accounting, reporting, and so on. To the customer at the front end, it matters not where the power behind the scenes comes from. Just as a utility customer who plugs in an appliance does not care where or how the electricity is generated, an application user does not care where or how the data gets there. Utility customers do not think of appliances in terms of the power source; they think of appliances in terms of appliances: a hair dryer and a Nintendo are completely different in the mind of the consumer, regardless of the fact that they both plug into the same grid. From the grid perspective, all that matters is that the electricity is there when it is asked for.
By the same token, the application user cares not where the data they need comes from. All that matters is that when the application gets 'plugged in' to the database, it gets what it requires: service, in the form of data. Services, therefore, are a way for the DBA to think in terms of who is plugging into the database. In a general sense, a service is associated with an application that a customer may be using in your environment, connecting to (plugging in to) a database in a grid at the back end. In a more specific sense, however, services are defined as an abstract way to group logical workloads. A service should represent a workload generated by a group of consumers that have something in common: primarily, they use the same functionality within the database, and they have the same needs with regard to availability and priority within the database. In addition, they generally use a similar amount of resources. You might also think of a service as a grouping of clients accessing the same or similar database objects, perhaps within the same schema.
By this definition, it is probably simplest and most helpful to think of a service in terms of the application itself, as opposed to thinking of it in terms of the database. The application connects to a service, which is defined within the database grid. A service gives the HA DBA the ability to isolate workloads and manage them independently. This is more important than ever in an era of consolidation and centralized computing. More and more, applications are being consolidated to run against a single back-end database that is part of a highly available, clustered environment. Services are a crucial part of this architecture, as they enable this isolation within the database and allow for individual prioritization and monitoring. A service can be enabled or disabled based on the needs of its consumers and the need to do maintenance on all or a portion of the database. For example, by isolating services from each other, it is possible to do application-specific maintenance on a schema associated with one service without affecting other services/applications: simply disable the service for which the maintenance is scheduled, and then reenable it once the maintenance is completed.
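As a concrete sketch of this disable/reenable cycle, the following uses the srvctl utility in a RAC environment. The database name (grid) and service name (accounting) are hypothetical, and the commands assume Oracle Database 10g srvctl syntax:

```shell
# Stop and disable the hypothetical "accounting" service on database "grid"
# before performing maintenance on its schema. Other services running
# against the same database are unaffected.
srvctl stop service -d grid -s accounting
srvctl disable service -d grid -s accounting

# ... perform the scheduled maintenance on the accounting schema ...

# Re-enable and restart the service once maintenance is complete.
srvctl enable service -d grid -s accounting
srvctl start service -d grid -s accounting
```

Disabling the service (rather than merely stopping it) prevents the clusterware from automatically restarting it mid-maintenance.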
Aside from isolating different applications and workloads from one another, the service definition within the database grid determines which nodes and/or instances the service (client application) can run on. In the event of a failure, Oracle relocates services (client applications) based, again, on the service definition, which specifies the nodes the service is allowed to run on if its primary node has failed. All of this is transparent to the user/consumer of the service. The HA DBA, on the other hand, has the power to determine where these services run, their priority, and how they are handled in the event of a failure.
So, how is this managed on the back end? We have seen that a service is viewed by the user as a front-end application. But how does the HA DBA make sense of this? How is it controlled from the database perspective? The answer is that there are several pieces to the puzzle. At the most basic level, services are defined at the database level via the SERVICE_NAMES initialization parameter, which takes a comma-delimited list of service names. With this parameter, the HA DBA can define, at the instance level, various connection types into the database that are associated at the client end with different applications. For example, a given instance in a database cluster may have the following SERVICE_NAMES parameter defined:
SERVICE_NAMES=payroll, accounting, reporting, oltp
while another instance in the same database cluster may have a different value for SERVICE_NAMES defined, such as:

SERVICE_NAMES=payroll, oltp
Thus, clients connecting via the payroll or OLTP service will be able to connect to either node, depending on availability, while accounting and reporting clients will only be able to connect to the first instance. This gives the HA DBA the flexibility to segment different portions of the user population across different instances. In addition, we are prioritizing services by saying that the payroll and OLTP services are more critical and less tolerant of failure, so these services need to be able to run on either node in the cluster. Obviously, the more instances in the cluster, the more flexibility you will have.
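On the client side, an application is pointed at a service rather than at a specific instance through its connect descriptor. The following tnsnames.ora entry is a sketch; the virtual host names (node1-vip, node2-vip) and port are assumptions, and it is the SERVICE_NAME setting that ties the client to the payroll service:

```
PAYROLL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = payroll)
    )
  )
```

With both addresses listed, the client can reach the payroll service on whichever instance currently offers it; nothing in the descriptor names a particular instance.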
As you can see, one way of prioritizing services is to grant certain applications the ability to run on more nodes at any given time. For example, suppose you have a three-node cluster. Based on your business needs and the resources at your disposal, you may decide that the payroll and OLTP services can run on any of the three nodes, the accounting service can run on Node1 or Node2, and the reporting service can run only on Node3. When all three nodes are functioning correctly, each of these applications has access to its defined nodes. In the event of a failure of one of the nodes, however, only the OLTP and payroll services are still guaranteed to have access to both remaining instances. The accounting service will have access to at least one remaining instance, but there is no guarantee that the reporting service will still have access.
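In a RAC environment, this placement policy can be expressed when the services are created. The following srvctl commands are a sketch assuming a database named grid with instances grid1, grid2, and grid3 (all names hypothetical); the -r flag lists the preferred instances for each service:

```shell
# payroll and oltp are preferred on all three instances
srvctl add service -d grid -s payroll -r grid1,grid2,grid3
srvctl add service -d grid -s oltp -r grid1,grid2,grid3

# accounting is preferred on the first two instances only
srvctl add service -d grid -s accounting -r grid1,grid2

# reporting runs only on the third instance
srvctl add service -d grid -s reporting -r grid3
```

An additional -a flag could name "available" instances to which a service may fail over; omitting it here mirrors the scenario above, where reporting has no guaranteed fallback.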
By defining it this way, the HA DBA is essentially saying that OLTP and payroll have higher priority than the other services. Should Node1, Node2, or Node3 go down, the OLTP and payroll services would still have access to the two remaining nodes, but we do not necessarily want all applications running against those remaining two nodes. Instead, we only want our highest-priority applications running, that is, the applications that have the greatest requirements in our business environment for high availability. By giving the reporting service access to only one of the nodes, we are saying that the priority for that service is not as high. In the event of a failure of Node3, we do not want that service running on the other nodes, as the remaining two nodes will be more heavily loaded than they would be otherwise. If Node1 or Node2 fails, the reporting service could easily be disabled on Node3. This helps to ensure not only that the highest-priority applications keep running, but also that there is enough capacity to handle the load until the failed node can be repaired or a new node brought back online.
In the preceding example, it is easy enough to see that, should a node fail, the OLTP and payroll services are guaranteed access to one of the remaining nodes. However, as we explained, this may place an undue load on those remaining nodes. Suppose that the surviving nodes are Node1 and Node3, which we have also defined as being available for the reporting and accounting services. Now, all of the services are still accessible, which is a good thing. However, all services are running on two nodes instead of three. Ideally, this will have been planned out such that the nodes with the highest number of services assigned are also the most robust, that is, have the greatest capacity. However, this may not always be the case. Therefore, this could impact our most important services, namely, the payroll and OLTP services. As we mentioned, we could easily disable the reporting service for a period of time, but that is a manual operation.
In this regard, Resource Manager can be used at the service level to define priorities for a given service. In prior releases, Resource Manager was used primarily at the session level, but with Oracle Database 10g, consumer groups can be defined for a given service so that services such as OLTP and payroll can be given a higher priority than services such as accounting and reporting. This can be done via Enterprise Manager, as we will discuss later in this chapter.
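The mapping from a service to a consumer group can be sketched with the DBMS_RESOURCE_MANAGER package, which in Oracle Database 10g accepts SERVICE_NAME as a mapping attribute. The consumer group name REPORTING_GROUP is hypothetical:

```sql
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

  -- Hypothetical consumer group for sessions arriving via the reporting service
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
    consumer_group => 'REPORTING_GROUP',
    comment        => 'Sessions connecting through the reporting service');

  -- Map the service name to the consumer group (new in Oracle Database 10g)
  DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(
    attribute      => DBMS_RESOURCE_MANAGER.SERVICE_NAME,
    value          => 'REPORTING',
    consumer_group => 'REPORTING_GROUP');

  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
```

Once this mapping is in place, any session connecting through the reporting service is automatically placed in the group, with no session-level configuration required.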
Resource Manager can intelligently manage resources such that when a machine is at full utilization of a given resource, certain groups/services are limited in how much of that resource they can utilize, based on the consumer group definition to which they are mapped. However, when the machine is not fully utilized, the database (knowing that there is excess capacity available) intelligently allows groups to consume more than their quota, because the capacity is there. In our earlier example, assume that the reporting service was mapped to a consumer group that allots it 10 percent of the total CPU. Thus, if the machine is 100-percent utilized, clients connecting to the reporting service will only be allotted 10 percent of the CPU, overall, while the remaining services are allowed to use 90 percent of the CPU. However, at times when the machine is not fully utilized (meaning the remaining services are not using their allotted 90 percent), the reporting service is allowed more than 10 percent of CPU, if needed, since the excess capacity is there. Therefore, at times when all three instances in our three-node cluster are running, the reporting service will most likely be able to run unfettered. However, if a node fails, leaving the remaining nodes running at higher loads than normal, the limits applied through Resource Manager will kick in automatically.
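The 10-percent CPU scenario could be expressed as a resource plan along the following lines, assuming a hypothetical consumer group named REPORTING_GROUP has already been created and mapped to the reporting service; the plan name and percentages are illustrative:

```sql
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

  DBMS_RESOURCE_MANAGER.CREATE_PLAN(
    plan    => 'DAYTIME_PLAN',
    comment => 'Caps reporting at 10 percent of CPU under full load');

  -- Reporting gets 10 percent of CPU at level 1, enforced only when
  -- the machine is CPU-saturated; excess capacity remains available to it
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'DAYTIME_PLAN',
    group_or_subplan => 'REPORTING_GROUP',
    comment          => 'Reporting service sessions',
    cpu_p1           => 10);

  -- Every plan must include a directive for OTHER_GROUPS
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'DAYTIME_PLAN',
    group_or_subplan => 'OTHER_GROUPS',
    comment          => 'All remaining sessions',
    cpu_p1           => 90);

  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/
-- Activate the plan:
-- ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = 'DAYTIME_PLAN';
```

Because CPU limits are only enforced under contention, this plan leaves reporting unfettered while all three instances are up and automatically throttles it after a node failure drives the survivors to full utilization.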