Autonomic computing

Autonomic computing was conceived to lessen the spiraling demand for skilled IT resources, reduce complexity, and drive computing into a new era that can better exploit its potential to support higher-order thinking and decision making. Immediate benefits will include reduced dependence on human intervention to maintain complex systems, accompanied by a substantial decrease in costs. In the long term, autonomic computing will let individuals, organizations, and businesses collaborate on complex problem solving.

Fully autonomic computing systems will be:

Self-configuring: Able to adapt to dynamically changing environments

Self-healing: Able to discover, diagnose, and act to prevent disruptions

Self-optimizing: Able to tune resources and balance workloads to maximize use of IT resources

Self-protecting: Able to anticipate, detect, identify, and protect against attacks

When you implement autonomic systems, you are free to focus on more strategic, higher-level issues. For the business, the core benefits of autonomic computing are improved resiliency, the ability to deploy new capabilities more rapidly, and an increased return on IT investments.

Autonomic computing capabilities play a critical role in the development of grid computing. Grids can become the most complex computing environments available. Autonomic computing will make grids easier to manage and will help ensure that they deliver the levels and quality of service demanded by businesses.

Learn more

Autonomic computing overview and resources (Collection of resources) www.ibm.com/developerworks/tivoli/autonomic/library/1016/1016_autonomic.html
Autonomic computing (Web site) www.ibm.com/autonomic/index.shtml
Tivoli: Autonomic computing (Web site) www.ibm.com/software/tivoli/features/oct2002/autonomic.html
Administration made easier: Scheduling and automation in DB2 Universal Database (Article, in PDF)

Fundamental to autonomic computing is the autonomic control loop. Its goal is to monitor the current workload, analyze it against historical trends and the available computing power, plan the reallocation of resources, and execute the movement of workload or resources so that computing responsiveness meets service level objectives.
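The four steps of the loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the single response-time metric, the 1.2-second objective, and the "add one server" policy are all hypothetical and are not taken from any IBM product.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Observation:
    response_time_s: float  # measured responsiveness (assumed metric)
    servers: int            # computing power currently allocated


def monitor(samples, servers):
    """Collect the current workload metrics (here: mean response time)."""
    return Observation(response_time_s=mean(samples), servers=servers)


def analyze(obs, history, objective_s):
    """Compare the observation with the objective and the historical trend."""
    trend = mean(history) if history else obs.response_time_s
    return {
        "violates": obs.response_time_s > objective_s,
        "worse_than_trend": obs.response_time_s > trend,
    }


def plan(obs, findings):
    """Hypothetical policy: add one server when the objective is violated."""
    return obs.servers + 1 if findings["violates"] else obs.servers


def execute(target_servers):
    """Apply the plan; a real system would call an effector here."""
    return {"servers": target_servers}


def control_loop_step(samples, servers, history, objective_s=1.2):
    """One pass through monitor, analyze, plan, and execute."""
    obs = monitor(samples, servers)
    findings = analyze(obs, history, objective_s)
    return execute(plan(obs, findings))
```

Calling `control_loop_step([1.5, 1.7], 4, [1.0, 1.1])` requests five servers, because the mean response time exceeds the assumed objective; with samples below the objective the allocation is left unchanged.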

The monitor function implements mechanisms that collect, aggregate, filter, and report details (metrics, topology, and so on) about the resource. The data may be sampled, as with performance metrics, or unsolicited, as with event data. In addition, the monitor function can aggregate data from various external and internal sources to produce more meaningful, higher-level information.
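As a rough sketch of this idea, the following Python function aggregates sampled metrics into averages and filters unsolicited events by severity. The data shapes (name/value pairs, type/severity pairs) and the severity threshold are assumptions made for the example.

```python
from collections import Counter
from statistics import mean


def summarize(samples, events, min_severity=2):
    """Aggregate sampled metrics and filter unsolicited events into one report.

    samples: list of (metric_name, value) pairs polled from the resource
    events:  list of (event_type, severity) pairs pushed by the resource
    """
    by_metric = {}
    for name, value in samples:
        by_metric.setdefault(name, []).append(value)
    # Aggregate: report one average per metric instead of the raw samples.
    averages = {name: mean(values) for name, values in by_metric.items()}
    # Filter: keep events at or above the threshold, then count them by type.
    counts = Counter(kind for kind, severity in events if severity >= min_severity)
    return {"averages": averages, "event_counts": dict(counts)}
```

The report is deliberately smaller than its inputs: downstream functions see one value per metric and only the events that matter.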

The analyze function implements mechanisms that model complex situations. The analyze function may provide some learning capability, which will in turn allow it to better anticipate future situations. Examples of the analyze function include problem determination, workload forecasting, policy analysis, and event correlation.
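Workload forecasting, one of the examples above, can be illustrated with a very simple linear extrapolation. This is a sketch only; real analyze functions would use richer models, and the function names and the window size are hypothetical.

```python
def forecast_next(history, window=3):
    """Extrapolate the next value from the average step over the recent window."""
    if not history:
        raise ValueError("history must contain at least one observation")
    recent = history[-window:]
    if len(recent) < 2:
        return float(recent[-1])
    avg_step = (recent[-1] - recent[0]) / (len(recent) - 1)
    return recent[-1] + avg_step


def needs_capacity(history, capacity, window=3):
    """True when the forecast workload would exceed the available capacity."""
    return forecast_next(history, window) > capacity
```

For a request history of `[100, 110, 120, 130]`, the forecast is 140, so a capacity of 135 would be flagged as insufficient before the limit is actually reached.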

The plan function provides a way to coordinate interrelated actions over time. A plan is constructed to achieve a goal, especially a goal related to service level agreements. The goal describes a desired state; for example, browse response time is less than 1.2 seconds. The plan specifies the actions to take and the sequence in which they should be invoked to move from the current state to the desired state. The plan function must also be aware of how long the plans it generates take to enact and when they have to be completed.

For example, in an automatic order-processing environment, consider a situation where, 15 minutes before the end of the day, more than 2% of the orders initiated on that day have not yet been processed. In this case, the plan function could reasonably decide to provision extra elements of various types in an attempt to ensure that the outstanding orders are processed in time.
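The trigger condition in this example can be written down directly. The sketch below uses the 15-minute window and 2% threshold from the text; the function name, its parameters, and the choice to return a simple yes/no decision are assumptions for illustration.

```python
from datetime import datetime, timedelta


def should_provision(now, end_of_day, initiated, processed,
                     window=timedelta(minutes=15), threshold=0.02):
    """Decide whether to provision extra elements near the end of the day.

    Returns True when the end of the day is at most `window` away and more
    than `threshold` of the orders initiated today are still unprocessed.
    """
    if initiated == 0:
        return False  # nothing was ordered, so nothing is outstanding
    inside_window = timedelta(0) <= end_of_day - now <= window
    backlog_ratio = (initiated - processed) / initiated
    return inside_window and backlog_ratio > threshold
```

With an assumed end of day at 17:00, a check at 16:50 with 1000 orders initiated and 970 processed returns `True` (3% outstanding); the same backlog at 15:00 does not trigger provisioning yet.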

Today, plan functionality is most frequently embedded in automation scripts. For example, a restart sequence is essentially a plan that takes into account the dependency of services on one another. Further, plan information is important in software distribution to properly sequence installs based on prerequisites, corequisites, and exrequisites. There is also plan functionality related to how software is sent between distribution nodes to account for constraints in network bandwidth.

One way to think of a plan is as a workflow. That is, a plan specifies a partial order of actions (for example, start an HTTP process), possibly with entry and exit conditions. This is useful in restart sequences since, for example, one should not restart a servlet until the servlet engine is running, which in turn requires a running JVM.
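The restart example can be expressed as a small dependency graph and ordered with a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the action names and the `restart_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Each action maps to the set of actions that must complete before it.
restart_dependencies = {
    "start JVM": set(),
    "start servlet engine": {"start JVM"},
    "start servlet": {"start servlet engine"},
}


def restart_order(dependencies):
    """Return one ordering of the actions that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())
```

Here `restart_order(restart_dependencies)` yields the JVM first, then the servlet engine, then the servlet. When the partial order allows several valid sequences, any one of them may be returned, and a circular dependency raises `graphlib.CycleError`.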

The execute function interprets plans and interacts with the element effectors to ensure that the appropriate actions occur. For example, if a plan is represented by a workflow, the execute function determines which actions in the workflow graph can be executed (that is, those whose inputs are satisfied) and then chooses the best one to perform. In the case of software distribution, the inputs are install dependencies, and "best" may relate to balancing network loads, minimizing node downtime, or both.
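A minimal sketch of such an executor follows. It assumes the workflow is a mapping from each action to the set of its prerequisite actions, that "best" means the lowest value in a caller-supplied cost table, and that the effector is a plain callable; all of these are illustrative choices.

```python
def ready_actions(workflow, done):
    """Actions not yet performed whose prerequisites are all satisfied."""
    return [action for action, prereqs in workflow.items()
            if action not in done and prereqs <= done]


def run(workflow, cost, effector):
    """Repeatedly pick the cheapest ready action and hand it to the effector."""
    done = set()
    order = []
    while len(done) < len(workflow):
        candidates = ready_actions(workflow, done)
        if not candidates:
            raise ValueError("workflow has unsatisfiable or cyclic dependencies")
        # "Best" is assumed to be lowest cost; ties are broken by name.
        best = min(candidates, key=lambda action: (cost[action], action))
        effector(best)
        done.add(best)
        order.append(best)
    return order
```

For example, with `workflow = {"install runtime": set(), "install app": {"install runtime"}, "install docs": set()}` and costs of 5, 3, and 1 respectively, the executor installs the docs first, then the runtime, and only then the app, because the app is not ready until its prerequisite is in place.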

Knowledge: Autonomic computing functions require common knowledge to work in a coordinated way. Examples of this knowledge may include the following:

State of the managed element and management actions; for example, in the case of a printer, state may be whether it is online or offline; in the case of a Web server, state may be the number of connections currently held
Dependencies between entities being managed; examples include: service dependencies (for example, an HTTP service requires a name service); install dependencies; and capacity dependencies
Plans constructed to achieve management goals
Goals, such as those expressed by service level agreements, and policies, such as preferred ways of achieving a goal
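One possible way to hold this shared knowledge is a single structure that every function of the loop reads and updates. The sketch below is an assumption about layout, not a description of any actual implementation; the class, its field names, and the "online" state value are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Knowledge:
    state: dict = field(default_factory=dict)         # element -> current state
    dependencies: dict = field(default_factory=dict)  # entity -> set of entities it requires
    plans: dict = field(default_factory=dict)         # goal name -> ordered list of actions
    goals: dict = field(default_factory=dict)         # goal name -> target value

    def unmet_dependencies(self, entity):
        """Entities that `entity` requires but that are not marked 'online'."""
        required = self.dependencies.get(entity, set())
        return {dep for dep in required if self.state.get(dep) != "online"}
```

With `dependencies={"HTTP service": {"name service"}}` and the name service recorded as offline, `unmet_dependencies("HTTP service")` reports the name service, which a plan function could use to sequence a restart.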
