The material in this section came from the IBM developerWorks e-business on demand Resource Center Web site. IBMers can find this article at http://www-106.ibm.com/developerworks/ondemand/.
If you want to radically improve the efficiency of your business, you need tools and techniques that can help you integrate your business processes with your partners, suppliers, and customers. You need to respond to change with flexibility and speed. You need to evolve into an e-business on demand environment.
More than "e-sourcing," or utility-like computing, e-business on demand is a much broader idea about the transformation of both business and technology.
This transformation is based on open standards—so different systems can work together and link with devices and applications across organizational and geographic boundaries. (Using open standards assures that adoption of on demand computing will not lead to your business becoming wed to IBM, or tied too closely to any single vendor.)
From the technology standpoint (the primary focus for developers), an on demand environment has three essential capabilities:
Integration— Systems are seamlessly linked across the enterprise and across its entire range of customers, partners, and suppliers.
Virtualization— To make the best use of technology resources and minimize complexity for users, it pools computing resources into grids, making their collective power available to anyone on the grid who needs it.
Automation— It has self-healing, autonomic capabilities—so it can respond automatically and work around problems, security threats, and system failures.
In a networked world, you have to do more than simply integrate everything inside your enterprise. You have to connect your enterprise with other enterprises, other business processes, other applications, and billions of pervasive computing devices. You can't just rip and replace your existing data, applications, and transaction systems to make them homogenous with those of your business partners. Open standards allow all technologies to connect and integrate, and allow IT to become more modular. Linux and Java technologies brought open standards to the enterprise; today XML and Web services let you share information and applications across business lines.
Java technology began life as a way to add interactivity to Web pages through client-run applets and applications, but the most popular current use is in server-based J2EE systems. The developerWorks Java zone has an extensive collection of articles, tutorials, tools, and downloads for Java developers of all skill levels.
XML, or Extensible Markup Language, is a markup language that you can use to create your own tags. It was created by the World Wide Web Consortium (W3C) to overcome the limitations of HTML, the Hypertext Markup Language that is the basis for all Web pages. Like HTML, XML is based on SGML—Standard Generalized Markup Language. XML was designed with the Web in mind.
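The "create your own tags" idea takes only a few lines to see. The sketch below uses invented tag names (order, customer, item) to describe a purchase order, then reads the values back with Python's standard xml.etree parser:

```python
import xml.etree.ElementTree as ET

# Unlike HTML's fixed vocabulary, XML lets us invent tags that
# describe the data itself; order, customer, and item are our own names.
doc = """<?xml version="1.0"?>
<order id="PO-1001">
    <customer>Acme Corp</customer>
    <item sku="X42">
        <description>Widget</description>
        <quantity>3</quantity>
    </item>
</order>"""

root = ET.fromstring(doc)          # parse the document
print(root.tag)                    # -> order
print(root.get("id"))              # -> PO-1001
print(root.find("customer").text)  # -> Acme Corp
qty = int(root.find("item/quantity").text)
print(qty)                         # -> 3
```

Because the tags carry meaning, any application that agrees on this vocabulary can consume the document, which is exactly what makes XML useful for exchanging data across business lines.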
Linux is the fastest growing server operating system in the world, thanks to its powerful functionality, rock-solid stability, and open source foundation. Apps developed on Linux—from Web and e-mail servers to departmental and enterprise vertical applications—are reliable, portable, and cost efficient.
Businesses have been searching for a technology solution to enable their infrastructures to be as flexible as their highly fluid business models. They've found an answer in Web services architecture, a set of industry standard methods that enable simplified programmatic connections between applications. Web services focuses on simple, Internet-based standards to address heterogeneous distributed computing.
We recommend that you start with the following products:
IBM WebSphere Studio is an open comprehensive development environment for building, testing, and deploying dynamic on demand e-business applications. Founded on open technologies and built on Eclipse, WebSphere Studio provides a flexible, portal-like integration of multi-language, multi-platform, and multi-device application development tools that maximize your productivity, increase ROI and improve overall time to value.
IBM WebSphere Application Server is a high-performance and extremely scalable transaction engine for dynamic e-business applications. The Open Services Infrastructure allows companies to deploy a core operating environment that works as a reliable foundation capable of handling high volume secure transactions and Web services. WebSphere continues the evolution to a single Web services-enabled, Java 2 Enterprise Edition (J2EE) application server and development environment that addresses the essential elements needed for an on demand operating environment.
You can use WebSphere Business Integration Server V4.2 to quickly integrate new or existing applications or systems on diverse platforms, create and rapidly deploy new business processes, or solve a variety of business integration needs.
Rational XDE provides a frictionless design and development environment. At the core of this eXtended Development Experience is Rational XDE Professional and Rational XDE Modeler, visual design and development tools that are tightly integrated with WebSphere, giving you a single user experience.
Integration is the heart of e-business on demand. Your business becomes more powerful when you integrate horizontally, connecting with the vast amounts of data, legacy systems, and custom business applications inside and outside your business. Integration gives you real time transaction processing, data mining, and decision support systems to bring your business up to Web speed.
At the core of e-business on demand is business integration. Transforming into an on demand business requires building a dynamic infrastructure based on tightly integrated, streamlined critical business processes: processes efficiently linked across your company and with those of key trading partners, suppliers, and customers. These integrated business processes give you flexibility, the ability to respond immediately to almost any customer demand, market opportunity, or external threat.
To gain this flexibility, a well-thought-out integration strategy based on a robust platform is key: a platform for automating and managing value chain processes both inside and outside the firewall, one that slashes cycle times and costs, speeds time to market, and delivers business agility in the face of competitive pressures.
Companies evolving into e-businesses on demand have made WebSphere Business Integration the cornerstone of their integration strategy. WebSphere features a solid integration foundation with the comprehensive e-business capabilities you need in an on demand era.
These five capabilities are:
Model— Design, simulate and plan business processes.
Integrate— Link people, processes, applications, systems and data.
Connect— Extend processes to your customers and partners.
Monitor— Control and track business processes.
Manage— Review, analyze and improve processes and performance.
From enabling you to quickly redesign existing processes or deploy new ones, to easing the tracking and management of business events, a common thread runs across these five capabilities: real business results for both line-of-business management and IT.
You have the choice: deploy these capabilities separately or together to meet your business needs. Either way, you have the leading platform for e-business on demand, WebSphere Business Integration, built with award-winning technology you can rely on as you transform to an on demand business.
DB2 Universal Database, WebSphere MQ, and WebSphere Studio Application Developer are three popular IBM e-business software products. Many customers have used one, two, or all three of these products in a single deployment. These products function very well on their own, but are especially useful for data integration when used together. This section explains how data integration can be achieved with each of these products, using XML as a data interchange medium.
IBM DB2 Universal Database is a stable, reliable, and scalable database that will allow us to store the data used in our business application.
DB2 Universal Database (DB2 UDB) Version 7 provides a scalable, Web-ready database that delivers the performance, scalability, reliability, and availability needed for the most demanding e-commerce, CRM, ERP, and Business Intelligence applications. DB2 UDB is a relational database built on open standards for today's heterogeneous computing environments, able to access and integrate multiple data types from multiple geographically separated sources on different platforms.
IBM DB2 XML Extender is an end-to-end solution for storing and retrieving XML documents.
DB2 XML Extender provides new data types to store XML documents in DB2 databases and new functions to work with these structured documents. Managed by DB2, these documents are stored as character data or external files. Retrieval functions enable you to retrieve complete documents or individual elements.
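The pattern of storing a whole XML document as character data in a table column, then retrieving either the complete document or an individual element, can be sketched without DB2 itself. The snippet below uses SQLite purely as a stand-in relational store (the table and column names are invented); DB2 XML Extender supplies its own XML column types and server-side extraction functions for the same pattern:

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, doc TEXT)")

# Store the complete XML document as character data in a column.
xml_doc = "<order><customer>Acme Corp</customer><total>99.50</total></order>"
conn.execute("INSERT INTO orders (id, doc) VALUES (?, ?)", (1, xml_doc))

# Retrieve the whole document...
(stored,) = conn.execute("SELECT doc FROM orders WHERE id = 1").fetchone()

# ...or extract an individual element, as the Extender's
# retrieval functions do on the server side.
customer = ET.fromstring(stored).find("customer").text
print(customer)  # -> Acme Corp
```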
IBM WebSphere Studio Application Developer allows the professional application developer to quickly and easily build, test, integrate and deploy Java and J2EE applications.
The IBM WebSphere MQ product provides application programming services that enable application programs to communicate with each other using messages and queues. This form of communication is referred to as asynchronous messaging. It provides assured, once-only delivery of messages. Using WebSphere MQ means that you can separate application programs, so that the program sending a message can continue processing without having to wait for a reply from the receiver. If the receiver, or the communication channel to it, is temporarily unavailable, the message can be forwarded at a later time. WebSphere MQ also provides mechanisms for generating acknowledgements of messages received.
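The decoupling that asynchronous messaging buys can be illustrated with an in-process queue. The sketch below uses Python's standard queue module as a stand-in for a message queue; WebSphere MQ adds assured once-only delivery, persistence, and cross-machine transport on top of this basic pattern. The sender enqueues its messages and continues immediately; the receiver drains the queue whenever it becomes available:

```python
import queue
import threading

mq = queue.Queue()  # stand-in for a message queue

def sender():
    # The sender does not wait for a reply; it enqueues and moves on.
    mq.put("order PO-1001: 3 widgets")
    mq.put("order PO-1002: 5 gadgets")

received = []

def receiver():
    # The receiver may start later, or be temporarily unavailable;
    # messages simply wait on the queue until it is ready.
    for _ in range(2):
        received.append(mq.get())

t = threading.Thread(target=sender)
t.start()
t.join()
receiver()
print(received)
```

Because neither side blocks on the other, the sending application keeps processing even when the receiver or its communication channel is down, which is the property the article describes.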
The Extensible Markup Language (XML) is a key technology that facilitates information exchange and e-business transactions.
While these are useful in their own right, integrating them can be even more powerful. Business data is often a central component of integration scenarios, hence the need for DB2. That data will more than likely need to be passed to other systems within the enterprise or to other organizations; XML fits the bill as a pervasive and useful transport medium for this data. The DB2 XML Extender provides the tools to interface these technologies easily. WebSphere MQ is well suited to transporting the business data via messaging. And lastly, WebSphere Studio Application Developer has all the tools necessary to implement the integration, from database to XML support. In addition, if your integration scenario requires HTML pages, servlets, or Enterprise JavaBeans, WebSphere Studio Application Developer is an excellent tool for writing J2EE applications, with an included WebSphere Application Server Single Server test environment.
The following is from "Tivoli Identity Manager 4.4 Logical Component Structure," redbook tip (TIPS0045), published in April 2003.
The logical component design of IBM Tivoli Identity Manager may be separated into three layers of responsibility, which are depicted in Figure A-5. They are:
The Web User Interface layer
The Application layer
The Service layer
Figure A-5: Tivoli Identity Manager logical component architecture
The Web User Interface module is a set of combined subprocesses that provide content to a user's browser and initiate applets (which run on both the client and the server), such as Workflow Design and Form Creation. The Web User Interface is the interconnecting layer between the user's browser and the identity management application layer.
In Figure A-5 on page 276, there are three types of user interaction points: The end user, supervisor, and administrator. These types are merely conceptual, as IBM Tivoli Identity Manager allows the customer to define as many different types of users with different permissions as they'd like.
However, in this figure, it is important to note that the system is built with a general concept of the capabilities of the system users. For example, it is assumed that the administrator needs advanced capabilities and requires a more advanced user interface, possibly requiring a thicker client (applet). It is assumed that the supervisor needs slightly less advanced capabilities but may still require concepts like an organizational chart. Since the number of supervisors in an enterprise may be great, a thick client is not practical. Lastly, there are no assumptions made for the end user. The interface presented to the end user must be a thin client with very basic and intuitive capabilities.
The Web User Interface subsystem contains all modules necessary to provide a Web-based front end to the applications of the Applications subsystem, as shown in Figure A-6.
Figure A-6: Tivoli Identity Manager Web User Interface subsystem
The core of the IBM Tivoli Identity Manager system is the Application Layer. Residing on an application server, the application layer provides the management functionality of all other process objects.
The Applications subsystem contains all modules in the Platform subsystem that provide provisioning-specific capabilities, such as identity management, account management, and policy management. Each application makes use of the core services in the Platform subsystem to achieve its goals. It is the Applications module that provides the external interface to the provisioning platform. Below is a brief description of each module.
The Application Interface module consists of all application specific user interface components. For example, the interface required to create a provisioning policy or an account is organized in this module. This module makes use of other modules in the Web User Interface subsystem, such as the Form Rendering and Search modules. Figure A-7 shows the modules of the application layer.
Figure A-7: Tivoli Identity Manager Application Interface module
If the IBM Tivoli Identity Manager server embodies the complex rules that have been developed, then the application server is the engine that runs those rules or objects. It communicates not only with the user-facing Web server, but also with the agents residing on the managed services and with the directories used to store information.
The Core Services subsystem contains all modules in the Platform subsystem that provide general services that can be used within the context of provisioning, such as authentication, authorization, workflow, and policy enforcement. These services often make use of other services to achieve their goals. Figure A-8 on page 279 shows the involved components.
Figure A-8: Tivoli Identity Manager Core Services subsystem
The IBM Tivoli Identity Manager system uses an LDAPv3 directory server as its primary repository for storing the current state of the enterprise it is managing. This state information includes the identities, accounts, roles, organization chart, policies, and workflow designs.
A relational database is used to store all transactional and schedule information. Typically, this information is temporary for the currently executing transactions, but there is also historical information that is stored indefinitely to provide an audit trail of all transactions that the system has executed.
The back-end resources that are being provisioned by IBM Tivoli Identity Manager are generally very diverse in their capabilities and interfaces. The IBM Tivoli Identity Manager system itself provides an extensible framework for adapting to these differences to communicate directly with the resource. For a more distributed computing alternative, a built-in capability to communicate with a remote agent is provided. The agents typically use an XML-based protocol (DSML—Directory Services Markup Language) as a communications mechanism.
DSML provides a method for processing structured, directory-based information as an XML document. DSML is a simple XML schema definition that enables directories to publish profile information in the form of an XML document so that it can be easily shared via IP protocols, such as HTTPS, which is shown in Figure A-9.
Figure A-9: Tivoli Identity Manager DSML connectivity
Transactions from the IBM Tivoli Identity Manager server are sent securely via HTTPS to the service agent and then processed by the agent.
For example, if a service has just been connected to the IBM Tivoli Identity Manager server, the accounts that already exist on the server may be reconciled or pulled back in order to import the users' details into the IBM Tivoli Identity Manager LDAP directory. If a password change or a provisioning of a new user occurs, the information is transferred to and then processed by the agent. The agent deposits the new information within the application or operating system that is managed.
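The documents that travel between server and agent are ordinary XML, so a directory entry can be expressed and consumed with any XML parser. The sketch below follows the general shape DSML uses, an entry with a distinguished name and a set of named attributes; the specific attribute names here are illustrative, not taken from a particular agent:

```python
import xml.etree.ElementTree as ET

# A directory entry rendered as XML, in the general shape DSML uses:
# an entry identified by a DN, carrying named, multi-valued attributes.
dsml = """<dsml>
  <directory-entries>
    <entry dn="uid=jdoe,ou=people,dc=example,dc=com">
      <attr name="uid"><value>jdoe</value></attr>
      <attr name="mail"><value>jdoe@example.com</value></attr>
    </entry>
  </directory-entries>
</dsml>"""

root = ET.fromstring(dsml)
entry = root.find("directory-entries/entry")
attrs = {a.get("name"): a.find("value").text for a in entry.findall("attr")}
print(entry.get("dn"))
print(attrs["mail"])  # -> jdoe@example.com
```

Because the payload is plain XML, it moves over HTTPS like any other Web traffic, which is what lets the server and its remote agents interoperate without a shared middleware layer.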
Because every adapter carries its own support burden and new adapters are costly to develop, it is becoming easier to use IBM Directory Integrator as the data bus that transports information and data to the services. Using IBM Tivoli Identity Manager's JNDI API, it is possible to connect to just about any service, application, or endpoint, as shown in Figure A-10 on page 281.
Figure A-10: Tivoli Identity Manager IBM Directory Integrator connectivity
By using IDI and IBM Tivoli Identity Manager, communications to resources such as a CSV file or a Microsoft Excel spreadsheet containing contractor employment details may be possible through a parser connector. Tivoli Identity Manager would see this as a service that it manages and that can be incorporated into its workflow.
The following material comes from "Understanding Web Services" by Arthur Ryman (Senior Technical Staff Member, WebSphere Studio Development, IBM Toronto Lab), published in July 2003. IBMers can find this at http://www7b.software.ibm.com/wsdd/library/techarticles/0307_ryman/ryman.html.
In the beginning, there was the Document Web. The Document Web was powered by Web servers that spoke Hypertext Transfer Protocol (HTTP) and delivered information marked up in Hypertext Markup Language (HTML) for display on Web browsers. The Document Web was created to serve the needs of scientists who needed to share research papers. Businesses saw that the Document Web was a good thing, and many businesses created Web sites to advertise their products and services.
Then businesses wanted to do transactions with their customers, so the Application Web was created. The Application Web was powered by Web application servers, such as IBM WebSphere Application Server, which dynamically generated HTML documents from server-side business logic written in programming languages such as Java. Web application servers acquired distributed programming capabilities, so that they could scale over multiple machines to handle large transaction rates. Web application servers also learned to speak Wireless Application Protocol (WAP) and deliver information marked up in Wireless Markup Language (WML), so that mobile customers could participate in the Application Web. e-business flourished.
Then e-businesses wanted to integrate their processes with other e-businesses, so the Service Web was created. The Service Web is powered by Web application servers that speak Simple Object Access Protocol (SOAP), and deliver information marked up in Extensible Markup Language (XML). Today, the Service Web is in its adolescence. We are currently witnessing the rapid maturation and deployment of a stack of interrelated standards that are defining the infrastructure for the Service Web.
The building block of the Service Web is the Web service, a set of related application functions that can be programmatically invoked over the Internet. The information that an application must have in order to programmatically invoke a Web service is given by a Web Services Description Language (WSDL) document. WSDL documents can be indexed in searchable Universal Description, Discovery, and Integration (UDDI) Business Registries so that developers and applications can locate Web services.
This section describes the Service Web, outlines the key standards of SOAP, WSDL, and UDDI, and discusses new tools for developing Web services. Armed with this information, you should be able to understand how Web services can enhance your business and how you can begin developing them.
Web services provide an application integration technology that can be successfully used over the Internet. To illustrate this integration, consider how the Web is currently used for planning travel. Many Web sites sell airplane tickets, and most of them are well-designed for personal travel. However, they may not be useful for planning business travel, because many businesses have internal processes that control travel.
A typical business process for travel might require the use of preferred airlines, car rental agencies, hotels, and travel agents. A travel request might also require management approval to ensure that it is within corporate guidelines and expense constraints. Employees are highly motivated to follow their employer's business processes since failure to do so may result in denial of expense claims. A business may provide a travel planning application to help employees prepare travel requests.
Suppose you are an employee using your company's travel planning application. If the request form asks you to fill in your desired flight times, you naturally want to know what flights are available. Since your travel planning application is not integrated with the airline Web site, you must launch your Web browser, go to the airline Web site, find the schedule query page, and then enter your origin, destination, and dates and times of travel. The airline Web application then returns a list of scheduled flights, from which you choose the flight you want. But since the information on the airline Web page is probably in a different format than that required by the travel application, you must write down or copy the information, switch back to the travel application, enter the information in the travel application, and then submit your request.
A simple task that should have taken seconds, instead takes minutes, because there is no easy way to integrate the airline Web application with the corporate travel planning application.
Figure A-11: Weak Integration on the Application Web
Consider how Web services could change this scenario.
Suppose the airline develops a Web service that allows applications to obtain the list of available flights between two cities on a given date. Now, the corporate travel planning application can programmatically invoke the airline flight schedule Web service, so that the employee does not have to navigate through the airline Web site. The ease of use of the corporate travel planning application is greatly improved. There is also a minor benefit to the airline since now its servers are more responsive. The Web service-based transaction is completed with a single request, whereas the traditional Web site returns many HTML pages and images to the user.
Figure A-12 shows how Web services can improve application integration.
Figure A-12: Improved integration on the service Web
Continuing this scenario, suppose that airlines, hotels, and car rental agencies provide Web services that allow applications to programmatically purchase airplane tickets, book hotel rooms, and reserve rental cars. In most cases, a travel planning application could make all of the necessary arrangements without the aid of a human travel agent.
The use of Web services would reduce costs, improve quality, and increase function. For example, suppose the airline also had a Web service that monitored flight departure and arrival times. The travel planning application could query the airline to determine if your flight was delayed, and if so, it could then notify your hotel and your car rental company to hold your reservations.
Here, the economic motivation for businesses to implement Web services is improved customer service and more efficient management of inventory. For example, if the car rental agency knows that you will be two hours late, it might be able to avoid the cost of transporting cars between agencies in order to meet the worst-case peak demand.
More interestingly, the widespread implementation of Web services would also enable a whole new wave of advanced applications. For example, suppose your flight is oversold. Your travel application calls you in the departure lounge on your WAP phone and asks what you want to do. Your travel application can access an e-marketplace Web service that lists seats on this flight. If you have a confirmed seat, your travel application asks you if you would like to put your seat up for auction, shows you the current bids, and lets you specify an asking price.
Conversely, if you don't have a confirmed seat, your travel application asks you if you want to bid on a seat, shows you the current prices, and lets you specify a bid. If you need to check your agenda in order to make a decision, your travel planning application accesses your corporate calendar application Web service and displays your agenda on the WAP phone. If you don't have a confirmed seat but really need to get to that meeting on time, the travel application verifies that corporate expense guidelines are met and approves your bid. If your bid is accepted, the travel application submits the bill to the corporate expense statement Web service.
Figure A-13 shows how powerful new applications can be assembled from the Web services of multiple suppliers.
Figure A-13: The new Web services economy
As the number of Web services increases, their interactions will become more complex, and we may even begin to attribute characteristics such as intelligence or self-awareness to the Service Web. The usual vision of artificial intelligence as arising through some very complex thinking program must be reevaluated. More likely, intelligence may arise as a network effect. After all, the brain itself is simply a very complex network of relatively simple neurons. It is likely that the first time a collection of interacting Web services displays a useful, unanticipated behavior, we will start to think of it as intelligent.
For example, suppose you also rent a cell phone with your car, because you want to avoid long distance charges or because your phone has no local coverage. Suppose also that while driving back from your meeting to the airport, your newly rented cell phone rings and your travel application tells you that the flight has been delayed. The first time this happens to you, you might be surprised and you might think the travel application was intelligent, especially if this capability had not been explicitly programmed into it.
This type of apparently intelligent behavior could arise through the following sequence of simple Web service interactions. When you rented the car and cell phone, the travel application requested the rental agreement from the car rental agency and submitted it to the corporate expense application. When the expense application received the invoice, it scanned the items and noticed a cell phone. The expense application then updated the corporate phone directory with the cell phone number. Later, the travel application requested the departure time from the airline and noticed that it had changed. The travel application then queried the corporate phone directory, retrieved your new cell phone number, and called you.
Most travelers would consider these kinds of notifications very convenient, but other kinds of unanticipated interactions might be undesirable. As we go forth and bravely build the Service Web, we should build appropriate safeguards and controls into it in order to prevent undesired consequences. Considerations of security and privacy will be more important on the Service Web than they are today on the Application Web, and could well become the differentiators between successful e-businesses and their competitors.
The Service Web is being built on Internet standards. One of the key attributes of Internet standards is that they focus on protocols and not on implementations. The Internet is composed of heterogeneous technologies that successfully interoperate through shared protocols, not shared middleware. No single vendor can impose a standard on the Internet, and no single programming technology will dominate. Fortunately, Web services standards are being cooperatively developed by IBM, Microsoft, Ariba, and many others, and are being submitted to the World Wide Web Consortium (W3C).
Open source software development plays a crucial role in preserving the interoperability of vendor implementations of standards. To understand this, consider the Apache Web server, which has rapidly gained market share to become the dominant Web server. The Apache Web server acts as the reference implementation of HTTP. Consider what would happen if the next release of Microsoft Internet Explorer or Netscape Navigator included a change to HTTP that made it only work with Microsoft Internet Information Server or Netscape Enterprise Server. The market share of such a vendor-specific Web browser would drop to zero. The existence of the Apache Web server prevents any Web browser or server vendor from fragmenting HTTP. IBM is actively contributing implementations of Web services standards to open source projects in order to ensure interoperability on the Service Web.
This section discusses three major Web services standards:
SOAP— A standard for messaging over HTTP and other Internet protocols
WSDL— A language for precisely describing the programmatic interfaces of Web services
UDDI— A business registry standard for indexing Web services, so that their descriptions can be located by development tools and applications.
Figure A-14 shows how these standards are related to each other:
Figure A-14: How SOAP, WSDL, and UDDI are related
Application servers host Web services and make them accessible using protocols such as HTTP GET, HTTP POST, and SOAP/HTTP. The Web services are described by WSDL documents, which can be stored on the application server or in special XML repositories. A WSDL document may reference other WSDL documents and XML Schema (XSD) documents that describe data types used by Web services. An XML repository is useful for managing WSDL and XSD documents. The WSDL documents contain the URLs of the Web services. The Web services are described and indexed in a UDDI business registry that contains the URLs of the WSDL documents.
SOAP is a standard for sending and receiving messages over the Internet. SOAP was initially proposed by Microsoft as a way to do Remote Procedure Call (RPC) over HTTP. IBM and other vendors contributed to the development of the SOAP 1.1 standard, which was then submitted to the W3C. The current SOAP specification allows transports other than HTTP, and other messaging styles in addition to RPC.
The SOAP specification defines an XML envelope for transmitting messages, a method for encoding programmatic data structures as XML, and a binding for using SOAP over HTTP.
The SOAP specification defines the SOAP encoding style, which defines a way to express data structures in XML. The SOAP protocol permits the use of other encoding styles, but this is a potential source of fragmentation: if a Web service uses a different encoding style, its reach is limited to only those applications that support that particular encoding style. Alternative encoding styles should be used only after careful consideration of the intended users of the Web service. SOAP messages are not required to use any encoding style at all, in which case the message body is treated as literal XML.
The SOAP specification defines a binding for using SOAP over HTTP, but other transports may be used. For example, Simple Mail Transport Protocol (SMTP) can be used to send SOAP messages to e-mail servers. Using SMTP is useful for asynchronous message delivery.
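A SOAP message is just XML wrapped in a standard envelope. The sketch below assembles a minimal RPC-style request (the GetFlights operation and its urn:example:airline namespace are hypothetical, echoing the travel scenario later in this section) and checks its structure with a parser; in practice the envelope would be POSTed over HTTP to the service endpoint:

```python
import xml.etree.ElementTree as ET

# The SOAP 1.1 envelope namespace; the airline namespace and
# GetFlights operation below are invented for illustration.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

request = f"""<soap:Envelope xmlns:soap="{SOAP_NS}">
  <soap:Body>
    <GetFlights xmlns="urn:example:airline">
      <origin>YYZ</origin>
      <destination>SFO</destination>
    </GetFlights>
  </soap:Body>
</soap:Envelope>"""

root = ET.fromstring(request)
body = root.find(f"{{{SOAP_NS}}}Body")          # the standard envelope part
call = body.find("{urn:example:airline}GetFlights")  # the RPC payload
origin = call.find("{urn:example:airline}origin").text
print(call.tag)
print(origin)  # -> YYZ
```

The envelope and body elements are fixed by the specification; everything inside the body belongs to the service's own namespace, which is how one standard wrapper can carry arbitrarily many different operations.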
SOAP is a relatively simple protocol, and is easy to implement, as witnessed by the explosion of implementations. One of the first implementations was done in Java by DevelopMentor. IBM then released another Java implementation, called IBM SOAP4J, on the IBM alphaWorks Web site. The SOAP4J implementation was later contributed to the Apache XML Project and released as Apache SOAP 2.0, which runs on Apache Tomcat, IBM WebSphere Application Server, and other servlet engines. Microsoft released a SOAP Toolkit for use with Visual Basic. Many implementations of SOAP are now available, as shown by the list of Web services at XMethods.
SOAP has rapidly matured. In the Java world, many Java Specification Requests (JSRs) were created to standardize APIs. The main JSRs are JSR 101, which defines the Java binding for SOAP, and JSR 109, which defines the deployment model for Web services. These JSRs are included in J2EE 1.4. The Apache Axis project implements JSR 101, while WebSphere Application Server V5.0.2 implements both JSR 101 and JSR 109. In the Microsoft world, Web services are supported by ASP.NET, which is part of the .NET Framework.
After the SOAP 1.1 specification was submitted to the W3C, the W3C formed the Web Services Activity to advance Web services standards and created the XML Protocol Working Group to turn SOAP into a W3C standard. Recently, the SOAP 1.2 specification became a W3C Recommendation.
WSDL—For an application to use a Web service, the programmatic interface of the Web service must be precisely described. In this sense, WSDL plays a role analogous to the Interface Definition Language (IDL) used in distributed programming. The description must include such details as the protocol, host and port number used, the operations that can be performed, the formats of the input and output messages, and the exceptions that can be thrown.
There were several proposals for languages to solve this problem. Microsoft first proposed the Service Description Language (SDL) and provided an implementation of it in their SOAP Toolkit. After reviewing the proposal, IBM countered with the Network Accessible Service Specification Language (NASSL) and released an implementation of it in SOAP4J as the NASSL Toolkit. Ideas from NASSL influenced Microsoft's SOAP Contract Language (SCL), which was also intended to describe Web service composition (also referred to as workflow or orchestration). These proposals, with input from several other vendors, converged into the WSDL 1.1 specification, which was then contributed to the W3C. IBM and Microsoft continued to work on Web services composition and released the Business Process Execution Language for Web Services (BPEL4WS) specification.
A notable feature of WSDL is that interfaces are defined abstractly using XML Schema and then bound to concrete representations appropriate for the protocol. WSDL has predefined bindings for the protocols HTTP GET, HTTP POST, and SOAP/HTTP, but is extensible. For example, in the case of HTTP GET, the binding specifies that the input message is encoded in a URL instead of an XML request body.
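The abstract/concrete split can be seen in a sketch of a WSDL 1.1 document. All names and the urn:example namespace below are invented for illustration; only the WSDL, SOAP binding, and XML Schema namespaces are standard.

```xml
<definitions name="StockQuote"
    targetNamespace="urn:example:stockquote"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:tns="urn:example:stockquote"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Abstract part: messages and the port type (the interface) -->
  <message name="GetQuoteInput">
    <part name="symbol" type="xsd:string"/>
  </message>
  <message name="GetQuoteOutput">
    <part name="price" type="xsd:float"/>
  </message>
  <portType name="StockQuotePortType">
    <operation name="getQuote">
      <input message="tns:GetQuoteInput"/>
      <output message="tns:GetQuoteOutput"/>
    </operation>
  </portType>

  <!-- Concrete part: a SOAP/HTTP binding and a service endpoint -->
  <binding name="StockQuoteSoapBinding" type="tns:StockQuotePortType">
    <soap:binding style="rpc"
        transport="http://schemas.xmlsoap.org/soap/http"/>
    <operation name="getQuote">
      <soap:operation soapAction=""/>
      <input>
        <soap:body use="encoded"
            encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"
            namespace="urn:example:stockquote"/>
      </input>
      <output>
        <soap:body use="encoded"
            encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"
            namespace="urn:example:stockquote"/>
      </output>
    </operation>
  </binding>
  <service name="StockQuoteService">
    <port name="StockQuotePort" binding="tns:StockQuoteSoapBinding">
      <soap:address location="http://example.com/stockquote"/>
    </port>
  </service>
</definitions>
```

The portType could just as easily be given an HTTP GET binding; only the concrete half of the document would change.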
WSDL permits the description document to be composed from parts using an import mechanism. It is useful to put the abstract definitions in a separate file, so that they can be imported by many concrete bindings. For example, the travel industry could define an abstract flight schedule Web service interface, which would then have concrete bindings by many airlines. Each airline would create its own WSDL file that specified the protocols, hosts, and port numbers used by the airline's application servers, and would import the abstract definition of the Web service from the travel industry's XML repository.
WSDL is being standardized by the W3C Web Services Description Working Group, which is part of the Web Services Activity, and drafts of the WSDL 1.2 specification are available for public review.
UDDI addresses the problem of how to find Web services. UDDI defines a business registry where Web service providers can register services, and where developers and applications can find them. IBM, Microsoft, and Ariba implemented the original UDDI registries, but other vendors now also have implementations. A service provider only has to register a Web service at one of the business registries, because updates to any registry will be replicated in all of the other registries that are part of the UDDI Business Registry Network.
Figure A-15 shows how the UDDI Business Registry Network is used by providers and users of Web services.
Figure A-15: The UDDI Business Registry Network
UDDI is based on four types of entity—Business Entity, Business Service, Binding Template, and Technology Model:
A Business Entity describes the business that is providing the services. The Business Entity description includes categorization information so that searches can be performed for specific types of businesses.
A Business Service is a class of services within a business. Each Business Service belongs to some Business Entity.
The Binding Template and Technology Model (tModel) together define a Web service as described by WSDL. The tModel corresponds to the abstract description, and the Binding Template corresponds to its concrete binding to a protocol. Each Binding Template belongs to some Business Service, but many Binding Templates can refer to the same tModel.

A UDDI business registry is itself a SOAP Web service. It provides operations to create, modify, delete, and query each of the four entity types.
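To make the four-entity hierarchy concrete, the following sketch shows how a UDDI v2 registry entry nests them. The keys, names, and URL are invented for illustration:

```xml
<businessEntity businessKey="uuid:0001">
  <name>Example Airline</name>
  <businessServices>
    <businessService serviceKey="uuid:0002">
      <name>Flight schedule service</name>
      <bindingTemplates>
        <bindingTemplate bindingKey="uuid:0003">
          <!-- The concrete endpoint of the Web service -->
          <accessPoint URLType="http">http://example.com/flights</accessPoint>
          <tModelInstanceDetails>
            <!-- Points at the tModel holding the abstract
                 interface description (the WSDL portType) -->
            <tModelInstanceInfo tModelKey="uuid:0004"/>
          </tModelInstanceDetails>
        </bindingTemplate>
      </bindingTemplates>
    </businessService>
  </businessServices>
</businessEntity>
```

Many airlines could publish Binding Templates whose tModelInstanceInfo elements reference the same industry-wide tModel.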
You can use UDDI at both development time and run time. At development time, you can search the registry for suitable services and use it to locate the appropriate WSDL file. Development tools can then be used to generate client proxies from the WSDL files so that the application can access the service. At run time, the application does not have to use UDDI if the target Web service is fixed and always available. However, if the Web service becomes unavailable, the application can query the registry to determine if the service has been moved to another location. This capability is useful when service providers have to rehost their services. UDDI can also be used at run time, for example, to determine all available services that implement some interface. For example, all car rental agencies could be located and queried at run time to find the lowest rates or the best availability at that moment.
The UDDI specification is being standardized by OASIS, along with new Web services standards for security, reliability, and other enterprise computing qualities of service.
Although SOAP, WSDL, and UDDI form the infrastructure for Web services, developers should be shielded from their details, so that they can focus on the business problem being solved. Development tools that facilitate Web services development are now available.
Early versions of several IBM development tools, such as the WSDL Toolkit, the Web Services Toolkit, and the IBM XML and Web Services Development Environment, appeared on the IBM alphaWorks Web site, which continues to host emerging Web services technology. The IBM XML and Web Services Development Environment is now integrated into the IBM WebSphere Studio product, which contains a full suite of Java, J2EE, Web, XML, data, and Web services tools along with tools for building, debugging, testing, deploying, and profiling applications.
Figure A-16 shows the WebSphere Studio desktop.
Figure A-16: WebSphere Studio desktop
The Navigator pane shows the projects, folders, and files in the currently open workspace. The Editor pane contains two open files: StockQuoteService.java and StockQuoteServiceSoapBindingStub.java. Many files may be open at once, but only one at a time can be selected for editing. You select the files using the tabs at the top of the pane. When a file is selected, its contents are displayed and it is outlined in the Outline pane. Parts of the file can be selected by clicking in the Outline pane. The All Tasks pane lists problems in the workspace.
Double-clicking on a task opens the file and scrolls it to the line where the problem occurred.
WebSphere Studio has a built-in Java development environment and a set of XML and Web services development tools. The XML tools include editors for XSD and WSDL. The Web services tools include a wizard that can create a Web service from a Java class, an EJB, or a set of SQL statements. The wizard can then generate a WSDL file that describes the Web service, generate a Java client proxy and server skeleton from a WSDL file, generate a Web test client from a WSDL file, and explore the Web services in a UDDI Business Registry. The Web services wizard frees you from dealing with the details of XML, SOAP, WSDL, and UDDI, and lets you focus on the business problem at hand. For more information on the Web services tools in WebSphere Studio, see the IBM Systems Journal article or the book chapter in Java Web Services Unleashed.
The key business value of Web services is that they enable interoperability of applications that use different hardware, operating systems, programming languages, or middleware. While SOAP, WSDL, and UDDI are a good starting point, they contain too many features to ensure consistent implementation and interoperability. For example, early experience with SOAP exposed problems in achieving interoperability when using SOAP encoding. To ensure that Web services are interoperable, IBM, Microsoft, and other vendors formed the Web Services Interoperability Organization (WS-I.org). The membership of WS-I.org includes all major vendors and many customers, so there is good reason to believe that interoperability will become a reality in the near future.
WS-I.org established a Basic Profile 1.0 that clarifies the specifications for SOAP, WSDL, and UDDI, and identifies a subset of features that can be implemented by all vendors in an interoperable way. The Basic Profile 1.0 is in the final stages of approval and is part of J2EE 1.4. WebSphere Application Server V5.0.2 and WebSphere Studio V5.1 will enable the creation of WS-I.org compliant Web services. The Basic Profile 1.1 will add support for SOAP with Attachments and WS-Security. SOAP with Attachments allows binary documents, such as images, to be efficiently transported with XML SOAP messages. WS-Security defines how portions of SOAP messages can be digitally signed or encrypted. As key new Web services standards emerge, they will be added to future WS-I.org profiles.
The next generation of advanced Web applications will be constructed from Web services building blocks. By developing Web services, businesses enable their applications to be integrated into the processes of their suppliers, partners, and customers, yielding improved function, reduced costs, and competitive advantage.
Web services are based on open, implementation-neutral standards such as XML, SOAP, WSDL, and UDDI. The Web services infrastructure is now ready for serious consideration by businesses.
Development tools that support Web services standards are available and IBM has a full-featured suite of Web services tools in WebSphere Studio.
The following material comes from "Understanding the foundation of JSR-109" by Jeffrey Liu and Yen Lu (IBM software developers), published August 2003. IBMers can find this at http://www-106.ibm.com/developerworks/webservices/library/ws-jsrart/.
JSR-109 facilitates the building of interoperable Web services in the Java 2 Platform, Enterprise Edition (J2EE) environment. It standardizes the deployment of Web services in a J2EE container. This article discusses the server and client programming models defined by JSR-109 and provides code examples.
A crucial objective of Web services is to achieve interoperability across heterogeneous platforms and runtimes. Integrating Web services into the Java 2 Platform, Enterprise Edition (J2EE) environment is a big step forward in achieving this goal. JAX-RPC (JSR-101) took the first step in this direction by defining a standard set of Java APIs and a programming model for developing and deploying Web services on the Java platform. JSR-109 builds upon JAX-RPC. It defines a standard mechanism for deploying a Web service in the J2EE environment, more specifically, in the area of Enterprise JavaBean (EJB) technology and servlet containers. Both of these specifications will be integrated into the J2EE 1.4 specification. Together they serve as the basis of Web services for J2EE.
Roles in Web services for J2EE development—The J2EE platform defines several roles in the application development cycle: J2EE product provider, application component provider (developer), application assembler, deployer, system administrator, and tool provider. To integrate Web services development into the J2EE platform, JSR-109 defines additional responsibilities for some of these existing roles. The J2EE product provider is assigned the task of providing the Web services run-time support defined by JAX-RPC, a Web services container, the Web services for J2EE platform APIs, the features defined by JAX-RPC and JSR-109, and tools for Web services for J2EE development. In the actual Web services for J2EE development flow, the developer, assembler, and deployer are assigned specific responsibilities.
Figure A-17 illustrates the Web services for J2EE development flow.
Figure A-17: Web services for J2EE development flow
In general, a developer is responsible for providing the following:
Web service definition
Implementation of the Web service
Structural information for the Web service
Implementation of handlers
Java programming language and WSDL mappings
Packaging of all Web service related artifacts into a J2EE module
Figure A-18 on page 298 summarizes a developer's responsibilities in the server and client programming models; we present further details on each of these programming models later in this article. Figure A-18 also illustrates which specific resources each of these responsibilities touches. For example, the Web service definition is provided in the form of a WSDL file, while the structural information of the Web service and the Java-to-WSDL mapping are defined in webservices.xml and jaxrpcmapping.xml, respectively. These two XML files are deployment descriptors defined by the server programming model.
Figure A-18: Web services developer role
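A sketch of a webservices.xml deployment descriptor shows how a port component ties together the WSDL file, the mapping file, the Service Endpoint Interface, and the implementation bean. The file paths, class name, and link name below are invented; the element names come from the JSR-109 descriptor:

```xml
<webservices>
  <webservice-description>
    <webservice-description-name>StockQuoteService</webservice-description-name>
    <wsdl-file>WEB-INF/wsdl/StockQuote.wsdl</wsdl-file>
    <jaxrpc-mapping-file>WEB-INF/jaxrpcmapping.xml</jaxrpc-mapping-file>
    <port-component>
      <port-component-name>StockQuotePort</port-component-name>
      <wsdl-port>StockQuotePort</wsdl-port>
      <service-endpoint-interface>com.example.StockQuote</service-endpoint-interface>
      <!-- servlet-link names a servlet-based implementation bean;
           an EJB implementation would use ejb-link instead -->
      <service-impl-bean>
        <servlet-link>StockQuoteServlet</servlet-link>
      </service-impl-bean>
    </port-component>
  </webservice-description>
</webservices>
```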
An assembler takes the J2EE modules developed by a developer, and assembles them into a complete J2EE application. An assembler is responsible for the following:
Assembling modules into an Enterprise Application Archive (EAR)
Configuring the modules within the application, for example, resolving module dependencies
Figure A-19 on page 299 summarizes an assembler's responsibilities.
Figure A-19: Web services assembler role
A deployer is responsible for the following:
Resolving endpoint addresses for Web services contained in the EAR
Generating stubs for Web services using deployment tools provided by a Web Services for J2EE product provider
Resolving the WSDL documents used for each service reference
Resolving port access declared by a port component reference
Deploying the enterprise application
Figure A-20 on page 301 summarizes a deployer's responsibilities.
Figure A-20: Web services deployer role
Benefits of JSR-109—Prior to JSR-109, the procedure for deploying a Web service was tightly coupled to the destination run time: deploying a Web service to Apache Axis is quite different from deploying to Apache SOAP. Organizations that have deployed Web services in the past would require some strong arguments for switching to the JSR-109 deployment model, given their previous investments. The motivation for JSR-109 is to promote the building of interoperable and portable Web services in the J2EE environment. JSR-109 leverages existing J2EE technology and provides industry standards for Web services deployment. It clearly defines the requirements that a Web services for J2EE product provider must support. It allows a J2EE Web service to be configured via XML-based deployment descriptors and hides the complexity of the system from Web service developers, assemblers, and deployers. Knowing how JSR-109 works allows you to configure a J2EE Web service without having to explore and learn the implementation details of the underlying system. Finally, as JSR-109 is adopted by application server vendors, migration and deployment should become routine procedures.
Server programming model—The server programming model standardizes the deployment of Web services in the J2EE environment. More specifically, it defines the deployment of a servlet-based implementation bean in the Web container and the deployment of a stateless session EJB implementation in the EJB container. Both types of service implementation use a Service Endpoint Interface (SEI)—refer to JAX-RPC (JSR-101)—to define the method signatures of the Web service. The SEI must follow the Java-to-WSDL mapping rules defined by JAX-RPC. The service implementation bean must implement all the methods defined in the SEI; for an EJB service implementation, the SEI methods must be a subset of the EJB component's remote interface methods.

The lifecycle of the service implementation bean is controlled by its associated container. In general, a service implementation bean goes through four phases: instantiation, initialization, execution, and removal. The lifecycle starts when the container creates an instance of the service implementation bean by calling its newInstance method. The container then initializes the bean via a container-specific method: for a servlet-based implementation bean, the init method is called; for an EJB implementation, the setSessionContext and ejbCreate methods are called. The service implementation bean is then ready to handle service requests. The lifecycle ends when the bean is removed from the container, which results in the destroy and ejbRemove methods being called for the Web and EJB implementations, respectively. Figure A-21 on page 303 shows the lifecycle of a servlet-based implementation bean and Figure A-22 on page 303 shows the lifecycle of a session EJB component.
Figure A-21: Lifecycle of a servlet-based implementation bean
Figure A-22: Lifecycle of a session EJB
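A minimal sketch of the two server-side artifacts might look like this in Java. All names and the canned quote are invented for illustration; the SEI follows the JAX-RPC rules (it extends java.rmi.Remote and every method declares java.rmi.RemoteException), and in a real deployment the Web container, not application code, instantiates and initializes the bean:

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Service Endpoint Interface (SEI): per the JAX-RPC rules it extends
// java.rmi.Remote, and every method declares java.rmi.RemoteException.
interface StockQuote extends Remote {
    float getQuote(String symbol) throws RemoteException;
}

// Servlet-based service implementation bean: a plain Java class that
// implements every method of the SEI. The Web container creates it via
// newInstance(), initializes it (init), dispatches service requests to
// it, and finally removes it (destroy).
public class StockQuoteBean implements StockQuote {
    // Narrowing the throws clause is legal for an implementing method.
    public float getQuote(String symbol) {
        // Hypothetical canned quote, for illustration only.
        return "IBM".equals(symbol) ? 85.0f : 0.0f;
    }
}
```

At deployment time, a webservices.xml port component entry would bind this bean to its WSDL port.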
IBMers can find more information at http://www-106.ibm.com/developerworks/webservices/library/ws-jsrart/.
Client programming model—The client programming model provides a client view of a Web service in the J2EE environment. Run-time details, such as protocol binding and transport, are transparent to a client; a client invokes a Web service in the same way as it invokes a local method. JSR-109 specifies three mechanisms for invoking a Web service:
Dynamic Invocation Interface (DII)
A static stub is a Java class that is statically bound to an SEI and to a WSDL port and port component. It is also tied to a specific protocol binding and transport. A stub class defines all the methods that its SEI defines, so a client can invoke the methods of a Web service directly via the stub class. It is extremely easy to use and requires no configuration. However, the major disadvantage of a stub class is that the slightest change to a Web service's definition may render the stub class useless, in which case it must be regenerated. A stub class is very handy when you want to create a client for a production Web service that is stable and unlikely to change its definition.
Dynamic proxy, as its name suggests, supports an SEI dynamically at run time without requiring any code generation of a stub class that implements a specific SEI. A dynamic proxy can be obtained during run time by providing an SEI. It is not as statically bound as a stub class, but it requires an SEI to be instantiated. It is invoked in the same way as a stub class is invoked. A dynamic proxy can be very useful during the Web services development cycle. A Web service's definition—more specifically, its SEI—may undergo a lot of changes over a period of time. Using a stub class to test the Web service could be very tedious, because a stub class needs to be regenerated every time the SEI changes. In this case, a dynamic proxy is more suitable because it only needs to be reinstantiated (rather than regenerated as in the case of a static stub) whenever the SEI changes.
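JAX-RPC dynamic proxies rest on the same idea as the JDK's java.lang.reflect.Proxy: supply an interface at run time and synthesize an implementation behind it. The following sketch is not the JAX-RPC API (a real client obtains its proxy via javax.xml.rpc.Service.getPort); the Greeter interface and the canned reply are invented purely to illustrate why no generated stub class is needed:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class DynamicProxySketch {
    // A hypothetical SEI; in JAX-RPC it would extend java.rmi.Remote.
    interface Greeter {
        String greet(String name);
    }

    // Synthesizes an implementation of Greeter at run time; no stub
    // class for Greeter is ever generated or compiled.
    public static Greeter newGreeterProxy() {
        InvocationHandler handler = new InvocationHandler() {
            public Object invoke(Object proxy, Method m, Object[] args) {
                // A real JAX-RPC proxy would marshal the call into a
                // SOAP request here; this sketch replies locally.
                return "Hello, " + args[0];
            }
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class[] { Greeter.class },
                handler);
    }

    public static void main(String[] args) {
        // prints "Hello, world"
        System.out.println(newGreeterProxy().greet("world"));
    }
}
```

If the SEI changes, only the interface needs to be recompiled; the proxy is reinstantiated rather than regenerated.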
IBMers can find more information at http://www-106.ibm.com/developerworks/webservices/library/ws-jsrart/.
JAX-RPC mapping—The last piece of the JSR-109 puzzle is the mechanism for standardizing the Java <=> WSDL mappings. JAX-RPC provides the rules for these mappings, while JSR-109 provides a standardized representation for them in an XML-based deployment descriptor.
IBMers can find more information at http://www-106.ibm.com/developerworks/webservices/library/ws-jsrart/.
Future improvements—JSR-109 provides a preliminary standardized mechanism for deploying Web services in the J2EE environment. There is still room for improvement, especially in the area of security. The demand for secure Web services is as important as the demand for interoperable Web services, but the challenge of standardizing a security model across heterogeneous platforms remains. JSR-109 defines the security requirements that it attempts to address, but the actual standardization is deferred to a future release of the specification. Security requirements include the following:
Credential-based authentication. For example, HTTP BASIC-AUTH.
Authorization defined by the enterprise security model.
Integrity and confidentiality using XML Encryption and XML Signature.
Besides security, there are areas for improvement that are directly related to JAX-RPC. For example, JAX-RPC defines an in-depth Java <=> XML serialization and deserialization framework, but JSR-109 does not provide any support in this area. In addition, JSR-109 does not provide a complete representation of the type mapping rules defined by JAX-RPC: it lacks support for MIME types. For example, JAX-RPC allows java.lang.String to be mapped to the MIME type text/plain, but JSR-109 cannot support this mapping because there is no standard representation for MIME types in the JAX-RPC mapping file.
Hopefully, some of these issues will be addressed in a future release of the JSR-109 specification. This will provide a more integrated, interoperable, and portable architecture for building Web services.
The following is from "The Case for Portlets," by Ann Marie Fred and Stan Lindesmith (IBM staff software engineers), published February 1, 2003. IBMers can find this article at http://www-106.ibm.com/developerworks/ibm/library/i-portletintro/.
Java servlets, Java Server Pages (JSP), and Microsoft Active Server Pages (ASP) technologies are the tried-and-true development platforms for Web applications. They are mature platforms, with accepted standards and plenty of tools for developing and testing them.
Servlets are essentially a standard way of extending the functionality of a Web server. They were designed to replace CGI (Common Gateway Interface) scripts. There is a standard API (application programming interface) that developers "fill in" to implement their Web applications. The servlet engine provides some useful and time-saving features, including converting HTTP requests from the network to an easy-to-use HttpServletRequest object, providing an output stream for the programmer to use for the response, and converting the easy-to-use HttpServletResponse object to an HTTP response that can be sent back over the network. The servlet engine also provides convenient session management features that allow you to remember users between requests, and you can allocate resources (such as database connections) that can be used for multiple requests. Like portlets, servlets act as a bridge between your business data and a Web browser interface. Servlets have the advantage of maximum portability: they can run within more Web servers or Web application servers and on more platforms than any other Web application technology available today.
ASP pages provide functionality similar to servlets and JSP pages. ASP pages are used to create an interactive Web page experience for the user. ASP pages also enable developers to use an MVC design to separate the business logic from the presentation by using ActiveX controls. They also integrate easily with Microsoft's .NET platform. However, unlike servlets and JSP pages, ASP pages are not compatible with Java portlets, and there is not an easy way to convert them into portlets. Because of the ActiveX controls, ASP pages are restricted to Windows platforms (unless you can find a third-party porting/bridging program for the platform you are targeting). Overall, ASP pages are not compatible with nearly as many Web servers, Web application servers, and operating systems as the Java technologies, so we prefer servlets and JSP pages.
Portlets are a hot new Web application development technology. Although JSR-168 provides a portlet specification, portlets are not fully standardized yet. For example, WebSphere Portal Server portlets will not work in the BEA portal server because the BEA application server does not support the WebSphere Portlet API. There are also not as many tools available to help you develop portlets as there are for the more established technologies like servlets.
However, this is not a reason to avoid them. Once the standards are finalized, it should be simple to convert existing portlets to meet the new standards. Toolkits for developing them are already available, and development tools are improving constantly. Check out the developerWorks WebSphere Studio Zone for portal and mobile device toolkits for WebSphere Studio.
There are not nearly as many trained portlet developers as there are experienced servlet, JSP, and ASP developers. However, developers who have experience with these older technologies, especially servlets and JSP technologies, can quickly and easily learn how to write portlets. It is more cost-effective (and less tedious) for developers to write portlets than servlets because the portal server provides many features that the developers would otherwise have to provide themselves.
Portlets are actually a special type of servlet, and they also use JSP pages to render their user interfaces. The Portlet API extends and subclasses the Servlet API, meaning that portlets can do anything that servlets do, with some changes in their behavior and some extra features. The most significant behavior change is in how requests are served: servlets process doGet and doPost requests, which map to HTTP GET and POST requests, while portlets process doView and doEdit requests, which come from the portal server instead of directly from a Web browser. A minor improvement is the PortletLog, which provides better logging and tracing features than the standard Servlet API. A second improvement provided by the WebSphere Portal Server is better support for internationalization of your portlets. The Portlet API also gives you access to configuration information about your portlet application and your portlet instance.
Portlets also have many standard features that are not available to servlets at all. One key feature is built-in support to automatically use different JSP pages for different user devices. This lets you write portlets that work on many devices, including desktop computers with modern Web browsers, older or palmtop computers with limited Web browsers, personal digital assistants (PDAs), and Web-enabled wireless phones. You do not need to program to a lowest common denominator; you can provide multiple interfaces for different device types, reusing the same underlying business logic, and the portal server will choose the most appropriate one for each user. You can even have multiple portlet controllers, which allows you to have different page/action sequences for each device type.
Another important feature provided by the Portlet API is the ability to handle messages and events. The ActionListener turns user clicks in your pages into action events, an easy-to-use model familiar to many Java programmers. The WindowListener notifies your portlet of window events from the portal, such as when your portlet is maximized, minimized, or restored. The MessageListener lets you handle messages sent from one portlet to another.
Yet another key feature of portlets is the ability to customize them in a number of ways. When developers use the standard Cascading Style Sheet provided by the portal server, all of their portlets will have the same look and feel. Companies can create themes and skins for their portlets that reflect their individual style, and all of the portlets (even those developed by a third party) will change to that new style automatically.
You can also customize your portlets on a user group basis. You can assign some user groups permissions to use portlets, and those without permissions to use the portlets will not even see them in their portal. Your portlet can also change its output based on the user group(s) that the current user is in because the portal server provides an easy way to determine group membership. The portal server can use your existing LDAP directory to import user groups, or you can create a new user directory.
The third type of portlet customization is user preferences. Subject to limits that the portal administrator sets, users can decide which portlets are in their portal, what pages those portlets are on, and any other preferences your developers allow them to change. For example, you can allow your users to customize their stock ticker portlet with their own portfolio of stocks, and store that information in the PortletData object for that user. When the user logs out and returns they will have their saved preferences from the previous session.
There are many other valuable features provided by the WebSphere Portal Server and its extensions, including content management, transcoding, voice support, and offline browsing. If you choose to develop portlets you will also have these features available to you, but they are outside the scope of this article. More information on these advanced features is available on the WebSphere Portal Server home page and the developerWorks WebSphere Portal Zone.
Web services are another flashy new Web application development technology. Web services are modular applications that can be described, published, located, and invoked over the Web. These modular applications are loosely coupled, dynamically bound components. For a good introduction to Web services and how they can be applied to real world business problems, see the series "Best practices for Web services" (developerWorks, January 2003).
Like portlets, Web services are not fully standardized yet, but standards work is well underway, and you can use Web services today. There are also some development tools available already, including the WebSphere SDK for Web Services.
A Web service is a logical extension of object-oriented design. A developer can implement a function and then define that function (via an API) as a Web service.
After the application is created and the interfaces defined, the Web service is made available (published) so that a customer (requester) of the service can use it. The architecture has three distinct users: a Web service provider, who defines the modular application; a Web service broker, who publishes the application; and a Web service requester, who uses it in an application.
If your goal is to bring together your Web applications and information into one convenient place, portlets are the obvious choice. If your development goals are somewhat different, consider these other portlet and portal server features that you might want to take advantage of:
Portlets can be extended to work on many client devices. Your users can move from computer to computer, and mobile device to mobile device, and still use the information and applications they need.
Portlets allow you to easily customize their content for different user groups, and individual users can rearrange and tailor them to their needs.
You can make your portlets have a unified look, and change their appearance quickly, using Cascading Style Sheets along with themes and skins that the portal server provides. You can create your own themes and skins as well, to better reflect your company's image and style.
Portlets can be published as Web services, so that companies outside of your portal server's environment can easily write programs to use them.
The WebSphere Portal Server provides excellent support for internationalization, beyond what your Web application server provides. It is straightforward to develop portlets that display correctly for international users, even in double-byte languages like Chinese or bidirectional languages like Arabic.
Portlets help divide complex applications into tasks: in general, one group of closely related tasks equals one portlet. WebSphere Portal Server's administration portlets are a good example, where administration tasks are broken down into categories (Portlets, Portal Settings, etc.), groups of related tasks (Manage Users, Manage User Groups), and single tasks (Search for users, Create new user).
Portlets make it easy to add features to your applications later. If the new feature is large, you should create a new portlet. For small updates, you can update the existing portlets without losing users' individual preferences.
Portlets, like other Web applications, play well with firewalls. They use standard Web protocols to receive and display information.
You only need to install and configure portlets once for all of your users, which is much easier than working with standalone applications on each computer. This applies to the other Web application technologies as well.
The portal server works with the Web application server to provide security, installation support, reliability, and availability for many users, so you don't need to spend a large part of your development effort working on these features.
Once you do invest in a portal server, you may find its advanced features useful: content management, transcoding, voice support, and offline browsing, among others.
Portlets are not the solution to every design challenge. Here are a few things that portlets do not do well:
Complex user interfaces do not translate well to portlets. Markup languages like HTML and WML simply cannot describe some interfaces. Try to imagine implementing an Integrated Development Environment (IDE) like Eclipse or Visual Basic in HTML and you'll have the idea. Native applications and Java applications work better for this. (If you have a complex user interface and still want to take advantage of the benefits of portlets, WebSphere Portal Version 4.2 does support Struts, which can be very helpful. Struts is part of the Apache Jakarta project; more information is available on the Struts site.)
User interfaces whose data must be constantly updated are also not portlet material. When you update one portlet, all portlets on the page must be re-drawn, so it is generally not good practice to have your portlets automatically reload themselves with new data. You can, however, provide a refresh option in the portlet so your users can choose when to reload the page.
Highly interactive user interfaces do not translate well to Web applications in general, or portlets in particular. If you want your interface to change automatically when a user takes some action, like selecting an entry in a drop-down list, you can either submit the form and reload the entire page (annoying), or use a scripting language to re-draw the portlet (very difficult). If you use a scripting language, you will need to make sure it works for all of the devices you want to support, and you will also need to make sure your portlet still works if scripts are disabled by some of your users. For mobile devices, you will probably need to have alternate JSP pages that do not use scripts. Native applications or Java applications are easier to make highly interactive than are Web applications.
Portlets need to live "within their box." Be careful if you have a link in a portlet that takes you to a Web page outside of the portal server environment, because it is difficult to get back to the portal after that. Frames are not allowed (Internal frames are allowed, but only Microsoft Internet Explorer users can see them). Pop-up windows and scripts usually cannot be used for mobile devices. If you can't make your application fit into the portal framework, don't make it into a poorly behaved portlet.
If you will want to provide services to other applications, consider writing a Web service first. Once you implement a Web service, you can write a portlet to use it, and you can publish the Web service to share it with other applications. The stock portlet above is a good example: the stock quote service should be a Web service that the stock quote portlet and other applications can use. In this case, you might also write a program that automatically sends users a text pager message when a stock reaches a certain price.
If your company does not have a portal server yet, and does not plan to invest in one immediately, go ahead and implement your application as a servlet using JSP pages for the output. You can always convert it to a portlet later.
Portlets are not fully standardized, and they are not yet supported on as many platforms as the other Java technologies. Until standards mature, choosing portlets ties you to one portal server and the server platforms it supports. If that is not an acceptable trade-off, implement your application as a servlet first; you can convert it to a portlet later.
The following is from "A preview of Lotus Discovery Server 2.0" by Wendi Pohs and Dick McCarrick, published in May 2002. IBMers can find this article at http://www-10.lotus.com/ldd/today.nsf/lookup/DSpreview.
Few would dispute that knowledge ranks among an organization's most precious assets—if you can find it when you need it. But knowledge is not always easy to locate. Ideally, the combined intellect and experience of the entire group should be readily available to everyone all the time.
With some of the tools available today, you can search for all the information your organization or the Internet has collected on a topic, but too often the results are disappointing. The information is too voluminous, unfocused, or out of context—or simply too difficult for the average person to understand. What users really need is experience—the perspective of someone who has done it before. That experience could come in the form of a document, or (even better) a person to whom you can look for assistance. So what's needed is a product that speeds the knowledge management process in a way that brings new scalability to the task of analyzing your ever-growing volume of content. It should also eliminate the need to manually process and manage knowledge. The Lotus Discovery Server answers these needs.
The Lotus Discovery Server is a back-end server for managing your organization's knowledge. The Discovery Server provides sophisticated tools that categorize documents and user information into browsable and searchable form. These tools include the following.
The Knowledge Map or K-map (also called the taxonomy or catalog) is a graphical representation of your organization's knowledge. It displays a hierarchical set of categories and documents you can use to find information. The K-map is the backbone of the Discovery Server search-and-browse user interface. From the K-map interface, you can locate content from many disparate sources, by drilling down through subject categories, using full-text search, or using a combination of both search strategies. Additional information about the relationships between people and document activity adds value and context to the user's search and retrieval experience. Because the K-map displays related documents, people, and places in categories, users can browse and search for information in context.
The K-map Building Service creates the K-map, which you can subsequently modify using the K-map Editor (explained next). The K-map Building Service builds document categories, creates labels for these categories, and places new documents into existing categories. It also identifies documents that do not fit into any existing categories.
The K-map Editor is a client application that lets you fine-tune the K-map to meet the needs of your organization. Neither the K-map Building Service nor any other automatic process can predict precisely how an organization wants to structure its content. It can only build a K-map based on the words in the content. Once the basic K-map is built, you can use the K-map Editor to drag categories from one level to the next, re-label them with preferred terms, and place documents in different categories. This in turn helps teach Discovery Server how to categorize documents with similar content in the future.
Profiles help identify the right people for the right job. Profiles collect existing user information from the directory and other sources, providing a more complete representation of the users in your organization.
Spiders are multithreaded processes that collect data. This data can exist in a number of different file formats, including:
Exchange e-mail and public folders
Windows-compatible operating system files
Lotus Team Workplace
Lotus Notes databases and e-mail
Once the spiders collect this data, the K-map Building Service processes it to create the K-map.
Metrics, also called affinities processing, is a computational program that examines existing documents and the relationships between documents and people. The metrics component does two things: it calculates the value of a document, and it calculates an affinity between a person and a category, based on the person's interactions with documents in that category.
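Discovery Server does not publish its affinity formula, so the following toy simply scores a person's affinity for a category as the fraction of their document interactions that fall in that category. It is only meant to make the idea of deriving affinities from interactions concrete; the class name and the categories are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy affinity calculation: affinity = (interactions in category) / (total interactions).
// The real Discovery Server algorithm is proprietary and certainly richer than this.
public class AffinityToy {

    public static Map<String, Double> affinities(Map<String, Integer> interactionsByCategory) {
        int total = interactionsByCategory.values().stream().mapToInt(Integer::intValue).sum();
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : interactionsByCategory.entrySet()) {
            result.put(e.getKey(), total == 0 ? 0.0 : (double) e.getValue() / total);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> clicks = new HashMap<>();
        clicks.put("Databases", 8);
        clicks.put("Messaging", 2);
        // Databases scores 0.8, Messaging 0.2 in this toy scheme.
        System.out.println(affinities(clicks));
    }
}
```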
Other tools—Additionally, administration tools let you install, set up, and maintain Discovery Server, and security features protect your data.
What's new in Discovery Server 2.0?—New features include:
A revamped architecture for the K-map user interface and editor, with more accessibility and better display of search results
Improved "people awareness" for Profiles
Better spider support for Domino.Doc and Lotus Team Workplace content
Enhanced control over metrics processing
Easier installation, setup, and maintenance
K-map: Improved interface, expanded features—The 2.0 K-map offers a user interface that features a new servlet-based architecture. This gives you more scalability and better performance. Other K-map enhancements include accessibility, improved display of search results, bookmarking, and better support for opening the most appropriate replica of Lotus Notes databases.
Easier-to-use, more accessible interface—The new K-map user interface consists of two main tabs, Browse and Search, and Search Results. In addition, a new URL-addressable page appears for all the following actions:
Browse to a new category
Perform a search
Navigate to a different tab within the same set of search results
Perform a refined search
Navigate between sets of iterated search results
Page through a set of list results
Sort a list
Each page that appears as a result of these actions is put on the browser history, so you can return to it via the Back/Forward buttons in the browser interface. You can also bookmark these pages.
Lotus is also committed to offering an accessible interface for Discovery Server. This includes supporting all functionality with consistent and simple keyboard navigation of Discovery Server controls, fields, and hyperlinks. We will also aid low-vision users by:
Providing enhanced color contrast in the standard interface design
Supporting the Windows "High Contrast" palette schemes in the Appearance tab of the Control Panel - Display settings
Supporting user-defined font settings in both the operating system and the browser
Other K-map user interface features introduced in 2.0 include:
K-map search results now include People summaries. We've also added a "Go to K-map Category" link to the summary of each document in search results.
Supported document types now include documents stored in Exchange public folders. Discovery Server 2.0 also offers better handling of Lotus Notes attachments, OLE embeddings, Domino.Doc documents, and Lotus Team Workplace documents.
Bookmarking lets you bookmark (whether from the Search or Browse page) and have Discovery Server save both the Browse state and the Search parameters.
Saving state maintains a single saved state, so whenever you access the K-map from a particular computer, the interface is configured as it was the last time you logged in. (In response to customer feedback, the K-map no longer returns you to the category you were in when you last closed it; this helps improve performance.) Key information saved through this feature includes:
Document and People List heights in Browse and Search
Column order, in Document and People Lists in Browse and Search; and on Document, People, and Category tabs in Search Results
Sort order in Document and People Lists in Browse and Search
Column widths, in Document and People Lists in Browse and Search; and on Document, People, and Category tabs in Search Results
Summaries on/off, in Document and People Lists in Browse and Search; and on Document, People, and Category tabs in Search Results
The K-map Editor interface now includes an accessible Reports view, which lets you create, schedule, and delete two new reports to aid your editing. These reports show the number of documents per category (subtotaled per branch) and the documents that are new to a category (whether added automatically or manually).
Another new K-map Editor feature is category visibility. This option allows you to keep categories hidden until they are ready to be viewed by your users. When you publish a category, it becomes visible to end users via the K-map interface, as long as its parent categories are also published. (New categories assume the visibility of their parent by default.) Note that hiding a category does not prevent users from seeing its documents if they are returned via K-map search; it does, however, prevent affinities from being generated for that category.
To make it easier to optimize taxonomies, Discovery Server 2.0 offers two new options, Request Subdivide and Request Retrain.
Request Subdivide tells the K-map Building Service to divide the selected category into subcategories. When you select this option, the K-map Building Service attempts to create these subcategories, based on the total number of documents in the category (and the maximum number of documents per category you specified). If the K-map Building Service determines it can create two or more subcategories with reasonable fit values (and fairly evenly distributed documents), it moves documents from the specified category to the new subcategories, leaving the original category empty. If the selected category can't be usefully subcategorized, the K-map Building Service returns an error message.
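The acceptance logic described above can be sketched as a small decision function: a subdivision succeeds only if it yields at least two subcategories, every subcategory fits the configured maximum, and the documents are spread fairly evenly. The real K-map Building Service clusters by document content; here the candidate split sizes are simply given as input, and the evenness threshold is an invented stand-in.

```java
import java.util.List;

// Toy decision logic mirroring the Request Subdivide behavior described in
// the text. The "min * 4 >= max" evenness rule is a made-up heuristic used
// only to illustrate the shape of the check.
public class SubdivideCheck {

    // Returns true if candidateSizes describes an acceptable subdivision.
    public static boolean acceptable(List<Integer> candidateSizes, int maxPerCategory) {
        if (candidateSizes.size() < 2) {
            return false; // need two or more subcategories
        }
        int min = candidateSizes.stream().min(Integer::compare).orElse(0);
        int max = candidateSizes.stream().max(Integer::compare).orElse(0);
        // every subcategory must fit, and no subcategory may dwarf another
        return max <= maxPerCategory && min * 4 >= max;
    }

    public static void main(String[] args) {
        System.out.println(acceptable(List.of(40, 35, 25), 50)); // true: even, all fit
        System.out.println(acceptable(List.of(95, 5), 50));      // false: uneven, too big
    }
}
```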
Request Retrain helps "teach" the K-map Building Service how to categorize documents the way you want. After you manually move documents into a category, this option tells the K-map Building Service to place new documents with similar content into this category in the future. You can retrain selected categories, or the entire K-map taxonomy.
Other K-map Editor enhancements include an icon that indicates whether documents will launch in their native application or in the browser, a Set Document Status option that locks or unlocks selected documents, and a Doc Counts option that displays the number of documents per category.
In response to user demand, we've made the K-map Building Service smarter when handling categorization. For example, in Discovery Server 1.0, if you manually moved a document into a category, K-map Building Service automatically assumed you wanted it to remain there forever, even if the category grew so big it required subcategorization to navigate properly. It would never change the categorization of this document. In Discovery Server 2.0, the K-map Building Service is free to subcategorize these documents as appropriate.
By default, all categories created by the K-map Building Service are hidden, so you can review them prior to making the categories available to users.
You can also import an existing file system taxonomy and use it as the basis for your K-map. To do this, Discovery Server 2.0 lets you import the file/folder document taxonomy from your operating system. You can then use the K-map Editor to modify the taxonomy into the K-map you want.
Profiles include enhanced people awareness, the ability to determine the online status of selected members of your organization. We've also upgraded Person profile documents, which now include a more accessible interface.
People awareness incorporates Lotus Instant Messaging functionality to transform passive name references into dynamic resources. These provide information about the person's current online status. Directly from the name reference, you can contact the person, find out more about them, and initiate other application-specific commands.
When you log on to Discovery Server 2.0 (either via the K-map or through a profile), you automatically log on to the Lotus Instant Messaging server. Then anywhere your name appears within Discovery Server, other users can tell you are online. Additionally, wherever your name is displayed in the K-map or in a profile, a Lotus Instant Messaging status icon indicates your online status. (If you don't have a Lotus Instant Messaging server specified in your Domino Directory Person document, Discovery Server assumes Lotus Instant Messaging is not available and doesn't display your online status.)
People awareness lets you identify your most knowledgeable people in a particular area, and then contact them immediately by initiating a Lotus Instant Messaging dialog. This is especially useful when you need immediate information or need a question answered in real time.
Person profiles—We've given Person profile documents a new servlet-based interface, to ensure the same "look-and-feel" as K-map. This also gives you better support for text resizing using your browser's View - Text Size commands. We've also made the Person profile interface more accessible.
One Discovery Server customer request is the ability to see all names the user is known by throughout the system. So at the bottom of the Contact Information page, we've added a new field called "Other user names."
We've also made some changes to the Affinities interface. We've moved the interface for approving proposed affinities into the profile document. For example, if there are proposed affinities, they appear in a table within the profile document. (Note that if you approve proposed affinities but then cancel out of the profile document without saving it, the approvals are also cancelled.)
And in response to other user feedback, we display the "Declaring Affinities" description in Read Mode on the Affinities page if the person viewing the profile document has edit rights.
Spiders—Better support for Domino.Doc and Lotus Team Workplace
Discovery Server 2.0 supports spidering Domino.Doc and Lotus Team Workplace documents. We've also enhanced other spiders, such as the one for Lotus Notes/Domino, and added a spider for Exchange.
Domino.Doc spider—We offer spidering for both Domino.Doc 3.1 and 3.0. Capabilities include:
Documents and their attachments are treated as single documents.
A Domino.Doc repository is defined at the File Cabinet level.
Forum (discussion) docs are spidered.
There is an administration option to spider all versions, or only the latest.
Domino.Doc-specific field mapping is spidered.
There is support for archived documents.
Lotus Team Workplace spider—We also support spidering for Lotus Team Workplace. Capabilities include:
A Lotus Team Workplace repository is defined at the Main.nsf level.
Embedded pages are treated as one entity.
Lotus Team Workplace-specific field mapping is spidered.
Handling documents with attachments—Spiders in Discovery Server 2.0 have improved the handling of documents that do not contain text but do contain one or more attachments. These so-called "sparse container" documents will be classified with their attachments. The title of the attachment identifies the sparse container document.
Metrics—More detailed and easier to understand
Metrics now consists of four separate services: Profile Maintenance, Metrics Reporting, Affinity Processing, and Metrics Processing. These comprised a single service in Discovery Server 1.0; we separated them in response to user demand for a less "black box" approach to Metrics. Each service can run on only one server at a time, but you can move the services from one server to another and schedule them individually.
Profile Maintenance processes user edits to profile documents.
Metrics Reporting creates Metrics reports.
Metrics Processing computes document values, updates the Discovery Server data with new affinities and new document values, and updates full-text search with new values.
Affinity Processing calculates affinities, proposes and publishes affinities, sends affinity e-mails, and updates profile documents with affinities. This service provides more control over when affinities are generated.
Installation, setup, and maintenance—More customer control
Discovery Server 2.0 offers significantly enhanced administration functionality. This lets you have greater control over installation, setup, and maintenance. And you'll have a better idea of what's going on "under the hood" of Discovery Server.
Installation—The installation dialog box now includes numerous checks and warnings to better guide you through installing and upgrading. Also, we no longer install the K-map Editor with the server. We have discovered that in practice, installing the K-map Editor on the server is rarely done—usually only for demo purposes.
We also modified the Admin Name & Password screen to let you provide your own user name. This can be an account already defined on the system, or one created on the fly if it doesn't exist.
Setup—In Release 2.0, we re-worked the setup screens to be easier and more intuitive. This includes better input validation and error messages.
Maintenance—As a response to customer requests for more control over Discovery Server (and to accommodate new features introduced in Release 2.0), we have incorporated new administration functionality. For example, we made the interface for enabling the XML spider always visible, instead of requiring you to set an INI variable to do this.
We've also added a new interface for replicating K-map data to a secondary server. This includes a "K-map Replication" checkbox in the Server document on the primary Discovery Server, as well as a "K-map Replica" checkbox in the Server document of Secondary Discovery Servers. After you check this option and save the Server document, replication will copy the K-map data from the primary to the specified secondary servers. These secondaries can then serve end users, helping you balance user workload among several machines.
Other maintenance features include:
A new interface for enabling the Exchange spider
A drill-down model to support large numbers of repositories
Better "paging" capability in multi-page views and logs
The ability to move Metrics processing from one server to another
Improved interface for choosing repositories for creating the K-map
Updates to Service Status views
Logging—We've improved logging messages to be more informative. For example, the K-map Building log has been significantly enhanced, with more logging of editor activities. And to keep you better apprised as Lotus Team Workplace and Domino.Doc spiders run, we provide a new message type that lets these spiders post interim begin/end messages as each room/binder is processed.
Discovery Server 2.0 API Toolkit—As mentioned earlier, users have asked for a more open approach. To meet this need, we're providing a complete Discovery Server API Toolkit. This allows you to customize Discovery Server to suit your organization's exact requirements. Third-party software developers will also use this Toolkit to develop their own solutions based on Discovery Server functionality.
Data repositories—Discovery Server 2.0 includes many new features to better manage your data repositories. For example, you can temporarily prevent a repository from being spidered, and have it start up again later, without disabling the spider. You can then have the repository requeued automatically. You can also stop spidering a repository (for instance, because you made a mistake in defining it), delete it, and start over with new parameters. Other new data repository functionality includes:
Preventing two repository records from pointing to the same source
Supporting a "delete and make new copy" action from the main Repositories view
Providing the ability to unqueue a repository
An interface for spidering Exchange e-mail and Public Folders
Options for traversing Domino.Doc and Lotus Team Workplace hierarchy
Options for spidering Domino.Doc and Lotus Team Workplace revisions
The ability to edit Field Map fields after the repository has been queued/spidered
And we've added an interface for specifying a subset of repository data to be spidered, per customer request.
Field mapping—Our goal in this release is to only map what the administrator selects to map—in other words, what you define in the profile forms. This means there will be no "identity mapping" (automatically mapping a field name in the source to the same field name in the profile document). Also in Release 2.0, we provide better coverage for well-known data mappings in the default data field map ($Global).
People sources—New features in this area include:
An interface to specify "Process all documents during next run"
An interface to associate supplemental sources with all or some subset of authoritative sources
Support for additional LDAP search parameters BASEDN and SCOPE
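The BASEDN and SCOPE parameters mentioned above map directly onto a standard JNDI LDAP search, which may make their roles clearer. The host, base DN, and filter below are illustrative, and an actual search would need a live directory to connect to; only the scope-selection helper runs standalone.

```java
import java.util.Hashtable;
import javax.naming.directory.SearchControls;

// Sketch of how BASEDN and SCOPE fit into a JNDI LDAP search.
// BASEDN is the starting point of the search; SCOPE controls how far
// below it the search descends (base entry, one level, or full subtree).
public class LdapScopeExample {

    public static SearchControls controlsFor(String scope) {
        SearchControls controls = new SearchControls();
        switch (scope) {
            case "base": controls.setSearchScope(SearchControls.OBJECT_SCOPE);   break;
            case "one":  controls.setSearchScope(SearchControls.ONELEVEL_SCOPE); break;
            default:     controls.setSearchScope(SearchControls.SUBTREE_SCOPE);  break;
        }
        return controls;
    }

    public static void main(String[] args) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put("java.naming.factory.initial", "com.sun.jndi.ldap.LdapCtxFactory");
        env.put("java.naming.provider.url", "ldap://ldap.example.com:389"); // illustrative host

        String baseDn = "ou=people,dc=example,dc=com"; // the BASEDN parameter
        SearchControls controls = controlsFor("sub");  // the SCOPE parameter
        // new InitialDirContext(env).search(baseDn, "(uid=asmith)", controls)
        // would run the search against a live server.
        System.out.println(controls.getSearchScope() == SearchControls.SUBTREE_SCOPE);
    }
}
```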
Documentation—Discovery Server 2.0 documentation includes a Deployment Guide, available shortly after the product ships. It should be a "must-read" before a customer site begins to install and implement Discovery Server.
We recommend that you start with the following products:
IBM WebSphere Studio is an open, comprehensive development environment for building, testing, and deploying dynamic on demand e-business applications. Founded on open technologies and built on Eclipse, WebSphere Studio provides a flexible, portal-like integration of multi-language, multi-platform, and multi-device application development tools that maximize your productivity, increase ROI, and improve overall time to value.
IBM WebSphere Application Server is a high performance and extremely scalable transaction engine for dynamic e-business applications. The Open Services Infrastructure allows companies to deploy a core operating environment that works as a reliable foundation capable of handling high volume secure transactions and Web services. WebSphere continues the evolution to a single Web services-enabled, Java 2 Enterprise Edition (J2EE) application server and development environment that addresses the essential elements needed for an on demand operating environment.
You can use WebSphere Business Integration Server V4.2 to quickly integrate new or existing applications or systems on diverse platforms, create and rapidly deploy new business processes, or solve a variety of business integration needs.
DB2 Universal Database—DB2 Version 8.1 helps solve critical business problems by integrating information across the entire enterprise by leveraging federated Web services and XML. DB2 is delivering new federated capabilities that enable customers to integrate information as Web services. DB2 also delivers new XML enhancements that make it easier for programmers to integrate DB2 and XML information.
DB2 Information Integrator—This new family of products is designed to help you integrate structured, semi-structured and unstructured information effectively and efficiently. These products, based on the previously disclosed Xperanto project, provide the foundation for a strategic information integration framework to help access, manipulate, and integrate diverse, distributed and real time data.
Lotus Domino 6 represents an evolution of the platform, adding features that increase the strength of Domino as a strategic platform for Web applications, messaging, and e-business. New enhancements for application design and expanded support for standards-based development methodologies give developers of all skill levels even more capabilities to add collaboration to new or existing Web, Lotus Notes, or mobile applications. Domino 6 server provides platform choice in both hardware and operating system, with further enhancements in security, administration, performance, enterprise data integration, and directory options.
Rational Rose family—The Unified Modeling Language (UML) has become the software industry's standard notation for representing software architecture and design models. Many development organizations are finding that modeling with the UML helps them build better software faster. Rational Rose software is your solution for building better software faster with the UML.
Lotus Discovery Server—This knowledge management server for e-business users provides organizations with the easiest way to organize and locate resources across various systems and information sources.
Lotus Instant Messaging is real time collaboration software with instant messaging, whiteboarding, and application sharing capabilities.
Tivoli Identity Manager—Companies are increasing the number of users (customers, employees, partners and suppliers) who are allowed to access information. As IT is challenged to do more with fewer resources, effectively managing user identities throughout their lifecycle is even more important. IBM Tivoli Identity Manager provides policy-based identity management across legacy and e-business environments. Intuitive Web administrative and self-service interfaces integrate with existing business processes to help simplify and automate managing and provisioning users. It incorporates a workflow engine and leverages identity data for activities such as audit and reporting.
Virtualized networks serve up computing as needed. Emerging grid technologies, for example, allow a collection of resources to be shared and managed just as if they were one large, virtualized computer.
The following is from the article of the same name by Laura Haas and Eileen Lin (Information Integration Development, San Jose), published March 2002. IBMers can find this material at http://www7b.software.ibm.com/dmdd/library/techarticle/0203haas/0203haas.html.
In a large modern enterprise, it is almost inevitable that different portions of the organization will use different database management systems to store and search their critical data. Competition, evolving technology, mergers, acquisitions, geographic distribution, and the inevitable decentralization of growth all contribute to this diversity. Yet it is only by combining the information from these systems that the enterprise can realize the full value of the data they contain.
For example, in the finance industry, mergers are an almost commonplace occurrence. The newly created entity inherits the data stores of the original institutions. Many of those stores will be relational database management systems, but often from different manufacturers; for instance, one company may have used primarily Sybase, and another Informix IDS. They may both have one or more document management systems—such as Documentum or IBM Content Manager—for storing text documents such as copies of loans. Each may have applications that compute important information (for example, the risk of a loan to a given customer), or mine for information about customers' buying patterns.
After the merger, they need to be able to access all customer information from both sets of stores, analyze their new portfolios using existing and new applications, and, in general, use the combined resources of both institutions through a common interface. They need to be able to identify common customers and consolidate their accounts, although the different companies may have referred to their customers using totally different identifying keys. Federation technologies can significantly ease the pain in these situations by providing a unified interface to diverse data.
IBM has made significant investments in federation technologies that have resulted in market leading capabilities across the Data Management product portfolio. Today, federation capabilities enable unified access to any digital information, in any format—structured and unstructured, in any information store. Federation capabilities are available today through a variety of IBM products including DB2 UDB (and DB2 Relational Connect), DB2 DataJoiner, and IBM Enterprise Information Portal (EIP). This set of federation technologies continues to be enhanced and our customers' investments in all of these products continue to deliver real business value.
This section focuses specifically on advanced database federation capabilities, implemented through a technology sometimes referred to by the code name Garlic, which represent the next generation of information federation enhancements from IBM software. These enhancements will enable clients to access and integrate the data and specialized computational capabilities of a wide range of relational and non-relational data sources. The Garlic technology will be incorporated into all IBM software offerings that provide federation technology over time. Customers may rest assured that not only will their investments in existing products be protected, but also that in the future, no matter which product is selected, they will be able to leverage the advanced capabilities described here.
IBM's federated database systems offer powerful facilities for combining information from multiple data sources. Built on best-of-breed technology from an earlier product, DB2 DataJoiner, and enhanced with leading-edge features for extensibility and performance from the Garlic research project, IBM's federated database capabilities are unique in the industry. DB2 DataJoiner introduced the concept of a virtual database, created by federating together multiple heterogeneous relational data sources. Users of DB2 DataJoiner could pose arbitrary queries over data stored anywhere in the federated system, without worrying about the data's location, the SQL dialect of the actual data stores, or the capabilities of those stores. Instead, users had the full capabilities of DB2 against any data in the federation. The Garlic project demonstrated the feasibility of extending this idea to build a federated database system that effectively exploits the query capabilities of diverse, possibly non-relational data sources. In both of these systems, as in today's DB2, a middleware query processor develops optimized execution plans and compensates for any functionality that the data sources may lack.
In this section, we describe the key characteristics of IBM's federated technology: transparency, heterogeneity, a high degree of function, autonomy for the underlying federated sources, extensibility, openness, and optimized performance. We then "roll back the covers" to show how IBM's database federation capabilities work. We illustrate how the federated capabilities can be used in a variety of scenarios, and conclude with some directions for the future.
Transparency—If a federated system is transparent, it masks from the user the differences, idiosyncrasies, and implementations of the underlying data sources. Ideally, it makes the set of federated sources look to the user like a single system. The user should not need to be aware of where the data is stored (location transparency), what language or programming interface is supported by the data source (invocation transparency), if SQL is used, what dialect of SQL the source supports (dialect transparency), how the data is physically stored, or whether it is partitioned and/or replicated (physical data independence, fragmentation and replication transparency), or what networking protocols are used (network transparency). The user should see a single uniform interface, complete with a single set of error codes (error code transparency). IBM provides all these features, allowing applications to be written as if all the data were in a single database, although, in fact, the data may be stored in a heterogeneous collection of data sources.
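As a simple illustration of location and dialect transparency (the table, nickname, and column names below are hypothetical), an application can join a local DB2 table with a nickname for a table held in a remote source using ordinary SQL, with no indication in the query of where either table lives:

```sql
-- Hypothetical example: CUSTOMERS is a local DB2 table, while
-- ORA_ACCOUNTS is a nickname for a table stored in a remote Oracle
-- database. The application sees a single SQL dialect and a single
-- set of error codes, regardless of where the rows actually reside.
SELECT c.name, a.balance
FROM   customers c,
       ora_accounts a          -- nickname; actual data lives at the remote source
WHERE  c.cust_id = a.cust_id
  AND  a.balance > 100000;
```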
Heterogeneity is the degree of differentiation in the various data sources. Sources can differ in many ways. They may run on different hardware, use different network protocols, and have different software to manage their data stores. They may have different query languages, different query capabilities, and even different data models. They may handle errors differently, or provide different transaction semantics. They may be as much alike as two Oracle instances, one running Oracle 8i, and the other Oracle 9i, with the same or different schemas. Or they may be as diverse as a high-powered relational database, a simple, structured flat file, a Web site that takes queries in the form of URLs and spits back semi-structured XML according to some DTD, a Web service, and an application that responds to a particular set of function calls. IBM's federated database can accommodate all of these differences, encompassing systems such as these in a seamless, transparent federation.
A high degree of function—IBM's federated capability provides users with the best of both worlds: all the function of its rich, standard-compliant DB2 SQL capability against all the data in the federation, as well as all the function of the underlying data sources. DB2's SQL includes support for many complex query features, including inner and outer joins, nested subqueries and table expressions, recursion, user-defined functions, aggregation, statistical analyses, automatic summary tables, and others too numerous to mention. Many data sources may not provide all of these features. However, users still get the full power of DB2 SQL on these sources' data, because of function compensation. Function compensation means that if a data source cannot do a particular query function, the federated database retrieves the necessary data and applies the function itself. For example, a file system typically cannot do arbitrary sorts. However, users can still request that data from that source (in other words, some subset of a file) be retrieved in some order, or ask that duplicates be eliminated from that data. The federated database will simply retrieve the relevant data, and do the sort itself.
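Function compensation can be sketched with a hypothetical nickname over a structured flat file. The file system cannot sort or eliminate duplicates, so the federated server fetches the qualifying rows and applies `DISTINCT` and `ORDER BY` itself; the query is written exactly as it would be against a relational table:

```sql
-- Sketch (names hypothetical): TRADES_FILE is a nickname for a flat file.
-- The source cannot sort or de-duplicate, so the federated server
-- retrieves the rows and performs DISTINCT and ORDER BY locally.
SELECT DISTINCT ticker, trade_date
FROM   trades_file
ORDER  BY trade_date DESC;
```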
While many sources do not provide all the function of DB2 SQL, it is also true that many sources have specialized functionality that the IBM federated database lacks. For example, document management systems often have scoring functions that let them estimate the relevancy of retrieved documents to a user's search. In the financial industry, time series data is especially important, and systems exist that can compare, plot, analyze, and subset time series data in specialized ways. In the pharmaceutical industry, new drugs are based on existing compounds with particular properties. Special-purpose systems can compare chemical structures, or simulate the binding of two molecules. While such functions could be implemented directly, it is often more efficient and cost-effective to exploit the functionality that already exists in data sources and application systems. IBM allows the user to identify functions of interest from the federated sources, and then to use them in queries, so that no function of a source need be lost to the user of the federated system.
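In DB2's federated SQL, a source's specialized function can be surfaced through a function template plus a function mapping; the template has no local implementation, and the mapping tells the optimizer which server can evaluate it. The sketch below is hedged: the function, server, and nickname names are hypothetical, and the exact options vary by release.

```sql
-- Sketch: expose a document store's relevance-scoring function to SQL.
-- RANK_DOC is a template (no local body); the mapping declares that the
-- DOC_SRV server can evaluate it.
CREATE FUNCTION rank_doc (VARCHAR(200))
  RETURNS DOUBLE
  AS TEMPLATE
  DETERMINISTIC NO EXTERNAL ACTION;

CREATE FUNCTION MAPPING rank_map
  FOR rank_doc (VARCHAR(200))
  SERVER doc_srv;

-- The template can now appear in queries over the source's nickname:
SELECT doc_id, rank_doc('credit risk') AS score
FROM   loan_docs
ORDER  BY score DESC;
```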
Extensibility and openness of the federation—All systems need to evolve over time. In a federated system, new sources may be needed to meet the changing needs of the users' business. IBM makes it easy to add new sources. The federated database engine accesses sources via a software component known as a wrapper. Accessing a new type of data source is done by acquiring or creating a wrapper for that source. The wrapper architecture enables the creation of new wrappers. Once a wrapper exists, simple data definition (DDL) statements allow sources to be dynamically added to the federation without stopping ongoing queries or transactions.
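The DDL sequence for registering a new source typically follows a wrapper/server/user-mapping/nickname pattern. This is a sketch only: the wrapper, server, and authorization names below are hypothetical, and the available options differ by platform and release.

```sql
-- Sketch of registering an Oracle source with a federated DB2 server.
CREATE WRAPPER net8;                              -- wrapper for Oracle sources

CREATE SERVER ora_sales TYPE oracle VERSION '9'   -- define the remote server
  WRAPPER net8
  OPTIONS (NODE 'ora_tns_alias');

CREATE USER MAPPING FOR db2user SERVER ora_sales  -- map local to remote authid
  OPTIONS (REMOTE_AUTHID 'scott', REMOTE_PASSWORD 'tiger');

CREATE NICKNAME sales_orders                      -- local name for remote table
  FOR ora_sales.scott.orders;
```

Once the nickname exists, it can be queried and joined like any local table, and the source participates in federated query planning immediately.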
Any data source can be wrapped. IBM supports the ANSI SQL/MED standard (MED stands for Management of External Data). This standard documents the protocols used by a federated server to communicate with external data sources. Any wrapper written to the SQL/MED interface can be used with IBM's federated database. Thus wrappers can be written by third parties as well as IBM, and used in conjunction with IBM's federated database.
Autonomy for data sources—Typically a data source has existing applications and users. It is important, therefore, that the operation of the source is not affected when it is brought into a federation. IBM's federated database does not disturb the local operation of an existing data source. Existing applications will run unchanged, data is neither moved nor modified, interfaces remain the same. The way the data source processes requests for data is not affected by the execution of global queries against the federated system, though those global queries may touch many different data sources. Likewise, there is no impact on the consistency of the local system when a data source enters or leaves a federation. The sole exception is during federated two phase commit processing for sources that participate. Data sources involved in the same unit of work will need to participate in commit processing and can be requested to roll back the associated changes if necessary.
Unlike other products, our wrapper architecture does not require any software to be installed on the machine that hosts the data source. We communicate with the data source via a client-server architecture, using the source's normal client. In this way, IBM's federated database looks like just another application to the source.
Optimized performance—The optimizer is the component of a relational database management system that determines the best way to execute each query. Relational queries are non-procedural and there are typically several different implementations of each relational operator and many possible orderings of operators to choose from in executing a query. While some optimizers use heuristic rules to choose an execution strategy, IBM's federated database considers the various possible strategies, modeling the likely cost of each, and choosing the one with the least cost. (Typically, cost is measured in terms of system resources consumed.)
In a federated system, the optimizer must decide whether the different operations involved in a query should be done by the federated server or by the source where the data is stored. It must also determine the order of the operations, and what implementations to use to do local portions of the query. To make these decisions, the optimizer must have some way of knowing what each data source can do, and how much it costs. For example, if the data source is a file, it would not make sense to assume it was smart, and ask it to perform a sort or to apply some function. On the other hand, if the source is a relational database system capable of applying predicates and doing joins, it might be a good idea to take advantage of its power if it will reduce the amount of data that needs to be brought back to the federated engine. This will typically depend on the details of the individual query. The IBM optimizer works with the wrappers for the different sources involved in a query to evaluate the possibilities. Often the difference between a good and a bad decision on the execution strategy is several orders of magnitude in performance. IBM's federated database is unique in the industry in its ability to work with wrappers to model the costs of federated queries over diverse sources. As a result, users can expect the best performance possible from their federated system.
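The pushdown decision is visible even in a simple two-table join. In the hypothetical query below, both nicknames refer to tables at the same relational source, so the optimizer can either push the join down to that source or fetch both tables and join at the federated server; it costs both plans and keeps the cheaper one (pushdown usually wins when the join result is much smaller than its inputs):

```sql
-- Hypothetical example: ORA_ORDERS and ORA_CUSTOMERS are nicknames for
-- tables at the same Oracle server. The optimizer may ship the whole join
-- (and the REGION predicate) to Oracle, or evaluate it locally, depending
-- on the estimated costs.
SELECT o.order_id, c.region
FROM   ora_orders o, ora_customers c
WHERE  o.cust_id = c.cust_id
  AND  c.region = 'WEST';
```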
To further enhance performance, each wrapper implementation takes advantage of the operational knobs provided by each data source using the source's native API. For example, blocking multiple result rows into one message (also known as block fetch) is a common performance knob. The query compiler will communicate with the wrapper to indicate which query fragments can utilize block fetch and thus achieve the maximal performance at runtime without loss of query semantics.
Architecture—IBM's federated database architecture is shown in Figure A-23 on page 327. Applications can use any supported interface (including ODBC, JDBC, or a Web service client) to interact with the federated server. The federated server communicates with the data sources by means of software modules called wrappers.
Figure A-23: Architecture of an IBM federated system
Configuring a federated system—A federated system is created by installing the federated engine and then configuring it to talk to the data sources.
Query processing—After the federated system is configured, an application can submit a query written in SQL to a federated server. The federated server optimizes the query, developing an execution plan in which the query has been decomposed into fragments that can be executed at individual data sources. As mentioned above, many decompositions of the query are possible, and the optimizer chooses among alternatives on the basis of minimum estimated total resource consumption. Once a plan has been selected, the federated database drives the execution, invoking the wrappers to execute the fragments assigned to them. To execute a fragment, the wrapper performs whatever data source operations are needed to carry it out, perhaps a series of function calls or a query submitted to the data source in its native query language. The resulting streams of data are returned to the federated server, which combines them, performs any additional processing that could not be accomplished by a data source, and returns the final result to the application.
At the heart of IBM's approach to federated query processing is the manner in which the federated server's optimizer and the wrappers together arrive at a plan for executing the query. The optimizer is responsible for exploring the space of possible query plans. Dynamic programming is the default method used in join enumeration, with the optimizer first generating plans for single-table accesses, then for two-way joins, etc. At each level, the optimizer considers various join orders and join methods, and if all the tables are located at a common data source, it considers performing the join either at the data source or at the federated server. This process is illustrated in Figure A-24.
Figure A-24: Query planning for joins
The optimizer works differently with relational and non-relational wrappers. The optimizer models relational sources in detail, using information provided by the wrapper to generate plans that represent what it expects the source to do.
However, because non-relational sources do not share a common set of operations or a common data model, a more flexible arrangement is required for these sources. Hence the optimizer works with non-relational wrappers as follows:
The IBM federated database submits candidate query fragments called "requests" to a wrapper if the query fragments apply to a single source.
When a non-relational wrapper receives a request, it determines what portion, if any, of the corresponding query fragment can be performed by the data source.
The wrapper returns a reply that describes the accepted portion of the fragment. The reply also includes an estimate of the number of rows that will be produced, an estimate of the total execution time, and a wrapper plan: an encapsulated representation of everything the wrapper will need to know to execute the accepted portion of the fragment.
The federated database optimizer incorporates the reply into a global plan, introducing additional operators as necessary to compensate for portions of fragments that were not accepted by a wrapper. The cost and cardinality information from the replies is used to estimate the total cost of the plan, and the plan with minimum total cost is selected from among all the candidates. When a plan is selected, it need not be executed immediately; it can be stored in the database catalogs and subsequently used one or more times to execute the query. Even if a plan is used immediately, it need not be executed in the same process in which it was created, as illustrated in Figure A-25 on page 330.
Figure A-25: Compilation and runtime for non-relational sources
Why is a federated system useful? How do customers use federation capabilities? In general, a federated system is useful in any situation in which there are multiple sources of data, and a need to combine the information from these various sources. In this section we look at how some customers are using IBM's federated technology to solve their business problems today.
Distributed operations: A major pharmaceutical company—Many companies today are global companies, with a need to coordinate activities in multiple locations throughout the world. For example, a pharmaceutical company might have research labs in both Europe and the US. Each of the labs houses scientists looking for new drugs to battle particular diseases. The scientists all have access to databases of chemical compounds, stored in special-purpose systems that allow searching by particular characteristics of the compounds or by chemical structure (structural similarity). In both labs, the scientists run high throughput screenings of compounds to test their effectiveness against different biological targets. The results of these tests are stored in relational databases at each lab. Other data sources accessed by the scientists include large flat files of genomic and proteomic information, patent databases, spreadsheets of data and analysis, images and text documents.
The scientists in the two labs have different missions, different cures or treatments that they are pursuing. This leads them to do different experiments, and to focus on particular sets of compounds. However, often the same compounds may be useful against different targets, and sometimes one test may be a good indicator of results for other tests. Thus it is important for the scientists at one lab to have access to the data being produced at the other, so as not to duplicate effort. While this could be accomplished by building a large warehouse with all the compound data and test results, there are several drawbacks to that approach. First, the test result data changes rapidly, with thousands of records being added every day from both sides of the Atlantic, making maintenance difficult. Second, the warehouse must either be replicated at both sites, or one site must suffer slower performance for accessing the data. Replication adds to the cost of the solution and the complexity of maintenance. Third, the compound data, today stored in specialized repositories, would need to be migrated to a relational base, including re-implementing the search algorithms and any existing applications.
A federated solution eliminates these issues. Data is left in the existing data sources, with their native access paths, and current applications run unchanged. However, it is easy to build new applications that can access data from any of the sources, regardless of continent. Local data stays local, for rapid access. The less frequently used remote data is still accessible, as needed, and queries are optimized by the federated server to ensure that they are retrieved as efficiently as possible. Replication can still be used if desired for those portions of the data that are heavily accessed by both laboratories.
Heterogeneous replication—Many businesses choose to keep multiple copies of their data. For example, one major retailer with outlets all over the United States backs up data from its various locations at regional warehouses. The retail outlets use one relational database management system; the warehouses are implemented using another DBMS that scales better. However, this poses the problem of how to transfer the data from source to warehouse. IBM's federated technology makes it easy to not only move data, selecting it from the sources and inserting it into the warehouse, but to re-shape it as well, aggregating information from the various outlets before inserting it into the warehouse.
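Moving and reshaping data in one statement can be sketched as a federated `INSERT ... SELECT` (all names below are hypothetical): the federated server reads from a nickname at the outlet's DBMS, aggregates, and inserts the summary rows into the warehouse table.

```sql
-- Sketch: OUTLET_SALES is a nickname for a table at a retail outlet's
-- DBMS; WH_DAILY_SALES is a local warehouse table. The federated server
-- selects from the source, aggregates per store and day, and inserts
-- the reshaped rows into the warehouse.
INSERT INTO wh_daily_sales (store_id, sale_date, total_amount)
SELECT store_id, sale_date, SUM(amount)
FROM   outlet_sales
GROUP  BY store_id, sale_date;
```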
IBM provides a replication product, DB2 DataPropagator™, which helps you integrate your distributed database environment by replicating data between relational databases using the features of the federated database. DataPropagator automates the copying of data between remote systems, avoiding the need to unload and load databases manually. For a non-DB2 relational source, Capture triggers are defined to capture changes to the source and write them to a staging table. The Apply program, which runs on an IBM federated database server, uses a nickname for the staging table to copy the changes from the staging table to a target table in the IBM federated database or in another non-DB2 relational database. Heterogeneous replication is thus made easy thanks to federated technology.
Distributed data warehouse—Implementing a distributed data warehouse has been shown to provide higher availability and lower overall cost. An enterprise can create several data marts that store only high level summaries of data derived from the warehouse. With IBM's federated technology, data marts and warehouse can be on separate systems, yet users of the data mart can still drill down with ease from their local level of summarization into the warehouse. Federated technology shields the users, who have no need to know that the data warehouse is distributed, by providing a virtual data warehouse.
Geospatial application—A bank needs to choose a location for a new branch office. The location chosen must maximize the expected profit. To do so, the bank needs to consider for each location the demographics of the surrounding neighborhood (Do the demographics fit the targeted customer base?), the crime rate in the area (A low crime rate is important for retail operations.), proximity of the site to major highways (to attract customers from outside the immediate area), proximity of major competitors (A site with little competition will most likely mean higher sales.), and proximity to any known problem areas that must be avoided (A dump or other unattractive feature of the neighborhood could negatively impact business.). Some of the needed information will come from the bank's own databases. Other information must be retrieved from external data stores with information on the community. This application illustrates the need to integrate geospatial data with traditional business data. It requires advanced query analysis functions for correlating the data, and end user tools that can visually display the data in a geospatial context.
Traditionally, geospatial data have been managed by specialized geographic information systems (GISes) that cannot integrate spatial data with other business data stored in the company's RDBMS and in external data sources. DB2 Spatial Extender is the product of collaboration with an IBM partner, the Environmental Systems Research Institute (ESRI). DB2 Spatial Extender works with an IBM federated database to give customers the best of both worlds. The customer can take advantage of the geospatial intelligence built into the DB2 Spatial Extender combined with the vast amount of available business information from the federated system. This enables the organization to enhance its understanding of its business, leverage the value of existing data, and build sophisticated new applications, leading to business success.
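A site-selection query of the kind described above might combine a local candidates table, a nickname for an external demographics source, and a Spatial Extender function. This is a hedged sketch: all table and column names are hypothetical, and the spatial function's schema and unit argument may differ by release.

```sql
-- Sketch: screen candidate branch sites by neighborhood income and
-- distance from competitors. EXT_DEMOGRAPHICS is a nickname for an
-- external community-data source; DB2GSE.ST_DISTANCE comes from the
-- DB2 Spatial Extender.
SELECT s.site_id, d.median_income
FROM   candidate_sites s,
       ext_demographics d               -- nickname for an external source
WHERE  s.zip = d.zip
  AND  d.median_income > 75000
  AND  NOT EXISTS (
         SELECT 1
         FROM   competitor_branches c
         WHERE  db2gse.ST_Distance(s.location, c.location, 'KILOMETER') < 1
       );
```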
Despite considerable attention from the research community, few commercial database management systems have addressed the problem of integrating relational and non-relational data sources into a federation. With its federation capabilities, IBM has made significant progress toward this goal. IBM's unique federated query processing technology allows users to enjoy all the power of DB2 SQL coupled with the power of individual data sources. It provides users with all the benefits of transparency, heterogeneity, a high degree of function, autonomy for the underlying federated sources, extensibility, openness, and optimized performance. Federation is being used today to solve many important business needs.
In the future, we will continue to work to improve the performance and functionality of the federation. For example, a form of caching can already be accomplished using an automatic summary table (AST) mechanism, which allows administrators to define materialized views of data in a set of underlying tables—or nicknames. For certain classes of queries, the database can automatically determine whether a query can be answered using an AST, without accessing the base tables. In addition to constantly improving performance, we are also working on tools to aid in the configuration, tuning and administration of federated systems. Tools for generating statistics for data from non-relational sources and for monitoring the behavior of a federated system are underway. Tools to assist wrapper developers are also in development.
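The AST-based caching mentioned above can be sketched as a materialized summary defined over a nickname (names are hypothetical; the option keywords follow DB2's materialized-table syntax, and the restrictions on tables over nicknames vary by release):

```sql
-- Sketch: cache a summary of a remote table locally as an AST.
CREATE TABLE sales_summary AS
  (SELECT store_id, SUM(amount) AS total
   FROM   outlet_sales               -- nickname for a remote table
   GROUP  BY store_id)
  DATA INITIALLY DEFERRED REFRESH DEFERRED;

REFRESH TABLE sales_summary;         -- populate/refresh the cached summary

-- With the refresh age set appropriately, eligible queries can be
-- answered from the local summary without touching the remote source:
SET CURRENT REFRESH AGE ANY;
SELECT store_id, SUM(amount) FROM outlet_sales GROUP BY store_id;
```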
Finally, even a well-designed federated database management system and an accompanying set of tools remain a partial solution to the larger problem of data integration. A comprehensive solution will have to integrate applications as well as data, and address higher-level issues like data quality, annotation, differences in terminology, and business rules that indicate when and how information is to be combined. IBM is focusing on this broader set of information integration requirements to enable customers to satisfy their business integration requirements, and database-style federation is just one key integration technology.
The following is from the article of the same name by Mary Roth and Dan Wolfson (DBTI for e-business, IBM Silicon Valley Lab), published in June 2002. IBMers can find this at http://www7b.software.ibm.com/dmdd/library/techarticle/0206roth/0206roth.html.
The explosion of the Internet and e-business in recent years has caused a secondary explosion of information. Industry analysts predict that more data will be generated in the next three years than in all of recorded history. Enterprise business applications can respond to this information overload in one of two ways: they can bend and break under the sheer volume and diversity of such data, or they can harness this information and transform it into a valuable asset by which to gain a competitive advantage in the marketplace.
Because the adoption of Internet-based business transaction models has significantly outpaced the development of tools and technologies to deal with the information explosion, many businesses find themselves unintentionally using the former approach. Significant development resources are spent on quick and dirty integration solutions that cobble together different data management systems (databases, content management systems, enterprise application systems) and transform data from one format to another (structured, XML, byte streams). Revenue is lost when applications suffer from scalability and availability problems. New business opportunities are simply overlooked because the critical nuggets of information required to make a business decision are lost among the masses of data being generated.
In this section, we propose a technology platform and tools to harness the information explosion and provide an end-to-end solution for transparently managing both the volume and diversity of data that is in the marketplace today. We call this technology information integration. IBM provides a family of data management products that enable a systematic approach to solve the information integration challenges that businesses face today. Many of these products and technologies are showcased in the Information Integration technology preview.
The foundation of the platform is a state-of-the art database architecture that seamlessly provides both relational and native XML as first class data models. We believe that database technology provides the strongest foundation for an information integration platform for three significant reasons:
DBMSs have proven to be hugely successful in managing the information explosion that occurred in traditional business applications over the past 30 years. DBMSs deal quite naturally with the storage, retrieval, transformation, scalability, reliability, and availability challenges associated with robust data management.
The database industry has shown that it can adapt quickly to accommodate the diversity of data and access patterns introduced by e-business applications over the past six years. For example, most enterprise-strength DBMSs have built-in object-relational support, XML capabilities, and support for federated access to external data sources.
There is a huge worldwide investment in DBMS technology today, including databases, supporting tools, application development environments, and skilled administrators and developers. A platform that exploits and enhances the DBMS architecture at all levels is in the best position to provide robust end-to-end information integration.
This section is organized as follows:
We briefly review the evolution of the DBMS architecture.
We provide a real-world scenario that illustrates the scope of the information integration problem and sketches out the requirements for a technology platform.
We formally call out the requirements for a technology platform.
We present a model for an information integration platform that satisfies these requirements and provides an end-to-end solution to the integration problem as the next evolutionary step of the DBMS architecture.
Figure A-26 on page 336 captures the evolution of relational database technology. Relational databases were born out of a need to store, manipulate and manage the integrity of large volumes of data. In the 1960s, network and hierarchical systems such as CODASYL and IMS™ were the state-of-the-art technology for automated banking, accounting, and order processing systems enabled by the introduction of commercial mainframe computers. While these systems provided a good basis for the early systems, their basic architecture mixed the physical manipulation of data with its logical manipulation. When the physical location of data changed, such as from one area of a disk to another, applications had to be updated to reference the new location.
Figure A-26: Evolution of DBMS architecture
A revolutionary paper by Codd in 1970 and its commercial implementations changed all that. Codd's relational model introduced the notion of data independence, which separated the physical representation of data from the logical representation presented to applications. Data could be moved from one part of the disk to another or stored in a different format without causing applications to be rewritten. Application developers were freed from the tedious physical details of data manipulation, and could focus instead on the logical manipulation of data in the context of their specific application.
Not only did the relational model ease the burden of application developers, but it also caused a paradigm shift in the data management industry. The separation of what data is retrieved from how it is retrieved provided an architecture by which the new database vendors could improve and innovate their products. SQL became the standard language for describing what data should be retrieved. New storage schemes, access strategies, and indexing algorithms were developed to speed up how data was stored and retrieved from disk, and advances in concurrency control, logging, and recovery mechanisms further improved data integrity guarantees. Cost-based optimization techniques completed the transition from databases acting as an abstract data management layer to being high-performance, high-volume query processing engines.
As companies globalized and as their data quickly became distributed among their national and international offices, the boundaries of DBMS technology were tested again. Distributed systems such as R* and Tandem showed that the basic DBMS architecture could easily be exploited to manage large volumes of distributed data. Distributed data led to the introduction of new parallel query processing techniques (PARA), demonstrating the scalability of the DBMS as a high-performance, high-volume query processing engine.
The lessons learned in extending the DBMS with distributed and parallel algorithms also led to advances in extensibility, whereby the monolithic DBMS architecture was replumbed with plug-and-play components (Starburst). Such an architecture enabled new abstract data types, access strategies and indexing schemes to be easily introduced as new business needs arose. Database vendors later made these hooks publicly available to customers as Oracle data cartridges, Informix DataBlade™, and DB2 Extenders.
Throughout the 1980s, the database market matured and companies attempted to standardize on a single database vendor. However, the reality of doing business generally made such a strategy unrealistic. From independent departmental buying decisions to mergers and acquisitions, the scenario of multiple database products and other management systems in a single IT shop became the norm rather than the exception. Businesses sought a way to streamline the administrative and development costs associated with such a heterogeneous environment, and the database industry responded with federation. Federated databases provided a powerful and flexible means for transparent access to heterogeneous, distributed data sources.
We are now in a new revolutionary period enabled by the Internet and fueled by the e-business explosion. Over the past six years, Java and XML have become the vehicles for portable code and portable data. To adapt, database vendors have been able to draw on earlier advances in database extensibility and abstract data types to quickly provide object-relational data models, mechanisms to store and retrieve relational data as XML documents (XTABLES), and XML extensions to SQL (SQLX).
The ease with which complex Internet-based applications can be developed and deployed has dramatically accelerated the pace of automating business processes. The premise of this section is that the challenge facing businesses today is information integration. Enterprise applications require interaction not only with databases, but also content management systems, data warehouses, workflow systems, and other enterprise applications that have developed on a parallel course with relational databases. In the next section, we illustrate the information integration challenge using a scenario drawn from a real-world problem.
To meet the needs of its high-end customers and manage high-profile accounts, a financial services company would like to develop a system to automate the process of managing, augmenting, and distributing research information as quickly as possible. The company subscribes to several commercial research publications that send data in the Research Information Exchange Markup Language (RIXML), an XML vocabulary that pairs investment research content with a standard format for describing the report's metadata. Reports may be delivered via a variety of mechanisms, such as real time message feeds, e-mail distribution lists, Web downloads, and CD-ROMs.
Figure A-27 shows how such research information flows through the company.
Figure A-27: Financial services scenario
When a research report is received, it is archived in its native XML format.
Next, important metadata such as company name, stock price, earnings estimates, etc., is extracted from the document and stored in relational tables to make it available for real time and deep analysis.
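This shredding step can be sketched with Python's standard XML and SQLite modules; the report structure below is simplified and hypothetical (real RIXML documents are far richer), and the table layout is invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A simplified, hypothetical research report; real RIXML is much richer.
report = """<research>
  <company name="Acme Corp" ticker="ACME"/>
  <stockPrice>42.50</stockPrice>
  <earningsEstimate>3.10</earningsEstimate>
</research>"""

doc = ET.fromstring(report)

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE report_metadata (
    company TEXT, ticker TEXT, stock_price REAL, earnings_estimate REAL)""")

# Shred the metadata into a relational row; the original XML document
# would be archived separately in its native format.
conn.execute("INSERT INTO report_metadata VALUES (?, ?, ?, ?)",
             (doc.find("company").get("name"),
              doc.find("company").get("ticker"),
              float(doc.findtext("stockPrice")),
              float(doc.findtext("earningsEstimate"))))

row = conn.execute("SELECT ticker, stock_price FROM report_metadata").fetchone()
print(row)  # ('ACME', 42.5)
```

Once the metadata lands in relational tables, it is available to the same SQL tooling, triggers, and analysis infrastructure as any other structured data.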
As an example of real time analysis, the relational table updates may result in database triggers being fired to detect and recommend changes in buy/sell/hold positions, which are quickly sent off to equity and bond traders and brokers. Timeliness is of the essence to this audience and so the information is immediately replicated across multiple sites. The triggers also initiate e-mail notifications to key customers.
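The trigger mechanism behind this real time path can be sketched as follows, with SQLite standing in for an enterprise DBMS and a hypothetical notifications table standing in for the replication and e-mail delivery machinery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE positions (ticker TEXT PRIMARY KEY, rating TEXT);
CREATE TABLE notifications (ticker TEXT, old_rating TEXT, new_rating TEXT);

-- Fires whenever a rating changes, queuing a notification row that a
-- delivery process would replicate and e-mail to traders and brokers.
CREATE TRIGGER rating_changed AFTER UPDATE OF rating ON positions
WHEN OLD.rating <> NEW.rating
BEGIN
    INSERT INTO notifications VALUES (NEW.ticker, OLD.rating, NEW.rating);
END;
""")

conn.execute("INSERT INTO positions VALUES ('ACME', 'hold')")
conn.execute("UPDATE positions SET rating = 'buy' WHERE ticker = 'ACME'")

notes = conn.execute("SELECT * FROM notifications").fetchall()
print(notes)  # [('ACME', 'hold', 'buy')]
```

The point of pushing this logic into a trigger is latency: the recommendation is generated at the moment the metadata update commits, not when a polling application next looks at the table.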
As an example of deep analysis, the original document and its extracted metadata are analyzed more thoroughly, looking for such keywords as "merger," "acquisition," or "bankruptcy" to categorize and summarize the content. The summarized information is combined with historical information and made available to the company's market research and investment banking departments.
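The keyword-driven categorization step might be sketched like this; the taxonomy is hypothetical, and a production system would use trained text-mining components rather than literal string matching.

```python
# Hypothetical keyword taxonomy; a real deep-analysis pipeline would use
# trained classification and summarization components, not string matching.
CATEGORIES = {
    "merger": "M&A",
    "acquisition": "M&A",
    "bankruptcy": "credit-risk",
}

def categorize(text):
    """Return the sorted set of categories whose keywords appear in the text."""
    lowered = text.lower()
    return sorted({cat for kw, cat in CATEGORIES.items() if kw in lowered})

summary = "Acme Corp announced the acquisition of Widget Inc."
print(categorize(summary))  # ['M&A']
```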
These departments combine the summarized information with financial information stored in spreadsheets and other documents to perform trend forecasting, and to identify merger and acquisition opportunities.
To build the financial services integration system on today's technology, a company must cobble together a host of management systems and applications that do not naturally coexist with each other. DBMSs, content management systems, data mining packages, and workflow systems are commercially available, but the company must develop software in-house to integrate them. A database management system can handle the structured data, but XML repositories are just now becoming available on the market. Each time a new data source is added or the information must flow to a new target, the customer's home-grown solution must be extended.
The financial services example above and others like it show that the boundaries that have traditionally existed between DBMSs, content management systems, mid-tier caches, and data warehouses are increasingly blurring, and there is a great need for a platform that provides a unified view of all of these services. We believe that a robust information integration platform must meet the following requirements:
Seamless integration of structured, semi-structured, and unstructured data from multiple heterogeneous sources. Data sources include data storage systems such as databases, file systems, real time data feeds, and image and document repositories, as well as data that is tightly integrated with vertical applications such as SAP or Calypso. There must be strong support for standard metadata interchange, schema mapping, schema-less processing, and support for standard data interchange formats. The integration platform must support both consolidation, in which data is collected from multiple sources and stored in a central repository, and federation, in which data from multiple autonomous sources is accessed as part of a search, but is not moved into the platform itself. As shown in the financial services example, the platform must also provide transparent transformation support to enable data reuse by multiple applications.
Robust support for storing, exchanging, and transforming XML data. For many enterprise information integration problems, a relational data model is too restrictive to be effectively used to represent semi-structured and unstructured data. It is clear that XML is capable of representing more diverse data formats than relational, and as a result it has become the lingua franca of enterprise integration. Horizontal standards such as ebXML and SOAP provide a language for independent processes to exchange data, and vertical standards such as RIXML are designed to handle data exchange for a specific industry. As a result, the technology platform must be XML-aware and optimized for XML at all levels. A native XML store is absolutely necessary, along with efficient algorithms for XML data retrieval. Efficient search requires XML query language support such as SQLX and XQuery.
Built-in support for advanced search capabilities and analysis over integrated data. The integration platform must be bilingual. Legacy OLTP and data warehouses speak SQL, yet integration applications have adopted XML. Content management systems employ specialized APIs to manage and query a diverse set of artifacts such as documents, music, images, and videos. An inverse relationship naturally exists between overall system performance and the path length between data transformation operations and the source of the data. As a result, the technology platform must provide efficient access to data regardless of whether it is locally managed or generated by external sources, and whether it is structured or unstructured. Data to be consolidated may require cleansing, transformation and extraction before it can be stored. To support applications that require deep analysis such as the investment banking department in the example above, the platform must provide integrated support for full text search, classification, clustering and summarization algorithms traditionally associated with text search and data mining.
Transparently embed information access in business processes. Enterprises rely heavily on workflow systems to choreograph business processes. The financial services example above is an example of a macroflow, a multi-transaction sequence of steps that captures a business process. Each of these steps may in turn be a microflow, a sequence of steps executed within a single transaction, such as the insert of extracted data from the research report and the database trigger that fires as a result. A solid integration platform must provide a workflow framework that transparently enables interaction with multiple data sources and applications. Additionally, many business processes are inherently asynchronous. Data sources and applications come up and go down on a regular basis. Data feeds may be interrupted by hardware or network failures. Furthermore, end users such as busy stock traders may not want to poll for information, but instead prefer to be notified when events of interest occur. An integration platform must embed messaging, Web services and queuing technology to tolerate sporadic availability, latencies and failures in data sources and to enable application asynchrony.
Support for standards and multiple platforms. It goes without saying that an integration platform must run on multiple platforms and support all relevant open standards. The set of data sources and applications generating data will not decrease, and a robust integration platform must be flexible enough to transparently incorporate new sources and applications as they appear. Integration with OLTP systems and data warehouses requires strong support for traditional SQL. To be an effective platform for business integration, it must also support emerging cross-industry standards such as SQLX and XQuery, as well as vertical standards such as RIXML.
Easy to use and maintain. Customers today already require integration services and have pieced together in-house solutions to integrate data and applications, and these solutions are costly to develop and maintain. To be effective, a technology platform to replace these in-house solutions must reduce development and administration costs. From both an administrative and development point of view, the technology platform should be as invisible as possible. The platform should include a common data model for all data sources and a consistent programming model. Metadata management and application development tools must be provided to assist administrators, developers, and users in both constructing and exploiting information integration systems.
Figure A-28 on page 342 illustrates our proposal for a robust information integration platform.
Figure A-28: An information integration platform
The foundation of the platform is the data tier, which provides storage, retrieval, and transformation of data from base sources in different formats. We believe that it is crucial to base this foundation layer upon an enhanced full-featured federated DBMS architecture.
A services tier built on top of the foundation draws from content management systems and enterprise integration applications to provide the infrastructure to transparently embed data access services into enterprise applications and business processes.
The top tier provides a standards-based programming model and query language to the rich set of services and data provided by the data and services tiers.
The data tier
As shown in Figure A-28, the data tier is an enhanced high performance federated DBMS. We have already described the evolution of the DBMS as a robust, high-performance and extendable technology for managing structured data. We believe that a foundation based on a DBMS architecture allows us to exploit and extend these key advances to semi-structured and unstructured data.
Storage and retrieval. Data may be stored as structured relational tables, semi-structured XML documents, or in unstructured formats such as byte streams, scanned documents, and so on. Because XML is the lingua franca of enterprise applications, a first class XML repository that stores and retrieves XML documents in their native format is an integral component of the data tier. This repository is a true native XML store that understands and exploits the XML data model, not just a rehashed relational record manager, index manager, and buffer manager. It can act as a repository for XML documents as well as a staging area to merge and consolidate federated data. In this role, metadata about the XML data is as critical as the XML data itself. This hybrid XML/relational storage and retrieval infrastructure not only ensures high performance and data durability for both data formats, but also provides the 24x7 availability and extensive administrative capabilities expected of enterprise database management systems.
Federation. In addition to a locally managed XML and relational data store, the data tier exploits federated database technology with a flexible wrapper architecture to integrate external data sources. The external data sources may be traditional data servers, such as external databases, document management systems, and file systems, or they may be enterprise applications such as CICS® or SAP, or even an instance of a workflow. These sources may in turn serve up structured, semi-structured or unstructured data.
The services tier
The services tier draws on features from enterprise application integration systems, content management systems and exploits the enhanced data access capabilities enabled by the data tier to provide embedded application integration services.
Query processing. In addition to providing storage and retrieval services for disparate data, the data tier provides sophisticated query processing and search capabilities. The heart of the data tier is a sophisticated federated query processing engine that is as fluent with XML and object-relational queries as it is with SQL. Queries may be expressed in SQL, SQLX, or XQuery, and data may be retrieved as either structured data or XML documents. The federated query engine provides functional compensation to extend full query and analytic capabilities over data sources that do not provide such native operations, and functional extension to enable extended capabilities such as market trend analysis or biological compound similarity search.
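Functional compensation can be illustrated with a small sketch: when a wrapped source cannot evaluate a predicate itself, the federated engine fetches the rows and applies the predicate locally, so both capable and limited sources answer the same query. All names and structures here are illustrative, not an actual wrapper API.

```python
def federated_select(source, predicate):
    """Push the predicate down if the source supports it; otherwise
    compensate by filtering the fetched rows in the federated engine."""
    if source.get("supports_filter"):
        return source["filter"](predicate)
    # Functional compensation: the engine applies the predicate itself.
    return [row for row in source["scan"]() if predicate(row)]

rows = [{"ticker": "ACME", "price": 42.5}, {"ticker": "WDGT", "price": 7.0}]

# A capable source (e.g., a relational DBMS) vs. a flat-file source
# that can only scan its contents.
smart = {"supports_filter": True,
         "filter": lambda p: [r for r in rows if p(r)]}
dumb = {"supports_filter": False, "scan": lambda: rows}

pred = lambda r: r["price"] > 10
result = federated_select(dumb, pred)
print(result)  # [{'ticker': 'ACME', 'price': 42.5}]
```

The application sees identical answers either way; only the plan differs, which is exactly the data independence argument replayed at the federation level.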
In addition to standard query language constructs, native functions that integrate guaranteed message delivery with database triggers allow notifications to fire automatically based on database events, such as the arrival of a new nugget of information from a real time data feed.
Text search and mining. Web crawling and document indexing services are crucial to navigate the sea of information and place it within a context usable for enterprise applications. The services tier exploits the federated view of data provided by the data tier to provide combined parametric and full text search over original and consolidated XML documents and extracted metadata. Unstructured information must be analyzed and categorized to be of use to an enterprise application, and for real time decisions, the timeliness of the answer is a key component of the quality. The technology platform integrates services such as Intelligent Miner™ for Text to extract key information from a document and create summaries, categorize data based on predefined taxonomies, and cluster documents based on knowledge that the platform gleans automatically from document content. Built-in scoring capabilities such as Intelligent Miner Scoring integrated into the query language (SQL/MM) turn interesting data into actionable data.
Versioning and metadata management. As business applications increasingly adopt XML as the language for information exchange, vast numbers of XML artifacts, such as XML schema documents, DTDs, Web service description documents, etc., are being generated. These documents are authored and administered by multiple parties in multiple locations, quickly leading to a distributed administration challenge. The services tier includes a WebDav-compliant XML Registry to easily manage XML document life cycle and metadata in a distributed environment. Features of the registry include versioning, locking, and name space management.
Digital asset management. Integrated digital rights management capabilities and privilege systems are essential for controlling access to the content provided by the data tier. To achieve these goals, the information integration platform draws on a rich set of content management features (such as that provided in IBM Content Manager) to provide integrated services to search, retrieve and rank data in multiple formats such as documents, video, audio, etc., multiple languages, and multibyte character sets, as well as to control and track access to those digital assets.
Transformation, replication, and caching. Built-in replication and caching facilities and parallelism provide transparent data scalability as the enterprise grows. Logic to extract and transform data from one format to another can be built on top of constraints, triggers, full text search, and the object relational features of today's database engines. By leveraging these DBMS features, data transformation operations happen as close to the source of data as possible, minimizing both the movement of data and the code path length between the source and target of the data.
The application interface
The top tier visible to business applications is the application interface, which consists of both a programming interface and a query language.
Programming interface. A foundation based on a DBMS enables full support of traditional programming interfaces such as ODBC and JDBC, easing migration of legacy applications. Such traditional APIs are synchronous and not well-suited to enterprise integration, which is inherently asynchronous. Data sources come and go, multiple applications publish the same services, and complex data retrieval operations may take extended periods of time. To simplify the inherent complexities introduced by such a diverse and data-rich environment, the platform also provides an interface based on Web services (WSDL and SOAP). In addition, the platform includes asynchronous data retrieval APIs based on message queues and workflow technology to transparently schedule and manage long running data searches.
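The asynchronous retrieval pattern can be sketched with Python's standard queue and threading modules; the request and result values are placeholders for a long running federated search.

```python
import queue
import threading
import time

def long_running_search(request, results):
    """Stand-in for a federated search against slow external sources."""
    time.sleep(0.05)  # simulate the latency of a slow data source
    results.put(f"results for {request}")

results = queue.Queue()

# The application submits the request and immediately continues working;
# a worker delivers the answer to the queue when it is ready.
worker = threading.Thread(target=long_running_search,
                          args=("ACME research", results))
worker.start()

# ...application does other work here instead of blocking...

answer = results.get(timeout=2)
worker.join()
print(answer)  # results for ACME research
```

In the platform described here the queue would be a durable message queue, so the request survives failures and the answer can arrive hours later without the application holding a connection open.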
Query language. As with the programming interface, the integration platform enhances standard query languages available for legacy applications with support for XML-enabled applications. XQuery is supported as the query language for applications that prefer an XML data model. SQLX is supported as the query language for applications that require a mixed data model as well as legacy OLTP-type applications. Regardless of the query language, all applications have access to the federated content enabled by the data tier. An application may issue an XQuery request to transparently join data from the native XML store, a local relational table, and retrieved from an external server. A similar query could be issued in SQLX by another (or the same) application.
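The flavor of SQLX's publishing side, in which relational rows are returned as an XML document, can be approximated outside the engine as follows; the table and element names are invented for illustration, and a real SQLX engine would do this with functions such as XMLELEMENT inside the query itself.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE holdings (ticker TEXT, shares INTEGER)")
conn.executemany("INSERT INTO holdings VALUES (?, ?)",
                 [("ACME", 100), ("WDGT", 250)])

# Publish relational rows as an XML document, roughly what SQLX's
# XMLELEMENT/XMLAGG publishing functions produce inside the engine.
root = ET.Element("holdings")
for ticker, shares in conn.execute("SELECT ticker, shares FROM holdings"):
    el = ET.SubElement(root, "holding", ticker=ticker)
    el.text = str(shares)

xml_out = ET.tostring(root, encoding="unicode")
print(xml_out)
```

Whether the application asks in SQLX or XQuery, the engine's job is the same: evaluate the query over federated content and serialize the answer in the data model the application prefers.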
The explosion of information made available to enterprise applications by the broad-based adoption of Internet standards and technologies has introduced a clear need for an information integration platform to help harness that information and make it available to enterprise applications. The challenges for a robust information integration platform are steep. However, the foundation to build such a platform is already on the market. DBMSs have demonstrated over the years a remarkable ability to manage and harness structured data, to scale with business growth, and to quickly adapt to new requirements. We believe that a federated DBMS enhanced with native XML capabilities and tightly coupled enterprise application services, content management services and analytics is the right technology to provide a robust end-to-end solution.
We recommend that you start with the following products:
IBM WebSphere Studio is an open comprehensive development environment for building, testing and deploying dynamic on demand e-business applications. Founded on open technologies and built on Eclipse, WebSphere Studio provides a flexible, portal-like integration of multi-language, multi-platform and multi-device application development tools that maximize your productivity, increase ROI, and improve overall time to value.
IBM WebSphere Application Server is a high-performance and extremely scalable transaction engine for dynamic e-business applications. The Open Services Infrastructure allows companies to deploy a core operating environment that works as a reliable foundation capable of handling high volume secure transactions and Web services. WebSphere continues the evolution to a single Web services-enabled, Java 2 Enterprise Edition (J2EE) application server and development environment that addresses the essential elements needed for an on demand operating environment.
You can use WebSphere Business Integration Server V4.2 to quickly integrate new or existing applications or systems on diverse platforms, create and rapidly deploy new business processes, or solve a variety of business integration needs.
DB2 Information Integrator—This new family of products is designed to help you integrate structured, semi-structured and unstructured information effectively and efficiently. These products, based on the previously disclosed Xperanto project, provide the foundation for a strategic information integration framework to help access, manipulate, and integrate diverse, distributed and real time data.
DB2 Universal Database—DB2 Version 8.1 helps solve critical business problems by integrating information across the entire enterprise by leveraging federated Web services and XML. DB2 is delivering new federated capabilities that enable customers to integrate information as Web services. DB2 also delivers new XML enhancements that make it easier for programmers to integrate DB2 and XML information.
Automation, including autonomic computing, leaves enterprise leaders free to focus on managing the business, rather than managing the complexities of new technology.
The following material comes from "What you need to know now about autonomic computing, Part 1: An introduction and overview" by Daniel H. Steinberg (Director of Java Offerings, Dim Sum Thinking), published in August 2003. IBMers can find this article at http://www-106.ibm.com/developerworks/ibm/library/i-autonom1/.
Imagine if you could describe the business functions you want your system to provide and it just took care of itself. Needed software would be located, installed, and configured automatically. Resources would become available when they were needed and be freed when they weren't. This autonomic vision was explained by IBM's Alan Ganek in a session at developerWorks Live!
Autonomic systems are able to dynamically configure and reconfigure themselves according to business needs. Such systems are always on the lookout to protect themselves from unauthorized use, to repair portions of the system that are no longer functioning, and to look for ways to optimize themselves. IBM has introduced major initiatives in autonomic computing. At this year's developerWorks Live! conference, Alan Ganek, IBM Vice President of Autonomic Computing, presented an overview that set the stage for a host of other sessions on the topic. This series presents the highlights from this year's sessions on autonomic computing at developerWorks Live!
Ganek's session in New Orleans came just a few days after the NCAA Final Four men's basketball playoffs concluded. He used basketball as an analogy for autonomic computing. The players think about looking for an open shot, passing to teammates, and defending against the other team. When players run the length of the court on a fast break, they concentrate on getting to the basket. They don't have to think about making their hearts beat faster, about altering their breathing patterns, or about altering their pupil dilation to focus on the rim of the basket. Regulating circulation and breathing is critical to a player's success, but it should not require thought or attention. The autonomic nervous system in humans takes care of tuning these core functions and allows us to think on a higher level.
The autonomic computing vision is the analogous situation for IT. Autonomic computing allows people to focus on the big picture because the low level tasks can be "taught" to monitor and manage themselves. In a typical traditional network, there are a wide range of clients, servers, and multiple databases connected across internal and external firewalls. To manage this network you have to consider dozens of systems and applications, hundreds of components, and thousands of tuning parameters. The goal is to have a system that can accept business rules as input to manage the increasingly complex system.
Ganek acknowledges that we've been trying to figure out how to manage complexity for years but that the current situation is different. He explains that in the mid 1990s, the Internet was deployed, but was not being used by many businesses. At that time, a big network was 10,000 ATMs connected to mainframes. "Now," he says, "you have smart cards, smart phones, laptops, and desktops, all of which come in over wireless and the Internet. You don't hit one bank of mainframes, now you might hit 10,000 servers. The Internet explosion is an explosion in complexity."
An IBM ThinkPad® sat on a desk in front of Ganek. One of the conference staff had stopped by and booted it up. Ganek pointed to it as an example of an unused asset that was just wasting cycles. He sees several advantages in implementing autonomic solutions:
Increased ROI (return on investment) by lowering administrative costs and improving the utilization of assets. A solution should be able to locate under-utilized resources and take advantage of them in a dynamic way.
Improved QoS (quality of service) by reducing downtime. Ganek pointed out that 40% of outages come from operator error.
Faster time to value by accelerating implementation of new capabilities along with more accurate and immediate installation and reduced test cycles.
There is not a single technology known as autonomic computing. Ganek points out that customers will still be buying Enterprise Resource Planning (ERP) or data management solutions. They will, however, prefer those with the characteristics of autonomic computing. Many of the sessions on autonomic computing at this conference included the image in Figure A-29.
Figure A-29: Components of self-managing systems
This diagram breaks self-management into four slices:
Self-configuring components increase responsiveness by adapting to environments as they change. Your system can add and configure new features, additional servers, and newly available software releases while the system is up and running. The key to making this process autonomic is to require minimal human involvement.
Self-healing components improve business resiliency by eliminating disruptions that are discovered, analyzed, and acted upon. The system identifies and isolates a failed component. This component is taken offline, repaired or replaced, and then the functional component is brought back online. Autonomic systems need to be designed with some level of redundancy so that this healing can occur transparently to users.
Self-protecting components anticipate and protect against intrusions and corruptions of data. This includes the managing of authentication of users for accessing resources across an array of enterprise resources. Self-protection also includes monitoring who is accessing resources and reporting and responding to unauthorized intrusions.
Self-optimizing components make the best use of available resources even though these resources and the requirements are constantly changing. Humans cannot respond quickly enough to perform these actions in a way that responds to the current system conditions. Systems must monitor and tune their storage, databases, networks, and server configurations continually.
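Self-managing behavior of this kind is usually described as a monitor-analyze-plan-execute loop over the managed element. A deliberately simplified sketch of one pass of such a loop for a self-optimizing component, with invented load metrics and thresholds:

```python
def control_loop(system, target=0.7):
    """One pass of a monitor-analyze-plan-execute cycle for a
    self-optimizing element (simplified; metrics and thresholds invented)."""
    load = system["load"] / system["servers"]      # monitor: observe a metric
    deviation = load - target                      # analyze: compare to goal
    if deviation > 0.1:                            # plan: choose an action
        action = "add_server"
    elif deviation < -0.1 and system["servers"] > 1:
        action = "remove_server"
    else:
        action = "no_change"
    if action == "add_server":                     # execute: apply the change
        system["servers"] += 1
    elif action == "remove_server":
        system["servers"] -= 1
    return action

system = {"load": 3.0, "servers": 3}  # 1.0 load per server: over target
action = control_loop(system)
print(action, system["servers"])  # add_server 4
```

A real autonomic manager runs this cycle continuously and across many coupled resources, which is precisely why the article argues humans cannot respond quickly enough to do it by hand.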
Autonomic does not just mean automated. An automated system might simply specify that this server is assigned to a particular task between the hours of 4:00 and 7:00 and to a different task the remainder of the day. This specification may be correct and it may help to have it in place in your system. On the other hand, you may want a more business oriented rule being enforced. Your rule may be that gold level customers can expect a response to be generated within two seconds while silver customers can expect a response to be generated within six seconds.
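The gold/silver rule from this example can be written as a tiny policy check; the two-second and six-second objectives come from the text, and everything else here is illustrative.

```python
# Business-level policy from the example: gold customers expect a response
# within two seconds, silver within six. The tier names and the check are
# illustrative; a real system would evaluate these objectives continuously
# and reallocate resources when one is at risk.
POLICY = {"gold": 2.0, "silver": 6.0}

def violates_policy(tier, observed_seconds):
    """True when the observed response time breaks the tier's objective."""
    return observed_seconds > POLICY[tier]

print(violates_policy("gold", 3.5))    # True: gold must answer within 2 s
print(violates_policy("silver", 3.5))  # False: silver has until 6 s
```

The difference from the time-based server schedule above is that this rule is stated in business terms, so the system, not the administrator, decides which resources to move to keep it satisfied.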
Although autonomic computing is not far away, Ganek recommends a step-by-step approach to evolve your infrastructure in that direction. First you need to assess where you are in the continuum. Then you need to decide which area of complexity to tackle first. Ganek reminds the audience that "the complexity is at every level of the system. Autonomic is hardware, software, and system management."
Ganek outlines five levels that run the gamut from manual to autonomic. He says that most organizations are at level 1. This basic level requires that the IT staff install, monitor, maintain, and replace each system element. The second level, the managed stage, already can be implemented using many of the tools that IBM and other vendors provide. The tools help the IT staff analyze system components and use the results to decide which actions to take. Ganek says that many state-of-the-art customers are currently at this level.
Each level replaces some area of human intervention and decision making. The predictive level builds on the monitoring tools added in the previous level. At this third level, the system can correlate measurements and make recommendations. The IT staff reviews and approves the recommendations and takes action. This leads to faster and better decision making. At level four, the staff becomes less involved in viewing the recommendations and taking actions. This is the adaptive level and features the ability of the technology to make more of the decisions automatically. Staff members spend most of their time setting the policies and managing the controls.
The autonomic level is level five. In many ways the technology is similar to that introduced at level four. A difference is that the IT services are now integrated with business rules. This is where you stop defining IT rules in terms of the components and tuning parameters. Now the policies are set in terms of business logic, and the IT staff focuses on tuning the system and the rules so they best support the company bottom line. As an example, if a Web site supports free content and subscriber-only content, then a rule might specify that resources should be allocated so that the user experience for subscribers is at a certain level even if that means degrading the experience for non-paying site visitors.
In order to deploy autonomic solutions, Ganek explains, there are core capabilities that must be provided. Although this list will change, he suggests an initial set that includes solution installation, common system administration, problem determination, monitoring, complex analysis, policy-based management, and heterogeneous workload management. For each of these areas, IBM is looking at technologies to advance those capabilities.
Ganek cited some of the existing IBM projects. There is a partnership with InstallShield, designed to make installation more regular and more familiar. This process will help users understand what is to be installed and which dependencies must be installed along with it. IBM is participating in the Java Community Process JSR-168; using portal technology, the administrative console will adopt consistent terminology and a common interface across applications. Using the agent standards described in JSR-87, IBM has created ABLE, an Agent Based Learning Environment, and will use intelligent agents to capture and share individual and organizational knowledge. These agents will be used to construct the calculation tools needed for autonomic computing.
You can track many of the latest autonomic releases on the newly created autonomic zone on alphaWorks. Other articles in this series describe the grid structure that underlies much of the dynamic sharing of resources and the autonomic cycle that will work to monitor each autonomic element, analyze it, plan for change, and execute the change. This will be accompanied by concrete examples for logging and tracing and for workload management.
At the heart of self-managing systems are the policies used to manage them. Policies have to enable the system to choose among potentially contradictory guidelines by deciding which option would best help achieve business objectives.
As you make the transition from manual to autonomic, you trust your system to make increasingly complicated recommendations and to take action upon them. In this article, you'll read about the cycle used to manage elements in an autonomic system and about the policies used to make decisions about what action to take. This article is based on two presentations at this year's developerWorks Live! conference. David W. Levine presented an overview of the process and the tools you will be able to find on the alphaWorks site this summer, in his session titled "A toolkit for autonomic computing." David Kaminsky presented the four components of a policy and how they are used in autonomic computing in his session, "Policy-driven Computing—The Brains of an Autonomic System."
Autonomic elements have two management tasks: they manage themselves and they manage their relationships with other elements through negotiated agreements. An autonomic element contains a continuous control loop that monitors activities and takes actions to adjust the system to meet business objectives. Autonomic computing components are available as part of the Emerging Technologies Toolkit (ETTK) on alphaWorks. You can use the ETTK to experiment with components that serve as the building blocks for self-management: monitoring, analysis, planning, and execution. The architecture is summarized in Figure A-30 on page 352.
Figure A-30: The autonomic cycle
At the bottom of the diagram is the element that is being managed. It is linked to the control loop with sensors and effectors. You can think of these as high-level getters and setters. The sensors are used to provide information on the current state of the element and the effectors are used to modify this state in some way. The sensors and effectors are exposed to the control loop as Web services.
The control loop is known by the acronym MAPE: monitor, analyze, plan, and execute. In the middle of the diagram is knowledge. Knowledge is data with context and structure. The number 1.7 is a piece of data, but it isn't knowledge. The fact that we rejected an average of 1.7 requests per minute in the last hour is knowledge. Having a context and structure for the data allows users to write components that share data. This is crucial because, over time, a complex system won't be built and maintained by one programmer. Just as important is the ability to separate the people who write the tools that provide information from the people who write the policies. The collection of knowledge helps with this separation.
Monitoring, the fetching of low-level information, is where most of the knowledge comes from. This is not simply the persistence of every piece of data that can be gathered. You want to be careful not to overload the system with too many pieces of uninteresting data. You might decide to take readings every 10 seconds and persist the data two minutes at a time into a temporary store. When something goes wrong, you might add the relevant readings to the knowledge base. The analysis tools distributed with the Autonomic Computing Toolkit include prediction algorithms, modeling tools, and a math library. More specifically, these include workload forecasting, stochastic modeling, and optimizers.
Once you know what needs to be done, what remains is to make plans and to make changes. These steps are based on rules engine technology. You can evaluate policy against the current operating conditions that you have measured and analyzed in the previous steps. You can separate out how you react to events and new agreements by setting up policies. The goal with rules is to capture interesting behavior without getting caught in overly long-running inferences. The infrastructure for planning and executing is consistent with OGSA and with the W3C policy initiative discussed in the next section.
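The MAPE cycle described above can be sketched in a few lines of code. This is a minimal illustration, not part of the Autonomic Computing Toolkit: the managed element, its sensor and effector methods, the overload threshold, and the capacity adjustment are all hypothetical stand-ins for what a real toolkit would expose as Web services.

```python
class ManagedElement:
    """A hypothetical element exposing sensors (getters) and effectors (setters)."""
    def __init__(self):
        self.rejected_per_min = 0.0   # sensor reading
        self.capacity = 10            # effector-adjustable setting

    def sense(self):
        return {"rejected_per_min": self.rejected_per_min}

    def effect(self, new_capacity):
        self.capacity = new_capacity


def mape_cycle(element, knowledge):
    # Monitor: fetch low-level readings through the sensors.
    reading = element.sense()
    # Knowledge: store data with context and structure, not bare numbers.
    knowledge.append({"metric": "rejected_per_min",
                      "value": reading["rejected_per_min"]})
    # Analyze: correlate the measurement against an objective.
    overloaded = reading["rejected_per_min"] > 1.0
    # Plan and execute: choose an action and apply it through the effectors.
    if overloaded:
        element.effect(element.capacity + 5)
    return overloaded


element = ManagedElement()
element.rejected_per_min = 1.7     # the "1.7 requests per minute" example
knowledge = []
acted = mape_cycle(element, knowledge)
```

Note how the knowledge entry records a named metric rather than the bare value 1.7, so a component written by someone else can interpret it.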
From the autonomic computing standpoint, policies can be viewed as a set of considerations designed to guide decisions on courses of action. Policy-driven computing is the brains of an autonomic system. This does not imply that there are no conflicts among the policies that apply to a given situation: the policies are considerations guiding the system, and an interpreter is required to resolve any conflicts.
One of the advantages is that when you can set policies and rules for prioritizing them, you reduce the need to configure individual resources. You are also allowing the system to make adjustments to the configuration when the load on the system changes. Practically speaking, your goal is to take a service-level agreement and abstract the services you are going to provide as service-level objectives. These objectives are then mapped to policies. You can then monitor against the objectives to see whether they are being met. One example is a policy that promises gold clients one second response time, silver clients three seconds, and bronze clients five seconds. You can monitor the response time for each customer level, and dynamically reallocate servers if you are meeting the bronze objectives but not the gold objectives.
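The gold/silver/bronze example above amounts to mapping service-level objectives onto a simple check. The sketch below is illustrative only: the tier names and targets come from the example, but the function name and the idea of returning the failing tiers as a reallocation trigger are assumptions of mine.

```python
# Service-level objectives: maximum response time in seconds per client tier.
OBJECTIVES = {"gold": 1.0, "silver": 3.0, "bronze": 5.0}

def unmet_objectives(measured):
    """Return the tiers whose measured response time exceeds the objective."""
    return [tier for tier, limit in OBJECTIVES.items()
            if measured.get(tier, 0.0) > limit]

# Bronze and silver are fine, but gold is missing its one-second target:
# this is the signal to dynamically reallocate servers toward gold clients.
measured = {"gold": 1.4, "silver": 2.5, "bronze": 4.0}
needs_reallocation = unmet_objectives(measured)
```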
Before proceeding, you need to consider whether you are using the same notion of policy as the autonomic computing team. As the speaker points out, there are many different ideas of what is meant by policy. The IETF definition is characterized as guiding actions. The British-based Ponder looks at a policy as a rule that defines a choice in how the system behaves. The WS Policy group looks at a policy statement as being made up of the more atomic policy assertions. These represent preferences and can be thought of as properties that can be set. For IBM, policies are used to guide decisions and should encapsulate some business requirement.
IBM views a policy as a "four-tuple" made up of scope, preconditions, measurable intent, and business value. In the autonomic computing cycle, the policy sits inside of the planning phase. It configures the monitoring and execution phases to meet the objectives of the policy. In subsequent cycles, the analysis phase is where it is determined if the policy has been met.
Scope and preconditions are the first components of the policy "four-tuple." With scope, you identify what is and is not subject to the intent of the policy. This includes which resources are needed and perhaps which policies are applicable. For example, if you have a backup policy, you should define the scope so that it is clear whether it is applied to a database or to a Web server. A policy may be capability-based. In this instance, your backup policy might be applied to any resource capable of performing a backup. Preconditions help the system decide which policies are relevant. Perhaps you want to employ one backup strategy while the system is heavily utilized and a different backup strategy when it is not. You might determine which strategy to use based on the state of a particular system. For example, if an average of more than two requests each minute are received, then employ the first strategy. Your choice might also depend on something as simple as the time of day. Perhaps you choose one strategy during business hours and a different one overnight. The preconditions help you specify the situations in which a particular policy is to be applied.
The remaining components of the policy "four-tuple" are used to define the intended result of the policy and to provide further guidance in selecting among apparently conflicting policies. Measurable intent is where you specify what you are trying to accomplish. Examples include "perform a backup" or "provide a one second response time." Business value is used to optimize resource use. You may have one policy that prescribes a backup and another that prescribes response time. Following one policy may mean that you have to violate another. Business value helps the system select among these applicable and relevant policies.
In order to build and support policies, you need tool support. These tools help you define a policy and store it while it is being validated. You will then need to use either push or pull to enforce the policy. In defining a policy, the key issue is ensuring the policy is relevant to the resources on which you are setting the policy. One strategy is for the resource to tell the tool what sort of policies it can handle. There is a need, when setting policy, to understand the instrumentation of the state of the system. In addition, you want to provide transactional coherence. It is possible that a collection of policies either don't make sense or could be destructive if they are not all deployed together. Transactional coherence enforces these requirements.
Research on policies also includes validation, distribution, and security. The validation step involves conflict resolution. Conflicts can be static or dynamic: you may ask whether an individual can set a particular policy at all, or whether an individual can set a particular policy given the current state of the system. Policies are also an avenue for attacking a system, so it is important to include security and authentication when planning for policy distribution.
Between the second half of this year and the middle of next year, IBM will be working on policy editors and validators. The goal is to package many of these technologies with the ETTK to simplify the development of policy-enabled systems. The releases will include rule evaluation engines, business policy to IT policy translations, SLA compliance monitors, and WS Policy enablement. The IBM policy efforts are driven by standards, and the engineers are working on tools that support and facilitate the implementation of those standards. In future articles in this series, we'll look at other standards-based efforts in autonomic computing. These include work on the grid infrastructure for autonomic computing and on deployment of autonomic solutions.
Autonomic systems are self-healing and self-optimizing. Sometimes they need to use resources that are distributed across a network. Grid computing provides the facility for optimal use of resources in a robust and flexible system.
Grid computing lets virtual collaborative organizations share applications and data in an open, heterogeneous environment. Autonomic computing features self-managing systems that benefit from the grid support for sharing, managing, and providing access to resources. This linked management can be used to increase the quality of service and to optimize the available resources. Providing access across the grid supports on demand computing and utility models. A grid allows you to increase your capacity by exploiting distributed resources to provide capacity for high-demand applications. Distributed resources also provide reliability and availability and reduce time to results through computations that are executed more frequently or on customer demand.
Grid computing is being integrated into the design of IBM WebSphere Application Server Version 5. Version 6 will be compliant with Open Grid Services Architecture (OGSA). You will be able to build grids using Application Server, and you will be able to add Application Server to existing grids. In addition, DB2, the IBM eServer product line, and IBM TotalStorage solutions will be OGSA compliant and will also be easily integrated into grids. This article on grid computing is based on two presentations from this year's developerWorks Live! conference in New Orleans. Marlon Machado and Christopher Walden discussed the changes to Application Server to take advantage of grid computing in their presentation, "Grid Computing in the Real World: WebSphere Application Server V4.0's new system management architecture." Matthew Haynos looked at the grid computing standards in his talk, "Working with Open Grid Services Architecture (OGSA)."
In rethinking Application Server, the architects decided to adopt a grid-like approach to achieve performance, capacity, and failover support. In particular, there is a move away from a single administrative server and a centralized repository for configuration, clustering, and work load management. In the Application Server Version 4 model, there was a single administrative domain, and all nodes were registered in a relational database. This strategy was robust, but there was a single point of failure, and customization had to be done by hand.
In Application Server Version 5, systems administration is an extension of the application. You still have nodes, but they aren't necessarily clones. The administration is divided into cells and nodes. Nodes are managed servers on a physical box that control processes. A cell is a logical node container. A set of cell managers administers one or more nodes. The configuration data is stored within each process, and each process is self-sufficient in managing its own resources. The configuration data is kept in a set of XML files. Cell managers are synchronized using JMX and distribute the administration of the system. They make sure configuration and binaries are updated using asynchronous communication.
A set of design principles guided the redesign of Application Server:
Processes are unreliable— The product has to be able to function even if a managed process isn't executing as expected.
Communication channels are unreliable— The goal is for the system to continue to function if a component fails.
Configuration should be separated from operations— Operations are supported by JMX and will be dynamic and synchronous if possible.
Configuration is document-based— This led to the decision to store configuration values in XML files.
Isolate administrative functions— Application servers and applications can be somewhat independent if they are isolated from each other.
Make backups before making changes— This is always good advice for human administrators, and it now applies to application servers and administrative functions. Hand-in-hand with this is the extensive use of logging throughout the system.
Other changes to Application Server result from the move toward a grid architecture. Cell managers and node managers work together to distribute the application. Processes are self-sufficient but loosely coupled to nodes. All updates are automatically distributed across the domain. The message-driven, grid-like architecture is peer-to-peer, based on the JXTA V1.0 protocols spec. The message binding and file transfer layers are used to publish and synchronize configuration data, to launch administrative processes, and to support services such as name, security, and location. Concurrent performance, capacity, and failover support are addressed through graph-traversing algorithms and combinatorial optimization. Think of your processing in terms of processes and components, and not just in terms of the box they are running on. Your JMS and application server processes can share nodes, or they can be separated if needed.
A concrete example of the advantages of grid computing is provided by the rethinking of workload management (WLM) and scalability. Now that there is no single Application Server repository, configuration data is shared among nodes. This means that WLM must now be asynchronous and adaptive. This also requires that WLM be factored into your decisions of how you set up your system. Load balancing becomes the foundation for partitioning and grid-like behavior. Requests are dynamically distributed among clones according to availability.
IBM identified four requirements for WLM:
Integration— You need to manage HTTP, IIOP, JMS, or JavaMail requests as they move through the enterprise.
Load balancing— Balance the workload based on the resources available at any given time.
Failure identification— Describe and document what went wrong and mention all of the resources involved.
Standalone availability— WLM functionality should exist even in the absence of code.
The WLM controller defines controllable operation goals. It takes the input for the attributes of the defined goals and uses a dynamic routing table to communicate to the elements. Here's a quick look at the WLM algorithm.
Initially you decide which elements and clones are heavier for this weighted round-robin algorithm. All weights are collected in a weight set for the cluster. The weight of a clone is decreased by one for each processed request until it equals zero. At this point, no more requests are sent to that clone. The process goes on until the entire weight set equals zero. The WLM routing algorithm can be overridden with in-process optimization. If a request from a client is sent to an object that is already in the same process, the weights are not decreased. A prefer-local optimization helps ensure that all requests are sent to the same node if possible. You can also override with transactional affinity. In a transaction, all requests will be processed on the same clone if possible. The weights are decreased, but affinity will override zero or negative weights. In other words, the core algorithm can be adapted in situations in which another algorithm is preferable.
The notion of partitioning includes and extends what we used to call scaling. Now you can organize topologies according to business needs and not just by performance and capacity requirements. Think of partitioning in two directions. You have different layers, including a Web server, servlet engine, EJB engine, and a database. You also have columns that include one component from each layer. You might have a column that includes one Web server, one database, and so on. Elements from a single column may be located on different machines. If you are familiar with Application Server Version 4, then vertical partitioning is analogous to what you thought of as horizontal scaling. Horizontal partitioning clones layers that are independent of each other, and is very granular. The key to performance in Application Server Version 5 is a flexible topology.
OGSA provides a services-oriented virtual organization. Providing virtualized services across an enterprise provides a common base for autonomic management solutions. IBM will OGSA-enable servers, storage, and networks this year. On top of these will sit OGSA-enabled versions of security, workflow, database file systems, directory services, and messaging. You can imagine a Web services layer that sits on top of these functional elements and is used to communicate with the OGSA layer.
The Open Grid Services Infrastructure (OGSI) can be thought of as Web services++. It improves on some of the pieces of Web services, including discovery, lifecycle, registry management, factory, notification, and HandleMap. Grid services are transient, and many of these extra pieces help Web services interact with transient services. The other additions to Web services provide data and state qualities. For example, HandleMap helps you get a pointer to a Web service. You will need lifetime management interfaces to allow the service to create and destroy itself.
Recall the monitor, analyze, plan, and execute cycle for autonomic elements described in the previous article in this series. There you used Web services to communicate with the autonomic element. The sensors and effectors reported back from the element to the managing cycle, and the effectors changed the state of the element based on the plan being executed. The grid services allow you to manage and optimize elements that you bring in and out of your system from across a network. In the final article in this series, you will see concrete applications for deploying applications and for logging services.
If an application is to be self-healing, it must be able to dynamically deploy updates and other instances. It must also be able to locate problems, recommend actions, and execute them. The final article in this series on autonomic computing looks at deploying and logging as essential support technologies.
Previously we looked at the theory and guiding ideas of autonomic computing and the underlying infrastructure. To have self-managing systems, you will need to be able to identify and diagnose problems and to deploy software designed to remedy these problems. This article talks about challenges in deploying applications and consistent logging strategies based on two talks in the autonomic computing thread from this year's developerWorks Live! conference. Heng Chu addressed the challenges of deployment in his session, "Common Software Deployment: An Enablement for Autonomic Computing." Dave Ogle outlined the benefits and difficulties in settling on a common logging format in his session, "Unified Logging and Tracing—A Building Block for Autonomic Computing."
Deploying software isn't just a matter of creating a CD and copying it onto target machines. Here is a six-stage process for software deployment.
Create the software packages— Packaging should be done throughout the life cycle. You might be creating a product CD, or clients might want to repackage your software within their install base before they deploy it.
Analyze the environment— In a complex IT environment, you need to check dependencies on hardware, the operating system, or previously installed software. You might also need to determine if migration is needed.
Plan for deployment— Depending on your analysis, you might need to determine a migration path. Also, you need to identify where your software components will be installed.
Install the packages— This step includes the processes of moving software onto or off of a particular machine. This step might be where you install, uninstall, migrate, repair, rollback, or commit various components.
Configure the software— You ensure that the software was properly installed. You might need to configure a product to work properly in an environment or with other components.
Verify the deployment so the software is ready to use— You might smoke test the installation, verify the package is intact, or check that the entire suite has been installed and configured to handle end-to-end transactions.
Today, most installation is at the basic/managed end of the manual-to-autonomic spectrum. At the basic level, a highly skilled IT staff reads through the documentation and performs the installation. At the managed level, the IT staff analyzes needs and uses install tools to automate the installation step of the deployment process. At the predictive level, the IT staff allows the system to recommend deployment actions. The staff approves the recommendations and uses the installation tools to perform the install. At the adaptive level, the IT staff manages performance against service level agreements and allows the system to understand, correlate, and take deployment actions while taking dependencies into account. Finally, at the autonomic level, the system dynamically deploys components based on business rules expressed as policies.
To automate this dynamic deployment of software, IBM is defining the concept of an installable unit and is treating a hosting environment as an autonomic computing-managed element, as shown below.
Figure A-31: The autonomic cycle
The monitor phase is used to gather inventory of existing software and configuration. If you know the inventory has changed, the analyze phase can be used for dependency checking, to verify the integrity of the environment, and to see that introducing new software will not destabilize the environment. The planning phase includes target specification, the choice of migration paths, and establishing the configuration for each policy. The execute phase is where you initiate installation, configuration, and verification. The sensors and effectors are the link between this managing cycle and the element being managed. Here, the sensors advertise installed components and their configuration, while the effectors are a Web service interface between the execute phase and the element that actually carries out these tasks. You can think of the knowledge that sits in the center of this scenario as the repository for information about installed software and its configuration and dependencies.
When something goes wrong with a complex system, locating and identifying the problem can be a nightmare. Your Web application might use a Web server, a database, storage, and other components. Maybe you are supporting more than one database system or more than one type of server. Each product uses its own log file format, and each defines and uses its own events. A failure on one part of the system might actually be the result of a failure somewhere up the message chain. Lack of a common format and vocabulary makes it difficult to write programs to debug or tune your running application.
One solution is to create a common format for reporting errors. For the most part, most messages are reporting the following three pieces of information:
Observing component— The ID of the component that is seeing a problem.
Impacted component— Which component is having the problem.
Situation details— An explanation using common terms of what occurred.
Being able to uniquely identify a component is critical. You want to be able to correlate between reports that originate with two different components. You want to be able to determine whether the impacted and observing components are actually the same. Also, if you are going to automate the process where action is taken, you need to be able to identify the component that needs to be acted upon.
The last leg of the trio requires a consistent way to report common situations. There tends to be creative authorship with variations, even within a single product, on how problems are reported. Across products this becomes even more difficult. IBM looked at thousands of log files to establish a small set of canonical situations. Surprisingly, the result was less than two dozen categories of logged events. Within each category there were many different ways of saying the same thing. Few messages could not be categorized. The situation taxonomy and grammar includes the situation category, the disposition, the task, and the reasoning domain. The initial set of situations includes start, stop, feature, dependency, request, configure, connect, and create. Take a look at that list and consider how many of the situations you encounter fit into one of those categories. Now think about the number of different words you have used to describe any one of them.
The challenge is how to achieve common situations and data. Customers have servers, databases, and other pieces from a variety of vendors. These other vendors might not agree to the common format, and many of the customers might not want to update their systems to include compliant components. For now the solution is to install an adapter that sits on your side of the log file. The adapter translates the current log output into the common situation format. You can get the log information from any element in the common format. This lets you use the autonomic computing cycle to manage the component. An analysis engine can work on the log files. The knowledge base can consist of a database of symptoms. This symptom database will benefit from a common format of information it delivers. To keep this explosion of data from overloading the system, we want to do as much analysis as possible close to the source and filter the data to make it more manageable.
The following is from the IBM whitepaper of the same name, published in October 2002. IBMers can find this at ftp://ftp.software.ibm.com/software/tivoli/whitepapers/wp-autonomic.pdf.
The high tech industry has spent decades creating systems of ever-increasing complexity to solve a wide variety of business problems. Today complexity itself has become part of the problem. After deployment, hardware and software problems occur, people make mistakes and networks grow and change. Improvements and changes in performance and capacity of IT components can require constant human intervention. A machine waiting for a human to tune it and fix it can translate into lost dollars.
With the expense challenges that many companies face, IT managers want to improve the return on investment of IT by reducing the total cost of ownership, improving the quality of service and managing IT complexity. Autonomic computing helps address these issues and more by using technology to manage technology. Autonomic is a term derived from human biology. In the same way that your body's autonomic nervous system monitors your heartbeat, checks your blood sugar level and keeps your body temperature at 98.6 degrees Fahrenheit without any conscious effort on your part, autonomic computing components anticipate needs and resolve problems—without human intervention.
IBM products with autonomic capabilities can deliver customer value with their predictive and proactive functions that anticipate changing conditions and problems. This paper defines the customer value of autonomic computing, the requirements for achieving an autonomic environment, the steps for successful implementation and the products that are making this computing concept a reality.
Autonomic computing was conceived of as a way to help reduce the cost and complexity of owning and operating the IT infrastructure. In an autonomic environment, IT infrastructure components—from desktop computers to mainframes—are self-configuring, self-healing, self-optimizing and self-protecting. These attributes are the core values of autonomic computing.
Figure A-32: Components of self-managing systems
Self-configuring— With the ability to dynamically configure itself on the fly, an IT infrastructure can adapt—with minimal human intervention—to the deployment of new components or changes in the IT environment.
Self-healing— A self-healing IT infrastructure can detect when IT components fail and can cure or work around those component failures to provide continued availability of business applications.
Self-optimizing— Self-optimization is the ability of the IT environment to efficiently address resource allocation and utilization with minimal human intervention.
Self-protecting— A self-protecting IT environment can detect hostile or intrusive behavior as it occurs and take autonomous actions to make itself less vulnerable to unauthorized access and use, viruses, denial-of-service attacks and general failures.
In an autonomic environment, components work together and communicate with each other and with high-level management tools. They regulate themselves and, sometimes, each other. They can proactively manage the network while hiding the inherent complexity of these activities from end users.
The IBM view of autonomic computing is to make its software behave automatically and to bring autonomic systems management capability to the infrastructure, enabling the IT environment—including the systems management software itself—to configure, optimize, heal and protect itself.
Typically, a complex IT infrastructure is managed using a set of IT management processes. Industry initiatives, including the IT Infrastructure Library (ITIL) and the IBM IT Process Model, define best practices for managing the IT environment. Figure A-33 on page 364 shows an example of a typical process flow for incident management, problem management and change management. The actual mechanics of how these flows are implemented in a particular IT organization can vary, but the basic functionality is usually the same.
Figure A-33: A typical process flow for incident management, problem management, and change management
The efficiency and effectiveness of these processes are typically measured using metrics like elapsed time of a process, percentage executed correctly, skill requirements, average cost of execution and so on. Autonomic computing technology can help improve the efficiency and speed with which these processes can be implemented by automating some steps in the process.
Quick process initiation— Typical implementations of these processes require a human to initiate the process (create the request for change, collect incident details, open a problem record). This usually requires the IT professional to spend time gathering the right information. In a self-managing system, components can initiate the processes based on information derived directly from the system. This helps reduce the manual labor and time required to respond to critical events.
Reduced time and skill requirements— Tasks or activities in these processes usually stand out as skills-intensive, long-lasting, and difficult to complete correctly the first time because of system complexity. In a change management process such an activity is change impact analysis, and in problem management such an activity is problem diagnosis. In self-managing systems, resources are instrumented so that the expertise required to perform these tasks can be encoded into the system, helping reduce the amount of time and skills needed to perform these tedious tasks.
The self-managing capability of the IT environment helps improve responsiveness, reduce total cost of ownership, and improve time to value. It can help reduce the total cost of ownership because the IT professional can complete the IT processes at a low average cost, and it can help accelerate time to value because it reduces the time it takes to execute an IT process.
The remainder of this section discusses the autonomic computing technology and tools that help make it possible.
The architecture shown in Figure A-34 identifies the required architectural elements in an autonomic environment. The architecture is organized into two major elements—a managed element and an autonomic manager.
Figure A-34: Structure of self-management technologies
The managed element is the resource being managed. At this level of the architecture, the element targeted by management could be a single resource or a collection of resources. The managed element exports sensors and effectors. Sensors provide mechanisms to collect information about the state and state transitions of an element. Effectors are mechanisms that change the state of an element.
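As an illustration of this split, the sensor/effector contract can be sketched in a few lines of Python. The class and method names below are illustrative assumptions, not part of any IBM interface.

```python
from dataclasses import dataclass, field

@dataclass
class ManagedElement:
    """A resource that exports sensors (read state) and effectors (change state)."""
    state: dict = field(default_factory=dict)

    def sense(self) -> dict:
        # Sensor: collect information about the element's current state.
        return dict(self.state)

    def effect(self, changes: dict) -> None:
        # Effector: apply a change to the element's state.
        self.state.update(changes)

server = ManagedElement(state={"cpu_util": 0.92, "status": "degraded"})
snapshot = server.sense()               # sensor read
server.effect({"status": "restarted"})  # effector action
```

An autonomic manager would only ever touch the element through these two mechanisms, which is what makes the managed resource substitutable.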
Sensors and effectors represent the instrumentation interface that is available to an autonomic manager. The autonomic manager is a component that implements the control loop. The architecture decomposes the loop into four parts:
Monitor— Mechanisms that collect, aggregate, filter, manage and report details (metrics, topologies and so on) collected from an element.
Analyze— Mechanisms to correlate and model complex situations (time series forecasting, queuing models). These mechanisms allow the autonomic manager to learn about the IT environment and help predict future situations.
Plan— Mechanisms to structure the action needed to achieve goals and objectives. The planning mechanism uses policy information to guide its work.
Execute— Mechanisms that control the execution of a plan with considerations for on-the-fly updates.
The monitor, analyze, plan and execute parts of the autonomic manager relate to the functionality of most IT processes. For example, the mechanics and details of IT processes like change management and problem management are different, but it is possible to abstract these into four common functions—collect the details, analyze the details, create a plan of action and execute the plan. These four functions correspond to the monitor, analyze, plan and execute components of the architecture.
The analyze and plan mechanisms are the essence of an autonomic computing system, because they encode the know-how to help reduce the skill and time requirements of the IT professional.
The knowledge part of the autonomic manager is where the data and information used by the four components of the autonomic manager are stored and shared. Knowledge found here includes policy, topology information, system logs and performance metrics.
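Putting the pieces together, the control loop and its shared knowledge store can be sketched as follows. This is a minimal illustration of the monitor-analyze-plan-execute cycle; all function names and the desired-state model are assumptions, not IBM implementation details.

```python
def monitor(element):
    # Collect metrics from the managed element's sensors.
    return element.copy()

def analyze(metrics, knowledge):
    # Compare observed state against the desired state held in knowledge.
    return {k: v for k, v in knowledge["desired"].items() if metrics.get(k) != v}

def plan(deviations):
    # Turn each deviation into a corrective action.
    return [("set", key, value) for key, value in deviations.items()]

def execute(element, actions):
    # Apply the planned changes through the element's effectors.
    for _, key, value in actions:
        element[key] = value

# Shared knowledge: policy and desired state used by all four parts.
knowledge = {"desired": {"status": "running", "replicas": 3}}
element = {"status": "failed", "replicas": 3}

execute(element, plan(analyze(monitor(element), knowledge)))
# One pass of the loop brings the element back to its desired state.
```

In a real autonomic manager the loop runs continuously and the analyze and plan steps carry the encoded expertise; this sketch only shows how the four parts hand data to one another through shared knowledge.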
The architecture prescribes a second set of sensors and effectors. This second set enables collaboration between autonomic managers. Autonomic managers can communicate with each other in a peer-to-peer context and with high-level managers.
Each self-management attribute—self-configuring, self-healing, self-optimizing, and self-protecting—is an implementation of the intelligent control loop (in an autonomic manager) for a different operational aspect: configuration, healing, optimization, or protection. For example, an autonomic manager can self-configure the system with the correct software if software is missing. By observing a failed element, it can self-heal the system by restarting that element. It can self-optimize the current workload if increased capacity is observed. If an intrusion attempt is detected, it can self-protect the system by blocking the intrusion at the perimeter and verifying the integrity of the resource.
To understand how autonomic computing plays a role in different parts of the IT environment, it is important to view the IT environment at different levels. Self-management within each level involves implementing control loops to allow individual resources, composite resources and business solutions to monitor, analyze, plan and execute changes to their environment.
Figure A-35: Self management within each level of the IT environment
IBM provides a suite of management products that helps enable automation of routine management tasks for individual resource elements. IBM products, including the IBM Tivoli Monitoring family, IBM Tivoli Configuration Manager, IBM Tivoli Access Manager, and IBM Tivoli Storage Manager, begin to bring self-managing capabilities to the IT infrastructure for resource elements (systems, applications, middleware, networks and storage devices). IBM is working through IBM Server Group, IBM Software Group, and a variety of third parties to embed the appropriate technologies and enable resource elements to participate in the autonomic IT infrastructure.
At the composite resource level, the evolution to autonomic computing is enabled by the evolution to transaction-based management. In the past, resource elements were traditionally grouped by type (all servers), by location (all servers within a department or facility) or by function (all Web servers). As enterprises develop e-business environments, resources are increasingly aggregated within a transactional context spanning heterogeneous resources. For example, servers, applications, databases and storage devices that touch e-business transactions would be grouped separately from those assigned to human resources. Whether the composite resource grouping is homogeneous (such as a server cluster) or heterogeneous (such as a Web server, database and storage system), the performance and availability requirements of different transaction types drive the autonomic activity on individual resource elements. The attainment of service-level objectives for IT transactions causes resources to be dynamically assigned, configured, optimized, and protected for changing business workloads. IBM Tivoli Monitoring for Transaction Performance, IBM Tivoli Storage Resource Manager, IBM Tivoli Identity Director, and Tivoli Configuration Manager are examples of IBM products that work together to enable the evolution to autonomics at the composite resource layer.
The highest layer of the IT environment is a business solution, such as a customer care system or an electronic auction system. The business solution layer requires autonomic systems management solutions that comprehend the state of business processes—based on policies, schedules, trends and service level objectives and their consequences—and drive the appropriate behavior for transactional systems and their underlying individual resources. Business-aware IBM products include IBM Tivoli Service Level Advisor, IBM Tivoli Business Systems Manager, and IBM Tivoli Systems Automation for S/390.
Making the IT infrastructure autonomic is an evolutionary process enabled by technology, but it is ultimately implemented by each enterprise through the adoption of these technologies and supporting processes. Figure A-36 on page 370 illustrates how an IT environment evolves towards a truly autonomic environment, from basic through managed, predictive, adaptive, and finally to a fully autonomic e-business environment.
Figure A-36: How an IT environment evolves towards a truly autonomic environment
The basic level represents a starting point where some IT environments are today. Each infrastructure element is managed independently by IT professionals who set it up, monitor it and eventually replace it.
At the managed level, systems management technologies can be used to collect information from disparate systems onto fewer consoles, helping reduce the time it takes for the administrator to collect and synthesize information as the IT environment becomes more complex.
At the predictive level, new technologies are introduced to provide correlation among several infrastructure elements. These elements can begin to recognize patterns, predict the optimal configuration and provide advice on what course of action the administrator should take.
As these technologies improve and as people become more comfortable with the advice and predictive power of these systems, they can progress to the adaptive level. The IT environment can automatically take actions based on the available information and the knowledge of what is happening in the environment.
At the fully autonomic level, IT infrastructure operation is governed by business policies and objectives. Users interact with autonomic technology tools to monitor business processes, alter objectives, or both.
The following sections discuss the autonomic computing levels for each autonomic characteristic—self-configuring, self-healing, self-optimizing, and self-protecting. This can help you determine your current level of readiness, assess the capabilities of current tools and evaluate them within the context of a longer-term view.
An enterprise can greatly increase its responsiveness to both employees and customers with a self-configuring IT environment. With the ability to dynamically configure itself on the fly, an IT infrastructure can adapt immediately—and with minimal human intervention—to the deployment of new components or changes in the IT environment. For example, an e-business retailer dealing with seasonal workload peaks during the holiday shopping season or increased business for a particular event can use a self-configuring IT infrastructure to reassign servers from under-utilized pools to over-utilized ones to match shifting workloads. Tivoli software management tools from IBM can allow you to provision a wide range of resources, including systems, applications, users and access privileges, and physical and logical storage. Monitoring and event correlation tools can help determine when changes in the IT infrastructure warrant reconfiguration actions. These tools can allow you to reconfigure your IT environment within minutes or hours rather than in days or weeks.
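The pool-reassignment idea described above can be sketched as a simple rebalancing rule. The thresholds, pool names, and the assumption that demand spreads evenly across a pool's servers are all illustrative choices, not how any Tivoli product actually works.

```python
def load(pool):
    # Utilization if demand spreads evenly across the pool's servers.
    return pool["demand"] / pool["servers"]

def rebalance(pools, high=0.80):
    """Move servers from the least-loaded pool into any pool above `high`."""
    moved = []
    for name, pool in pools.items():
        while load(pool) > high:
            donor = min(pools, key=lambda n: load(pools[n]))
            if donor == name or pools[donor]["servers"] <= 1:
                break  # nothing safe to reassign
            pools[donor]["servers"] -= 1
            pool["servers"] += 1
            moved.append((donor, name))
    return moved

pools = {
    "storefront": {"demand": 9.0, "servers": 10},  # 0.90 load: holiday peak
    "backoffice": {"demand": 2.0, "servers": 10},  # 0.20 load: under-utilized
}
moved = rebalance(pools)  # servers flow from backoffice to storefront
```

A self-configuring infrastructure would run this kind of decision continuously, driven by monitoring data rather than by an administrator typing commands.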
IBM has defined five implementation levels for a self-configuring IT infrastructure, based on the major capabilities that should ultimately exist for true autonomic functionality (see Figure A-37 on page 371).
Figure A-37: Five implementation levels for a self-configuring IT infrastructure
Level 1: Basic— The focus is on the ability to deploy, configure and change an individual system component, including system hardware configuration, storage hardware configuration, communication configuration and operating system configuration. Basic resource-specific tools are used to perform configuration actions. Configuration of multiple resources is done by logging on to each resource admin tool separately to perform the configuration action.
Level 2: Managed— The focus is on the ability to deploy and manage change to an aggregated group of systems. This includes managing multiple systems and system images, moving systems in and out of clusters, deploying applications to groups of machines, managing groups of users and moving storage in and out of storage networks. The concept of virtualized storage is introduced, along with the ability to monitor collective system health and storage components to decide how they might need to be reallocated and reconfigured.
Level 3: Predictive— The notion of managing based on role is introduced, including user role and system role, so that configurations can be appropriately tailored for their use. Configuration sensing (for example, inventory scanning) and runtime monitoring information is used to determine when corrective actions need to be taken. The administrator can initiate corrective actions based on system recommendation.
Level 4: Adaptive— The focus is on dynamically managing the configuration of the environment by leveraging sophisticated correlation and automation. The key notion is that the reconfiguration happens automatically and the IT infrastructure adjusts itself based on overall configuration health and role changes.
Level 5: Autonomic— Reconfiguration actions are taken within the context of overall business policies and priorities. Business impacts are assessed to determine the appropriate reconfiguration actions. It also includes the ability to provision proactively and anticipate issues that might jeopardize service levels before breaches actually occur.
Tivoli software products that can be used to implement a self-configuring environment include:
Tivoli Configuration Manager— Tivoli Configuration Manager adapts configurations automatically in rapidly changing environments. It provides an inventory scanning engine and a state management engine that can sense when software on a target machine is out of sync with a reference model for that class of machine. It can automatically create a customized deployment plan for each target and sequence the installation of software in the right order.
Tivoli Identity Manager— Tivoli Identity Manager automates user lifecycle management and integrates with HR and native repositories. It uses automated role-based provisioning for account creation. The provisioning system communicates directly with access control systems to help create accounts, supply user information and passwords and define account entitlements.
Tivoli Storage Manager— Tivoli Storage Manager provides self-configuring capabilities to perform tasks such as automatically identifying and loading the appropriate drivers for the storage devices connected to the server. Configuration and policy information can be defined once at a Tivoli Storage Manager configuration server and then propagated to a number of managed Tivoli Storage Manager servers. Policies and internal automation allow automatic extension of the server database, recovery log, or both when administrator-defined thresholds are reached.
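The reference-model comparison behind the self-configuring capabilities above (for example, Tivoli Configuration Manager's state management engine) can be illustrated with a hypothetical drift check. Package names and version strings are invented for the example.

```python
def drift(reference, inventory):
    """Return packages to install, upgrade, or remove to match the model."""
    install = {p: v for p, v in reference.items() if p not in inventory}
    upgrade = {p: v for p, v in reference.items()
               if p in inventory and inventory[p] != v}
    remove = [p for p in inventory if p not in reference]
    return install, upgrade, remove

# Reference model for this class of machine vs. what a scan actually found.
reference = {"webserver": "2.4", "monitor-agent": "5.1"}
inventory = {"webserver": "2.2", "debug-tools": "1.0"}

install, upgrade, remove = drift(reference, inventory)
```

The output of such a comparison is exactly what a deployment plan would be generated from: what to add, what to bring up to level, and what does not belong on the machine.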
A self-healing IT infrastructure can detect improper operation of systems, transactions and business processes (either predictively or reactively) and then initiate corrective action without disrupting users. Corrective action could mean that a component is altered or other components are altered to accept its workload. Day-to-day operations do not falter or fail because of events at the component level. The Tivoli software availability management portfolio from IBM provides tools to help customers monitor the health and performance of their IT infrastructure. These tools help allow monitoring of multiple metrics from a heterogeneous collection of resources and provide the ability to perform filtering, correlation and analysis. Based on the analysis, automated actions can be taken to cure problems even before they occur. Autonomic capabilities are provided at multiple levels to allow customers to understand business impacts and proactively manage the availability of the IT infrastructure, and workbench tools allow integration of third-party applications.
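The detect-and-cure loop described above can be reduced to a minimal sketch: check each component's health and take an automated corrective action when a check fails. The component model and the restart stand-in are assumptions made for illustration only.

```python
def self_heal(components):
    """Restart any component whose health check fails; report actions taken."""
    actions = []
    for name, comp in components.items():
        if not comp["healthy"]:
            comp["healthy"] = True  # stand-in for a real restart/failover
            comp["restarts"] += 1
            actions.append(f"restarted {name}")
    return actions

components = {
    "app-server": {"healthy": False, "restarts": 0},
    "database":   {"healthy": True,  "restarts": 0},
}
actions = self_heal(components)
```

The point of the real products is everything this sketch omits: deciding *whether* a restart is the right cure, and doing it without disrupting the users of healthy components.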
IBM has defined five implementation levels for self-healing and availability management, based on the major capabilities that should ultimately exist for true autonomic functionality (see Figure A-38).
Figure A-38: Five implementation levels for self-healing and availability management
Level 1: Basic— Systems administration and problem management are accomplished largely through human effort. Availability of systems is addressed in a reactive way: IT staff learn of problems from customers complaining about lack of service. Problem determination, correlation and cures are accomplished with a great deal of human intervention. Highly skilled IT staff are needed to debug problems.
Level 2: Managed— The focus is on the ability to collect and view availability information from remote locations. Many resources may be located outside the data center, perhaps in branch offices. Error logs can be accessed remotely. IT has deployed a set of monitoring tools to report on availability to a central location. Multiple system and network events can be filtered or manually correlated to identify the root cause of problems. Problems are fixed by skilled administrators.
Level 3: Predictive— IT has granular views into IT systems to accurately pinpoint the cause of outages. Complex, multiple metric collection is now possible, instead of single metrics. Filtering is now advanced and tied to correlation engines, allowing improved root cause problem determination to take place. Automated corrective actions are taken to known problems. These capabilities help customers prioritize which problem to repair first, based on the business impact of the outage.
Level 4: Adaptive— Systems can automatically discover, diagnose and fix problems on multiple monitored resources (operating system, application, middleware) across multiple monitored systems. The IT infrastructure availability is maintained automatically to keep in tune with predefined desired states. Outages don't bring down the system; the environment dynamically adapts to them until repairs can be made, maintaining service levels. For example, thresholds are temporarily raised to account for added workload.
Level 5: Autonomic— Problem determination and diagnosis depend on sophisticated knowledge already encoded in the system about components and their relationships. The inference capability allows the system to automatically figure out corrective actions within the right business context. For example, if a particular outage cannot be contained with available resources, lower-priority business applications may be shut down or run with degraded quality of service to keep higher priority business applications functioning.
Tivoli software products that can be used to implement a self-healing environment include:
IBM Tivoli Enterprise Console— Tivoli Enterprise Console collates error reports, derives root causes and initiates corrective actions. The event server and correlation engine help allow cross-resource correlation of events observed from hardware, applications and network devices throughout an enterprise. Events from multiple resources can be analyzed in real time to automatically highlight the critical problems that merit attention versus the misleading symptoms and effects. After a problem is highlighted, the system takes self-healing actions by responding automatically when possible or efficiently guiding the support staff to the appropriate response.
IBM Tivoli Switch Analyzer— Tivoli Switch Analyzer correlates network device errors to the root cause without user intervention. It is a Layer 2 switch network management solution that provides automated Layer 2 discovery. It identifies the relationship between devices, including Layer 2 and Layer 3 devices, and identifies the root cause of a problem without human intervention. During a network event storm it can filter out extraneous events to correlate the true cause of the problem.
IBM Tivoli NetView— Tivoli NetView helps enable self-healing by discovering TCP/IP networks, displaying network topologies, correlating and managing events and SNMP traps, monitoring network health, and gathering performance data. Router fault isolation technology quickly identifies and focuses on the root cause of a network error and initiates corrective actions.
Tivoli Business Systems Manager— Tivoli Business Systems Manager collects real time operating data from distributed application components and resources across the enterprise and provides a comprehensive view of the IT infrastructure components that make up different business solutions. It contains technologies that analyze how an outage would affect a line of business, critical business process, or service level agreement (SLA).
Tivoli Systems Automation S/390— Tivoli Systems Automation S/390 manages real time problems in the context of an enterprise's business priorities. It provides monitoring and management of critical system resources such as processors, subsystems, Sysplex Timer and coupling facilities. It supports self-healing by providing mechanisms to reconfigure a processor's partitions, perform power-on reset on IML processors, IPL operating systems (even automatically), investigate and respond to I/O configuration errors, and restart and stop applications if failures occur.
IBM Tivoli Risk Manager— Tivoli Risk Manager enables self-healing by assessing potential security threats and automating responses, such as server reconfiguration, security patch deployment, and account revocation. This helps enable system administrators who are not security experts to monitor and assess security risks in real time with a high degree of integrity and confidence across an organization's multiple security checkpoints. This product contains technology from IBM Research.
IBM Tivoli Monitoring for Applications, IBM Tivoli Monitoring for Databases, and IBM Tivoli Monitoring for Middleware— This family of products minimizes vulnerability by discovering, diagnosing and reacting to disruptions automatically. It provides monitoring solutions and a local automation capability through a set of Proactive Analysis Components. A sophisticated resource model engine allows for local filtering of monitored data, raising events when specific conditions are met. Local rules can be encoded to take immediate corrective action, providing automatic recovery for server failures.
Tivoli Storage Resource Manager— Tivoli Storage Resource Manager automatically identifies potential problems and executes policy-based actions to help prevent or resolve storage issues, minimize storage costs and provide application availability. It can scan and discover storage resources in the IT environment. It supports policy-based automation for the allocation of storage quotas and storage space, monitors file systems and provides reports on capacity and storage asset utilization.
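The root-cause correlation performed by products such as Tivoli Enterprise Console and Tivoli Switch Analyzer can be illustrated with a toy dependency model: an event whose upstream dependency has also failed is tagged as a symptom rather than a separate problem. The dependency map and resource names here are invented for the example.

```python
# Hypothetical dependency map: each resource depends on the one it names.
depends_on = {"web-app": "database", "database": "san-switch"}

def correlate(events):
    """Split failure events into likely root causes and downstream symptoms."""
    failed = set(events)
    roots, symptoms = [], []
    for e in events:
        if depends_on.get(e) in failed:
            symptoms.append(e)  # its dependency also failed: misleading effect
        else:
            roots.append(e)     # no failed upstream: likely root cause
    return roots, symptoms

# An event storm from one switch failure cascading up the stack.
roots, symptoms = correlate(["web-app", "database", "san-switch"])
```

Filtering the storm down to the single event worth repairing is precisely what lets the support staff (or an automated response) act on the cause instead of the effects.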
Self-optimization is the ability of the IT infrastructure to efficiently manage resource allocation and utilization to provide quality of service for both system users and their customers. In the near term, self-optimization primarily addresses the complexity of managing system performance. In the long term, self-optimizing software applications may learn from experience and proactively tune themselves in an overall business objective context. Workload management uses self-optimizing technology to help optimize hardware and software use and verify that service level goals are being met. Predictive analysis tools provide views into performance trends, allowing proactive action to be taken to help optimize the IT infrastructure before critical thresholds are exceeded.
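The trend-analysis idea behind such predictive tools can be sketched with a least-squares line fit: extrapolate a metric's history and estimate when it will cross a critical threshold. Real products use far more sophisticated models; this shows only the bare concept, and the sample data is invented.

```python
def predict_breach(samples, threshold):
    """Fit a least-squares line to samples; return the time index at which
    the fitted line reaches the threshold, or None if the trend is flat
    or improving."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None
    return (threshold - intercept) / slope

# Disk utilization climbing ~2% per sample; when does it hit 90%?
t = predict_breach([70, 72, 74, 76, 78], threshold=90)
```

An alert raised at the predicted breach time, rather than at the breach itself, is what turns monitoring into proactive optimization.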
IBM has defined five implementation levels for a self-optimizing IT infrastructure that can optimize workloads and transaction performance across multiple resources (see Figure A-39).
Figure A-39: Five implementation levels for a self-optimizing IT infrastructure that can optimize workloads and transaction performance across multiple resources
Level 1: Basic— Individual resources provide point data regarding individual component performance or utilization, allowing users a simple view of how workload affects a single system. Basic tools allow dynamic viewing of components, but comprehensive views of system performance are still put together manually by looking at multiple local views and reports available with the resource-specific tools.
Level 2: Managed— Management tools allow information on resource utilization and performance to be gathered and collected in a central location. Simple, comprehensive transaction views are possible using techniques such as round-trip measurements, synthetic transactions or end-user client-capture capabilities. Many resources in the middle of a transaction are invisible or not instrumented, and many resources are often located outside the data center—perhaps in branch offices or other locations. Optimizing the IT components is still done manually and with trial and error.
Level 3: Predictive— Tools now provide value by creating detailed, comprehensive transaction views and can break down the composite view of the transaction across the resource elements. Resources can be grouped by transaction types, service levels can be monitored and automated tools provide notifications of impending violations—allowing manual reconfiguration of the IT environment. Predictive tools can perform trend analysis on historical data and provide recommendations.
Level 4: Adaptive— Instrumentation is now available on the composite resources to allow changes of status and automated balancing of work when overload or underload conditions exist across resources in the environment. This level of control provides users with the ability to manage comprehensive performance and effectively meet SLAs.
Level 5: Autonomic— Workload balancing and transaction optimization is done within the business context. Business trade-offs are expressed in machine-processable format, allowing IT management tools to dynamically reallocate resources based on varying business needs. Automated tuning of servers, storage, and networks takes place to maintain quality of service for high-priority business applications.
Tivoli software products that can be used to implement a self-optimizing environment include:
Tivoli Service Level Advisor— Tivoli Service Level Advisor helps prevent SLA breaches with predictive capabilities. It performs trend analysis based on historical performance data from Tivoli Enterprise Data Warehouse and can predict when critical thresholds could be exceeded in the future. By sending an event to Tivoli Enterprise Console, self-optimizing actions can be taken to help prevent the problem from occurring.
IBM Tivoli Workload Scheduler for Applications— Tivoli Workload Scheduler for Applications automates, monitors, and controls the flow of work through the IT infrastructure on both local and remote systems. It can automate, plan, and control the processing of these workloads within the context of business policies. It uses sophisticated algorithms to maximize throughput and help optimize resource usage.
Tivoli Business Systems Manager— Tivoli Business Systems Manager enables optimization of IT problem repairs based on business impact of outages. It collects real time operating data from distributed application components and resources across the enterprise and provides a comprehensive view of the IT infrastructure components that make up different business solutions. It works with Tivoli Enterprise Console to enable self-optimizing actions to help prevent poor performance from affecting a line of business, critical business process, or SLA.
Tivoli Storage Manager— Tivoli Storage Manager supports Adaptive Differencing technology to help optimize resource usage for backup. With Adaptive Differencing, the backup-archive client dynamically determines an efficient approach for creating backup copies of just the changed bytes, changed blocks or changed files, delivering improved backup performance over dialup connections. These technologies allow just the minimum amount of data to be moved to backup, helping optimize network bandwidth, tape usage, and management overhead.
Tivoli Monitoring for Transaction Performance— Tivoli Monitoring for Transaction Performance helps customers tune their IT environments to meet predefined service level objectives. It enables organizations to monitor the performance and availability of their e-business and enterprise transactions to provide a positive customer experience. It integrates with the Tivoli Enterprise Console environment for alerting and proactive management, helping enable optimization of resource usage from a transactional perspective.
IBM Tivoli Analyzer for Lotus Domino— Tivoli Analyzer for Lotus Domino contains a Proactive Analysis Component that allows administrators to verify the availability and optimal performance of Lotus Domino servers. It provides intelligent server health monitoring and expert recommendations to correct problems.
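The changed-block approach behind Adaptive Differencing, described above for Tivoli Storage Manager, can be illustrated as follows: hash fixed-size blocks of a file and back up only the blocks whose hashes differ from the previous backup. The tiny block size and the hashing scheme are choices made for the example, not product internals.

```python
import hashlib

BLOCK = 4  # tiny block size so the example is easy to follow

def block_hashes(data):
    """Hash each fixed-size block of the data."""
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

def changed_blocks(old_hashes, data):
    """Return (index, bytes) for each block that differs from the last backup."""
    new_hashes = block_hashes(data)
    return [(i, data[i * BLOCK:(i + 1) * BLOCK])
            for i, h in enumerate(new_hashes)
            if i >= len(old_hashes) or h != old_hashes[i]]

previous = b"AAAABBBBCCCC"
current  = b"AAAAXXXXCCCC"
delta = changed_blocks(block_hashes(previous), current)
# Only the middle block needs to move to backup.
```

Sending the delta instead of the whole file is what makes incremental backup practical over slow links.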
A self-protecting IT environment can take appropriate actions automatically to make itself less vulnerable to attacks on its runtime infrastructure and business data. These attacks can take the form of unauthorized access and use, malicious viruses that can format hard drives and destroy business data, and denial-of-service attacks that can cripple critical business applications.
A combination of security management tools and storage management tools are necessary to deal with these threats. Security management tools can help businesses consistently enforce security and privacy policies, help reduce overall security administration costs, and help increase employee productivity and customer satisfaction. Critical configuration changes and access-control changes should only occur with the right approvals. Tools should detect violations of security policy, and if necessary, automated actions should be taken to minimize risk to IT assets. Tivoli software storage management tools help enable businesses to automatically and efficiently back up and protect business data. Autonomic security and storage solutions provide administrators with a way to create policy definitions and express event correlation and automation knowledge.
IBM has defined five implementation levels for a self-protecting IT infrastructure (see Figure A-40).
Figure A-40: Five implementation levels for a self-protecting IT infrastructure
Level 1: Basic— Localized security configuration requires administrators to configure each component independently and manually track changes. Local backup and recovery tools are used to protect data. Audit reports are on a per-machine basis. A great deal of human intervention is required to protect the runtime and business data.
Level 2: Managed— Management tools are used to centralize security administration, allowing the centralized creation of user IDs and controlling access privileges to resources. Intrusion sensing and auditing tools are used to collect data about intrusion attempts. These are manually reviewed, and corrective actions are taken to protect against future attacks. Centralized backup and recovery tools provide an incremental backup capability across multiple resources.
Level 3: Predictive— Security policies can be consistently administered across the enterprise manually, using enterprise-wide security management tools. IDs and access privileges are coordinated across multiple applications and can be consistently revoked if necessary. Perimeter sensors can detect security violations and correlate them to detect attacks. Security tools provide recommendations for corrective action.
Level 4: Adaptive— Security management focuses on advanced automation, including automatically enabling new users and disabling IDs for those who leave. It automatically grants access to the systems and applications needed for a new job while disabling access to systems associated with the old job. If access-control violations or intrusions are detected, automatic reconfiguration actions are initiated to help quarantine the affected systems and disable the IDs involved.
Level 5: Autonomic— The focus is on learning systems and systems that can adapt lower-level resource policies in response to higher-level business policy. Collaboration across system components makes it possible to reconfigure systems on the fly, automatically apply security patches when necessary, modify intrusion monitoring levels based on business needs, and adapt policies to help prevent future problems based on past history.
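The adaptive provisioning described at Level 4, granting access for a user's new role while revoking access tied to the old one, can be sketched as a simple set operation. The role-to-system mapping and function names below are hypothetical, chosen for illustration rather than taken from any Tivoli product:

```python
# Hypothetical mapping from job role to the systems that role may access.
ROLE_ACCESS = {
    "developer": {"source_control", "build_server"},
    "dba": {"database", "backup_console"},
}

def change_role(current: set, old_role: str, new_role: str) -> set:
    """Adaptive provisioning: revoke the old role's systems, grant the new role's."""
    return (current - ROLE_ACCESS[old_role]) | ROLE_ACCESS[new_role]

# A developer moves to a DBA position: old access is removed in the same
# step that new access is granted, so no stale privileges linger.
access = change_role({"source_control", "build_server"}, "developer", "dba")
print(sorted(access))  # ['backup_console', 'database']
```

Doing the grant and revoke as one atomic policy-driven step is what distinguishes Level 4 from the manual, per-application ID management of the lower levels.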
Tivoli software products that can be used to implement a self-protecting environment include:
Tivoli Storage Manager— Tivoli Storage Manager self-protects by automating backup and archival of enterprise data across heterogeneous storage environments. Scaling to protect thousands of computers running a dozen operating system platforms, its intelligent data movement and storage techniques and comprehensive automation help reduce administration costs and increase service levels.
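The incremental technique behind automated backup, copying only what changed since the last run, can be shown with a minimal sketch. Everything here is illustrative (an in-memory stand-in, not the Tivoli Storage Manager interface): files are fingerprinted by content hash, and only files whose hash differs from the previous backup are selected.

```python
import hashlib

def incremental_backup(files: dict, last_backup: dict) -> dict:
    """Return only the files that changed since the previous backup.

    `files` maps path -> current content (bytes);
    `last_backup` maps path -> SHA-256 hex digest recorded at the last run.
    """
    changed = {}
    for path, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        if last_backup.get(path) != digest:  # new file or modified content
            changed[path] = data
    return changed

# Last run recorded a hash for one unchanged config file.
prev = {"/etc/app.conf": hashlib.sha256(b"v1").hexdigest()}
# Since then a data file appeared; the config file is untouched.
now = {"/etc/app.conf": b"v1", "/var/db/data": b"records"}
print(sorted(incremental_backup(now, prev)))  # ['/var/db/data']
```

Because unchanged files are skipped, each automated run moves only the delta, which is what makes scheduled enterprise-wide backup affordable in both time and storage.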
Tivoli Access Manager— The Tivoli Access Manager family of products self-protects by helping prevent unauthorized access and using a single security policy server to enforce security across multiple file types, applications, devices, operating systems, and protocols. It supports a broad range of user authentication methods, including Web single sign-on, and has the ability to control access to many types of resources for authenticated users.
Tivoli Identity Manager— Tivoli Identity Manager self-protects by centralizing identity management, integrating automated workflow with business processes, and leveraging self-service interfaces to increase productivity.
Tivoli Risk Manager— Tivoli Risk Manager provides system-wide self-protection by assessing potential security threats and automating responses, such as server reconfiguration, security patch deployment, and account revocation. It collects security information from firewalls, intrusion detectors, vulnerability scanning tools, and other security checkpoints. It simplifies and correlates the vast number of events and alerts generated by numerous security point products and quickly identifies the real security threats to help administrators respond with adaptive security measures.
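Event correlation of the kind described above, collapsing a flood of raw alerts from firewalls, intrusion detectors, and scanners into a short list of real threats, can be sketched roughly as follows. The event format and threshold are assumptions made for this example, not Risk Manager's actual data model:

```python
from collections import defaultdict

# Toy alert stream gathered from several security checkpoints.
events = [
    {"source": "firewall", "ip": "10.0.0.5", "type": "port_scan"},
    {"source": "ids",      "ip": "10.0.0.5", "type": "exploit_attempt"},
    {"source": "firewall", "ip": "10.0.0.9", "type": "port_scan"},
    {"source": "ids",      "ip": "10.0.0.5", "type": "exploit_attempt"},
]

def correlate(events: list, threshold: int = 3) -> list:
    """Group raw alerts by offending IP address; only addresses that cross
    the threshold are escalated as real threats, the rest stay as noise."""
    counts = defaultdict(int)
    for ev in events:
        counts[ev["ip"]] += 1
    return [ip for ip, n in counts.items() if n >= threshold]

print(correlate(events))  # ['10.0.0.5']
```

A single port scan from 10.0.0.9 is filtered out, while the repeated activity from 10.0.0.5 across two different checkpoints is surfaced for an adaptive response.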
Companies want and need to reduce their IT costs, simplify management of their IT resources, realize a fast return on their IT investment, and provide high levels of availability, performance, security, and asset utilization. Autonomic computing helps address these issues. IBM is a leader in the evolution to autonomic computing and offers integrated systems management solutions for resource management, transaction-oriented management, and business-solution management that span the four autonomic computing disciplines of self-configuring, self-healing, self-optimizing, and self-protecting.