6.3 Extensible Markup Language

6.3.1 Common Problems of the Web

In order to move on to do real e-business on the Web, it is necessary to understand why this is not possible using current technologies. This applies especially to the HTML standard, which hinders the development of new applications on the Internet because it was designed only to present documents in a Web browser. E-business has requirements beyond displaying documents: documents also need to be processed, rearranged, stored, forwarded, exchanged, encrypted, and signed. Using HTML, it is difficult to express the hierarchical relationship of data values (known from database records and object hierarchies). HTML reflects structure and presentation, but conveys nothing about the meaning of the marked-up document.

Today, most applications are tied to the browsers, but many corporations have applications installed that are not able to display the information in HTML, yet need to exchange the information over the Internet. Many customers also want Internet applications to have the look and feel of their applications, and this can be achieved by launching external applications from within the browser. The best solution is to have Web applications that understand the Internet protocols, such as HTTP and TCP/IP, that do not require a Web browser. This allows existing applications to be extended to talk to other resources, such as databases and applications over the Internet. While the Internet protocols help to establish the communication, XML enables the exchange of data between applications that usually have totally different data formats.

In order to create intelligence on the Web, search engines need to understand the content of Web pages, but so far they are not able to. In searching for a certain piece of information, it is highly likely that you get one good result and at least 100 incorrect ones (some search engines are even worse by a factor of 10 to 100). The problem is that search engines normally only index a set of words, document titles, URLs, and metatags, but do not know anything about the structure of the document. A search engine cannot decide if a document is a news article or a thesis, for example. There is no way to mark up the significant portions of a document so that a search engine can focus on the important parts and ignore the noise (such as copyright statements, navigational bars, and design elements). Such markup would allow a much finer granularity of control over search engines, and it can be achieved by adding additional attributes to Web elements. Let's say you are researching information on a singer who also acts and writes (such as Cher or Madonna). It would be good to have a classification of the function of the person on the Web site. If tags like <singer>, <actor>, and <author> could be used in HTML, the number of direct hits would be much higher. With XML, these tags can be easily defined and used.
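
As a rough illustration of this idea, the following Python sketch parses a small XML fragment in which role tags mark the function of a person, and then looks up names by role. The <people> wrapper element, the layout of the fragment, and the function name are assumptions made for this example; they are not part of any standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the role tags are illustrative, not a standard vocabulary.
DOCUMENT = """
<people>
  <singer>Cher</singer>
  <actor>Cher</actor>
  <singer>Madonna</singer>
  <author>Madonna</author>
</people>
"""

def names_with_role(xml_text, role):
    """Return the names that appear inside elements named `role`."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(role)]

print(names_with_role(DOCUMENT, "singer"))
print(names_with_role(DOCUMENT, "author"))
```

A search tool built along these lines could match on the role tag instead of on the bare name, which is the finer granularity described above.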

Another common problem on the Web is collecting related pages and saving them to your hard disk or printing them out. The current method is to save or print them on a page-by-page basis, which can become really annoying if there are more than ten pages. In many cases it is also difficult to identify the other parts of a particular collection, as the document that links all resources together is not known to the person who looks at a particular page. Often no link is provided, because the owner of the page already knows that the document is part of a larger collection. In order to express the interrelationship, special metadata [11] should be attached to the documents, making it easier to find the other documents related to the topic of a particular search. Although adding metadata is possible in HTML, the information applies only to the whole document, not to the individual parts of it that may be of interest for a particular search. Using XML, it is possible to create metadata for all text elements.

[11] Metadata is information about information; in HTML pages it is expressed using the established <meta> tag.

6.3.2 Moving to XML

XML is an ISO-compliant subset of the Standard Generalized Markup Language (SGML), a system for organizing and tagging elements of a document. XML is extensible because it is a metalanguage, which enables someone to write a document type definition (DTD) like HTML 4.0 and define the rules of the language so the document can be interpreted by the document receiver. XML is like an alphabet for building new languages and gives companies a way to start with a common foundation and a common alphabet. Every industry is able to define the specific terms it uses.

Unlike HTML, the layout is not defined in the XML file, nor is the sequence of the text on the screen. The semantics and the structure of the data are preserved. The data is organized as in an object-oriented database. XML is about creating, sharing, and processing information. The purpose of XML is to provide an easy-to-use subset of SGML that allows for custom tags to define, transmit, and interpret data structures between organizations. These tags look like HTML tags, but describe the meaning of the information in a format that is predictable and precisely defined.
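
To make the point about preserved structure more concrete, here is a minimal Python sketch that reads a hierarchical XML record and turns it into nested Python dictionaries. The order record, its element names, and the helper to_dict are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical order record; element names are invented for illustration.
ORDER = """
<order id="1001">
  <customer>
    <name>Example Corp</name>
    <country>DE</country>
  </customer>
  <item>
    <sku>A-17</sku>
    <quantity>3</quantity>
  </item>
</order>
"""

def to_dict(element):
    """Convert an element into nested dicts; leaf elements become their text.

    Simplification: repeated child tags would overwrite each other.
    """
    children = list(element)
    if not children:
        return (element.text or "").strip()
    return {child.tag: to_dict(child) for child in children}

order = ET.fromstring(ORDER)
record = to_dict(order)
print(order.get("id"), record["customer"]["country"], record["item"]["quantity"])
```

Nothing in the record says how it should look on screen; the receiving application sees only the data and its hierarchy.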

The widespread introduction of XML into Internet applications will change the way we experience the Web today and remove two constraints that are holding back Web development: its dependence on a single, inflexible document type (i.e., HTML) and the complexity of the full SGML, whose syntax is very powerful but extremely complex. XML reduces the complexity of SGML and enables the development of user-defined document types on the Web. Some say that XML provides 80 percent of the benefits of SGML with only 20 percent of the effort.

Although HTML will continue to play an important role for the content it currently represents, many new applications require a more robust and flexible infrastructure. E-business on the Internet will work only if the information that is transported is not restricted to one make, model, or manufacturer. Nor can control of the data format be ceded to private hands. In order to save time and effort, the information needs to be provided in a form that allows it to be reused in many different ways.

The presentation of XML documents can be implemented by using the Document Style Semantics and Specification Language (DSSSL), [12] the Cascading Style Sheets (CSS) [13] specification, or the Extensible Stylesheet Language (XSL). While DSSSL is not widely used, and CSS is mainly used for display on the Web, XSL is becoming the predominant way of representing XML on different platforms.

[12] DSSSL is a standard for the processing of SGML (Standard Generalized Markup Language) documents. It describes how such a structured document might be presented visually, converted, or processed in some other way.

[13] CSS specifies the possible style sheets or statements that may determine how a given element is presented in a Web page.

6.3.3 XML Applications

XML is slowly finding its way into Internet applications. Although mostly invisible to the end user, many XML applications have already been created to simplify the processing of documents by moving from application-specific data formats to XML.

XSL

Similar to CSS, XSL separates the content from representation. It specifies the formatting characteristics of XML documents on the Web, while CSS specifies the formatting characteristics of HTML documents.

While CSS has its own proprietary syntax, XSL itself has been written in XML and can be extended through JavaScript, a scripting language for the manipulation of HTML and XML documents. The formatting model is the same as in CSS and the highly complex DSSSL.

Although it is possible to use CSS for formatting HTML tags, it is not necessary, as all HTML tags have a predefined representation in a Web browser. XML tags are highly dynamic, and none of them has a predefined representation in a Web browser. A designer may create the tag <box>, which is perfectly valid in XML if defined in a DTD, but no browser will know how to format this tag. XSL is able to add the missing style information to the XML tag.

Although CSS can also format XML documents, it can only be used for rather simple documents, whereas XML was invented to create highly structured and data-rich documents. Unlike CSS, XSL can also transform XML documents, converting a document in one form into a document in another form. XSL is able to dynamically render a page when elements need to be rearranged, while CSS can represent the data only in the form in which it was originally placed in the file. XSL can be used, for example, to rearrange Web content so that it fits better on a printed page, without the need to download another version of the same document.
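
The Python standard library does not include an XSL processor, so the following sketch does not use XSL at all. It only imitates the kind of transformation described here, using ElementTree to read a source document, reorder its elements, and emit a different structure. The catalog document, its element names, and the function to_html_list are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; titles and years are invented.
SOURCE = """
<catalog>
  <book><title>XML Basics</title><year>1999</year></book>
  <book><title>Applied SGML</title><year>1996</year></book>
</catalog>
"""

def to_html_list(xml_text):
    """Build an HTML-style <ul> with the books reordered by year, oldest first."""
    catalog = ET.fromstring(xml_text)
    books = sorted(catalog.findall("book"), key=lambda b: int(b.findtext("year")))
    ul = ET.Element("ul")
    for book in books:
        li = ET.SubElement(ul, "li")
        li.text = f"{book.findtext('title')} ({book.findtext('year')})"
    return ET.tostring(ul, encoding="unicode")

html = to_html_list(SOURCE)
print(html)
```

The source file is left untouched; only the generated output has a new order and a new set of tags, which is something a pure styling language cannot express.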

Although XSL is extensible through JavaScript, many developers feel that its features can be replaced by JavaScript and the Document Object Model (DOM), which specifies how objects in a document (text, images, headers, links, etc.) are represented.

SMIL

Based on XML, the Synchronized Multimedia Integration Language (SMIL, pronounced "smile") has been created by the W3C [14] and is a powerful way to synchronize any type of media (e.g., audio, video, text, and graphics) and build time-based, streaming multimedia presentations without having to learn a complex programming language.

[14] http://www.w3c.org/AudioVideo/

Until now, implementing complex TV-like content required either a programming language such as Java or a multimedia application such as Macromedia's Director. [15] With SMIL, you can build complex animations using simple instructions that are similar to HTML. Because SMIL is an interpreted language, the time needed to download multimedia content decreases dramatically.

[15] http://www.macromedia.com/

The major difference between SMIL and Director is that SMIL does not create one large file that needs to be downloaded. Instead, images, sounds, and animations are downloaded one after the other in the order of appearance in the presentation. If components are used in several multimedia presentations, the browser may have some already in the cache, reducing the download time even further. The customer is able to see the beginning of the multimedia much earlier and is able to decide if it's worth waiting for the rest of it.

Just as with HTML pages, replacing components is easy and does not require you to rebuild the complete page. With SMIL, you can replace components and use the presentation in an instant, without interrupting the service for your customers. The authoring process can be simplified by using SMIL. It also supports hyperlinks in order to offer interactivity.
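
The following Python sketch generates a small SMIL-style document to show how such a presentation is put together from separate components. The element names (smil, body, seq, par, img, audio) and the attributes src and dur follow SMIL 1.0 as far as we are aware, but the output has not been validated against the W3C specification; the file names and the function build_presentation are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_presentation(slides):
    """Return a SMIL-style document that shows each slide with its narration.

    `slides` is a list of (image_url, audio_url, duration) tuples.
    """
    smil = ET.Element("smil")
    body = ET.SubElement(smil, "body")
    seq = ET.SubElement(body, "seq")        # children play one after another
    for image_url, audio_url, duration in slides:
        par = ET.SubElement(seq, "par")     # children play at the same time
        ET.SubElement(par, "img", src=image_url, dur=duration)
        ET.SubElement(par, "audio", src=audio_url)
    return smil

# Hypothetical media files.
presentation = build_presentation([
    ("intro.gif", "intro.au", "5s"),
    ("offer.gif", "offer.au", "8s"),
])
print(ET.tostring(presentation, encoding="unicode"))
```

Because every image and sound is referenced by its own src attribute, replacing a component means changing one reference or one file, not rebuilding the presentation.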

SMIL was developed to complement other Web technologies, such as Dynamic HTML (DHTML) and DOM. Although Microsoft initially supported the SMIL initiative, the company now contends that the technology is no longer compatible with its media player, perhaps because it developed the media player at the same time. Most other technology companies are jumping onto the SMIL bandwagon.

SMIL is not meant to be a replacement for existing multimedia technologies. It is possible to join the media formats and create an even richer experience.

RDF

Another very interesting application is the Resource Description Framework (RDF), [16] which has been developed by the W3C. RDF adds metadata to Internet resources, whereby a resource can be any object on the Web, such as a Web page, image, or sound. The metadata can be used to find a resource by adding a detailed description and keywords to the metadata, to rate the content, and to digitally sign an object on the Internet.

[16] http://www.w3c.org/RDF/

The problem with the metadata of HTML pages is that computers are not able to understand the information. If the description of two Web pages is similar and points to the same type of information, such as "Germany is a country in Europe" and "Germany is a European country," a computer will not be able to detect this without extensive programming. RDF associates unambiguous methods of expressing these statements so that a machine can understand that they have the same meaning.

Although RDF is able to improve search results on the Web, it is not restricted to this application. It is able to describe individual elements and the relationships between them.

Therefore, the W3C has developed a data model and a syntax for the RDF. The difference between RDF and similar frameworks is that RDF has been developed especially for the Web. The syntax for RDF is based on a special data model that defines the way properties are described. It represents the properties of a resource and the values of the properties.
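
A simplified way to picture this data model is as a set of statements, each consisting of a resource, a property, and a value. The Python sketch below is not an RDF implementation; the Statement type and the example.org identifiers are assumptions chosen for illustration. It shows how two differently worded sentences about Germany reduce to the same statement once they are expressed in this form.

```python
from typing import NamedTuple

class Statement(NamedTuple):
    resource: str   # the thing being described
    prop: str       # the property being stated
    value: str      # the value of that property

# Hypothetical identifiers; a real RDF record would use agreed URIs.
GERMANY = "http://example.org/place/Germany"
LOCATED_IN = "http://example.org/terms/locatedIn"
EUROPE = "http://example.org/place/Europe"

# "Germany is a country in Europe" and "Germany is a European country"
from_page_one = Statement(GERMANY, LOCATED_IN, EUROPE)
from_page_two = Statement(GERMANY, LOCATED_IN, EUROPE)

statements = {from_page_one, from_page_two}
print(from_page_one == from_page_two, len(statements))
```

Once both pages carry the statement in this structured form, a program can compare them directly instead of trying to interpret the English wording.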

Although RDF has been developed independently of XML, it can be easily represented in XML. The names of the properties and the values are not predefined, but can be chosen by those responsible for the Internet object. The creator of an RDF record can choose which particular properties or sets of properties will be used. In order to ensure the uniqueness of every RDF record, it uses the namespace mechanism, which is also used in XML and the Internet. The namespace is the set of names in a given naming system.
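
As a hedged sketch of how namespaces keep property names unique, the Python code below reads a small RDF record written in XML and lists each property with its fully qualified name. The RDF namespace URI is the one published by the W3C; the ex vocabulary, the described page, and the property values are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/terms#"   # hypothetical vocabulary

RECORD = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ex="{EX_NS}">
  <rdf:Description rdf:about="http://example.org/index.html">
    <ex:creator>Jane Doe</ex:creator>
    <ex:language>en</ex:language>
  </rdf:Description>
</rdf:RDF>
"""

def properties(xml_text):
    """Yield (resource, qualified property name, value) for each property."""
    root = ET.fromstring(xml_text)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        resource = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            yield resource, prop.tag, prop.text

triples = list(properties(RECORD))
for triple in triples:
    print(triple)
```

ElementTree expands each prefixed name into the form {namespace}name, so a creator property from this vocabulary cannot be confused with a creator property defined elsewhere.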

RDF is already used on the Internet. Netscape uses RDF to index site content in order to allow users to find information more quickly. The feature "What's Related" in Netscape Communicator 4.5 (and above) uses RDF to display related sites within the browser. Millions of users worldwide are using this feature, and it is so far the most popular XML application.

WSDL

As communications protocols and message formats are standardized in the Web community, it becomes increasingly possible and important to be able to describe the communications in some structured way. The Web Services Description Language (WSDL) addresses this need by defining an XML grammar for describing network services as collections of communication endpoints capable of exchanging messages. WSDL service definitions provide documentation for distributed systems and serve as a recipe for automating the details involved in applications communication.

WSDL files are a subset of the registries in UDDI [17] and ebXML. [18] WSDL is an XML vocabulary that provides a standard way of describing service IDLs. [19] It provides contact information, descriptions of the Web services, their location, and specifications on how to invoke them.

[17] http://www.uddi.org/

[18] http://www.ebxml.org/

[19] The Interface Definition Language (IDL) is the prevalent language used for defining how components connect together.

WSDL is the resulting artifact of a convergence of activity between NASSL (by IBM [20] ) and SDL (by Microsoft [21] ). It provides a simple way for service providers to describe the format of requests and response messages for remote method invocations (RMI).

[20] http://www.ibm.com/

[21] http://www.microsoft.com/

A WSDL document defines services as collections of network endpoints, or ports. In WSDL, the abstract definition of endpoints and messages is separated from their concrete network deployment or data format bindings. This allows the reuse of abstract definitions: messages, which are abstract descriptions of the data being exchanged, and port types, which are abstract collections of operations. The concrete protocol and data format specifications for a particular port type constitute a reusable binding. A port is defined by associating a network address with a reusable binding, and a collection of ports defines a service.
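
The relationships among these parts can be sketched in Python by parsing a deliberately minimal WSDL 1.1 document and following the chain from port to binding to port type. The two namespace URIs are the ones used by WSDL 1.1 and its SOAP binding; the stock quote service, its names, and its address are hypothetical, and a real binding would also describe each operation.

```python
import xml.etree.ElementTree as ET

NS = {
    "wsdl": "http://schemas.xmlsoap.org/wsdl/",
    "soap": "http://schemas.xmlsoap.org/wsdl/soap/",
}

# Hypothetical, heavily shortened service description.
WSDL = """
<definitions name="StockQuote"
    targetNamespace="http://example.com/stockquote.wsdl"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:tns="http://example.com/stockquote.wsdl">
  <message name="GetPriceRequest"><part name="symbol" type="xsd:string"/></message>
  <message name="GetPriceResponse"><part name="price" type="xsd:float"/></message>
  <portType name="QuotePortType">
    <operation name="GetPrice">
      <input message="tns:GetPriceRequest"/>
      <output message="tns:GetPriceResponse"/>
    </operation>
  </portType>
  <binding name="QuoteSoapBinding" type="tns:QuotePortType">
    <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http"/>
  </binding>
  <service name="QuoteService">
    <port name="QuotePort" binding="tns:QuoteSoapBinding">
      <soap:address location="http://example.com/quote"/>
    </port>
  </service>
</definitions>
"""

def local(qname):
    """Strip the prefix from a qualified name such as 'tns:QuotePortType'."""
    return qname.split(":")[-1]

def describe(wsdl_text):
    """Map each service port to its address, binding, port type, and operations."""
    root = ET.fromstring(wsdl_text)
    bindings = {b.get("name"): local(b.get("type"))
                for b in root.findall("wsdl:binding", NS)}
    port_types = {
        pt.get("name"): [op.get("name") for op in pt.findall("wsdl:operation", NS)]
        for pt in root.findall("wsdl:portType", NS)
    }
    result = {}
    for port in root.findall("wsdl:service/wsdl:port", NS):
        binding = local(port.get("binding"))
        port_type = bindings[binding]
        result[port.get("name")] = {
            "address": port.find("soap:address", NS).get("location"),
            "binding": binding,
            "port_type": port_type,
            "operations": port_types[port_type],
        }
    return result

ports = describe(WSDL)
print(ports)
```

Only the port element carries a network address. The messages and the port type say nothing about where or how the service is deployed, which is why they can be reused with a different binding.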

The UDDI registry is broken down into industry categories and geographic locations. A WSDL file is often generated from another information source, like a Component Object Model (COM) [22] IDL or Common Object Request Broker Architecture (CORBA) [23] file or Enterprise Java Beans (EJB) [24] class definition. The current WSDL specification details how to map messages and operations to HTTP GET/POST, SOAP v1.1, and MIME, a specification for formatting non-ASCII messages.

[22] COM is Microsoft's framework for developing and supporting program component objects. It is aimed at providing similar capabilities to those defined in CORBA.

[23] http://www.corba.org/

[24] http://java.sun.com/products/ejb/

It is important to observe that WSDL does not introduce a new type of definition language. WSDL recognizes the need for rich type systems for describing message formats and supports the XML Schemas specification (XSD) as its canonical type system. However, because it is unreasonable to expect a single type system grammar to be used to describe all message formats in the present and future, WSDL allows the use of other type definition languages via extensibility.

In addition, WSDL defines a common binding mechanism. This is used to attach a specific protocol or data format or structure to an abstract message, operation, or endpoint. It allows the reuse of abstract definitions.

WSFL

To compose more complex services out of existing service components, it is necessary to provide a means for describing the workflow. The Web Services Flow Language (WSFL) is an XML language for this particular task. WSFL considers two types of Web services compositions:

  1. Usage Patterns: Describe workflow or business processes.

  2. Interaction Patterns: Describe overall partner interactions.

In Chapter 2 we provided a series of example scenarios where WSFL would play a key role. If you look, for example, at Figure 2.6, you can see the need for several workflows. The different services that are available on the market are brought into a specific context to form a dedicated service for construction sites.

6.3.4 Other Applications

XML applications are developed not only by the W3C, but also by companies and other organizations. The W3C is offering a standard set of applications, but every company is free to develop new and innovative applications based on XML, which can be for private, internal use only or can be made public for external review and use. For example, engineers at NASA [25] plan to use XML to develop an instrument control language for infrared devices on satellites and space telescopes. The XML syntax will be used to describe classes of infrared instruments, control procedures, communications protocols, and user documentation. Computers will parse the tagged data and generate instrument control code, most likely in Java.

[25] http://www.nasa.gov/

Siemens [26] is using a system that allows employees to submit their timecards and lets managers approve them online. Each timecard submission and approval is tied to basic human resource (HR) data such as name, serial number, and employee type. The timecard validation depends on pay code rules and frequency tables. Managers have the ability to temporarily delegate approval responsibilities, and the entire system interfaces to the corporate-wide directory service database as well as to the payroll system.

[26] http://www.siemens.com/

In order to achieve this goal, the data between the different systems needs to be interchanged. Therefore, a data format is required that can be shared among the applications. XML serves as the data interchange format in a time and attendance system and enables the integrators to reuse interface code, extend and modify data structures to accommodate personalization and internationalization, and design the system without worrying about limitations imposed by data sources. Data fields can be added without disrupting the existing structure and applications. Global development and the different holiday schedules of each country can be easily implemented with XML. The developer can use the same pay-period DTD, but just drop in a new holiday attribute. XML makes the localization of an application easy, maximizing code and data model reuse. Many other companies have started to use XML to integrate their existing applications and exchange data between different platforms and database systems.
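
The claim that fields can be added without disrupting existing applications can be illustrated with a short Python sketch. The pay-period records, the attribute names, and both functions are invented for this example, and the sketch parses the documents without validating them against a DTD (in a DTD, the new attribute would have to be declared as optional).

```python
import xml.etree.ElementTree as ET

# Hypothetical pay-period records; element and attribute names are invented.
ORIGINAL = '<payperiod><day date="2002-05-01" hours="8"/></payperiod>'
LOCALIZED = ('<payperiod><day date="2002-05-01" hours="0" '
             'holiday="Labour Day"/></payperiod>')

def total_hours(xml_text):
    """Older consumer: sums hours and ignores attributes it does not know."""
    root = ET.fromstring(xml_text)
    return sum(float(day.get("hours")) for day in root.findall("day"))

def holidays(xml_text):
    """Newer consumer: also reads the optional `holiday` attribute."""
    root = ET.fromstring(xml_text)
    return [day.get("holiday") for day in root.findall("day") if day.get("holiday")]

print(total_hours(ORIGINAL), total_hours(LOCALIZED), holidays(LOCALIZED))
```

The older function keeps working on the localized record because it only asks for the attributes it knows about.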

6.3.5 Business via XML

XML has become the industry-neutral standard for information exchange. XML is used throughout the industry in much the same way as Electronic Data Interchange (EDI) [27] to exchange information in a very structured and predefined way. The difference between EDI and XML is that EDI accepted only a limited set of structures, and a new information structure needed months or even years to pass through every stage of approval before it became a new industry standard. XML allows anyone to create new data structures on the fly. While this is great for communication between two parties, it poses a problem for industry-wide exchange of information. If every company develops its own XML standards, this could lead to incompatibilities and additional overhead for converting the data. The problem is less the different order of information and more that some companies may omit data or add information that other companies will not be able to process.

[27] EDI is a standard format for exchanging business data.

To automate the value chain and the intercompany business processes, it is necessary to define XML data structures that contain all information for a certain industry with the option of omitting or adding information for the exchange of data. Many fear that the independent software vendors will drive incompatible versions of XML that best fit their own product strategies.
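
One way to picture such a shared structure is as a set of required elements plus a set of optional ones. The Python sketch below checks a partner's document against a vocabulary of that kind; the invoice vocabulary, the element names, and the function check are hypothetical and stand in for whatever an industry would actually agree on.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared vocabulary for an invoice exchange.
REQUIRED = {"invoiceNumber", "amount", "currency"}
OPTIONAL = {"purchaseOrder", "note"}

def check(xml_text):
    """Return (missing required elements, elements outside the vocabulary)."""
    present = {child.tag for child in ET.fromstring(xml_text)}
    missing = REQUIRED - present
    unknown = present - REQUIRED - OPTIONAL
    return sorted(missing), sorted(unknown)

# A partner omits one required element and adds one of its own.
PARTNER_DOCUMENT = """
<invoice>
  <invoiceNumber>4711</invoiceNumber>
  <amount>120.50</amount>
  <note>Net 30</note>
  <loyaltyPoints>12</loyaltyPoints>
</invoice>
"""

missing, unknown = check(PARTNER_DOCUMENT)
print(missing, unknown)
```

Omitting an optional element causes no complaint, while a missing required element or an element nobody else knows is reported, which are the two incompatibilities an industry-wide structure is meant to prevent.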

XML has been integrated by PeopleSoft, Oracle, and Baan into their Enterprise Resource Planning (ERP) systems. SAP is integrating XML into its Business Application Programming Interfaces (BAPI), which give developers access to the internal workings of the company's R/3 software. If the manufacturers of ERP software agree on a common XML format, developers will gain a standardized, vendor-neutral way to access human resources, financial, and manufacturing data stored in these systems. But if the software manufacturers are not able to find a common standard, XML will not resolve the problem of proprietary Application Programming Interfaces (APIs), which made life difficult in the past.

Today, vertical XML vocabularies have already been implemented, which enable individual industries to exchange information. So far, these vocabularies have been introduced for the financial sector, the content management industry, air traffic control, and the footwear business. Although this helps each individual industry, there still need to be cross-industry standards within the business software products. Having several accepted XML flavors may still be valuable, but it would also be limiting and would not achieve the actual goal of reducing costs and increasing automation.

While the independent software manufacturers need to adopt a generic XML standard, at the same time, each industry needs to develop an XML schema. For example, it is necessary to define data structures for types of computers, reseller locations, configuration restrictions, and pricing models. These XML schemas are developed by standards bodies, such as the W3C, Commerce.Net, [28] RosettaNet, [29] and the Organization for the Advancement of Structured Information Standards (Oasis). [30] They are defining links within and between industries in a vendor-neutral way. E-business, supply chains, and other areas have already been addressed.

[28] http://www.commerce.net/

[29] http://www.rosetta.net/

[30] http://www.oasis-open.org/

6.3.6 Standard XML Schemas

To ensure interoperability between systems, standard XML schemas are a must. The problem is that all the standards bodies are developing different standards for the same areas. For some time, it looked like the usual competitors would start a new standards battle over XML schemas, just as we have seen battles over the right way to interpret HTML or the split in the industry over how Java should evolve.

Two portals have been set up that represent the two industry camps that want XML to follow their ways. The first is XML.org, [31] which has been developed by Oasis and is backed by software makers such as IBM, Sun, Novell, and Oracle. The portal was established in 1998.

[31] http://www.xml.org/

On the other side of the fence, Microsoft launched its BizTalk [32] initiative in May 1999, which has been established as an XML design clearinghouse, developer resource, and repository for XML schemas. To make BizTalk a success, Microsoft is backed by ERP software manufacturers such as SAP, e-commerce software and service providers such as Ariba, [33] and partners in the industry such as Boeing. [34]

[32] http://www.biztalk.org/

[33] http://www.ariba.com/

[34] http://www.boeing.com/

The XML portals provide a forum for XML schemas that have been designed for specific industries, such as the financial sector, health care, and insurance companies. Microsoft's BizTalk portal has raised suspicion among competitors that fear that Microsoft wants to take over the XML software application industry by defining its own standards without the consensus of the rest of the industry. Parts of the industry fear that this initiative could splinter the XML market.

Fortunately, Microsoft reconsidered its position regarding XML and joined Oasis in June 1999 to reduce the fears of the market. Microsoft's decision to back Oasis has eased several industry fears and made it possible to develop a common framework for XML applications on the Internet.

6.3.7 ebXML

The ebXML standard (Electronic Business XML) was developed by Oasis [35] and UN/CEFACT [36] to help make XML the worldwide language for electronic data transactions, much as English has become the standard vernacular for international business transactions. "The ebXML architecture begins with a business process and information model, maps the model to XML documents, and defines requirements for applications that process the documents and exchange them among trading partners." As with the UDDI registry, the ebXML registry lists a company's capabilities in a standard profile, allowing businesses to find one another through the registry, define agreements, and exchange XML messages that facilitate business transactions. "The goal is to allow all these things to be performed automatically, without human intervention, over the Internet."

[35] http://www.oasis.org/

[36] http://www.unece.org/cefact/



Radical Simplicity: Transforming Computers Into Me-centric Appliances (Hewlett-Packard Press Strategic Books)
ISBN: 0131002910
Year: 2002
Pages: 88