Metadata


Much has been written about the importance of metadata and its growing role in describing information. Metadata is information about information. It makes content easier to understand for humans and, perhaps more importantly, for machines. It helps organize the information we so often search, exchange, and accumulate, and it enriches the semantics of that information. Part of its appeal is the simplicity with which it can be represented. For instance, title, author, subject, and date are descriptive information (metadata) about a document or publication that might appear on a Web site. Search engines capable of interpreting and automatically generating metadata are already at a distinct advantage. The effectiveness and relevance of search results can be radically improved by using metadata, which acts as a semantic map for content.

The Web contains a wealth of knowledge and, unfortunately, a wealth of worthless, outdated information rubble. Sifting through the rubble to find relevant information is difficult, even for the experienced "search-ologist." Using metadata effectively enables a search engine to go beyond simple keyword matching or pattern recognition and repetition techniques. Metadata can be used to improve search effectiveness by associating descriptive information with each searchable item. For instance, rather than relying on the filenames of audio, video, or document files, a search engine can use metadata to classify, categorize, and build rich ontologies.

Music file metadata might be used to categorize by title, album, recording artist, length, and encoding bit rate:

 <title> <album> <artist> <length> <bitRate>

Video file metadata might include video title, director, and technical specifications:

 <title> <director> <resolution> <colorDepth> <codec>

Document file metadata could include the author, title of the document, and the version requested:

 <author> <title> <version>

If all this metadata is searchable, search engines can return far more accurate results. P2P applications are using metadata to automatically arrange imported files into a personalized media library. Metadata is also being used to improve the cataloging of exchanged information, such as documents, emails, meeting notes, and so on.
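To make the difference between filename search and metadata search concrete, here is a minimal sketch of an in-memory catalog that looks items up by a metadata field. The class MetadataCatalog, its Item type, and the sample values are hypothetical names chosen for illustration; a real P2P application would likely index metadata quite differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory catalog; all names here are illustrative assumptions.
class MetadataCatalog {

    // One shared file: a filename plus descriptive name/value pairs.
    static final class Item {
        final String filename;
        final Map<String, String> metadata = new LinkedHashMap<>();

        Item(String filename) {
            this.filename = filename;
        }

        // Attaches one metadata field and returns this item for chaining.
        Item with(String field, String value) {
            metadata.put(field, value);
            return this;
        }
    }

    private final List<Item> items = new ArrayList<>();

    void add(Item item) {
        items.add(item);
    }

    // Returns filenames of items whose field matches the value, ignoring case.
    List<String> findByField(String field, String value) {
        List<String> matches = new ArrayList<>();
        for (Item item : items) {
            String actual = item.metadata.get(field);
            if (actual != null && actual.equalsIgnoreCase(value)) {
                matches.add(item.filename);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        MetadataCatalog catalog = new MetadataCatalog();
        catalog.add(new Item("track01.mp3").with("title", "Example Song").with("artist", "Example Artist"));
        catalog.add(new Item("a7f3.mp3").with("title", "Another Song").with("artist", "Example Artist"));
        catalog.add(new Item("clip.avi").with("title", "Example Clip").with("director", "Example Director"));

        // The second file would be hard to find by filename alone.
        System.out.println(catalog.findByField("artist", "Example Artist"));
    }
}
```

The second sample file has an uninformative filename, yet it is still found because the lookup uses the artist field rather than the name on disk.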

P2P applications are even using XML to describe P2P services. In fact, like Web services, a whole new area of service description and message routing is being built on XML and XML metadata. All aspects of JXTA are being built on XML to structure data as advertisements, messages, and protocols. Jabber is being built as an XML router that relies heavily on XML namespaces to provide extensibility, as seen in Figure 5.3.

Figure 5.3. Jabber is defining an XML-based backbone that goes beyond integrating proprietary IM systems. Jabber is promoting XML routing as an integration technology.

graphics/05fig03.gif
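The idea of routing XML by namespace can be sketched with the JDK's namespace-aware DOM parser. This is not Jabber's actual implementation: the NamespaceRouter class and the handler names are hypothetical, and the sample packet shape is an assumption used only to show how a namespace on a child element can select a handler.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical router; it only illustrates dispatching on an XML namespace.
class NamespaceRouter {

    private final Map<String, String> handlersByNamespace = new HashMap<>();

    // Associates a namespace URI with the name of a (hypothetical) handler.
    void register(String namespaceUri, String handlerName) {
        handlersByNamespace.put(namespaceUri, handlerName);
    }

    // Returns the handler registered for the namespace of the packet's
    // first child element, or "unhandled" if none is registered.
    String route(String packetXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getNamespaceURI()
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(packetXml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    return handlersByNamespace.getOrDefault(child.getNamespaceURI(), "unhandled");
                }
            }
            return "unhandled";
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse packet", e);
        }
    }

    public static void main(String[] args) {
        NamespaceRouter router = new NamespaceRouter();
        router.register("jabber:iq:version", "versionHandler");
        System.out.println(router.route("<iq type='get'><query xmlns='jabber:iq:version'/></iq>"));
    }
}
```

Because dispatch depends only on the namespace URI, new payload types can be supported by registering another namespace, without changing the routing code.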

The Role of XML

XML has become the description language of choice for most metadata enthusiasts. XML is a good choice for representing data for a number of reasons:

  • XML is a simple markup language based on an existing standard (SGML), and is a natural extension to HTML.

  • Its text-based encoding makes it easily processed by programming languages. Java specifically has defined an entire set of packages related to parsing, formatting, messaging, and binding objects to XML.

  • XML is capable of being edited by simple text editors, and there is an abundance of tools and resources available to support its adoption.

  • XML is capable of expressing complex hierarchical relationships and links between documents and objects.

  • XML is extensible and open, enabling authors to define their own set of markup tags to define and structure data.

  • XML is widely adopted, and is a standard supported by the World Wide Web Consortium (www.w3.org).
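As a small illustration of how readily Java processes XML, the following sketch reads a flat metadata record with the DOM parser that ships with the JDK. The MetadataXmlReader class, the document root element, and the sample values are assumptions made for this example; only the javax.xml.parsers and org.w3c.dom calls are standard.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper; the record layout is an assumption for illustration.
class MetadataXmlReader {

    // Parses a flat record and maps each child element name to its text.
    static Map<String, String> read(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Map<String, String> fields = new LinkedHashMap<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace text nodes between elements.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    fields.put(child.getNodeName(), child.getTextContent().trim());
                }
            }
            return fields;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse metadata", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<document>"
                + "<author>A. Writer</author>"
                + "<title>Meeting Notes</title>"
                + "<version>2</version>"
                + "</document>";
        System.out.println(read(xml));
    }
}
```

The same few lines handle any flat record, so the music and video fields shown earlier could be read without changing the parsing code.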

The importance of XML and XML-defined metadata will continue to grow. Chapter 8, "P2P Data Formats and Interchange," will take a more in-depth look at how metadata is being used in P2P applications.



Java P2P Unleashed
Year: 2002
Pages: 209
