Projects and Products that Embody Some Aspects of Interpretation


Several products are available that perform some aspect of the interpretation of unstructured data.

Verity,[32] Autonomy,[33] Inktomi,[34] and Inxight[35]

Verity, Autonomy, Inktomi, and Inxight each offer products that help categorize unstructured data. Each uses a variety of techniques, especially statistical algorithms, to approximate the meaning of the unstructured data in the documents submitted to its servers. Each is also involved in document storage, publishing, and related areas, but here we are primarily concerned with their ability to infer knowledge from unstructured, uninterpreted data.
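The vendors' actual algorithms are proprietary and are not described here. As a rough illustration of what "statistical categorization" means, the following is a minimal sketch using a generic technique, multinomial naive Bayes with add-one smoothing. The class name, category labels, and training snippets are hypothetical and chosen only for the example.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Optional


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCategorizer:
    """Hypothetical categorizer: multinomial naive Bayes with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = defaultdict(Counter)  # words seen per category
        self.doc_counts: Counter = Counter()  # training documents per category
        self.vocabulary: set[str] = set()  # every distinct word seen in training

    def train(self, text: str, category: str) -> None:
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocabulary.update(tokens)

    def categorize(self, text: str) -> Optional[str]:
        """Return the category with the highest log-probability for the text."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, -math.inf
        for category, counts in self.word_counts.items():
            # Log prior: share of training documents that belong to this category.
            score = math.log(self.doc_counts[category] / total_docs)
            total_words = sum(counts.values())
            for token in tokenize(text):
                # Smoothed log-likelihood, so unseen words do not zero out the score.
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


categorizer = NaiveBayesCategorizer()
categorizer.train("invoice payment overdue account balance", "finance")
categorizer.train("quarterly revenue budget forecast", "finance")
categorizer.train("server outage network latency patch", "it")
categorizer.train("password reset laptop software install", "it")
print(categorizer.categorize("the account balance on this invoice"))  # finance
```

The classifier never "understands" the document; it only finds the category whose word statistics best fit, which is why the text above speaks of an approximation of meaning.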

Applied Semantics[36]

A number of companies process documents and extract semantic information from them. One of them is Applied Semantics, which claims that its Circa technology can understand and categorize documents based on their content and a set of taxonomies. This capability is profoundly important, and we should expect to see several derivatives of it emerge over the next few years. I expect search engines based on semantic categorization to displace keyword search for most purposes. I also expect that mining a company's unstructured documents will yield a treasure trove of useful structured information. Applied Semantics has since been acquired by Google.
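Circa's internals are not described here, so the following is only a hypothetical sketch of the general idea of taxonomy-based categorization. It assumes a small made-up taxonomy (each concept points to its parent) and a made-up lexicon linking words to concepts; a mention of a specific concept also counts toward every broader concept above it.

```python
import re

# Hypothetical taxonomy: each concept maps to its parent (None marks the root).
TAXONOMY = {
    "Business": None,
    "Finance": "Business",
    "Accounts Payable": "Finance",
    "Human Resources": "Business",
    "Hiring": "Human Resources",
}

# Hypothetical lexicon linking surface words to taxonomy concepts.
LEXICON = {
    "invoice": "Accounts Payable",
    "vendor": "Accounts Payable",
    "candidate": "Hiring",
    "interview": "Hiring",
}


def ancestors(concept):
    """Return the concept followed by each of its ancestors up to the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = TAXONOMY[concept]
    return path


def categorize(text):
    """Score each concept by how often it, or a narrower concept, is mentioned."""
    scores = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        concept = LEXICON.get(word)
        if concept is None:
            continue  # words outside the lexicon carry no semantic signal here
        for node in ancestors(concept):
            scores[node] = scores.get(node, 0) + 1
    return scores


print(categorize("Please pay the vendor invoice before the interview"))
# {'Accounts Payable': 2, 'Finance': 2, 'Business': 3, 'Hiring': 1, 'Human Resources': 1}
```

Propagating scores up the tree lets the same document be filed under both a specific category (Accounts Payable) and the broader ones it belongs to, which is what distinguishes taxonomy-based categorization from a flat keyword match.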

Cyc

Cyc, a project led by Doug Lenat, has the ambitious goal of creating a commonsense AI rules base; that is, a rules engine and a set of rules that would allow a system to reason at a commonsense level.[37]

Commonsense reasoning allows us to take one set of discovered information ("John was reprimanded by his boss") and infer other information ("John has a job," "John may have transgressed some rules or performed at less than standard levels," etc.).

This is a powerful force in semantic interpretation, because the literal information in a document, even when it is categorized into the proper taxonomies, is often not sufficient to reason from. Some of our most important information is extremely brief (e.g., emails) and requires a great deal of inference from prior knowledge or common sense to be interpreted properly.
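Cyc has its own representation language and inference engine, which are not reproduced here. The following is a minimal, generic forward-chaining sketch of the "John was reprimanded by his boss" example: facts are (predicate, subject, object) triples, rules derive new facts from old ones, and inference repeats until nothing new appears. All predicate and rule names are hypothetical.

```python
from typing import Callable, Iterable

# A fact is a (predicate, subject, object) triple; these predicates are illustrative only.
Fact = tuple[str, str, str]
Rule = Callable[[Fact], Iterable[Fact]]


def reprimand_rule(fact: Fact) -> Iterable[Fact]:
    """Being reprimanded by one's boss suggests a job and a possible lapse."""
    predicate, subject, obj = fact
    if predicate == "reprimanded_by" and obj == "boss":
        yield ("has", subject, "job")
        yield ("may_have", subject, "transgressed_or_underperformed")


def job_rule(fact: Fact) -> Iterable[Fact]:
    """Having a job suggests the person probably earns an income."""
    predicate, subject, obj = fact
    if predicate == "has" and obj == "job":
        yield ("probably_has", subject, "income")


def forward_chain(facts: set[Fact], rules: list[Rule]) -> set[Fact]:
    """Apply every rule to newly derived facts until no new facts appear."""
    known = set(facts)
    frontier = set(facts)
    while frontier:
        derived = {new for fact in frontier for rule in rules for new in rule(fact)}
        frontier = derived - known  # keep only facts we have not seen before
        known |= frontier
    return known


inferred = forward_chain({("reprimanded_by", "John", "boss")}, [reprimand_rule, job_rule])
for fact in sorted(inferred):
    print(fact)
```

Note that the income fact is reached in two steps: the second rule fires only on a fact the first rule produced, which is the kind of chained inference the brief original statement never spells out.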

After almost two decades of work, and with a knowledge base of more than a million concepts, Cyc is complete enough to be sold and used as the basis for other prototypes. Several large corporations have licensed the rule base for inclusion in products to make them more user friendly. There has also been some activity to make the rule base or some part of it available on a more open basis.

ThoughtTreasure[38]

ThoughtTreasure is an IBM research project led by Erik Mueller. It uses the agent-based approach described previously. Its agents range from parsers and lexical agents to analogy agents and planning agents. Although analogy agents may sound much more complex than lexical parsers, the insight is that once the other agents are in place, the actual work an analogy agent has to do is minimal.
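The sketch below is not ThoughtTreasure's architecture; it is a hypothetical illustration of the point about analogy agents. Agents share a "blackboard" dictionary, and each one reads what earlier agents wrote. Because the lexical and concept agents have already reduced the sentence to a set of concepts, the analogy agent only has to compare concept sets against stored cases (here with Jaccard similarity, a stand-in chosen for simplicity). The lexicon, stored cases, and agent names are all made up.

```python
import re

# Hypothetical lexicon used by the concept agent.
LEXICON = {
    "boss": "authority", "manager": "authority", "teacher": "authority",
    "reprimanded": "criticism", "scolded": "criticism",
    "john": "person", "pupil": "person",
}

# Hypothetical library of previously understood situations, already reduced to concepts.
KNOWN_CASES = {
    "pupil scolded by teacher": {"person", "criticism", "authority"},
    "john praised by friend": {"person", "praise"},
}


def lexical_agent(board: dict) -> None:
    """Split the raw text into lowercase word tokens."""
    board["tokens"] = re.findall(r"[a-z]+", board["text"].lower())


def concept_agent(board: dict) -> None:
    """Map tokens to concepts through the lexicon, ignoring unknown words."""
    board["concepts"] = {LEXICON[t] for t in board["tokens"] if t in LEXICON}


def analogy_agent(board: dict) -> None:
    """Pick the stored case whose concepts overlap most (Jaccard similarity)."""
    def similarity(case_concepts: set) -> float:
        union = board["concepts"] | case_concepts
        return len(board["concepts"] & case_concepts) / len(union) if union else 0.0

    board["analogy"] = max(KNOWN_CASES, key=lambda name: similarity(KNOWN_CASES[name]))


def run_agents(text: str) -> dict:
    """Run each agent in order over a shared blackboard."""
    board = {"text": text}
    for agent in (lexical_agent, concept_agent, analogy_agent):
        agent(board)
    return board


result = run_agents("John was reprimanded by his boss")
print(result["analogy"])  # pupil scolded by teacher
```

The analogy agent is only a few lines long precisely because the earlier agents did the interpretive work; without them it would have to start from raw text.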

[32]See http://www.verity.com/ for further information.

[33]See http://www.autonomy.com/ for further information.

[34]See http://www.inktomi.com/ for further information.

[35]See http://www.inxight.com/ for further information.

[36]See http://www.appliedsemantics.com/ for further information.

[37]See http://www.opencyc.org for further information.

[38]See http://www.signiform.com/tt/htm/tt.htm for further information.




Semantics in Business Systems: The Savvy Manager's Guide
ISBN: 1558609172
Year: 2005
Pages: 184
Authors: Dave McComb
