Threaded Conversation Databases | LAN Tutorial With Glossary of Terms: A Complete Introduction to Local Area Networks (LAN Networking Library)

Groups that work together on a project or process often wish to record, share, track, and refer back to their work. Producing a manual (or a magazine), tracking a complex order or service transaction, and storing histories of problems and their solutions are just a few ways that strung-together chains or threads of conversations have proven their usefulness to organizations.

The basic feature list for a product that collaborators would find useful includes:

a central storage location to maintain consistency;
a structuring system that captures messages and responses in order and in the right context;
remote access to the message store for users;
control over privacy and access;
the ability to move, reorder, delete, and duplicate messages;
the ability to attach nontext documents, such as spreadsheet files, images, and audio clips, to messages and the ability to view or otherwise access them.
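One way to make the feature list concrete is a single message store in which every reply records the message it answers. The sketch below is only an illustration. `Message`, `MessageStore`, and the per-thread reader list are hypothetical names, and it covers four of the features: central storage, reply context, privacy control, and attachments.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Message:
    msg_id: int
    author: str
    subject: str
    body: str
    posted: datetime
    parent_id: Optional[int] = None  # None marks the first message of a thread
    attachments: list[str] = field(default_factory=list)  # names of nontext files


class MessageStore:
    """Hypothetical central store: every reply records the message it answers."""

    def __init__(self) -> None:
        self.messages: dict[int, Message] = {}
        self.readers: dict[int, set[str]] = {}  # thread root id -> allowed users
        self._next_id = 1

    def post(self, author, subject, body, parent_id=None, attachments=()):
        if parent_id is not None and parent_id not in self.messages:
            raise KeyError(f"no message {parent_id} to reply to")
        msg = Message(self._next_id, author, subject, body, datetime.now(),
                      parent_id, list(attachments))
        self.messages[msg.msg_id] = msg
        self._next_id += 1
        return msg.msg_id

    def root_of(self, msg_id):
        """Follow reply links back to the message that started the thread."""
        while self.messages[msg_id].parent_id is not None:
            msg_id = self.messages[msg_id].parent_id
        return msg_id

    def can_read(self, user, msg_id):
        """Threads with no reader list are open to everyone."""
        allowed = self.readers.get(self.root_of(msg_id))
        return allowed is None or user in allowed


store = MessageStore()
first = store.post("J", "Cabling", "Which category of cable?")
reply = store.post("K", "Re: Cabling", "Category 5; spec sheet attached.",
                   parent_id=first, attachments=["cat5-spec.xls"])
store.readers[first] = {"J", "K"}
print(store.root_of(reply), store.can_read("L", reply))  # 1 False
```

Remote access and message editing would be layered on top of a structure like this one.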

Two unrelated technologies, databases and e-mail, are tempting prospects for the job of recording, sharing, and retrieving collaborative work. Unfortunately, neither candidate is quite right for the job, though the reasons are not immediately obvious.

Why E-Mail Can't Handle It

E-mail is a point-to-point technology; an e-mail post office is a transient storage facility, designed to move messages in and out rather than to keep them in one place. If only two people work on a project, an e-mail program with a reply function can serve as an adequate way to track the conversation. Each collaborator has a complete copy of both sides of the ongoing dialogue as the project moves forward. But there is not a single location for the data; responsibility for saving it, much less backing it up, can be clarified only by rules external to the e-mail program.

Now consider what happens with three collaborators, J, K, and L. First of all, there has to be an external rule that whoever sends a message will cc: the third person: J messages K and cc:s L. Assume the best case: only one collaborator begins the process. Even if everyone responds to messages in a sort of token-passing order, there will be at least one extra message with each "generation" of messages. Of course, in any actual situation, collaboration will not follow a strict synchronous turn-taking model, and two people will sometimes reply simultaneously to the same message. At that point, manually merging the extra messages into what was a simple two-way thread starts to get complicated.

The number of superfluous messages in each generation increases with each additional collaborator. More importantly, the problems of synchronizing responses, manually reordering messages (which are, after all, simply sections of text with a message header), and navigating a long text document without markers and jumping-off points are subject to a sort of combinatorial explosion with additional collaborators. In a very short time, the collaborative message store, which is no more than an undifferentiated text file, will become unmanageable. Using e-mail to track a collaboration involving more than a handful of active participants requires a mass of external rules and a great deal of administrative effort.
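Here is a way to put rough numbers on that growth. It uses a simplified model of our own, not the author's. Assume each message goes To one collaborator with cc: to the rest, so it lands in all n mailboxes. Every pair of mailboxes must then agree on the thread. If k people answer the same message at once, their replies can be spliced into the thread in k! different orders. The function name and the dictionary keys are hypothetical.

```python
from math import comb, factorial


def email_thread_costs(n: int, k: int) -> dict:
    """Rough costs of tracking one conversation by e-mail (simplified model).

    n: number of collaborators; each message goes To one and cc: to the rest
    k: number of collaborators replying at the same moment to one message
    """
    return {
        "copies_per_message": n,                   # sender's copy plus n - 1 recipients
        "mailbox_pairs_to_reconcile": comb(n, 2),  # every pair of copies must agree
        "orders_for_replies": factorial(k),        # ways to splice k simultaneous replies
    }


for n in (2, 3, 5, 10):
    # Assume everyone except the original sender answers at once.
    print(n, email_thread_costs(n, k=n - 1))
```

Under these assumptions, ten collaborators give 45 mailbox pairs to reconcile and 362,880 possible orders for one round of simultaneous replies. That is the combinatorial explosion described above.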

Can Database Functions Help?

Databases have the advantage of storing data in a single place (or in a manageable collection of places). There are two familiar and well-understood database models that can be useful for storing textual data: field-oriented databases and free-form text databases. Field-oriented databases can have their components linked together logically, based on common fields (the relational model), or tied together with pointers (the hierarchical and network models). They are most useful for storing and retrieving sizable chunks of text when each chunk or message is tied to a record with other, relatively short fields, such as author and date, which can be used for indexing or sorting the database. (Sorting a large collection of messages by their titles or subjects, much less by their contents, would rarely be a useful way to present or browse them.) Typical field-oriented databases have limited functions for searching or otherwise manipulating the content of long text fields.
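The relational approach can be sketched with Python's built-in `sqlite3` module. Each message is a record with short fields (author, date, and a link to the message it answers) plus one long body field. A recursive query follows the links to rebuild a thread in outline order. The table and column names are hypothetical, and SQLite's `WITH RECURSIVE` is one way to walk the links, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE messages (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES messages(id),  -- NULL for a thread's first message
        author    TEXT NOT NULL,
        subject   TEXT NOT NULL,
        posted    TEXT NOT NULL,                     -- ISO date, sortable as text
        body      TEXT NOT NULL                      -- the long free-text field
    )""")
conn.execute("CREATE INDEX idx_posted ON messages(posted)")

rows = [
    (1, None, "J", "LAN upgrade", "1995-02-01", "Start of the cabling discussion."),
    (2, 1, "K", "Re: LAN upgrade", "1995-02-02", "Category 5 for the new floor?"),
    (3, 1, "L", "Re: LAN upgrade", "1995-02-02", "Keep the coaxial backbone."),
    (4, 2, "J", "Re: LAN upgrade", "1995-02-03", "Agreed on Category 5."),
]
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?, ?, ?)", rows)

# Follow the reply links from message 1; the zero-padded path orders the outline.
thread = conn.execute("""
    WITH RECURSIVE thread(id, depth, path) AS (
        SELECT id, 0, printf('%06d', id) FROM messages WHERE id = ?
        UNION ALL
        SELECT m.id, t.depth + 1, t.path || '/' || printf('%06d', m.id)
        FROM messages m JOIN thread t ON m.parent_id = t.id
    )
    SELECT t.depth, m.author, m.posted
    FROM thread t JOIN messages m USING (id)
    ORDER BY t.path
""", (1,)).fetchall()

for depth, author, posted in thread:
    print("  " * depth + f"{author} {posted}")
```

The short, indexed `posted` field is what a browser would sort on. The body is stored but never sorted, which matches the limitation described above.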

Free-form text databases usually make no attempt to break up a large body of text; they simply index all the nontrivial words in the text and expedite locating those words. Using Boolean operators to search a massive text can be very fruitful. For example, in a series of messages that track the steps of a LAN enhancement project, searching for "(Category or Type) and (3 or 4 or 5 or coaxial)" should find any messages discussing Ethernet cabling specifications. Such searches work well regardless of how skillfully the subject of the message was named and how well the message focused on a single subject.
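A free-form text database usually rests on an inverted index, which maps every word to the messages that contain it. The sketch below builds such an index and evaluates the article's example query with a tiny parser. The functions `build_index` and `search` are hypothetical. The parser handles only `and`, `or`, and parentheses, and `and` binds more tightly than `or`. Real products offer richer syntax, including NOT and proximity operators.

```python
import re


def build_index(messages: dict) -> dict:
    """Inverted index: each lower-cased word maps to ids of messages containing it."""
    index = {}
    for msg_id, text in messages.items():
        for word in re.findall(r"\w+", text.lower()):
            index.setdefault(word, set()).add(msg_id)
    return index


def search(query: str, index: dict) -> set:
    """Evaluate words joined by 'and'/'or' with parentheses; 'and' binds tighter."""
    tokens = re.findall(r"\(|\)|\w+", query.lower())
    pos = 0

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("query ended unexpectedly")
        pos += 1
        return tokens[pos - 1]

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def term():
        if peek() == "(":
            take()
            result = either()
            if take() != ")":
                raise ValueError("missing ')'")
            return result
        return set(index.get(take(), ()))  # copy, so the index is never modified

    def both():
        result = term()
        while peek() == "and":
            take()
            result &= term()
        return result

    def either():
        result = both()
        while peek() == "or":
            take()
            result |= both()
        return result

    result = either()
    if peek() is not None:
        raise ValueError(f"unexpected {peek()!r}")
    return result


messages = {
    1: "Is Category 5 cable worth the extra cost?",
    2: "The old wing still runs thin coaxial; Type 3 may be enough there.",
    3: "Budget meeting moved to Thursday.",
}
index = build_index(messages)
print(search("(Category or Type) and (3 or 4 or 5 or coaxial)", index))  # {1, 2}
```

Neither matching message has a subject line mentioning Ethernet cabling, yet the query still finds both. This is the point the paragraph makes about poorly named messages.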

Without field-oriented functionality, however, an indexed free-form text file is hard to navigate, and it can serve collaborators only with external rules and a substantial amount of external administration: for example, putting dates on messages and keeping them in order.

An additional tactic for categorizing pieces of text is assigning keywords: words that characterize the content of the text in some way and can be stored and searched separately. (Keywords can also be useful for keeping track of images and sounds, which contain no text to index.) Although very large indexed text databases can be searched quickly, keywords can provide even faster performance.
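A keyword catalog can be much smaller than a full-word index because it holds only the terms people chose to assign. It also works equally well for files that contain no words at all. The class `KeywordIndex` and the item names below are hypothetical.

```python
from collections import defaultdict


class KeywordIndex:
    """Hypothetical keyword catalog kept apart from the message text itself."""

    def __init__(self) -> None:
        self._items = defaultdict(set)  # keyword -> ids of messages or files

    def tag(self, item_id: str, *keywords: str) -> None:
        for kw in keywords:
            self._items[kw.lower()].add(item_id)

    def find(self, *keywords: str) -> set:
        """Return the items that carry every one of the given keywords."""
        matches = [self._items.get(kw.lower(), set()) for kw in keywords]
        return set.intersection(*matches) if matches else set()


kw = KeywordIndex()
kw.tag("msg-17", "cabling", "budget")
kw.tag("wiring-closet.jpg", "cabling", "photo")  # an image has no words to index
kw.tag("site-survey.wav", "cabling", "audio")
print(sorted(kw.find("cabling")))    # ['msg-17', 'site-survey.wav', 'wiring-closet.jpg']
print(kw.find("cabling", "photo"))   # {'wiring-closet.jpg'}
```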

Both database technologies can be useful for tracking threads of messages, but the messages need to be loaded into the database and presented to the collaborators. Establishing threads, notifying collaborators of updates, presenting maps of threads, and many other desirable functions would require custom programming in any of the standard database programs.

The combination of messaging and database functions has converged in several unrelated environments. The first widespread application of threaded conversations was for online bulletin board systems. Usenet newsgroups are threaded conversations. Forums or conferences on the commercial online services (CompuServe, Prodigy, and GEnie) also take this form. The most ambitious and elaborate commercially available product for tracking threaded conversations is Lotus Notes (Lotus Development, Cambridge, MA). Several other products, including Collabra Software's (Mountain View, CA) Collabra Share, Attachmate's (Bellevue, WA) OpenMind, and Trax Software's (Culver City, CA) TeamTalk, have also been introduced to provide this function.

Basic Functions Of A Threaded-Message Database

Displays. Navigating and entering messages must be straightforward, so a user who opens or logs on to the conversation database ought to be able to see a map of the conversation. A hierarchical display, similar to an outline, is a useful way to show sequences of comments and replies. Preset limits on the number of levels are probably undesirable. The ability to collapse some or all of the levels helps keep the overall structure clear.
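An outline display of this kind can be produced by a depth-first walk of the reply links. The sketch assumes that messages arrive as a hypothetical mapping `{id: (parent_id, subject)}`. It indents each generation of replies one level and hides everything beneath a collapsed message. The walk uses an explicit stack rather than recursion, so thread depth is not capped by the interpreter's recursion limit.

```python
def render_outline(messages: dict, collapsed=frozenset()) -> list:
    """Outline lines for threads stored as {id: (parent_id, subject)}.

    Replies are indented one level per generation with no preset depth limit;
    ids in `collapsed` are shown with '+' and their replies are hidden.
    """
    children = {}
    for msg_id, (parent_id, _) in messages.items():
        children.setdefault(parent_id, []).append(msg_id)

    lines = []
    # Push in reverse so the lowest id is popped (displayed) first.
    stack = [(root, 0) for root in sorted(children.get(None, []), reverse=True)]
    while stack:
        msg_id, depth = stack.pop()
        replies = sorted(children.get(msg_id, []))
        marker = "+" if replies and msg_id in collapsed else "-"
        lines.append("  " * depth + f"{marker} {messages[msg_id][1]}")
        if msg_id not in collapsed:
            stack.extend((r, depth + 1) for r in reversed(replies))
    return lines


thread = {
    1: (None, "LAN upgrade"),
    2: (1, "Which cable?"),
    3: (2, "Category 5"),
    4: (1, "Budget"),
}
print("\n".join(render_outline(thread)))
print("\n".join(render_outline(thread, collapsed={2})))
```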

Remote participation. Collaborators must be able to reach the message store over some kind of network. In addition, a gateway to regular e-mail products can let collaborators who don't have full access contribute.
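An e-mail gateway needs some way to tell which message an incoming letter answers. Standard Internet mail carries `Message-ID` and `In-Reply-To` headers that can serve this purpose. The sketch below parses mail with Python's standard `email` package. The function `gateway_post`, the `known` lookup table, and the dictionary-based store are hypothetical. Mail whose client omits `In-Reply-To` simply starts a new thread.

```python
from email import message_from_string


def gateway_post(raw_mail: str, known: dict, store: dict) -> int:
    """Hypothetical gateway: file an incoming e-mail in the threaded store.

    `known` maps Message-ID header values to store ids, so a reply's
    In-Reply-To header can be matched to the message it answers.
    """
    mail = message_from_string(raw_mail)
    parent = known.get((mail.get("In-Reply-To") or "").strip())
    new_id = max(store, default=0) + 1
    store[new_id] = {
        "parent_id": parent,  # None: this mail starts a new thread
        "author": mail.get("From", ""),
        "subject": mail.get("Subject", ""),
        "body": mail.get_payload(),
    }
    if mail.get("Message-ID"):
        known[mail["Message-ID"].strip()] = new_id
    return new_id


store, known = {}, {}
first = gateway_post(
    "From: L <l@example.com>\nSubject: Cabling\nMessage-ID: <1@example.com>\n\n"
    "Category 5?\n", known, store)
reply = gateway_post(
    "From: K <k@example.com>\nSubject: Re: Cabling\nMessage-ID: <2@example.com>\n"
    "In-Reply-To: <1@example.com>\n\nYes.\n", known, store)
print(store[reply]["parent_id"] == first)  # True
```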

Thread editing. People make mistakes. It ought to be easy to relocate a misplaced message or delete completely irrelevant ones. Actual messages don't always fit precisely in a threaded hierarchy. For instance, a single message may include two or more topics. It should be easy to duplicate a message and insert the copies at the places where they fit.
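The three editing operations named here (relocating, deleting, and duplicating messages) reduce to changes in the reply links. In the sketch below, the class `Thread` and its methods are hypothetical. Two design choices are our own assumptions. First, a move is refused if it would place a message beneath one of its own replies. Second, deleting a message promotes its direct replies to its parent rather than discarding them.

```python
from typing import Optional


class Thread:
    """Hypothetical editable thread: ids map to a parent id and message text."""

    def __init__(self) -> None:
        self.parent: dict = {}  # msg id -> parent id (None for the first message)
        self.text: dict = {}    # msg id -> message text
        self._next = 1

    def add(self, text: str, parent: Optional[int] = None) -> int:
        msg_id, self._next = self._next, self._next + 1
        self.parent[msg_id], self.text[msg_id] = parent, text
        return msg_id

    def _descendants(self, msg_id: int) -> set:
        found, frontier = set(), [msg_id]
        while frontier:
            current = frontier.pop()
            replies = [m for m, p in self.parent.items() if p == current]
            found.update(replies)
            frontier.extend(replies)
        return found

    def move(self, msg_id: int, new_parent: int) -> None:
        """Re-file a misplaced message, along with its replies, under new_parent."""
        if new_parent == msg_id or new_parent in self._descendants(msg_id):
            raise ValueError("cannot move a message beneath its own replies")
        self.parent[msg_id] = new_parent

    def delete(self, msg_id: int) -> None:
        """Remove one message; its direct replies move up to its parent."""
        grandparent = self.parent.pop(msg_id)
        del self.text[msg_id]
        for m, p in self.parent.items():
            if p == msg_id:
                self.parent[m] = grandparent

    def duplicate(self, msg_id: int, new_parent: int) -> int:
        """Place a copy of a multi-topic message where its second topic belongs."""
        return self.add(self.text[msg_id], new_parent)


t = Thread()
root = t.add("LAN upgrade plan")
cabling = t.add("Cabling: which category?", root)
budget = t.add("Budget", root)
stray = t.add("Budget for the cable runs", cabling)  # posted in the wrong place
mixed = t.add("Category 5 it is; also, the budget is approved", cabling)
t.move(stray, budget)
copy = t.duplicate(mixed, budget)
```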

Scrolling. For large, ongoing threaded conversations, such as those on commercial online services, the day will come when the space available to the system runs out. For example, a fixed number of message units may be allocated to a forum; when these slots are filled, each additional message bumps off an older one. This process is referred to as scrolling, and a whole series of problems can ensue. It would be desirable to have configurable scrolling settings so that factors such as the message's position in a thread, the relative importance of a thread, or the author of a message could be factored into the scrolling behavior.
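Configurable scrolling might look like the following sketch. A forum holds a fixed number of slots. When a new post overflows them, the post with the lowest score scrolls off. The score combines the factors listed above: thread importance, author, and whether the post opened its thread. Age is subtracted from the score. `Post`, `ScrollPolicy`, `Forum`, and every weight are hypothetical. With all weights neutral, the policy falls back to plain first-in, first-out scrolling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    msg_id: int
    thread: str
    author: str
    is_thread_start: bool


@dataclass
class ScrollPolicy:
    """Hypothetical weights; a higher score keeps a post longer."""
    thread_weight: dict        # relative importance of each thread (default 1.0)
    author_weight: dict        # e.g. extra weight for a forum's sysop (default 0.0)
    start_bonus: float = 2.0   # protects the message that opened a thread

    def score(self, post: Post, age_rank: int) -> float:
        return (self.thread_weight.get(post.thread, 1.0)
                + self.author_weight.get(post.author, 0.0)
                + (self.start_bonus if post.is_thread_start else 0.0)
                - age_rank)    # older posts drift toward scrolling off


class Forum:
    def __init__(self, slots: int, policy: ScrollPolicy) -> None:
        self.slots, self.policy, self.posts = slots, policy, []

    def add(self, post: Post) -> Optional[Post]:
        """Store a post; if the slots overflow, remove and return the one scrolled off."""
        self.posts.append(post)
        if len(self.posts) <= self.slots:
            return None
        n = len(self.posts)
        # age_rank is 0 for the newest post and n - 1 for the oldest.
        victim = min(range(n), key=lambda i: self.policy.score(self.posts[i], n - 1 - i))
        return self.posts.pop(victim)


forum = Forum(3, ScrollPolicy(thread_weight={"chatter": -5.0}, author_weight={}))
for p in (Post(1, "cabling", "J", True), Post(2, "cabling", "K", False),
          Post(3, "chatter", "L", True)):
    forum.add(p)
scrolled = forum.add(Post(4, "cabling", "L", False))
print(scrolled.msg_id)  # 3: the low-priority thread goes first, not the oldest post
```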

Once any message in a thread scrolls off, the meaningfulness of subsequent reply and commentary messages is apt to suffer, even if the title of the thread continues to be carried along.

Thread Drift

A problem that fancy software features are not likely to solve soon has to do with the content of messages and the subject that is assigned to name a thread. Participants in online discussions sometimes refer to "thread drift" ironically, when they really mean that someone is drastically changing the subject.

Nevertheless, discussions can imperceptibly veer away from the original topic until the original name of the thread is a completely misleading guide to the content. More generally, nothing stops participants in an unmoderated threaded conversation, or unskilled moderators, from using arbitrary subject names and wrecking the coherence of a conversation. Until computers can interpret the meaning of sentences and paragraphs, a day nowhere in sight, external rules and human moderators, censors, and editors will be necessary to keep threaded conversations coherent to their users.

This tutorial, number 78, by Steve Steinke, was originally published in the February 1995 issue of LAN Magazine/Network Magazine.