Recognizer Architecture | Building Tablet PC Applications (Pro-Developer)

Recognizer Architecture

The Tablet PC Platform supports ink recognition through software libraries named recognizers: code that computes the textual or object representation of ink strokes for one or more languages. A language is any set of words or objects that is represented by writing. English, Chinese, German, and ink-based gestures are all considered languages. Each recognizer therefore includes a property denoting the languages for which it is capable of interpreting ink strokes.

Multiple recognizers can be installed and used in the system, with each recognizer residing in a DLL. It is perfectly valid for these DLLs to contain multiple recognizers. A recognizer supports one or more languages, although most often a recognizer will support only one language because the accuracy of results tends to decrease when recognizing multiple languages concurrently. More variability corresponds to a higher margin of error.

You can write your own recognizer, but this is a specialized and usually rather difficult task. The Tablet PC Platform SDK documentation has more information on writing custom recognizers. This chapter covers the use of existing recognizers.

Because multiple recognizers can be installed in the system, the platform defines a default recognizer: the recognizer best suited to interpret ink for a specific LCID (locale identifier), which is usually obtained from the operating system's locale setting.
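The idea of choosing a recognizer by LCID can be sketched as a simple lookup. This is a minimal illustration in Python, not the Tablet PC Platform API: the `Recognizer` class, the `INSTALLED` list, and the `default_recognizer` function are hypothetical names, and "best suited" is reduced here to "first installed recognizer that lists the LCID". The LCID values themselves are the standard Windows identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recognizer:
    """Hypothetical stand-in for an installed recognizer."""
    name: str
    lcids: frozenset  # locale identifiers this recognizer can interpret


# Illustrative registry: the names are invented, the LCIDs are standard values.
INSTALLED = [
    Recognizer("English text", frozenset({0x0409})),             # en-US
    Recognizer("German text", frozenset({0x0407})),              # de-DE
    Recognizer("Simplified Chinese text", frozenset({0x0804})),  # zh-CN
]


def default_recognizer(lcid, installed=INSTALLED):
    """Return the first installed recognizer that supports lcid, or None."""
    for reco in installed:
        if lcid in reco.lcids:
            return reco
    return None
```

A caller would pass the LCID from the current locale and fall back to some other behavior when `None` comes back, since no recognizer may be installed for that language.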

Text vs. Object Recognition

Most recognizers available for the Tablet PC Platform convert ink into text for a written language. However, ink strokes can also be recognized as objects, representing things such as application commands, musical notes, Web site structure, and mathematical formulae. On the Tablet PC Platform, the most common type of object recognition is for application gestures, in which one or more ink strokes map to a specific application command. This type of gesture contrasts with the system gestures we learned about in Chapter 4 in that application gestures are ink-based and trigger commands as a menu item or a toolbar button would. System gestures are not ink-based, and they typically translate into direct manipulation actions such as selection, moving, and resizing.

The Tablet PC Platform has great support for application gesture recognition built into the InkCollector and InkOverlay classes, as we'll soon see.

Synchronous vs. Asynchronous Recognition

Ink recognition is often a computationally intensive task. One of the main reasons highly accurate results weren't achievable until recently is that the computing power required to yield great recognition results just wasn't available on a wide scale. Even today, recognizing ink can demand enough CPU power that the Tablet PC Platform supplies two usage models for performing recognition: synchronous and asynchronous.

Synchronous recognition occurs when the thread requesting recognition results blocks until computation is complete. Asynchronous recognition occurs when the thread requesting recognition results is allowed to continue immediately after the request and is later notified by an event that computation is complete.
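The difference between the two models can be sketched in a few lines of Python. This is an illustration of the calling pattern only, under stated assumptions: `recognize` is a hypothetical placeholder that sleeps instead of doing real work, strokes are represented as plain strings, and a callback stands in for the completion event.

```python
import threading
import time


def recognize(strokes):
    """Hypothetical recognizer: pretends to compute, then returns text."""
    time.sleep(0.05)  # stand-in for an expensive computation
    return " ".join(strokes)


def recognize_foreground(strokes):
    """Synchronous model: the calling thread blocks until the text is ready."""
    return recognize(strokes)


def recognize_background(strokes, on_complete):
    """Asynchronous model: return at once and call on_complete(text) later.

    The callback runs on the worker thread, not on the caller's thread.
    """
    worker = threading.Thread(target=lambda: on_complete(recognize(strokes)))
    worker.start()
    return worker
```

With `recognize_background`, the caller is free to keep handling input while the worker runs. Because the callback arrives on another thread, a real application would typically need to hand the result back to its UI thread before touching any controls.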

Reco Terminology

Synchronous and asynchronous recognition are more commonly referred to as foreground recognition and background recognition, respectively. Recognition is often shortened to reco (pronounced "reh-koh"). For the most part, we'll be using the terms foreground reco and background reco throughout the rest of the chapter.

Partial Recognition

Some recognizers possess a capability known as partial recognition, in which the recognition computation happens incrementally and on a different thread from the one working with the recognizer. Consider partial recognition a proactive approach a recognizer takes to the computation: it begins recognition as soon as any ink is given to it and incrementally adjusts the computation as ink is added or removed or recognition properties are changed. This way, when results are ultimately requested, either synchronously or asynchronously, the recognition computation is already in progress or complete, resulting in much more timely results than if the computation were started at the time of the request.
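A rough Python sketch of this proactive behavior follows. It is a simplification under stated assumptions: `EagerRecognizer` is a hypothetical wrapper, the recognition function is supplied by the caller, and each new stroke triggers a full recomputation on a single background thread rather than the true incremental adjustment a real recognizer would perform.

```python
from concurrent.futures import ThreadPoolExecutor


class EagerRecognizer:
    """Hypothetical wrapper that starts recognizing as soon as ink arrives."""

    def __init__(self, recognize_fn):
        self._recognize = recognize_fn
        self._strokes = []
        # One worker thread, so submitted jobs finish in submission order.
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._latest = None  # future for the most recent snapshot of the ink

    def add_stroke(self, stroke):
        self._strokes.append(stroke)
        # Recompute on the worker thread using a snapshot of the current ink.
        self._latest = self._pool.submit(self._recognize, list(self._strokes))

    def result(self):
        """Return text for the current ink, waiting only if work is unfinished."""
        if self._latest is None:
            return ""
        return self._latest.result()

    def close(self):
        """Stop the worker thread once pending work has finished."""
        self._pool.shutdown()
```

By the time `result()` is called, the computation for the latest ink has usually been running for a while, which is why a synchronous request can return quickly when partial recognition is available.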

When to Use Foreground and Background Recognition

Foreground reco is more commonly used in ink-enabled applications since it is easier to code and generally meets the requirements most applications have for recognition. Because most recognizers that ship with the Tablet PC Platform support partial recognition, you can take steps that will result in little performance difference between foreground and background reco. This prompts the question, "Why use background reco if it's harder to code?"

Background reco is primarily useful for keeping an application responsive to input. For example, if a user pastes a large number of strokes into a document, initiating synchronous recognition at that moment could block the application for many seconds, giving the appearance of a crash or a hang. Using asynchronous recognition yields recognition results without the computation getting in the user's way. Additionally, background recognition results are reported as soon as they're calculated. If your application needs to act on results promptly (for example, the user interface shows the textual form of ink in tiny letters beneath the strokes), that is much easier to implement with background reco than with foreground reco.

Recognition Results

The results a recognizer returns comprise much more than just a text string. Information such as the recognizer's confidence level (the level of accuracy the software thinks it achieved in its computation), alternative results (in case the recognizer thinks more than one result applies), and association of strokes with the text in the string can all be extremely useful in the implementation of certain features. The most common use of recognition results, other than obtaining the recognized text string, is to provide the user with a UI capable of correcting any accuracy problems. This UI is referred to as a correction UI. The Tablet PC Platform exposes all the information required to build a correction UI while keeping the API simple and straightforward.
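To make the shape of that information concrete, here is a small Python data model. It is a hypothetical sketch, not the platform's result classes: the names `Alternate` and `RecognitionResult`, the numeric confidence score, and the integer stroke ids are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alternate:
    text: str
    confidence: float  # hypothetical score in [0, 1]
    stroke_ids: tuple  # ids of the strokes this text was recognized from


@dataclass
class RecognitionResult:
    alternates: list  # Alternate objects, best guess first

    def top(self):
        """The recognizer's best guess."""
        return self.alternates[0]

    def correction_choices(self):
        """Texts a correction UI could offer in place of the top result."""
        return [alt.text for alt in self.alternates[1:]]
```

A correction UI built on such a model would show `top().text` inline, list `correction_choices()` when the user taps a word, and use `stroke_ids` to highlight the ink being corrected.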

When ink is converted into words, the recognizer might be unsure of word breaking, otherwise known as the segmentation of ink, as well as the results for individual words. Consider this classic recognition example: a user writes the letters "t", "o", "g", "e", "t", "h", "e", and "r". They could mean one of many results, such as the word "together" or the phrases "to gather", "tog ether", or "to get her". Each word in those segmentation alternates can itself have a number of alternate results. It is important to realize that recognition results encompass not only alternates for words but also alternates for their segmentation.
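The segmentation half of this example can be reproduced with a short Python sketch. It assumes the letters have already been recognized and that a tiny hypothetical word list (`LEXICON`) is available. It enumerates only the different ways of breaking the letters into words, so a reading such as "to gather", which also involves an alternate for an individual letter, is outside its scope.

```python
def segmentations(letters, lexicon):
    """Return every way to split letters into words found in lexicon."""
    if not letters:
        return [[]]  # one way to segment nothing: the empty segmentation
    results = []
    for end in range(1, len(letters) + 1):
        head = letters[:end]
        if head in lexicon:
            # Keep this word and segment whatever follows it.
            for rest in segmentations(letters[end:], lexicon):
                results.append([head] + rest)
    return results


# Hypothetical miniature word list, just large enough for the example.
LEXICON = {"to", "tog", "get", "her", "ether", "together"}

for words in segmentations("together", LEXICON):
    print(" ".join(words))
# prints:
# to get her
# tog ether
# together
```

A real recognizer would additionally rank these segmentations and attach word-level alternates to each piece, which is what makes the full result set richer than a single string.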

Many East Asian languages, such as Chinese, Japanese, and Korean, are based on the notion of a word being formed out of a set of discrete symbols or word segments. The possible combinations of all symbols or segments forming words are usually astronomical in size, so it is impractical to consider providing alternates for an entire word. It makes more sense to provide alternative results at the segment level. It is therefore not entirely correct to talk about a recognizer converting ink into words because recognizers actually convert ink into segments. For languages such as English and French, the boundary for segmentation happens to be the boundary between whole words.

Now that we've learned a little bit about the data a recognizer can return, let's jump into the Tablet PC Platform's support for using its great recognition functionality.