HMM Approximations


Note that in the prior examples, the current state of the chain was a function solely of the prior state of the chain with a defined probability. This is known as a bigram (a sequence of two words). If the current state were not a function of the prior state, the selection of state would simply be a random process. If the state were dependent upon the prior two states, it would be called a trigram. While increasing the dependence on the prior states (or context) can increase the usability of the chain (as we'll see in the applications section), the memory requirements on such models can be inhibitive [Henke 1999]. For example, consider Table 10.1 that shows the number of elements necessary for a vocabulary of 100 words.

 
Table 10.1: Context Size and the Number of Unique Elements.

n

Type

Number of Elements


2

bigram

10,000

3

trigram

1,000,000

4

4-gram

100,000,000

A corpus of 100 unique words is quite small, therefore creating anything other than bigrams over a corpus can be quite expensive.




Visual Basic Developer
Visual Basic Developers Guide to ASP and IIS: Build Powerful Server-Side Web Applications with Visual Basic. (Visual Basic Developers Guides)
ISBN: 0782125573
EAN: 2147483647
Year: 1999
Pages: 175

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net