Note that in the prior examples, the current state of the chain was a function solely of the prior state, with each transition occurring at a defined probability. Because each state depends on one predecessor, a pair of states forms a bigram (a sequence of two words). If the current state were not a function of the prior state, state selection would simply be a random process. If the state depended on the prior two states, the model would be called a trigram. While increasing the dependence on prior states (or context) can increase the usability of the chain (as we'll see in the applications section), the memory requirements of such models can be prohibitive [Henke 1999]. For example, consider Table 10.1, which shows the number of elements necessary for a vocabulary of 100 words.
| n | Type | Number of Elements |
|---|---|---|
| 2 | bigram | 10,000 |
| 3 | trigram | 1,000,000 |
| 4 | 4-gram | 100,000,000 |
A corpus of 100 unique words is quite small; since each additional word of context multiplies the table size by the vocabulary size, building anything other than bigrams over a realistic corpus can be quite expensive.
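The growth shown in Table 10.1 can be verified with a short calculation. The following sketch (not from the original text; the function name `ngram_table_size` is a hypothetical illustration) assumes a dense transition table that stores one element for every possible combination of n words drawn from the vocabulary:

```python
def ngram_table_size(vocab_size: int, n: int) -> int:
    """Elements in a dense n-gram table: one per n-word combination,
    i.e. vocab_size ** n (the counts in Table 10.1)."""
    return vocab_size ** n

# Reproduce the counts from Table 10.1 for a 100-word vocabulary.
for n, name in [(2, "bigram"), (3, "trigram"), (4, "4-gram")]:
    print(f"{name}: {ngram_table_size(100, n):,} elements")
```

In practice, most of these elements are zero for real text, which is why larger n-gram models typically use sparse storage rather than a dense table.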