8.5 Sequential Monte Carlo Signal Processing


8.5.1 Sequential Importance Sampling

Importance sampling is one of the most well-known and elementary Monte Carlo techniques. Suppose that we want to make an inference about some random quantity X ~ p(X) using the Monte Carlo method. Sometimes drawing samples directly from p(X) is difficult, but it may be easier (or otherwise advantageous) to draw samples from a trial density, say, q(X). Note that the desired inference can be written as

Equation 8.70

$$ E_p\{h(X)\} = \int h(X)\, p(X)\, dX $$


Equation 8.71

$$ E_p\{h(X)\} = \int h(X)\, \frac{p(X)}{q(X)}\, q(X)\, dX = E_q\{h(X)\, w(X)\} $$


where

Equation 8.72

$$ w(X) = \frac{p(X)}{q(X)} $$


is called the importance weight. In importance sampling, we draw samples X^(1), X^(2), ..., X^(n) according to the trial distribution q(·). We then approximate the inference (8.71) by

Equation 8.73

$$ \hat{E}\{h(X)\} = \frac{\sum_{j=1}^{n} h\big(X^{(j)}\big)\, w\big(X^{(j)}\big)}{\sum_{j=1}^{n} w\big(X^{(j)}\big)} $$


This technique is widely used, for example, to reduce the sample-size requirement in BER estimation.
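To make the idea concrete, the following Python sketch applies (8.70)–(8.73) to a rare-event probability of the BER-estimation flavor: estimating P(X > 4) for X ~ N(0, 1). The function names and the shifted trial density N(4, 1) are illustrative choices, not from the text.

```python
import math
import random

def normal_pdf(x, mean=0.0, std=1.0):
    # Density of N(mean, std^2) at x.
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def importance_estimate(h, target_pdf, trial_pdf, trial_sampler, n, rng):
    # Draw X^(1..n) from the trial density q and average h(X) * w(X),
    # where w(X) = p(X)/q(X) is the importance weight (8.72).
    total = 0.0
    for _ in range(n):
        x = trial_sampler(rng)
        total += h(x) * target_pdf(x) / trial_pdf(x)
    return total / n

rng = random.Random(0)
# Rare-event probability P(X > 4) for X ~ N(0, 1): the trial density
# N(4, 1) concentrates samples where the integrand is non-negligible,
# which is exactly the sample-size-reduction effect used in BER estimation.
est = importance_estimate(
    h=lambda x: 1.0 if x > 4.0 else 0.0,
    target_pdf=lambda x: normal_pdf(x),
    trial_pdf=lambda x: normal_pdf(x, mean=4.0),
    trial_sampler=lambda r: r.gauss(4.0, 1.0),
    n=20000, rng=rng)
```

Direct sampling from N(0, 1) would need on the order of 10^5 samples just to see one event; the shifted trial density estimates the same probability (about 3.2e-5) with a few thousand.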

However, it is usually difficult to design a good trial density function in high-dimensional problems. One of the most useful strategies in these problems is to build up the trial density sequentially. Suppose that we can decompose X as X = (x_1, ..., x_d), where each of the x_j may be either a scalar or a vector. Then our trial density can be constructed as

Equation 8.74

$$ q(X) = q_1(x_1)\, q_2(x_2 \mid x_1) \cdots q_d(x_d \mid x_1, \ldots, x_{d-1}) $$


by which we hope to obtain some guidance from the target density while building up the trial density. Corresponding to the decomposition of X , we can rewrite the target density as

Equation 8.75

$$ p(X) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_d \mid x_1, \ldots, x_{d-1}) $$


and the importance weight as

Equation 8.76

$$ w(X) = \frac{p(x_1)\, p(x_2 \mid x_1) \cdots p(x_d \mid x_1, \ldots, x_{d-1})}{q_1(x_1)\, q_2(x_2 \mid x_1) \cdots q_d(x_d \mid x_1, \ldots, x_{d-1})} $$


Equation (8.76) suggests a recursive way of computing and monitoring the importance weight. That is, by denoting X_t = (x_1, ..., x_t) (thus, X_d = X), we have

Equation 8.77

$$ w_t(X_t) = w_{t-1}(X_{t-1})\, \frac{p(x_t \mid X_{t-1})}{q_t(x_t \mid X_{t-1})} $$


Then w_d(X_d) is equal to w(X) in (8.76). Potential advantages of this recursion and (8.75) are that we can stop generating further components of X if the partial weight derived from the sequentially generated partial sample is too small, and that we can take advantage of p(x_t | X_{t-1}) in designing q_t(x_t | X_{t-1}). In other words, the marginal distribution p(X_t) can be used to guide the generation of X.

Although the idea above sounds interesting, the trouble is that the decomposition of p(X) as in (8.75), and hence that of w(X) as in (8.76), is not practical at all! The reason is that to obtain (8.75), one needs the marginal distribution

Equation 8.78

$$ p(X_t) = \int p(X)\, dx_{t+1} \cdots dx_d $$


whose computation involves integrating out the components x_{t+1}, ..., x_d in p(X) and is as difficult as, or even more difficult than, the original problem.

To carry out the sequential sampling idea, we need to introduce another layer of complexity. Suppose that we can find a sequence of auxiliary distributions,

$$ \pi_1(X_1),\ \pi_2(X_2),\ \ldots,\ \pi_d(X_d), $$

so that π_t(X_t) is a reasonable approximation to the marginal distribution p(X_t) for t = 1, ..., d - 1, and π_d(X) = p(X). We emphasize that {π_t(X_t)} are required to be known only up to a normalizing constant, and they serve only as "guides" to our construction of the entire sample X = (x_1, ..., x_d). The sequential importance sampling (SIS) method can then be defined as the following recursive procedure.

Algorithm 8.7: [Sequential importance sampling (SIS)] For t = 2,..., d :

  • Draw x_t from q_t(x_t | X_{t-1}) and let X_t = (X_{t-1}, x_t).

  • Compute

Equation 8.79

$$ u_t = \frac{\pi_t(X_t)}{\pi_{t-1}(X_{t-1})\, q_t(x_t \mid X_{t-1})} $$


and let w_t = w_{t-1} u_t.

In the SIS step, we call u_t an incremental weight. It is easy to show that X_t is properly weighted by w_t with respect to π_t, provided that X_{t-1} is properly weighted by w_{t-1} with respect to π_{t-1}. Thus, the entire sample X obtained in this sequential fashion is properly weighted by the final importance weight, w_d, with respect to the target density p(X). One reason for the sequential buildup of the trial density is that it breaks a difficult task into manageable pieces. The SIS framework is particularly attractive, as it can use the auxiliary distributions π_1, π_2, ..., π_d to help construct more efficient trial distributions:

  • We can build q_t in light of π_t. For example, one can choose (if possible)

Equation 8.80

$$ q_t(x_t \mid X_{t-1}) = \pi_t(x_t \mid X_{t-1}) $$


Then the incremental weight becomes

Equation 8.81

$$ u_t = \frac{\pi_t(X_{t-1})}{\pi_{t-1}(X_{t-1})} $$


  • When we observe that w_t is getting too small, we can choose to reject the sample halfway and restart. In this way we avoid wasting time on generating samples that are doomed to have little effect in the final estimation. However, since an outright rejection incurs bias, rejection control techniques can be used to correct for this bias [279].

  • Another problem with SIS is that the resulting importance weights are still very skewed, especially when d is large. An important recent advance in sequential Monte Carlo to address this problem is the resampling technique [160, 276].
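As a toy illustration of Algorithm 8.7, the sketch below builds up a sample of an AR(1)-type target density sequentially and accumulates the weight via w_t = w_{t-1} u_t as in (8.77). The target, the trial density N(0, 1.5^2), and the dimension are illustrative assumptions; the auxiliary distributions are taken here to be the exact partial marginals, so the incremental weight (8.79) reduces to p(x_t | X_{t-1}) / q_t(x_t | X_{t-1}).

```python
import math
import random

def normal_pdf(x, mean, std):
    # Density of N(mean, std^2) at x.
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def sis_sample(d, rng):
    # Target: p(X) = prod_t N(x_t; 0.5 x_{t-1}, 1) with x_0 = 0, so the
    # conditionals p(x_t | X_{t-1}) are available in closed form and the
    # auxiliary pi_t can be taken as the exact partial products.
    # Trial: q_t(x_t | X_{t-1}) = N(x_t; 0, 1.5^2), chosen for simplicity.
    x_prev, w = 0.0, 1.0
    xs = []
    for _ in range(d):
        x = rng.gauss(0.0, 1.5)
        # Incremental weight u_t = p(x_t | X_{t-1}) / q_t(x_t | X_{t-1}).
        u = normal_pdf(x, 0.5 * x_prev, 1.0) / normal_pdf(x, 0.0, 1.5)
        w *= u                      # w_t = w_{t-1} * u_t, as in (8.77)
        xs.append(x)
        x_prev = x
    return xs, w
```

Because the weight is accumulated one component at a time, a stream whose partial weight collapses can be abandoned early, which is the point made in the bullets above.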

8.5.2 SMC for Dynamical Systems

Consider the following dynamical system modeled in state-space form as

Equation 8.82

$$ z_t = f_t(z_{t-1}, u_t) \quad \text{(state equation)} $$

$$ y_t = g_t(z_t, v_t) \quad \text{(observation equation)} $$


where z t , y t , u t , and v t are, respectively, the state variable, the observation, the state noise, and the observation noise at time t . These quantities can be either scalars or vectors.

Let Z_t = (z_0, z_1, ..., z_t) and let Y_t = (y_0, y_1, ..., y_t). Suppose that an online inference of Z_t is of interest; that is, at current time t we wish to make a timely estimate of a function of the state variable Z_t, say h(Z_t), based on the currently available observations, Y_t. From Bayes' formula, we realize that the optimal solution to this problem is

Equation 8.83

$$ E\{h(Z_t) \mid Y_t\} = \int h(Z_t)\, p(Z_t \mid Y_t)\, dZ_t $$


In most cases an exact evaluation of this expectation is analytically intractable because of the complexity of such a dynamical system. Monte Carlo methods provide us with a viable alternative to the required computation. Specifically, if we can draw m random samples {Z_t^(j)}, j = 1, ..., m, from the distribution p(Z_t | Y_t), we can approximate E{h(Z_t) | Y_t} by

Equation 8.84

$$ \hat{E}\{h(Z_t) \mid Y_t\} = \frac{1}{m} \sum_{j=1}^{m} h\big(Z_t^{(j)}\big) $$


Very often, direct simulation from p(Z_t | Y_t) is not feasible, but drawing samples from some trial distribution is easy. In this case we can use the idea of importance sampling discussed above. Suppose that a set of random samples {Z_t^(j)}, j = 1, ..., m, is generated from the trial distribution q(Z_t | Y_t). By associating the weight

Equation 8.85

$$ w_t^{(j)} = \frac{p\big(Z_t^{(j)} \mid Y_t\big)}{q\big(Z_t^{(j)} \mid Y_t\big)} $$


with the sample Z_t^(j), we can approximate the quantity of interest, E{h(Z_t) | Y_t}, as

Equation 8.86

$$ \hat{E}\{h(Z_t) \mid Y_t\} = \frac{1}{W_t} \sum_{j=1}^{m} h\big(Z_t^{(j)}\big)\, w_t^{(j)} $$


with

Equation 8.87

$$ W_t = \sum_{j=1}^{m} w_t^{(j)} $$


The pair (Z_t^(j), w_t^(j)), j = 1, ..., m, is called a properly weighted sample with respect to the distribution p(Z_t | Y_t). A trivial but important observation is that z_t^(j) (one of the components of Z_t^(j)) is also properly weighted by w_t^(j) with respect to the marginal distribution p(z_t | Y_t).

Another possible estimate of E{h(Z_t) | Y_t} is

Equation 8.88

$$ \tilde{E}\{h(Z_t) \mid Y_t\} = \frac{1}{m} \sum_{j=1}^{m} h\big(Z_t^{(j)}\big)\, w_t^{(j)} $$


The main reasons for preferring the ratio estimate (8.86) to the unbiased estimate (8.88) in an importance sampling framework are that (8.86) usually has a smaller mean-square error than (8.88), and that the normalizing constants of the trial and target distributions are not required in using (8.86), since these constants cancel out; in such cases the weights {w_t^(j)} need be evaluated only up to a multiplicative constant. For example, the target distribution p(Z_t | Y_t) in a typical dynamical system (and many Bayesian models) can be evaluated easily up to a normalizing constant (e.g., the likelihood multiplied by a prior distribution), whereas sampling from the distribution directly and evaluating the normalizing constant analytically are impossible.
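A small sketch of the point about normalizing constants: the ratio estimate (8.86)–(8.87) is invariant to scaling all the weights by a common constant, so the weights need be known only up to proportionality. (The helper name is illustrative.)

```python
def ratio_estimate(h_vals, weights):
    # Ratio estimate (8.86)-(8.87): a weighted average of h over the
    # samples.  Because the weights appear in both the numerator and the
    # denominator, any common multiplicative constant cancels out.
    return sum(h * w for h, w in zip(h_vals, weights)) / sum(weights)
```

For instance, `ratio_estimate(h, w)` and `ratio_estimate(h, [10 * x for x in w])` agree, which is why unnormalized posterior evaluations suffice in practice.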

To implement Monte Carlo techniques for a dynamical system, a set of random samples properly weighted with respect to p(Z_t | Y_t) is needed at any time t. Because the state equation in system (8.82) possesses a Markovian structure, we can implement a recursive importance sampling strategy, which is the basis of all sequential Monte Carlo techniques [277]. Suppose that a set of samples {Z_{t-1}^(j), w_{t-1}^(j)}, j = 1, ..., m, properly weighted with respect to p(Z_{t-1} | Y_{t-1}) is given at time t - 1.

A sequential Monte Carlo filter generates from this set a new one, {Z_t^(j), w_t^(j)}, j = 1, ..., m, which is properly weighted at time t with respect to p(Z_t | Y_t). The algorithm is described as follows.

Algorithm 8.8: [Sequential Monte Carlo filter for dynamical systems] For j = 1, ..., m:

  • Draw a sample z_t^(j) from a trial distribution q(z_t | Z_{t-1}^(j), Y_t) and let Z_t^(j) = (Z_{t-1}^(j), z_t^(j)).

  • Compute the importance weight:

Equation 8.89

$$ w_t^{(j)} = w_{t-1}^{(j)}\, \frac{p\big(Z_t^{(j)} \mid Y_t\big)}{p\big(Z_{t-1}^{(j)} \mid Y_{t-1}\big)\, q\big(z_t^{(j)} \mid Z_{t-1}^{(j)}, Y_t\big)} $$


The algorithm is initialized by drawing a set of i.i.d. samples z_0^(1), ..., z_0^(m) from p(z_0 | y_0). When y_0 represents the "null" information, p(z_0 | y_0) corresponds to the prior distribution of z_0. The samples and weights drawn by the algorithm above are characterized by the following result, the proof of which is given in the Appendix (Section 8.7.3).

Proposition 8.1: The weighted samples generated by Algorithm 8.8 satisfy

Equation 8.90

$$ E\big\{ w_t^{(j)}\, h\big(Z_t^{(j)}\big) \big\} = c_t\, E\{ h(Z_t) \mid Y_t \} $$


Equation 8.91

$$ E\big\{ w_t^{(j)} \big\} = c_t $$

for some constant c_t > 0 that does not depend on j.


The result above, together with the law of large numbers , implies that

Equation 8.92

$$ \frac{\sum_{j=1}^{m} w_t^{(j)}\, h\big(Z_t^{(j)}\big)}{\sum_{j=1}^{m} w_t^{(j)}} \;\longrightarrow\; E\{h(Z_t) \mid Y_t\} \qquad \text{as } m \to \infty $$


There are a few important issues regarding the design and implementation of a sequential Monte Carlo filter, such as the choice of the trial distribution q(·) and the use of resampling (see Section 8.5.3). Specifically, a useful choice of the trial distribution q(z_t | Z_{t-1}^(j), Y_t) for the state-space model (8.82) is of the form

Equation 8.93

$$ q\big(z_t \mid Z_{t-1}^{(j)}, Y_t\big) = p\big(z_t \mid Z_{t-1}^{(j)}, Y_t\big) \;\propto\; p(y_t \mid z_t)\, p\big(z_t \mid z_{t-1}^{(j)}\big) $$


where in (8.93) we used the facts that

Equation 8.94

$$ p(z_t \mid Z_{t-1}, Y_{t-1}) = p(z_t \mid z_{t-1}) $$


Equation 8.95

$$ p(y_t \mid Z_t, Y_{t-1}) = p(y_t \mid z_t) $$


both of which follow directly from the state-space model (8.82). For this trial distribution, the importance weight is updated according to

Equation 8.96

$$ w_t^{(j)} = w_{t-1}^{(j)}\, \frac{p\big(Z_t^{(j)} \mid Y_t\big)}{p\big(Z_{t-1}^{(j)} \mid Y_{t-1}\big)\, p\big(z_t^{(j)} \mid Z_{t-1}^{(j)}, Y_t\big)} $$


Equation 8.97

$$ w_t^{(j)} \;\propto\; w_{t-1}^{(j)} \int p(y_t \mid z_t)\, p\big(z_t \mid z_{t-1}^{(j)}\big)\, dz_t $$


where (8.96) follows from the fact that

Equation 8.98

$$ p(Z_t \mid Y_t) \;\propto\; p(Z_{t-1} \mid Y_{t-1})\, p(z_t \mid z_{t-1})\, p(y_t \mid z_t) $$


and the last equality is due to the conditional independence property of the state-space model (8.82). See [277] for the general SMC framework and a detailed discussion of various implementation issues. SMC techniques have been employed to tackle a number of problems in wireless communications and networking, including blind detection in fading channels [76], blind detection in OFDM systems [593], and joint mobility tracking and hand-off detection in cellular networks [595].
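As a hedged end-to-end sketch of Algorithm 8.8, the following bootstrap-style filter tracks a scalar AR(1) state observed in Gaussian noise. For simplicity it uses the state-transition prior p(z_t | z_{t-1}) as the trial distribution rather than the optimal choice (8.93), so the weight update reduces to multiplying by the likelihood p(y_t | z_t); the model parameters and the resample-every-step policy are illustrative assumptions, not from the text.

```python
import math
import random

def smc_filter(ys, m, rng):
    # Bootstrap-style SMC filter for the scalar model
    #   z_t = 0.9 z_{t-1} + u_t,   y_t = z_t + v_t,   u_t, v_t ~ N(0, 1).
    # Trial distribution = state-transition prior p(z_t | z_{t-1}), so the
    # incremental weight is just the likelihood p(y_t | z_t).
    particles = [rng.gauss(0.0, 1.0) for _ in range(m)]
    weights = [1.0] * m
    estimates = []
    for y in ys:
        # Propagate each stream through the state equation.
        particles = [0.9 * z + rng.gauss(0.0, 1.0) for z in particles]
        # Weight update: multiply by p(y_t | z_t) (Gaussian kernel,
        # constant factor omitted since only ratios matter).
        weights = [w * math.exp(-0.5 * (y - z) ** 2)
                   for w, z in zip(weights, particles)]
        total = sum(weights)
        # Ratio estimate (8.86) of E{z_t | Y_t}.
        estimates.append(sum(w * z for w, z in zip(weights, particles)) / total)
        # Multinomial resampling every step to combat weight degeneracy.
        cum, acc = [], 0.0
        for w in weights:
            acc += w
            cum.append(acc)
        new = []
        for _ in range(m):
            u = rng.random() * total
            for z, c in zip(particles, cum):
                if u <= c:
                    new.append(z)
                    break
        particles, weights = new, [1.0] * m
    return estimates

# Simulate the model, then filter the observations.
rng = random.Random(0)
true_z, ys, z = [], [], 0.0
for _ in range(50):
    z = 0.9 * z + rng.gauss(0.0, 1.0)
    true_z.append(z)
    ys.append(z + rng.gauss(0.0, 1.0))
est = smc_filter(ys, m=500, rng=rng)
```

With 500 particles the filtered means stay close to the hidden state; for this linear-Gaussian toy model the exact answer is of course available from the Kalman filter, which makes it a convenient sanity check.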

8.5.3 Resampling Procedures

The importance sampling weight w_t^(j) measures the quality of the corresponding imputed sequence Z_t^(j). A relatively small weight implies that the sample is drawn far from the main body of the posterior distribution and makes a small contribution to the final estimation. Such a sample is said to be ineffective. If there are too many ineffective samples, the Monte Carlo procedure becomes inefficient. This can be detected by observing a large coefficient of variation in the importance weights. Suppose that {w_t^(j)}, j = 1, ..., m, is a sequence of importance weights. Then the coefficient of variation, v_t, is defined by

Equation 8.99

$$ v_t^2 = \frac{\frac{1}{m} \sum_{j=1}^{m} \big(w_t^{(j)} - \bar{w}_t\big)^2}{\bar{w}_t^2} $$


with

Equation 8.100

$$ \bar{w}_t = \frac{1}{m} \sum_{j=1}^{m} w_t^{(j)} $$


Note that if the samples are drawn exactly from the target distribution, all the weights are equal, implying that v_t = 0. A large coefficient of variation in the importance weights indicates ineffective samples. It is shown in [232] that the importance weights resulting from an SMC filter form a martingale sequence. As more and more data are processed, the coefficient of variation of the weights, and hence the number of ineffective samples, increases rapidly.

A useful method for reducing the number of ineffective samples and enhancing the effective ones is resampling, which was suggested in [160, 276] under the SMC setting. Roughly speaking, resampling allows "bad" samples (with small importance weights) to be discarded and "good" ones (with large importance weights) to be replicated so as to accommodate the dynamic change of the system. Specifically, let {Z_t^(j), w_t^(j)}, j = 1, ..., m, be the original properly weighted samples at time t. A residual resampling strategy forms a new set of weighted samples {Z̃_t^(j), w̃_t^(j)}, j = 1, ..., m, according to the following algorithm (assume that Σ_{j=1}^m w_t^(j) = m).

Algorithm 8.9: [Residual resampling]

  • For j = 1, ..., m, retain k_j = ⌊w_t^(j)⌋ copies of the sample Z_t^(j). Denote K_r = m − Σ_{j=1}^m k_j.

  • Obtain K_r i.i.d. draws from the original sample set {Z_t^(j)}, j = 1, ..., m, with probabilities proportional to w_t^(j) − k_j, j = 1, ..., m.

  • Assign equal weight (i.e., set w̃_t^(j) = 1) to each new sample.
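The steps above translate almost directly into code; a minimal sketch, assuming (as in Algorithm 8.9) that the weights have been normalized to sum to m:

```python
import math
import random

def residual_resample(samples, weights, rng):
    # Residual resampling: weights are assumed normalized so that
    # sum(weights) == len(samples) == m.
    m = len(samples)
    new_samples, residuals = [], []
    for z, w in zip(samples, weights):
        k = int(math.floor(w))          # retain k_j = floor(w_j) copies
        new_samples.extend([z] * k)
        residuals.append(w - k)         # leftover mass w_j - k_j
    k_r = m - len(new_samples)          # K_r = m - sum_j k_j draws remain
    total = sum(residuals)
    for _ in range(k_r):
        # One multinomial draw with probabilities proportional to residuals.
        u = rng.random() * total
        acc = 0.0
        for z, r in zip(samples, residuals):
            acc += r
            if u <= acc:
                new_samples.append(z)
                break
    return new_samples, [1.0] * m       # equal weights after resampling
```

Relative to purely multinomial resampling, the deterministic "retain ⌊w_j⌋ copies" step removes a large part of the resampling variance, which is the usual motivation for the residual scheme.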

The correctness of the residual resampling procedure above is stated by the following result, whose proof is given in the Appendix (Section 8.7.4).

Proposition 8.2: The samples drawn by the residual resampling procedure (Algorithm 8.9) are properly weighted with respect to p(Z_t | Y_t) as m → ∞.

Alternatively, we can use the following simple resampling procedure, which also produces properly weighted samples with respect to p(Z_t | Y_t):

  • For j = 1,..., m , draw i.i.d. random numbers l j from the set {1,2,..., m }, with probability distribution

Equation 8.101

$$ P(l_j = k) = \frac{w_t^{(k)}}{W_t}, \qquad k = 1, \ldots, m $$


  • The m new samples and the corresponding weights {Z̃_t^(j), w̃_t^(j)}, j = 1, ..., m, are given by

Equation 8.102

$$ \tilde{Z}_t^{(j)} = Z_t^{(l_j)}, \qquad \tilde{w}_t^{(j)} = \frac{W_t}{m}, \qquad j = 1, \ldots, m $$
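A sketch of this simple (multinomial) resampling procedure: each new stream is a draw from the old set with probability proportional to its weight, and every new sample receives the same weight (the average of the originals), so the set remains properly weighted up to a constant.

```python
import random

def simple_resample(samples, weights, rng):
    # Draw l_j i.i.d. from {1, ..., m} with P(l_j = k) proportional to
    # w^(k), then copy Z^(l_j) and assign each copy the average weight.
    m = len(samples)
    total = sum(weights)
    new_samples = []
    for _ in range(m):
        u = rng.random() * total        # inverse-CDF draw on the weights
        acc = 0.0
        for z, w in zip(samples, weights):
            acc += w
            if u <= acc:
                new_samples.append(z)
                break
    return new_samples, [total / m] * m
```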


In practice, when small to modest m is used (we used m = 50 in our simulations), the resampling procedure can be seen as trading off between bias and variance. That is, the new samples with their weights resulting from the resampling procedure are only approximately proper, which introduces small bias in Monte Carlo estimation. On the other hand, however, resampling greatly reduces Monte Carlo variance for the future samples.

Resampling can be done at any time. However, resampling too often adds computational burden and decreases the "diversity" of the Monte Carlo filter (i.e., it decreases the number of distinct sample streams and loses information); on the other hand, resampling too rarely may result in a loss of efficiency. It is thus desirable to give guidance on when resampling should be performed. A measure of the efficiency of an importance sampling scheme is the effective sample size m̄_t, defined as

Equation 8.103

$$ \bar{m}_t = \frac{m}{1 + v_t^2} $$


Heuristically, m̄_t reflects the equivalent size of a set of i.i.d. samples for the set of m weighted ones. It is suggested in [277] that resampling should be performed when the effective sample size becomes small (e.g., when m̄_t falls below some fraction of m). Alternatively, one can conduct resampling at every fixed-length time interval (say, every five steps).
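The coefficient of variation (8.99)–(8.100) and the effective sample size (8.103) are straightforward to compute; a minimal helper:

```python
def effective_sample_size(weights):
    # Coefficient of variation v_t per (8.99)-(8.100), then the effective
    # sample size m_bar = m / (1 + v_t^2) of (8.103).
    m = len(weights)
    w_bar = sum(weights) / m
    v2 = sum((w - w_bar) ** 2 for w in weights) / (m * w_bar ** 2)
    return m / (1.0 + v2)
```

Equal weights give m̄_t = m (exact i.i.d. sampling), while a single dominant weight drives m̄_t toward 1, signaling that resampling is warranted.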

8.5.4 Mixture Kalman Filter

Many dynamical system models belong to the class of conditional dynamical linear models (CDLMs) of the form

Equation 8.104

$$ x_t = F_{\lambda_t} x_{t-1} + G_{\lambda_t} u_t $$

$$ y_t = H_{\lambda_t} x_t + K_{\lambda_t} v_t $$


where u_t ~ N_c(0, I), v_t ~ N_c(0, I), and λ_t is a random indicator variable. The matrices F_{λ_t}, G_{λ_t}, H_{λ_t}, and K_{λ_t} are known given λ_t. In this model, the state variable z_t corresponds to (x_t, λ_t).

We observe that for a given trajectory of the indicator λ_t in a CDLM, the system is both linear and Gaussian, for which the Kalman filter provides the complete statistical characterization of the system dynamics. Recently, a novel SMC method, the mixture Kalman filter (MKF), was developed in [75] for online filtering and prediction of CDLMs; it exploits the conditional Gaussian property and uses a marginalization operation to improve algorithmic efficiency. Instead of dealing with both x_t and λ_t, the MKF draws Monte Carlo samples only in the indicator space and uses a mixture of Gaussian distributions to approximate the target distribution. Compared with the generic SMC method, the MKF is substantially more efficient (e.g., it gives more accurate results with the same computing resources). However, the MKF often needs more computational power for its proper implementation, as the formulas required are more complicated. Additionally, the MKF requires the CDLM structure, which is not applicable to all problems of interest.

Let Y_t = (y_0, y_1, ..., y_t) and let L_t = (λ_0, λ_1, ..., λ_t). By recursively generating a set of properly weighted random samples {L_t^(j), w_t^(j)}, j = 1, ..., m, to represent p(L_t | Y_t), the MKF approximates the target distribution p(x_t | Y_t) by a random mixture of Gaussian distributions

Equation 8.105

$$ p(x_t \mid Y_t) \approx \frac{1}{W_t} \sum_{j=1}^{m} w_t^{(j)}\, \mathcal{N}_c\big(\mu_t^{(j)}, \Sigma_t^{(j)}\big) $$


where κ_t^(j) = (μ_t^(j), Σ_t^(j)) is obtained by running a Kalman filter for the given indicator trajectory L_t^(j). Thus, a key step in the MKF is the production at time t of the weighted indicator samples {L_t^(j), κ_t^(j), w_t^(j)}, j = 1, ..., m, based on the set {L_{t-1}^(j), κ_{t-1}^(j), w_{t-1}^(j)}, j = 1, ..., m, at the previous time t - 1, according to the following algorithm.

Algorithm 8.10: [Mixture Kalman filter for conditional dynamical linear models] For j = 1, ..., m:

  • Draw a sample λ_t^(j) from a trial distribution q(λ_t | L_{t-1}^(j), κ_{t-1}^(j), Y_t).

  • Run a one-step Kalman filter based on λ_t^(j), κ_{t-1}^(j), and y_t to obtain κ_t^(j).

  • Compute the weight:

Equation 8.106

$$ w_t^{(j)} \;\propto\; w_{t-1}^{(j)}\, \frac{p\big(\lambda_t^{(j)} \mid L_{t-1}^{(j)}, Y_{t-1}\big)\, p\big(y_t \mid L_t^{(j)}, Y_{t-1}\big)}{q\big(\lambda_t^{(j)} \mid L_{t-1}^{(j)}, \kappa_{t-1}^{(j)}, Y_t\big)} $$


The MKF can be extended to handle the partial CDLM, in which the state variable has a linear component and a nonlinear component. See [75] for a detailed treatment of the MKF and the extended MKF.
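For intuition, the one-step Kalman filter invoked in Algorithm 8.10 can be sketched in the scalar case (real-valued Gaussian noise rather than the circular complex N_c of (8.104), and illustrative parameter names). It returns the updated sufficient statistics κ_t = (μ_t, σ_t²) together with the predictive likelihood of y_t that feeds the MKF weight (8.106).

```python
import math

def kalman_one_step(mu, sigma2, y, F, G, H, K):
    # One-step Kalman update for the scalar conditional model
    #   x_t = F x_{t-1} + G u_t,   y_t = H x_t + K v_t,   u_t, v_t ~ N(0, 1),
    # where (F, G, H, K) are the matrices selected by the sampled indicator.
    mu_pred = F * mu                       # predictive mean of x_t
    s2_pred = F * F * sigma2 + G * G       # predictive variance of x_t
    s2_y = H * H * s2_pred + K * K         # predictive variance of y_t
    gain = s2_pred * H / s2_y              # Kalman gain
    mu_new = mu_pred + gain * (y - H * mu_pred)
    s2_new = (1.0 - gain * H) * s2_pred
    # Predictive likelihood p(y_t | indicator trajectory, Y_{t-1}),
    # the quantity used in the MKF weight update.
    resid = y - H * mu_pred
    like = math.exp(-0.5 * resid * resid / s2_y) / math.sqrt(2.0 * math.pi * s2_y)
    return mu_new, s2_new, like
```

In the MKF, each of the m streams carries its own (μ, σ²) pair, so the marginal p(x_t | Y_t) is approximated by the weighted Gaussian mixture (8.105) rather than by point particles, which is the source of the MKF's variance reduction.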



Wireless Communication Systems
Wireless Communication Systems: Advanced Techniques for Signal Reception (paperback)
ISBN: 0137020805
Year: 2003
Pages: 91
