8.4 Bayesian Multiuser Detection via MCMC


In this section we illustrate the application of MCMC signal processing (in particular, the Gibbs sampler) by treating three related problems in multiuser detection under a general Bayesian framework: (1) optimal multiuser detection in the presence of unknown channel parameters, (2) optimal multiuser detection in non-Gaussian ambient noise, and (3) multiuser detection in coded CDMA systems. The methods discussed in this section were first developed in [540]. We begin with an overview of related work in these three areas.

The optimal multiuser detection algorithms with known channel parameters, that is, the multiuser maximum-likelihood sequence detector (MLSD) and the multiuser maximum a posteriori probability (MAP) detector, were first investigated in [517, 518] (cf. [520]). When the channel parameters (e.g., the received signal amplitudes and the noise variance) are unknown, it is of interest to study the problem of joint multiuser channel parameter estimation and data detection from the received waveform. This problem was first treated in [375], where a solution based on the expectation-maximization (EM) algorithm is derived. In [460], the problem of sequential multiuser amplitude estimation in the presence of unknown data is studied, and an approach based on stochastic approximation is proposed. In [581], a tree-search algorithm is given for joint data detection and amplitude estimation. Other works concerning multiuser detection with unknown channel parameters include [119, 206, 208, 326, 339, 464]. For systems employing channel coding, the optimal decoding scheme for convolutionally coded CDMA is studied in [143] and shown to have prohibitive computational complexity. In [144], low-complexity receivers that perform multiuser symbol detection and decoding either separately or jointly are studied. The powerful turbo multiuser detection techniques for coded CDMA systems are discussed in Chapter 6. Finally, robust multiuser detection methods for CDMA systems in non-Gaussian ambient noise are treated in Chapter 4.

In what follows we present Bayesian multiuser detection techniques with unknown channel parameters in both Gaussian and non-Gaussian ambient noise channels. The Gibbs sampler is employed to calculate the Bayesian estimates of the unknown multiuser symbols from the received waveforms. The Bayesian multiuser detector can naturally be used in conjunction with the MAP channel decoding algorithm to accomplish turbo multiuser detection in unknown channels. Note that although in this section we treat only the simple synchronous CDMA signal model, the techniques discussed here can be generalized to treat more complicated systems, such as intersymbol-interference (ISI) channels [541], asynchronous CDMA with multipath fading [594], nonlinearly modulated CDMA systems [372], multicarrier CDMA systems with space-time coding [591], and systems with Gaussian minimum-shift-keying (GMSK) modulation over multipath fading channels [592].

8.4.1 System Description

As in Chapter 6, we consider a coded discrete-time synchronous real-valued baseband CDMA system with K users, employing normalized modulation waveforms s_1, s_2, ..., s_K, and signaling through a channel with additive white noise. The block diagram of the transmitter end of such a system is shown in Fig. 8.1. The binary information bits {d_k[n]}_n for user k are encoded using a channel code (e.g., a block code, convolutional code, or turbo code). A code-bit interleaver is used to reduce the influence of error bursts at the input of the channel decoder. The interleaved code bits are then mapped to BPSK symbols, yielding a symbol stream {b_k[i]}_i. Each data symbol is then modulated by a spreading waveform s_k and transmitted through the channel. The received signal is the superposition of the K users' transmitted signals plus the ambient noise, given by

Equation 8.17

r[i] = \sum_{k=1}^{K} A_k b_k[i] s_k + n[i], \qquad i = 0, 1, \ldots, M-1


Figure 8.1. Coded synchronous CDMA communication system.

graphics/08fig01.gif

In (8.17), M is the number of data symbols per user per frame; A_k, b_k[i], and s_k denote, respectively, the amplitude, the ith symbol, and the normalized spreading waveform of the kth user; and n[i] = [n_0[i], n_1[i], ..., n_{N-1}[i]]^T is a zero-mean white noise vector. The spreading waveform is of the form

Equation 8.18

s_k = \frac{1}{\sqrt{N}} \left[ c_0^{(k)}, c_1^{(k)}, \ldots, c_{N-1}^{(k)} \right]^T, \qquad c_j^{(k)} \in \{+1, -1\}


where N is the spreading factor. It is assumed that the receiver knows the spreading waveforms of all active users in the system. Define the following a priori symbol probabilities:

Equation 8.19

\rho_k[i] \triangleq P\left( b_k[i] = +1 \right), \qquad k = 1, \ldots, K; \; i = 0, \ldots, M-1


Note that when no prior information is available, we choose ρ_k[i] = ½ (i.e., all symbols are assumed to be equally likely).
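As a quick illustration of the signal model (8.17)-(8.18), the following Python sketch synthesizes one frame of received vectors for a small system. It is only a sketch: the system sizes, amplitudes, and spreading sequences below are hypothetical values chosen for the example, not parameters taken from the text.

    import numpy as np

    rng = np.random.default_rng(0)

    K, N, M = 3, 10, 100      # users, spreading factor, symbols per frame (hypothetical)
    sigma = 0.5               # noise standard deviation (hypothetical)

    # Normalized spreading waveforms, per (8.18): each column of S has unit norm.
    C = rng.choice([-1.0, 1.0], size=(N, K))     # chips c_j^(k)
    S = C / np.sqrt(N)                           # S = [s_1, ..., s_K], an N x K matrix

    A = np.array([1.0, 0.8, 1.2])                # amplitudes A_k (hypothetical)
    B = rng.choice([-1.0, 1.0], size=(K, M))     # BPSK symbols b_k[i]

    # Received frame per (8.17): r[i] = sum_k A_k b_k[i] s_k + n[i], with Gaussian noise here.
    Y = S @ (A[:, None] * B) + sigma * rng.standard_normal((N, M))
    print(Y.shape)                               # (N, M): one column per symbol interval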

It is further assumed that the additive ambient channel noise {n[i]}_i is a sequence of zero-mean i.i.d. random vectors independent of the symbol sequences {b_k[i]}_{i;k}. Moreover, each noise vector n[i] is assumed to consist of i.i.d. samples {n_j[i]}_j. Here we consider two types of noise distributions, corresponding to additive Gaussian noise and additive impulsive noise, respectively. For the former case, the noise n_j[i] is assumed to have a Gaussian distribution:

Equation 8.20

n_j[i] \sim \mathcal{N}\left( 0, \sigma^2 \right)


where σ² is the variance of the noise. For the latter case, the noise n_j[i] is assumed to have a two-term Gaussian mixture distribution (cf. Chapter 4):

Equation 8.21

n_j[i] \sim (1 - \varepsilon)\, \mathcal{N}\left( 0, \nu_1^2 \right) + \varepsilon\, \mathcal{N}\left( 0, \nu_2^2 \right)


with 0 < ε < 1 and ν₁² < ν₂². Here the term N(0, ν₁²) represents the nominal ambient noise, the term N(0, ν₂²) represents an impulsive component, and ε represents the probability that an impulse occurs. The total noise variance under distribution (8.21) is given by

Equation 8.22

\sigma^2 = (1 - \varepsilon)\, \nu_1^2 + \varepsilon\, \nu_2^2
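A minimal sketch of sampling from the two-term mixture (8.21) and checking the total-variance formula (8.22); the values of ε, ν₁², and ν₂² below are hypothetical.

    import numpy as np

    rng = np.random.default_rng(1)

    eps, nu1_sq, nu2_sq = 0.1, 1.0, 100.0   # hypothetical mixture parameters
    n_samples = 200_000

    # With probability eps a sample comes from the impulsive term N(0, nu2^2);
    # otherwise it comes from the nominal term N(0, nu1^2).
    impulse = rng.random(n_samples) < eps
    var = np.where(impulse, nu2_sq, nu1_sq)
    noise = rng.standard_normal(n_samples) * np.sqrt(var)

    # The empirical variance should match (8.22): (1 - eps) nu1^2 + eps nu2^2.
    print(noise.var(), (1 - eps) * nu1_sq + eps * nu2_sq)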


Denote Y ≜ [r[0], r[1], ..., r[M−1]]. We consider the problem of estimating the a posteriori probabilities of the transmitted symbols

Equation 8.23

P\left( b_k[i] = +1 \mid Y \right), \qquad k = 1, \ldots, K; \; i = 0, \ldots, M-1


based on the received signals Y and the prior information {ρ_k[i]}_{i;k}, without knowing the channel amplitudes {A_k} or the noise parameters (i.e., σ² for Gaussian noise; ε, ν₁², and ν₂² for non-Gaussian noise). These a posteriori probabilities are then used by the channel decoder to decode the information bits {d_k[n]}_{n;k} shown in Fig. 8.1, as discussed in Section 8.4.4.

8.4.2 Bayesian Multiuser Detection in Gaussian Noise

We now consider the problem of computing the a posteriori probabilities in (8.23) under the assumption that the ambient noise distribution is Gaussian; that is, the pdf of n[i] in (8.17) is given by

Equation 8.24

p\left( n[i] \right) = \left( 2\pi\sigma^2 \right)^{-N/2} \exp\left( -\frac{\left\| n[i] \right\|^2}{2\sigma^2} \right), \qquad \text{i.e., } n[i] \sim \mathcal{N}\left( 0, \sigma^2 I_N \right)


Define the following notation:

S \triangleq [s_1, s_2, \ldots, s_K], \quad a \triangleq [A_1, A_2, \ldots, A_K]^T, \quad A \triangleq \operatorname{diag}(A_1, \ldots, A_K), \quad b[i] \triangleq [b_1[i], \ldots, b_K[i]]^T, \quad X \triangleq [b[0], b[1], \ldots, b[M-1]]

Then (8.17) can be written as

Equation 8.25

r[i] = S A\, b[i] + n[i], \qquad i = 0, 1, \ldots, M-1


Equation 8.26

graphics/08equ026.gif


We approach this problem using a Bayesian framework: First, the unknown quantities a, σ², and X are regarded as realizations of random variables with some prior distributions. The Gibbs sampler, a Monte Carlo method, is then employed to calculate the maximum a posteriori probability estimates of these unknowns.

Bayesian Inference

Assume that the unknown quantities a, σ², and X are independent of each other and have prior densities p(a), p(σ²), and p(X), respectively. Since {n[i]}_i is a sequence of independent Gaussian vectors, using (8.24) and (8.25), the joint posterior density of these unknown quantities (a, σ², X), based on the received signal Y, takes the form

Equation 8.27

p\left( a, \sigma^2, X \mid Y \right) \propto p(a)\, p(\sigma^2)\, p(X)\, \left( \sigma^2 \right)^{-NM/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=0}^{M-1} \left\| r[i] - S A\, b[i] \right\|^2 \right)


The a posteriori probabilities (8.23) of the transmitted symbols can then be calculated from the joint posterior distribution (8.27) according to

Equation 8.28

P\left( b_k[i] = +1 \mid Y \right) = \iint \sum_{X :\, b_k[i] = +1} p\left( a, \sigma^2, X \mid Y \right) da \, d\sigma^2


The computation in (8.28) involves 2^{KM−1} multidimensional integrals, which is clearly infeasible for any practical implementation with typical values of K and M. To avoid direct evaluation of the Bayesian estimate (8.28), we resort to the Gibbs sampler discussed in Section 8.3. The basic idea is to generate ergodic random samples {a^(n), σ^2(n), X^(n) : n = n₀, n₀+1, ...} from the posterior distribution (8.27), and then to average the draws {b_k[i]^(n) : n = n₀, n₀+1, ...} appropriately to obtain an approximation of the a posteriori probabilities in (8.28).

Prior Distributions

In Bayesian analysis, prior distributions are used to incorporate prior knowledge about the unknown parameters. When such prior knowledge is limited, the prior distributions should be chosen such that they have a minimal impact on the posterior distribution. Such priors are termed noninformative. The rationale for using noninformative prior distributions is to "let the data speak for themselves," so that inferences are unaffected by information external to the current data [51, 136, 248].

Another consideration in the selection of prior distributions is to simplify computation. To that end, conjugate priors are generally used to obtain simple analytical forms for the resulting posterior distributions. The property that the posterior distribution belongs to the same distribution family as the prior distribution is called conjugacy. Conjugate families of distributions are mathematically convenient in that the posterior distribution follows a known parametric form [51, 136, 248]. Finally, to make the Gibbs sampler computationally efficient, the priors should also be chosen such that the conditional posterior distributions are easy to simulate.

Following these general guidelines of Bayesian analysis, we choose the conjugate prior distributions p(a), p(σ²), and p(X) for the unknown quantities a, σ², and X as follows.

For the unknown amplitude vector a, a truncated Gaussian prior distribution is assumed,

Equation 8.29

p(a) \propto \exp\left( -\tfrac{1}{2} (a - a_0)^T \Sigma_0^{-1} (a - a_0) \right) \mathbb{1}\{a > 0\}, \qquad \text{i.e., } a \sim \mathcal{N}\left( a_0, \Sigma_0 \right) \mathbb{1}\{a > 0\}


where 1{a > 0} is an indicator that equals 1 if all elements of a are positive and 0 otherwise. Note that large values of Σ₀ correspond to less informative priors. For the noise variance σ², an inverse chi-square prior distribution is assumed,

Equation 8.30

\frac{\nu \lambda}{\sigma^2} \sim \chi_\nu^2


or

Equation 8.31

p\left( \sigma^2 \right) \propto \left( \sigma^2 \right)^{-(\nu/2 + 1)} \exp\left( -\frac{\nu \lambda}{2 \sigma^2} \right)


Small values of ν correspond to less informative priors (roughly, the prior knowledge is worth ν data points). The value of λ reflects the prior belief about the value of σ². Finally, since the symbols {b_k[i]}_{i;k} are assumed to be independent, the prior distribution p(X) can be expressed in terms of the prior symbol probabilities defined in (8.19) as

Equation 8.32

p(X) = \prod_{i=0}^{M-1} \prod_{k=1}^{K} P\left( b_k[i] \right)


where

Equation 8.33

P\left( b_k[i] \right) = \rho_k[i]^{\left( 1 + b_k[i] \right)/2} \left( 1 - \rho_k[i] \right)^{\left( 1 - b_k[i] \right)/2}
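As a concrete (and purely illustrative) rendering of these conjugate priors, the sketch below draws a from the truncated Gaussian (8.29) by rejection, σ² from the inverse chi-square prior (8.30), and X from the independent symbol priors (8.32)-(8.33). All numerical hyperparameters are hypothetical placeholders.

    import numpy as np

    rng = np.random.default_rng(2)
    K, M = 3, 100

    # Truncated Gaussian prior (8.29): reject draws until all amplitudes are positive.
    a0, Sigma0 = np.ones(K), 100.0 * np.eye(K)    # diffuse (noninformative) choice
    while True:
        a = rng.multivariate_normal(a0, Sigma0)
        if np.all(a > 0):
            break

    # Inverse chi-square prior (8.30): nu * lam / sigma^2 ~ chi^2 with nu degrees of freedom.
    nu, lam = 1.0, 0.1                            # small nu => weak prior
    sigma_sq = nu * lam / rng.chisquare(nu)

    # Symbol priors (8.32)-(8.33): independent Bernoulli(rho_k[i]) draws mapped to +/-1.
    rho = 0.5 * np.ones((K, M))                   # no prior information
    X = np.where(rng.random((K, M)) < rho, 1.0, -1.0)

    print(a, sigma_sq, X.shape)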


Conditional Posterior Distributions

The following conditional posterior distributions are required by the Gibbs multiuser detector in Gaussian noise. The derivations are given in the Appendix (Section 8.7.1).

  1. The conditional distribution of the amplitude vector a given σ², X, and Y is given by

    Equation 8.34

    a \mid \sigma^2, X, Y \;\sim\; \mathcal{N}\left( a_*, \Sigma_* \right) \mathbb{1}\{a > 0\}


    with

    Equation 8.35

    \Sigma_*^{-1} = \Sigma_0^{-1} + \frac{1}{\sigma^2} \sum_{i=0}^{M-1} D[i]\, R\, D[i], \qquad D[i] \triangleq \operatorname{diag}\left( b_1[i], \ldots, b_K[i] \right)


    Equation 8.36

    a_* = \Sigma_* \left( \Sigma_0^{-1} a_0 + \frac{1}{\sigma^2} \sum_{i=0}^{M-1} D[i]\, S^T r[i] \right)


    where, in (8.35), we have used R ≜ SᵀS, as usual, to denote the cross-correlation matrix of the signaling set.

  2. The conditional distribution of the noise variance σ² given a, X, and Y is given by

    Equation 8.37

    \frac{\nu \lambda + s^2}{\sigma^2} \,\Big|\, a, X, Y \;\sim\; \chi_{\nu + NM}^2


    or

    Equation 8.38

    p\left( \sigma^2 \mid a, X, Y \right) \propto \left( \sigma^2 \right)^{-\left( (\nu + NM)/2 + 1 \right)} \exp\left( -\frac{\nu \lambda + s^2}{2 \sigma^2} \right)


    with

    Equation 8.39

    s^2 \triangleq \sum_{i=0}^{M-1} \left\| r[i] - S A\, b[i] \right\|^2


  3. The conditional probabilities of b_k[i] = ±1 given a, σ², X_ki, and Y, where X_ki ≜ {b_l[j] : (l, j) ≠ (k, i)} denotes all symbols except b_k[i], can be obtained from

    Equation 8.40

    P\left( b_k[i] = +1 \mid a, \sigma^2, X_{ki}, Y \right) = \left[ 1 + \frac{1 - \rho_k[i]}{\rho_k[i]} \exp\left( -\frac{2 A_k}{\sigma^2}\, s_k^T r_k[i] \right) \right]^{-1}


    where

    r_k[i] \triangleq r[i] - \sum_{l \neq k} A_l b_l[i]\, s_l

Gibbs Multiuser Detector in Gaussian Noise

Using the conditional posterior distributions above, the Gibbs sampling implementation of the Bayesian multiuser detector in Gaussian noise proceeds iteratively as follows.

Algorithm 8.5: [Gibbs multiuser detector in Gaussian noise] Given initial values of the unknown quantities {a^(0), σ^2(0), X^(0)} drawn from their prior distributions, proceed as follows. For n = 1, 2, ...:

  • Draw a^(n) from p(a | σ^2(n−1), X^(n−1), Y) given by (8.34).

  • Draw σ^2(n) from p(σ² | a^(n), X^(n−1), Y) given by (8.38).

  • For i = 0, 1, ..., M−1:

    For k = 1, 2, ..., K:

    Draw b_k[i]^(n) from P(b_k[i] | a^(n), σ^2(n), X_ki^(n), Y) given by (8.40),

    where X_ki^(n) consists of the most recent draws of all symbols other than b_k[i], that is, b_l[j]^(n) for symbols already updated in iteration n and b_l[j]^(n−1) otherwise.

Note that to draw samples of a from (8.29) or (8.34), the rejection method [527] can be used. For instance, after a sample is drawn from N(a₀, Σ₀) or N(a*, Σ*), check whether the constraint A_k > 0, k = 1, ..., K, is satisfied; if not, the sample is rejected and a new sample is drawn from the same distribution. The procedure continues until a sample satisfying the constraint is obtained.
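Putting the pieces together, the following Python sketch runs Algorithm 8.5 on a toy system. It assumes the conditional forms (8.34)-(8.40) as written above; the system sizes, hyperparameters, and the simple per-draw rejection step are illustrative assumptions, not the implementation of [540].

    import numpy as np

    rng = np.random.default_rng(3)

    # ---- Toy synchronous CDMA data (hypothetical sizes and parameters) ----
    K, N, M = 3, 10, 64
    S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)
    R = S.T @ S                                    # cross-correlation matrix
    a_true = np.array([1.0, 0.7, 1.3])
    X_true = rng.choice([-1.0, 1.0], size=(K, M))
    Y = S @ (a_true[:, None] * X_true) + 0.4 * rng.standard_normal((N, M))

    # ---- Noninformative conjugate priors ----
    a0, Sigma0_inv = np.ones(K), np.eye(K) / 100.0
    nu, lam = 1.0, 0.1
    rho = 0.5 * np.ones((K, M))

    # ---- Initialization ----
    a, sigma_sq = np.ones(K), 1.0
    X = rng.choice([-1.0, 1.0], size=(K, M))

    n_burn, n_keep = 100, 400
    X_plus = np.zeros((K, M))
    for it in range(n_burn + n_keep):
        # (8.34)-(8.36): draw a from its truncated Gaussian posterior by rejection.
        prec = Sigma0_inv.copy()
        lin = Sigma0_inv @ a0
        for i in range(M):
            D = np.diag(X[:, i])
            prec += D @ R @ D / sigma_sq
            lin += D @ S.T @ Y[:, i] / sigma_sq
        cov = np.linalg.inv(prec)
        mean = cov @ lin
        while True:
            a = rng.multivariate_normal(mean, cov)
            if np.all(a > 0):
                break

        # (8.38)-(8.39): draw sigma^2 from its inverse chi-square posterior.
        resid = Y - S @ (a[:, None] * X)
        sigma_sq = (nu * lam + np.sum(resid ** 2)) / rng.chisquare(nu + N * M)

        # (8.40): draw each symbol given the most recent values of all the others.
        for i in range(M):
            for k in range(K):
                r_k = Y[:, i] - S @ (a * X[:, i]) + a[k] * X[k, i] * S[:, k]
                llr = np.log(rho[k, i] / (1 - rho[k, i])) \
                      + 2 * a[k] * (S[:, k] @ r_k) / sigma_sq
                X[k, i] = 1.0 if rng.random() < 1.0 / (1.0 + np.exp(-llr)) else -1.0

        if it >= n_burn:
            X_plus += (X + 1) / 2      # count of +1 draws; cf. the average in (8.41)

    post_plus = X_plus / n_keep        # approximate P(b_k[i] = +1 | Y)
    b_map = np.where(post_plus > 0.5, 1.0, -1.0)
    print("symbol error rate:", np.mean(b_map != X_true))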

To ensure convergence, the procedure above is usually carried out for (n₀ + N) iterations, for suitably chosen n₀ and N, and samples from the last N iterations are used to calculate the Bayesian estimates of the unknown quantities. In particular, the a posteriori symbol probabilities in (8.28) are approximated as

Equation 8.41

P\left( b_k[i] = +1 \mid Y \right) \cong \frac{1}{N} \sum_{n = n_0 + 1}^{n_0 + N} P\left( b_k[i] = +1 \mid a^{(n)}, \sigma^{2(n)}, X_{ki}^{(n)}, Y \right)


where

Equation 8.42

X_{ki}^{(n)} \triangleq \left\{ b_l[j]^{(n)} : (l, j) \neq (k, i) \right\}


A MAP decision on the symbol b k [ i ] is then given by

Equation 8.43

\hat{b}_k[i] = \operatorname{sign}\left( P\left( b_k[i] = +1 \mid Y \right) - \tfrac{1}{2} \right)


Furthermore, if desired, estimates of the amplitude vector a and the noise variance σ² can also be obtained from the corresponding sample means:

Equation 8.44

\hat{a} = \frac{1}{N} \sum_{n = n_0 + 1}^{n_0 + N} a^{(n)}


Equation 8.45

\hat{\sigma}^2 = \frac{1}{N} \sum_{n = n_0 + 1}^{n_0 + N} \sigma^{2(n)}


The posterior variances of a and σ², which reflect the uncertainty in estimating these quantities on the basis of Y, can also be approximated by the corresponding sample variances:

Equation 8.46

\widehat{\operatorname{Var}}\left( A_k \mid Y \right) = \frac{1}{N} \sum_{n = n_0 + 1}^{n_0 + N} \left( A_k^{(n)} - \hat{A}_k \right)^2, \qquad k = 1, \ldots, K


Equation 8.47

\widehat{\operatorname{Var}}\left( \sigma^2 \mid Y \right) = \frac{1}{N} \sum_{n = n_0 + 1}^{n_0 + N} \left( \sigma^{2(n)} - \hat{\sigma}^2 \right)^2


Note that the computations above are exact in the limit as N → ∞. Since they involve only a finite number of samples, we regard them as approximations, but in theory any order of precision can be achieved given a sufficiently large sample size N.
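Given stored post-burn-in draws, the estimates (8.44)-(8.47) reduce to simple sample averages. A small sketch, in which random placeholder arrays stand in for actual Gibbs output:

    import numpy as np

    rng = np.random.default_rng(4)

    # Suppose a_samp (n_keep x K) and sig_samp (n_keep,) hold the post-burn-in draws.
    n_keep, K = 500, 3
    a_samp = 1.0 + 0.05 * rng.standard_normal((n_keep, K))     # placeholder draws
    sig_samp = 0.16 + 0.01 * rng.standard_normal(n_keep)       # placeholder draws

    a_hat = a_samp.mean(axis=0)                        # (8.44)
    sig_hat = sig_samp.mean()                          # (8.45)
    a_var = ((a_samp - a_hat) ** 2).mean(axis=0)       # (8.46), elementwise
    sig_var = ((sig_samp - sig_hat) ** 2).mean()       # (8.47)
    print(a_hat, sig_hat, a_var, sig_var)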

The complexity of the Gibbs multiuser detector above is O(K² + KM) per iteration; that is, it has a term that is quadratic in the number of users K, due to the inversion of the positive-definite symmetric matrix in (8.35), and a term that is linear in the symbol block size M. The total complexity is then O[(K² + KM)(n₀ + N)]. For practical values of K and M, this is a substantial complexity reduction compared with the direct implementation of the Bayesian symbol estimate (8.28), whose complexity is O(2^{KM}).

Simulation Examples

We consider a five-user (K = 5) synchronous CDMA channel with processing gain N = 10. The user spreading waveform matrix S and the corresponding correlation matrix R are given, respectively, by

graphics/463equ01.gif

The following noninformative conjugate prior distributions are used in the Gibbs sampler for the case of Gaussian noise:

graphics/463equ02.gif

Note that the performance of the Gibbs sampler is insensitive to the values of the parameters in these priors as long as the priors are noninformative.

We illustrate the convergence behavior of the Bayesian multiuser detector in Gaussian noise. In this example the user amplitudes and the noise variance are taken to be graphics/463fig01.gif, graphics/463fig02.gif, graphics/463fig03.gif, graphics/463fig04.gif, graphics/463fig05.gif, and σ² = −2 dB. The data block size of each user is M = 256. In Fig. 8.2 we plot the first 100 samples drawn by the Gibbs sampler of the parameters b_3[50], b_4[100], A_1, A_5, and σ². The corresponding true values of these quantities are shown in the same figure as horizontal lines. Note that in this case the number of unknown parameters is K + KM + 1 = 1286 (i.e., a, X, and σ²). Remarkably, the Gibbs sampler reaches convergence within about 20 iterations. The marginal posterior distributions of the unknown parameters A_1, A_5, and σ² in the steady state are illustrated by the corresponding histograms, also shown in Fig. 8.2; these are based on 500 samples collected after the initial 50 iterations.

Figure 8.2. Samples and histograms: Gaussian noise. graphics/464fig01.gif, graphics/464fig02.gif, graphics/464fig03.gif, graphics/464fig04.gif, graphics/464fig05.gif, and σ² = −2 dB. The histograms are based on 500 samples collected after the initial 50 iterations.

graphics/08fig02.gif

8.4.3 Bayesian Multiuser Detection in Impulsive Noise

We next discuss Bayesian multiuser detection via the Gibbs sampler in non-Gaussian impulsive noise. As discussed above, it is assumed that the noise samples {n_j[i]}_j of n[i] in (8.17) are independent with a common two-term Gaussian mixture pdf given by

Equation 8.48

n_j[i] \sim (1 - \varepsilon)\, \mathcal{N}\left( 0, \nu_1^2 \right) + \varepsilon\, \mathcal{N}\left( 0, \nu_2^2 \right)


with 0 < ε < 1 and ν₁² < ν₂².

Prior Distributions

Define the following indicator random variable, which indicates the distribution of the noise sample n_j[i]:

Equation 8.49

I_j[i] = \begin{cases} 1, & n_j[i] \sim \mathcal{N}\left( 0, \nu_1^2 \right) \\ 2, & n_j[i] \sim \mathcal{N}\left( 0, \nu_2^2 \right) \end{cases} \qquad j = 0, \ldots, N-1; \; i = 0, \ldots, M-1


Denote I ≜ {I_j[i]}_{j;i} and

Equation 8.50

n_j[i] \mid \left( I_j[i] = l \right) \sim \mathcal{N}\left( 0, \nu_l^2 \right), \qquad l = 1, 2


The unknown quantities in this case are a, ν₁², ν₂², ε, I, and X. The joint posterior distribution of these unknown quantities based on the received signal Y takes the form

Equation 8.51

p\left( a, \nu_1^2, \nu_2^2, \varepsilon, I, X \mid Y \right) \propto p(a)\, p\left( \nu_1^2 \right) p\left( \nu_2^2 \right) p(\varepsilon)\, p(X) \prod_{i=0}^{M-1} \left[ (1 - \varepsilon)^{m_1[i]}\, \varepsilon^{m_2[i]} \prod_{j=0}^{N-1} \frac{1}{\nu_{I_j[i]}} \exp\left( -\frac{ \left( r_j[i] - \tilde{s}_j^T A\, b[i] \right)^2 }{ 2 \nu_{I_j[i]}^2 } \right) \right]


where m_l[i] is the number of l's in {I_0[i], I_1[i], ..., I_{N−1}[i]}, l = 1, 2, and s̃_jᵀ denotes the jth row of the spreading waveform matrix S. Note that m_1[i] + m_2[i] = N. We next specify the conjugate prior distributions of the unknown quantities in (8.51).

As in the case of Gaussian noise, the prior distributions p(a) and p(X) are given by (8.29) and (8.32), respectively. For the noise variances ν_l², l = 1, 2, independent inverse chi-square prior distributions are assumed:

Equation 8.52

p\left( \nu_l^2 \right) \propto \left( \nu_l^2 \right)^{-(\mu_l/2 + 1)} \exp\left( -\frac{\mu_l \lambda_l}{2 \nu_l^2} \right), \qquad l = 1, 2


For the impulse probability ε, a beta prior distribution is assumed:

Equation 8.53

\varepsilon \sim \operatorname{Beta}(\alpha, \beta), \qquad \text{i.e., } p(\varepsilon) \propto \varepsilon^{\alpha - 1} (1 - \varepsilon)^{\beta - 1}


Note that the value α/(α + β) reflects the prior knowledge of the value of ε. Moreover, (α + β) reflects the strength of the prior belief [i.e., roughly, the prior knowledge is worth (α + β) data points]. Given ε, the conditional distribution of the indicator I_j[i] is then

Equation 8.54

P\left( I_j[i] = 1 \mid \varepsilon \right) = 1 - \varepsilon


Equation 8.55

P\left( I_j[i] = 2 \mid \varepsilon \right) = \varepsilon


with

p\left( I \mid \varepsilon \right) = \prod_{i=0}^{M-1} (1 - \varepsilon)^{m_1[i]}\, \varepsilon^{m_2[i]}

Conditional Posterior Distributions

The following conditional posterior distributions are required by the Gibbs multiuser detector in non-Gaussian noise. The derivations are given in the Appendix (Section 8.7.2).

  1. The conditional distribution of the amplitude vector a given ν₁², ν₂², ε, I, X, and Y is given by

    Equation 8.56

    a \mid \nu_1^2, \nu_2^2, \varepsilon, I, X, Y \;\sim\; \mathcal{N}\left( a_*, \Sigma_* \right) \mathbb{1}\{a > 0\}


    with

    Equation 8.57

    \Sigma_*^{-1} = \Sigma_0^{-1} + \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \frac{1}{\nu_{I_j[i]}^2}\, D[i]\, \tilde{s}_j \tilde{s}_j^T\, D[i]


    Equation 8.58

    a_* = \Sigma_* \left( \Sigma_0^{-1} a_0 + \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \frac{r_j[i]}{\nu_{I_j[i]}^2}\, D[i]\, \tilde{s}_j \right)


  2. The conditional distribution of the noise variance ν_l² given a, ν_{l'}², ε, I, X, and Y [here l' = 2 if l = 1, and l' = 1 if l = 2] is given by

    Equation 8.59

    p\left( \nu_l^2 \mid a, \nu_{l'}^2, \varepsilon, I, X, Y \right) \propto \left( \nu_l^2 \right)^{-\left( (\mu_l + m_l)/2 + 1 \right)} \exp\left( -\frac{\mu_l \lambda_l + s_l^2}{2 \nu_l^2} \right), \qquad m_l \triangleq \sum_{i=0}^{M-1} m_l[i]


    with

    Equation 8.60

    s_l^2 \triangleq \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \mathbb{1}\left\{ I_j[i] = l \right\} \left( r_j[i] - \tilde{s}_j^T A\, b[i] \right)^2


    In (8.60), 1{I_j[i] = l} equals 1 if I_j[i] = l and 0 otherwise; and s̃_jᵀ is the jth row of the spreading waveform matrix S, j = 0, ..., N−1.

  3. The conditional probability of b_k[i] = ±1 given a, ν₁², ν₂², ε, I, X_ki, and Y can be obtained from

    Equation 8.61

    P\left( b_k[i] = +1 \mid a, \nu_1^2, \nu_2^2, \varepsilon, I, X_{ki}, Y \right) = \left[ 1 + \frac{1 - \rho_k[i]}{\rho_k[i]} \exp\left( -2 A_k \sum_{j=0}^{N-1} \frac{ \tilde{s}_{j,k}\, r_{k,j}[i] }{ \nu_{I_j[i]}^2 } \right) \right]^{-1}


    where r_k[i] ≜ r[i] − Σ_{l≠k} A_l b_l[i] s_l as in (8.40), r_{k,j}[i] denotes its jth component, and s̃_{j,k} denotes the (j, k)th element of S.

  4. The conditional distribution of I_j[i] given a, ν₁², ν₂², ε, I_ji, X, and Y can be obtained from the following [where θ_j[i] ≜ r_j[i] − s̃_jᵀ A b[i], and I_ji denotes all indicators except I_j[i]]:

    Equation 8.62

    P\left( I_j[i] = l \mid a, \nu_1^2, \nu_2^2, \varepsilon, I_{ji}, X, Y \right) \propto \frac{\varepsilon_l}{\nu_l} \exp\left( -\frac{ \theta_j[i]^2 }{ 2 \nu_l^2 } \right), \qquad \varepsilon_1 \triangleq 1 - \varepsilon, \; \varepsilon_2 \triangleq \varepsilon


  5. The conditional distribution of ε given a, ν₁², ν₂², I, X, and Y is given by

    Equation 8.63

    \varepsilon \mid I \;\sim\; \operatorname{Beta}\left( \alpha + m_2, \; \beta + m_1 \right)


Gibbs Multiuser Detector in Impulsive Noise

Using the conditional posterior distributions above, the Gibbs sampling implementation of the Bayesian multiuser detector in impulsive noise proceeds iteratively as follows.

Algorithm 8.6: [Gibbs multiuser detector in impulsive noise] Given initial values of the unknown quantities {a^(0), ν_1^2(0), ν_2^2(0), ε^(0), I^(0), X^(0)} drawn from their prior distributions, proceed as follows. For n = 1, 2, ...:

  • Draw a^(n) from p(a | ν_1^2(n−1), ν_2^2(n−1), ε^(n−1), I^(n−1), X^(n−1), Y) given by (8.56).

  • Draw ν_1^2(n) from p(ν_1² | a^(n), ν_2^2(n−1), ε^(n−1), I^(n−1), X^(n−1), Y) given by (8.59);

    draw ν_2^2(n) from p(ν_2² | a^(n), ν_1^2(n), ε^(n−1), I^(n−1), X^(n−1), Y) given by (8.59).

  • For i = 0, 1, ..., M−1:

    For k = 1, 2, ..., K:

    Draw b_k[i]^(n) from P(b_k[i] | a^(n), ν_1^2(n), ν_2^2(n), ε^(n−1), I^(n−1), X_ki^(n), Y) given by (8.61),

    where X_ki^(n) consists of the most recent draws of all symbols other than b_k[i].

  • For i = 0, 1, ..., M−1:

    For j = 0, 1, ..., N−1:

    Draw I_j[i]^(n) from P(I_j[i] | a^(n), ν_1^2(n), ν_2^2(n), ε^(n−1), I_ji^(n), X^(n), Y) given by (8.62),

    where I_ji^(n) consists of the most recent draws of all indicators other than I_j[i].

  • Draw ε^(n) from p(ε | a^(n), ν_1^2(n), ν_2^2(n), I^(n), X^(n), Y) given by (8.63).

As in the case of Gaussian noise, the a posteriori symbol probabilities P(b_k[i] = +1 | Y) are computed using (8.41), and the a posteriori means and variances of the other unknown quantities can be computed similarly to (8.44)-(8.47). The complexity of the Gibbs multiuser detector above is O(K² + KM + MN) per iteration. Note that the direct implementation of the Bayesian symbol estimate based on (8.51) has a computational complexity of O(2^{KM+MN}).
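The two steps that distinguish Algorithm 8.6 from Algorithm 8.5 are the indicator draws (8.62) and the ε draw (8.63). A sketch of just these two updates, assuming the conditional forms written above; the residuals and all numerical values are hypothetical placeholders rather than quantities from a real Gibbs run.

    import numpy as np

    rng = np.random.default_rng(5)

    N, M = 10, 64
    nu_sq = np.array([1.0, 100.0])     # current draws of (nu_1^2, nu_2^2) (placeholders)
    eps = 0.1                          # current draw of epsilon (placeholder)
    alpha, beta = 1.0, 9.0             # Beta prior hyperparameters (hypothetical)

    # theta[j, i] = r_j[i] - s_j^T A b[i]: residuals under the current a and X;
    # random numbers stand in for them here.
    theta = rng.standard_normal((N, M))

    # (8.62): P(I_j[i] = l | ...) is proportional to (eps_l / nu_l) exp(-theta^2 / (2 nu_l^2)).
    w1 = (1 - eps) / np.sqrt(nu_sq[0]) * np.exp(-theta ** 2 / (2 * nu_sq[0]))
    w2 = eps / np.sqrt(nu_sq[1]) * np.exp(-theta ** 2 / (2 * nu_sq[1]))
    I = np.where(rng.random((N, M)) < w2 / (w1 + w2), 2, 1)

    # (8.63): epsilon | I ~ Beta(alpha + m_2, beta + m_1).
    m2 = np.sum(I == 2)
    m1 = I.size - m2
    eps = rng.beta(alpha + m2, beta + m1)
    print(eps)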

Simulation Examples

The simulated CDMA system is the same as that in Section 8.4.2, except that the noise samples are generated according to the two-term Gaussian mixture model (8.21) with the parameters

graphics/468equ03.gif

The data block size of each user is M = 256. The following noninformative conjugate prior distributions are used in the Gibbs sampler:

graphics/468equ04.gif

The first 100 samples drawn by the Gibbs sampler of the parameters b_3[100], I_5[75], A_3, ν₁², ν₂², and ε are shown in Fig. 8.3. The corresponding true values of these quantities are shown in the same figure as horizontal lines. Note that in this case the number of unknown parameters is K + KM + NM + 3 = 3848 (i.e., a, X, I, ν₁², ν₂², and ε)! As in the Gaussian noise case, the Gibbs sampler converges within about 20 iterations. The histograms of the unknown parameters A_3, ν₁², ν₂², and ε, based on 500 samples collected after the initial 50 iterations, are also shown in Fig. 8.3.

Figure 8.3. Samples and histograms: non-Gaussian noise. graphics/469fig01.gif, graphics/469fig02.gif, graphics/469fig03.gif, graphics/469fig04.gif, graphics/469fig05.gif, ε = 0.1, graphics/469fig06.gif, graphics/469fig07.gif. The histograms are based on 500 samples collected after the initial 50 iterations.

graphics/08fig03.gif

8.4.4 Bayesian Multiuser Detection in Coded Systems

Turbo Multiuser Detection in Unknown Channels

Because it utilizes a priori symbol probabilities and produces a posteriori symbol (or bit) probabilities, the Bayesian multiuser detector discussed in this section is well suited for iterative processing, in which the Bayesian multiuser detector refines its output based on information from the decoding stage, and vice versa. In Chapter 6, turbo multiuser receivers are described for a number of systems under the assumption that the channels are known to the receiver. The Bayesian multiuser detectors discussed in Sections 8.4.2 and 8.4.3 make it possible to accomplish turbo multiuser detection in coded CDMA systems with unknown channels.

The turbo receiver structure discussed in Section 6.3.1 is shown in Fig. 8.4. It consists of two stages: a Bayesian multiuser detector followed by K MAP channel decoders. The two stages are separated by deinterleavers and interleavers. The Bayesian multiuser detector delivers the a posteriori symbol probabilities {P(b_k[i] = +1 | Y)}_{i;k}. Based on these, we first compute the a posteriori log-likelihood ratio (LLR) of a transmitted "+1" symbol versus a transmitted "-1" symbol,

Equation 8.64

\Lambda_1\left( b_k[i] \right) \triangleq \log \frac{ P\left( b_k[i] = +1 \mid Y \right) }{ P\left( b_k[i] = -1 \mid Y \right) }


Figure 8.4. Turbo multiuser detection in unknown channels.

graphics/08fig04.gif

Using Bayes' formula, (8.64) can be written as

Equation 8.65

\Lambda_1\left( b_k[i] \right) = \underbrace{ \log \frac{ p\left( Y \mid b_k[i] = +1 \right) }{ p\left( Y \mid b_k[i] = -1 \right) } }_{ \lambda_1\left( b_k[i] \right) } + \underbrace{ \log \frac{ P\left( b_k[i] = +1 \right) }{ P\left( b_k[i] = -1 \right) } }_{ \lambda_2\left( b_k[i] \right) }


where the second term in (8.65), denoted by λ_2(b_k[i]), represents the a priori LLR of the code bit b_k[i], which is computed by the channel decoder in the previous iteration, interleaved, and then fed back to the Bayesian multiuser detector. For the first iteration, assuming equally likely code bits (i.e., no prior information available), we have λ_2(b_k[i]) = 0, k = 1, ..., K, i = 0, ..., M−1. The first term in (8.65), denoted by λ_1(b_k[i]), represents the extrinsic information delivered by the Bayesian multiuser detector, based on the received signals Y, the structure of the multiuser signal given by (8.17), and the prior information about all other code bits. The extrinsic information λ_1(b_k[i]), which is not influenced by the a priori information λ_2(b_k[i]) provided by the channel decoder, is then deinterleaved (we denote the deinterleaved code-bit sequence of the kth user by {b̃_k[i]}_i) and fed into the channel decoder as the a priori information for the next stage.

Based on the extrinsic information of the code bits {λ_1(b̃_k[i])}_i and the structure of the channel code, the kth user's MAP channel decoder computes the a posteriori LLR of each code bit (see Section 6.2 for the MAP decoding algorithm):

Equation 8.66

\Lambda_2\left( \tilde{b}_k[i] \right) \triangleq \log \frac{ P\left( \tilde{b}_k[i] = +1 \mid \left\{ \lambda_1\left( \tilde{b}_k[j] \right) \right\}_j; \text{ code structure} \right) }{ P\left( \tilde{b}_k[i] = -1 \mid \left\{ \lambda_1\left( \tilde{b}_k[j] \right) \right\}_j; \text{ code structure} \right) } = \lambda_1\left( \tilde{b}_k[i] \right) + \lambda_2\left( \tilde{b}_k[i] \right)


It is seen from (8.66) that the output of the MAP channel decoder is the sum of the prior information λ_1(b̃_k[i]) and the extrinsic information λ_2(b̃_k[i]) delivered by the channel decoder. This extrinsic information is the information about the code bit b̃_k[i] gleaned from the prior information about the other code bits, {b̃_k[j]}_{j≠i}, based on the constraint structure of the code. The MAP channel decoder also computes the a posteriori LLR of every information bit, which is used to make a decision on the decoded bit at the last iteration. After interleaving, the extrinsic information delivered by the channel decoders, {λ_2(b_k[i])}_{i;k}, is used to compute the a priori symbol probabilities {ρ_k[i]}_{i;k} defined in (8.19) from the corresponding LLRs as follows:

Equation 8.67

\rho_k[i] = P\left( b_k[i] = +1 \right) = \frac{ \exp\left( \lambda_2\left( b_k[i] \right) \right) }{ 1 + \exp\left( \lambda_2\left( b_k[i] \right) \right) }


where the derivation of (8.67) is given in Section 6.3.2 [cf. (6.39)]. The symbol probabilities {ρ_k[i]}_{i;k} are then fed back to the Bayesian multiuser detector as the prior information for the next iteration.
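The interface between the detector and the decoders in Fig. 8.4 is essentially bookkeeping on LLRs: subtract the prior to obtain extrinsic information [cf. (8.65)] and map the decoder's extrinsic LLR back to a symbol probability via (8.67). A minimal sketch with placeholder LLR values:

    import numpy as np

    def extrinsic(posterior_llr, prior_llr):
        # lambda_1 = Lambda_1 - lambda_2, per (8.65).
        return posterior_llr - prior_llr

    def llr_to_prob(llr):
        # rho = exp(llr) / (1 + exp(llr)), per (8.67), in a numerically stable form.
        return 1.0 / (1.0 + np.exp(-llr))

    posterior = np.array([2.3, -0.4, 1.1])   # detector output LLRs (placeholders)
    prior = np.array([0.5, -1.0, 0.0])       # decoder feedback from the last iteration
    lam1 = extrinsic(posterior, prior)       # deinterleaved and passed to the decoder
    rho = llr_to_prob(prior)                 # prior symbol probabilities, per (8.67)
    print(lam1, rho)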

Decoder-Assisted Convergence Assessment

Detecting convergence of the Gibbs sampler is usually done in some ad hoc way. Some methods can be found in [472]. One of them is to monitor a sequence of weights that measure the discrepancy between the sampled and desired distributions. In the application considered here, since the Bayesian multiuser detector is followed by a bank of channel decoders, we can assess convergence by monitoring the number of bit corrections made by the channel decoders. If this number exceeds some predetermined threshold, we decide that convergence has not been achieved, and the Gibbs multiuser detector is applied again to the same data block. The rationale is that if the Gibbs sampler has reached convergence, the symbol (and bit) error rate after multiuser detection should be relatively small; on the other hand, if convergence is not reached, the code bits generated by the multiuser detector are virtually random and do not satisfy the constraints imposed by the code trellis, so the channel decoders will make a large number of corrections. Note that such convergence detection incurs no additional computational complexity: we need only compare the signs of the code-bit log-likelihood ratios at the input and output of the soft channel decoder to determine the number of corrections made.
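Counting decoder corrections amounts to comparing the LLR signs at the decoder input and output. A sketch of the threshold test described above, with placeholder LLRs:

    import numpy as np

    def convergence_failed(llr_in, llr_out, frac=1.0 / 3.0):
        # A "correction" is a code bit whose hard decision flips across the decoder.
        corrections = np.sum(np.sign(llr_in) != np.sign(llr_out))
        return corrections > frac * llr_in.size

    rng = np.random.default_rng(6)
    llr_in = rng.standard_normal(256)              # placeholder detector LLRs
    llr_out = llr_in + 0.3 * rng.standard_normal(256)
    print(convergence_failed(llr_in, llr_out))     # if True, rerun the Gibbs sampler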

Code-Constrained Gibbs Multiuser Detector

Another approach to exploiting the coded signal structure in Bayesian multiuser detection is to make use of the code constraints in the Gibbs sampler. For instance, suppose that the user information bits are encoded by some block code of length L and the code bits are not interleaved. Then one signal frame of M symbols contains J = M/L code words, with the jth code word given by

\underline{b}_k[j] \triangleq \left( b_k[jL], \, b_k[jL + 1], \, \ldots, \, b_k[jL + L - 1] \right), \qquad j = 0, 1, \ldots, J - 1

Let B_k be the set of all valid code words for user k. Now in the Gibbs sampler, instead of drawing the individual symbols b_k[i] one at a time according to (8.40) or (8.61), we draw a code word b̲_k[j] of L symbols from B_k each time. Specifically, let β̄ denote the code word with all entries equal to "-1". (This corresponds to the all-zero code word, which is always a valid code word for any linear block code [565].) If the ambient channel noise is Gaussian, then for any code word β = (β_0, β_1, ..., β_{L−1}) ∈ B_k, the conditional probability of b̲_k[j] = β given the values of the rest of the unknowns can be obtained from

Equation 8.68

\frac{ P\left( \underline{b}_k[j] = \beta \mid a, \sigma^2, X_{kj}, Y \right) }{ P\left( \underline{b}_k[j] = \bar{\beta} \mid a, \sigma^2, X_{kj}, Y \right) } = \exp\left( \frac{2 A_k}{\sigma^2} \sum_{l :\, \beta_l = +1} s_k^T r_k[jL + l] \right)


where X_kj denotes the collection of all symbols in X other than those in b̲_k[j], and r_k[i] is as defined following (8.40), computed with the current values of the other users' symbols. On the other hand, if the ambient channel noise is non-Gaussian, we have

Equation 8.69

\frac{ P\left( \underline{b}_k[j] = \beta \mid a, \nu_1^2, \nu_2^2, \varepsilon, I, X_{kj}, Y \right) }{ P\left( \underline{b}_k[j] = \bar{\beta} \mid a, \nu_1^2, \nu_2^2, \varepsilon, I, X_{kj}, Y \right) } = \exp\left( 2 A_k \sum_{l :\, \beta_l = +1} \sum_{m=0}^{N-1} \frac{ \tilde{s}_{m,k}\, r_{k,m}[jL + l] }{ \nu_{I_m[jL + l]}^2 } \right)


The conditional distributions for sampling the other unknowns remain the same as before. The advantage of sampling a code word instead of an individual symbol is that it can significantly improve the accuracy of the samples drawn by the Gibbs sampler, since only valid code words are drawn. This is demonstrated by the simulation examples below.
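Drawing a whole code word from B_k replaces L separate bit draws with one categorical draw over the valid code words, so invalid words receive zero probability by construction. A sketch of this step for Gaussian noise, using unnormalized log-weights of the form appearing in (8.68); the code book and signal quantities below are placeholders.

    import numpy as np

    rng = np.random.default_rng(7)

    # Placeholder code book B_k: the (3, 2) single-parity-check code in +/-1 form
    # (bit 0 mapped to -1); 4 valid words out of the 8 possible +/-1 triples.
    codewords = np.array([[-1, -1, -1],
                          [-1,  1,  1],
                          [ 1, -1,  1],
                          [ 1,  1, -1]], dtype=float)
    L = codewords.shape[1]

    A_k, sigma_sq = 1.0, 0.25              # current amplitude and noise-variance draws
    stat = rng.standard_normal(L)          # placeholders for s_k^T r_k[jL + l], l = 0..L-1

    # Unnormalized log-probability of each valid word, cf. (8.68); words outside
    # the code book get zero mass, which is the point of the constraint.
    logw = (A_k / sigma_sq) * codewords @ stat
    w = np.exp(logw - logw.max())          # stabilize before normalizing
    p = w / w.sum()
    draw = codewords[rng.choice(len(codewords), p=p)]
    print(draw)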

Relationship between the Gibbs Sampler and the EM Algorithm

As noted previously, the expectation-maximization (EM) algorithm has also been applied to joint parameter estimation and multiuser detection [375]. The major advantage of the Gibbs sampling technique over the EM algorithm is that the Gibbs sampler is a global optimization technique. The EM algorithm is a local optimization method, and it can become trapped at local extrema of the likelihood surface. The EM algorithm performs well if the initial estimates of the channel and symbols are close to their true values; the Gibbs sampler, on the other hand, is guaranteed to converge to the global optimum from any random initialization. Of course, the convergence rate depends crucially on the shape of the joint posterior density surface. When the posterior distribution has several modes separated by very-low-density regions (energy gaps), the Gibbs sampler, which generates "random walks" according to the distribution, may have difficulty crossing such gaps to visit all the modes. If a gap is severe, the random walk may become stuck within one mode for a long time before it moves to another mode. Many modifications of the Gibbs sampler have been developed to combat the large-energy-gap situation (see, e.g., [162, 575]).

Simulation Examples

We now illustrate the performance of the turbo multiuser detectors above in coded systems. The channel code for each user is a rate-½, constraint-length-5 convolutional code (with generators 23 and 35 in octal notation). The interleaver of each user is independently and randomly generated and fixed for all simulations. The block size of the information bits is 128 (i.e., the code-bit block size is M = 256). The code bits are BPSK modulated (i.e., b_k[i] ∈ {+1, −1}). All users have the same signal-to-noise ratio (SNR). The symbol posterior probabilities are computed according to (8.41) with n₀ = N = 50.

For the first iteration, the prior symbol probabilities are ρ_k[i] = ½ for all symbols; in subsequent iterations, the prior symbol probabilities are provided by the channel decoder, as given by (6.39). Decoder-assisted convergence assessment is employed: specifically, if the number of bit corrections made by the decoder exceeds one-third of the total number of code bits (i.e., M/3), it is decided that convergence is not reached, and the Gibbs sampler is applied to the same data block again.

Figure 8.5 illustrates the BER performance of the Gibbs turbo multiuser detector for users 1 and 3. The code bit-error rate at the output of the Bayesian multiuser detector is plotted for the first three iterations. The curve corresponding to the first iteration is the uncoded bit-error rate at the output of the Bayesian multiuser detector. The uncoded and coded bit-error-rate curves in a single-user additive white Gaussian noise (AWGN) channel are also shown in the figure (as the dash-dotted and dashed lines, respectively). It is seen that by incorporating the extrinsic information provided by the channel decoder as the prior symbol probabilities, the turbo multiuser detector approaches single-user performance in an AWGN channel within a few iterations. The BER performance of the turbo multiuser detector in impulsive noise is illustrated in Fig. 8.6, where the code bit-error rates at the output of the Bayesian multiuser detector for the first three iterations are shown. The uncoded and coded bit-error-rate curves in a single-user additive white impulsive noise (AWIN) channel are also shown in the figure (as the dash-dotted and dashed lines, respectively), where the conventional matched-filter receiver is employed for demodulation. Note that at high SNR, the performance of user 3 after the first iteration is actually better than the single-user performance. This is because the matched-filter receiver is not the optimal single-user receiver in non-Gaussian noise. Indeed, when K = 1, the maximum-likelihood detector for the signal model (8.17) is given by

\hat{b}_1[i] = \arg\max_{b \in \{+1, -1\}} \prod_{j=0}^{N-1} p\left( r_j[i] - A_1 b\, \tilde{s}_{j,1} \right)

where p(·) denotes the two-term Gaussian mixture pdf in (8.48).

Figure 8.5. BER performance of a Gibbs turbo multiuser detector: convolutional code, Gaussian noise. (a) User 1; (b) user 3. Both users have the same amplitudes.

graphics/08fig05.gif

Figure 8.6. BER performance of a Gibbs turbo multiuser detector: convolutional code, impulsive noise. (a) User 1; (b) user 3. Both users have the same amplitudes. graphics/476fig01.gif and ε = 0.1.

graphics/08fig06.gif

Finally, we consider the performance of the code-constrained Gibbs multiuser detectors. We assume that each user employs the (7,4) cyclic block code with eight possible code words [565]:

graphics/477fig01.gif

The BER performance of the code-constrained Gibbs multiuser detector in Gaussian noise is shown in Fig. 8.7. In this case the Gibbs sampler draws a code word from B_k each time, according to (8.68). In the figure, the performance of the unconstrained Gibbs multiuser detector before and after decoding is also plotted. It is seen that by exploiting the code constraints in the Gibbs sampler, a significant performance gain is achieved. The performance of the code-constrained Gibbs multiuser detector in non-Gaussian noise is shown in Fig. 8.8; a similar performance gain over the unconstrained Gibbs multiuser detector is evident.

Figure 8.7. BER performance of a Gibbs turbo multiuser detector: block code, Gaussian noise. (a) User 1; (b) user 3. Both users have the same amplitudes.

graphics/08fig07.gif

Figure 8.8. BER performance of a Gibbs turbo multiuser detector: block code, impulsive noise. (a) User 1; (b) user 3. Both users have the same amplitudes. graphics/476fig01.gif and ε = 0.1.

graphics/08fig08.gif


