2.3 Blind Multiuser Detection: Direct Methods | Wireless Communication Systems: Advanced Techniques for Signal Reception (paperback)

It is seen from (2.13) and (2.20) that these two linear detectors are expressed in terms of a linear combination of the signature sequences of all K users. Recall that for the matched-filter receiver, the only prior knowledge required is the desired user 's signature sequence. In the downlink of a CDMA system, the mobile receiver typically has knowledge of only its own signature sequence, but not of those of the other users. Hence it is of interest to consider the problem of blind implementation of the linear detectors (i.e., without the requirement of knowing the signature sequences of the interfering users). This problem is relatively easy for the linear MMSE detector. To see this, consider again the definition (2.19). Directly solving this optimization problem, we obtain

Equation 2.26

graphics/02equ026.gif

where, by (2.5),

Equation 2.27

is the autocorrelation matrix of the received signal. Note that C _r can be estimated from the received signals by the corresponding sample autocorrelation. Note also that the constant A ₁ ² in (2.26) does not affect the linear decision rule (2.7) or (2.9). Hence (2.26) leads straightforwardly to the following blind implementation of the linear MMSE detector ”the direct matrix inversion (DMI) blind detector . Here we do not assume knowledge of the complex amplitude of the desired user, so differential detection will be employed.

Algorithm 2.1: [DMI blind linear MMSE detector ”synchronous CDMA]

Compute the detector:

Equation 2.28

Equation 2.29

Perform differential detection:

Equation 2.30

Equation 2.31

Algorithm 2.1 is a batch processing method; it computes the detector only once based on a block of received signals ; and the estimated detector is then used to detect all data bits of the desired user contained in the same signal block, . In what follows we consider online implementations of the blind linear MMSE detector. The idea is to perform sequential detector estimation and data detection. That is, suppose that at time ( i - 1) an estimated detector m ₁ [ i - 1] is employed to detect the data bit b ₁ [ i - 1]. At time i a new signal r [ i ] is received which is then used to update the detector estimate to obtain m ₁ [ i ]. The updated detector is used to detect the data bit b ₁ [ i ]. Hence the blind detector is sequentially updated at the symbol rate. To develop such an adaptive algorithm, we need an alternative characterization of the linear MMSE detector. Consider the following constrained optimization problem:

Equation 2.32

To solve (2.32), define the Lagrangian

Equation 2.33

The solution to (2.32) is then obtained by solving

Equation 2.34

where l is such that [i.e., ]. Comparing the solution above with (2.26), it is seen that they differ only by a positive scaling constant. Since such a scaling constant will not affect the linear decision rule (2.7) or (2.9), (2.32) constitutes an equivalent definition of the linear MMSE detector. The approach to multiuser detection based on (2.32) was proposed in [183] and was termed minimum-output-energy (MOE) detection . A similar technique was developed for array processing [125, 176, 511], and in that context is termed the linearly constrained minimum variance (LCMV) array .

We next consider adaptive algorithms for recursively (online) estimating the linear MMSE detector defined by (2.32).

2.3.1 LMS Algorithm

We first consider the least-mean-squares (LMS) algorithm for recursive estimation of m ₁ based on (2.32). Define

Equation 2.35

as a projection matrix that projects any signal in onto the orthogonal space of s ₁ . Note that m ₁ can be decomposed into two orthogonal components :

Equation 2.36

with

Equation 2.37

Using the decomposition above, the constrained optimization problem (2.32) can then be converted to the following unconstrained optimization problem:

Equation 2.38

The LMS algorithm for adapting the vector x ₁ based on the cost function (2.38) is then given by

Equation 2.39

where m is the step size and where the stochastic gradient g ( x ₁ [ i ]) is given by

Equation 2.40

graphics/02equ040.gif

Substituting (2.40) into (2.39), we obtain the following LMS implementation of the blind linear MMSE detector. Suppose that at time i , the estimated blind detector is m ₁ [ i ]= s ₁ + x ₁ [ i ]. The algorithm performs the following steps for data detection and detector update.

Algorithm 2.2: [LMS blind linear MMSE detector ”synchronous CDMA]

Compute the detector output:

Equation 2.41

Equation 2.42

Update the detector:

Equation 2.43

The convergence analysis of Algorithm 2.2 is given in [183]. An alternative stochastic gradient algorithm for blind adaptive multiuser detection is developed in [237], which employs the technique of averaging to achieve an accelerated convergence rate (compared with the LMS algorithm). An LMS algorithm for blind adaptive implementation of the linear decorrelating detector is developed in [501]. Moreover, a comparison of the steady-state performance (in terms of output mean-square error) shows that the blind detector incurs a loss compared with a training-based LMS detector [183, 389, 390]. A two-stage adaptive detector is proposed in [63], where symbol-by-symbol predecisions at the output of a first adaptive stage are used to train a second stage, to achieve improved performance.

2.3.2 RLS Algorithm

The LMS algorithm discussed above has a very low computational complexity, on the order of operations per update. However, its convergence is usually very slow. We next consider the recursive least-squares (RLS) algorithm for adaptive implementation of the blind linear MMSE detector, which has a much faster convergence rate than the LMS algorithm. Based on the cost function (2.32), at time i the exponentially windowed RLS algorithm selects the weight vector m ₁ [ i ] to minimize the sum of exponentially weighted mean-square output values:

Equation 2.44

where 0 < l < 1 (1 - l << 1) is called the forgetting factor . The solution to this optimization problem is given by

Equation 2.45

with

Equation 2.46

Denote . Note that since

Equation 2.47

by the matrix inversion lemma we have

Equation 2.48

Hence we obtain the RLS algorithm for adaptive implementation of the blind linear MMSE detector as follows. Suppose that at time ( i - 1), F [ i - 1] is available. Then at time i , the following steps are performed to update the detector m ₁ [ i ] and to detect the differential bit b ₁ [ i ].

Algorithm 2.3: [RLS blind linear MMSE detector ”synchronous CDMA]

Update the detector:

Equation 2.49

Equation 2.50

Equation 2.51

Compute the detector output:

Equation 2.52

Equation 2.53

The convergence properties of Algorithm 2.3 are analyzed in detail in [389].

2.3.3 QR-RLS Algorithm

The RLS approach discussed in Section 2.3.2, which is based on the matrix inversion lemma for recursively updating C _r [ i ] ^-1 , has complexity per update. Note that although fast RLS algorithms of complexity exist [66, 83, 116, 124], all these algorithms exploit a time-index-shifting property of the input data. In this particular application, however, successive input data vectors do not have the shifting relationship; in fact, r [ i ] and r [ i - 1] do not overlap at all. Therefore, these standard fast RLS algorithms cannot be applied in this application.

The RLS implementation of the blind linear MMSE detector suffers from two major problems. The first problem is numerical. Recursive estimation of C _r [ i ] ^-1 is poorly conditioned because it involves inversion of a data correlation matrix. The condition number of a data correlation matrix is the square of the condition number of the corresponding data matrix; hence twice the dynamic range is required in the numerical computation [158]. The second problem is that the form of the recursive update of C _r [ i ] ^-1 severely limits the parallelism and pipelining that can effectively be applied in implementation.

A well-known approach for overcoming these difficulties associated with the RLS algorithms is the rotation-based QR-RLS algorithm [175, 389, 588]. The QR decomposition transforms the original RLS problem into a problem that uses only transformed data values, by Cholesky factorization of the original least-squares data matrix. This causes the numerical dynamic range of the transformed computational problem to be halved and enables more accurate computation than that with the RLS algorithms that operate directly on C _r [ i ] ^-1 . Another important benefit of the rotation-based QR approaches is that the computation can easily be mapped onto systolic array structures for parallel implementations. We next describe the QR-RLS blind linear MMSE detector, first developed in [389].

QR-RLS Blind Linear MMSE Detector

Assume that C _r [ i ] is positive definite. Let

Equation 2.54

be the Cholesky decomposition (i.e., C [ i ] is the unique upper triangular Cholesky factor with positive diagonal elements). Define the following quantities :

Equation 2.55

Equation 2.56

Equation 2.57

At time i , the a posteriori least-squares (LS) estimate is given by

Equation 2.58

Equation 2.59

The a priori LS estimate at time i is given by

Equation 2.60

It can be shown that x [ i ] and z [ i ] are related by [389]

Equation 2.61

Suppose that C [ i “ 1] and u [ i “ 1] are available from the previous recursion. At time i the new observation r [ i ] becomes available. We construct a block matrix consisting of C [ i “ 1], u [ i “ 1], and r [ i ] and apply an orthogonal transformation as follows:

Equation 2.62

In (2.62) the matrix Q [i] which zeros the first N elements on the last row of the partitioned matrix appearing on the left-hand side of (2.62), is an orthogonal matrix consisting of N Givens rotations ,

Equation 2.63

where Q _n [ i ] zeros the n th element in the last row by rotating it with the ( n + 1)th row. An individual rotation is specified by two scalars, c _n and s _n (which can be regarded as the cosine and sine, respectively, of a rotation angle f _n ), and affects only the last row and the ( n + 1)th row. The effects on these two rows are

Equation 2.64

graphics/02equ064.gif

where the rotation factors are defined by

Equation 2.65

Equation 2.66

The correctness of (2.62) is shown in the Appendix (Section 2.8.1). It is seen from (2.62) that the computed quantities appearing on the right-hand side are C [ i ], u [ i ], v [ i ], at time n . It is also shown in the Appendix (Section 2.8.1) that the quantities a [ i ], z [ i ], and x [ i ] can be updated according to the equations

Equation 2.67

Equation 2.68

Equation 2.69

Note that g [ i ] in (2.62) is the last diagonal element of Q [ i ]. A direct calculation shows that [175, 316].

The initialization of the QR-RLS blind adaptive algorithm is given by , and a [-1] = d , where d is a small number. This corresponds to the initial condition C _r [-1] = d I _N and m ₁ [-1] = s ₁ (i.e., the adaptation starts with the matched filter). At each time i , the algorithm proceeds as follows.

Algorithm 2.4: [QR-RLS blind linear MMSE detector ”synchronous CDMA]

Update the detector: Apply the orthogonal transformation (2.62) .
Compute the detector output and perform differential detection:

Equation 2.70

Equation 2.71

The orthogonal transformation (2.62) on the block matrix can be mapped onto a triangular systolic array for highly efficient parallel implementation, which is discussed next.

Parallel Implementation on Systolic Arrays

The QR-RLS blind adaptive algorithm derived above has good numerical properties and is well suited for parallel implementation. Figure 2.1 shows systematically a systolic array implementation of this algorithm, using a triangular array first proposed in [316]. It consists of three sections: the basic upper triangular array, which stores and updates C [ i ]; the right-hand column of cells , which stores and updates u [ i ]; and the final processing cell , which computes the demodulated data bit. The system is initialized as and . The received data r [ i ] are fed from the top and propagate to the bottom of the array. The rotation angles f _n are calculated in left boundary cells and propagate from left to right. The internal cells update their elements by Givens rotations using the angles received from the left. The factor g [ i ] is calculated along the left boundary cells, where a dot ( ) represents an extra delay. The final cell extracts the signs of h [ i ] and g [ i ] and produces the demodulated differential data bit according to (2.71). The computation at each cell is also outlined in Fig. 2.1. The QR-RLS algorithm may also be carried out using the square-root free Givens rotation algorithm to reduce the computational complexity at each cell [158, 316]. For more details on the systolic array implementations, see [175, 316].

Figure 2.1. Systematic of the systolic array implementation of the QR-RLS blind adaptive multiuser detection algorithm ( N = 4, K = 2) and the operations at each cell.

graphics/02fig01.gif

The systolic array in Fig. 2.1 operates in a highly pipelined manner. The computational wavefront propagates at the received data symbol rate. The demodulated data bits are also output at the received data symbol rate. Note that the demodulated data bit produced on a given clock corresponds to the received vector entered 2 N clock cycles earlier.

If multiple synchronous user data streams need to be demodulated, we can simply add more column arrays on the right-hand side and initialize each of them by the corresponding signature vector of each user. It is clear that by using the same triangular array, multiple users' data can be demodulated simultaneously . This is also illustrated in Fig. 2.1 for the case of two users. (Also, multiple paths of the same signal can be handled by adding appropriate linear arrays to Fig. 2.1.)