2.3 Blind Multiuser Detection: Direct Methods


It is seen from (2.13) and (2.20) that these two linear detectors are expressed in terms of a linear combination of the signature sequences of all K users. Recall that for the matched-filter receiver, the only prior knowledge required is the desired user 's signature sequence. In the downlink of a CDMA system, the mobile receiver typically has knowledge of only its own signature sequence, but not of those of the other users. Hence it is of interest to consider the problem of blind implementation of the linear detectors (i.e., without the requirement of knowing the signature sequences of the interfering users). This problem is relatively easy for the linear MMSE detector. To see this, consider again the definition (2.19). Directly solving this optimization problem, we obtain

Equation 2.26

graphics/02equ026.gif


where, by (2.5),

Equation 2.27

graphics/02equ027.gif


is the autocorrelation matrix of the received signal. Note that C r can be estimated from the received signals by the corresponding sample autocorrelation. Note also that the constant A 1 2 in (2.26) does not affect the linear decision rule (2.7) or (2.9). Hence (2.26) leads straightforwardly to the following blind implementation of the linear MMSE detector ”the direct matrix inversion (DMI) blind detector . Here we do not assume knowledge of the complex amplitude of the desired user, so differential detection will be employed.

Algorithm 2.1: [DMI blind linear MMSE detector ”synchronous CDMA]

  • Compute the detector:

Equation 2.28

graphics/02equ028.gif


Equation 2.29

graphics/02equ029.gif


  • Perform differential detection:

Equation 2.30

graphics/02equ030.gif


Equation 2.31

graphics/02equ031.gif


Algorithm 2.1 is a batch processing method; it computes the detector only once based on a block of received signals graphics/033fig01.gif ; and the estimated detector is then used to detect all data bits of the desired user contained in the same signal block, graphics/033fig02.gif . In what follows we consider online implementations of the blind linear MMSE detector. The idea is to perform sequential detector estimation and data detection. That is, suppose that at time ( i - 1) an estimated detector m 1 [ i - 1] is employed to detect the data bit b 1 [ i - 1]. At time i a new signal r [ i ] is received which is then used to update the detector estimate to obtain m 1 [ i ]. The updated detector is used to detect the data bit b 1 [ i ]. Hence the blind detector is sequentially updated at the symbol rate. To develop such an adaptive algorithm, we need an alternative characterization of the linear MMSE detector. Consider the following constrained optimization problem:

Equation 2.32

graphics/02equ032.gif


To solve (2.32), define the Lagrangian

Equation 2.33

graphics/02equ033.gif


The solution to (2.32) is then obtained by solving

Equation 2.34

graphics/02equ034.gif


where l is such that graphics/034fig01.gif [i.e., graphics/034fig02.gif ]. Comparing the solution above with (2.26), it is seen that they differ only by a positive scaling constant. Since such a scaling constant will not affect the linear decision rule (2.7) or (2.9), (2.32) constitutes an equivalent definition of the linear MMSE detector. The approach to multiuser detection based on (2.32) was proposed in [183] and was termed minimum-output-energy (MOE) detection . A similar technique was developed for array processing [125, 176, 511], and in that context is termed the linearly constrained minimum variance (LCMV) array .

We next consider adaptive algorithms for recursively (online) estimating the linear MMSE detector defined by (2.32).

2.3.1 LMS Algorithm

We first consider the least-mean-squares (LMS) algorithm for recursive estimation of m 1 based on (2.32). Define

Equation 2.35

graphics/02equ035.gif


as a projection matrix that projects any signal in graphics/034fig03.gif onto the orthogonal space of s 1 . Note that m 1 can be decomposed into two orthogonal components :

Equation 2.36

graphics/02equ036.gif


with

Equation 2.37

graphics/02equ037.gif


Using the decomposition above, the constrained optimization problem (2.32) can then be converted to the following unconstrained optimization problem:

Equation 2.38

graphics/02equ038.gif


The LMS algorithm for adapting the vector x 1 based on the cost function (2.38) is then given by

Equation 2.39

graphics/02equ039.gif


where m is the step size and where the stochastic gradient g ( x 1 [ i ]) is given by

Equation 2.40

graphics/02equ040.gif


Substituting (2.40) into (2.39), we obtain the following LMS implementation of the blind linear MMSE detector. Suppose that at time i , the estimated blind detector is m 1 [ i ]= s 1 + x 1 [ i ]. The algorithm performs the following steps for data detection and detector update.

Algorithm 2.2: [LMS blind linear MMSE detector ”synchronous CDMA]

  • Compute the detector output:

Equation 2.41

graphics/02equ041.gif


Equation 2.42

graphics/02equ042.gif


  • Update the detector:

Equation 2.43

graphics/02equ043.gif


The convergence analysis of Algorithm 2.2 is given in [183]. An alternative stochastic gradient algorithm for blind adaptive multiuser detection is developed in [237], which employs the technique of averaging to achieve an accelerated convergence rate (compared with the LMS algorithm). An LMS algorithm for blind adaptive implementation of the linear decorrelating detector is developed in [501]. Moreover, a comparison of the steady-state performance (in terms of output mean-square error) shows that the blind detector incurs a loss compared with a training-based LMS detector [183, 389, 390]. A two-stage adaptive detector is proposed in [63], where symbol-by-symbol predecisions at the output of a first adaptive stage are used to train a second stage, to achieve improved performance.

2.3.2 RLS Algorithm

The LMS algorithm discussed above has a very low computational complexity, on the order of graphics/035fig01.gif operations per update. However, its convergence is usually very slow. We next consider the recursive least-squares (RLS) algorithm for adaptive implementation of the blind linear MMSE detector, which has a much faster convergence rate than the LMS algorithm. Based on the cost function (2.32), at time i the exponentially windowed RLS algorithm selects the weight vector m 1 [ i ] to minimize the sum of exponentially weighted mean-square output values:

Equation 2.44

graphics/02equ044.gif


where 0 < l < 1 (1 - l << 1) is called the forgetting factor . The solution to this optimization problem is given by

Equation 2.45

graphics/02equ045.gif


with

Equation 2.46

graphics/02equ046.gif


Denote graphics/036fig01.gif . Note that since

Equation 2.47

graphics/02equ047.gif


by the matrix inversion lemma we have

Equation 2.48

graphics/02equ048.gif


Hence we obtain the RLS algorithm for adaptive implementation of the blind linear MMSE detector as follows. Suppose that at time ( i - 1), F [ i - 1] is available. Then at time i , the following steps are performed to update the detector m 1 [ i ] and to detect the differential bit b 1 [ i ].

Algorithm 2.3: [RLS blind linear MMSE detector ”synchronous CDMA]

  • Update the detector:

Equation 2.49

graphics/02equ049.gif


Equation 2.50

graphics/02equ050.gif


Equation 2.51

graphics/02equ051.gif


  • Compute the detector output:

Equation 2.52

graphics/02equ052.gif


Equation 2.53

graphics/02equ053.gif


The convergence properties of Algorithm 2.3 are analyzed in detail in [389].

2.3.3 QR-RLS Algorithm

The RLS approach discussed in Section 2.3.2, which is based on the matrix inversion lemma for recursively updating C r [ i ] -1 , has graphics/037fig01.gif complexity per update. Note that although fast RLS algorithms of graphics/035fig01.gif complexity exist [66, 83, 116, 124], all these algorithms exploit a time-index-shifting property of the input data. In this particular application, however, successive input data vectors do not have the shifting relationship; in fact, r [ i ] and r [ i - 1] do not overlap at all. Therefore, these standard fast RLS algorithms cannot be applied in this application.

The RLS implementation of the blind linear MMSE detector suffers from two major problems. The first problem is numerical. Recursive estimation of C r [ i ] -1 is poorly conditioned because it involves inversion of a data correlation matrix. The condition number of a data correlation matrix is the square of the condition number of the corresponding data matrix; hence twice the dynamic range is required in the numerical computation [158]. The second problem is that the form of the recursive update of C r [ i ] -1 severely limits the parallelism and pipelining that can effectively be applied in implementation.

A well-known approach for overcoming these difficulties associated with the RLS algorithms is the rotation-based QR-RLS algorithm [175, 389, 588]. The QR decomposition transforms the original RLS problem into a problem that uses only transformed data values, by Cholesky factorization of the original least-squares data matrix. This causes the numerical dynamic range of the transformed computational problem to be halved and enables more accurate computation than that with the RLS algorithms that operate directly on C r [ i ] -1 . Another important benefit of the rotation-based QR approaches is that the computation can easily be mapped onto systolic array structures for parallel implementations. We next describe the QR-RLS blind linear MMSE detector, first developed in [389].

QR-RLS Blind Linear MMSE Detector

Assume that C r [ i ] is positive definite. Let

Equation 2.54

graphics/02equ054.gif


be the Cholesky decomposition (i.e., C [ i ] is the unique upper triangular Cholesky factor with positive diagonal elements). Define the following quantities :

Equation 2.55

graphics/02equ055.gif


Equation 2.56

graphics/02equ056.gif


Equation 2.57

graphics/02equ057.gif


At time i , the a posteriori least-squares (LS) estimate is given by

Equation 2.58

graphics/02equ058.gif


Equation 2.59

graphics/02equ059.gif


The a priori LS estimate at time i is given by

Equation 2.60

graphics/02equ060.gif


It can be shown that x [ i ] and z [ i ] are related by [389]

Equation 2.61

graphics/02equ061.gif


Suppose that C [ i “ 1] and u [ i “ 1] are available from the previous recursion. At time i the new observation r [ i ] becomes available. We construct a block matrix consisting of C [ i “ 1], u [ i “ 1], and r [ i ] and apply an orthogonal transformation as follows:

Equation 2.62

graphics/02equ062.gif


In (2.62) the matrix Q [i] which zeros the first N elements on the last row of the partitioned matrix appearing on the left-hand side of (2.62), is an orthogonal matrix consisting of N Givens rotations ,

Equation 2.63

graphics/02equ063.gif


where Q n [ i ] zeros the n th element in the last row by rotating it with the ( n + 1)th row. An individual rotation is specified by two scalars, c n and s n (which can be regarded as the cosine and sine, respectively, of a rotation angle f n ), and affects only the last row and the ( n + 1)th row. The effects on these two rows are

Equation 2.64

graphics/02equ064.gif


where the rotation factors are defined by

Equation 2.65

graphics/02equ065.gif


Equation 2.66

graphics/02equ066.gif


The correctness of (2.62) is shown in the Appendix (Section 2.8.1). It is seen from (2.62) that the computed quantities appearing on the right-hand side are C [ i ], u [ i ], v [ i ], at time n . It is also shown in the Appendix (Section 2.8.1) that the quantities a [ i ], z [ i ], and x [ i ] can be updated according to the equations

Equation 2.67

graphics/02equ067.gif


Equation 2.68

graphics/02equ068.gif


Equation 2.69

graphics/02equ069.gif


Note that g [ i ] in (2.62) is the last diagonal element of Q [ i ]. A direct calculation shows that graphics/039fig01.gif [175, 316].

The initialization of the QR-RLS blind adaptive algorithm is given by graphics/039fig02.gif , and a [-1] = d , where d is a small number. This corresponds to the initial condition C r [-1] = d I N and m 1 [-1] = s 1 (i.e., the adaptation starts with the matched filter). At each time i , the algorithm proceeds as follows.

Algorithm 2.4: [QR-RLS blind linear MMSE detector ”synchronous CDMA]

  • Update the detector: Apply the orthogonal transformation (2.62) .

  • Compute the detector output and perform differential detection:

Equation 2.70

graphics/02equ070.gif


Equation 2.71

graphics/02equ071.gif


The orthogonal transformation (2.62) on the block matrix can be mapped onto a triangular systolic array for highly efficient parallel implementation, which is discussed next.

Parallel Implementation on Systolic Arrays

The QR-RLS blind adaptive algorithm derived above has good numerical properties and is well suited for parallel implementation. Figure 2.1 shows systematically a systolic array implementation of this algorithm, using a triangular array first proposed in [316]. It consists of three sections: the basic upper triangular array, which stores and updates C [ i ]; the right-hand column of cells , which stores and updates u [ i ]; and the final processing cell , which computes the demodulated data bit. The system is initialized as graphics/039fig03.gif and graphics/039fig04.gif . The received data r [ i ] are fed from the top and propagate to the bottom of the array. The rotation angles f n are calculated in left boundary cells and propagate from left to right. The internal cells update their elements by Givens rotations using the angles received from the left. The factor g [ i ] is calculated along the left boundary cells, where a dot ( ) represents an extra delay. The final cell extracts the signs of h [ i ] and g [ i ] and produces the demodulated differential data bit according to (2.71). The computation at each cell is also outlined in Fig. 2.1. The QR-RLS algorithm may also be carried out using the square-root free Givens rotation algorithm to reduce the computational complexity at each cell [158, 316]. For more details on the systolic array implementations, see [175, 316].

Figure 2.1. Systematic of the systolic array implementation of the QR-RLS blind adaptive multiuser detection algorithm ( N = 4, K = 2) and the operations at each cell.

graphics/02fig01.gif

The systolic array in Fig. 2.1 operates in a highly pipelined manner. The computational wavefront propagates at the received data symbol rate. The demodulated data bits are also output at the received data symbol rate. Note that the demodulated data bit produced on a given clock corresponds to the received vector entered 2 N clock cycles earlier.

If multiple synchronous user data streams need to be demodulated, we can simply add more column arrays on the right-hand side and initialize each of them by the corresponding signature vector of each user. It is clear that by using the same triangular array, multiple users' data can be demodulated simultaneously . This is also illustrated in Fig. 2.1 for the case of two users. (Also, multiple paths of the same signal can be handled by adding appropriate linear arrays to Fig. 2.1.)



Wireless Communication Systems
Wireless Communication Systems: Advanced Techniques for Signal Reception (paperback)
ISBN: 0137020805
EAN: 2147483647
Year: 2003
Pages: 91

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net