Lecture 5

Error between the filter output $y(n)$ and a desired signal $d(n)$:
$$e(n) = d(n) - y(n) = d(n) - w(n)^T u(n)$$
In LMS, the filter parameters change according to
$$w(n+1) = w(n) + \mu\, u(n)\, e(n)$$
In Normalized LMS, the new parameters are chosen so that
$$w(n+1)^T u(n) = d(n)$$
with the least modification of $w(n)$, i.e. with the least Euclidean norm of the difference
$$\|w(n+1) - w(n)\|^2 = \sum_{k=0}^{M-1} \big(w_k(n+1) - w_k(n)\big)^2$$
The solution can be obtained by the method of Lagrange multipliers:
$$J(w(n+1), \lambda) = \|w(n+1) - w(n)\|^2 + \lambda \Big( d(n) - \sum_{i=0}^{M-1} w_i(n+1)\, u(n-i) \Big)$$
To obtain the minimum of $J(w(n+1), \lambda)$ we check the zeros of the criterion's partial derivatives:
$$\frac{\partial J(w(n+1), \lambda)}{\partial w_j(n+1)} = 2\big(w_j(n+1) - w_j(n)\big) - \lambda\, u(n-j) = 0$$
hence
$$w_j(n+1) = w_j(n) + \frac{\lambda}{2}\, u(n-j)$$
Substituting into the constraint:
$$d(n) = \sum_{i=0}^{M-1} \Big( w_i(n) + \frac{\lambda}{2} u(n-i) \Big) u(n-i) = \sum_{i=0}^{M-1} w_i(n) u(n-i) + \frac{\lambda}{2} \sum_{i=0}^{M-1} u(n-i)^2$$
$$\lambda = \frac{2\big( d(n) - \sum_{i=0}^{M-1} w_i(n) u(n-i) \big)}{\sum_{i=0}^{M-1} u(n-i)^2} = \frac{2 e(n)}{\sum_{i=0}^{M-1} u(n-i)^2}$$
Thus, the minimum of the criterion $J(w(n+1), \lambda)$ will be obtained using the adaptation equation
$$w_j(n+1) = w_j(n) + \frac{e(n)\, u(n-j)}{\sum_{i=0}^{M-1} u(n-i)^2}$$
In order to add an extra degree of freedom to the adaptation strategy, a constant $\tilde\mu$ controlling the step size will be introduced:
$$w_j(n+1) = w_j(n) + \frac{\tilde\mu}{\sum_{i=0}^{M-1} u(n-i)^2}\, e(n)\, u(n-j) = w_j(n) + \frac{\tilde\mu}{\|u(n)\|^2}\, e(n)\, u(n-j)$$
To overcome possible numerical difficulties when $\|u(n)\|$ is very close to zero, a constant $a > 0$ is used:
$$w_j(n+1) = w_j(n) + \frac{\tilde\mu}{a + \|u(n)\|^2}\, e(n)\, u(n-j)$$
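As a minimal sketch, one NLMS step can be written as a short function (the name `nlms_update` and the default values of `mu` and `a` are illustrative choices, not part of the lecture):

```python
import numpy as np

def nlms_update(w, u, d, mu=0.5, a=1e-8):
    """One Normalized LMS step:
    w(n+1) = w(n) + mu * e(n) * u(n) / (a + ||u(n)||^2),
    where e(n) = d(n) - w(n)^T u(n) is the a priori error."""
    e = d - w @ u                          # a priori error e(n)
    w_next = w + mu * e * u / (a + u @ u)  # normalized update
    return w_next, e
```

With $\tilde\mu = 1$ and $a = 0$ the constraint $w(n+1)^T u(n) = d(n)$ of the derivation is satisfied exactly, which is a quick sanity check on the update.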
The principal characteristics of the Normalized LMS algorithm are the following:
* The adaptation constant $\tilde\mu$ is dimensionless, whereas in LMS the adaptation constant $\mu$ has the dimension of an inverse power.
* Setting
$$\mu(n) = \frac{\tilde\mu}{a + \|u(n)\|^2}$$
we may view the Normalized LMS algorithm as an LMS algorithm with a data-dependent adaptation step size.
* Considering the approximate expression
$$\mu(n) = \frac{\tilde\mu}{a + M\, E[u^2(n)]}$$
the normalization is such that:
  * the effect of large fluctuations in the power levels of the input signal is compensated at the adaptation level;
  * the effect of a large input vector length is compensated, by reducing the step size of the algorithm.
* This algorithm was derived based on an intuitive principle: in the light of new input data, the parameters of an adaptive system should only be disturbed in a minimal fashion.
* The Normalized LMS algorithm is convergent in the mean square sense if $0 < \tilde\mu < 2$.
Comparison of LMS and NLMS within the example from Lecture 4 (channel equalization)
[Figure: learning curves $E\,e^2(n)$, one panel for LMS and one for the Normalized LMS algorithm with $\tilde\mu = 1.5,\ 1.0,\ 0.5,\ 0.1$, over 2500 iterations.]
Comparison of LMS and NLMS within the example from Lecture 4 (channel equalization):
* The LMS was run with three different step sizes: $\mu = [0.075;\ 0.025;\ 0.0075]$.
* The NLMS was run with four different step sizes: $\tilde\mu = [1.5;\ 1.0;\ 0.5;\ 0.1]$.
* With the step size $\tilde\mu = 1.5$, NLMS behaved definitely worse than with step size $\tilde\mu = 1.0$ (slower, and with a higher steady state square error). So $\tilde\mu = 1.5$ is further ruled out.
* Each of the three step sizes $\tilde\mu = [1.0;\ 0.5;\ 0.1]$ was interesting: on one hand, the larger the step size, the faster the convergence; on the other hand, the smaller the step size, the better the steady state square error. So each step size may be a useful trade-off between convergence speed and stationary MSE (not both can be very good simultaneously).
* LMS with $\mu = 0.0075$ and NLMS with $\tilde\mu = 0.1$ achieved a similar (very good) average steady state square error. However, NLMS was faster.
* LMS with $\mu = 0.075$ and NLMS with $\tilde\mu = 1.0$ had a similar (very good) convergence speed. However, NLMS achieved a lower steady state average square error.
* To conclude: NLMS offers better trade-offs than LMS. The computational complexity of NLMS is slightly higher than that of LMS.
Time-varying adaptation step: $\mu(n) = \dfrac{1}{n + c}$.
Disadvantage for non-stationary data: for large values of $n$, the algorithm will no longer react to changes in the optimum solution.
Variable Step algorithm:
$$w(n+1) = w(n) + M(n)\, u(n)\, e(n)$$
where
$$M(n) = \begin{bmatrix} \mu_0(n) & 0 & \cdots & 0 \\ 0 & \mu_1(n) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \mu_{M-1}(n) \end{bmatrix}$$
or componentwise
$$w_i(n+1) = w_i(n) + \mu_i(n)\, u(n-i)\, e(n), \qquad i = 0, 1, \ldots, M-1$$
* Each filter parameter $w_i(n)$ is updated using an independent adaptation step $\mu_i(n)$.
* The time variation of $\mu_i(n)$ is selected ad hoc:
  * if $m_1$ successive identical signs of the gradient estimate $e(n)u(n-i)$ are observed, then $\mu_i(n)$ is increased, $\mu_i(n) = c_1\, \mu_i(n - m_1)$ with $c_1 > 1$ (the algorithm is still far from the optimum, so it is better to accelerate);
  * if $m_2$ successive changes in the sign of the gradient estimate $e(n)u(n-i)$ are observed, then $\mu_i(n)$ is decreased, $\mu_i(n) = \mu_i(n - m_2)/c_2$ with $c_2 > 1$ (the algorithm is near the optimum, so it is better to decelerate by decreasing the step size, so that the steady state error will finally decrease).
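The sign-counting rule above might be sketched as follows; all numerical constants ($c_1$, $c_2$, $m_1$, $m_2$, the initial step and the clipping bounds `mu_min`/`mu_max`) are illustrative choices, not values from the lecture:

```python
import numpy as np

def variable_step_lms(u_sig, d_sig, M, mu0=0.02, c1=2.0, c2=2.0, m1=3, m2=5,
                      mu_min=0.002, mu_max=0.1):
    """Variable Step LMS sketch: each coefficient keeps its own step mu_i(n),
    increased by c1 after m1 equal signs of the gradient estimate e(n)u(n-i),
    decreased by c2 after m2 consecutive sign changes."""
    w = np.zeros(M)
    mu = np.full(M, mu0)
    same = np.zeros(M, dtype=int)      # run length of identical gradient signs
    changes = np.zeros(M, dtype=int)   # run length of sign changes
    last_sign = np.zeros(M)
    for n in range(M - 1, len(u_sig)):
        u = u_sig[n - M + 1:n + 1][::-1]   # [u(n), u(n-1), ..., u(n-M+1)]
        e = d_sig[n] - w @ u
        g = e * u                          # per-coefficient gradient estimate
        s = np.sign(g)
        for i in range(M):
            if s[i] != 0 and s[i] == last_sign[i]:
                same[i] += 1; changes[i] = 0
            elif s[i] != 0 and last_sign[i] != 0:
                changes[i] += 1; same[i] = 0
            if s[i] != 0:
                last_sign[i] = s[i]
            if same[i] >= m1:              # far from optimum: accelerate
                mu[i] = min(c1 * mu[i], mu_max); same[i] = 0
            if changes[i] >= m2:           # near optimum: decelerate
                mu[i] = max(mu[i] / c2, mu_min); changes[i] = 0
        w = w + mu * g
    return w
```

The clipping bounds are a practical safeguard, keeping each $\mu_i(n)$ inside a stable range regardless of how many accelerations or decelerations occur.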
Comparison of LMS and variable step size LMS ($\mu(n) = \frac{0.01}{n+c}$) within the example from Lecture 4 (channel equalization), with $c = [10;\ 20;\ 50]$
[Figure: learning curves $E\,e^2(n)$ for the variable step size LMS algorithm with $c = 10,\ 20,\ 50$, and for LMS with $\mu = 0.0075$, over 2500 iterations.]
3. Sign algorithms
In high-speed communications time is critical, thus faster adaptation processes are needed. The sign function is defined as
$$\mathrm{sgn}(a) = \begin{cases} 1, & a > 0 \\ 0, & a = 0 \\ -1, & a < 0 \end{cases}$$
The Sign algorithm (other names: pilot LMS, or Sign Error):
$$w(n+1) = w(n) + \mu\, u(n)\, \mathrm{sgn}(e(n))$$
The Clipped LMS (or Signed Regressor):
$$w(n+1) = w(n) + \mu\, \mathrm{sgn}(u(n))\, e(n)$$
The Zero forcing LMS (or Sign Sign):
$$w(n+1) = w(n) + \mu\, \mathrm{sgn}(u(n))\, \mathrm{sgn}(e(n))$$
The Sign algorithm can be derived as an LMS algorithm for minimizing the mean absolute error (MAE) criterion
$$J(w) = E[|e(n)|] = E[|d(n) - w^T u(n)|]$$
(I propose this as an exercise for you.)
Properties of sign algorithms:
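The three variants differ only in which factor of the LMS update is replaced by its sign, so they fit naturally into a single step function; the name `sign_lms_step` and the variant labels are illustrative:

```python
import numpy as np

def sign_lms_step(w, u, d, mu, variant="sign_error"):
    """One update of the three sign-type LMS variants:
    'sign_error'     (Sign algorithm / pilot LMS),
    'sign_regressor' (Clipped LMS),
    'sign_sign'      (Zero forcing LMS)."""
    e = d - w @ u                      # a priori error e(n)
    if variant == "sign_error":
        w = w + mu * u * np.sign(e)
    elif variant == "sign_regressor":
        w = w + mu * np.sign(u) * e
    elif variant == "sign_sign":
        w = w + mu * np.sign(u) * np.sign(e)
    else:
        raise ValueError(variant)
    return w, e
```

In fixed-point hardware the multiplications by $\mathrm{sgn}(\cdot)$ reduce to sign flips, which is where the speed advantage discussed below comes from.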
* Very fast computation: if $\mu$ is constrained to the form $\mu = 2^{-m}$, only shifting and addition operations are required.
* Drawback: the update mechanism is degraded, compared to the LMS algorithm, by the crude quantization of the gradient estimates:
  * the steady state error will increase;
  * the convergence rate decreases.
* The fastest of them, Sign-Sign, is used in the CCITT ADPCM standard for the 32000 bps system.
Comparison of LMS and Sign LMS within the example from Lecture 4 (channel equalization)
[Figure: learning curves $E\,e^2(n)$ for the Sign LMS algorithm with $\mu = 0.075,\ 0.025,\ 0.0075,\ 0.0025$, over 2500 iterations.]
The Sign LMS algorithm should be operated at smaller step sizes to obtain a behavior similar to that of the standard LMS algorithm.
When the LPF is a moving-average FIR filter, $h(0) = h(1) = \ldots = h(N-1) = \frac{1}{N}$, $h(N) = h(N+1) = \ldots = 0$, the adaptation equation becomes
$$w(n+1) = w(n) + \frac{\mu}{N} \sum_{j=n-N+1}^{n} e(j)\, u(j)$$
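A minimal sketch of this averaged-gradient LMS keeps the last $N$ products $e(j)u(j)$ in a FIFO buffer; the function name, default parameters, and the identification setup in the test are illustrative assumptions:

```python
from collections import deque
import numpy as np

def averaged_lms(u_sig, d_sig, M, N_avg=5, mu=0.05):
    """LMS with a moving-average low-pass filter on the gradient estimate:
    w(n+1) = w(n) + (mu/N) * sum of the last N products e(j)u(j)."""
    w = np.zeros(M)
    buf = deque(maxlen=N_avg)               # last N gradient estimates e(j)u(j)
    for n in range(M - 1, len(u_sig)):
        u = u_sig[n - M + 1:n + 1][::-1]    # [u(n), ..., u(n-M+1)]
        e = d_sig[n] - w @ u
        buf.append(e * u)
        w = w + (mu / N_avg) * sum(buf)     # averaged update
    return w
```

The averaging smooths the stochastic gradient at the cost of a small lag, which is the trade-off the surrounding discussion of low-pass-filtered gradients is about.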
Momentum LMS algorithm
When the LPF is a first order IIR filter, $h(0) = 1 - \gamma$, $h(1) = \gamma\, h(0)$, $h(2) = \gamma^2 h(0), \ldots$, then
$$b_i(n) = LPF(g_i(n)) = \gamma\, b_i(n-1) + (1-\gamma)\, g_i(n)$$
$$b(n) = \gamma\, b(n-1) + (1-\gamma)\, g(n)$$
The resulting algorithm can be written as a second order recursion:
$$w(n+1) = w(n) - \mu\, b(n)$$
$$w(n) = w(n-1) - \mu\, b(n-1)$$
Multiplying the second equation by $\gamma$ and subtracting it from the first,
$$w(n+1) - \gamma\, w(n) = w(n) - \gamma\, w(n-1) - \mu\big(b(n) - \gamma\, b(n-1)\big)$$
$$w(n+1) = w(n) + \gamma\big(w(n) - w(n-1)\big) - \mu\big(b(n) - \gamma\, b(n-1)\big)$$
$$w(n+1) = w(n) + \gamma\big(w(n) - w(n-1)\big) - \mu(1-\gamma)\, g(n)$$
$$w(n+1) = w(n) + \gamma\big(w(n) - w(n-1)\big) + 2\mu(1-\gamma)\, e(n)\, u(n)$$
where the last line uses the instantaneous gradient estimate $g(n) = -2 e(n) u(n)$, i.e.
$$w(n+1) - w(n) = \gamma\big(w(n) - w(n-1)\big) + 2\mu(1-\gamma)\, e(n)\, u(n)$$
Drawback: the convergence rate may decrease.
Advantages: the momentum term keeps the algorithm active even in the regions close to the minimum. For nonlinear criterion surfaces this helps in avoiding local minima (as in neural network learning by backpropagation).
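The second order recursion can be sketched directly by keeping the previous parameter vector; the function name and the chosen values of `mu` and `gamma` are illustrative:

```python
import numpy as np

def momentum_lms(u_sig, d_sig, M, mu=0.05, gamma=0.5):
    """Momentum LMS:
    w(n+1) = w(n) + gamma*(w(n) - w(n-1)) + 2*mu*(1-gamma)*e(n)*u(n)."""
    w_prev = np.zeros(M)
    w = np.zeros(M)
    for n in range(M - 1, len(u_sig)):
        u = u_sig[n - M + 1:n + 1][::-1]    # [u(n), ..., u(n-M+1)]
        e = d_sig[n] - w @ u
        # simultaneous update: new w, and current w becomes w_prev
        w, w_prev = w + gamma * (w - w_prev) + 2 * mu * (1 - gamma) * e * u, w
    return w
```

Note that only one extra vector, $w(n-1)$, has to be stored; the low-pass-filtered gradient $b(n)$ never needs to be formed explicitly.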
Since at each time instant $n$ we use two different computations of the filter output ((1) and (2)), the name of the method is double updating. One interesting situation is obtained when $\tilde\mu = 1$. Then
$$\tilde y(n) = w(n+1)^T u(n) = \Big( w(n)^T + e(n)\frac{u(n)^T}{u(n)^T u(n)} \Big) u(n) = w(n)^T u(n) + e(n) = w(n)^T u(n) + d(n) - w(n)^T u(n) = d(n)$$
Finally, the output of the filter is equal to the desired signal, $\tilde y(n) = d(n)$, and the new error is zero, $\tilde e(n) = 0$. This situation is not acceptable (sometimes too good is worse than good). Why?
The output of the quadratic filter is
$$y(n) = \sum_{k=0}^{M-1} w_k^{[1]}(n)\, u(n-k) + \sum_{i=0}^{M-1} \sum_{j=i}^{M-1} w_{i,j}^{[2]}(n)\, u(n-i)\, u(n-j) \qquad (3)$$
We will introduce the input vector with dimension $M + M(M+1)/2$:
$$\varphi(n) = \big[\, u(n)\ \ u(n-1)\ \ \ldots\ \ u(n-M+1)\ \ u^2(n)\ \ u(n)u(n-1)\ \ \ldots\ \ u^2(n-M+1) \,\big]^T$$
and the parameter vector
$$\theta(n) = \big[\, w_0^{[1]}(n)\ \ w_1^{[1]}(n)\ \ \ldots\ \ w_{M-1}^{[1]}(n)\ \ w_{0,0}^{[2]}(n)\ \ w_{0,1}^{[2]}(n)\ \ \ldots\ \ w_{M-1,M-1}^{[2]}(n) \,\big]^T$$
Now the output of the quadratic filter (3) can be written
$$y(n) = \theta(n)^T \varphi(n)$$
and therefore the error
$$e(n) = d(n) - y(n) = d(n) - \theta(n)^T \varphi(n)$$
is a linear function of the filter parameters (i.e. the entries of $\theta(n)$).
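Building the extended regressor $\varphi(n)$ from the linear regressor is a small mechanical step, sketched below (the function name `volterra_regressor` is an illustrative choice):

```python
import numpy as np

def volterra_regressor(u):
    """Build phi(n) for a quadratic Volterra filter from
    u = [u(n), u(n-1), ..., u(n-M+1)]:
    the M linear terms followed by the M(M+1)/2 products
    u(n-i)*u(n-j) with j >= i."""
    u = np.asarray(u, dtype=float)
    M = len(u)
    quad = [u[i] * u[j] for i in range(M) for j in range(i, M)]
    return np.concatenate([u, np.array(quad)])
```

For $M = 2$ the regressor has $2 + 3 = 5$ entries, matching the $M + M(M+1)/2$ dimension stated above; the LMS update then operates on $\theta(n)$ and $\varphi(n)$ exactly as in the linear case.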
Minimization of the mean square error criterion
$$J(\theta) = E(e(n))^2 = E\big(d(n) - \theta(n)^T \varphi(n)\big)^2$$
will proceed as in the linear case, resulting in the LMS adaptation equation
$$\theta(n+1) = \theta(n) + \mu\, \varphi(n)\, e(n)$$
Some properties of LMS for Volterra filters are the following:
* The convergence of the algorithm in the mean is obtained when
$$0 < \mu < \frac{2}{\lambda_{\max}} \qquad (4)$$
where $\lambda_{\max}$ is the maximum eigenvalue of the correlation matrix $R = E\,\varphi(n)\varphi(n)^T$.
* If the distribution of the input is symmetric (as for the Gaussian distribution), the correlation matrix has the block structure
$$R = \begin{bmatrix} R_2 & 0 \\ 0 & R_4 \end{bmatrix}$$
corresponding to the partition of the regressor into its linear and quadratic parts,
$$u^{[1]}(n) = [\,u(n)\ \ \ldots\ \ u(n-M+1)\,]^T, \qquad u^{[2]}(n) = [\,u^2(n)\ \ u(n)u(n-1)\ \ \ldots\ \ u^2(n-M+1)\,]^T$$
and the adaptation problem decouples into two problems, i.e. the Wiener optimal filters will be the solutions of
$$R_2\, w^{[1]} = E\,d(n)\, u^{[1]}(n), \qquad R_4\, w^{[2]} = E\,d(n)\, u^{[2]}(n)$$
and similarly, the LMS solutions can be analyzed separately, for the linear, $w^{[1]}$, and quadratic, $w^{[2]}$, coefficients, according to the eigenvalues of $R_2$ and $R_4$.