


Generalized Encoding and Decoding Operators for Lattice-Based Associative Memories
John McElroy and Paul Gader

Manuscript received November 26, 2007; revised May 24, 2009; accepted June 09, 2009. First published September 15, 2009; current version published October 07, 2009. The authors are with the Computer and Information Science and Engineering Department, University of Florida, Gainesville, FL 32611 USA (e-mail: john@johnmcelroy.com; pgader@cise.ufl.edu). Digital Object Identifier 10.1109/TNN.2009.2028424

Abstract—During the 1990s, Ritter et al. introduced a new family of associative memories based on lattice algebra instead of linear algebra. These memories provide unlimited storage capacity, unlike linear-correlation-based models. The canonical lattice-based memories, however, are susceptible to noise in the initial input data. In this brief, we present novel methods of encoding and decoding lattice-based memories using two families of ordered weighted average (OWA) operators. The result is a greater robustness to distortion in the initial input data and a greater understanding of the effect of the choice of encoding and decoding operators on the behavior of the system, with the tradeoff that the time complexity for encoding is increased.

Index Terms—Associative, lattice, memory, ordered weighted average (OWA).

I. INTRODUCTION

The ability of the human brain to associate seemingly unrelated patterns, such as the sight of a movie poster with the smell of buttered popcorn, is fascinating. Research into a family of artificial neural networks known as associative memories attempts to develop a computational framework for mimicking this behavior. A well-known example of a recurrent associative memory is the Hopfield network [1], [2]. Other associative memory models include the bidirectional associative memory [3] and the quantum associative memory [4]-[7].

During the 1990s, Ritter et al. introduced a new family of associative memories based on lattice algebra instead of the more common linear-algebra-based approaches [8], [9]. These morphological associative memories, which use $\wedge$ and $\vee$ operators for encoding and decoding, provide unlimited storage capacity and do not require orthogonality in their key vectors. Comparisons of the morphological memories to the Hopfield network can be found in [10]. However, these lattice-algebra-based networks are not always robust in the presence of noise [11]. Though the literature focuses on improving such networks' ability to retrieve clean input patterns when presented with noisy versions, it does not take into account the presence of outliers within the initial data. Such outliers can make the fixed point set arbitrarily wide, which can greatly decrease the performance of the memory when presented with a distorted version of an initial data point.

First, the canonical encoding and decoding operators are generalized to the set of ordered weighted average (OWA) operators. A set of OWA-based associative memories is then introduced, which makes use of order statistic operators beyond simply the min and the max for encoding, while still using the canonical decoding operators. This novel class of memories is shown to be more robust with respect to outliers within the initial data set. This improvement is described in detail, and improved tolerance to outliers is quantified using a widely available data set.

II. ASSOCIATIVE MEMORIES

An in-depth introduction to lattice-based associative memories is given in [11]; a short treatment is given here. In general, an associative memory $M$ provides a mapping between a set of input queries, often called keys, and the values we would like to associate with them [12]. Given a set of input pairs of the form $(x^{\xi}, y^{\xi})$ such that $X = \{x^1, x^2, \ldots, x^k\}$ is the set of keys and $Y = \{y^1, y^2, \ldots, y^k\}$ are their corresponding values, we say that $M$ is a perfect recall memory for $X$ and $Y$ if and only if $x^{\xi} \rightarrow M \rightarrow y^{\xi}$, $\forall \xi \in \{1, \ldots, k\}$.

The associative memory model described above is known as a heteroassociative memory because the set of keys may be different from the set of stored patterns. An autoassociative memory associates pattern pairs of the form $(x^{\xi}, x^{\xi})$, so that $Y = X$. Typically, these memories are used for retrieving the clean version of a stored pattern upon the application of a noisy version. Therefore, an important attribute of an associative memory is not only its ability to recall stored prototypes but also to exclude nonprototypes.

Unlike associative memories based on linear correlation matrices, the lattice-based associative memory uses the maximum (or $\vee$) and minimum (or $\wedge$) operators to store pattern pairs of the form $(x^{\xi}, y^{\xi})$ for $\xi = 1, \ldots, k$. The goal is the same as before, i.e., to store the pairs in a memory such that $y^{\xi}$ is recalled whenever $x^{\xi}$ is presented.

A. The Canonical Encoding and Decoding Processes

Using the lattice-theoretic framework, it is possible to create two different memories for the chosen pattern associations. One of these memories uses the pointwise minimum and is defined as $W_{XY}$ such that

$$w_{ij} = \bigwedge_{\xi=1}^{k} \left( y_i^{\xi} - x_j^{\xi} \right). \qquad (1)$$

Likewise, there is a memory $M_{XY}$ created using the pointwise maximum, defined such that

$$m_{ij} = \bigvee_{\xi=1}^{k} \left( y_i^{\xi} - x_j^{\xi} \right). \qquad (2)$$

This research focuses on autoassociative memories specifically, so $y^{\xi} = x^{\xi}$ for our purposes. The memories $W_{XX}$ and $M_{XX}$ will be referred to as canonical memories throughout this document to separate them from the novel memories proposed here, which make use of encoding operators other than $\wedge$ or $\vee$. Referring to (1) and (2), note that in the autoassociative case $w_{ii} = m_{ii} = 0$ for all $i$.

The decoding process involves the operators $\wedge$ and $\vee$, known as the min product and max product. They are defined such that $C = A \wedge B$ means that $c_{ij} = \bigwedge_{k=1}^{p} (a_{ik} + b_{kj})$. Similarly, $C = A \vee B$ implies that $c_{ij} = \bigvee_{k=1}^{p} (a_{ik} + b_{kj})$. These operators are the lattice-algebraic equivalents of the matrix product of linear algebra.
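To make the canonical operations concrete, the following is a minimal sketch of the autoassociative encoding (1)-(2) and of the min and max products; the NumPy representation and the function names are our own illustration rather than part of the original formulation.

```python
import numpy as np

def encode_canonical(X):
    """Build the canonical memories from X, whose columns are the k key
    vectors x^1, ..., x^k in R^n.  Entry (i, j) is the min (for W_XX) or the
    max (for M_XX) over xi of the differences x_i^xi - x_j^xi, as in (1)-(2)."""
    diffs = X[:, None, :] - X[None, :, :]        # diffs[i, j, xi] = x_i^xi - x_j^xi
    return diffs.min(axis=2), diffs.max(axis=2)  # (W_XX, M_XX)

def max_product(A, x):
    """Max product A v x: component i is max_j (a_ij + x_j)."""
    return (A + x[None, :]).max(axis=1)

def min_product(A, x):
    """Min product A ^ x: component i is min_j (a_ij + x_j)."""
    return (A + x[None, :]).min(axis=1)

# Perfect recall on the stored patterns: W_XX v x = x = M_XX ^ x.
X = np.array([[1.0, 4.0, 2.0],
              [3.0, 0.5, 2.5],
              [2.0, 1.0, 5.0]])                  # three 3-D patterns stored as columns
W, M = encode_canonical(X)
for x in X.T:
    assert np.allclose(max_product(W, x), x)
    assert np.allclose(min_product(M, x), x)
```

Note that the diagonal entries of both toy memories are zero, matching the autoassociative observation above, and that the min-encoded memory is decoded with the max product and vice versa.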




For the canonical memories, the decoding operator is the dual of the encoding operator. For instance, decoding with the max-encoded memory uses the min product

$$y = M_{XX} \wedge x. \qquad (3)$$

The min and max operators have a dualistic relationship in the sense that $\bigwedge(a_1, \ldots, a_n) = -\bigvee(-a_1, \ldots, -a_n)$. This is because $a_{(1)} \leq a_{(2)} \leq \cdots \leq a_{(n)}$ implies that $-a_{(1)} \geq -a_{(2)} \geq \cdots \geq -a_{(n)}$.

A fixed point of an associative memory is any point $x$ such that $x \rightarrow M \rightarrow x$. Let the fixed point set for either canonical memory be denoted $F(X)$. Ritter et al. showed that if $x$ is a fixed point of $W_{XX}$, then it is a fixed point of $M_{XX}$ as well and vice versa, that is to say that $W_{XX} \vee x = x = M_{XX} \wedge x$. It was also shown that $X \subseteq F(X)$, meaning that both of the canonical memories provide perfect recall across the set $X$ of initial inputs. Another noteworthy theorem states that for every point $x$ in the fixed point set, the set of points $\{a + x\}$ is also in $F(X)$ for any real number $a$; that is, membership in the fixed point set is invariant under scalar shift [8]. These memories also have the property that all points converge to a fixed point in one iteration.

A memory encoded from a set of $k$ vectors in $\mathbb{R}^n$ contains $n^2$ elements. Each entry can be computed in time $O(k)$, so the entire time to compute the encoding is $O(kn^2)$. Decoding takes $O(n^2)$ time. The space complexity of the OWA-based memories described here is the same as that of the canonical memories, as is the decoding time complexity, since the canonical decoding process is used. The encoding phase will take longer for OWA-based memories, on average, since computing each entry requires a sort of all $k$ elements. Assuming a sort with a worst-case time complexity of $O(k \log k)$, such as merge sort, yields an encoding time complexity of $O(kn^2 \log k)$. This only has to be computed once, however, and the system still converges in one pass.
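The recall, shift-invariance, and one-pass convergence properties just quoted are easy to check numerically; the short sketch below rebuilds the toy memories from the previous listing and is an illustration of the stated theorems, not a proof.

```python
import numpy as np

# Re-create the toy memories from the previous sketch.
X = np.array([[1.0, 4.0, 2.0],
              [3.0, 0.5, 2.5],
              [2.0, 1.0, 5.0]])
diffs = X[:, None, :] - X[None, :, :]
W, M = diffs.min(axis=2), diffs.max(axis=2)
max_prod = lambda A, x: (A + x[None, :]).max(axis=1)
min_prod = lambda A, x: (A + x[None, :]).min(axis=1)

x, a = X[:, 0], 7.3
assert np.allclose(max_prod(W, x + a), x + a)     # scalar shifts of fixed points stay fixed
assert np.allclose(min_prod(M, x + a), x + a)

y = np.array([2.0, -1.0, 4.5])                    # an arbitrary query
z = max_prod(W, y)                                # one decoding pass with W_XX
assert np.allclose(max_prod(W, z), z)             # already a fixed point: one-pass convergence
```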

B. Piecewise Linearity of Lattice-Based Memories

Though the lattice-based associative memories discussed here are nonlinear, they can be treated as piecewise linear systems. This piecewise linear viewpoint allows for the extension of the lattice-based model to include other order statistic operators, and other OWA operators in general, beyond the min and the max. Given a set of input patterns $X = \{x^1, x^2, \ldots, x^k\}$ such that $x^{\xi} \in \mathbb{R}^2 \ \forall \xi$, create a lattice-based associative memory

$$\Phi_{XX} = \begin{bmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{bmatrix} \qquad (4)$$

where $\Phi_{XX}$ can be either $W_{XX}$ or $M_{XX}$. Treating the lattice-based associative memory as a dynamical system, let $x[t]$ denote the state of the vector at time $t$, i.e.,

$$x[t+1] = \Phi_{XX} \diamond x[t] \qquad (5)$$

where $\diamond$ stands for $\wedge$ or $\vee$, depending on what is appropriate for the encoding operator. Expanding the right-hand side of (5) yields

$$x_1[t+1] = \left( \phi_{1\sigma_1(1)} + x_{\sigma_1(1)}[t] \right) w_1 + \left( \phi_{1\sigma_1(2)} + x_{\sigma_1(2)}[t] \right) w_2 \qquad (6)$$

and

$$x_2[t+1] = \left( \phi_{2\sigma_2(1)} + x_{\sigma_2(1)}[t] \right) w_1 + \left( \phi_{2\sigma_2(2)} + x_{\sigma_2(2)}[t] \right) w_2 \qquad (7)$$

where $\sigma_i(j)$ is a function returning the index of the $j$th element after sorting the $i$th sequence of values shown in (6) and (7); the sort occurs after the addition is performed. In this case, the $w$ values are determined by whether $\Phi_{XX}$ is encoded using the min or max operator. The weights corresponding to the min operator are $(w_1 = 1, w_2 = 0)$, while the weights corresponding to the max operator are $(w_1 = 0, w_2 = 1)$. Let the four possible outcomes of the sort be

$$\text{Case}_1: \quad \sigma_1(1) = 1, \quad \sigma_1(2) = 2, \quad \sigma_2(1) = 1, \quad \sigma_2(2) = 2 \qquad (8)$$
$$\text{Case}_2: \quad \sigma_1(1) = 2, \quad \sigma_1(2) = 1, \quad \sigma_2(1) = 1, \quad \sigma_2(2) = 2 \qquad (9)$$
$$\text{Case}_3: \quad \sigma_1(1) = 1, \quad \sigma_1(2) = 2, \quad \sigma_2(1) = 2, \quad \sigma_2(2) = 1 \qquad (10)$$
$$\text{Case}_4: \quad \sigma_1(1) = 2, \quad \sigma_1(2) = 1, \quad \sigma_2(1) = 2, \quad \sigma_2(2) = 1. \qquad (11)$$

Each case implies two constraints, based on the sort. The general form of the constraints for all cases is

$$\phi_{1\sigma_1(1)} + x_{\sigma_1(1)}[t] \leq \phi_{1\sigma_1(2)} + x_{\sigma_1(2)}[t] \qquad (12)$$

$$\phi_{2\sigma_2(1)} + x_{\sigma_2(1)}[t] \leq \phi_{2\sigma_2(2)} + x_{\sigma_2(2)}[t]. \qquad (13)$$

The decoding process of the lattice-based associative memory, with respect to a given dimension, can be viewed as an ordered weighted average, as described by Yager [13]. An OWA operator of dimension $n$ is a mapping

$$F: \mathbb{R}^n \rightarrow \mathbb{R} \qquad (14)$$

with an associated weighting vector $W = [w_1, w_2, \ldots, w_n]$ such that $w_i \in [0, 1]$ and $\sum_{i=1}^{n} w_i = 1$. The action of $F$ on a sequence of $n$ values $a_1, \ldots, a_n$ is defined such that $F(a_1, \ldots, a_n) = w_1 a_{(1)} + \cdots + w_n a_{(n)}$, where $a_{(i)}$ is the $i$th of the values when they are sorted from least to greatest. In this case, the min operator has an associated weighting vector of

$$w_i = \begin{cases} 1, & \text{if } i = 1 \\ 0, & \text{if } i \in \{2, \ldots, n\} \end{cases} \qquad (15)$$

while the max operator is handled similarly. For any particular sort, the action of decoding is simply a weighted sum. Therefore, when examining any one particular case, the nonlinearity of the decoding operator is removed. Thus, the decoding process is piecewise linear, with a different linear subsystem corresponding to each possible sort.

The result of the constraints corresponding to each sort is that the Euclidean plane is divided into three regions, each of which encompasses the domain of one of the subsystems. The reason that there are only three, and not four, is that the constraints for either $\text{Case}_2$ or $\text{Case}_3$ will be contradictory, depending on the chosen encoding operator, and therefore no region of the plane can satisfy them. To see this, it is helpful to define the two lines

$$L1: \quad x_2 = x_1 + \phi_{21} \qquad (16)$$

$$L2: \quad x_2 = x_1 - \phi_{12}. \qquad (17)$$

Though it is possible that they may be equal, it is usually the case that one lies above the other. Line $L1$ is always above $L2$ when the max operator is used to encode the input vectors; likewise, when the min operator is used to encode, $L2$ lies above $L1$. Note that $L1$ and $L2$ divide the state space into three regions; call them $R_A$, $R_B$, and $R_C$. The region above both $L1$ and $L2$ is $R_A$, which corresponds to the constraints of $\text{Case}_1$. The region $R_C$ lies below both lines and represents the constraints of $\text{Case}_4$. Between $L1$ and $L2$ lies $R_B$, which corresponds to $\text{Case}_3$ when max encoding is used and $\text{Case}_2$ when min encoding is used. The remaining cases, $\text{Case}_2$ for $M_{XX}$ and $\text{Case}_3$ for $W_{XX}$, constrain the state vector to be both above the topmost line and below the bottommost, which is impossible.
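Since each decoded component is an OWA of the terms $(\phi_{ij} + x_j)$, the weight-vector view of (14)-(15) can be sketched directly; the function names below are our own, and the min and max products fall out as the two extreme weight choices.

```python
import numpy as np

def owa(values, weights):
    """Ordered weighted average: sort the values from least to greatest and
    take the dot product with the weighting vector (weights sum to 1)."""
    return float(np.dot(np.sort(values), weights))

def decode_owa(Phi, x, weights):
    """One decoding step: component i is an OWA of the terms (phi_ij + x_j),
    so the min and max products are special cases of the weight vector."""
    return np.array([owa(Phi[i] + x, weights) for i in range(len(x))])

n = 3
w_min = np.eye(n)[0]       # (1, 0, 0): picks the smallest term -> min product
w_max = np.eye(n)[-1]      # (0, 0, 1): picks the largest term  -> max product

Phi = np.array([[0.0, -1.0, 2.0],
                [1.5,  0.0, 0.5],
                [-2.0, 1.0, 0.0]])
x = np.array([1.0, 0.0, -1.0])
assert np.allclose(decode_owa(Phi, x, w_min), (Phi + x).min(axis=1))
assert np.allclose(decode_owa(Phi, x, w_max), (Phi + x).max(axis=1))
```

Intermediate weight vectors correspond to the other order statistic and OWA decoders considered later in this brief.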



C. The Fixed Point Set

In order for a point $x$ to be a fixed point of the min-encoded memory $W_{XX}$,

$$W_{XX} \vee x = x \iff \bigvee_{j=1}^{n} (w_{ij} + x_j) = x_i \qquad \forall i \in \{1, \ldots, n\}. \qquad (18)$$

Recalling the definition of the $\vee$ operator, and the fact that the diagonal elements of $W_{XX}$ are zero, this means that

$$\bigvee_{j=1}^{n} (w_{ij} + x_j) = w_{ii} + x_i \qquad \forall i \in \{1, \ldots, n\}. \qquad (19)$$

In the 2-D case, this means that $w_{11} + x_1 \geq w_{12} + x_2$ and $w_{22} + x_2 \geq w_{21} + x_1$, which corresponds to $\text{Case}_2$, i.e., the case between the boundaries $L1$ and $L2$. By similar logic, it can be shown that the fixed point set for the max-encoded memory $M_{XX}$ corresponds to $\text{Case}_3$, which is also between $L1$ and $L2$ when max encoding is used. Thus, the fixed point set of the 2-D canonical lattice-based associative memory is the region between these two boundaries.

In the $n$-dimensional case, for each pair of dimensions $(i, j)$, $i, j \in \{1, \ldots, n\}$, the state space can be divided into three regions in a manner similar to the 2-D case. In general, with memory $W_{XX}$ for a given dimension pair,

$$W_{XX} \vee x[k] = x[k] \qquad (20)$$
$$\iff x_i[k] = \bigvee_{j=1}^{n} (w_{ij} + x_j[k]) \quad \forall i \qquad (21)$$
$$\iff x_i[k] \geq w_{ij} + x_j[k] \quad \forall i, j \qquad (22)$$
$$\iff x_i[k] - w_{ij} \geq x_j[k] \quad \forall i, j. \qquad (23)$$

This means that $x_i - w_{ij} \geq x_j$ and, exchanging the roles of $i$ and $j$, $x_j \geq x_i + w_{ji}$, and therefore $x_j$ lies between the two boundaries. We can say that a vector $x$ is a fixed point if $x_j$ lies between the boundaries $x_i - w_{ij}$ and $x_i + w_{ji}$, for all $i, j \in \{1, \ldots, n\}$. It can also be shown for $M_{XX}$ that $x$ is a fixed point if $x_j$ lies between the boundaries $x_i - m_{ij}$ and $x_i + m_{ji}$, for all $i, j$. For a more general memory $\Phi_{XX}$, which can be $W_{XX}$, $M_{XX}$, or a memory created using a more general set of operators (such as those discussed in Section III), the boundaries are referred to as $x_i - \phi_{ij}$ and $x_i + \phi_{ji}$. The case in which the state vector lies between these two boundaries for a given dimension pair will be referred to as the inside case for that dimension pair. For the canonical memories, points within the inside case are always fixed points. We will refer to this property by saying that $W_{XX}$ and $M_{XX}$ are well behaved in the min-max sense.

Viewing the lattice-based associative memory's encoding and decoding processes as OWA operations motivates the examination of operators beyond the min and the max. Since both of the canonical memories are encoded and decoded using dual operators, one assumption might be that it is necessary for all encoding and decoding operator pairs to be duals. The notion of dual OWA operators was defined by Yager [13], [14]. Our research has explored a family of associative memories using dual operator pairs that are arbitrarily close to the min and the max. A surprising property of these memories is that the state vector never reaches a fixed point, i.e., these memories are not well behaved in the min-max sense, and they do not converge in one pass. Such memories are outside the scope of this brief, but in Section III we introduce a family of memories that use encoding and decoding operators that are not duals. These memories, encoded using order statistic operators, are shown to be well behaved in the min-max sense and to provide a level of robustness to outliers in the data set.
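For a min-decoded memory, the inside case described above reduces to checking $x_i - x_j \leq \phi_{ij}$ for every dimension pair; a small sketch follows, with the function name and toy data being our own assumptions.

```python
import numpy as np

def inside_case(Phi, x):
    """True when x_j lies between the boundaries x_i - phi_ij and x_i + phi_ji
    for every dimension pair (i, j); for min-product decoding this is simply
    x_i - x_j <= phi_ij for all i, j."""
    n = len(x)
    return all(x[i] - x[j] <= Phi[i, j] for i in range(n) for j in range(n))

# A point in the inside case is a fixed point of min-product decoding, because
# min_j (phi_ij + x_j) is then attained at j = i (recall phi_ii = 0).
Phi = np.array([[0.0, 2.0, 3.0],
                [1.0, 0.0, 2.5],
                [0.5, 1.5, 0.0]])
x = np.array([1.0, 0.5, 0.0])
if inside_case(Phi, x):
    assert np.allclose((Phi + x).min(axis=1), x)
```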

III. MEMORIES WITH ORDER STATISTICS ENCODING

For the purposes of this discussion, a memory encoded using the $b$th-order statistic has entries of the form

$$\phi_{ij} = \left( x_i^{\xi} - x_j^{\xi} \right)_{(b)}, \qquad b \in \{1, \ldots, k\} \qquad (24)$$

where $(x_i^{\xi} - x_j^{\xi})_{(b)}$ refers to the $b$th element of $\{x_i^{\xi} - x_j^{\xi} : \xi = 1, \ldots, k\}$ when sorted from least to greatest. This means that the canonical method of encoding a set of vectors using the max operator becomes

$$m_{ij} = \left( x_i^{\xi} - x_j^{\xi} \right)_{(k)} \qquad (25)$$

with the min encoding similarly defined. The vector of OWA weights used for this encoding differs from the $k$-dimensional zero vector only at the $b$th index, which contains a 1. For decoding, which takes place over the dimensions of the state vector, let the decoding operator for the $b$th-order statistic be denoted $\boxdot_b$. For a key vector $x$ and memory $M_{XX}$, the canonical min-decoding process would be written

$$M_{XX} \wedge x = M_{XX} \boxdot_1 x. \qquad (26)$$

Encoding a lattice-based associative memory with the $b$th-order statistic yields a memory $\Phi_{XX}$ such that

$$\phi_{ij} = \left( x_i^{\xi} - x_j^{\xi} \right)_{(b)}. \qquad (27)$$

Decoding using the min operator, then, means that

$$x_i[k+1] = \bigwedge_{j=1}^{n} \left( \phi_{ij} + x_j[k] \right). \qquad (28)$$

When using the $b$th-order statistic for encoding and the first-order statistic for decoding, a vector $x[k]$ is a fixed point if and only if, for all dimension pairs $i, j \in \{1, \ldots, n\}$ and all vectors $x^{\xi} \in X$, the difference $x_i[k] - x_j[k]$ is less than or equal to the $b$th difference $(x_i^{\xi} - x_j^{\xi})_{(b)}$. This is true because, for all $i, j \in \{1, \ldots, n\}$,

$$x_i[k] - x_j[k] \leq \left( x_i^{\xi} - x_j^{\xi} \right)_{(b)} \qquad (29)$$
$$\iff x_i[k] \leq x_j[k] + \left( x_i^{\xi} - x_j^{\xi} \right)_{(b)} \qquad (30)$$
$$\iff x_i[k] \leq x_j[k] + \phi_{ij} \qquad (31)$$
$$\iff x_i[k] = \bigwedge_{j=1}^{n} \left( \phi_{ij} + x_j[k] \right) \qquad (32)$$
$$\iff x_i[k] = \left( \Phi_{XX} \boxdot_1 x[k] \right)_i. \qquad (33)$$

Similarly, when using the $b$th-order statistic for encoding and the last-order statistic for decoding, a vector $x[k]$ is a fixed point if and only if, for all dimension pairs $(i, j)$ and all vectors $x^{\xi} \in X$, the difference $x_i[k] - x_j[k]$ is greater than or equal to the $b$th difference $(x_i^{\xi} - x_j^{\xi})_{(b)}$.
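A sketch of the $b$th-order-statistic encoding (27) together with the canonical min-product decoding (28) follows; the function names, the toy data, and the NumPy conventions are illustrative assumptions.

```python
import numpy as np

def encode_order_statistic(X, b):
    """Entry (i, j) is the bth element (1-indexed, sorted least to greatest) of
    the differences {x_i^xi - x_j^xi}, as in (27).  b = k recovers the canonical
    max encoding M_XX, and b = 1 the canonical min encoding W_XX."""
    diffs = np.sort(X[:, None, :] - X[None, :, :], axis=2)
    return diffs[:, :, b - 1]

def decode_min(Phi, x):
    """Canonical min-product decoding (28): component i is min_j (phi_ij + x_j)."""
    return (Phi + x[None, :]).min(axis=1)

# With b = k - 1 (next-to-max), a single outlying pattern no longer dictates
# the boundary for a dimension pair.
X = np.array([[2.0, 3.0, 2.5, 9.0],
              [1.0, 2.0, 1.5, -6.0]])            # the last column is an outlier
k = X.shape[1]
M_full = encode_order_statistic(X, b=k)          # canonical max encoding
M_next = encode_order_statistic(X, b=k - 1)      # next-to-max encoding
print(M_full[0, 1], M_next[0, 1])                # 15.0 versus 1.0 for this toy data

# Non-outlying patterns remain fixed points under min decoding, while the
# outlier does not:
assert np.allclose(decode_min(M_next, X[:, 0]), X[:, 0])
assert not np.allclose(decode_min(M_next, X[:, 3]), X[:, 3])
```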



Fig. 1. Narrowing the fixed point set for dimension pair (i, j).

Fig. 2. Noise and the narrowed fixed point set.

The beauty of encoding a set of vectors using generalized order-statistics-based operators is that it allows for a certain amount of robustness to outliers in the input keys. Encoding a memory using the $b$th-order statistic implies that, for any dimension pair $(i, j)$, the upper bound on the difference $(x_i - x_j)$ for any fixed point $x$ is the $b$th-order statistic of the differences found between dimensions $i$ and $j$ across the original key vectors in $X$. This means that the original patterns in $X$ are not required to be free of impulse noise, as they are in the canonical min/max lattice-based associative memories.

A. Robustness With Respect to Outliers

For an $n$-dimensional example, consider the case when $X$ is a matrix containing 11 vectors in $\mathbb{R}^n$. For the sake of this example, only two rows need to be defined. Let row $i$ be

$$[X]_i = (2, 3.5, 4, 4.5, 5, 5.5, 6, 7, 7.5, 3.5, -8) \qquad (34)$$

and let row $j$ be defined such that

$$[X]_j = (6, 4, 5, 6, 3, 6, 5.5, 3, 5, -5, 9). \qquad (35)$$

Without specifying all of the elements of $X$, it is impossible to calculate an entire memory, but only the entries created from dimensions $i$ and $j$ are necessary for this example. Here, $m_{ij} = 8.5$, $m_{ji} = 17$, $\phi_{ij} = 4$, and $\phi_{ji} = 4$, where $\Phi_{XX}$ denotes the next-to-max-encoded memory. Fig. 1 contains a plot of each point in $X$, along with the lines that form the boundaries of the fixed point set of each memory, as before. Note that points $x^{10}$ and $x^{11}$ are extreme outliers for this data set; for this example, they are presumed to be the result of some sort of noise in the original input set $X$. This causes the boundaries of the fixed point set for the canonical lattice-based associative memory $M_{XX}$ to be much wider than necessary for the rest of the data points.

The implications of this widening are twofold. First, points in $X$ can be greatly distorted while still remaining in the fixed point set. Second, a distorted point, once pushed outside the fixed point set, cannot be brought back as closely to the original as possible because the boundary is too far away. Using the next-to-max encoding operator trims the outliers, so the fixed point set is more descriptive of the remaining data points. Of course, in the event that the outliers are legitimate data points and not the result of noise, it may be optimal to use the canonical encoding rather than trimming them. Given their distance from the rest of the points, however, it may be better overall to trim them so that the majority of the points can be brought closer to their original values if distorted. This would improve performance with respect to distorted input vectors, on average, and only decrease performance on the outliers themselves.
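The entries quoted above can be reproduced directly from rows $i$ and $j$ given in (34) and (35); the snippet below is only a check of that arithmetic, using the same sorted-difference construction as in Section III.

```python
import numpy as np

row_i = np.array([2, 3.5, 4, 4.5, 5, 5.5, 6, 7, 7.5, 3.5, -8.0])
row_j = np.array([6, 4, 5, 6, 3, 6, 5.5, 3, 5, -5, 9.0])

d_ij = np.sort(row_i - row_j)    # differences x_i - x_j, sorted least to greatest
d_ji = np.sort(row_j - row_i)

print(d_ij[-1], d_ji[-1])        # canonical max encoding:  m_ij = 8.5, m_ji = 17.0
print(d_ij[-2], d_ji[-2])        # next-to-max (b = k - 1): phi_ij = 4.0, phi_ji = 4.0
```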

IV. PERFORMANCE ON REAL DATA

In order to quantify the improvement that order-statistics-based encoding can provide in the presence of outliers in the initial inputs, an experiment was carried out on the auto-mpg data set from the Carnegie Mellon University Statistics Library (Pittsburgh, PA).

A. The Data

The auto-mpg data set contains 398 records of the city-cycle fuel consumption of various automobiles [15]. It was originally used to test graphical analysis packages at the 1983 American Statistical Association Exposition and is currently available from the University of California, Irvine, Machine Learning Repository [16]. Included in each data point is the fuel efficiency of each car in miles per gallon. The fuel efficiency of the various vehicles covers a broad range, from 10 to 45 mi/gal, though the bulk lie between 10 and 35. Primarily four-cylinder cars were included, though six-cylinder and eight-cylinder models appear as well. No vehicle in the data set had as much as 250 horsepower, though cars with greater horsepower existed. All vehicles were made between 1970 and 1982.

In this experiment, a single vector was chosen at random and distorted in dimension 2 (cylinders), dimension 4 (horsepower), and dimension 7 (model year). This noise vector was given ten cylinders, 350 horsepower, and a model year of 1983. Though a car with these specifications may or may not exist in real life, each individual specification is realistic, and this distortion may be considered a simulation of human error at the time of data collection. The point of this exercise is to quantify the effect that the choice of encoding operator has on the retrieval of stored patterns given noisy keys using a specific data set, in order to get a feeling for its effect in general.

Next, each of the vectors was chosen in turn and distorted in dimensions 2, 4, and 7. The distorted vector was applied to each memory as the input, and the output was compared to the original, clean version of the vector. This simulated the receipt of a noisy input. The root mean squared error (RMSE) between the output vector and the original vector was computed and stored. This entire process was repeated 1000 times.

Fig. 3. Max encoding error.

Fig. 3 shows a histogram of the RMSE values pertaining to $M_{XX}$, and Fig. 4 shows a histogram of the RMSE values pertaining to $\Phi_{XX}$.
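The evaluation loop described above might be sketched as follows; the helper name, the 0-indexed column positions, and the encoded model-year value (83) are assumptions on our part, and `encode_order_statistic` refers to the hypothetical helper from the earlier listing.

```python
import numpy as np

def rmse_per_query(X_clean, Phi, distort_idx, distort_vals):
    """Distort each stored column in the given dimensions, decode the noisy
    query with the canonical min product, and return the RMSE of each output
    against the clean original."""
    errors = []
    for col in range(X_clean.shape[1]):
        noisy = X_clean[:, col].copy()
        noisy[distort_idx] = distort_vals                 # e.g. cylinders, horsepower, model year
        recalled = (Phi + noisy[None, :]).min(axis=1)     # canonical min-product decoding
        errors.append(np.sqrt(np.mean((recalled - X_clean[:, col]) ** 2)))
    return np.array(errors)

# Hypothetical usage on the auto-mpg matrix X (columns are records), assuming
# dimensions 2, 4, and 7 of the text correspond to 0-indexed rows 1, 3, and 6:
# errs_canonical = rmse_per_query(X, encode_order_statistic(X, b=X.shape[1]), [1, 3, 6], [10, 350, 83])
# errs_trimmed   = rmse_per_query(X, encode_order_statistic(X, b=X.shape[1] - 1), [1, 3, 6], [10, 350, 83])
```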

Fig. 4. Next-to-max encoding error.

The error is substantially lower for $\Phi_{XX}$ because the single distorted vector in the memorized data set widened the fixed point set of the canonical memory, which, in turn, reduced the effectiveness of that memory for resolving distortions during decoding.

V. CONCLUSION

The nonlinear lattice-based associative memory introduced by Ritter et al. was analyzed as a piecewise linear system using a particular subset (the min and the max) of the set of order statistics, which is in turn a subset of the set of all OWA operators. This piecewise linear viewpoint allows for the extension of the lattice-based model to include other order statistic operators, and other OWA operators in general, beyond the min and the max. Some of these memories, only briefly mentioned here, define complex dynamical systems that are not well behaved in the min-max sense.

A method for reducing the effect of outliers in the initial data set was described, using general order statistic operators for encoding memories and the canonical min-max operators for decoding. Though the encoding and decoding operators lack the duality property of the canonical memories, order statistics encoding was shown to provide the ability to choose the width of the fixed point set. This ability, in turn, provides robustness to outliers in the initial data set. Since the canonical memories are a subset of the more general order-statistics-based memories, the behavior of the original lattice-based associative memories is still available. Choosing the canonical operators, then, constitutes a choice to create a fixed point set wide enough to encompass all of the initial data points. This may be recommended for some data sets but might not be optimal for all cases. Order-statistics-based memories provide the operator with the choice to remove outliers as necessary.

REFERENCES

[1] Y. Abu-Mostafa and J. S. Jacques, "Information capacity of the Hopfield model," IEEE Trans. Inf. Theory, vol. 31, no. 4, pp. 461-464, Jul. 1985.
[2] J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc. Nat. Acad. Sci. USA, vol. 79, no. 8, pp. 2554-2558, 1982.
[3] B. Kosko, "Bidirectional associative memories," IEEE Trans. Syst. Man Cybern., vol. 18, no. 1, pp. 49-60, Feb. 1988.
[4] D. Ventura and T. Martinez, "Quantum associative memory," Inf. Sci., vol. 124, pp. 1-4, 2000.
[5] R. Zhou, H. Zheng, N. Jiang, and Q. Ding, "Self-organizing quantum neural network," in Proc. Int. Joint Conf. Neural Netw., 2006, pp. 1067-1072.
[6] D. Ventura and T. Martinez, "Quantum associative memory," Inf. Sci., vol. 124, pp. 1-4, 1998.
[7] D. Ventura and T. Martinez, "A quantum associative memory based on Grover's algorithm," in Proc. Int. Conf. Artif. Neural Netw. Genetic Algorithms, 1999, pp. 22-27.
[8] G. X. Ritter, P. Sussner, and J. L. D. de Leon, "Morphological associative memories," IEEE Trans. Neural Netw., vol. 9, no. 1, pp. 281-293, Jan. 1998.
[9] G. X. Ritter and P. Sussner, "Associative memories based on lattice algebra," in Proc. IEEE Int. Conf. Comput. Cybern. Simulat., Orlando, FL, Oct. 1997, vol. 4, pp. 3570-3575.
[10] P. Sussner, "Morphological associative memories," IEEE Trans. Neural Netw., vol. 9, no. 2, pp. 281-293, Mar. 1998.
[11] G. Ritter and P. Gader, "Fixed points of lattice transforms and lattice associative memories," in Advances in Imaging and Electron Physics. New York: Academic, 2006, vol. 144, pp. 165-238.
[12] S. Haykin, Neural Networks: A Comprehensive Foundation. Englewood Cliffs, NJ: Prentice-Hall, 1999.
[13] R. Yager, "On ordered weighted averaging aggregation operators in multicriteria decision making," IEEE Trans. Syst. Man Cybern., vol. 18, no. 1, pp. 183-190, Feb. 1988.
[14] R. Yager, "Families of OWA operators," Fuzzy Sets Syst., vol. 57, pp. 125-148, 1993.
[15] J. R. Quinlan, "Combining instance-based and model-based learning," in Proc. 10th Int. Conf. Mach. Learn., Amherst, MA, 1993, pp. 236-243.
[16] A. Asuncion and D. Newman, "UCI Machine Learning Repository," 2007. [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html
