
UNIVERSITY OF SUSSEX

METAL DETECTION USING NEURAL NETWORKS



A THESIS SUBMITTED TO THE
DEPARTMENT OF ENGINEERING
FOR THE DEGREE, MASTER OF PHILOSOPHY

BY
STEVE FIELDING


BRIGHTON, EAST SUSSEX
SEPTEMBER, 1993


TABLE OF CONTENTS

Abstract
ACKNOWLEDGEMENTS
CHAPTER 1 INTRODUCTION
CHAPTER 2 APPROACHES TO PATTERN RECOGNITION
CHAPTER 3 INDUSTRIAL STRENGTH METAL DETECTION - THE ART OF CONTAMINANT DETECTION
3.1 Introduction
3.2 Metal Detector Construction and Theory
3.3 Detecting Foreign Bodies Within a Product
3.4 Reliable Contaminant Detection
CHAPTER 4 DESIGNING AN EXPERIMENTAL SYSTEM FOR SIGNAL CLASSIFICATION
4.1 The Pattern Recognition Problem
4.2 Pattern Recognition Using Neural Nets
4.3 Data Compression by Optimal Transformations
4.4 Time Shift Invariant Pattern Recognition
4.4.1 Using a Hopfield Net to obtain Shift Invariant Coefficients
4.5 The WISARD Neural Net
4.5.1 An Introduction to the WISARD Network
4.5.2 Multi-Discriminator System
4.5.3 Analysing Potential Performance
4.5.4 Advantages and Disadvantages of the WISARD net
4.5.5 Using the WISARD network with signals
4.6 Summary
CHAPTER 5 A SIGNAL CLASSIFICATION SYSTEM OPERATING ON STORED DATA
5.1 Introduction
5.2 Final Format of Optimal Transform
5.2.1 Using a Correlation Matrix instead of a Covariance Matrix
5.2.2 Choosing a Standard Transform for Data Compression
5.2.3 Jacobi Plane Rotations as a Fast Transform Technique
5.2.4 Difficulties in Permuting the Input Vector
5.3 Pattern Recognition Using a WISARD Network
5.4 An Experimental System for Metal Detector Signal Classification
5.5 Results and Analysis
5.6 Summary
CHAPTER 6 CONCLUSION
REFERENCES
APPENDIX A TABLES
APPENDIX B C CODE LISTINGS
APPENDIX C ASSEMBLER CODE LISTINGS

Abstract

The purpose of the work completed is to provide an improved method for
differentiating industrial products contaminated with metal from
uncontaminated products.
Graseby Goring Kerr, the sponsoring company, currently produce
industrial metal detection equipment which provides reasonable detection
capabilities, but the actual signal detection process is a simple amplitude
thresholding system, and leaves a lot of scope for improvement.
A neural network approach is used for signal classification; this was
placed in context by reviewing other methods of pattern recognition. Neural
networks provide the best approach to this problem because of their ability to
cope with corrupted patterns, and the fact that no prior assumptions need to be
made about the form of the signal. A WISARD net, which is a simple RAM
based net, was used because of its relative simplicity and its ease of training.
Choosing to use a particular pattern recognition technique cannot be done
in isolation from the pattern type and the pre-processing that is to be used. A new
fast optimal transform was used to pre-process the quantised sampled signal to
provide a small number of coefficients for use as input to the WISARD net.
Reduction in data size means that the net can be smaller and work faster, both
important when the net is implemented in real time on cost sensitive hardware.
Tests were carried out on stored data using programs running on a PC.
This proved that improved detection of contaminated products is possible.















I hereby declare that this thesis has not
been submitted, either in the same or
different form, to this or any other
University for a degree.

Signature :


ACKNOWLEDGEMENTS



I am grateful for the help and guidance I have received from the
following university staff: Habib Talhami, Paul Gough, and
Mike English. Special thanks to Graseby Goring Kerr, the
sponsoring company, for providing time and resources. Finally, I
would like to express my gratitude to my wife Sue, for her
behind-the-scenes support.


CHAPTER 1

INTRODUCTION
First a word or two about metal detection. In the context of this thesis,
metal detection is taken to mean industrial metal detection; that is the detection
of metal contaminants within a mass production industrial environment.
Usually this takes the form of product on a conveyor belt, passing through a
metal detector, and being rejected if contaminated.
There are two points to make about industrial metal detection. First,
it must be reliable, since it is unsupervised in its operation and false rejects
mean reduced profits; this is dealt with by a review of industrial metal
detection, and consideration of a future real implementation. Secondly, if
ultimate performance is the goal then the signals to be classified
will have a low signal to noise ratio, which affects the choice of pattern
recognition technique; this is covered by a review of pattern recognition
techniques, and the reasoning behind the selection of neural networks.
Neural networks as a solution to a pattern recognition problem cannot be
considered in isolation from the form of pre-processing to be used. In fact the choice of
pre-processing often controls the type of neural net that can be used and
the eventual success of this network. A new optimal fast transform is used as a
pre-processor to the WISARD net. Optimality, in the sense of reduction in the
size of data without appreciable loss of information, allows use of a smaller net
with consequently less memory and less time required to train and recall from
the net. The new transform is constructed from separate transformations
applied in series. Only transformations capable of a
fast implementation are used, thus the new transform has a fast implementation,
and gives a useful reduction in processing time and memory requirements in its
implementation. The design of new optimal fast transforms in general is dealt
with in chapter four and the form of the actual transform used in the final
system is covered in chapter five.
The final system uses a WISARD net, and a new optimal fast transform
constructed from the real discrete Fourier transform and four stages of Jacobi
plane rotations. Metal signals are stored on computer disc and used as input to
the computer implementation of the system. Results from the tests were
encouraging. The metal detector system upon which the tests were completed,
was able to detect a 2.5mm diameter stainless steel sphere when a low grade of
digital filtering was applied to the signal, and a final detection capability of
1.25mm stainless steel when a highly tuned adaptive filter was applied to the
signal. The tests indicated that a stainless steel sphere with a diameter smaller
than 1.25mm could be detected by the new neural net based system.


CHAPTER 2

APPROACHES TO PATTERN RECOGNITION

The different approaches to pattern recognition [24] are illustrated in
figure (1), along with their relationships. The rest of this chapter will detail
these approaches. It must be noted that these techniques are not mutually
exclusive, and they can be applied in combination in accordance with their
respective capabilities.

[Figure: the approaches comprise template matching, feature models, decision theoretic methods, syntactic methods (grammars drawn from the Chomsky hierarchy of formal language grammars, stochastic grammars, attributed grammars, error correcting grammars and primitive embedded parsing), relaxation methods, expert systems, and neural networks (weight based and non weight based).]

Figure (1) Different approaches to pattern recognition.

Template matching is the simplest and earliest method used for pattern
recognition. Normally the cross correlation function is used to determine how
similar a pattern vector is to a template pattern vector. Pattern vectors are
simply a quantised sampled signal or a sampled quantised image although this
is not necessarily the case as shown by General Electric's holographic character
reader, where the light from an object is effectively cross correlated with the
light from a template, without any sampling or quantisation taking place [1].
When correlation is used to match signals the technique can be made to give
scale invariance by using normalised and centralised vectors. This is a fairly
crude capability. It only works for linear scaling and does not extend to parts of
a signal being scaled nor does it extend to image processing where the same
technique gives brightness invariance rather than scale invariance. If scale
invariance is not required, then correlation can provide an exact match by
autocorrelation of the difference vector, that is, the difference between the
template and the test pattern.
Instead of the pattern vector merely being a sampled version of the signal
or image, features can be extracted from the pattern, for example edges and
corners from characters, and these used to form a new feature vector. The
vector can vary in complexity, but in the simplest case it is a binary vector
representing the presence or absence of each feature. By using features, more
invariance to size, corruption and other variables can be achieved and also a
reduction in pattern vector size and processing time, but with the disadvantage
of requiring more knowledge of the pattern to enable correct definition of the
features and more time required for pre-processing of the pattern to extract the
features [1].


[Figure: two class distributions A and B in the (x1, x2) plane, separated by a straight line; points on one side of the line give a positive decision function value, points on the other side a negative value.]

Figure (2) Decision function for two pattern class distributions.
The decision theoretic approach uses pattern vectors in the same way as
feature modelling and template matching, but the correlation function is
replaced by a more sophisticated matching process. Consider a simple two
dimensional vector and two classes of pattern (see figure (2) ). Given the
distribution for each class it is possible to derive a function by inspection which
will separate these two classes. In this case the dividing function is simply a
straight line but if necessary a more complex function can be used. The
equation defining the separating line,
$$d(x_1, x_2) = w_1 x_1 + w_2 x_2 + w_3 = 0$$
(where w1, w2, and w3 are constants) can be used to determine which class a
pattern belongs to, by inserting the appropriate values of x_1 and x_2 into the
equation and using the sign of d(x_1, x_2) to determine the class. A positive sign
indicates class A, a negative sign indicates class B and a zero result indicates a
point on the line, which is indeterminate in terms of classification. These
concepts can be extended to M pattern classes and N-dimensional vectors by
using M decision functions each with N weighting coefficients. The decision
theoretic approach deals with the algorithms that are required to choose these
weighting coefficients so that all the pattern classes are maximally separated.
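As an illustration, a minimal C sketch of such a two class linear decision function is given below; the weight values are hypothetical and would in practice be produced by one of the training algorithms referred to above.

    #include <stdio.h>

    /* Sign of the linear decision function d(x1,x2) = w1*x1 + w2*x2 + w3.
       Returns 'A' for a positive result, 'B' for negative, '?' on the boundary. */
    static char classify(double x1, double x2, const double w[3])
    {
        double d = w[0] * x1 + w[1] * x2 + w[2];
        if (d > 0.0) return 'A';
        if (d < 0.0) return 'B';
        return '?';
    }

    int main(void)
    {
        /* Hypothetical weights defining the separating line. */
        double w[3] = { 1.0, -1.0, 0.5 };
        printf("(2.0, 1.0) -> class %c\n", classify(2.0, 1.0, w));
        printf("(0.0, 3.0) -> class %c\n", classify(0.0, 3.0, w));
        return 0;
    }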

The decision theoretic or statistical pattern recognition can be a useful
tool for the analysis of data measurements even if it is not used to implement
the actual classifier [32] [33]. As the name statistical pattern recognition
implies, a lot of statistical information can be ascertained from measured data,
along with an idea of the theoretical performance limit of a real classifier.
If we take a simple two class problem, what we need to know, given a
vector x, is whether x belongs to class A or class B. We can attach a vector
conditioned probability to the data, P(w_i | x). That is the probability of class i
given vector x. The decision as to whether vector x belongs to class A or class
B can be made from the following rule. If P(w_a | x) is greater than P(w_b | x)
then choose class A, else if P(w_b | x) is greater than P(w_a | x) choose class B.
To arrive at these vector conditioned probability distributions we can use
Bayes' rule.
$$P(w_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid w_i)\, P(w_i)}{p(\mathbf{x})}$$
Where p(x | w_i) is the class conditioned probability density function (i.e. the
probability of vector x given class i), P(w_i) is the a priori class probability, and
p(x) is the unconditional probability density function. If we are simply trying to
find the largest P(w_i | x) then p(x) is constant and can be ignored. If we assume
each class is equally likely then the term P(w_i) can also be ignored. Now we
only need to find p(x | w_i) in order to develop a classifier.
If a function can be found to fit the class conditioned probability
distribution (usually the Gaussian or Normal distribution) then a small fixed
number of parameters can be used along with the known function to
completely describe p(x | w_i). These are known as parametric methods.
Alternatively if no function can be found to fit the distribution, then non
parametric methods must be used.

Using the parametric approach, and assuming a Gaussian distribution
leads to the most mathematically tractable solutions, and this is the most
common distribution, because all processes that are influenced by a large
number of factors approach the Gaussian distribution. It turns out that the
Gaussian distribution can be completely parametrized by the mean vector, and
covariance matrix.
$$p(\mathbf{x} \mid w_i) = (2\pi)^{-d/2}\, |\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$
Where Σ is the square covariance matrix, and μ is the mean vector.
From this equation it is possible to develop linear discriminant functions
for the two classes, and if these are equated we have an equation describing a
hyperplane decision region boundary between the two classes. Figure (2)
simply depicts the case of a two dimensional vector, with a consequent
decision boundary described by a straight line.
However, real world processes do not always have Gaussian distributions,
or any other known distribution, and in these cases non-parametric methods
must be used to determine the discriminant functions. These fall into two main
categories; those that estimate p(x | w_i), and those that estimate P(w_i | x)
directly. A method that falls into the second category is the nearest neighbour
classifier. The guiding principle of the nearest neighbour classifier is quite
simple: samples that are close in feature space are likely to belong to the same
class. All that is needed is some kind of distance measure between vectors. The
simplest distance measure is the d-dimensional Euclidean distance, which is a
generalisation of the 2-D and 3-D Euclidean distance.
$$J_e[k,l] = \left[ \sum_{i=1}^{d} (x_{ik} - x_{il})^2 \right]^{1/2}$$
Where J_e[k,l] is the Euclidean distance from the kth object to the lth object, d is
the dimensionality of the pattern space, and x_ik is the ith co-ordinate of the kth
object.
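A minimal C sketch of the Euclidean distance measure and a one nearest neighbour decision is given below; the stored examples and their class labels are hypothetical.

    #include <math.h>
    #include <stdio.h>

    #define DIM 3   /* dimensionality d of the pattern space */

    /* d-dimensional Euclidean distance Je[k,l] between two pattern vectors. */
    static double euclidean(const double a[DIM], const double b[DIM])
    {
        double sum = 0.0;
        for (int i = 0; i < DIM; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sqrt(sum);
    }

    /* Return the class label of the stored example nearest to x. */
    static int nearest_neighbour(const double x[DIM],
                                 const double examples[][DIM],
                                 const int labels[], int n)
    {
        int best = 0;
        double best_dist = euclidean(x, examples[0]);
        for (int k = 1; k < n; k++) {
            double d = euclidean(x, examples[k]);
            if (d < best_dist) {
                best_dist = d;
                best = k;
            }
        }
        return labels[best];
    }

    int main(void)
    {
        /* Hypothetical training examples belonging to class 0 and class 1. */
        double examples[4][DIM] = { {0,0,0}, {0,1,0}, {5,5,4}, {6,5,5} };
        int labels[4] = { 0, 0, 1, 1 };
        double x[DIM] = { 4.5, 5.2, 4.1 };
        printf("x classified as class %d\n",
               nearest_neighbour(x, examples, labels, 4));
        return 0;
    }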
All of the statistical methods can give useful information about the
separability of different classes, and consequently they are a useful analysis
tool to apply before other pattern recognition techniques.
From the mid-1960s, for about a decade, most pattern recognition
research centred on the decision theoretic/statistical approach. Around the
mid-1970s syntactic pattern recognition [27] started to take off, based on
Noam Chomsky's work in the mid-1950s on formal language theory. One
of the aims of linguists working in this field was to use mathematical models
of language to allow computers to interpret natural language, thus allowing
translation between natural languages and general problem solving. Although
these hopes were generally unrealised, the methods proved useful in other
areas, one of them being pattern recognition. Gonzalez gives the following
definition of syntactic pattern recognition [2]:

"Syntactic pattern recognition is the application of formal language and
automata theory to the modelling and description of structural
relationships in pattern classes"
Starting from primitives (primitives are basic sub patterns within the
pattern, such as arcs and lines) a hierarchical description of a pattern can be
built up, in the same way that in language, characters are used to form words
which in turn form sentences. Only certain combinations of primitive are
allowable, to give a language. The generation of this language is governed by
production rules. Production rules together with symbols ( symbols
representing primitives or combinations of primitives) form a grammar.
Originally Chomsky defined four levels of grammar; unrestricted, context
sensitive, context free and regular grammar, listed in order of least restrictive to
most restrictive grammar [2]. Many grammars used in pattern recognition are
of the context free and regular type, partly due to their relative simplicity and
the consequent simplicity of the automata required to parse the input pattern
[2].
It is interesting to note the similarities and differences between the
decision theoretic approach and the syntactic approach. Features used in the
decision theoretic approach are very similar to the primitives used in the
syntactic approach. The difference is that primitives are always sub patterns
where as features may be sub patterns or any numerical measurement ( e.g.
signal peak values). The syntactic approach is most useful when the pattern is
rich in structural information, allowing both classification and description of
the pattern. These capabilities are utilised to the full in scene analysis, the
classic application for syntactic pattern recognition. If the decision theoretic
approach were used for scene analysis the number of pattern classes would
rapidly get out of control, as different classes would have to be assigned for
every combination of scene. The syntactic approach makes it possible to split a
scene into its constituent parts and classify each part separately.
Syntactic pattern recognition is not always the best approach. For instance
in speech recognition phonemes or syllables are often used as the primitives for
a syntactic pattern classifier, but the extraction of these primitives is not simple.
A decision theoretic approach can be used to classify and extract the primitives.
Here we have an example of two approaches being used in harmony to provide
a solution to the problem of speech recognition [3].
Up until the development of attributed grammars, primitives simply
described sub patterns of a larger pattern. If numerical information about the
pattern was required then other techniques such as decision theoretic would
have to be used. Attributed grammars combine the facility of features, to
describe numerical information, with the sub pattern information, contained
within primitives; to yield what is termed a total attribute vector. Every symbol
within the vector is accompanied by numerical attributes, and both the
symbolic and numerical information is incorporated in the production rules of
the grammar; so that it is possible to carry numerical information from stage to
stage. This means that numerical information can be relayed to the highest
level symbols, thus indicating factors such as the size of an object.
Real world patterns are rarely as consistent or free from noise as could be
wished for, and this can cause problems for syntactic approaches because
primitives and production rules of the grammar are so rigid in definition and
implementation. The solution would seem to be greater flexibility within the
approach thus allowing for noise and inconsistency, to arrive at what is
probably the most correct classification for a pattern. This is exactly what has
been achieved by error correcting grammar, stochastic grammar and primitive
embedded parsing. Error correcting grammar attempts to correct errors within
the symbol strings by computing the distance between a corrupted string and
known valid strings; then replacing the string by the closest string, measured
in terms of the number of insertions, deletions and substitutions required to get
from one to the other. Stochastic grammars incorporate probability into the
production rules, with the end result that classes which occur more often are
more likely to be selected if there is an error in the symbol string or the
languages for the different classes are not disjoint. Primitive embedded parsing
delays the assignment of primitive classification within parts of a pattern which
are uncertain, until the parts which have been classified are parsed according to
the rules of the grammar. Once this analysis has been completed, primitive
classifications can be assigned, which fit into the context of the pattern.

A technique which is similar to error correcting grammar, but which does
not necessarily incorporate the syntactic approach, is relaxation systems. In this
method nodes are used to represent the presence or absence of symbols,
features, or any other facet of the pattern; the trick then is to get the nodes to
alter their state so that in total they represent a valid combination. This is
achieved by checking the consistency of each node's labels with that of its
neighbours. In many ways this is similar to the Hopfield neural net and
McClelland and Rumelhart's interactive activation and competition neural net.
An example of a relaxation system is Davis and Rosenfield's hierarchical
relaxation system for waveform parsing [4]. In this system labels are applied to
segments of a waveform, but these labels are only deemed applicable if they
are consistent with neighbouring segments, and also in combination with other
labels form a valid higher level label. The process is iterated, so that as labels
are applied, the validity of neighbouring labels is re-analysed in light of this
new state and other labels discarded. In turn alterations at one level may cause
changes at another level, and the whole process is iterated until eventually there
is a globally consistent set of hierarchical labelling.
Expert systems are intended to mimic the way a human expert would
approach a problem [5]. The way in which the human mind is able to solve
problems is very complex and not fully understood, but at the highest levels it
is possible to determine how a person has gone about the task of solving a
problem. Needless to say it is an iterative process involving breaking a problem
into a hierarchy of sub problems, forming hypotheses, testing hypotheses and
using the results to re-analyse the problem. Expert systems try to mimic this
behaviour. In the language of artificial intelligence; facts are used to support
goals (e.g. a square has four sides. The goal is to identify a square, and four
sides is the fact), rules are used to determine when goals have been achieved
(e.g. IF the shape has four equal sides and the internal angles are all 90° THEN
the shape is a square), and inferences are used to generate new facts and rules
to verify that the goals were met correctly (e.g. IF we assume that the object is
a square THEN the area must equal one side squared).
The neural net approach [6] is one which takes its inspiration from the
human mind at the lowest level by attempting to mimic, in a very crude way,
the behaviour of neurones and their interconnections. All neural networks
require training on examples of the pattern classes which they will be required
to classify, and once trained are capable of recognising patterns which are
similar to the ones they have been trained on (generalisation). Because the
patterns which are presented to a net do not need to be exact, neural nets are
very good at classifying corrupted patterns, and because nets are generally
trained using example patterns no incorrect a priori assumptions are made
about the form of the pattern. This is in sharp contrast to most of the techniques
mentioned previously, which require the identification of features or primitives
within the pattern, and rules, grammars, facts, goals, or inferences to process
the data. Because these assumptions can cause problems when unforeseen
patterns occur, modifications have been made to the techniques, at the expense
of increased complexity, and greater processing power required to implement
them. But neural net techniques are inherently assumption free, and as will be
shown in the next chapter, can be relatively simple and easy to implement.


CHAPTER 3

INDUSTRIAL STRENGTH METAL DETECTION -
THE ART OF CONTAMINANT DETECTION

3.1 Introduction
Graseby Goring Kerr manufacture industrial metal detectors, for detecting
metal contamination within a production environment. Food is the most
common product to be screened for metal contamination, and although the use of
metal detectors within a food production environment is not compulsory, it is
in effect required by the need for manufacturers to
ensure 'due diligence' within the manufacturing process. A typical metal
detector system comprises the metal detector search head, a rectangular box
with a slot through it, mounted on a conveyor belt which carries the product
through the slot in the head. If a metal contaminated product is detected by the
search head it is rejected further down the conveyor belt by a reject system
which by means of an air blast, for example, removes the product from the
normal flow.
3.2 Metal Detector Construction and Theory
The sensor part of the metal detector, or the search head, is a metal box
constructed of aluminium or stainless steel, with a tunnel allowing a conveyor
belt to pass through the box (see figure(3) ). Wrapped around the tunnel are the
wire coils used both to generate an electromagnetic field and to pick up any
disturbances in the field. The enclosing metal box serves both to hold the coils
in position and also to keep the generated field from spreading outside the
confines of the box. Obviously the field can appear outside the box through the
apertures; this means that any metal objects must be kept a certain distance
from the apertures (especially moving metal objects) to prevent disturbance of
the field. To prevent the coils from moving with respect to the box and
disturbing the field, the box is filled with high density foam.


Figure (3) Metal Detector search head
Figure (4) shows the three coils set within the tunnel. A central coil, the
transmitter coil, is driven by an alternating current at a frequency of between 30 kHz and 1
MHz, producing an electromagnetic field within the tunnel. Two equidistant
coils on either side of the transmitter coil act as receivers, having an alternating
current induced in them by the field which surrounds them. By connecting the
coils in opposition the two EMFs generated within the coils oppose each other
and the final output voltage from the receiver coils is, in theory, zero.
Unfortunately the construction of the coils and box would have to be
perfect for the output to be exactly zero; therefore metal and ferrite slugs are
added to trim the final output to a voltage approaching zero.


Figure (4) Transmitter and receiver coil arrangement

With no objects present within the tunnel the output from the coils will
remain zero but if a metal object approaches the aperture this will cause a
disturbance in the field which will affect the field surrounding the receiver
coils, but this disturbance will be different for each coil thereby producing an
out of balance voltage at the output. As the metal object passes through the
detector the out of balance voltage will approach a peak when the object is
closest to the first coil, zero when the object is in the centre of the tunnel and a
second peak when the object is closest to the second receiver coil.
It is not only the amplitude of the signal at the output of receiver coils that
varies. The phase of the output signal varies with respect to the original signal
used to drive the transmitter coil. These two varying parameters, phase and
amplitude, are extracted by analogue signal processing electronics to produce
two signals, one representing the amplitude of the 0° and 180° component, and
the other representing the amplitude of the 90° and 270° component. Most
objects are characterised by a constant phase relationship regardless of the
amplitude of the signal. Figure (5) shows typical signals produced by the
passage of a small metal sphere at a constant conveyor speed through the
search head tunnel. In this case the second peak is negative; a negative
amplitude represents a 180° or 270° component, hence we can see that there
has been a complete phase reversal as the sphere passes through the centre of
the tunnel.

[Figure: resistive and reactive signal amplitudes plotted against time.]

Figure (5) 0° and 90° component signals, caused by the passage of a
small metal sphere through the search head.

3.3 Detecting Foreign Bodies Within a Product
Metal is not the only type of foreign body; depending on the product,
there are numerous other types, for example glass, plastic, bone and
stone to name a few. It turns out that metal is one of the most common foreign
bodies due to the fact that manufacturing equipment is metallic in construction,
and conveniently it is also the cheapest to detect. Foreign bodies other than
metal require expensive X-ray equipment to effect detection, making detection
uneconomic for the degree of contamination screening provided.
Materials other than metal can disturb the field within the search head.
Any material which is electrically conductive will affect the field, and this
includes all products containing water or carbon. A lot of food products contain
water, for example bacon, bread and cheese. Baked foods such as bread will
often contain some carbon as a consequence of the production process,
although an excessive amount can be considered as contamination. Sometimes
products may deliberately contain metal, for example in the case of iron-enriched
Corn flakes. A further source of product effect (bulk effect) comes
from products which are very large with respect to the size of the aperture. In
this case the capacitive effect is being altered due to the change in the dielectric
constant of the material between the search head coils.
So in many cases it is necessary to differentiate between product effect
and contaminant effect. If a product has no bulk effect then the product effect is
relatively easy to remove since the signal derived from the passage of the
product through the search head has a constant phase. Removal is carried out
by using either the 0 or 90 component as a reference signal, and using the
other component as the detection signal and removing a fixed proportion of the
reference signal from the detection signal to derive signal devoid of product
effect. This removal process is carried out by manipulating the discretely
sampled detection and reference signals using a DSP microprocessor.
Removal of product effect becomes a little more difficult if the phase of
the signal varies between product passes. This can happen if the product is of a
variable size, shape or consistency. Still, removal is possible because quite
often changes in product effect take place fairly slowly and all that is required
is a control algorithm to monitor the detection signal after phase removal and
vary the proportion of reference signal removed. Even bulk effect can be
removed using a DSP algorithm to remove a template signal from the detection
signal, with the synchronisation provided by a photo eye on the infeed of the
search head, to signal the approach of a product. Through the use of some
clever signal processing variable bulk effect can be catered for as well.
Once all product effect has been removed, a simple threshold clipping
algorithm is used to determine whether metal contamination is present; it is this
detection process that can be improved through the use of the new techniques
outlined in the following chapters.
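For comparison with the neural network approach developed in the following chapters, the existing detection decision can be sketched in C as a simple amplitude threshold test; the threshold value is assumed to be set during calibration.

    #include <math.h>

    /* Return 1 (reject) if any sample of the cleaned detection signal
       exceeds the amplitude threshold, otherwise 0 (accept). */
    static int threshold_detect(const double signal[], int n, double threshold)
    {
        for (int i = 0; i < n; i++)
            if (fabs(signal[i]) > threshold)
                return 1;
        return 0;
    }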
When product effect is similar to the effect generated by metal, product
effect removal must be very accurate if removal of the signal of interest is to be
avoided. At the same time product effect removal should be flexible enough to
cope with variations in product effect between product passes. The success
with which these opposing constraints can be satisfied largely determines the
sensitivity that can be obtained.
3.4 Reliable Contaminant Detection
Reliability is the key to successful operation in an industrial, mass
production environment, and is affected by both direct and indirect factors.
Direct factors are consistent detection sensitivity and a low false reject
rate. Manufacturers require a specified detection sensitivity to be attained at all
times that the machine is operational; it is of no use if the metal detector is
specified to attain a very high detection sensitivity but can only achieve this
sensitivity under particular conditions or at particular times. In fact most
manufacturers will trade unreliable detection at a higher sensitivity for reliable
detection at lower sensitivity, and this has considerable bearing on the use of
neural networks for detection purposes as the method is probabilistic and there
is no absolute guarantee of detection for any size of metal contaminant.
A low false reject rate is particularly important in a mass production
environment where metal detectors are expected to operate unsupervised.
Although products that are falsely rejected can often be retrieved and passed
through the metal detector a second time this requires additional personnel,
added cost and added risks, so usually a rejected product is discarded and a cost
incurred. Keeping costs to a minimum and maximising output are at the top of
a manufacturer's list of priorities (below safety one would hope). Consequently
a low false reject rate is very important, but without jeopardising the sensitivity
to metal contamination. Using neural networks for detection means there is
uncertainty as to whether a product will be falsely rejected but the same can be
said of deterministic detection methods, because noise is the cause of false
rejects and the nature of noise is often probabilistic. Some experimentation will
be necessary to determine how the performance of a neural network based
detection method compares to the performance of the strictly deterministic
threshold detection method.
Indirect factors affecting reliability are the ease and speed with which the
metal detector can be initially set up and the ease with which adjustments can
be made during normal operation. If it is a long complicated task to set up and
operate the metal detector then mistakes are more likely to be made with
consequent reduction in reliability of operation. This is especially the case if
these operations are carried out by untrained, on-site staff. Unfortunately some
of the methods that are described in the following chapters can require a
considerably lengthy setting up procedure, and this will have to be reviewed to
ascertain whether there is any way of keeping set up and operation procedures
to a minimum.


CHAPTER 4

DESIGNING AN EXPERIMENTAL SYSTEM FOR SIGNAL
CLASSIFICATION

4.1 The Pattern Recognition Problem
Neural networks offer several advantages for pattern recognition over
more traditional methods. Let us consider some of these alternatives and assess
their advantages and disadvantages. Figure (6) shows a block diagram of a
decision theoretic pattern classifier with a pre-processor input. By partitioning
the feature space it is possible to classify each pattern presented at the input.
Of course, the complexity of this partitioning may vary and also the complexity
of the feature extraction. Suitable choice of features requires expert knowledge
if arbitrary selection is to be avoided. Decision theoretic recognition may be
taken one step further to syntactic pattern recognition. Here the symbol
primitives are parsed according to some pre-defined grammar production rules.
In this case even more expert knowledge is required if a good choice of
grammar is to be made.

[Figure: input pattern → pre-processing → pattern space → feature extraction → feature space → classification → classified pattern.]
Fig. (6). Decision theoretic pattern classification.

The problem with all these methods is that they do not cope well with
unforeseen variations in the input pattern, for example variations due to noise.
Neural networks overcome this problem [7]. Although a neural network
requires a broad cross section of patterns or signals from each class that it is
required to classify, it is able to correctly classify signals or patterns which it
has not previously encountered. This ability, generalisation, is a most useful
attribute.
4.2 Pattern Recognition Using Neural Nets
Neural networks have been around for a long time, but recently there has
been a resurgence in interest and a proliferation in variety and application [27-
31].

[Figure: inputs i0, i1, i2 multiplied by weights w0, w1, w2 and summed to give output O.]

Fig. (7). Single neural node.

Fig. (7) illustrates the basic form of a neural node. Each input to the
node is multiplied by a weight. These weights take on values between zero and one
and have the effect of attenuating the inputs to the node. The output is simply a
summation of all the weighted inputs:
$$O = i_0 w_0 + i_1 w_1 + i_2 w_2$$

This is the basic form of the neural node, but there are many variations,
such as bounded outputs, Boolean inputs, Boolean outputs, probabilistic output
functions, etc. Neural nodes can be formed into networks of nodes and these
networks trained to recognise patterns. Once trained on a particular pattern the
neural network may be used to recall or classify patterns.
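A minimal C sketch of this weighted sum node is shown below; the input and weight values are hypothetical.

    #include <stdio.h>

    /* Output of a single neural node: the sum of the weighted inputs. */
    static double node_output(const double in[], const double w[], int n)
    {
        double o = 0.0;
        for (int i = 0; i < n; i++)
            o += in[i] * w[i];
        return o;
    }

    int main(void)
    {
        double in[3] = { 1.0, 0.5, 0.2 };   /* i0, i1, i2 */
        double w[3]  = { 0.8, 0.3, 0.9 };   /* w0, w1, w2 (between zero and one) */
        printf("O = %f\n", node_output(in, w, 3));
        return 0;
    }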
Many different configurations are possible for the interconnection of
neural nodes. When using neural networks to classify signals, some thought
must go into the representation of the signal and the appropriate choice of
neural net configuration. According to Maren [8], the main ways in which
temporal information may be represented by a neural network are these:-

"
Create a spatial representation of temporal data.
Put time delays into the neurons or connections.
Use back connections so that a network can create a temporal signal
sequence.
Use neurons with activations which sum inputs over time.
Use both short term and long term synaptic connections.
Use frequency-encoding for presenting data to nets and for networks
operations.
Use combinations of the above."
Of these seven different approaches, the first is most well known and
easiest to implement.


Fig. (8). Neural network configurations: a) spatial, b) time delay, c) recurrent.


Fig (8) shows neural networks configured for spatial input data, time
delay configuration and a recurrent neural network. Maren's review notes
several researchers who have used spatial representation successfully.
Soucek's review [9] of neural networks applied to real time data, also highlights
the success of using spatial representation.
Spatial representation of the input data is a well tried and easily
implementable method, but it does suffer from certain problems. Windowing
of the data and appropriate choice of window is the main problem. The relative
position of the sampled temporal pattern or signal within the window will
rarely be constant. The more the variation in this shift of the input pattern
within the window the longer it will take to train the neural network. This is
due to the fact that the input signal will have to be presented to the network in
most of its shifted positions if the network is to recall the pattern correctly after
being taught. This problem of shifted input signal and consequently increased
training time for the neural network can be overcome by the use of a pre-
processor to present the neural network with a shift invariant input pattern (fig.
(9) ). Training time can be reduced still further if the pre-processor reduces the
amount of data in the input pattern. In addition to reducing training time, there
will be a reduction in net size and recall time.

[Figure: input pattern → pre-processing → neural network → classification.]

Fig. (9). Optimal system for pattern classification

Neural networks have several desirable attributes as pattern classifiers,
such as generalisation and immunity to noise. Use of spatial representation of
temporal data leads to shifted patterns and increased training times. Training
times can be reduced by the use of a shift invariant pre-processor. Training
times, net size and recall times can be reduced by the use of a data reducing
pre- processor.
The next two sections describe an optimal transform for data compression
and show how the coefficients from this transform can be made shift invariant,
although the development of shift invariance was not followed through to a
final implementation, the reasons for this being stated in the next section.

4.3 Data Compression by Optimal Transformations

[Figure: signal vector (x) → data compression → output vector (y).]

Fig. (10) Data compression
Figure (10) shows a signal vector x which is processed by a general data
compression processor. This data compression processor may take two forms.
Using expert knowledge about the signal an optimum number of features
are used to describe it completely. Features such as slopes and peaks can be
used. If the signal can be characterised by a small number of features (perhaps
just one such as peak amplitude) then the compression factor will be large. If
the signal is more complex then a more complex description is required. If the
signal is complex then a variation of signal may arise from which the correct
features are not extracted by the data compression processor. Also in the case
of 'noisy' or corrupted signals the correct features may not be extracted.
Obviously this is not the ideal pre-processor to a neural network. We want to
apply as few prior assumptions about the signal as possible.


[Figure: signal vector (x) → transform → output vector (y).]

Fig. (11) Data compression using a transform
Figure (11) shows a different method of achieving data compression.
Here a transform is applied to the signal vector X and the coefficients Y
obtained. If the transform is optimal then it will represent the maximum
amount of information about the signal in the fewest number of coefficients.
By discarding the coefficients with the lowest values (that is below some
predetermined threshold value) it is possible to represent the signal in a reduced
number of coefficients. For any given ensemble of signals the optimal
transform for data compression is the Karhunen-Loeve transform (KLT) [10].
It consists of the eigenvectors of the signal covariance matrix. Unfortunately,
the KLT suffers from two problems. Firstly, it is signal specific and requires
the generation of the covariance matrix for the signal ensemble under
consideration. Secondly, there is no known fast representation of the KLT.
Several transforms which are not signal specific and which have fast
representations approach the performance of the KLT for specific signal types.
A new approach which has been suggested [11] is the design of new fast
transforms which approach the performance of the KLT.

[Figure: input pattern → P1 → Ft → P2 → Jn → output pattern, where P1 and P2 are permutations, Ft is a fast transform, and Jn denotes n stages of Jacobi plane rotations.]

Fig. (12). New fast transform for data compression.


Fig. (12) shows a block diagram of the optimal fast transform scheme.
The optimal transform consists of several fast stages and is characterised by the
following matrix equation:
$$\mathbf{Y} = J_n P_2 F_t P_1 \mathbf{X}$$

Where J_n denotes n stages of Jacobi plane rotations, P_2 and P_1 are the
permutation transformations, and F_t is a known fast transform.
Design of the new fast transform proceeds as follows. First, a fast transform
[12] [25] is chosen which is known to be optimal for the signals under
consideration (e.g. fast Fourier transform or fast Walsh transform). This can be
determined from the statistics of the signal or signal ensemble. For signal
vectors of length N, the covariance matrix can be derived from the ensemble
of signals. This covariance matrix will be an N by N square matrix.

$$C_x = \begin{pmatrix} C_{11} & C_{12} & \cdots & C_{1N} \\ C_{21} & C_{22} & & \vdots \\ \vdots & & \ddots & \\ C_{N1} & \cdots & & C_{NN} \end{pmatrix}$$

Once the covariance matrix for the signal ensemble is determined, it can
be used to calculate the permutation sequence and plane rotation sequences
which are required. From the covariance matrix the transformed covariance
matrix may be obtained.
$$C_y = T C_x T'$$
Where C_y is the transformed covariance matrix, T is the transform matrix, C_x
is the original signal covariance matrix and T' denotes the transpose of T.

Every permutation, P_1, is tested by applying P_1 and F_t to the original
covariance matrix:
$$C_y = F_t P_1 C_x P_1' F_t'$$

The optimal permutation P_1 has been found when the off diagonal energy in
C_y has reached a minimum. The off diagonal energy E is determined as,
$$E = \frac{\sum_{i,j,\; i \neq j}^{N} C_y(i,j)}{\sum_{i,j,\; i \neq j}^{N} C_x(i,j)}$$
where C_y(i,j) and C_x(i,j) represent the elements of C_y and C_x respectively.
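A minimal C sketch of the off diagonal energy calculation, following the equation above and assuming the covariance matrices are held as N by N row-major arrays, is given below.

    /* Sum of the off diagonal elements of an N x N matrix stored row-major. */
    static double off_diagonal_sum(const double *c, int n)
    {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (i != j)
                    sum += c[i * n + j];
        return sum;
    }

    /* Off diagonal energy E: the off diagonal sum of the transformed covariance
       matrix Cy normalised by that of the original covariance matrix Cx. */
    static double off_diagonal_energy(const double *cy, const double *cx, int n)
    {
        return off_diagonal_sum(cy, n) / off_diagonal_sum(cx, n);
    }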
One way in which the optimal permutation which gives minimum E can
be determined is to exhaustively try all possible permutations. This is perfectly
feasible for small sample sizes, but the number of possible
permutations rises factorially with sample size. Another
approach is to use the method of simulated annealing [13]. A good way of
describing simulated annealing is to make an analogy with a bouncing ball
moving around in a hilly landscape (Fig. (13) ).

Fig. (13). Simulated annealing analogy.

To start off the ball bounces around vigorously and moves across all the dips and undulations of the
landscape. As time goes by the bouncing becomes less vigorous and the ball
remains within the lower valleys until the bouncing ceases altogether and the
ball comes to rest in the lowest valley in the landscape. Of course, there is
always a danger that if the bouncing decreases too quickly the ball will not
travel across all the valleys and not find the globally lowest valley. Essentially
this is simulated annealing. What we do in the case of the permutations is to
choose a random permutation and calculate the off diagonal energy in the
transformed covariance matrix for this permutation. The new permutation is
accepted with probability p.
$$p = \exp\left(\frac{E_1 - E_2}{kT}\right)$$
E_2 is the off diagonal energy in the transformed covariance matrix. E_1 is the
off diagonal energy in the original covariance matrix due to the last accepted
permutation. k is Boltzmann's constant and T is the temperature. If E_2 is less
than E_1, then p is greater than unity. In this case the permutation is accepted
with a probability of one, that is, a downhill move is always accepted. If E_2 is
greater than E_1 then the probability that a permutation will be accepted is
between zero and one. So sometimes a permutation which causes an uphill
move is accepted. When T is high, a lot of uphill moves are accepted, but as T
is gradually reduced fewer and fewer uphill moves are accepted and the system
gradually relaxes into the globally minimum energy state.
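A minimal C sketch of the acceptance test at the heart of this procedure is given below; the random number handling and the way the temperature is reduced are illustrative assumptions.

    #include <math.h>
    #include <stdlib.h>

    /* Decide whether to accept a candidate permutation with off diagonal
       energy e2, given the last accepted energy e1 and the current temperature t.
       Downhill moves (e2 < e1) are always accepted; uphill moves are accepted
       with probability exp((e1 - e2) / (k * t)). */
    static int accept_move(double e1, double e2, double k, double t)
    {
        double p = exp((e1 - e2) / (k * t));
        if (p >= 1.0)
            return 1;
        return ((double)rand() / RAND_MAX) < p;
    }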


Fig. (14). Signal flow diagram for a single Jacobi plane rotation.
The other major transform technique used in the optimal fast transform
scheme is the Jacobi plane rotation. The signal flow graph of a single Jacobi
rotation is shown in Fig (14). Each Jacobi rotation links two data points.
$$y_0 = x_0 \cos\theta + x_1 \sin\theta$$
$$y_1 = x_1 \cos\theta - x_0 \sin\theta$$

The decision as to which points to choose and by what angle to rotate them can
be determined from the covariance matrix. Since the Jacobi rotations are
applied after the fast transform and P_1, we use the transformed covariance
matrix derived from the fast transform and previously determined optimal
permutation sequence. The N/2 largest off diagonal elements from the
covariance matrix are chosen each with different row and column numbers.
The row column index of the elements then corresponds to the sample indices
in the signal. The rotation angle for the two data points is
$$\cot 2\theta(i,j) = \frac{C_{ii} - C_{jj}}{2 C_{ij}}$$
where C_ij is the off diagonal element of C in the ith row and jth column. N/2
rotations make up a single stage of Jacobi rotations. Any number of successive
plane rotations may be added.
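A minimal C sketch of a single Jacobi plane rotation is given below; the sample indices i and j and the covariance elements used to set the angle are assumed to have been chosen as described above.

    #include <math.h>

    /* Rotate the pair of samples x[i], x[j] in place by angle theta. */
    static void jacobi_rotate(double x[], int i, int j, double theta)
    {
        double c = cos(theta), s = sin(theta);
        double xi = x[i], xj = x[j];
        x[i] = xi * c + xj * s;
        x[j] = xj * c - xi * s;
    }

    /* Rotation angle from the covariance elements:
       cot(2*theta) = (cii - cjj) / (2*cij). */
    static double rotation_angle(double cii, double cjj, double cij)
    {
        return 0.5 * atan2(2.0 * cij, cii - cjj);
    }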
The permutation sequence P2 is calculated in a similar way to that for P1,
but the signal covariance matrix is replaced by the transformed covariance
matrix from the previous stage and the fast transform is replaced by the Jacobi
plane rotations.
All the above methods (Jacobi rotations, permutations, fast transform)
have fast implementations, unlike the KLT which has no fast implementation.
But the new fast transform which has been designed still approaches the
performance of the KLT in terms of data compression.
4.4 Time Shift Invariant Pattern Recognition
According to Austin, there are four main methods of achieving shift
invariance along with other invariances such as scale and rotation invariance
[14]. These are:
(i) Training pattern is presented and learnt in all (or most) of its shifted
positions.

(ii) On recall the input pattern or the template is shifted until a match
with a previously learnt pattern is found.
(iii) The features of the pattern are stored and a match obtained with
these features.
(iv) The training pattern is stored in a transformation invariant form.
Method (i) is the simplest method. In the pattern recognition system
described earlier, one of the reasons for adopting a pre-processor was to avoid
using this approach due to the excessive training times and large neural
networks which are required to support its implementation.
Method (ii) requires a large set of transformation matrices which are
applied to the input pattern to produce a large set of input patterns to be
matched against the stored template. Recall times become very long as each
shifted pattern is tested for a match. More positively, the method is relatively
simple, and also it may be possible that the human visual system works in a
similar fashion, especially if one considers the processes of mental rotation
which take place in order for a person to recognise inverted objects.
Method (iii) relies on an analysis of the pattern to derive shift invariant
features or primitives. Here a stored model is used to describe the pattern.
This model describes the pattern in terms of features and the relations between
these features. The problems with this method are long recall times, and the
development of complex algorithms, with the requirement of expert knowledge
to define the features or primitives.
Method (iv) involves the use of transforms such as the Fourier transform
to transform the input pattern to a shift invariant form. Transform techniques
give sub optimal results if used to recognise an object within a scene containing
many objects, but this is not so much of a problem when using signals rather
than images. Using transform techniques can also lead to a reduction in
information content of the transformed pattern when compared to the original
input pattern. For example, if the Fourier power spectrum of a signal is used as
a shift invariant representation of the input signal, then the phase component of
the signal is lost.
Effectively one is losing position information, and this may or may not be
advantageous, for example consider the case of a hand written document.
Whether a character is subscripted can be quite significant, i.e. the position of
the character is significant. If one was looking for rotational invariance then
there would effectively be no difference between a multiplication sign and an
addition sign, therefore a machine which recognised hand written text would
not be very useful if it exhibited shift and rotation invariance.
Another problem is the phenomenon known as leakage. This is caused by
the windowing of data, but can be reduced by the use of very large windows,
window smoothing functions [15] (such as the Hamming Window) or the
method of spectral estimation [16].
Despite these problems, transform techniques have one main advantage.
The prime advantage of transform techniques compared to the other three
methods is speed of processing. This is especially the case when one considers
fast implementations of transform algorithms, such as the fast Fourier
transform and the fast Walsh transform.
Transform techniques provide the best solution for a shift invariant pre-
processor used in conjunction with a neural net classifier to classify the signals
described in the introduction.

[Figure: the eight points y0-y7 are each squared (Sq) and combined to form the four power coefficients P0-P3.]

Fig. (15). Signal flow diagram of Walsh-Hadamard shift invariant power
spectrum for an 8 point input pattern.
Derivation of shift invariant coefficients from the Fourier transform is
relatively straightforward, each coefficient is simply multiplied by its complex
conjugate to yield a set of shift invariant coefficients. This is the power
spectrum of the original signal, where each coefficient represents a single
discrete frequency. In the case of the Walsh shift invariant power spectrum the
derivation is slightly more complicated and the power coefficients represent
several discrete sequencies [12]. Fig. (15) shows a signal flow diagram for the
Walsh-Hadamard shift invariant power spectrum.
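In the Fourier case this derivation can be sketched in C as follows, assuming the real and imaginary coefficient arrays have been produced by an FFT routine elsewhere.

    /* Shift invariant power spectrum: each Fourier coefficient multiplied by
       its complex conjugate, i.e. p[k] = re[k]^2 + im[k]^2. */
    static void power_spectrum(const double re[], const double im[],
                               double p[], int n)
    {
        for (int k = 0; k < n; k++)
            p[k] = re[k] * re[k] + im[k] * im[k];
    }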
The development of a shift invariant power spectrum for the Karhunen-
Loeve transform is not so easy. There can be no constant definition or
algorithm to describe the derivation of the power spectrum as the KLT is signal
dependent and has no constant form. Since the KLT is signal dependent so is
the shift invariant power spectrum. A similar problem exists with the optimal
fast transforms. These new transforms require new algorithms for the
derivation of shift invariant power spectra. The next section describes a
possible method for determining these new definitions.

4.4.1 Using a Hopfield Net to obtain Shift Invariant Coefficients

Before describing the solution of the shift invariance problem a little
background on the Hopfield net may be useful [17]. Hopfield uses the term
energy to measure the extent to which a network is satisfying the constraints,
the energy of the system decreasing towards an energy minimum, but here the
term goodness is utilised, as defined by Rumelhart and McClelland [7]. When
using goodness to define the state of the system, the total goodness of the
system increases until a goodness maximum is reached.

[Figure: nodes A and B with external inputs Ext(A) and Ext(B), connected by symmetric weights W(AB) = W(BA).]

Figure (16) Two node Hopfield net.
Consider two neural nodes, as shown in figure (16), both connected to
each other and with one external input each. Note that the weights are
symmetrical, this is an important feature of Hopfield nets. Weights are set
before the net is run and are not altered during the running phase, only the
activations are altered. The magnitude and sign of the weights define the
activations required at the nodes to satisfy the constraint. If the weight between
two nodes is positive then the greater the activation of each node the better they
satisfy the constraint. If on the other hand, the weight between the two nodes is
negative then the less the activation of each node, the better they satisfy the
constraint. The priority of the constraint is determined by the magnitude of the
weight. The degree to which node 'A' satisfies the constraints and contributes to
the overall goodness is defined by the following equation.
goodness_A = a_A (w_AB a_B + ext_A)

(a_B denotes the activation of node B.) In general, for any fully connected
network of nodes,

net_i = Σ_j w_ij a_j + ext_i
goodness_i = net_i a_i

When the network is run, the activations of the individual nodes are
updated asynchronously with the aim of increasing the goodness for each node.
This is done by increasing the activation of a node if the net input is positive,
and decreasing the activation if the net input is negative, using the following
equation to determine the magnitude of the change:
a_i(t+1) = a_i(t) + net_i (1 − a_i(t))
Note that this update equation automatically prevents the activations of the
units from increasing indefinitely by constraining them between zero and one.
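A minimal C sketch of one asynchronous update sweep is given below. The flat weight array and the explicit clamping of the activations are assumptions made for the example; the two-sided form of the update (one case for a positive and one for a negative net input) follows the constraint satisfaction model of Rumelhart and McClelland rather than being taken from the thesis software.

/* One asynchronous sweep over all nodes of a Hopfield / constraint
   satisfaction net.  a[i] holds activations in [0,1], w[i*n+j] the symmetric
   weights and ext[i] the external inputs. */
void goodness_sweep(double a[], const double w[], const double ext[], int n)
{
    int i, j;

    for (i = 0; i < n; i++) {
        double net = ext[i];
        for (j = 0; j < n; j++)
            net += w[i * n + j] * a[j];     /* net input to node i        */
        if (net > 0.0)
            a[i] += net * (1.0 - a[i]);     /* push activation towards 1  */
        else
            a[i] += net * a[i];             /* push activation towards 0  */
        if (a[i] > 1.0) a[i] = 1.0;         /* keep activation in range   */
        if (a[i] < 0.0) a[i] = 0.0;
    }
}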
Now let us see how this net can be used to solve the problem of shift
invariance. If coefficients generated by the new transform are squared and
added together in a particular combination then some degree of immunity from
shifts in the input pattern should be possible; the exact combination being
determined through the use of a Hopfield neural net. There is no mathematical
proof for the feasibility of this approach. Indeed it was intuition that dictated
the line of research that was followed, based upon the prevalence of shift
invariant power spectra, within the literature, which use squaring and adding of
coefficients. Admittedly the non-sinusoidal and non-cosinusoidal transforms
have shift invariant power spectra generated by the use of additional arithmetic
manipulations but the new transform which is being developed is based on
sinusoids and cosinusoids. In the end, this line of research was not followed up
due to the difficulty in analysing the results mathematically.
Start with an input signal which contains all the frequencies that are
representable within that input (for example, an N point input can represent N/2
different frequencies). Transform this input signal to obtain the coefficients,
circularly shift the input signal and obtain the new coefficients, and repeat this for
every possible shifted position, storing the results at each stage. Now that we
have coefficients for all shifted positions it is possible to calculate the variance
of the power p for any given pair of coefficients, over a complete circular shift.
The power is calculated using the following simple equation.
p_t = y_k² + y_l²

where p_t is the power at shift t and y_k and y_l are the kth and lth coefficients
respectively. By fixing k and l and taking the value of p for all shifts we can
obtain the variance of p, for a given value of k and l,

σ_p² = p_0² + p_1² + p_2² + ... + p_N²


The calculation of the variance is repeated for all pairs. In total there are n_p pairs,

n_p = N (N + 1) / 2

where N is the number of transform coefficients.
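The variance calculation can be sketched in C as below; the two dimensional array holding the coefficients for every circular shift is an assumed layout, and a conventional mean-subtracted variance is computed.

/* y[t][k] is assumed to hold the k-th transform coefficient of the input
   after a circular shift of t samples (t = 0..T-1).  The function returns
   the variance, over all shifts, of p_t = y_k^2 + y_l^2 for one pair (k,l). */
double pair_variance(double **y, int T, int k, int l)
{
    int t;
    double p, sum = 0.0, sumsq = 0.0;

    for (t = 0; t < T; t++) {
        p = y[t][k] * y[t][k] + y[t][l] * y[t][l];
        sum += p;
        sumsq += p * p;
    }
    return sumsq / T - (sum / T) * (sum / T);   /* E[p^2] - (E[p])^2 */
}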
We now have results which indicate the pairs that have lowest variance
for all shifted positions, but only certain combinations of these pairs are valid.
Picking the combination of pairs which gives the lowest total variance, yet is
still a valid combination, is a task which is ideally suited to a Hopfield neural
net [17], or the Constraint Satisfaction model as it is called by McClelland and
Rumelhart [7]. Given conflicting constraints this type of network is able to
determine the maximum goodness of fit to these constraints. Hopfield has used
an analogue implementation of this net to solve the Travelling Salesman
Problem and it turns out that the configuration of network required to solve the
shift invariance problem is very similar to the network configuration used by
Hopfield.


[Figure: a 4 by 4 grid of nodes; the rows are coefficients 1-4 and the columns A-D are grouped into Pair 1 and Pair 2.]
Figure (17) Hopfield net configuration, for a four coefficient, shift
invariance optimisation.
The net configuration shown in figure (17), is capable of solving the shift
invariance optimisation problem, SIOP, for four coefficients and as can be seen
this requires a total of sixteen nodes.
With this type of network the weights for the connections between nodes
are defined by the constraints which must be satisfied. In this case we are
looking for pairs of coefficients which when combined will yield the lowest
overall variance, so what we do is set the weights between nodes to reflect the
amount of variance contributed by each pair combination individually, then
allow the system to cycle to find the globally best solution. Taking node 1A in
figure (17) as an example, negative connections are made between 1A and 2B,
1A and 3B, 1A and 4B, to represent the degree of variance of that particular
pair combination. The greater the variance the more negative the weight
becomes. Of course, coefficient 1 cannot be combined with coefficient 1 and so
there is a negative connection between 1A and 1B, 1A and 1C, 1A and 1D.
Since there can only be one coefficient in column A, there are negative
connections between 1A and 2A, 1A and 3A, 1A and 4A. All of these weights
are negative and on their own would cause the activations of all the nodes to
become zero. To counter this all the nodes have a positive external input which
is used as a bias.
One of the problems with the Hopfield net is deciding on the relative
magnitudes of the weights and biases. If these relative magnitudes are wrong
then the system will not settle into a valid state, but there seems to be no
method for deciding in this case what the relative magnitudes should be. In fact
Hopfield encountered the same problem when trying to solve the Travelling
Salesman Problem although he professes in his paper that the parameters were
easily found.
"In our simulations, an appropriate general size of the parameters was
easily found, and an anecdotal exploration of parameters was used to find a
good (but not optimised) operating point."
Disadvantages of this technique are the large number of calculations
required to generate the variances for all the pair combinations: since the
number of pair combinations is of the order of N², where N is the number of
coefficients, the number of calculations rises rapidly with signal length. The net
size can also get rather large, as the number of nodes required in the net is N²,
and this leads to larger memory requirements and a greater computational load in
the simulation. Finally, as stated before, it was felt that the results from this
technique needed to be analysed mathematically and this was not possible due
to lack of mathematical skills.
4.5 The WISARD Neural Net

4.5.1 An Introduction to the WISARD Network
A WISARD neural network [18] is a simple RAM based system which
can be used to classify input patterns that have been corrupted by some non-
deterministic process. This ability relies on the fact that the network has been
previously trained on a training set, representative of the different classes of
pattern that must be recognised. One of the most useful properties of the
WISARD net, in common with other neural network paradigms, is its ability to
correctly classify input patterns which it has not previously seen in the training
set; this capability is called generalisation.
[Figure: three 3-input RAMs (RAM1-RAM3) whose address inputs are taken from a 9 pixel binary image, with data inputs R1-R3 and a common Learn/Recall (read/write) control.]
Figure (18) WISARD net in training configuration.
To understand how a WISARD net is trained to recognise different
patterns, let us apply a WISARD network to a simple nine pixel binary image
(see figure (18) ). The image is split into Ntuples of size 3, and the bits within
the individual Ntuples are used to address locations within a random access
memory (RAM). A black pixel is interpreted as a zero and a white pixel is
interpreted as a one. Although the Ntuples are mapped onto the rows of the
image this is not the only possible mapping and as we will see later the optimal
mapping turns out to be the one where the Ntuple inputs are taken at random
from the input pattern.
Before training may take place all locations within all the RAMs are set
to zero. During the learning phase the RAMs are in the 'write' mode and the
data inputs (R1,R2,R3) are held high. Not shown in figure (18) but also
present in a real system are RAM enable inputs, which would be connected to a
clock thereby co-ordinating training. Ones are stored in the RAMs at the
locations addressed by the binary pattern, hence after several patterns have
been presented at the inputs to the RAMs, they will contain ones at several
locations.
[Figure: the three RAMs of figure (18) in read mode, their outputs summed and passed to a thresholder which produces the classification output.]
Figure (19) WISARD net in recall configuration.
If during the recall phase (see figure (19) ) a pattern is presented within
the Ntuple which was presented during training, that pattern will address a
previously stored '1' which will be output to the summing unit. The summing
unit takes the outputs from all the RAMs (three in this case) and adds them
together. The output from the summing unit is fed to a thresholder,
which compares its input to a pre-determined threshold value. If the input is
greater than or equal to the threshold value then the thresholder outputs a '1',
indicating a positive identification.
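The nine pixel example can be sketched in C as below. One byte per RAM location is used here purely for clarity; the packed storage scheme actually adopted is described in chapter 5.

/* WISARD discriminator for a 9 pixel binary image split into three 3-bit
   Ntuples, one RAM of 8 locations per Ntuple. */
#define NTUPLES 3

static unsigned char ram[NTUPLES][8];

static int tuple_address(const int pixels[9], int i)
{
    /* The three pixels of Ntuple i form the RAM address, one bit each. */
    return (pixels[3 * i] << 2) | (pixels[3 * i + 1] << 1) | pixels[3 * i + 2];
}

void train(const int pixels[9])
{
    int i;
    for (i = 0; i < NTUPLES; i++)
        ram[i][tuple_address(pixels, i)] = 1;      /* store a '1'          */
}

int recall(const int pixels[9])
{
    int i, score = 0;
    for (i = 0; i < NTUPLES; i++)
        score += ram[i][tuple_address(pixels, i)]; /* sum the RAM outputs  */
    return score;                                  /* compare to threshold */
}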
Consider a WISARD net trained to recognise vertical bars. The net is first
trained on the three vertical bars shown in figure (21). If one of these complete
vertical bars is presented during the recall phase then all three RAMs would
output a '1', giving a maximum score of three. Now suppose the net is presented
with the incomplete vertical bar shown in figure (20). This would score two out
of three, and a horizontal bar would score zero. What these results indicate is
that the net, when presented with an incomplete vertical bar, a pattern which it
has not been trained on, gives a score which indicates a lack of total certainty
about this pattern belonging to the class of vertical bars, but instead indicates a
large degree of confidence in the fact that it belongs to the class of vertical bars.
As for the horizontal bar, the net gives a score which indicates absolute certainty
that this is not a member of the class of vertical bars. There is no absolute
indication as to what the pattern could be. An absolute result can be obtained
from the system shown in figure (19) by setting the threshold to two. Now the
thresholder will output a '1' when presented with this incomplete vertical bar as
the input pattern.

Figure (21) Three vertical bars.
Figure (20) Incomplete vertical bar.
This completes the description of the basic WISARD neural network; the
rest of this section goes on to describe ways in which the performance of the
WISARD net can be improved and tailored to suit individual applications, in
particular the application to the classification of signals from a metal detector.
4.5.2 Multi-Discriminator System
The WISARD type network described in the previous section can be
considered as a discriminator, in the sense that it is able to discriminate
between two classes of object. This is fine if one only has two classes of
object, but if there are more than two classes then more than one
discriminator is required: in total, one discriminator is necessary for each
class that is to be recognised. A system capable of recognising three
different classes is shown in figure (22).

[Figure: three discriminators connected in parallel to the input pattern, their scores feeding an arbitrator which produces the classification and confidence outputs.]
Figure (22) A Multi-Discriminator WISARD type network
As can be seen from the diagram each discriminator is mapped on to the
input pattern, so that every pixel within the image is connected to each of the
three discriminators. These discriminators consist of RAM nodes and a
summing unit, as described in the previous section, but they do not use a
thresholding device on their outputs; instead the output processing is
undertaken by the arbitrator unit. Each discriminator outputs a score dependent
upon the closeness of the input pattern to a previously taught pattern, and these
are used as inputs by the arbitrator, which looks for the discriminator with the
largest score and outputs the number of that discriminator at the classification
output. This gives an indication as to the most likely class that the input pattern
belongs to, but gives no indication as to the closeness of the result. How close
the score R_max of the winning discriminator was to the discriminator with the
next highest score is given by the confidence output.
C = D / R_max

where D is the difference between the two scores and C is the confidence.
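The arbitration step might be sketched in C as follows; the score array and its length are illustrative assumptions.

/* Return the index of the discriminator with the largest score and compute
   the confidence C = D / Rmax, D being the gap to the second best score. */
int arbitrate(const int score[], int n_classes, double *confidence)
{
    int i, best = 0, second = 1;

    if (score[1] > score[0]) { best = 1; second = 0; }
    for (i = 2; i < n_classes; i++) {
        if (score[i] > score[best])        { second = best; best = i; }
        else if (score[i] > score[second]) { second = i; }
    }
    *confidence = score[best] > 0
                ? (double)(score[best] - score[second]) / score[best]
                : 0.0;
    return best;
}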
4.5.3 Analysing Potential Performance
It is possible to analyse the potential performance of a WISARD type
network when applied to a particular application [6], thus making it possible to
ascertain whether this type of network will be able to deliver the required
results. Analysis is based on the overlaps between patterns; the greater the
overlap between patterns from separate classes, the more difficult the
classification becomes, until at the point where the overlap becomes complete,
where classification becomes impossible. On the other hand, if one is
considering overlap between patterns within the same class then the higher the
overlap the better. Figure (23) and figure (24) illustrate two cases of low and
high overlap.

[Figure: (a) pattern A1, (b) pattern B1, (c) the small overlap region A1 U B1 = U1 within the input space.]
Figure (23) Slightly overlapping patterns

[Figure: (a) pattern A2, (b) pattern B2, (c) the large overlap region A2 U B2 = U2 within the input space.]
Figure (24) Highly overlapping patterns
Consider a single RAM discriminator connected to the input space in
figure (23c). This RAM discriminator has previously been trained on pattern
A1 and now, in the recall mode, is presented with pattern B1. If each
RAM node within the discriminator is randomly connected to this input space
then the probability that any single input to the RAM is connected to the area
of A1 union B1 (the cross hatched area) is equal to the area of union divided
by the total area of the input space. If we make the area of the input space equal
to unity then this probability of connection is simply equal to U1, the area of
overlap.
So, let's say we have a WISARD net consisting of three separate RAMs
and each RAM has three inputs. This net is trained on a single pattern and a
single pattern is used to assess recall performance. Now, if the overlap
between training and recall patterns is U1 then the probability that a single
input of one of these RAMs is connected to U1 is exactly U1 (assuming a total
input space of unity again). For a single RAM to fire, all its inputs must be
connected to U1, and the probability of this is (U1)³. The absolute score for the net
is equal to the probability of a single RAM firing multiplied by the total
number of RAMs, i.e. 3(U1)³. For any net with K RAMs and N inputs per
RAM, the absolute response is

r = K (U1)^N
From this result it is possible to see the effect of varying the Ntuple size
N. The larger the value of N, the lower the absolute response r (remember U1 is a
probability of less than unity). Although the absolute response is lowered by
increasing the Ntuple size, if we compare the response of the net to two recall
patterns with overlaps U1 and U2 respectively, then the relative difference in
absolute response for the two overlaps will increase with respect to N. This is
how it is possible to increase the discrimination of a net by increasing the
value of N. Given two typical patterns from two separate classes of pattern, it is
possible to assess the value of N required to give correct classification. One
needs to be very careful when using this technique as it is not always possible
to find a typical pattern, but in most cases it will at least enable one to assess
the feasibility of using a neural network for classification of the patterns.
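The effect of the Ntuple size can be checked with a few lines of C; the two overlap values below are invented purely to illustrate how the ratio of the responses grows with N.

#include <math.h>
#include <stdio.h>

int main(void)
{
    double U1 = 0.9, U2 = 0.7;   /* assumed within- and between-class overlaps */
    int K = 256, N;

    for (N = 1; N <= 8; N++) {
        double r1 = K * pow(U1, N);          /* absolute response r = K U^N */
        double r2 = K * pow(U2, N);
        printf("N=%d  r1=%6.1f  r2=%6.1f  r1/r2=%.2f\n", N, r1, r2, r1 / r2);
    }
    return 0;
}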
4.5.4 Advantages and Disadvantages of the WISARD net
When compared to other types of neural network the WISARD net offers
several advantages. When comparing the different types of net, it is assumed
that the networks are being simulated on a digital computer. Although some of
the weight based paradigms can be implemented using analogue circuitry,
offering very real speed advantages, this can severely restrict the size and
flexibility of the network.
Indeed, most other neural net paradigms use analogue weights to bias the
inputs to the net, and these weights must be calculated during the learning
phase. Thus the learning time and processing power required are quite
considerable, compared to the WISARD net, which stores data to Random
Access Memory during the learn phase. Compared to the back propagation
learning method for multi layer nets, which requires several iterations of weight
modification, the WISARD learning method is simple and quick.
Recall is quicker using a WISARD net, as compared to weight based nets
where multiplication of the input pattern by weights requires greater
computational effort than the simple RAM read which is required using a
WISARD network, although some of the speed advantages are lost when using
a simulation, due to the inherent serial processing nature of a digital computer.
Nevertheless, advantages are to be had due to the fact that integer arithmetic
can be used, and a lot less arithmetical manipulation is required, compared to
the numerous floating point arithmetic manipulations that must be employed
when using weight based nets.
A WISARD network has very similar properties to the Single Layer
Perceptron [6]. Like the Perceptron it has a certain amount
of immunity to noise in the input pattern and provides the property of
generalisation, but also like the Perceptron it is unable to solve so called hard
learning problems such as parity. Depending on the complexity of the
difference between different classes of patterns this may or may not be a
problem.

4.5.5 Using the WISARD network with signals
The WISARD net was first used with binary images, to which it is ideally
suited, due to the fact that the binary pixel elements correspond exactly to ones
and zeroes, which is the format of data stored within RAMs. By taking the
binary value of each sample of a discretely sampled analogue signal, we have
the data in a format suitable for using with a WISARD type net. If one
considers a temporal sequence of k points taken from a continuous stream of
data, with each sample from the sequence encoded in an N bit binary format
then this is to all intents and purposes equivalent to a k by N binary image.
Points or pseudo-pixels from this bit-time binary image can be randomly
mapped to the inputs of the RAMs which form the discriminators of the
WISARD net.
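A possible C sketch of this re-coding is shown below; the 8 bit sample width and the bit ordering are assumptions made for the example.

/* Unpack k samples, each held in 8 bits, into a k x 8 array of pseudo-pixels
   (0 or 1) ready for random mapping onto the RAM inputs of a WISARD net. */
void signal_to_bit_image(const unsigned char sample[], int k, int image[][8])
{
    int i, b;

    for (i = 0; i < k; i++)
        for (b = 0; b < 8; b++)
            image[i][b] = (sample[i] >> b) & 1;   /* bit b of sample i */
}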
4.6 Summary
Many classification techniques require expert knowledge of the pattern,
and may also require assumptions to be made about the pattern, which at a later
date may prove to be untrue. Neural net based classification has been chosen
because this technique makes no prior assumptions about the form of the
pattern and consequently copes better with unforeseen variations in the input
pattern. This is further enhanced by the capability of neural nets to recognise
previously unencountered patterns, and to generalise away from the strict
pattern set that has been presented to them during training.
The choice of the type of neural net to use is to some extent governed by
the form of the data used as input to the net. Output from an optimal transform
is chosen as input to the net. This gives a spatial representation of the signal,
which is desirable because it makes implementation of the neural net easier,
and it is also the most common approach, with a lot of examples available, and
a wealth of accumulated experience. Optimal transformations generate a small
number of optimal coefficients, ensuring that the neural net can be of reduced
size, which in turn leads to shorter training and recall times and lower memory
requirements. The spatial data set of reduced size is used as the input to a
WISARD net. WISARD nets can be well suited for the classification of
spatially represented data, offering the advantages over weight based nets of
ease of implementation, and lower training and recall times.


CHAPTER 5

A SIGNAL CLASSIFICATION SYSTEM OPERATING ON STORED
DATA

5.1 Introduction
In this chapter a description is given of an implementation of a signal
classifier operating on stored data. Techniques discussed in the previous
chapter are implemented as C code programmes, with a pseudo code based
description of the algorithms. Some of the implementation differs from the
format discussed in the previous chapter; where this is the case, a full
explanation is given of the reasons, together with a description of the way in
which the techniques have been altered. Finally, the results from tests carried
out on stored metal detector signals are displayed, documented, and analysed.
5.2 Final Format of Optimal Transform
A new fast transform was reviewed in chapter 4. This new fast transform
consisted of a permutation, a known fast transform, another permutation and
finally a series of N Jacobi plane rotations [11]. The final system uses a known
fast transform and the Jacobi plane rotations leaving out entirely the
permutation phase of the optimisation due to the computation time required to
calculate the optimal permutation. Performance of the new transform is
calculated using the off diagonal energy in the correlation matrix as a measure
of the transform's ability to transform the signal into the smallest number of
uncorrelated coefficients. This performance measure is used to choose the best
standard fast transform, as well as the optimal configuration for the
Jacobi plane rotations.


5.2.1 Using a Correlation Matrix instead of a Covariance Matrix
If the 128 point signal vectors collected from the metal detector are
arranged as the rows of a matrix x, then we can obtain the correlation matrix
for these signals,
Cor_x = x^T x,

where Cor_x denotes the 128 by 128 symmetric correlation matrix and T
denotes the transpose operation. What this matrix effectively gives us is the
correlation between every vector pair. The elements on the leading diagonal are
the correlation between a vector and itself or in other words the leading
diagonal elements are a measure of the autocorrelation of a vector. Of course
the autocorrelation of an unshifted vector is always non zero and positive for a
non zero vector; to get some idea of the relative magnitude of the
autocorrelation with respect to the correlation with other vectors it is necessary
to normalise the matrix by dividing every element by the trace,

t_x = Σ_{i=1}^{N} x_ii

If the diagonal elements are large and the off diagonal elements are small then
the vectors are highly uncorrelated with each other. On the other hand if the
diagonal elements are small and the off diagonal elements are large then the
vectors are highly correlated.
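The correlation matrix and the off diagonal energy measure used as the performance criterion in this chapter can be sketched in C as follows; the row-major storage and the use of the sum of squared, trace-normalised off diagonal elements as the energy measure are assumptions made for the example rather than the layout of the thesis programs.

/* Form Cor = x^T x for M signals of length N (x stored row by row in a flat
   array) and return the off diagonal energy of the trace-normalised matrix. */
double off_diagonal_energy(const double x[], int M, int N, double cor[])
{
    int i, j, k;
    double trace = 0.0, energy = 0.0;

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            double s = 0.0;
            for (k = 0; k < M; k++)
                s += x[k * N + i] * x[k * N + j];
            cor[i * N + j] = s;
        }
    for (i = 0; i < N; i++)
        trace += cor[i * N + i];
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            if (i != j) {
                double e = cor[i * N + j] / trace;
                energy += e * e;
            }
    return energy;
}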
The covariance matrix is generated from the correlation matrix by the
subtraction of the mean from the correlation matrix [19] [26],
Cov_x = Cor_x − m_x^T m_x

where

m_x(j) = (1/M) Σ_{i=1}^{M} x_ij ,

m_x is the row vector denoting the mean of matrix x, and M is the ensemble
size.
When signals within an ensemble are relatively similar then by using the
covariance matrix it is possible to generate an optimal transform which is
optimised for the small differences between these signals, compared to the use
of correlation matrix where the actual constant form of the signal as well as the
variations in the signal must be encoded within the transform coefficients. But
there is a penalty to be paid for this increased efficiency. Before the transform
can be applied to a new signal the previously calculated mean vector must be
subtracted from the new signal vector,
y = A (x − m_x)
where y is the vector of optimal coefficients and x is the signal vector to be
transformed. In a real implementation this imposes overheads when using the
transform; extra memory required to store the mean vector, and the extra stage
of arithmetic manipulation required for subtraction of the mean vector.
If the transform is to be optimised for signals that vary quite significantly,
then optimising for the covariance matrix rather than the correlation matrix is
not so advantageous, because now both approaches will result in a transform
which is optimised to represent the form of the signal and the variation between
signals. In this particular application the transform must be applied to the class
of signal where no metal contaminant is present and also the class of signal
where metal contamination is present. These two classes of signal vary
significantly in their form, so there is very little disadvantage in using a
transform optimised for the covariance matrix. Now the transform can be
applied directly to the signal vector,
y = A x
saving on the overheads mentioned earlier.

5.2.2 Choosing a Standard Transform for Data Compression
Four standard transforms were tested to find the best transform to use
with signals from the metal detector. All of these transforms were real
transforms for use with real signals, as is the case with the signals derived from
the metal detector system. Three of the transforms are Sine/Cosine based (the
Sine Transform, the Cosine Transform and the Real Discrete Fourier
Transform) and the other transform uses binary level basis vectors (the Walsh
Transform [12]). All of the transforms are related to the Discrete Fourier
Transform in some way and consequently to the complex Fourier Series. This
makes the complex Fourier series [20] a good place to start when trying to
describe these transforms. The complex Fourier series is applied to complex
signals; the following equations define the complex Fourier Series and its
coefficients for a signal with period 2π:
F(z) = Σ_n c_n ( Cos(nz) + j Sin(nz) )

c_n = (1/2π) ∫_{−π}^{π} F(z) ( Cos(nz) − j Sin(nz) ) dz    (n = 0, 1, 2, ...)

From this is derived the Discrete Fourier Transform, which is applied to
sampled complex signals; the following equation defines the Discrete Fourier
Transform coefficients for an N point signal:

Y[k] = Σ_{n=0}^{N−1} x[n] ( Cos(2πkn/N) − j Sin(2πkn/N) )
Closely related to the complex Fourier series is the real Fourier series.
The following equation defines the real Fourier series for a real aperiodic
signal:

f(x) = a_0/2 + Σ_{n=1}^{∞} [ a_n Cos(nπx/L) + b_n Sin(nπx/L) ]

a_0 = (1/L) ∫_{−L}^{L} f(x) dx

a_n = (1/L) ∫_{−L}^{L} f(x) Cos(nπx/L) dx    (n = 1, 2, 3, ...)

b_n = (1/L) ∫_{−L}^{L} f(x) Sin(nπx/L) dx    (n = 1, 2, 3, ...)


If one considers the real Fourier series for a real periodic signal then it
becomes very similar to the complex Fourier series, the essential difference
being the absence of complex coefficients. In fact if one looks at the basis
vectors for both series then they are the same apart from the fact that the Sine
basis vectors are imaginary in the case of the complex Fourier series. From the
real Fourier series is derived the Real discrete Fourier transform:
y[n] = Σ_{k=0}^{N−1} x[k] Cos( 2πnk/N + θ(n) )

θ(n) = 0,      0 ≤ n ≤ N/2
θ(n) = π/2,    n > N/2


The Real Discrete Fourier Transform (RDFT) [21] utilises the fact that
information is effectively duplicated by the Discrete Fourier transform when
applied to real data or signals. This manifests itself in the coefficients
representing the negative frequencies equalling the complex conjugate of
coefficients representing the positive frequencies,
F(N − n) = F*(n),    0 < n < N.

This symmetry can be used at once to discard half the coefficients without any
loss of information, and this useful reduction in computational effort has been
used before now to develop faster algorithms [13]; what Ersoy has done is to
develop the mathematical theory behind the RDFT and prove its validity from
first principles.
The easiest way of implementing the RDFT is to use a standard FFT
algorithm and rearrange the coefficients, discarding all surplus coefficients.
This was the method used in the computer simulations, but it is not the most
efficient method. A better way of implementing the RDFT, quicker by a factor of
two, is to split the real input data into even and odd indexed data, and use these
data vectors as the real and imaginary components of a new data vector.
Starting from a data vector k(n) we create a new vector f(n) and obtain the
transform of this vector F(n)
F(n) = F_e(n) + j F_o(n),    0 ≤ n < N/2

F(n) is made up of the even and odd indexed components of the original data
vector; these must be extracted from F(n) to generate K(n). Because the even
indexed data is real,

F_e(N/2 − n) = F_e*(n)

where * indicates the complex conjugate. Similarly, because the odd indexed
data is real,

F_o(N/2 − n) = F_o*(n)

This leads to the following equations for F_e(n) and F_o(n):

F_e(n) = ½ [ F(n) + F*(N/2 − n) ]

F_o(n) = (1/2j) [ F(n) − F*(N/2 − n) ]

Using the equation for the decimation in time FFT,

K(n) = F_e(n) + exp(−2πjn/N) F_o(n)

F_e(n) and F_o(n) can be combined to form K(n):

K(n) = ½ [ F(n) + F*(N/2 − n) ] + (1/2j) [ F(n) − F*(N/2 − n) ] exp(−2πjn/N),    0 ≤ n < N/2

This results in a complex vector of coefficients, size N/2. But if the
imaginary coefficients are converted to real coefficients and then ordered so
that the first half of the vector is made up of original real coefficients and the
second half of the converted imaginary coefficients, then we have a real vector
of size N. This essentially gives the same result as the RDFT. The only
difference is that the RDFT gives us the coefficient ReK[N/2] and drops the
coefficient ImK[0], which is always zero. Unfortunately the DFT of N/2 points
does not give the point K[N/2], but because this is the transform of real data,
F[N/2]=F[0], and this can be substituted into the equation for K[n] to find
K[N/2].
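The simple route actually used in the simulations (complex FFT followed by rearrangement) might be sketched as below; the fft() helper and the fixed maximum length are assumptions, not the Numerical Recipes routine itself.

#define MAXN 512

void fft(double re[], double im[], int n);   /* assumed complex FFT helper */

/* Real DFT of an N point real signal: take a standard complex FFT and keep
   only the non-redundant coefficients, giving N real output values. */
void rdft_via_fft(const double x[], double out[], int N)
{
    double re[MAXN], im[MAXN];
    int k;

    for (k = 0; k < N; k++) { re[k] = x[k]; im[k] = 0.0; }
    fft(re, im, N);

    for (k = 0; k <= N / 2; k++)
        out[k] = re[k];                 /* cosine-like terms Re F[0..N/2]   */
    for (k = 1; k < N / 2; k++)
        out[N / 2 + k] = im[k];         /* sine-like terms Im F[1..N/2-1]   */
}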
The Sine and Cosine transforms use Sine and Cosine basis vectors
respectively; they are the discrete forms of the Sine and Cosine series. The
following equations define the coefficients:

F[k] = Σ_{n=1}^{N−1} x[n] Sin(πnk/N)

F[k] = Σ_{n=1}^{N−1} x[n] Cos(πnk/N)

At first glance it appears that these basis vectors are simply the Sine and Cosine
basis vectors from the DFT, but this is not the case. There is a factor of two
missing from the Sine and Cosine arguments when compared to the DFT and
this ensures that a complete set of Sine and Cosine basis vectors are generated.
A simple way of generating the Sine transform coefficients is to use the
standard DFT but extend the data to twice its original length. The data is
extended as an odd function about k = N, with f[N] = 0. When the DFT is
applied to this new data all the real terms (i.e. the cosine terms) are zero and only
the imaginary terms (i.e. the sine terms) are set.
factor of two inefficiency inherent in this calculation due to the extension of the
original data set and a further factor of two inefficiency due to the effective
duplication of terms when using the DFT on real data. Similarly the Cosine
transform coefficients can be generated by extending the original data set as an
even function about j = N. Because these methods of generation are slightly
inefficient it is worth developing a more efficient fast algorithm based on the
above equation. For the experiments that were carried out, fast implementations
from 'Numerical Recipes in C' were used [13].
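The odd-extension route can be sketched in C as below; the fft() helper is assumed and the overall scaling depends on the FFT convention used, so this is only an illustration of the idea rather than the Numerical Recipes routine that was actually used.

#define MAXPTS 256

void fft(double re[], double im[], int n);   /* assumed complex FFT helper */

/* Sine transform of x[1..N-1] by extending the data as an odd function about
   k = N (with f[0] = f[N] = 0) and taking a DFT of length 2N. */
void sine_transform_via_fft(const double x[], double s[], int N)
{
    double re[2 * MAXPTS], im[2 * MAXPTS];
    int k;

    for (k = 0; k < 2 * N; k++) { re[k] = 0.0; im[k] = 0.0; }
    for (k = 1; k < N; k++) {
        re[k]         =  x[k];      /* original data   */
        re[2 * N - k] = -x[k];      /* odd extension   */
    }
    fft(re, im, 2 * N);

    for (k = 1; k < N; k++)
        s[k] = -0.5 * im[k];        /* cosine terms vanish; sine terms remain */
}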
The fourth transform used was the Walsh transform [22]. This function
uses only two levels represented by +1 and -1 resulting in basis vectors for the
transform kernel which are square waves with an irregular mark space ratio and
increasing in sequency (number of zero crossings). The Walsh coefficients
W(u) of function f(x) are derived thus:
W(u) = (1/N) Σ_{x=0}^{N−1} f(x) Π_{i=0}^{n−1} (−1)^{b_i(x) b_{n−1−i}(u)}

where N = 2^n, and b_k(z) is the kth bit in the binary representation of z.
Because of its binary form the Walsh transform is very efficient when
applied to signals which are digital in nature and conversely not so efficient for
signals which are not digital in nature, where a sine/cosine based transform is
usually more efficient. Although the DWT is different from the DFT, useful
comparisons can be made between sequency and frequency, and the DWT can
be broken down into odd and even functions, CAL and SAL, in the same way
that the DFT can be broken down into sine and cosine functions. So the peculiar
world of sequency can be interpreted using the more familiar form of frequency,
and it turns out that the algorithm used to implement the FFT can also be used to
implement the FWT by setting all trigonometric terms to either +1 or -1. In fact
computation of the FWT is further expedited by the complete absence of any
requirement for multiplication; every multiplication is replaced by an addition
or subtraction. Also
quantisation noise due to coefficient truncation is non existent. When operating
on real data there is no overhead when using the DWT because it is a real
transformation, but this means the DWT cannot be applied to complex data.
In the experiments a FWT algorithm from Digital Image Processing [10]
was used. This was derived from a FFT algorithm and written in Fortran.
Conversion to C allowed it to be used in the experiments.
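For comparison, the butterfly of a fast Walsh-Hadamard transform needs only additions and subtractions, as the short C sketch below shows; note that this simple version produces the coefficients in natural (Hadamard) order rather than the sequency order of the equation above, so a reordering stage would be needed to match it exactly.

/* In-place fast Walsh-Hadamard transform of x[0..n-1]; n must be a power of
   two.  Every FFT multiplication becomes an add or a subtract. */
void fwht(double x[], int n)
{
    int len, i, j;

    for (len = 1; len < n; len <<= 1)
        for (i = 0; i < n; i += 2 * len)
            for (j = i; j < i + len; j++) {
                double a = x[j], b = x[j + len];
                x[j]       = a + b;
                x[j + len] = a - b;
            }
    for (i = 0; i < n; i++)
        x[i] /= n;                       /* the 1/N scaling of W(u) */
}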

All four transforms were tested on real data from a metal detector
system and the results used to determine which transform should be used as the
basis for the new optimal fast transform. This metal detector system comprised
a TEK DSP metal detector mounted on a conveyor belt, data being down
loaded to a PC for storage onto hard disk. Data was captured within windows
of 512 samples, with a sample period of 4.2 ms, giving a window
duration of approximately 2 seconds. Bodies passing through the metal detector
only generate signals for less than a second, so some accurate means of
aligning the signal within the sample window was required. This was done by
using a photo eye on the in feed of the conveyor to trigger a data collection on
the leading edge of the body passing down the conveyor. By using this method
the necessity for time shift invariance was negated at the expense of limiting
any eventual system to products capable of breaking a photo eye beam.
Positioning the signal within the window became even more critical when it
was decided to take a sub window of the stored data window to reduce memory
and processing requirements. The sub window used was a 256 sample window
at a constant position within the original window allowing just enough room to
fit the signal and making accurate alignment fairly critical. Data size was
further reduced by a factor of two by using a simple two point moving average
filter and using two times under sampling, giving a final window length of 128
samples. Figure (25) shows the reactive and resistive signals generated by a
200g block of cheese passing through the metal detector, only the smallest
signal, the reactive signal was used during all the experiments because the
resistive signal for wet products is much larger than the reactive signal, and
consequently gives rise to larger variations in signal strength when only
product is being passed through the detector, thus making it more difficult to
pick out variations in signal strength due to metal contamination.
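The data reduction step described above can be sketched in C as follows; the array sizes follow the window lengths quoted in the text.

/* Reduce a 256 sample sub window to 128 samples: a two point moving average
   followed by two times under sampling. */
void reduce_window(const double in[256], double out[128])
{
    int i;

    for (i = 0; i < 128; i++)
        out[i] = 0.5 * (in[2 * i] + in[2 * i + 1]);
}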


[Figure: reactive and resistive channel signals plotted against sample index (512 samples), amplitude range −1500 to 1500.]
Figure (25) Signals from metal detector system
Seven classes of signal were used for testing; bread (bread), bread plus
2.5mm stainless steel sphere (brd25st), cheese (cheese), cheese plus 1.5mm
brass sphere (ch15mmNF), cheese plus 1mm iron sphere (ch1mmFe), cheese
plus 3mm stainless steel sphere (ch3mmSt), and nothing or system noise
(noise). Ensemble sizes of 20 were used for each class and all four transforms
were applied to each class of signal. Off diagonal energy in the normalised
correlation matrix was measured for the original signal ensemble and for the
coefficient ensemble after transformation. Reduction in off diagonal energy
was measured as a percentage of the off diagonal energy in the original
correlation matrix. As can be seen from figure (26) all transforms performed
better for 'noise', 'bread' and 'brd25st'. This was due to the uncorrelated
stationary nature of these signals, obvious in the case of 'noise' but in the case
of 'bread' and 'brd25st' these were very weak signals and hence gave a close
approximation to pure noise. When the signal strength is greater the signals are
more highly correlated and hence it becomes more difficult to effect a large
reduction in off diagonal energy by generating a set of uncorrelated basis
vectors to describe the signal. For every class of signal the Walsh transform
gave the worst performance although for 'noise' its relative performance was
somewhat better than for the other classes of signal. This was to be expected
due to the binary nature of the Walsh basis vectors which are more appropriate
for describing square waves and pulses than the smoother wave shapes that are
present here.

[Figure: bar chart of off diagonal energy (%) for the Sin, RDFT, Cos and Walsh transforms applied to each signal class (bread, brd25st, ch15mmNF, ch1mmFe, ch3mmSt, cheese, noise); vertical axis 0 to 45%.]
Figure (26) Performance of transforms for different signal types
Between the RDFT, DCT and DST there is not too much difference in
performance but as figure (27) shows the RDFT is on average the best and
consequently was the one chosen as the basis for the new fast optimal
transform.
[Figure: bar chart of average off diagonal energy (%) for the RDFT, Cosine and Sine transforms; vertical axis 11.1 to 11.9%.]
Figure (27) Average performance for RDFT, DCT and DST
5.2.3 Jacobi Plane Rotations as a Fast Transform Technique

Jacobi plane rotations are normally used to generate eigenvectors and
eigenvalues for a real symmetric matrix [13]. The method is not as efficient as
some other methods of generating the eigenvectors, but it is simple, and
foolproof. If the Jacobi method is used to generate the eigenvectors for a signal
ensemble's covariance matrix one has effectively generated a specific
Karhunen-Loeve transform, for which another name happens to be the
Eigenvector transform.
A rotation matrix is used to transform the original matrix, being applied
twice, to transform both the rows and columns of the original matrix, according
to the equation
A′ = P_pq^T A P_pq
where P_pq is the Jacobi rotation matrix and p and q denote the row and column
indices of the element to be transformed. The following matrix is a 5 by 5
example with p and q equal to 2 and 4 respectively:

        | 1   0   0   0   0 |
        | 0   c   0   s   0 |
P_pq =  | 0   0   1   0   0 |
        | 0  −s   0   c   0 |
        | 0   0   0   0   1 |
The following equation can be derived for off diagonal element a′_pq, and
similarly for a′_qp:

a′_pq = (c² − s²) a_pq + sc (a_pp − a_qq)

Now, to reduce the off diagonal energy to zero this element a′_pq must be
equated to zero, which gives the following equation for the angle of rotation φ:

cot 2φ = (a_qq − a_pp) / (2 a_pq)
In the process of setting these elements to zero other elements are altered,
but the combined effect is one where the overall off diagonal energy is reduced
by 2a_pq² at each step. Thus by choosing the largest value of a_pq it is possible to
ensure the maximum reduction in off diagonal energy for each rotation. For an
N by N matrix it is possible to complete N/2 rotations in parallel if the row and
column indices of the elements are distinct, giving a useful increase in
processing speed due to the fact that the transformed matrix need not be
recalculated at every rotation, at the cost of reduced optimisation of the
transform, in terms of reduction in off diagonal energy per rotation. The
following algorithm was implemented to generate two size N arrays containing
the row and column indices of the elements to be rotated and the corresponding
sine and cosine of the angle of rotation.
JacobiRotations
    Arrange absolute values of upper diagonal elements in ascending order.
    For N/2 rotations
        Find the next largest element with distinct row and column indices.
        Calculate the angle of rotation.
        Save the row and column indices of the element and the sine and cosine of the angle of rotation.
Several sets of N/2 plane rotations can be applied in series, so at each step the
transformed matrix moves closer to a diagonal matrix, where the diagonal
elements denote the eigenvalues of the original matrix and the product of all
the rotation matrices approaches the matrix of eigenvectors of the original matrix. Of
course we have not stored a complete matrix for each rotation simply a pair of
indices and the sine and cosine of the angle of rotation, hence memory and
processing requirements are much reduced. The pairs of indices and
corresponding angles of rotation are stored in arrays and applied to the signal
vectors by the function jact, which generates the transform coefficients.

Jact
    For N/2 signal points
        Get the point pair
        Get the sine and cosine of the angle of rotation
        Calculate the transformed point pair
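A C sketch of jact is given below; the parallel arrays holding the point pairs and rotation angles follow the pseudo code, but the exact data structures and the sign convention of the rotation are assumptions.

/* Apply nrot stored plane rotations to the signal vector y.  p[r] and q[r]
   give the point pair of rotation r; c[r] and s[r] the cosine and sine of
   its angle of rotation. */
void jact(double y[], const int p[], const int q[],
          const double c[], const double s[], int nrot)
{
    int r;

    for (r = 0; r < nrot; r++) {
        double yp = y[p[r]], yq = y[q[r]];
        y[p[r]] =  c[r] * yp + s[r] * yq;    /* transformed point pair */
        y[q[r]] = -s[r] * yp + c[r] * yq;
    }
}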
5.2.4 Difficulties in Permuting the Input Vector

Permuting the input vector as part of the new optimal transform helps to
bring the final transform closer to the theoretical best performance. The main
attraction of permuting the input vector before applying any other transforms is
the ease with which this can be done; computationally it is extremely cheap.
In the work completed by Ersoy and Chen [11] reductions in off diagonal
energy in the transformed covariance matrix of greater than 12% were
achieved, the best results being obtained when using the permutation in
addition to a very simple optimal transform. When the transform became more
complex (i.e. an RDFT plus several plane rotations) the improvements became
much smaller, although in percentage improvement terms the results were still
very good.
There's one catch though! It can take an awfully long time to calculate the
optimal permutation. The time taken to do an exhaustive search of all possible
permutations is proportional to exp(kN) where k is a constant and N the
number of elements in the input vector. To get an idea of how prohibitive this
processing time can be, consider an 8 point signal which takes 10 minutes to
search all possible permutations, then if processing requirements remain
constant for testing a permutation to ascertain its effect on the off diagonal
energy (which they don't) then it would take nearly 7 billion years to search all
possible permutations for a 128 point signal. Unfortunately in this case, at each
permutation the test requires the application of a 2 dimensional transformation
which, even in the case of a fast RDFT, is an order N log2(N) process, so the
processing time would be increased by approximately a further factor of 37.
One way of alleviating this problem is to use simulated annealing to avoid
an exhaustive search of all permutations but even this method requires that a
certain percentage of the total possible number of permutations is searched and
this proved to be the stumbling block.

A program was developed for the PC, based on Press's [13] simulated
annealing program and using the methods of permutation choice developed for
the solution of the travelling salesman problem. Permutation choice was based
on the methods of reversal and transportation. Reversal, where a set of points
between two randomly chosen pairs of points are reversed, and transportation,
where a complete section of points between two randomly chosen pairs of
points are moved to a new random location. For a 128 point input signal, even
with simulated annealing it proved impossible to attain any improvement in off
diagonal energy whilst running the program for a period of four days. A
different method of permutation choice was used, simply swapping randomly
chosen pairs of points, again without any improvement in results. At this stage,
the lack of any idea for a better next-permutation algorithm, together with the
complete lack of success, led to the dropping of the permutation optimisation
part of the new optimal transform.
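For illustration, the simpler of the two move types might look like the C fragment below (the transportation move shifts a copied section to a new position in the same spirit); the annealing schedule and acceptance test from Press et al. are omitted.

/* Reversal move: reverse the section of the permutation between positions
   a and b inclusive. */
void reverse_section(int perm[], int a, int b)
{
    while (a < b) {
        int t = perm[a];
        perm[a++] = perm[b];
        perm[b--] = t;
    }
}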
5.3 Pattern Recognition Using a WISARD Network
A complete simulation of a WISARD net was developed to take as input
integer arrays of variable length and use this information to train or recall from
an integer array representing the WISARD net. Flexibility was considered an
important attribute of the simulation especially during the assessment and
testing phase, and therefore variable Ntuple size was built into the design.
In its simplest form a WISARD simulation would only need to decode the
input pattern to address a specific location in memory, with a flag to specify
whether the location should be read from or written to. A fairly essential
addition to this system, is random mapping, to improve the performance in
terms of discrimination. In order to reduce the amount of memory required for
the actual RAM net itself (very useful if a multi discriminator system is
envisaged) it was decided to use the whole of a location to store data rather
than the easier option of using one bit per location. This makes the address
decoding rather more difficult but in the case of PC/AT reduces the memory
requirements by a factor of 16. To illustrate the way in which the address
decoding works consider a simple 9 bit binary image, mapped onto a RAM net
using an Ntuple size of 3. Each NTuple will address a total of 2³ = 8 locations in
memory, giving a total memory requirement of 24 bits, but using a 16 bit word
size to store one bit of data, the memory requirement becomes 384 bits. This
can be reduced to 32 bits (not 24 bits because it is not possible to address less
than one word) by utilising every bit within each word. The first Ntuple would
address the first 8 bits of the first word, the second Ntuple the second 8 bits of
the first word and the third NTuple would address the first 8 bits of the second
word. In general for the ith NTuple in a sequence of NTuples size n, and word
size w, the net memory address and bit position are defined thus:
WordAddress = ⌊(i·2ⁿ + A) / w⌋ + 1

BitPosition = (i·2ⁿ + A) % w

0 ≤ A < 2ⁿ

where A is the address defined by the Ntuple bit pattern, and % denotes modulo
division. Note that the word address is incremented by 1 to address memory
defined as starting at location 1, compared to the bit positions which are
defined as starting at location 0. The total memory required is:
Total memory = N·2ⁿ / (n·w)    when N % n = 0

Total memory = (N + n − N % n)·2ⁿ / (n·w)    when N % n ≠ 0

where N is the input pattern size in bits.
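In C the packed addressing might look like the fragment below; since C arrays are indexed from zero the +1 word offset of the formula above is dropped, and the reconstruction of the formulas themselves is an interpretation of the original text rather than the thesis routine.

/* Locate the bit used by Ntuple i (of n bits) when its inputs form the
   address A, with a word size of w bits. */
void locate_bit(int i, int A, int n, int w, int *word, int *bit)
{
    long offset = (long)i * (1L << n) + A;   /* global bit offset   */

    *word = (int)(offset / w);               /* word within the net */
    *bit  = (int)(offset % w);               /* bit within the word */
}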
An alternative to this scheme is to address the full 24 locations in memory
but use the individual bit positions within a word to store data for separate
discriminators up to a maximum of 16 discriminators, this has the advantage of
simplified address decoding but is inefficient when less than 16 discriminators
are required.
The following algorithm was used to train and recall from the net:

Wisnet
    Reset the score
    Create bit pattern from input data
    Scramble the bit pattern
    For i = 1 to total number of NTuples
        Get bit pattern for Ntuple
        Set location addressed by bit pattern and Ntuple number
        Set bit position addressed by bit pattern and Ntuple number
        If training
            Set bit at selected bit position and location
        If recall
            If bit set at selected bit position and location
                Increment the score

Scrambling of the bit pattern is governed by a permutation vector
generated by a separate function (init_scramble), which is called before any
training or recall takes place.
Potential performance of a WISARD type network with a particular data
set can be determined by using the function netperform, which calculates the
overlap between any two patterns and thereby gives an indication as to whether
discrimination between patterns of different classes will be possible and
whether there is sufficient similarity between patterns belonging to the same
class to allow correct classification. As an example of how overlap analysis can
be used to assess feasibility, consider the overlaps encountered between two
very similar classes of pattern. The two classes in this case are no contaminant,
or a very small contaminant. No difference can be discerned when looking at
the signals by eye so it is difficult to tell whether it is possible to use a
WISARD network. Overlap analysis can be undertaken on the signals
themselves rather than the truncated coefficients as only a slight amount of
information is lost when using the coefficients. In this case two signals, of
length 128 words, were selected at random from each class and the overlaps
were measured.
T_A U R_A = O_AA = 1290
T_A U R_B = O_AB = 1288
T_B U R_B = O_BB = 1314
T_B U R_A = O_BA = 1224

where T_k is training pattern k, R_l is recall pattern l and O_kl is the overlap
between training pattern k and recall pattern l. The maximum overlap is 2048.
Using the equation for net confidence we can compute the net confidence for
Ntuple sizes of 1 and 8.

For Ntuple size = 1:

S_AA = (2048/1)(1290/2048)¹ = 1290
S_AB = 1288,  S_BA = 1224,  S_BB = 1314
C_A = (1290 − 1288)/1290 = 0.0015
C_B = (1314 − 1224)/1314 = 0.0684

For Ntuple size = 8:

S_AA = (2048/8)(1290/2048)⁸ = 6.343
S_AB = 6.265,  S_BA = 4.167,  S_BB = 7.3512
C_A = (6.343 − 6.265)/6.343 = 0.012
C_B = (7.3512 − 4.167)/7.3512 = 0.433

where S_kl represents the absolute response of the network to pattern l after
being trained on pattern k, and C_m represents the confidence of the network
trained on a pattern of class m, when presented with a different pattern of the
same class.
in the case of class A. It can be easily seen that increasing the NTuple size
increases the confidence level, but at the same time the absolute response
values have been reduced to the level where the fractional part of the response
becomes important. This would in practice mean a more variable confidence
level due to the non existence of a fractional output from the network. These
figures are for a single training pattern and consequently are worst case.
Response levels and confidence levels would be better for larger training sets.
5.4 An Experimental System for Metal Detector Signal Classification
What follows describes the system that was developed to analyse metal
detector signals, together with a description of how the signals were obtained
and stored ready for use by the signal analysis program.
A complete metal detector conveyor system was used to generate signals;
being a digital signal processor based system, the signals were already digitised
and pre-processed to reject high frequency noise. An in house system
developed for viewing signals on a PC allowed easy retrieval and storage of
signals onto computer hard disk. Capture of the signals generated by the passage
of an object along the conveyor belt, and through the metal detector, was cued by
the leading edge of the product clipping a photo eye beam before entering the
search head.
reactive and resistive channel data are transmitted to the PC via a RS232 serial
link. A program running on the PC stores the data to hard disk (in ASCII
format) while at the same time allowing the signal to be viewed on the screen,
warning the user if any bad data has been stored to disk. An ensemble of
signals was stored for each class of signal by setting the file name prefix
for the signal class to be stored; once set up, signals were automatically stored
to hard disk with file name suffixes beginning at "1" and automatically
incremented for each signal stored, up to a maximum of "999".
The MDSignalClass program represents the culmination of all the
research that has been completed; it includes most of the techniques covered
already in this chapter. Stored signals from hard disk are used as the input to
the program and output is presented as a series of on screen graphs and charts
which allow the analysis of system performance for different signal types and
varying system parameters.

MDSignalClass
    Get start up file
    Calculate correlation matrix (corrCal)
    Generate new optimal transform for correlation matrix and generate index for ordered off diagonal elements (transCorrCal)
    For i = startval to endval
        Assign nCut or Ntuple to i
        Initialise random mapping for WISARD net (init_scramble)
        Initialise discriminators
        Train discriminators
        For all recall classes
            Get discriminators response to recall class (recall_nets)
            If recall class specified for results compilation
                Compile results for recall class
            If i is specified nCut or Ntuple to be used for results compilation
                Compile results for nCut or NTuple
    Plot results
Run time parameters are initialised to the values defined in the
configuration file (see appendix A); this determines parameters such as the
signals to be used for training the discriminators. For the defined signals the
correlation matrix is calculated. The signals used would typically be a sample
from all the signal classes used to train the discriminators, so that the resulting
transform is optimised for all the signals used, although increasing or
decreasing the number of signals from a particular class biases the
correlation matrix either towards or against that class of signal. A new optimal
transform is then developed from the correlation matrix. This comprises an
RDFT followed by four stages of Jacobi plane rotation. Also at this stage an
array (co_index) is generated that indexes the leading diagonal elements in
descending order. A truncated version of the coefficients, generated when the
transform is applied to the signals, is used to train the discriminators and also
for recall performance tests. One way of choosing the coefficients to retain
would simply be to select the largest, but to ensure that the same coefficients
are chosen every time the size indexer, co_index, is used to select the
coefficients.
Run time parameters for the program are defined in a separate file,
"start.up", and allow a flexible configuration so enabling analysis of the effect
on performance, of varying different parameters. One of the main features is
the ability to define a variable NTuple size, or a variable coefficient truncation
length, NCut, and run the tests for different values of these parameters.
5.5 Results and Analysis
All of the tests were carried out using an empty cardboard box as the
product. The box had no effect on the signals from the metal detector. This
may seem a rather unrealistic product to use but in fact, most real products
have no effect on the metal detector and for those that do, the effect can be
removed by appropriate signal processing techniques, whilst leaving the signal
due to the contaminant relatively untouched. Contaminants in the form of metal
spheres and wire were passed through the metal detector taped to the cardboard
box. Dimensions for spheres are the diameter in millimetres.
[Figure: plan view showing the conveyor belt passing through the metal detector, with the length of wire lying at an angle to the direction of travel.]
Figure (28) Plan view of metal detector conveyor system with length of wire at an angle to the
conveyor
Due to the unavailability of metal spheres in sufficiently small increments
of diameter, a 0.2mm diameter, 2mm length of stainless steel wire at varying
angles, was used in place of stainless steel spheres for some of the tests. The
highly directional nature of the magnetic field within the metal detector
aperture makes this possible. By aligning the length of the metal along the
length of the conveyor, the wire has virtually no effect as it passes through the
detector or to be more precise the same effect as a 0.2mm stainless steel sphere.
By altering the angle of the wire (see figure (28)) with respect to the metal
detector the size of the effect can be altered accordingly, up to a maximum
when the wire is angled at 90°.
Figure (29) shows the response of a discriminator trained to detect
contaminated product when presented with uncontaminated product and with
product contaminated by stainless steel wire. The discriminator responds well
to the stainless steel wire at both angles, but unfortunately it also responds
well to product alone. The reason is the difference in size between the
training ensembles used for the contaminant discriminator and the product
discriminator. The discriminator trained on contaminated product was presented
with more training patterns than the other discriminator, and hence with a
greater number of variations in pattern. When the patterns are very similar, as
is the case here, the same types of variation (mainly due to noise) appear in
all classes of pattern and account for a fairly large part of the make-up of
the signal.

[Figure (29): column chart of Score and Conf (%) for recall signal classes Product, StSt wire15 and StSt wire22.5.]
Figure (29) Contaminant discriminator response after training on three classes of signal: 1mm StSt
(40), 1.25mm StSt (40), 1.5mm StSt (40). Product discriminator trained on product (40) alone. NTuple
size equals 7. NCut equals 20. Numbers in brackets indicate ensemble size. StSt wire15 denotes
stainless steel wire at an angle of 15°.

In terms of the statistical nature of the signals, the two class clusters lie
very close together in hyperspace and are consequently difficult to separate.
The variance of the class with the larger training set is a more accurate
estimate of the true class variance, because the WISARD net effectively creates
a region of variance around each training vector (the size of this region
depends on the NTuple size), and the further a recall vector moves away from
the hyperspace points defined by the training vectors, the lower the score from
the WISARD net. This can be visualised in the two-dimensional case shown in
figure (30). The picture is not a very accurate representation of binary
vectors, but it is a useful aid to visualising these regions of variance. The
regions of variance are the same size for both classes, but since there are
more training examples from class A, that class has greater coverage of the 2-D
space. The crooked line shows the real boundary between the two classes, and it
is not difficult to see how vectors belonging to class B can be wrongly
classified as class A. Consequently, the smaller one training set is relative
to the other, the more gaps there are in the effective variance map for that
class, and the more likely it is that a vector presented for recall will be
closer in hyperspace (or Hamming distance) to a training vector from the
incorrect class.
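The idea of closeness in Hamming distance can be made concrete with a small illustrative routine (the WISARD net never computes this distance explicitly; it is implied by the n-tuple addressing):

/* Sketch: Hamming distance between two binary vectors of length n,
   stored one bit per element in the same way as the bit streams built
   inside wisnet(). Vectors are indexed from 1, NRC style. */
int hamming(int n, int *u, int *v)
{
    int i, dist = 0;
    for (i = 1; i <= n; i++)
        if (u[i] != v[i]) dist++;
    return dist;
}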

[Figure (30): two-dimensional sketch, with x and y axes, of training points from classes A and B and their regions of variance.]
Figure (30) Regions of variance for two classes of pattern.

By training the product discriminator on the same total number of signals as
the contaminant discriminator, this problem can be cured (see figure (31)).
With the training ensemble size now the same for both discriminators, the
recall performance is much improved. As the graph shows, the response of the
contaminant discriminator to the steel wire is still good, but the response to
product alone is much reduced. The response to StSt wire22.5 is greatest, as
this gives the largest signal, but it is worth noting that the response to StSt
wire15 is still quite good relative to the response to product. This is
especially impressive compared with the response of discriminators trained on
single classes of signal (see figure (37)), where the response is much poorer,
and it suggests that multi-class training is beneficial for recall performance.
[Figure (31): column chart of Score and Conf (%) for recall signal classes Product, StSt wire15 and StSt wire22.5.]
Figure (31) Contaminant discriminator training as in figure (29) but this time product discriminator
trained on a product ensemble of size 120.
NTuple size is a fairly critical parameter if successful classification is to
be achieved with minimum overheads in terms of training set size, memory
requirements and processing speed. If too small an NTuple size is used, there
may not be enough discrimination between patterns to allow successful
classification. Increasing the NTuple size reduces the ability of the net to
generalise for a given size of training ensemble; this increases the
discrimination between classes but also increases the amount of training
required. Increasing the NTuple size also increases the memory needed to store
the net and the time required for the training and recall phases.
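The memory cost can be estimated directly from the NTuple size. The input bit stream is NCut coefficients of 16 bits each; it is cut into n-tuples, and each n-tuple addresses a RAM of 2^n single-bit cells. The small routine below is not part of the program; it simply repeats the net_size arithmetic from MDSigCls.c:

/* Sketch: bits of RAM needed by one WISARD discriminator. */
long wisard_bits(int ncut, int ntuple)
{
    long stream = (long) ncut * 16L;               /* bits in the input stream        */
    long sets   = (stream + ntuple - 1) / ntuple;  /* number of n-tuples, rounded up  */
    return sets * (1L << ntuple);                  /* one single-bit cell per address */
}

With NCut equal to 20, an NTuple size of 7 gives 46 tuples and 46 x 128 = 5888 bits per discriminator, whereas an NTuple size of 10 gives 32 tuples and 32 x 1024 = 32768 bits, which illustrates the growth in storage described above.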
[Figure (32): column chart of Score and Conf (%) against NTuple sizes 3 to 10.]
Figure (32) Training as in figure (31). Recall performance tested with StSt wire22.5 and the NTuple
size varied from 3 to 10.
Using a set of training parameters that had already proved successful, a
test was performed to check the effect of varying the NTuple size (see figure
(32)). NTuple sizes below 5 proved completely useless, providing no
discrimination at all. Even at an NTuple size of 5, performance was poor, with
a relatively low score and a very low confidence level. At NTuple sizes above 5
there is no further improvement in score, but the confidence level continues to
rise, although the rate of change of confidence with NTuple size decreases as
the NTuple size increases. These results confirm the choice of an NTuple size
of 7, which was used for most of the tests.
Looking at these results in a statistical sense, we can consider the regions
of variance again. For small NTuple sizes the regions of variance are very
large and overlapping, so recall vectors tend to give maximum scores for every
class, leading to no decision. As the NTuple size increases the regions of
variance become smaller, until in the limiting case, where the NTuple size
equals the vector dimensionality, there is no region of variance at all. In
this limiting case a recall vector must be identical to one of the training
vectors from the same class for correct classification to take place. Reduced
regions of variance of course mean that greater discrimination is possible, but
the training set has to be increased in size to fill in the gaps in the
variance map.
The transform that is used is optimised towards the ideal of concentrating
the maximum amount of information into the smallest number of coefficients.
These coefficients are then placed in descending order of size, or in other
words in descending order of information content. Figure (33) suggests that
this optimisation has been highly successful, since the first two coefficients
are all that are required to achieve a reasonable level of performance. There
is a dip in response at an NCut of 5. A possible reason is that the first two
coefficients contain most of the information about the part of the signal due
to the contaminant, whereas coefficients 3 to 5 add information about noise,
which is not relevant to the correct classification of the signals and
therefore detracts from the performance of the net.
[Figure (33): column chart of Score and Conf (%) against NCut sizes 1 to 10.]
Figure (33) Product discriminator trained on product (80) alone and contaminant discriminator trained
on 1mm StSt (80). Contaminant discriminator performance recorded for 1mm StSt. NTuple size equals
7.
Figure (34) shows the performance of both discriminators, using the same
parameters as in figure (33) but with a fixed NCut value of 20. Performance is
reduced compared to an NCut of 2, but since there was no certainty about which
information was contained in which coefficients, an NCut value of 20 was used
for most tests. Nevertheless the performance is very good, with a 1mm stainless
steel sphere being detected over 70% of the time with an average confidence of
15%. By setting the confidence level required for a positive classification at
10%, false classifications could be largely avoided. When one considers that
the normal maximum sensitivity for the search head used in these tests is a
2.5mm sphere, the performance is very good indeed.
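The confidence figure is the relative margin between the two highest discriminator scores, as computed in recall_nets() in Appendix B. A sketch of the decision rule described here (the 10% threshold is simply the value quoted for this test, not a fixed property of the system, and confident_win is an illustrative routine rather than part of the program):

/* Sketch: confidence of the winning discriminator and a thresholded
   positive classification. 'best' and 'second' are the two highest
   WISARD scores returned for one recall vector. */
int confident_win(int best, int second)
{
    float conf;
    if (best <= second) return 0;                     /* no clear winner: no decision */
    conf = ((float)(best - second) * 100.0f) / (float) best;
    return conf >= 10.0f;                             /* accept only a confident win  */
}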
[Figure (34): column chart of Score and Conf (%) for the Product and Contaminant discriminators.]
Figure (34) Training and recall signal classes same as figure (33). NCut equals 20. NTuple size equals
7.
Running the same test with an NTuple size of 10 (see figure (35)) resulted
in a reduction in performance, suggesting that a larger training set was
required for this NTuple size.
[Figure (35): column chart of Score and Conf (%) for the Product and Contaminant discriminators.]
Figure (35) Parameters same as in figure (34) but NTuple size equals 10.
Figure (36) and figure (37) show the results of training four separate
discriminators: one trained on product, the other three on different sizes of
stainless steel sphere. In figure (36) the recall class used is StSt wire22.5,
and in this case the results are fairly encouraging. It is interesting that the
discriminator trained on the 1mm stainless steel sphere responds best to the
wire contaminant, which suggests that the signal from the stainless steel wire
at an angle of 22.5° closely approximates the signal caused by a 1mm stainless
steel sphere.
[Figure (36): column chart of Score and Conf (%) for the Product, 1mm StSt, 1.25mm StSt and 1.5mm StSt discriminators.]
Figure (36) Four discriminators trained on an ensemble size of 80 and their performance recorded for
recall tests using StSt wire22.5. NTuple size equals 7. NCut equals 20.
If one simply wishes to distinguish contaminated from non-contaminated
product, then the response from all three contaminant-trained discriminators
can be combined to indicate metal contamination. This gives a probability of
contaminant detection of 81%, which compares quite favourably with the results
in figure (31) for the two-discriminator system trained on the same classes of
signal, where the probability of detection was 75%.
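A minimal sketch of this combination, assuming the four scores of figure (36) are available for each recall vector (the program itself only records the winning discriminator, so is_contaminated below is illustrative):

/* Sketch: collapse a four-discriminator system (one product net, three
   contaminant nets) into a contaminated / clean decision. score[1] is
   the product discriminator, score[2..4] the contaminant discriminators. */
int is_contaminated(int *score)
{
    int i;
    for (i = 2; i <= 4; i++)
        if (score[i] > score[1])
            return 1;        /* any contaminant net beating product flags a reject */
    return 0;
}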
[Figure (37): column chart of Score and Conf (%) for the Product, 1mm StSt, 1.25mm StSt and 1.5mm StSt discriminators.]
Figure (37) Four discriminators trained as in figure (36) but recall tests completed using StSt wire15.
This next set of results is a good example of misclassification. From
figure (37) it appears that the wire contaminant should be detectable with a
probability of 55%, compared with 27% for misclassification. The problem is
that the misclassifications carry a higher confidence than the correct
classifications, so a threshold cannot be set to filter them out.
To sum up the results: NTuple sizes below 5 were found to be useless, since
with these values discrimination was impossible. NTuple sizes of 6 and above
were required for discrimination between signal classes, but the exact choice
of NTuple size depends on the training set size; larger NTuple sizes give
better discrimination but require larger training sets to work efficiently.
NCut sizes of 2 and above proved successful, highlighting the efficiency of the
optimal transformation. Training sets of equal size are required if optimal
performance is to be achieved; if unequal training set sizes are used, the
discriminator with the largest training set will on average respond better than
the other discriminators. Training a single contaminant discriminator on
several contaminant classes offers superior performance to training several
separate discriminators, each on a single contaminant class.

5.6 Summary
Techniques covered in the previous chapter have been altered and
enhanced to work in a real system dedicated to the classification of metal
detector signals.
Using a correlation matrix instead of a covariance matrix has saved on
computation time and memory requirements.
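The saving arises essentially because the correlation matrix R = (1/M) * sum of x_k * x_k' over the M signal vectors can be accumulated directly from the signals, with no need to estimate, store and subtract a mean vector first; this is what calc_correlmat() in Appendix B computes, followed by normalisation by the trace.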
Four different transformations have been tested, and the RDFT chosen as
the initial transformation for the new fast optimal transformation.
Permutations promised to be a useful part of the new transform because
of their simplicity and ease of implementation, but the practical difficulty of
finding the optimal permutation from a very large number of possibilities meant
that this technique had to be abandoned.
By addressing all the bit locations within a 16-bit word, the memory
requirements of the WISARD net have been reduced (the packing is sketched
below). The potential performance of the WISARD net for given signal classes
has been analysed and used as a guide to the feasibility of classification.
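The packing is straightforward: RAM cell p of a discriminator is held as bit (p mod 16) of word (p div 16), which is the indexing used inside wisnet(). A minimal illustrative sketch (set_cell and get_cell are not part of the program):

/* Sketch: set and test single-bit RAM cells packed into an array of
   16-bit words, as used to hold the state of a WISARD discriminator.
   The word array is indexed from 1, NRC style. */
#define INT_LEN 16

void set_cell(unsigned *net, long p)
{
    net[1 + (unsigned)(p / INT_LEN)] |= 1 << (int)(p % INT_LEN);
}

int get_cell(unsigned *net, long p)
{
    return (net[1 + (unsigned)(p / INT_LEN)] >> (int)(p % INT_LEN)) & 1;
}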
Finally, a program has been developed to classify stored metal detector
signals. The performance of this system on real sets of data has been analysed
for different values of the system parameters, such as NTuple size and training
set size.

CHAPTER 6

CONCLUSION
Within the context of metal detection and the recognition of metal
contaminated products, neural networks provide a useful addition to amplitude
threshold detection, in a simulated factory environment and using stored
signals from a metal detector.
Detection of a stainless steel sphere of 1mm diameter was possible with a
success rate of over 70% and a confidence of 15%. This compares favourably with
the normal detection sensitivity of the same system, which is a 2.5mm diameter
stainless steel sphere with low grade signal processing, improving to 1.25mm
stainless steel when a highly tuned adaptive filter is used. Both sensitivities
are achieved with a 100% success rate. Although the adaptive filter provides a
large increase in performance, its highly tuned state makes it susceptible to
ringing when a large contaminant passes through the detector, or when noise of
an impulse or step-like form is embedded in the signal. This tends to cause
false rejections, whereas the same input signals cause no problems for the
neural net detection system. The results are especially good when one considers
that the effect due to a metal sphere is largely proportional to the surface
area of the sphere, which has a square-law relationship to its diameter.
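As a rough worked figure, on that square-law relationship a 1mm sphere produces only (1.0/2.5)^2, approximately 0.16, of the effect of the 2.5mm sphere that marks the normal detection limit, i.e. about one sixth of the signal that the conventional threshold detector relies on.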
It appears that an NTuple size of 7 is about optimal for training set sizes
of the order of 100. It was also found that training just one contaminant
discriminator on all sizes of contaminant gave superior performance to a number
of discriminators each trained on a different size of contaminant.
The requirements of metal detector customers dictate that reliability is of
prime importance. Consequently a real-world implementation needs to address the
two aspects which affect reliability: the probability of false detection, and
simplicity of set up and operation. Limitations of memory size also effectively
prevent storage of the large correlation matrices required by the transform
optimisation process used when setting up the system. A possible solution is a
pre-optimised transform and a pre-trained neural net. Training would take place
on a computer, and the results could then be hard coded into EPROM. To cope
with differences in conveyor speed and coil spacing, an interpolation and
decimation process could be used to alter the effective sample rate, so as to
simulate the original conveyor speed and coil spacing (a simple sketch of such
a rate change is given below). This would remove the requirement for training
within the factory, and with it the possibility of incorrect setting up. The
need for a low probability of false rejects would probably mean that the
detection sensitivity would have to be operated below the peak level achievable
in a simulated environment.
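A minimal sketch of such a rate change, using linear interpolation only (a production version would need proper band limiting before decimation, and the routine below is illustrative rather than part of the system):

/* Sketch: resample a signal of n_in points to n_out points by linear
   interpolation, so that a record captured at one conveyor speed can be
   made to look as if it were captured at the speed used for training.
   Arrays are indexed from 1, NRC style; n_in and n_out must be >= 2. */
void resample(int n_in, float *in, int n_out, float *out)
{
    int   j, k;
    float pos, frac;

    for (j = 1; j <= n_out; j++) {
        pos  = 1.0f + ((float)(j - 1) * (float)(n_in - 1)) / (float)(n_out - 1);
        k    = (int) pos;
        frac = pos - (float) k;
        if (k >= n_in) { k = n_in - 1; frac = 1.0f; }  /* guard the final point */
        out[j] = in[k] + frac * (in[k + 1] - in[k]);
    }
}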
The next stage in the development is to implement the required algorithms on
the DSP56000 processor. The functions which must be implemented are the RDFT,
the Jacobi transform, and the WISARD net. The easiest way to accomplish this
initially is to replace function calls within the C coded system with calls to
functions implemented on a DSP56000 PC card. This allows the functions to be
tested within an already proven environment, speeding the development process.
Appendix C contains DSP56000 assembler listings [23] for the implementation of
the Jacobi transformation, the WISARD net, and the RDFT. It is envisaged that
these functions will be integrated into the metal detector signal processing
software, provided with a pre-trained data set from the computer, and that the
complete system performance will then be analysed in a more realistic operating
environment.

REFERENCES

[1] Kanal.N.L (ed), "Pattern Recognition", Thomson Book
Co., 1968.

[2] Gonzalez.R.C & Thomason.M.G, "Syntactic Pattern
Recognition. An Introduction", Addison Wesley, 1978.

[3] Fu.K.S, "Syntactic Methods in Pattern Recognition",
Academic Press, 1974.

[4] Davis.L.S & Rosenfeld.A, "Hierarchical Relaxation for
Waveform Parsing", in "Computer Vision Systems",
Hanson.A.R & Riseman.E.M (eds), Academic Press, 1978.

[5] Levine.R.I, Drang.D.E & Edelson.B, "AI and Expert
Systems. A Comprehensive Guide, C Language", McGraw
Hill, 1990.

[6] Aleksander,I & Morton.H, "An Introduction to Neural
Computing", Chapman & Hall, 1990

[7] McClelland.J.L & Rumelhart.D.E, "Explorations in
Parallel Distributed Processing: A Handbook of Models,
Programs and Exercises", Cambridge MA, 1988.

[8] Maren.A.J, "Neural Networks for Spatio Temporal Pattern
Recognition", in ,"Handbook of Neural Computing
Applications", Maren.A.J, Harston.C.D & Pap.R.M,
Academic Press, 1990, pp.295-308.

[9] Soucek.B, "Neural and Concurrent Real-time Systems, the
Sixth Generation", New York, 1989, pp. 99-139.

[10] Gonzalez.R.C and Wintz.P, "Digital Image Processing,
second edition", Reading MA, 1987, pp.122-130.

[11] Ersoy.O.K and Chen.C.H, "Learning of Fast Transforms
and Spectral Domain Neural Computing", IEEE
Transactions on Circuits and Systems, 1989, Vol.36, No.5,
pp.704-712.

[12] Elliot.D.F and Rao.K.R, "Fast Transforms, Algorithms,
Analyses, Applications", New York, 1982.

[13] Press.W.H et al, "Numerical Recipes in C, the Art of
Scientific Computing", Cambridge UK, 1988.

[14] Austin.J, "High Speed Invariant Pattern Recognition using
Adaptive Neural Networks", IEE conference on pattern
recognition, 1989, pp.28-32.

[15] Roberts.R.A and Mullis.C.T, "Digital Signal Processing",
New York, 1987.

[16] Burg.J.P, "Maximum Entropy Spectral Analysis", Society
of Exploration Geophysicists.

[17] Hopfield.J.J & Tank.D.W, " "Neural" Computation of
Decisions in Optimization Problems", Biological
Cybernetics, 1985, Vol 52, pp141-152.

[18] Aleksander.I and Burnett.P, "Reinventing Man, the Robot
becomes Reality", London, 1983, pp. 209-266.

[19] Bendat.J.S and Piersol.A.G, "Engineering Applications of
Correlation and Spectral Analysis", New York, 1980.

[20] O'Neil.P.V, "Advanced Engineering Mathematics",
Wadsworth Pub. Co., 1987.

[21] Ersoy.O.K, "Real Discrete Fourier Transform", IEEE
transactions on acoustics, speech, and signal processing,
Vol. 33, No.4, 1985.

[22] Beauchamp.K.G, "Walsh Functions and Their
Applications", London, 1975.

[23] Motorola Corporation, "DSP56000/DSP56001 Digital
Signal Processor User's Manual".

[24] Bishop.M.L, "A General Waveform Shape Analyzer",
Doctoral Thesis, Sussex University, 1988.

[25] Ahmed.N & Rao.K.R, "Orthogonal Transforms for Digital
Signal Processing", Springer Verlag, 1975.

[26] Bendat.J.S and Piersol.A.G, "Measurement and Analysis of
Random Data", Wiley, 1966.

[27] Pavlidis.T, "Structural Pattern Recognition", Springer
Verlag, 1977.

[28] Widrow.B, Winter.R.G and Baxter.R.A, "Layered Neural
Nets for Pattern Recognition", IEEE Transactions on
Acoustic Speech and Signal Processing, 1988, Vol. 36, No.
7, pp. 1109-1117.

[29] Upda.L & Upda.S.S, "Neural Networks for the
Classification of Non-destructive Evaluation Signals", IEE
proceedings F, Vol. 138, No.1, 1991, pp41-45.

[30] Gorman.R.P & Sejnowski.T.J, "Learned Classification of
Sonar Targets Using a Massively Parallel Network", IEEE
transactions on ASSP, 1988, Vol. 36, No.7, pp1135-1140.

[31] Elliman.D.G & Banks, "Shift Invariant Neural Net for
Machine Vision", IEE proceedings, 1990, Vol.137, No.3,
pp183-187.

[32] Schalkoff.R, "Pattern Recognition. Statistical, Structural
and Neural Approaches", John Wiley, 1992, pp34-108.

[33] Nadler.M & Smith.E.P, "Pattern Recognition Engineering".


APPENDIX A

TABLES


Parameter        Description                                   Max and min values  Options
---------        -----------                                   ------------------  -------
ntup_var         Select variable NCut or NTuple                n/a                 ncut, ntuple
start_val        Starting value for either NCut or NTuple      1 .. 20             n/a
end_val          End value for either NCut or NTuple           1 .. 20             n/a
fixed_val        Fixed value for either NCut or NTuple         1 .. 20             n/a
fnCorr           File names for signal classes to be used      n/a                 Valid file suffix, up to a
                 for correlation matrix generation                                 maximum of four file names
numDiscrim       Number of discriminators                      1 .. 4              n/a
fndis            List of file names for signal classes to      n/a                 Valid file suffix, up to a
                 be used for discriminator training                                maximum of four file names
                                                                                    for each discriminator
fnrecall         List of file names for signal classes to      n/a                 Valid file suffix, up to a
                 be used for discriminator recall testing                          maximum of four file names
size_trn_set     Number of files from each class to be used    1 .. 999            n/a
                 for training discriminators
recall_set_size  Number of files from each class to be used    1 .. 999            n/a
                 for recall testing of discriminators
corr_set_size    Number of files from each class to be used    1 .. 999            n/a
                 for generating the correlation matrix
rec_trn_overlap  Record, training set overlap flag             n/a                 overLap, noOverLap
netNumFixed      Number of discriminator to be used for        1 .. 4              n/a
                 performance analysis
ntupleFixed      NTuple number to be used for performance      1 .. 20             n/a
                 analysis
fileClassFixed   Number of recall file class to be used for    1 .. 4              n/a
                 performance analysis
netName          List of names to be applied to                n/a                 Any valid string
                 discriminators and used on graphs

Table (1) MDSignalClass configuration file format

APPENDIX B

C CODE LISTINGS

/* ---------------------- MDSigCls.c --------------------- */
/* NRC - Press et al, "Numerical Recipes in C" */
/* heap memory matrices, managed by NRC */
#include "main.h"
#include "md_sigs.h"
#include "stenruti.h"
#include "covar.h"
#include <stdio.h>
#include "graph2.h"
#include "trans.h"
#include "toolset.h"
#include "utl.h"
#include "walsh.h"
#include "matprt.h"
#include "nr.h" /* NRC */
#include "stejacob.h"
#include "marith.h"
#include "rdft.h"
#include "wisnet.h"
#include "terminal.h"
#include <string.h>
#include <conio.h>
#include <stdlib.h>
#include <graph.h>
#include <pgchart.h>

#define MAX_ROT 4
/* note! this is a misnomer, doubles up for MAX_NCUT */
#define MAX_NTUPLE 10
#define MAXDIS 4
#define MAXDISFILES 4
#define MAXCORRFILES 4
#define MAXRECALLFILES 4

FILE *infile, *outfile;
float **a, ode, **b, **rotaf, *x, *x2;
int **pairs;
char fp[80], fp2[80];
int *co_index, *y, ncut, *perm, **net, *y2;
unsigned scram = 1;
unsigned part;
unsigned sets;
int fno;
int size_trn_set,
recall_set_size,
non_trn_set,
corr_set_size;
unsigned net_size, mloop;
float **fclass1;
long net_size_l;
char fndis [MAXDIS] [MAXDISFILES+1] [80],
fncorr [MAXCORRFILES+1] [80],
fnrecall [MAXRECALLFILES+1] [80];
int numCorrFileClasses, start_val, end_val, ntup_var, numDiscrim,
rec_trn_overlap, numRecallClasses;
int *net_op, *net_win_sum;
float *net_conf_sum, *netNum_v_conf, *netNum_v_score, *netWin_v_ntuple,
*netConf_v_ntuple, *netWin_v_fileClass, *netConf_v_fileClass;
int netNumFixed, ntupleFixed, fileClassFixed;
float f100 = 100.0f;
chartenv chrt_env;
char *net_categ [MAXDIS];
char *ntup_categ [MAX_NTUPLE];
char *file_categ [MAXRECALLFILES];
char *netName [MAXDIS];
struct net_res_st
{ int netRes;
int netNum;} ;

/* ----------------------------- MDSignalClass ---------------------------------- */
main()
{
unsigned i,j;
int netCnt, fileClassCnt;

start_up();
a = matrix(1,SL,1,SL);
b = matrix(1,SL,1,SL);
pairs = imatrix(1,MAX_ROT,1,SL);
rotaf = matrix(1,MAX_ROT,1,SL);
x = vector(1,SL);
co_index = ivector(1,SL);
x2 = vector(1,SL);

net_win_sum = ivector(1, numDiscrim);
net_conf_sum = vector(1, numDiscrim);
netNum_v_conf = vector(1, numDiscrim);
netNum_v_score = vector(1, numDiscrim);
netWin_v_ntuple = vector(1, MAX_NTUPLE);
netConf_v_ntuple = vector(1, MAX_NTUPLE);
netConf_v_fileClass = vector(1, numRecallClasses);
netWin_v_fileClass = vector(1, numRecallClasses);

corrCal();
transCorrCal();

free_matrix(a,1,SL,1,SL);
free_matrix(b,1,SL,1,SL);
fclass1 = matrix(1,SL,1,SL);


/* main loop for learning and recall, with different values for
either ncut or part */
for (mloop = (unsigned) start_val; mloop <= (unsigned) end_val; mloop++) {

if (ntup_var) part = mloop; else ncut= mloop;
sets = (ncut * 16) / part;
if ( ((ncut * 16) % part) != 0) sets++;
net_size_l = ( (long) (1 << part)* (long) sets) / 16L;
if ( (( (long) (1 << part)* (long) sets) % 16L) != 0) net_size_l++;
if (net_size_l > 0xffffL) nrerror("net size too large");
net_size = (unsigned) net_size_l;
y = ivector(1,ncut);
y2 = ivector(1,ncut);
perm = ivector(1,16*ncut);
net = imatrix(1, numDiscrim, 1, net_size);

init_scramble(16*ncut,perm);

/* init nets */
for (i=1; i<= (unsigned) numDiscrim; i++)
for (j=1; j <= net_size; j++)
net [i] [j] = 0;

/* train nets */
printf("\n\nTRAINING NETS");
for (netCnt=1; netCnt <= numDiscrim; netCnt++) {
printf("\nTraining net %d\n",netCnt);
train_net(netCnt);
}

/* recall from nets */
printf("\n\nRECALL FROM NETS\n");
for (fileClassCnt=1; fileClassCnt <= numRecallClasses ;fileClassCnt++) {

/* init performance summers */
for (netCnt=1; netCnt <= numDiscrim; netCnt++) {
net_win_sum [netCnt] = 0;
net_conf_sum [netCnt] = 0.0f;
}

printf("\n");
recall_nets(fileClassCnt);
if (fileClassCnt == fileClassFixed) {
netWin_v_ntuple [mloop - start_val + 1]
= (float) net_win_sum [netNumFixed];
netConf_v_ntuple [mloop - start_val + 1]
= net_conf_sum [netNumFixed] /non_trn_set;
if (mloop == (unsigned) ntupleFixed) {
for (netCnt=1; netCnt <= numDiscrim; netCnt++) {
netNum_v_conf [netCnt] =net_conf_sum [netCnt] /non_trn_set;
netNum_v_score [netCnt] = (float) net_win_sum [netCnt];
}
}
}
if (mloop == (unsigned) ntupleFixed) {
netWin_v_fileClass [fileClassCnt] = (float) net_win_sum [netNumFixed];
netConf_v_fileClass [fileClassCnt] = net_conf_sum [netNumFixed] /non_trn_set;
}
}
free_ivector(y,1,ncut);
free_ivector(y2,1,ncut);
free_ivector(perm,1,16*ncut);
free_imatrix(net,1, numDiscrim, 1, net_size);
}

printf("\nSpace to quit, any other key to continue ... ");
while (getch() != ' ') {

set_vidmode();
_pg_initchart();
_pg_defaultchart(&chrt_env, _PG_COLUMNCHART, _PG_PLAINBARS);
strcpy(chrt_env.maintitle.title, "Net Confidence");
sprintf(fp,"Ntuple: %d FileClass: %s ", ntupleFixed
, fnrecall [fileClassFixed-1] );
strcpy(chrt_env.subtitle.title, fp);
sprintf(fp,"'%s' Confidence",netName [netNumFixed-1] );
strcpy(chrt_env.xaxis.axistitle.title, "Net");
strcpy(chrt_env.yaxis.axistitle.title, "Confidence");
_pg_chart(&chrt_env, netName, &netNum_v_conf[1] , numDiscrim);
getch();

set_vidmode();
strcpy(chrt_env.maintitle.title, "Net Wins");
sprintf(fp,"Ntuple: %d FileClass: %s MaxScore: %d", ntupleFixe
, fnrecall [fileClassFixed-1], non_trn_set );
strcpy(chrt_env.subtitle.title, fp);
strcpy(chrt_env.yaxis.axistitle.title, "Score");
_pg_chart(&chrt_env, netName, &netNum_v_score[1] , numDiscrim);
getch();

set_vidmode();
sprintf(fp,"'%s' Wins",netName [netNumFixed-1] );
strcpy(chrt_env.maintitle.title, "Wins vs Ntuple");
sprintf(fp,"FileClass: %s MaxScore: %d Net: '%s'"
, fnrecall [fileClassFixed-1], non_trn_set, netName [netNumFixed-1] );
strcpy(chrt_env.subtitle.title, fp);
strcpy(chrt_env.yaxis.axistitle.title, "Score");
strcpy(chrt_env.xaxis.axistitle.title, "Ntuple size");
_pg_chart(&chrt_env, &ntup_categ [start_val-1],
&netWin_v_ntuple[1] , end_val-start_val+1);
getch();

set_vidmode();
strcpy(chrt_env.maintitle.title, "Confidence vs Ntuple");
sprintf(fp,"FileClass: %s Net: '%s'"
, fnrecall [fileClassFixed-1], netName [netNumFixed-1] );
strcpy(chrt_env.subtitle.title, fp);
strcpy(chrt_env.yaxis.axistitle.title, "Confidence");
_pg_chart(&chrt_env, &ntup_categ [start_val-1]
, &netConf_v_ntuple[1] , end_val-start_val+1);
getch();

set_vidmode();
strcpy(chrt_env.maintitle.title, "FileClass vs Wins");
sprintf(fp,"Ntuple: %d Net: '%s' MaxScore: %d"
, ntupleFixed, netName [netNumFixed-1], non_trn_set);
strcpy(chrt_env.subtitle.title, fp);
strcpy(chrt_env.xaxis.axistitle.title, "File Class");
strcpy(chrt_env.yaxis.axistitle.title, "Score");
_pg_chart(&chrt_env, file_categ, &netWin_v_fileClass[1] , numRecallClasses);
getch();

set_vidmode();
sprintf(fp,"'%s' Confidence",netName[netNumFixed-1]);
strcpy(chrt_env.maintitle.title, "FileClass vs Confidence");
sprintf(fp,"Ntuple: %d Net: '%s'", ntupleFixed, netName [netNumFixed-1] );
strcpy(chrt_env.subtitle.title, fp);
strcpy(chrt_env.yaxis.axistitle.title, "Confidence");
_pg_chart(&chrt_env, file_categ, &netConf_v_fileClass[1] , numRecallClasses);
getch();

set_text_mode();
printf("\nPress space bar to quit");
}

free_matrix(fclass1,1,SL,1,SL);
free_imatrix(pairs,1,MAX_ROT,1,SL);
free_matrix(rotaf,1,MAX_ROT,1,SL);
free_vector(x,1,SL);
free_ivector(co_index,1,SL);
free_vector(x2,1,SL);

free_ivector(net_op,1, numDiscrim);
free_ivector(net_win_sum,1, numDiscrim);
free_vector(net_conf_sum,1, numDiscrim);
free_vector(netNum_v_conf,1, numDiscrim);
free_vector(netNum_v_score,1, numDiscrim);
free_vector(netWin_v_ntuple,1, MAX_NTUPLE);
free_vector(netConf_v_ntuple,1, MAX_NTUPLE);
free_vector(netConf_v_fileClass,1, numRecallClasses);
free_vector(netWin_v_fileClass,1, numRecallClasses);

return 0;
}

/* ------------- start_up ----------------- */
/* get start up data from file start.up */
void start_up(void)
{
int fixed_val, cnt2, fin_loop, cnt;

/* get start up data from file */
/*
Variable parameter ie 'ncut' or 'ntuple'
starting value for variable parameter
end value for variable parameter
value of fixed parameter
File name1 (Files for correlation matrix calculation)
|
|
File name N
.*END
Number of discriminators
File Name 1
|
|
File name N
.*END
File Name 1
|
|
File name N
.*END
File name1 (Files for recall tests)
|
|
File name N
.*END
Training set size
Recall set size
Correlation set size
recall classes same as training classes 'overLap' 'noOverLap'
netNumFixed -netNum to be fixed for plots of ntuple size
versus net wins and net confidence
ntupleFixed -ntuple number to be fixed for plots of wins and
confidence versus net number
fileClassFixed -fileClass to be fixed.
nameDiscrim 1
|
|
nameDiscrim N
*/

/* init file name strings to empty string */
for (cnt2= 0; cnt2 < MAXDIS; cnt2++) {
for (cnt = 0; cnt< MAXDISFILES; cnt++) {
strcpy(fndis [cnt2] [cnt],"");
}
}
for (cnt = 0; cnt< MAXRECALLFILES; cnt++) {
strcpy(fnrecall[cnt],"");
}
if ((infile = fopen("start.up", "r")) == NULL)
{printf("\nFile open failed"); exit(1);}

fgets(fp,80,infile);
fp [strcspn(fp," \n") ] = 0;
if (strcmp(fp,"ntuple") == 0) ntup_var = TRUE;
else ntup_var = FALSE;
printf("\nntup_var = %d",ntup_var);

fgets(fp,40,infile);
start_val = atoi(fp);
printf("\nStarting value %d", start_val);

fgets(fp,40,infile);
end_val = atoi(fp);
printf("\nending value %d", end_val);

fgets(fp,40,infile);
fixed_val = atoi(fp);
printf("\nfixed value %d", fixed_val);
if (ntup_var == TRUE) ncut = fixed_val;
else {
part = fixed_val;
if (part > 16 || part < 1) nrerror("Ntuple size out of range");
}

for (cnt = 0, fin_loop=FALSE ; cnt<= MAXCORRFILES && fin_loop == FALSE; cnt++) {
fgets(fp,80,infile);
printf("\nCorr File %d = %s",cnt+1,fp);
fp [strcspn(fp," \n") ] = 0;
if (strcmp(fp, ".*END") == 0) fin_loop = TRUE;
strcpy(fncorr [cnt], fp);
}
if (cnt-1 > MAXCORRFILES) nrerror("Too many correlation files");
numCorrFileClasses = cnt-1;
printf("\nnumCorrFileClasses = %d",numCorrFileClasses);

fgets(fp,40,infile);
numDiscrim = atoi(fp);
printf("\nnumDiscrim %d",numDiscrim);
if (numDiscrim > MAXDIS) nrerror("Too many discriminators");

fgets(fp,80,infile);
for (cnt2= 0; cnt2 < numDiscrim; cnt2++) {
for (cnt = 0, fin_loop=FALSE; cnt<= MAXDISFILES && fin_loop==FALSE; cnt++) {
fgets(fp,80,infile);
fp [strcspn(fp," \n") ] = 0;
if (strcmp(fp, ".*END") == 0) fin_loop = TRUE;
strcpy(fndis [cnt2] [cnt] , fp);
printf("\nDiscrim %d File %d Name %s",cnt2,cnt, fndis [cnt2] [cnt]);
}
if (cnt-1 > MAXDISFILES) nrerror("Too many discriminator files");
}

for (cnt = 0, fin_loop=FALSE; cnt<= MAXRECALLFILES && fin_loop==FALSE; cnt++) {
fgets(fp,80,infile);
printf("\nRecall file %d = %s",cnt,fp );
fp [strcspn(fp," \n") ] = 0;
if (strcmp(fp, ".*END") == 0) fin_loop = TRUE;
strcpy(fnrecall [cnt], fp);
}
if (cnt-1 > MAXRECALLFILES) nrerror("Too many recall files");
numRecallClasses = cnt-1;
printf("\nNumRecallClasses %d", numRecallClasses);

fgets(fp,40,infile);
size_trn_set = atoi(fp);
printf("\nsize_trn_set %d", size_trn_set);

fgets(fp,40,infile);
recall_set_size = atoi(fp);
printf("\nRecall_set_size %d", recall_set_size);

fgets(fp,40,infile);
corr_set_size = atoi(fp);
printf("\ncorr_set_size %d", corr_set_size);

fgets(fp,80,infile);
fp [strcspn(fp," \n") ] = 0;
if (strcmp(fp,"overLap") == 0) rec_trn_overlap = TRUE;
else rec_trn_overlap= FALSE;
printf("\nrec_trn_overlap = %d",rec_trn_overlap);
if (rec_trn_overlap == TRUE)
non_trn_set = recall_set_size - size_trn_set;
else non_trn_set = recall_set_size;
printf("\nnon_trn_set %d",non_trn_set);

fgets(fp,80,infile);
netNumFixed = atoi(fp);
if (netNumFixed > numDiscrim || netNumFixed < 1)
nrerror("netNumFixed out of range");
printf("\nnetNumFixed %d", netNumFixed);

fgets(fp,80,infile);
ntupleFixed = atoi(fp);
if (ntupleFixed > end_val || ntupleFixed < start_val)
nrerror("ntupleFixed must be between start and end values");
printf("\nNtupleFixed %d", ntupleFixed);

fgets(fp,80,infile);
fileClassFixed = atoi(fp);
if (fileClassFixed > numRecallClasses || fileClassFixed < 1)
nrerror("fileClassFixed must be between 1 and numRecallClasses");
printf("\nfileClassFixed %d", fileClassFixed);

/* set netName for pg_chart. Note first index into array is zero */
for (cnt=0; cnt < numDiscrim; cnt++) {
fgets(fp,80,infile);
fp [strcspn(fp,"\n") ] = 0;
strcpy(netName[cnt],fp);
printf("\nNet name %d = %s",cnt,netName[cnt]);
}

/* set file_categ for pg_chart */
for (cnt=1; cnt<=numRecallClasses; cnt++)
strcpy( file_categ[cnt-1], fnrecall[cnt-1]);


fclose(infile);
}

/* -------------- corrCal -----------*/
/* uses 'fncorr' & 'numCorrFileClasses' , returns corr matrix in 'b' */
void corrCal(void)
{
int i,j,cnt;

/* init 'b' */
for (i=1; i <= SL; i++)
for (j=1; j <= SL; j++)
b[i] [j] = 0.0f;

for (cnt=0;cnt<numCorrFileClasses;cnt++) {
printf("\nCorrelation calc pass %d\n",cnt+1);
strcpy(fp,fncorr [cnt]);
corr_md_sigs(SL, SL, a, fp, corr_set_size);
for (i=1; i <= SL; i++)
for (j=1; j <= SL; j++)
b[i] [j] += a[i] [j];
}

/* average */
for (i=1; i <= SL; i++)
for (j=1; j <= SL; j++)
b[i] [j] = b[i] [j] / (float) numCorrFileClasses;
}

/* ------------ calTransCorr ----------- */
/* uses 'b' (corr matrix) and trashes it. returns 'a' (transformed
corr matrix) , 'x' a vector containing leading diagonal elements
from 'a' , 'pairs' & 'rotaf' data required for fast jacobi plane
rotations */
void transCorrCal(void)
{
int i;

printf("\nDoing rdft");
two_d_rdftrans(SL,b);
copy_mat(b,a);
printf("\nCalculating plane rotations");
for (i=1; i<= MAX_ROT; i++) {
printf("\nRotation %d",i);
jacobiRotations(a,SL, pairs[i] ,rotaf[i] ); /* calc SL/2 rotations */
twod_jacobi(SL, pairs[i] ,rotaf[i] , b); /* apply rotations */
copy_mat(b,a);
}

covar_norm(SL,b, a);
for (i=1;i<=SL;i++) x[i] = a [i] [i];
/* indexx from NRC */
indexx(SL,x,co_index); /* co_index contains size oredered index for x */
/* x = vector of ordered diagonal elements */
for (i=1; i <= SL; i++) x[i] = a [co_index[SL+1-i]] [co_index[SL+1-i]];
}


/* --------------- train_net ----------------- */
/* train discrim 'netNum' on specified file classes */
void train_net(int netNum)
{
int netFileNum;
int i,cnt;

for (netFileNum=0;
strcmp(fndis[netNum-1] [netFileNum],".*END"); netFileNum++) {
get_sigs_mat(SL,SL,fclass1,size_trn_set, fndis[netNum-1] [netFileNum]);
printf("\n");
for (cnt=1; cnt <= size_trn_set; cnt++) {
for (i=1; i <= SL; i++) x[i] = fclass1[cnt] [i]; /* x= sig vector */
frdft(SL, x); /* x= RDFT(x) */
for (i=1; i<= MAX_ROT; i++)
jact(SL,pairs[i] ,rotaf[i] ,x); /* x= JacT(x) */
/* order x2 according to predetermined order in co_index */
for (i=1; i <= SL; i++) x2[i] = x [co_index[SL+1-i]];
vec_flt_to_int(ncut,x2 , y, TRUE);
cursor(0,24);
printf("Training net pass no. %d",cnt);
/* train 'net[netNum]' on 'ncut' first coefficients from vector 'y' */
wisnet(ncut, sets, y, net [netNum], part, scram, perm, 1);
}
}
}

/* -------- recall_nets --------------- */
void recall_nets(int fileClassNum)
{
int i, netCnt, passCnt=1;
struct net_res_st big, next_big;

/* get all signal vectors from disc */
get_sigs_mat(SL,SL,fclass1, recall_set_size, fnrecall [fileClassNum-1]);
printf("\n");
if (rec_trn_overlap) fno=size_trn_set+1; else fno=1;
for (; fno <= recall_set_size; fno++) {
cursor(0,24);
printf("Recall net pass no. %d",passCnt);
passCnt++;
/* copy signal vector into 'x' */
for (i=1; i <= SL; i++) x[i] = fclass1[fno] [i];
frdft(SL, x); /* x= RDFT(x) */
for (i=1; i<= MAX_ROT; i++)
jact(SL,pairs[i] ,rotaf[i] ,x); /* x= JacT(x) */
/* order x2 according to predetermined order in co_index */
for (i=1; i <= ncut; i++) x2[i] = x [co_index[SL+1-i]];
for (i=ncut+1; i <= SL; i++) x2[i] = 0.0f;
/* convert x2 to integer vector y */
vec_flt_to_int(ncut,x2 , y, TRUE);

/* recall from nets and compute results */
big.netRes=0, next_big.netRes=0; big.netNum = 1;
for (netCnt=1; netCnt <= numDiscrim; netCnt++)
net_op [netCnt]
= wisnet(ncut, sets, y, net [netCnt], part, scram, perm, 0);
for (netCnt=1;netCnt<=numDiscrim;netCnt++)
x[netCnt] = (float) net_op [netCnt];
indexx(numDiscrim,x,co_index); /* NRC */
big.netNum = co_index[numDiscrim];
big.netRes = net_op[big.netNum];
next_big.netNum = co_index[numDiscrim-1];
next_big.netRes = net_op[next_big.netNum];
if (big.netRes > next_big.netRes) {
net_conf_sum [big.netNum]
+= (((int) big.netRes - (int) next_big.netRes)*100)
/ (int) big.netRes;
net_win_sum [big.netNum]++;
}
}
}
/* -------------- md_sigs.c ---------------- */
/* NRC - Press et al, "Numerical Recipes in C" */
/* heap memory matrices, managed by NRC */
#include "main.h"
#include "md_sigs.h"
#include <math.h>
#include "graph2.h"
#include "stenrutil.h"
#include <stdio.h>
#include <stdlib.h>
#include "terminal.h"
#include <malloc.h>
#include "utl.h"
#include "marith.h"
#include "covar.h"
#include "nr.h" /* NRC */
#include "toolset.h"
#include "correl.h"
#include <string.h>

#define LFTSHFT 70
#define MDSIGLEN 512
#define UND_SAMP 2

/* ----------------- get_mdsig ------------------ */
void get_mdsig(int nr, float *ch1, float *ch2, char *fileName,int zcPnt,int spread)
{
char line[81], filePath[81];
int cnt, i;
FILE *infile;

if ((zcPnt < spread) || (zcPnt + spread > MDSIGLEN)) nrerror("Invalid data window");
/* sprintf(filePath, "c:\\wave\\%s", fileName); */
strcpy(filePath, fileName);
cursor(0,24);
printf("Opening File %s ...", fileName);
if ((infile = fopen(filePath,"r")) == NULL)
nrerror("File open failed");
fgets(line,80,infile);
if (MDSIGLEN*2 != atoi(line)) nrerror("File contains wrong amount of data");
fgets(line,80,infile);
cnt = 1;
while ( fgets(line,80,infile) != NULL) {
ch1[cnt] = (float) atoi(line);
fgets(line,80,infile);
ch2[cnt] = (float) atoi(line);
if (cnt > nr) nrerror("Too much data in file");
cnt++;
}
i = 1;
for (cnt = zcPnt-spread; cnt < zcPnt +spread; cnt++) {
ch1[i] = ch1[cnt]; ch2[i] = ch2[cnt]; i++;
}
if (fclose(infile) != 0) nrerror("file close failed");
}

/* ------------ filter_sig --------------- */
/* moving average filter, averaging over 'wind' points */
void filter_sig(int size, float *a, int wind)
{
float tmp = 0.0f, *b;
int i,j, odd;

b = vector(1,size);

if (wind%2 == 1) odd = TRUE; else odd = FALSE;
for (i = 1; i <= size; i++) {
if (odd == TRUE)
for (j=i - (wind/2); j <= i + (wind/2); j++) {
if (j >= 1 && j <= size)
tmp += a[j];
}
else
for (j=i - (wind/2); j < i + (wind/2); j++) {
if (j >= 1 && j <= size)
tmp += a[j];
}
b[i] = tmp/(float) wind;
tmp = 0.0f;
}
for (i=1;i<=size;i++) a[i] = b[i];

free_vector(b,1,size);
}

/* ---------- under_samp ------------- */
void under_samp(int size, float *a, int wind)
{
int i;
for (i=1; i<= size/wind; i++) a[i] = a[i*wind-1];
}

/* -------- get_sigs_mat ----------- */
/* rows of 'a' contain on return the signal vectors. 'nr' and 'nc'
denote the number of rows and columns in 'a'. en_num or
ensemble_number denotes the number of signal vectors to be returned
in 'a'. filePrefix is a pointer to a character string to be used as
the prefix of the signal file */
void get_sigs_mat(int nr, int nc, float **a, int en_num, char *filePrefix)
{
int sig_num, col;
char fileName [80];
float *ch1, *ch2;
#define UND_SAMP 2

ch1 = vector(1,MDSIGLEN);
ch2 = vector(1,MDSIGLEN);

for (sig_num = 1; sig_num <= en_num; sig_num++) {
sprintf(fileName,"%s%s%d",filePrefix,".",sig_num);
get_mdsig(MDSIGLEN, ch1, ch2, fileName, 170, nc*UND_SAMP/2);
filter_sig(nc*UND_SAMP, ch1, UND_SAMP);
under_samp(nc*UND_SAMP, ch1, UND_SAMP);
for (col = 1; col <= SL; col++)
a[sig_num] [col] = ch1[col];
}
free_vector(ch1,1,MDSIGLEN);
free_vector(ch2,1,MDSIGLEN);
}

/* -------- corr_md_sigs ------------ */
/* calculate the correlation matrix for an 'enNum' sized ensemble of signals of signal class
'fp'. Return it in 'a'. Also calculate the off diagonal energy. */
void corr_md_sigs(int nr, int nc, float **a, char* fp, int enNum)
{
float **sigs,ode;

sigs = matrix(1,nr,1,nc);
get_sigs_mat(nr, nc, sigs, enNum, fp);
calc_correlmat(enNum, SL, sigs, a);
ode = off_diag(nc, a,TRUE);
printf("\nOff diag Energy = %f", ode);
/* graph_mat(nr, nc, a,"correl"); */
free_matrix(sigs,1,nr,1,nc);
}
/* -------------- correl.c ---------------- */
/* NRC - Press et al, "Numerical Recipes in C" */
#include "stenruti.h"
#include "marith.h"
#include "covar.h"
#include "toolset.h"
#include "main.h"
#include <math.h>
#include "matprt.h"
#include <stdio.h>
#include "correl.h"

/* ----------- calc_correlmat ------------- */
void calc_correlmat(int nr, int nc, float **sigs, float **a)
/* Transpose sigs
sig_prod = sigsT * sigs
sig_prod = sig_prod/vector length
covar = (sig_prod) Normalized ie divide each element by the trace
*/
/* Signals are contained in the rows, the column position denotes
the sample number */
/* correlation matrix returned in 'a' */

{
float multer = 1.0f/(float)nr;
float **b;

printf("\nCalculating Correlation matrix ");
b = matrix(1,nc,1,nc);
transpose2(nr,nc, sigs, a); /* a = sigsT */
mult2(nc,nr,nr,nc,a, sigs, b); /* b = sig_prod = sigsT * sigs */
sc_mult2(nc,nc, b, &multer); /* watch out passing float by value
does not work */
covar_norm(nc, b, a);

free_matrix(b,1,nc,1,nc);
printf("\nFinished Calculating Correlation Matrix");
}

/* ------------- off_diag ---------- */
/* calculate the off diagonal energy in matrix 'data1', with the option for taking the absolute value
of the off diagonal elements */
float off_diag(int size, float **data1, int absol)
{
int row,col;
float tmp_tot= 0;

if (absol == TRUE) {
for (row=1 ;row <= size; row++) {
for (col=row+1 ;col <= size; col++)
tmp_tot += fabs(data1[row] [col]);
}
}
else {
for (row=1 ;row <= size; row++) {
for (col=row+1 ;col <= size; col++)
tmp_tot += data1[row] [col];
}
}
tmp_tot *= 2;
return tmp_tot;
}

/* --------- covar_norm ---------- */
/* normalise the matrix in 'a', by dividing every element by the trace.
Trace = the sum of the diagonal elements */
void covar_norm(int size, float **a, float **b)
{
int row;
float diag_tot = 0.0f;   /* accumulate the trace from zero */

for (row = 1; row <= size; row++)
diag_tot += a[row] [row];
diag_tot = 1.0f/diag_tot;
copy_mat(a,b);
sc_mult2(size,size,b,&diag_tot);
}
/* ---------------- trans.c --------------- */
/* NRC - Press et al, "Numerical Recipes in C" */
/* heap memory matrices, managed by NRC */
/* All the 2D transformations required to transform the
square correlation matrices */
#include <stdio.h>
#include "stenruti.h"
#include "nr.h" /* NRC */
#include "walsh.h"
#include "matprt.h"
#include "utl.h"
#include "rdft.h"

/* ------------- two_d_rdftrans --------------- */
/* Send data in "a". Returns transformed matrix in "a" */
/* A 2d transformation is equivalent to y = TxT'.
   Because TxT' = T(Tx')' and, in the case of a symmetric matrix such as a
   covariance matrix, x = x', it follows that y = T(Tx)'.
   T = transform matrix
   x = covariance matrix
   y = transformed covariance matrix */

void two_d_rdftrans(int size, float **a)
{
int i, row, col;
float *data;
data = vector(1,size);

/* printf("\nReal Discrete Fourier Transform"); */
/* do the columns first */
/* printf("\nTransforming the columns ..."); */
for (col= 1; col<=size; col++) {
for (i=1; i <= size; i++) {
data[i] = a[i] [col];
}
frdft(size, data);
for (i=1; i <= size; i++) {
a[i] [col] = data[i];
}
}


/* printf("\nReal Discrete Fourier Transform");
printf("\nTransforming the rows ..."); */
for (row= 1; row<=size; row++) {
for (i=1; i <= size; i++) {
data[i] = a[row] [i];
}
frdft(size, data);
for (i=1; i <= size; i++) {
a[row] [i] = data[i];
}
}
free_vector(data,1,size);
}

/* ------------- two_d_sincostrans --------------- */
/* Send data in "a". Returns transformed matrix in "a" */
/* A 2d transformation is equivalent to y = TxT'.
   Because TxT' = T(Tx')' and, in the case of a symmetric matrix such as a
   covariance matrix, x = x', it follows that y = T(Tx)'. */
void two_d_sincostrans(int size, float **a, int tchoice)
{
int i, row, col;
float *data;
data = vector(1,size);

if (tchoice == 1)
printf("\nSine transform");
else
printf("\nCosine transform");

/* do the columns first */
printf("\nTransforming the columns ...");
for (col= 1; col<=size; col++) {
for (i=1; i <= size; i++) {
data[i] = a[i] [col];
}
if (tchoice == 1)
sinft(data, size); /* NRC */
else
cosft(data, size, 1); /* NRC */
for (i=1; i <= size; i++) {
a[i] [col] = data[i];
}
}

/* then the rows */
printf("\nTransforming the rows ...");
for (row= 1; row<=size; row++) {
for (i=1; i <= size; i++)
data[i] = a[row] [i];
if (tchoice == 1)
sinft(data, size); /* NRC */
else
cosft(data, size, 1); /* NRC */
for (i=1; i <= size; i++)
a[row] [i] = data[i];
}

free_vector(data,1,size);

}

/* ------------- two_d_walshtrans --------------- */
/* Send data in "a". Returns transformed matrix in "a" */
/* A 2d transformation is equivalent to y = TxT'.
   Because TxT' = T(Tx')' and, in the case of a symmetric matrix such as a
   covariance matrix, x = x', it follows that y = T(Tx)'. */
void two_d_walshtrans(int size, float **a)
{
int i, row, col, ln, size_cpy;
float *data;
data = vector(1,size);

printf("\nTwo-D Walsh transform");
ln = 0;
size_cpy = size;
while (size_cpy > 1) {size_cpy >>= 1; ln++;} /* find order of 2 */
printf("\nln = %d",ln);

/* do the columns first */
printf("\nTransforming the columns ...");
for (col= 1; col<=size; col++) {
for (i=1; i <= size; i++) {
data[i] = a[i] [col];
}
fwt(data, ln);
for (i=1; i <= size; i++) {
a[i] [col] = data[i];
}
}

/* then the rows */
printf("\nTransforming the rows ...");
for (row= 1; row<=size; row++) {
for (i=1; i <= size; i++)
data[i] = a[row] [i];
fwt(data, ln);
for (i=1; i <= size; i++)
a[row] [i] = data[i];
}
free_vector(data,1,size);
}
/* -------------- rdft.c ---------------- */
/* NRC - Press et al, "Numerical Recipes in C" */
/* heap memory matrices, managed by NRC */
#include "main.h"
#include <stdio.h>
#include <math.h>
#include "matprt.h"
#include "utl.h"
#include "rdft.h"
#include "stenruti.h"
#include "nr.h" /* NRC */

/* ------------ frdft --------------- */
void frdft(int size, float *a)
{
float *data;
int i;

data = vector(1,size*2);

for (i=1; i <= size; i++) {
data[2*i] = 0.0f;
data[2*i-1] = a[i];
}
four1(data, size, 1); /* NRC */
a[1] = data[1]; /* no imaginary DC element */
/* element N/2 has no imaginary component */
a[size/2 + 1] = data[size+1];
for (i=2; i <= size/2; i++) {
a[i] = data[2*i-1]; /* data.real for all i <= size/2 */
a[size+2-i] = data[2*i]; /* data.im for all i > size/2 */
}
free_vector(data,1,size*2);
}
/* ----------- walsh.c ----------- */

/* ----- fwt ------------ */
/* based on Fortran fast walsh transform from: "digital image processing",
Gonzalez & Wintz; which is itself based on a successive doubling fft
algorithm */
/* data is passed in 'f', contains the coefficients on return. Order of
transform is 2^ln */
void fwt(float *f, int ln)
{
int n, nv2, nm1, i,j,k,l, le, le1, ip;
float t;

n = 1 << ln;
nv2 = n/2;
nm1 = n-1;
j=1;

for (i=1; i<=nm1; i++) {
if (i<j) {
t= f[j];
f[j] = f[i];
f[i] = t;
}
k = nv2;
while (k<j) {
j = j-k;
k = k/2;
}
j += k;
}
for (l=1; l<=ln; l++) {
le = 1 << l;
le1 = le/2;
for (j=1; j<=le1; j++) {
for (i=j; i<=n; i += le) {
ip = i + le1;
t = f[ip];
f[ip] = f[i] - t;
f[i] += t;
}
}
}
for (i=1; i<=n; i++) f[i] = f[i]/ (float) n;
}
/* -------------- marith.c ------------ */
#include "main.h"

/* ------------- mult -------------- */
/* multiplication of square matrices size 'SL' */
void mult(float **data1, float **data2, float **result)
{
int row, col, col2, no_row= SL, no_col= SL;
float subtot;

for (col2=1; col2 <= no_col; col2++) {
for (row=1 ;row <= no_row; row++) {
subtot = 0;
for (col=1 ; col <= no_col; col++)
subtot += (data1[row] [col] * data2[col] [col2]);
result[row] [col2] = subtot;
}
}
}

/* ------------- mult2 -------------- */
/* multiplication of matrices size 'nr' by 'nc' times 'nr2' by 'nc2'.
'nc' must equal 'nr2' */
void mult2(int nr, int nc, int nr2, int nc2, float **data1, float **data2, float **result)
{
int row, col, col2;
float subtot;

for (col2=1; col2 <= nc2; col2++) {
for (row=1 ;row <= nr; row++) {
subtot = 0;
for (col=1 ; col <= nc; col++)
subtot += (data1[row] [col] * data2[col] [col2]);
result[row] [col2] = subtot;
}
}
}

/* ------------- mat_sub -------------- */
void mat_sub(float **data1, float **data2, float **res)
{
int row, col;

for (col=1;col <= SL; col++)
for (row=1; row <= SL; row++)
res[row] [col] = data1[row] [col] - data2[row] [col];
}

/* ------------- transpose -------------- */
void transpose(float **data1, float **res)
{
int row, col, no_row=SL, no_col=SL;

for (row=1 ;row <= no_row; row++)
for (col=1 ; col <= no_col; col++)
res[col] [row] = data1[row] [col];
}

/* ---------- sc_mult2 -------- */
void sc_mult2(int nr, int nc, float **a, float *multer)
{
int row,col;

for (row=1 ;row <= nr; row++) {
for (col=1 ;col <= nc; col++)
a[row] [col] *= *multer;
}
}
/* ------------- transpose2 -------------- */
void transpose2(int nr, int nc, float **data1, float **res)
{
int row, col, no_row=SL, no_col=SL;

for (row=1 ;row <= nr; row++)
for (col=1 ; col <= nc; col++)
res[col] [row] = data1[row] [col];
}

/* ------------- mat_mean2 -------------- */
/* Take the mean of the column vectors which make up the matrix 'data1' and return in row vector
of length 'nc' */
void mat_mean2(int nr, int nc, float **data1, float **res)
{
int row,col;
float temp;

for (col=1; col <= nc ; col++) {
temp = 0;
for (row=1 ;row <= nr ; row++) {
temp += data1[row] [col];
}
res[1] [col] = temp / nr;
}
}
/* --------- copy_mat ------- */
void copy_mat(float **data1, float **data2)
{
int row,col;

for (col = 1; col <= SL; col++) {
for (row = 1; row <= SL; row++) {
data2[row] [col] = data1[row] [col];
}
}
}
/* ---------------------- stejacob.c --------------------- */
/* NRC - Press et al, "Numerical Recipes in C" */
/* heap memory matrices, managed by NRC */
#include <math.h>
#include "toolset.h"
#include "stejacob.h"
#include "stenruti.h"
#include <stdio.h>
#include "nr.h" /* NRC */

/* ------------- jacobiRotations ------------------- */
/* This routine is based on a routine from NRC 'jacobi' */
/* Covariance matrix is passed in 'a'. Contents of upper diagonal are
destroyed. Pairs returns the pairs of points to be rotated. rotaf or
rotation_angle_function returns the sin and cosines of the angle of rotation */
void jacobiRotations(float **a,int n, int *pairs, float *rotaf)
{
int iq,ip;
float theta,t,s,h,g,c;
float *up_sort;
int cnt, *up_index, *up_col_no, *up_row_no, large_ind, up_diag, *bar_list;
int rot_no, fnd_next_el;

int pcnt = 1;

up_diag = (int)(n * (n-1)*0.5);
up_sort = vector(1,up_diag);
up_index = ivector(1,up_diag);
up_col_no = ivector(1,up_diag);
up_row_no = ivector(1,up_diag);
bar_list = ivector(1,n);
cnt = 1;
for (ip=1;ip<=n-1;ip++) /* init vector copy of upper diagonal */
for (iq=ip+1;iq<=n;iq++) {
up_sort[cnt] = (float) fabs (a[ip] [iq]);
up_row_no[cnt] = ip;
up_col_no[cnt] = iq;
cnt++;
}

indexx(up_diag, up_sort, up_index); /* NRC */
large_ind = up_diag;
for (cnt = 1; cnt <= n; cnt++) /* init col and row strikes */
bar_list[cnt] = FALSE;
for (rot_no = n/2;rot_no >= 1;rot_no--) { /* for N/2 rotations */
fnd_next_el = FALSE;
while(fnd_next_el == FALSE) {
ip = up_row_no[up_index[large_ind]];
iq = up_col_no[up_index[large_ind]];
if (bar_list[ip] == TRUE || bar_list[iq] == TRUE)
large_ind--;
else {
bar_list[ip] = TRUE;
bar_list[iq] = TRUE;
fnd_next_el = TRUE;
large_ind--;
}
}
if (fabs(a[ip] [iq]) > 0.0f) {
g=100.0f* (float) fabs(a[ip][iq]);
h= a[iq] [iq] - a[ip] [ip];
if ((float)(fabs(h)+g) == (float)fabs(h))
t=(a[ip][iq])/h;
else {
theta=0.5f*h/(a[ip][iq]);
t=1.0f/(fabs(theta)+sqrt(1.0f+theta*theta));
if (theta < 0.0f) t = -t;
}
c=1.0f/sqrt(1+t*t);
s=t*c;
pairs [pcnt] = ip;
rotaf[pcnt] = s;
pcnt++;
pairs [pcnt] = iq;
rotaf[pcnt] = c;
pcnt++;
if (pcnt > n+1) nrerror("pcnt over run in jacobiRotations");
}
else {
pairs [pcnt] = ip;
rotaf[pcnt] = 0.0f;
pcnt++;
pairs [pcnt] = iq;
rotaf[pcnt] = 1.0f;
pcnt++;
if (pcnt > n+1) nrerror("pcnt over run in jacobiRotations");
}
}
free_vector(up_sort,1 ,up_diag);
free_ivector(up_index, 1, up_diag);
free_ivector(up_col_no, 1 ,up_diag);
free_ivector(up_row_no,1, up_diag);
free_ivector(bar_list,1,n);
}

/* ---------------- twod_jacobi -------------- */
/* Matrix to be transformed is passed in 'a'. Transformed matrix is
returned in 'a' */
void twod_jacobi(int size, int *pairs, float *rotaf, float **a)
{
int cnt, row, col;
float s,c, p1;

/* transform the rows */
for (row = 1; row <= size; row++) {
for (cnt = 1; cnt <= size -1; cnt +=2) {
s = rotaf [cnt]; c = rotaf [cnt +1];
p1 = a [row] [pairs[cnt]];
a [row] [pairs[cnt]] = (p1 * c) - (a [row] [pairs [cnt+1] ] * s);
a [row] [ pairs[cnt+1]] = (a [row] [pairs [cnt+1] ] * c) + (p1 * s);
}
}

/* transform the columns */
for (col = 1; col <= size; col++) {
for (cnt = 1; cnt <= size -1; cnt += 2) {
s = rotaf [cnt]; c = rotaf [cnt +1];
p1 = a [pairs[cnt]] [col];
a [pairs[cnt]] [col] = (p1 * c) - (a [pairs [cnt+1] ] [col] * s);
a [ pairs[cnt+1]] [col] = (a [pairs [cnt+1]] [col] * c) + (p1 * s);
}
}

}


/* ----------------- jact ------------- */
/* vector to be transformed is passed in 'a'. Transformed vector is
returned in 'a'. pairs contains the indices for the pairs of points to
be rotated. rotaf contains the cosines and sines of the angle of
rotation */
void jact(int size, int *pairs, float *rotaf, float *a)
{
int cnt;
float s,c, p1;

/* printf("\nDoing Jacobi transformation"); */
for (cnt = 1; cnt <= size -1; cnt +=2) {
s = rotaf [cnt]; c = rotaf [cnt +1];
p1 = a [pairs[cnt]];
a [pairs[cnt]] = (p1 * c) - (a [pairs [cnt+1] ] * s);
a [ pairs[cnt+1]] = (a [pairs [cnt+1] ] * c) + (p1 * s);
}
}
/* ----------------- wisnet.c --------------------- */
/* NRC - Press et al, "Numerical Recipes in C" */
/* heap memory matrices, managed by NRC */
#include <stdio.h>
#include "wisnet.h"
#include <math.h>
#include "stenruti.h"
#include "nr.h" /* NRC */
#include "toolset.h"

#define INT_LEN 16
#define MAX_INT 65536
#define MAX_SGNED_INT 32766
#define MIN_SGNED_INT -32767

/* -------------------- wisnet ------------------- */
unsigned wisnet(unsigned N, unsigned sets, unsigned* a,
unsigned* net, unsigned part,
unsigned scram, unsigned* perm, unsigned train)
/* 'N' size of input data vector
'sets' actual number of RAM sets = N*INT_LEN/part, this ensures no
spurious net training and recalls. Used as a check.
input data sent in 'a'
'net' current net state
'part' partition size
'scram' scramble bit stream flag
'perm' permutation data used to scramble the bit stream
'train' train or recall flag
*/

{
unsigned cnt, cnt2, i, set_num, addr_int_ind, addr_arry_ind, address, tmp,
part_len;
/* part_len = the number of RAM cells addressed by one n-tuple for the
   given partition size, ie 2^part */
/* unsigned bit_str[32], pbit_str[16]; */
unsigned *bit_str, *bit_str2, *pbit_str;
unsigned score = 0; /* only used in recall mode */

if (part > INT_LEN) nrerror("Ntuple size too large");

part_len = 1 << part;
/* make bit stream arrays a little larger to cope with all 'part' sizes
less than INT_LEN */
bit_str = ivector(1,(N+2)*INT_LEN);
bit_str2 = ivector(1,(N+2)*INT_LEN);
pbit_str = ivector(1,part);
/* init bit stream */
for (i=1; i <= (N+2)*INT_LEN; i++) {bit_str[i] = 0; bit_str2[i] = 0;}

/* create bit stream */
for (cnt=1; cnt <= N; cnt++)
for (cnt2=1; cnt2 <= INT_LEN; cnt2++) {
if (a[cnt] & (1 << (cnt2-1)) ) {
bit_str2 [ INT_LEN* (cnt-1) + cnt2 ] = 1;
}
}

if (scram) {
/* scramble bit stream */
for (cnt = 1; cnt <= N*INT_LEN; cnt++)
bit_str [ perm[cnt] ] = bit_str2[cnt];
}
else {
/* keep it straight */
for (cnt = 1; cnt <= N*INT_LEN; cnt++)
bit_str [cnt] = bit_str2[cnt];
}

/* Number of sets */
set_num = (N * INT_LEN) / part;
/* if remainder after division, round up */
if ((N*INT_LEN) % part != 0) set_num++;
/* nrerror("Incorrect set number");
if (set_num != sets)
nrerror("Incorrect partition size"); */

/* decode address , train or recall */
for (cnt=1; cnt <= set_num; cnt++) {
/* get partition bit stream */
for (cnt2=1; cnt2 <= part; cnt2++)
pbit_str[cnt2] = bit_str[ (cnt-1) * part + cnt2 ];
address = 0;
/* get address */
for (cnt2=1; cnt2 <= part; cnt2++)
if (pbit_str[cnt2])
address += 1 << (cnt2-1);
/* addr_arry_ind = ( (address + (cnt * part_len) ) / INT_LEN ) + 1 */
addr_arry_ind = (unsigned) (( ( (long) (cnt-1) * (long) part_len) / (long) INT_LEN )
+ 1L + (long) address/ (long) INT_LEN );
/* addr_int_ind = ( address + ((cnt-1) * part_len) ) % INT_LEN */
addr_int_ind = (unsigned) ( ( (long) address % (long) INT_LEN)
+ (( (long) (cnt-1) * (long) part_len) % (long) INT_LEN));
if (train)
net[addr_arry_ind] = net[addr_arry_ind] | (1 << addr_int_ind);
else if ( net[addr_arry_ind] & (1 << addr_int_ind) )
score++;
}

free_ivector(bit_str,1,(N+2)*INT_LEN);
free_ivector(bit_str2,1,(N+2)*INT_LEN);
free_ivector(pbit_str,1,part);

return score;
}
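
/* ----------------- wisnet_example -------------------- */
/* Illustrative usage sketch only, not part of the delivered system: trains
a single discriminator on one data vector and then recalls the same vector,
which should score 'sets' out of 'sets'. It assumes the includes and
constants of wisnet.c above and the NRC allocator 'ivector'; the names
'wisnet_example' and 'test_dat' are hypothetical. */
void wisnet_example(void)
{
unsigned N = 4, part = 4, sets, score, i, net_words;
unsigned *test_dat, *net, *perm;

sets = (N * INT_LEN) / part; /* number of RAM sets */
net_words = (N * (1 << part)) / part + 2; /* words needed to hold the net */
test_dat = (unsigned *) ivector(1,N);
net = (unsigned *) ivector(1,net_words);
perm = (unsigned *) ivector(1,N*INT_LEN);

for (i=1; i <= net_words; i++) net[i] = 0; /* clear the net */
for (i=1; i <= N*INT_LEN; i++) perm[i] = i; /* identity perm, unused when scram = 0 */
for (i=1; i <= N; i++) test_dat[i] = 0x1234; /* arbitrary test pattern */

wisnet(N, sets, test_dat, net, part, 0, perm, 1); /* train */
score = wisnet(N, sets, test_dat, net, part, 0, perm, 0); /* recall */
printf("\nRecall score = %u out of %u", score, sets);

free_ivector((int *) perm,1,N*INT_LEN);
free_ivector((int *) net,1,net_words);
free_ivector((int *) test_dat,1,N);
}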

/* ----------------- vec_flt_to_int -------------------- */
void vec_flt_to_int(unsigned N, float* a, unsigned* b, int twosFlag)
/* float vector sent in 'a'
integer vector returned in 'b'. Integer values can be either signed two's complement or
unsigned biased positive values */
{

unsigned cnt;

if (twosFlag == FALSE) {
for (cnt=1; cnt<= N; cnt++) {
if (a[cnt] > (float) MAX_SGNED_INT || a[cnt] < (float) (MIN_SGNED_INT+1) )
nrerror("vec_flt_to_int error");
b[cnt] = (unsigned) (a[cnt] + (float) MAX_SGNED_INT);
}
}
else {
for (cnt=1; cnt<= N; cnt++) {
if (a[cnt] > (float) MAX_SGNED_INT || a[cnt] < (float) (MIN_SGNED_INT+1) )
nrerror("vec_flt_to_int error");
b[cnt] = (int) a[cnt];
}
}
}
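
/* For example: with twosFlag == FALSE an input of 2.0 maps to 32768 and
-2.0 maps to 32764, i.e. the values are biased by MAX_SGNED_INT into a
positive range; with twosFlag == TRUE the values keep their signed two's
complement bit patterns (2 and -2). */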

/* ---------------- init_scramble ------------- */
/* This routine is needlessly complicated */
void init_scramble(int size, int *iord_tmp)
{
int *iord_tmp2;
int cnt, city_pick_ind, city_pick_ind_ind, city_cnt, tmp_city,
first_city_ind, tmp_city2, idum = -1;
int i,j,fit;

iord_tmp2 = ivector(1,size); /* !!! free it */

for (cnt =1; cnt <= size; cnt++) {
iord_tmp2[cnt] = cnt;
iord_tmp[cnt] = cnt;
}

for (city_cnt = 1 ; city_cnt <= size ; city_cnt++) {
city_pick_ind_ind =1+(int) ( (size- city_cnt + 1) *ran3(&idum));
city_pick_ind = iord_tmp2[city_pick_ind_ind];
for (cnt = city_pick_ind_ind; cnt < size-city_cnt+1; cnt++)
iord_tmp2[cnt] = iord_tmp2[cnt+1];
if (city_cnt == 1) {
tmp_city = iord_tmp [city_pick_ind];
first_city_ind = city_pick_ind;
}
else {
tmp_city2 = iord_tmp[city_pick_ind];
iord_tmp[city_pick_ind] = tmp_city;
tmp_city = tmp_city2;
if (city_cnt == size)
iord_tmp[first_city_ind] = tmp_city;
}
}

/* check for legal permutation */
for (i= 1; i <= size; i++) {
fit = FALSE;
for (j=1; fit == FALSE; j++) {

if (j > size) { printf("\nj= %d i= %d",j,i); nrerror("\nScramble error");}
if (iord_tmp[j] == i) fit = TRUE;
}
}

free_ivector(iord_tmp2,1,size);
}
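
/* ---------------- init_scramble_simple ------------- */
/* Illustrative sketch only, not part of the delivered system: the same job
(filling 'iord_tmp' with a random permutation of 1..size) done with a plain
Fisher-Yates shuffle. It assumes the NRC random generator 'ran3' used above;
the name 'init_scramble_simple' is hypothetical. */
void init_scramble_simple(int size, int *iord_tmp)
{
int cnt, pick, tmp, idum = -1;

for (cnt = 1; cnt <= size; cnt++)
iord_tmp[cnt] = cnt;
for (cnt = size; cnt >= 2; cnt--) {
pick = 1 + (int) (cnt * ran3(&idum)); /* pick an index in 1..cnt */
if (pick > cnt) pick = cnt; /* guard against ran3 returning 1.0 */
tmp = iord_tmp[cnt];
iord_tmp[cnt] = iord_tmp[pick];
iord_tmp[pick] = tmp;
}
}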

/* --------------- netperform ------------------ */
int netperform(unsigned N, unsigned* a, unsigned *b, int* bcnt1, int* bcnt2)
/* 'N' size of input data vector
input data sent in 'a' and 'b'
'bcnt1' and 'bcnt2' return the number of bits set in 'a' and 'b' respectively
function returns the overlap between the two sets of data.
*/
{
int olap= 0, bc1=0, bc2=0;
int *bit_str1, *bit_str2;
unsigned i, cnt, cnt2;

bit_str1 = ivector(1,(N+1)*INT_LEN);
bit_str2 = ivector(1,(N+1)*INT_LEN);

/* init bit streams */
for (i=1; i <= (N+1)*INT_LEN; i++) bit_str1[i] = 0;
for (i=1; i <= (N+1)*INT_LEN; i++) bit_str2[i] = 0;

/* create bit stream */
for (cnt=1; cnt <= N; cnt++)
for (cnt2=1; cnt2 <= INT_LEN; cnt2++) {
if (a[cnt] & (1 << (cnt2-1)) ) {
bit_str1 [ INT_LEN* (cnt-1) + cnt2 ] = 1;
bc1++;
}
if (b[cnt] & (1 << (cnt2-1)) ) {
bit_str2 [ INT_LEN* (cnt-1) + cnt2 ] = 1;
bc2++;
}
}

for (cnt=1; cnt <= N*INT_LEN; cnt++)
if (bit_str1[cnt] == bit_str2[cnt]) olap++;

printf("\nbcnt1= %d bcnt2= %d olap= %d",bc1,bc2,olap);

free_ivector(bit_str1,1,(N+1)*INT_LEN);
free_ivector(bit_str2,1,(N+1)*INT_LEN);

*bcnt1 = bc1; *bcnt2 = bc2;
return olap;
}

/* ---------- topt.c -------------- */
/* NRC - Press et al, "Numerical Recipes in C" */
/* heap memory matrices, managed by NRC */
/* based on the NRC program for the solution of the travelling
salesman problem */
#include "main.h"
#include "stenrutil.h"
#include "topt.h"
#include <stdio.h>
#include <math.h>
#include "terminal.h"
#include "nr.h" /* NRC */
#include "toolset.h"
#include "marith.h"
#include "perm.h"
#include "covar.h"
#include <math.h>
#include <conio.h>
#include "trans.h"

#define TFACTR 0.9

/* ---------------- topt --------------------- */
/* Modified version of the anneal function in NRC. See "Numerical Recipes in C" for a full
description of how anneal works */
/* Find a permutation that results in reduced off diagonal energy in the matrix 'cv' */
/* permutation returned in b, original signal covariance matrix sent in
cv */
void topt(int size, int *b, float **cv)
{
int ans,nover,nlimit,idum;
unsigned long int iseed;
int j,k,nsucc,nn,idec;
static int n[7];
float ode, ode_new ,de,t;

int *iord_tmp, *iorder, ncity=size, cnt;
float ode_orig;

int tmp_city, tmp_city2, *iord_tmp2, city_cnt,
city_pick_ind, city_pick_ind_ind, first_city_ind, no_of_cities,
tsp_move;

float **cv_cpy;

printf("\nReversal and transportation perms (1 or 0): ");
tsp_move = get_int(0,1,1);

printf("\nOptimising permutations");

iord_tmp = ivector(1,ncity);
iorder = ivector(1,ncity);
iord_tmp2 = ivector(1,ncity);
cv_cpy = matrix(1,size,1,size);
copy_mat(cv, cv_cpy);


init_perm_order(size, iorder);
nover=ncity*4;
nlimit=ncity;
t=0.5f;
ode = perm_cost(size, iorder, cv_cpy);
copy_mat(cv, cv_cpy);

ode_orig = off_diag(size, cv,TRUE);
printf("\n %s %10.6f %s %12.6f %s\n","T =",t,
" Off diag Energy =",ode*100.0/ode_orig,"%");
disp_perm(size, iorder);
idum = -1;
iseed=111;
for (j=1;j<=100;j++) {
nsucc=0;
for (k=1;k<=nover;k++) {
if (kbhit())
nrerror("Aborted by user");
do {
n[1]=1+(int) (ncity*ran3(&idum));
n[2]=1+(int) ((ncity-1)*ran3(&idum));
if (n[2] >= n[1]) ++n[2];
nn=1+((n[1]-n[2]+ncity-1) % ncity);
} while (nn<3);
for (cnt=1;cnt<=ncity;cnt++) /* copy perm */
iord_tmp[cnt]=iorder[cnt];
for (cnt=1;cnt<=ncity;cnt++) /* copy perm */
b[cnt]=iorder[cnt];

if (tsp_move) {
idec=irbit1(&iseed);
if (idec == 0) {
n[3]=n[2]+(int) (abs(nn-2)*ran3(&idum))+1;
n[3]=1+((n[3]-1) % ncity);
trnspt(iord_tmp,ncity,n);
ode_new = perm_cost(size, iord_tmp, cv_cpy);
copy_mat(cv, cv_cpy);
de = ode_new - ode;
ans=metrop(&de,&t);
if (ans) {
++nsucc;
for (cnt=1;cnt<=ncity;cnt++) /* copy perm */
iorder[cnt]=iord_tmp[cnt];
ode = ode_new;
}
} else {
reverse(iord_tmp,ncity,n);
ode_new = perm_cost(size, iord_tmp, cv_cpy);
copy_mat(cv, cv_cpy);
de = ode_new - ode;
ans=metrop(&de,&t);
if (ans) {
++nsucc;
for (cnt=1;cnt<=ncity;cnt++) /* copy perm */
iorder[cnt]=iord_tmp[cnt];
ode = ode_new;

}
}
}
else {
for (cnt =1; cnt <= ncity; cnt++)
iord_tmp2[cnt] = cnt;
no_of_cities =2+(int) ((ncity-1)*ran3(&idum));
if (no_of_cities > size) nrerror("Random error anneal.c");

for (city_cnt = 1 ; city_cnt <= no_of_cities ; city_cnt++) {
city_pick_ind_ind =1+(int) ( (ncity- city_cnt + 1) *ran3(&idum));
city_pick_ind = iord_tmp2[city_pick_ind_ind];
for (cnt = city_pick_ind_ind; cnt < ncity-city_cnt+1; cnt++)
iord_tmp2[cnt] = iord_tmp2[cnt+1];
if (city_cnt == 1) {
tmp_city = iord_tmp [city_pick_ind];
first_city_ind = city_pick_ind;
}
else {
tmp_city2 = iord_tmp[city_pick_ind];
iord_tmp[city_pick_ind] = tmp_city;
tmp_city = tmp_city2;
if (city_cnt == no_of_cities)
iord_tmp[first_city_ind] = tmp_city;
}
}
ode_new = perm_cost(size, iord_tmp, cv_cpy);
copy_mat(cv, cv_cpy);
de = ode_new - ode;
ans=metrop(&de,&t);
if (ans) {
++nsucc;
for (cnt=1;cnt<=ncity;cnt++) /* copy perm */
iorder[cnt]=iord_tmp[cnt];
ode = ode_new;
}
}
if (nsucc >= nlimit) break;
}
printf("\n %s %10.6f %s %12.6f %s\n","T =",t,
" Off diag Energy =",ode*100.0/ode_orig,"%");
printf("Successful Moves: %6d\n",nsucc);
disp_perm(size, iorder);
t *= TFACTR;
if (nsucc == 0) {
free_ivector(iorder,1,ncity);
free_ivector(iord_tmp,1,ncity);
free_ivector(iord_tmp2,1,ncity);
free_matrix(cv_cpy,1,size,1,size);
return;
}
}
free_ivector(iorder,1,ncity);
free_ivector(iord_tmp,1,ncity);
free_ivector(iord_tmp2,1,ncity);
free_matrix(cv_cpy,1,size,1,size);

}

/* --------- perm_cost -------- */
/* Apply the permutation, and the RDFT and calculate the new off diagonal energy in the
normalised covariance matrix */
/* signal covariance sent in cv and transformed covariance returned in cv.
Permutation order in p_order. Returns the
off diagonal energy in the transformed normalised covariance matrix.
The information contained in cv is overwritten */
float perm_cost(int size, int *p_order, float **cv)
{
float ode, **tmp;

tmp = matrix(1,size,1,size);

twod_perm(size, cv, p_order);
two_d_rdftrans(size, cv);
covar_norm(size, cv, tmp);
copy_mat(tmp,cv);
ode = off_diag(size, tmp,TRUE);

free_matrix(tmp,1,size,1,size);

return ode;
}

/* --------------------- twinrev ---------------------------- */
/* Swap two elements */
void twinrev(int iorder[] ,int ncity, int n[])
{
int itmp;

itmp=iorder[n[1]];
iorder[n[1]]=iorder[n[2]];
iorder[n[2]]=itmp;
}

/* ----------------------- reverse -------------------------- */
/* see NRC */
void reverse(int iorder[], int ncity, int n[])
{
int nn,j,k,l,itmp;

n[3]=1 + ((n[1]+ncity-2) % ncity); /* was in revcst */
n[4]=1 + (n[2] % ncity);

nn=(1+((n[2]-n[1]+ncity) % ncity))/2;
for (j=1;j<=nn;j++) {
k=1 + ((n[1]+j-2) % ncity);
l=1 + ((n[2]-j+ncity) % ncity);
itmp=iorder[k];
iorder[k]=iorder[l];
iorder[l]=itmp;
}
}


/* ----------------------------------- trnspt --------------------------- */
/* see NRC */
void trnspt(int iorder[], int ncity, int n[])
{
int m1,m2,m3,nn,j,jj,*jorder,*ivector();
void free_ivector();

n[4]=1 + (n[3] % ncity); /* was in revcst */
n[5]=1 + ((n[1]+ncity-2) % ncity);
n[6]=1 + (n[2] % ncity);

jorder=ivector(1,ncity);
m1=1 + ((n[2]-n[1]+ncity) % ncity);
m2=1 + ((n[5]-n[4]+ncity) % ncity);
m3=1 + ((n[3]-n[6]+ncity) % ncity);
nn=1;
for (j=1;j<=m1;j++) {
jj=1 + ((j+n[1]-2) % ncity);
jorder[nn++]=iorder[jj];
}
if (m2>0) {
for (j=1;j<=m2;j++) {
jj=1+((j+n[4]-2) % ncity);
jorder[nn++]=iorder[jj];
}
}
if (m3>0) {
for (j=1;j<=m3;j++) {
jj=1 + ((j+n[6]-2) % ncity);
jorder[nn++]=iorder[jj];
}
}
for (j=1;j<=ncity;j++)
iorder[j]=jorder[j];
free_ivector(jorder,1,ncity);
}

/* ---------------------------- metrop ---------------------------- */
/* see NRC */
int metrop(float *de, float *t)
{
static int gljdum=1;
float ran3();
return *de < 0.0f || ran3(&gljdum) < (float)exp( - ( *de )/ (*t) );
}

/* --------------- perm.c ------------------- */
#include "marith.h"
#include <stdio.h>
#include "stenruti.h"

/* -------- twod_perm ---------------- */
void twod_perm(int size, float **a, int *perm)
{
float **tmp;
int row,col;

tmp = matrix(1,size,1,size);
/* printf("\nTwo D Permutaion"); */
copy_mat(a,tmp);

/* permuting the rows */
/* printf("\nPermute rows"); */
for (row = 1; row <= size; row++)
for (col = 1; col <= size; col++)
a[row] [col] = tmp [perm[row] ] [col];

copy_mat(a,tmp);
/* permuting the columns */
/* printf("\nPermute columns"); */
for (col = 1; col <= size; col++)
for (row = 1; row <= size; row++)
a[row] [col] = tmp [row] [ perm[col] ];

free_matrix(tmp,1,size,1,size);
}
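
/* For example: with size = 3 and perm = {2,1,3} the routine swaps rows 1 and
2 and then columns 1 and 2 of 'a', so the matrix is re-ordered symmetrically
and a covariance matrix remains consistent with the permuted signal ordering. */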

/* ------- init_perm_order -------- */
void init_perm_order(int size, int *a)
{
int cnt;

for (cnt = 1; cnt <= size; cnt++)
a[cnt] = cnt;
}

/* -------- disp_perm --------- */
void disp_perm(int size, int *a)
{
int col;

for (col=1 ;col <= size; col++)
printf("%5d ", a[col]);
printf("\n");
}

/* ---------------- hopnet.c ------------ */
/* Program partially based on the interactive activation
and competition (iac) program in,
McClelland.J.L & Rumelhart.D.E, "Explorations
in Parallel Distributed Processing" */
/* NRC - Press et al, "Numerical Recipes in C" */
/* heap memory matrices, managed by NRC */
#include "main.h"
#include "hopnet.h"
#include "toolset.h"
#include "stenruti.h"
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include "terminal.h"
#include "utl.h"
#include "matprt.h"
#include "nr.h" /* NRC */

#define rnd() ((float)rand()*3.0518507e-5)
#define ACONST 0.9f
#define BCONST 0.9f
#define CCONST 0.2f
#define DCONST 0.1f

int num_its = 5, sim_ann = FALSE;
float excite_bias = 0.5f;
float net_strength = 1;
float temp_start = 10.0f, temp_red_rate = 0.9f;

/* --------- set_hop_it_num ------------- */
/* Get run time parameters from the user */
int set_hop_it_num(void)
{
clr_scrn();
printf("Number of iterations: ");
num_its = get_int(0,30000,5);
printf("\nNet input strength scaler: ");
net_strength = get_flt(0.0f,10.0f,2);
printf("\nExcitation bias: ");
excite_bias = get_flt(0.0f,10.0f,2);
printf("\nSimulated Annealing (1 or 0): ");
sim_ann = get_int(0,1,1);
if (sim_ann) {
printf("\nStarting Temp: ");
temp_start = get_flt(0.0f,1000.0f,4);
printf("\nTemp rate of change: ");
temp_red_rate = get_flt(0.0f,1.0f,4);
}
return 0;
}

/* -------- ind_eq --------- */
int ind_eq(int mod, int a, int b)
{
/* if (a > mod)

a = a % mod + 1; */
if (a!=b)
return 0;
else
return 1;
}


/* ------- calc_goodness -------- */
float calc_goodness(int size, float **hopactiv, float *a)
/* hopfield net node activation levels sent in 'hopactiv',
signal mean sent in 'a' . For every node calculate the netinput,
apply a few fiddle factors and then multiply by the node activation.
Sum for all nodes. */
{
int i,j;
float netinput, goodness;

goodness = 0.0f;
for (i = 1; i<= size; i++) {
for (j = 1; j<=size; j++) {
netinput = calc_netinput(size, hopactiv, a, i, j);
netinput += excite_bias;
netinput *= net_strength;
goodness += netinput * hopactiv[i] [j];
}
}
return goodness;
}

/* ------- calc_hop_perm -------- */
int calc_hop_perm(int size, float **a, int *perm)
/* send Hopfield activation levels in 'a', returns validity of permutation,
i.e. if there is only one node active for each row/column pair then the matrix
is valid. Permutation returned in perm */
{
int X,j, found_row_entry=TRUE, found_col_entry=TRUE;


for (X = 1; X<= size; X++) {
if (found_col_entry == FALSE) return FALSE;
else found_col_entry = FALSE;
if (found_row_entry == FALSE) return FALSE;
else found_row_entry = FALSE;
for (j = 1; j<=size ; j++) {
if (a[X] [j] >= 0.99f) {
if (found_col_entry) return FALSE;
else found_col_entry = TRUE;
perm [j] = X;
}
if (a[j] [X] >= 0.99f)
if (found_row_entry) return FALSE;
else found_row_entry = TRUE;
}
}
return TRUE;

}

/* -------- logistic ---------- */
/* iac logistic function for conversion of input */
float logistic(float *input, float *temper)
{
return (1.0f / (1.0f + (float) (exp(-(*input)/(*temper)))));
}
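
/* For example: with netinput = 0 the probability is 0.5 whatever the
temperature; as 'temper' falls towards zero, positive inputs drive the
probability towards 1 and negative inputs towards 0, so the stochastic
update hardens into a deterministic threshold. */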

/* --------- probabil ------ */
/* iac probabilistic update rule. Used in conjunction with
logistic function */
int probabil(float *prob1)
{
return((rnd() < (*prob1)) ? 1 : 0);
}

/* -------- calc_netinput ------------ */
/* Hopfield net node activation levels are passed in hopactiv,
centralised signal is sent in a, row and col number of node for which
netinput is to be calculated is sent in i and j */
float calc_netinput(int size, float **hopactiv, float *a, int i, int j)
{
float netinput;
int row,col, col_next, col_prev;

netinput = 0.0f;
/* Inhibition within the cols */
for (row = 1; row <= size; row++) {
if (row != i)
netinput += hopactiv [row] [j] * -ACONST;
}

/* Inhibition within the rows */
for (col = 1; col <= size; col++) {
if (col != j)
netinput += hopactiv [i] [col] * -BCONST;
}

/* Inhibition between adjacent columns, proportional to the distance */
if (j == 1) col_prev = size;
else col_prev = j - 1;
if (j == size) col_next = 1;
else col_next = j + 1;
for (row = 1; row <= size; row++) {
netinput += (hopactiv [row] [col_next] + hopactiv [row] [col_prev] )
* (max(a[row],a[i]) - min(a[row],a[i]) )
* -DCONST;
}

/* Global Inhibition */
for (col = 1; col <= size; col++)
for (row = 1; row <= size; row++)
netinput += hopactiv [row] [col] * -CCONST;
/* Don't inhibit self */
netinput = netinput - (hopactiv [i] [j] * -CCONST);


return netinput;
}

/* ------ run_hopnet ------ */
void run_hopnet(int size, float *a, int *perm)
/* This is the top level function */
/* send signal vector in 'a'. Returns permutation in perm */
{
int i,j, it_num,cyc, valid_mat;
float netinput, goodness, temp, tmp_log;
double tmp;
float **hopactiv;
int rnd_i, rnd_j;
int idum = -123; /* seed for random number generator */

hopactiv = matrix(1,size,1,size);

temp = temp_start;
clr_scrn();
for (i = 1; i<= size; i++) /* set activations to zero */
for (j = 1; j<= size; j++)
hopactiv [i] [j] = 0.0f;

disp_mat2(hopactiv,"Hopfield activation levels");

for (it_num = 0; it_num < num_its; it_num++) {
temp *= temp_red_rate; /* simulated annealing schedule */
if (temp < 1e-20f) break; /* If temperature is close to zero then quit */
for (cyc = 1; cyc <= (size * size); cyc++) {
tmp = rnd() * size;
rnd_i = 1 + (int) (ran3(&idum) * size); /* pick node at random */
rnd_j = 1 + (int) (ran3(&idum) * size);
if (rnd_i > size || rnd_j > size ||
rnd_i < 1 || rnd_j < 1) { /* bug trap */
printf("\nrnd_i= %d rnd_j= %d",rnd_i,rnd_j);
nrerror("Run Hopnet error ");
}

netinput = calc_netinput(size, hopactiv, a, rnd_i, rnd_j);
netinput += excite_bias;
netinput *= net_strength;
if (sim_ann) {
tmp_log= logistic(&netinput, &temp);
if (probabil(&tmp_log) == 1)
hopactiv [rnd_i] [rnd_j] = 1.0f;
else
hopactiv [rnd_i] [rnd_j] = 0.0f;
}
else {
if (netinput > 0.0f)
hopactiv [rnd_i] [rnd_j] += netinput
* (1.0f - hopactiv [rnd_i] [rnd_j] );
else
hopactiv [rnd_i] [rnd_j] += netinput * hopactiv [rnd_i] [rnd_j] ;
if (hopactiv [rnd_i] [rnd_j] > 1.0f) hopactiv [rnd_i] [rnd_j] = 1.0f; /* limit output */

if (hopactiv [rnd_i] [rnd_j] < 0.0f) hopactiv [rnd_i] [rnd_j] = 0.0f; /* limit output */
}
}
goodness = calc_goodness(size, hopactiv, a); /* calc goodness for whole net */
printf("\nGoodness = %f ",goodness);
}
valid_mat = calc_hop_perm(size, hopactiv , perm); /* check for valid permutation */
if (!valid_mat)
printf("\n-------INVALID PERMUTATION");
pause();

clr_scrn();
disp_mat2(hopactiv,"Hopfield activation levels");
pause();

free_matrix(hopactiv, 1,size,1,size);

}

/* ------------------------------------ tsthop.c ------------------------ */
/* simple test, with small data set */
#include "main.h"
#include "md_sigs.h"
#include "stenruti.h"
#include "covar.h"
#include <stdio.h>
#include "graph2.h"
#include "trans.h"
#include "toolset.h"
#include "utl.h"
#include "walsh.h"
#include "matprt.h"
#include "nr.h"
#include "perm.h"
#include "topt.h"
#include "stejacob.h"
#include "hopnet.h"

main()
{
float *a;
int *perm;

a = vector(1,SL);
perm = ivector(1,SL);

a[1] = 0.1f;
a[2] = 0.8f;
a[3] = 0.2f;
a[4] = 0.7f;
a[5] = 0.3f;
a[6] = 0.6f;
a[7] = 0.4f;
a[8] = 0.5f;

set_hop_it_num();
run_hopnet(SL, a, perm);
disp_perm(SL, perm);

free_vector(a,1,SL);
free_ivector(perm,1,SL);

return 1;
}


APPENDIX C

ASSEMBLER CODE LISTINGS

;--------------------------------------- DSPJact.asm -----------------------
; one dim Jacobi transform
;designed to run on Ariel DSP56000 PC card (PC56)
; tested and working August 26, 1993
include 'ioequ.asm' ;include standard I/O port equates file.

DEFINE BEGIN '$100'
DEFINE DSPJACT_INTVEC '$28'
DEFINE X_START '$0'
DEFINE Y_START '$0'

MAX_VECT_SIZE EQU 128

org x:X_START
rotaf ds MAX_VECT_SIZE

;test value
;rotaf dc 0.866025403,0.5,0.866025403,0.5 ;sin 60, cos 60,sin 60, cos 60

org y:Y_START
pairs ds MAX_VECT_SIZE
data ds MAX_VECT_SIZE
size ds 1

;test value
;pairs dc 0,1,2,3
;data dc 0.9,0.2,0.9,0.2
;size dc 4

org p:DSPJACT_INTVEC
jsr DSPJact ;do N/2 Jacobi plane rotations

org p:BEGIN
begin
bclr #M_HF2,x:<<M_HCR
waitForInt
jmp waitForInt

DSPJact
move #pairs,r2 ;array of indices into data
move #rotaf,r1 ;an array of angles of rotation for pairs
move #data,r4 ;data vector to be transformed, contains coeff
;on exit
nop
move (r4)- ;only needed for stejacob.c, where indices in
;pairs start at 1
nop
move r4,r3

move #-1,m1
move m1,m2
move m1,m3
move m1,m4
move x:(r1)+,x1 ;x1 = Sin(*)
move x:(r1)+,x0 ;x0 = Cos(*)
move y:(r2)+,n4 ;n4 = pairs[i]
move y:(r2)+,n3 ;n3 = pairs[i+1]
move y:(r4+n4),y1 ;y1 = data[n4] = p1
move y:(r3+n3),y0 ;y0 = data[n3] = p2
move y:size,a
asr a ;loop by size/2
do a1,allCoeffDone ;loop over all data
mpy -x1,y0,a ;a = -sin(*)p2
macr y1,x0,a ;a += Cos(*)p1
move a,y:(r4+n4) ;store result
mpy x0,y0,a y:(r2)+,n4 ;a=Cos(*)p2 n4=pairs[i]
macr x1,y1,a x:(r1)+,x1 ;a+=Sin(*)p1 x1=Sin(*)
move a,y:(r3+n3) ;store result
move x:(r1)+,x0 ;x0 = Cos(*)
move y:(r2)+,n3 ;n3 = pairs[i+1]
move y:(r4+n4),y1 ;y1 = data[n4] = p1
move y:(r3+n3),y0 ;y0 = data[n3] = p2
allCoeffDone
bset #M_HF2,x:<<M_HCR ;flag host. Processing finished
;return from int. Host freezes processor and reads data. The program
;is then restarted from 'begin' and HF2 is cleared at this point.
rti

END begin

;--------------- wisnet.asm
;tested and working August 24, 1993
;designed to run on Ariel DSP56000 PC card (PC56)
;Function could be more efficient if scrambling did not require two vectors
;and the bit stream was capable of storing more than 1 bit per word
;Performs recall only. The variables 'data', 'dataSize', 'NTupleSize',
; 'perm', and 'net' must be passed as parameters at call time.
include 'ioequ.asm' ;include standard I/O port equates file.

FALSE EQU 0
TRUE EQU 1

;conditional assembly flags
POWER2_NTUPLE_LEN EQU FALSE
POWER2_WORD_LEN EQU FALSE

MAX_DATA_LENGTH EQU 20
MAX_NTUPLE_SIZE EQU 8
LOG2_WORD_LENGTH EQU 4 ;optimally 24, but to mimic microsoft C set to 16
;test value
;LOG2_WORD_LENGTH EQU 1
WORD_LENGTH EQU @CVI(@POW(2,LOG2_WORD_LENGTH))

increment macro var
move var,a
move #>1,x0
add x0,a
move a,var
endm

decrement macro var
move var,a
move #>1,x0
sub x0,a
move a,var
endm


ORG Y:$0
data ds MAX_DATA_LENGTH
dataSize ds 1
NTupleSize ds 1
;add 1 to bit stream length to cope with all NTuple sizes
bitStrScram ds (MAX_DATA_LENGTH*WORD_LENGTH)+1
bitStr ds MAX_DATA_LENGTH*WORD_LENGTH
score ds 1
numberOfNTuples ds 1
pow2NTupleSize ds 1
dividend ds 1

;test values
;data dc 3,0,0,3,0,0,3,0
;dataSize dc 8
;NTupleSize dc 4
;bitStrScram ds MAX_DATA_LENGTH*WORD_LENGTH
;bitStr ds MAX_DATA_LENGTH*WORD_LENGTH

;score ds 1
;numberOfNTuples ds 1
;pow2NTupleSize ds 1
;dividend ds 1

ORG X:$0
perm ds MAX_DATA_LENGTH*WORD_LENGTH
net ds @CVI(((@POW(2,MAX_NTUPLE_SIZE)*MAX_DATA_LENGTH)/MAX_NTUPLE_SIZE)+(@POW(2,MAX_NTUPLE_SIZE)/WORD_LENGTH)+1)
cnt ds 1

;test values
;perm dc 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
;net dc 0,0,0,0,0,0,1,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
;cnt ds 1
org p:$26
jsr wisnet

ORG P:$100
begin
bclr #M_HF2,x:<<M_HCR
waitForInt
jmp waitForInt

wisnet
move #data,r1
move #bitStrScram,r2
move #bitStr,r3
move #perm,r4
move #net,n5
move #-1,m1
move m1,m2
move m1,m3
move m1,m4
move m1,m5
move #>WORD_LENGTH,x1 ;reserve x1 for WORD_LENGTH
move y:NTupleSize,y1 ;reserve y1 for NTupleSize

;generate result 2^NTupleSize
move #>1,a
rep y1
asl a ;a= 1 << NTupleSize
move a,y:pow2NTupleSize ;pow2NTuple = 2^NTupleSize

move #>1,x0 ;set up x0 and y0 for loop
move #0,y0
move y:dataSize,a
;convert data to bit stream. 1 bit per word and prepare
;bitStrScram by clearing
do a1,bitStreamDone ;for every word
move y:(r1)+,b0 ;put data word in b0
do x1,loopOverWord ;for every bit in a word
asr b
jcc clearBitInStream
move x0,y:(r3)+

jmp bitSetDone
clearBitInStream
move y0,y:(r3)+
bitSetDone
move y0,y:(r2)+ ;clear bitStrScram
loopOverWord
nop
bitStreamDone
move y0,y:(r2) ;clear the last word of bitStrScram

;Scramble bit stream
;when using r2 with C subtract 1 first to account for
;difference in starting location
move y:dataSize,y0
mpy x1,y0,a ;a= WORD_LENGTH * dataSize
asr a ;convert to integer
move #bitStrScram,r2
move #bitStr,r3
move (r2)- ;modified for MS C version
do a0,scrambleDone
move x:(r4)+,n2 ;get perm index
move y:(r3)+,x0 ;get bit stream bit
move x0,y:(r2+n2) ;copy bit into scrambled bit stream at perm index
scrambleDone

;calc number of NTuples
move y:dataSize,y0
mpy x1,y0,b ;b= WORD_LENGTH * dataSize
asr b ;convert to integer
move b0,a ;dividend must be in a
jsr divideByNTupleSize
;numberOfNTuples=(WORD_LENGTH*dataSize)/NTupleSize

tst b ;if remainder greater than zero
jeq NTupleNumberOK
move #>1,x0 ;increment numberOfNTuples
add x0,a

NTupleNumberOK
move a1,y:numberOfNTuples

move #0,x0
move x0,x:cnt ;initialise cnt
move x0,y:score ;and score

do a1,allNTuplesDone ;for every NTuple
;get NTuple from scrambled bit stream (i.e. the NTuple address) and store in b
;do in reverse order
move x:cnt,x0

move #>1,a
add x0,a
move a,x0 ;increment cnt by 1

mpy x0,y1,a ;a=(cnt+1)*NTupleSize
asr a ;convert to integer

move a0,y0
move #>bitStrScram,a
add y0,a ;a=(cnt+1)*NTupleSize + bitStrScram address
move a1,r2 ;r2 is new index into bitStrScram
nop ;allow r2 update
move (r2)- ;point to end of NTuple word
clr b ;clear b before loop
move y:(r2)-,a ;get first bit from bitStrScram
do y1,getNTupleWordDone ;loop by NTupleSize
tst a y:(r2)-,a ;test bit and get next
jne setCCR
andi #$fe,CCR ;clear carry flag
jmp finishedCCRSet
setCCR
ori #$1,CCR ;set carry flag
finishedCCRSet
rol b ;carry flag rotated into LSB of b1
getNTupleWordDone

;generate result (2^NTupleSize*cnt + NTupleAddress)
move y:pow2NTupleSize,y0 ;y0 = 2^NTupleSize
move x:cnt,x0
mpy x0,y0,a ;a=cnt*(2^NTupleSize)
asr a ;convert to integer
move a0,x0
add x0,b ;b=(2^NTupleSize*cnt + NTupleAddress)
move b,a

;generate results 'a/WordLength' = wordAddress (returned in a1)
;and 'a%WordLength' = bitAddress (returned in b1)
jsr divideByWordLength

;get net contents and test
move a1,r5 ;r5=word address n5=net base address
move #>1,x0 ;x0=1
move x0,a ;a=1
move x:(r5+n5),y0 ;y0= net contents
tst b
jeq qout0 ;avoid rep #0
rep b1 ;repeat bitAddress times
asl a ;a=1 << bitAddress
qout0
and y0,a ;a= 1 << bitAddress AND netContents
jeq bitTestDone ;if a=0 finish
increment y:score ;else increment score
bitTestDone
increment x:cnt ;increment cnt
allNTuplesDone
bset #M_HF2,x:<<M_HCR ;flag host score to be sent
waitForTxClr
btst #M_HTDE,X:<<M_HSR ;wait for tx empty
jcc waitForTxClr
movep y:score,X:<<M_HTX ;send score to host port
waitForHost
btst #M_HF0,x:<<M_HSR ;wait for host to read data
jcc waitForHost

bclr #M_HF2,x:<<M_HCR ;reset HF2
rti

;dataSize*WORD_LENGTH sent in 'a'
;NTuple size must be in y1
divideByNTupleSize
IF POWER2_NTUPLE_LEN ;conditional assembly
move #>4,b ;if power of 2 then use asr for division
cmp y1,b
jle NTupleSize4
rep #3
asr a
jmp NTupleSize8
NTupleSize4
rep #2
asr a
NTupleSize8
move #0,a0
clr b ;no remainder if power2 ntuple len
ELSE
move y1,x0 ;else full integer division
jsr divAX0
ENDIF
rts

divideByWordLength
;returns quotient in 'a1' ,and remainder in 'b1'
IF POWER2_WORD_LEN ;conditional assembly
move #>LOG2_WORD_LENGTH,b ;if power of 2 then use asr for division
tst b
jeq caseRep0 ;rep #0 results in 65536 repetitions
rep #LOG2_WORD_LENGTH
asr a
caseRep0
move a0,b
rep #(23-LOG2_WORD_LENGTH) ;generate remainder
asr b
ELSE ;else full integer division
move #>WORD_LENGTH,x0
jsr divAX0
ENDIF
rts

;integer division routine. returns quotient (a1) and remainder (b1)
;send quotient in a and divisor in x0
;uses a,b,x0,y0
divAX0
tfr a,b ;save to b, to test for negative after division
move a,y:dividend ;save dividend to generate remainder later
abs a ;make dividend positive
clr a a1,y0
move y0,a0 ;a0=dividend. So |dividend| > |divisor|
asl a ;convert dividend to fraction
rep #$18 ;form 24 bit quotient
div x0,a ;form quotient in a0

eor x0,b ;test for negative result
jpl _saveQuo
neg a
_saveQuo
move a0,a ;quotient in a1
move a,y0 ;set up y0 for remainder calculation
;generate remainder
move y:dividend,b ;get dividend and sign extend b
move b,b0 ;use b0 for mac
asl b ;convert to fraction
mac -y0,x0,b ;b= dividend - ( (dividend/divisor) *divisor)
asr b ;convert to integer
move b0,b ;remainder in b1
rts

; ------------------ rdft.asm --------------------------
;tested and working September 14, 1993
;designed to run on Ariel DSP56000 PC card (PC56)
;This algorithm is based on NRC realft, but with the difference that 'fftr2d'
;outputs the frequency halves in reverse order to 'four1'. This means
;that k1 and k2 must be swapped but the same result can be achieved by
;altering the equations for h1i and h2i. h1r and h2r equations remain
;unchanged. h1i=c1*(-data[i2]+data[i4]) h2i=c2*(-data[i1]+data[i3]).
;Original real data, size N, is split into even and odd according to index.
;Odd indexed data is loaded as imaginary, and even indexed data is loaded
;as real data as part of a size N/2 complex data vector d(n).
;Data is transformed by in place (courtesy of Motorola) FFT
;to give D(n), and the results
;descrambled to form F(n), size N/2 complex. This vector never sees the
;light of day and is immediately output
;as real vector dOut(n) size N (NRC outputs F(n) as N/2 complex and
;this is a whole lot less confusing)
;with output order as defined by Ersoy's RDFT equation
;F(n) is calculated four points at a time: F(n).real, F(n).imag
;F(N/2-n).real, F(N/2-n).imag. For 0 < n < N/2 apart from n=N/4
;When n=N/4 only two points can be calculated and these are the same as
;D(n). i.e. F[N/4].imag=D[N/4].imag F[N/4].real=D[N/4].real
;Output dOut(n) is arranged thus:
;for 0 < n < N/2
;dOut[n]=F[n].real, dOut[N+1-n]=F[n].imag
;dOut[0]=F[0].real dOut[N/2]=F[N/2].real
;D[N/2] is not generated by N/2 FFT, but due to symmetry this point
;is the same as D[0] and hence F[N/2] can be calculated
;dOut->r6=F->r0.real, dOut->r7=F->r1.real
;dOut->r6+n6=F->r1.imag, dOut->r7+n7=F->r0.imag
;dOut[0]=F[0].real dOut[N/2]=F[N/2].real
;r0 starts at d[1] and is incremented to d[N/4-1]
;r1 starts at d[N/2-1] and is decremented to d[N/4+1]
;r6 starts at dOut[1] and is incremented to dOut[N/4-1]
;r7 starts at dOut[N/2-1] and is decremented to dOut[N/4+1]
;works for 8 <= N <= 128
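;e.g. with N=8 the even indexed samples d[0],d[2],d[4],d[6] are loaded as
;the real parts, and the odd indexed samples d[1],d[3],d[5],d[7] as the
;imaginary parts, of a 4 point complex FFT; the descrambling stage then
;recovers the 8 real RDFT output points in dOut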

include 'ioequ.asm' ;include standard I/O port equates file.

POINTS equ 128
RDFT_POINTS equ POINTS/2
DATA_START equ 0
COEF_START equ $100
COEF_TABLESIZE equ 256
RDFT_START equ $100

DEFINE RDFT_INTVEC '$30'


include 'sinewave.asm' ;available from Motorola's Dr. BuB
include 'fftr2d.asm' ;available from Motorola's Dr. BuB

org y:COEF_START
;ignore assembler warning 'fractional outside range' see sinewave.hlp
sinewave COEF_TABLESIZE


org y:DATA_START
dataY ds RDFT_POINTS
dataYRev ds RDFT_POINTS

org x:DATA_START
dataX ds RDFT_POINTS
dataXRev ds RDFT_POINTS
dOut ds POINTS
;data vector input becomes output vector on completion
;test data (single complete sine wave)
;dOut dc 0,0.00707,0.0099999999,0.00707,0,-0.00707,-0.00999999,-0.00707
;4 temporary storage variables
h1r ds 1
h1i ds 1
h2r ds 1
h2i ds 1

org p:RDFT_INTVEC
jsr rdft

org p:RDFT_START
begin
bclr #M_HF2,x:<<M_HCR
waitForInt
jmp waitForInt

rdft
move #dOut,r0
move #dataX,r4
move #-1,m0
move m0,m4
move x:(r0)+,x0 ;get first element
move x0,x:(r4) ;pack first real element
do #RDFT_POINTS-1,endComplexPack
move x:(r0)+,y0 ;get even indexed element
move x:(r0)+,x0 y0,y:(r4)+ ;get odd indexed element. Store imag element
move x0,x:(r4) ;Store real element
endComplexPack
move x:(r0),y0 ;get last element
move y0,y:(r4) ;pack last imaginary element

fftr2d RDFT_POINTS,DATA_START,COEF_START,COEF_TABLESIZE
;bit reverse output and store at 'dataYRev'and 'dataXRev'
move #0,m4 ;reverse carry indexing for fft output
move #RDFT_POINTS/2,n4 ;bit rev k lsb's of r4, set n4=2^(k-1)
move #DATA_START,r4
move #-1,m1
move #dataYRev,r1
do #RDFT_POINTS,bitRevDone
move x:(r4),x0
move x0,x:(r1) y:(r4)+n4,y1
move y1,y:(r1)+
bitRevDone

move #-1,m0 ;linear addressing
move m0,m1

move m0,m4
move m0,m5
move m0,m6
move m0,m7
move #dataXRev+1,r0 ;k1
move #dataXRev+RDFT_POINTS-1,r1 ;k2
;Sin(2*pi*n/(N/2)) add first increment
move #COEF_START+COEF_TABLESIZE/POINTS,r2
;Cos(2*pi*n/(N/2)) add first increment
move #(COEF_START+COEF_TABLESIZE/4)+COEF_TABLESIZE/POINTS,r3
move #COEF_TABLESIZE/POINTS,n2 ;inc equals 2pi/(N/2)
move #COEF_TABLESIZE/POINTS,n3 ;inc equals 2pi/(N/2)
; move #COEF_TABLESIZE,m2
; move #COEF_TABLESIZE,m3
move #-1,m2
move #-1,m3
move #h1r,r4
move #h2r,r5
move #1,n4 ;r4+n4 points to h1i
move #1,n5 ;r5+n5 points to h2i
move #dOut+1,r6 ;r6 points to storage for real output
move #RDFT_POINTS,n6 ;r6+n6 points to storage for imag output
move #dOut+RDFT_POINTS-1,r7 ;r7 points to storage for real output
move #RDFT_POINTS,n7 ;r7+n7 points to storage for imag output

do #RDFT_POINTS/2-1,deScrambleData
;calculate h1r,h1i,h2r,h2i
;Do (0.5*B)+(0.5*C) rather than 0.5*(B+C) to avoid overflow
move #0.5,x0
move x:(r0),y0 ;y0=D[k1].real
move x:(r1),x1 ;x1=D[k2].real
mpy x0,y0,a ;a=0.5(D[k1].real)
macr x0,x1,a ;a=0.5(D[k1].real+D[k2].real)
mpy x0,y0,a y:(r0)+,y0 a,x:(r4) ;a=0.5(D[k1].real) y0=D[k1].im store h1r increment k1
macr -x0,x1,a y:(r1)-,x1 ;a=-0.5(-d[k1].re+D[k2].re) x1=D[k2].im decrement k2
mpy -x0,y0,a a,x:(r5+n5) ;a=-0.5(D[k1].im) store h2i
macr x0,x1,a ;a=0.5(-D[k1].im+D[k2].im)
mpy x0,y0,a a,x:(r4+n4) ;a=0.5(D[k1].im)
macr x0,x1,a ;a=0.5(D[k1].im+D[k2].im)
move a,x0 ;x0=h2r

;calc new coefficients and store in real vector dOut
move y:(r3)+n3,x1 ;x1=wr, point to next wr
move x:(r4+n4),a ;a=h1i
move x:(r5+n5),y0 ;y0=h2i

mac x1,y0,a y:(r2)+n2,y1 ;a=h1i+wr*h2i dOut->r7=b y1=wi, point to next wi
macr x0,y1,a x:(r4+n4),b ;a=h1i+wr*h2i+wi*h2r b=h1i

neg b a,x:(r7+n7) ;b=-h1i dOut->r7+n7=a
mac x1,y0,b ;b=-h1i+wr*h2i
macr y1,x0,b x:(r4),a ;b=-h1i+wr*h2i+wi*h2r a=h1r

mac x1,x0,a b,x:(r6+n6) ;a=h1r+wr*h2r dOut->r6+n6=b
macr -y1,y0,a x:(r4),b ;a=h1r+wr*h2r-wi*h2i b=h1r


mac -x1,x0,b a,x:(r6)+ ;b=h1r-wr*h2r dOut->r6=a inc r6
macr y1,y0,b ;b=h1r-wr*h2r+wi*h2i
move b,x:(r7)- ;dOut->r6+n6=-h1i+wr*h2i+wi*h2r dOut->r7=b dec r7

deScrambleData
;set dOut[0] and dOut[N/2]
move #dataXRev,r0
move #dOut,r6
move #RDFT_POINTS,n6
move x:(r0),a
move y:(r0),y0
add y0,a
move a,x:(r6) ;dOut[0]=d[0].real+d[0].im
move x:(r0),a
sub y0,a
move a,x:(r6+n6) ;dOut[N/2]=d[0].real-d[0].im

;set dOut[N/4] and dOut[3N/4]
move #dataXRev+RDFT_POINTS/2,r0 ;r0 points to D[N/4]
move #dOut+RDFT_POINTS/2,r6 ;r6 points to dOut[N/4], r6+n6 points to dOut[3N/4]
move x:(r0),x0
move x0,x:(r6)
move y:(r0),x0
move x0,x:(r6+n6)
rdft_fin
bset #M_HF2,x:<<M_HCR ;flag host. Processing finished
;return from int. Host freezes processor and reads data. The program
;is then restarted from 'begin' and HF2 is cleared at this point.
rti

END RDFT_START


