
Lockett, Roblee
Page 1 of 48
6/3/2003
GENETIC ALGORITHM BASED DESIGN AND IMPLEMENTATION OF MULTIPLIERLESS TWO-DIMENSIONAL IMAGE FILTERS
by Douglas J. Lockett and Christopher D. Roblee
Senior Capstone Design Project
Submitted in partial fulfillment of the requirements for the degree of Bachelor of Science
Department of Electrical and Computer Engineering Union College Steinmetz Hall Schenectady, New York 12308 U.S.A.
Submitted May 30, 2003
Final Project Report, Senior Capstone Design Project, Department of Electrical and Computer Engineering, Union College, 2003. © 2003 Douglas Lockett, Christopher Roblee
Table of Contents:
Abstract
1. Introduction
2. Theory of Multiplierless Arithmetic
3. Image Filters
3.1. Motivations for IIR vs. FIR
3.2. Edge Detection
3.3. Canny Edge Detection
4. Genetic Algorithms
4.1. Motivations
4.2. Basic Theory
4.3. Description of the Designed Genetic Algorithm
4.3.1. Fitness Function Definition and Crossover Selection
4.3.2. Magnitude Response and Relative Error
4.3.3. GA Parameters
5. Results
5.1. Magnitude Frequency Analysis
5.2. Spatial Analysis
5.3. Sample Filter Output
6. Comparative Analysis of Computational Complexity Between Multiplierless and Conventional Design Methodologies
6.1. FPGA Case
6.2. ASIC Case
7. Future Work
8. Conclusions
9. Appendices
9.1. Genetic Algorithm Code
9.2. Image Filter Code
*Digital copy of report and full results available at www.vu.union.edu/~robleec/capstone

ABSTRACT
This paper outlines the use of a genetic algorithm to design multiplierless IIR filters for applications in hardware-based image processing. A custom genetic algorithm is developed to optimize filter coefficients such that the corresponding filter's frequency response matches that of an ideal system, under the constraints that all coefficients are powers of two and that the resulting filter is stable. The motivation for using power-of-two filter coefficients is to reduce the overall arithmetic complexity of any hardware-based implementation by replacing digital multipliers with simpler shift operators. This approach is highly beneficial for image filtering applications that are computationally intensive. The cases considered comprise Canny's edge detection filter as well as an image blur operator. The resulting multiplierless filters are compared to analogous implementations using real multipliers on the basis of complexity (the number of shifts and additions performed), frequency response, and qualitative performance on test images. It is shown that in many cases the multiplierless systems have a definite advantage in terms of their efficiency while maintaining a desired response, making them a viable alternative as image filters. It is demonstrated that custom genetic optimization is a reliable and efficient means for designing such filters.
1. INTRODUCTION
This report represents the culmination of several months of research into the specific areas of image processing and digital implementations. The central motivation for this work has roots in our individual interests within the disciplines of electrical and computer engineering. This project is unique in how we were able to align these diverse interests in order to develop new methodologies for pre-existing imaging applications. The need for faster, embedded imaging systems has motivated research into various hardware-based imaging solutions. Currently, there exist numerous approaches for improving the performance of digital image processors. Our objective is to explore a new and effective means of optimizing hardware-based IIR image filters. We propose the use of multiplierless arithmetic reduction to replace conventional digital multiplication operations with less complex shift operations within the filter difference equation. This approach simplifies existing real-time and non-real-time image filters and improves their speed. We develop a custom genetic algorithm in software that is capable of approximating various IIR filter responses by generating multiplierless coefficients. We select this course based upon the proven ability of genetic algorithms to effectively and rapidly isolate a sufficient solution for a given optimization problem. Our objective is therefore to assess a genetic algorithm's ability to realize multiplierless filters for certain image processing functions. We furthermore examine qualitatively the application of these multiplierless filters to a set of test images.
This paper begins with a general discussion of multiplierless arithmetic and a demonstration of the advantage of using multiplierless coefficients in an IIR difference equation. This is followed by an explanation of image filtering techniques (including the Canny edge detector) and the theory of genetic optimization. We then detail the design of a custom genetic algorithm capable of searching for suitable multiplierless filters. The results of genetic optimization for certain cases are subsequently presented. This initially entails analysis of multiplierless filter response accuracies in both the frequency and spatial domains. This is followed by the results from experimentation on a series of test images. Possible multiplierless hardware configurations, including FPGA and ASIC implementations, are then investigated to ascertain their relative performance advantages over conventional multiplier-based systems. We conclude with a discussion on future research and possible enhancements to the designed genetic algorithm.
2. THEORY OF MULTIPLIERLESS ARITHMETIC
Before describing the multiplierless concept, we first note that a single arithmetic shift left (ASL) corresponds to a multiplication by two. Similarly, an n-bit arithmetic shift left (ASL) or right (ASR) corresponds to a multiplication or division by 2^n, respectively. A binary multiplication represents a computationally intensive and relatively complex operation in any processor. Binary multiplications consist of a series of shifts and additions, which in turn require additional carry and overflow-handling logic. This additional complexity and delay can greatly hamper the performance of processors that are required to perform arithmetically intensive computations. Below, in Figure 2.1a, we trace through the required steps of a sample binary multiplication.
(a) 13 x 5 = 65

        00001101
      x 00000101
      ----------
        00001101      Add
       00000000       Add
      00001101        Add
      ----------
      0001000001 (base 2) = 65 (base 10)

(b) 13 x 8 = 104

        00001101
      x 00001000
      ----------
        01101000 (base 2) = 104 (base 10)      Shift left 3

Fig. 2.1: Standard multiplier-based (a) and multiplierless (b) multiplication examples.
Note how the standard multiplication in Figure 2.1a, by a multiplier that spans three bits, involves three partial-product additions and four carries, all of which can be eliminated through the use of the multiplierless technique. In Figure 2.1b, we trace through the steps required in order to compute a larger product multiplierlessly. Note how this computation requires only three arithmetic left shifts, without any additions or carries. Multiplying by a power of two saves execution clock cycles and entirely eliminates the need for adder logic. The following recursive difference equation (Equation 2.1) is used as an implementation of our IIR filtering operation.
$$y(n) = -\sum_{m=1}^{N-1} a_m \, y(n-m) + \sum_{m=0}^{N-1} b_m \, x(n-m) \qquad (2.1)$$
The sets of numerator coefficients b_m and denominator coefficients a_m are denoted B and A, respectively. When applied to a full 2D image, this operation requires a substantial number of multiplications and additions. To compute a single output value, y(n), for instance, we must conduct (N-1) + N = 2N-1 multiplications, corresponding to the products a_m y(n-m) and b_m x(n-m) above. This is the motivation behind the multiplierless implementation. Applications that demand real-time, hardware-based processing may benefit from it: since multiplications are realized as simpler shift operations, such systems can perform their function while requiring less runtime and less hardware real estate.
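As a minimal sketch of this idea (not the authors' implementation), the difference equation can be evaluated on integer samples with every coefficient multiplication replaced by a bit shift. The `(sign, shift)` encoding of a coefficient as sign * 2^(-shift), the function names, and the sign convention y(n) = -sum a_m y(n-m) + sum b_m x(n-m) are assumptions made for illustration.

```python
def shift_mul(value, sign, shift):
    """Multiply an integer by sign * 2**(-shift) using only a shift.

    shift >= 0 is an arithmetic shift right (division by 2**shift, rounded
    toward minus infinity); shift < 0 is a shift left.
    """
    scaled = value >> shift if shift >= 0 else value << -shift
    return scaled if sign > 0 else -scaled


def iir_multiplierless(x, b, a):
    """Evaluate y(n) = -sum_m a_m y(n-m) + sum_m b_m x(n-m) with shifts only.

    b holds (sign, shift) pairs for b_0, b_1, ...; a holds pairs for
    a_1, a_2, ... (hypothetical encoding, not taken from the report).
    """
    y = []
    for n in range(len(x)):
        acc = 0
        for m, (sign, shift) in enumerate(b):            # feed-forward terms
            if n - m >= 0:
                acc += shift_mul(x[n - m], sign, shift)
        for m, (sign, shift) in enumerate(a, start=1):   # feedback terms
            if n - m >= 0:
                acc -= shift_mul(y[n - m], sign, shift)
        y.append(acc)
    return y


# 13 * 8 as a single three-bit left shift (Figure 2.1b).
product = shift_mul(13, 1, -3)

# y(n) = 0.5*y(n-1) + 0.5*x(n) + 0.5*x(n-1), i.e. a_1 = -2**-1.
step = [256, 256, 256, 256]
response = iir_multiplierless(step, b=[(1, 1), (1, 1)], a=[(-1, 1)])
print(product, response)   # 104 [128, 320, 416, 464]
```

Each output sample here costs only shifts and additions; the right shifts truncate, so a hardware design would still need to choose a word length that keeps this rounding acceptable.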
3. IMAGE FILTERS
3.1 Motivation for IIR vs. FIR Filters
We have chosen to focus our research upon digital infinite impulse response (IIR) filters for various reasons. Foremost, it is possible to realize very sharp (narrow transition band) responses with IIR filters, which makes them suitable for a broad range of applications. Finite impulse response (FIR) filters are generally easier to implement, however, since they are non-recursive and always stable (by definition). It is also much more difficult to obtain linear phase responses and to control the overall frequency response with IIR filters. The key practical benefit of IIR filters is that they are much more efficient than FIR systems, which require higher orders (i.e. more multiplications) to meet the same specification (Seppänen, 1999). Overall, when properly designed, IIR filters are capable of responses that are much more accurate than those of FIR filters (Proakis and Manolakis, 1996). Since they are more challenging to implement and allow for higher degrees of precision and practical efficiency, we have selected IIR filter design as the focus of our proposed methodology. This is consistent with our overall objective of improving the process by which complicated filters are designed for efficient digital implementation. In order to realize this goal, we must consider the inherent limitations of IIR filters throughout the development cycle. Most importantly, we will continually face the issue of filter stability, which becomes increasingly problematic as sharper responses (higher orders) are approximated. Since this design methodology is oriented towards image processing applications, we must likewise ensure that it generates filters with linear phase responses.
Through extensive experimentation with input parameters and analysis of the generated output systems, we can gauge the effectiveness of our proposed genetic optimization method. The immediate objective is to generate multiplierless filters that consistently fulfill the preconditions of phase linearity and stability.
3.2 Edge Detection
An edge within an image can be simply defined as any discernible contour. In the image spatial domain this corresponds to any major change in the brightness of the image. Edges form the outlines of objects and are critical inputs to a wide variety of image-processing and computer vision applications. Image edges may likewise indicate the boundary between overlapping objects or the boundary between an object and its background. Discerning the edges within an image allows us to define the objects contained within that image in terms of their boundaries. Edge detection and accentuation are necessary when systems need to identify, distinguish and quantify such objects for decision-making or classification purposes. An operator (such as a filter) detects edges as rapid changes in brightness level (either grayscale or color) in the image. The task of the ideal edge detector is to obtain all reliable edge data and to display it as single-pixel contours in the filtered image (Parker, 1997). We have chosen to focus on derivative edge detectors, which essentially identify sharp intensity changes within a given image. A derivative operator monitors the rate of change of the grey levels in the image function, which is relatively large across edges and small across constant areas. To realize the derivative analysis of a 2-dimensional image,
we must effectively consider the partial derivatives of the image with respect to the horizontal and vertical directions. Analytically, we consider this a 2D vector sum of partial derivatives, which is a gradient as a function of the x and y directions. The resultant 2D vector, called the edge response, contains information on edge strength and direction at a given pixel. By taking the magnitude of the gradient vector, we may obtain this edge strength directly. The gradient magnitude is computed as the square root of the sum of the squares of the partial derivatives with respect to x and y (Parker, 1997).
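The gradient-magnitude computation described above can be sketched on a small grayscale image as follows. The central-difference approximation of the partial derivatives and the zeroed border are assumptions chosen for brevity; the text does not specify how the derivatives are discretized.

```python
import math


def gradient_magnitude(img):
    """Edge strength sqrt(gx**2 + gy**2) at each interior pixel.

    gx and gy are central-difference estimates of the partial derivatives
    in the horizontal and vertical directions; border pixels are left at 0.
    """
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (img[r][c + 1] - img[r][c - 1]) / 2.0
            gy = (img[r + 1][c] - img[r - 1][c]) / 2.0
            out[r][c] = math.hypot(gx, gy)
    return out


# A vertical step edge between the second and third columns.
image = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
edges = gradient_magnitude(image)
print(edges[1])   # [0.0, 5.0, 5.0, 0.0]
```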
3.3 Canny Edge Detection
Canny is a prominent derivative edge detection technique, which was pioneered by John Canny in 1986 (Parker, 1997). The primary motivations for the Canny operator are the minimization of the detection error rate, accuracy of edge location with respect to contours in the input image, and single-pixel edge identification (i.e. sharpness of edges). Canny's edge detector was designed to act as a convolution filter in order to smooth the image noise (assumed to be Gaussian) and to locate the edge. The optimal edge location filter is approximated as the first derivative of a Gaussian function in two directions, G(x,y) (with x and y representing the horizontal and vertical directions, respectively). The edge detector convolves the input image with the derivative of G in both the x and y directions and then computes the magnitude of the two resulting responses at each pixel. It should be noted that this convolution process is analogous to the computation of the image gradient (Parker, 1997). Smoothing (noise reduction) is performed in a similar manner using the original Gaussian function, G(x,y) (Sarkar and Boyer, 1989). In practice, the convolutions associated with the Canny operator are computationally intensive, requiring substantial runtime when realized with standard multipliers.
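To illustrate only the derivative-of-Gaussian filtering step, the following sketch samples a one-dimensional first derivative of a Gaussian and convolves it with a single image row. The kernel radius, the value of sigma, the unnormalized kernel, and the border replication are illustrative assumptions; the remaining stages of the detector are not shown.

```python
import math


def gaussian_derivative_kernel(sigma, radius):
    """Samples of d/dx exp(-x**2 / (2*sigma**2)) at x = -radius .. radius."""
    return [-x / sigma ** 2 * math.exp(-x * x / (2 * sigma ** 2))
            for x in range(-radius, radius + 1)]


def convolve_1d(signal, kernel):
    """Same-length convolution; samples beyond the border are replicated."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            offset = k - radius                    # kernel coordinate x
            idx = min(max(n - offset, 0), last)    # clamp to the valid range
            acc += weight * signal[idx]
        out.append(acc)
    return out


# One image row containing a rising step edge between samples 4 and 5.
row = [0.0] * 5 + [1.0] * 5
kernel = gaussian_derivative_kernel(sigma=1.0, radius=3)
response = convolve_1d(row, kernel)
peak = max(range(len(response)), key=lambda n: response[n])
print(peak)   # the strongest response sits at the step
```

Applying the same kernel along the columns would give the vertical response, and the two can then be combined into a gradient magnitude per pixel.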
4. GENETIC ALGORITHMS
This section details the motivations, theory and reasoning behind the implementation of genetic algorithms in the optimization of infinite impulse response filters. We will begin by discussing the motivations behind the application of genetic algorithms to IIR filter design.
4.1 Motivations for Genetic Algorithm Based Filter Design
In general, genetic algorithms (GAs) rely upon the principle of survival of the fittest to optimize a set of possible outcomes or results. Loosely based on the laws of Darwinian genetics, GAs have many unique characteristics that make them well suited to many optimization problems. As Goldberg states (1989), "They [GAs] combine survival of the fittest among string structures with a structured, yet randomized information exchange to form a search algorithm with some of the innovative flair of human search." Since our goal is to optimize IIR filters to contain only power-of-two coefficients, traditional design techniques are unsuitable for the construction of such filters. Restricting coefficients to powers of two forces us to explore other methods. The proposed technique employs a genetic algorithm to search for power-of-two filter coefficients.
4.2 Basic Genetic Algorithm Theory
As previously stated, GAs employ a random search to determine the ideal solution for a given optimization problem: "While randomized, genetic algorithms are no simple random walk. They efficiently exploit historical information to speculate on new search points with expected improved performance" (Goldberg, 1989).
A GA performs this through a series of evolutionary breeding (also referred to as crossover) steps performed on a population of possible outcomes. Over the course of many crossovers, mutations, and subsequent generations of the population, it is the hope that the group of outcomes will converge to a singular ideal solution that best suits the given optimization criteria. Below in Figure 4.1 we see a flowchart detailing the basic building blocks of the most common genetic algorithms.
Fig. 4.1 Flowchart of Basic Genetic Algorithm (blocks: Initial Population of Members; Fitness; Crossover Selection; Crossover of selected members; Random Mutation; Mutation inserted into population; New Population; Converged?; Fittest Member)
All genetic algorithms begin with the creation of a population of possible outcomes. The initial population consists of random values that are of no significant consequence to the final result. In reality, this generation is defined pseudo-randomly with the aid of a software-based random number generator. Generally, each member of the population is represented as a binary string, because binary strings allow for a simple method of crossover (which will be discussed in further detail below). Next, the GA begins to determine which members are most suitable for breeding. This step is known as fitness evaluation.
Fitness evaluation of the initial generation (or any subsequent generation) hinges on the definition of fitness for the given application. Fitness determines which members will cross over, that is, which members receive a chance to pass on their superior genetic material to newer generations of the population. Fitness evaluation allows those members that are superior for the given application to breed most often, and those that are genetically inferior to be discarded in the evolutionary cycle. Fitness evaluation of each member relies on a predefined fitness function. The fitness function is defined with the idea that the fittest members contain characteristics that best match those of the ideal outcome. Therefore, the fitness function yields the highest fitness ratings for the best members because those members exhibit characteristics that exist in the ideal solution. A satisfactory fitness function accurately detects the characteristics that define the desired outcome. The members with fitter results are ultimately assigned higher probabilities of crossover. Once fitness evaluation of the population has occurred, the breeding of the fittest population members begins. In order to do so, a probability of crossover is assigned to each member. This probability is proportional to the member's corresponding fitness, such that the sum of all of these probabilities is equal to one. Next, members are chosen pseudo-randomly, such that those with higher probabilities of crossover have the greatest chance of being selected. Conversely, those with lower probabilities are less likely to be chosen since they represent relatively unfit population members. This method of crossover selection favors the fittest members when genetic material is exchanged, ideally resulting in even fitter offspring. With most genetic algorithms, crossover takes place by swapping portions of the binary strings that compose each of the members. It is important to note that the number
of members chosen to cross over in each generation can be held constant, vary randomly or vary depending on the fitness of the population. Also, the amount of material (the length of the binary string portion) exchanged can be of either fixed or variable length. These characteristics depend on the application of the GA and can be determined through trial and error. Once the members have undergone crossover, they rejoin the population to form the offspring generation. This generation of crossed-over members is then subjected to a random mutation. Mutation is an important process in every GA as it prevents the population from stagnating or from converging to a result that is fit but does not represent the optimal solution. A mutation inserted into the population effectively kick-starts a population that is left dormant and relatively unfit. In the case of a binary string-based population, mutation can occur by changing a one to a zero or vice versa. In all GAs, mutation occurs randomly and is an important factor in the effectiveness of the random search technique. Following mutation, the population iterates the processes of fitness evaluation, crossover selection, breeding and mutation. This sequence of transformations is repeated until the population consists of members representing a single fittest value. At this point the population is said to have converged and the single value is considered the ideal solution to the optimization problem. The GA then terminates, as it has reached a point from which it cannot continue to produce fitter generations of offspring.
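The generic cycle described in this section (fitness evaluation, fitness-proportional selection, crossover of binary strings, mutation, and a convergence test) can be sketched as below. This is a toy illustration under stated assumptions, not the designed GA of Section 4.3: the function name, the default parameter values, the single crossover point, and the bit-counting fitness used in the example are all hypothetical.

```python
import random


def run_basic_ga(fitness, n_bits=16, pop_size=20, p_mut=0.01,
                 max_generations=200, seed=0):
    """Evolve a population of binary strings and return the fittest member."""
    rng = random.Random(seed)
    # Initial population: pseudo-random binary strings.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(max_generations):
        # Converged when every member is the same string.
        if all(member == population[0] for member in population):
            break
        # Fitness evaluation; the small offset keeps every weight positive.
        weights = [fitness(member) + 1e-9 for member in population]
        offspring = []
        while len(offspring) < pop_size:
            # Fitness-proportional selection of two parents.
            mother, father = rng.choices(population, weights=weights, k=2)
            point = rng.randint(1, n_bits - 1)        # crossover point
            child = mother[:point] + father[point:]
            # Mutation: flip each bit with probability p_mut.
            child = [bit ^ 1 if rng.random() < p_mut else bit
                     for bit in child]
            offspring.append(child)
        population = offspring
    return max(population, key=fitness)


# Toy fitness: the number of ones in the string.
best = run_basic_ga(fitness=sum)
print(best, sum(best))
```

In practice the loop often ends on the generation limit rather than on exact convergence, which is the same situation the designed GA addresses with a fixed number of iterations.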
4.3 Description of the Designed Genetic Algorithm
The genetic algorithm designed for this project is based upon conventional GA principles and framework: the algorithm starts with a random population, evaluates
member fitnesses, breeds randomly selected fittest members, occasionally mutates and then repeats. It should be noted that since the optimization performed in this case is quite specific (multiplierless criterion), the designed GA must be accurately and explicitly defined. At the same time, the GA is designed in a modular fashion through the use of variable inputs, so that important changes to the operation of the algorithm can be made in a relatively straightforward manner. The designed algorithm makes several notable departures from conventional GA design. These differences are due to the precise nature of multiplierless optimization and the notion of representing IIR filters in terms of coefficient sets. The flowchart detailing the designed GA is shown in Figure 4.2. Perhaps the most significant difference in the designed GA is that each member of the population consists of several power-of-two values. These values represent the coefficients of a particular multiplierless IIR filter. Furthermore, each member coefficient set is divided into separate numerator and denominator coefficient sets (B and A, respectively), which cross over separately. Figure 4.3 shows an example of two power-of-two coefficient member sets and how they perform crossover of their A and B coefficients. Fitness evaluation is based on the set of coefficients as a whole. This is a departure from conventional GA design, in which each member of the population is traditionally represented as a single binary string. The designed GA must therefore perform crossover as depicted in Figure 4.3 due to the nature of power-of-two coefficients. If the coefficients were to be expressed in terms of their binary equivalents and crossed over in the conventional manner, then the multiplierless coefficient criterion
would most likely be violated. Therefore, it is necessary to keep coefficients intact and exchange them in their entirety during crossover as can be seen in Figure 4.3.
Fig. 4.2 Flowchart of Designed Genetic Algorithm (blocks: Parameters [Number of Iterations, Filter Order, Mutation probability, Word length, Sampling frequency]; Desired Response; Initial Population of power-of-2 coefficients; Remove Unstable Coefficient Sets; Fitness Evaluation; Crossover Member Selection and Crossover Point Selection; Selected Fittest Members; Crossover of selected coefficients; Random Mutation; Newly-generated power-of-2 coefficients; Mutation inserted into population; Remove Unstable Coefficient Sets; Population replenished with new members to maintain original size; New Population; More Iterations? [Yes / No]; Fittest Member)
Fig. 4.3 Example of Crossover of Two 10th Order Filter Coefficient Sets
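The whole-coefficient exchange of Figure 4.3 can be sketched as follows, assuming each member is stored as a pair of lists (B, A) of power-of-two values. The function names and the way the random range is drawn are hypothetical; the point of the sketch is only that swapping entire coefficients, separately within B and within A, can never produce a value that is not a power of two.

```python
import math
import random


def swap_segment(seq1, seq2, rng):
    """Exchange one randomly chosen contiguous range of whole coefficients."""
    start = rng.randrange(len(seq1))
    stop = rng.randrange(start + 1, len(seq1) + 1)
    child1 = seq1[:start] + seq2[start:stop] + seq1[stop:]
    child2 = seq2[:start] + seq1[start:stop] + seq2[stop:]
    return child1, child2


def crossover(parent1, parent2, rng):
    """Members are (B, A) pairs; B and A are crossed over separately."""
    b1, b2 = swap_segment(parent1[0], parent2[0], rng)
    a1, a2 = swap_segment(parent1[1], parent2[1], rng)
    return (b1, a1), (b2, a2)


def is_power_of_two(value):
    """True for +/- 2**k with integer k (zero is excluded)."""
    return value != 0 and math.log2(abs(value)).is_integer()


rng = random.Random(1)
parent1 = ([0.5, -0.25, 1.0, 0.125], [1.0, -0.5, 0.25, 0.0625])
parent2 = ([-1.0, 0.5, 0.25, -0.5], [1.0, 0.25, -0.125, 0.5])
child1, child2 = crossover(parent1, parent2, rng)
print(child1)
print(child2)
```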
The crossover scheme depicted in Figure 4.3 is considered a two-point crossover, as two distinct ranges of crossover are selected at random for each member of the population (Mitchell, 1996). This form of crossover is utilized in order to genetically enhance the numerator and denominator coefficient sets separately. An additional difference from general GA design can be seen in the preemptive removal of unstable members (Figure 4.2). Filter stability is the first criterion when designing filters. Accordingly, the GA is designed to remove any unstable members both before fitness evaluation and after mutation. It does this by examining the system function denominator (A) coefficients and ensuring that the pole magnitudes remain less than one. The population is subsequently replenished with new random coefficient sets to fill the vacancies left by the unstable members that were removed. It quickly becomes apparent that, due to the restrictive nature of power-of-two coefficients and the manner in which crossover takes place, it is extremely difficult for our population to approach a point of convergence where all members represent a uniform fittest coefficient set. Instead, the fittest member in the population is selected after a specified number of iterations has been exhausted. This member represents the closest power-of-two filter approximation.
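A stability check of this kind can be sketched without an explicit root finder. The report's GA examines pole magnitudes directly; the sketch below swaps in the Schur-Cohn (step-down) test, which answers the same question, namely whether every root of the denominator lies strictly inside the unit circle. The function name and the coefficient ordering A(z) = a_0 + a_1 z^-1 + ... + a_N z^-N with real coefficients are assumptions.

```python
def is_stable(a):
    """True when all poles of 1/A(z) lie strictly inside the unit circle.

    a = [a0, a1, ..., aN] are the real coefficients of
    A(z) = a0 + a1*z**-1 + ... + aN*z**-N.
    """
    if a[0] == 0:
        return False
    poly = [c / a[0] for c in a]           # normalize the leading term to 1
    for order in range(len(poly) - 1, 0, -1):
        k = poly[order]                     # reflection coefficient
        if abs(k) >= 1.0:
            return False                    # a pole on or outside the circle
        denom = 1.0 - k * k
        # Step down to the polynomial of the next lower order.
        poly = [(poly[i] - k * poly[order - i]) / denom for i in range(order)]
    return True


print(is_stable([1.0, -0.5]))          # True: single pole at z = 0.5
print(is_stable([1.0, -1.0, 0.5]))     # True: poles at 0.5 +/- 0.5j
print(is_stable([1.0, -2.5, 1.0]))     # False: poles at z = 2 and z = 0.5
```

Members for which the check fails would be discarded and replaced by fresh random coefficient sets, as described above.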
The designed GA also incorporates a form of elitism. Elitism is a technique implemented in our algorithm to prevent the fitness of the population from regressing. Since we must cross over entire coefficient values and have a limited number of power-of-two numbers to choose from (a function of word length), even a crossing of the two fittest members could result in offspring that are highly unfit (or unstable). Elitism is employed to prevent the loss of the fittest members due to crossover and mutation. This guarantees that even when parents do not successfully yield fitter offspring, they are held over to the next generation to try again. Elitism ensures that the fittest members are not lost by creating a copy of them before crossover and using these copies to replace the least fit members in the subsequent generation. The number of members that are held over is an input parameter of the GA.
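A minimal sketch of this hold-over step is given below. The function and argument names are hypothetical, and members are represented by plain numbers in the example so that the fitness function can simply be the identity.

```python
def apply_elitism(old_population, new_population, fitness, n_elite):
    """Replace the least fit offspring with copies of the fittest parents."""
    if n_elite <= 0:
        return list(new_population)
    # Copies of the n_elite fittest members of the previous generation.
    elites = sorted(old_population, key=fitness, reverse=True)[:n_elite]
    # Offspring ranked from fittest to least fit.
    ranked = sorted(new_population, key=fitness, reverse=True)
    keep = max(len(ranked) - len(elites), 0)
    return ranked[:keep] + elites


parents = [5, 9, 1]
offspring = [2, 3, 0]
next_generation = apply_elitism(parents, offspring, fitness=lambda m: m,
                                n_elite=1)
print(next_generation)   # [3, 2, 9]
```

With this step the best fitness in the population cannot decrease from one generation to the next, which is the regression that elitism is meant to prevent.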
4.3.1 Fitness Function Definition and Crossover Selection
The fitness function in the designed genetic algorithm compares the responses of the candidate multiplierless systems in the population to a specified ideal filter's response. In order to do this, the absolute difference is taken between the magnitude frequency responses of the ideal system and the multiplierless system. This difference is then summed over all of the samples in the frequency response representation. The sum is squared to ensure that any major differences are weighted most heavily. Fitness is defined as being inversely proportional to this squared sum of differences between the ideal and candidate systems. Equation 4.1 depicts this inverse relationship, where n is the total number of samples in the ideal and approximated magnitude frequency response representations:
$$\text{Fitness} \propto \frac{1}{\left( \sum_{i=1}^{n} \left| \text{ideal}(i) - \text{multiplierless}(i) \right| \right)^{2}} \qquad (4.1)$$
The relationship depicted in Equation 4.1 allows us to establish a basis of comparison between the members in the population. Those members that have the largest squares of summed differences are considered less fit and are assigned lower probabilities of crossover. Conversely, those with smaller squares of summed differences are assigned higher probabilities of crossover, since they represent the fitter members. The probability of crossover is assigned to each member based on its fitness relative to the rest of the population. Once the fitness characteristic described in Equation 4.1 is calculated for all members, each is divided by the sum of all the fitness grades over the entire population. This normalizes the set of fitness grades. Normalization forces the fitness grades to lie between zero and one and to sum to one, so that they can subsequently be used as a set of crossover probabilities corresponding to member fitness. Figure 4.4 shows how a hypothetical population is divided into probabilities of crossover corresponding to each set of coefficients.
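As a hedged sketch of the fitness grading in Equation 4.1, the code below samples the magnitude response of a candidate coefficient set and grades it against an ideal response. The function names, the uniform frequency grid on [0, pi), the proportionality constant of one, and the small guard against division by zero are all assumptions; the candidate is also assumed to have already passed the stability check, so its denominator does not vanish on the unit circle.

```python
import cmath
import math


def magnitude_response(b, a, n_samples):
    """|H(e^{jw})| at n_samples frequencies w in [0, pi).

    H(z) = (sum_m b[m] z**-m) / (sum_m a[m] z**-m), with a[0] the leading 1.
    """
    mags = []
    for i in range(n_samples):
        w = math.pi * i / n_samples
        num = sum(c * cmath.exp(-1j * w * m) for m, c in enumerate(b))
        den = sum(c * cmath.exp(-1j * w * m) for m, c in enumerate(a))
        mags.append(abs(num / den))
    return mags


def fitness(ideal, candidate):
    """Inverse of the squared sum of absolute magnitude differences."""
    total = sum(abs(i - c) for i, c in zip(ideal, candidate))
    return 1.0 / max(total, 1e-12) ** 2    # guard against a perfect match


n = 64
# Ideal high-pass response: 0 in the lower half band, 1 in the upper half.
ideal = [0.0 if i < n // 2 else 1.0 for i in range(n)]
high_pass = magnitude_response([0.5, -0.5], [1.0], n)   # power-of-two taps
low_pass = magnitude_response([0.5, 0.5], [1.0], n)
print(fitness(ideal, high_pass) > fitness(ideal, low_pass))   # True
```

Dividing each grade by the sum of all grades then yields the crossover probabilities described above.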
Fig. 4.4 Hypothetical GA population, their fitnesses represented as normalized values between 0 and 1, and their associated crossover probabilities as a function of fitness.
A random number is generated to determine which element will be selected for breeding. This random number falls within a particular range of crossover probability. This range corresponds to a particular coefficient set, which is subsequently chosen as a breeding member.
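This selection step is commonly known as roulette-wheel selection, and a minimal sketch of it follows. The function names and the example fitness grades are hypothetical; the cumulative sums of the normalized grades define the ranges into which the random number falls.

```python
import random
from itertools import accumulate


def crossover_probabilities(fitnesses):
    """Normalize fitness grades so that they sum to one."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]


def select_member(probabilities, r):
    """Index of the member whose probability range contains r in [0, 1)."""
    for index, upper in enumerate(accumulate(probabilities)):
        if r < upper:
            return index
    return len(probabilities) - 1    # guard against floating-point round-off


fitnesses = [4.0, 3.0, 2.0, 1.0]
probs = crossover_probabilities(fitnesses)   # 0.4, 0.3, 0.2, 0.1
# Ranges: [0, 0.4) -> member 0, [0.4, 0.7) -> 1, [0.7, 0.9) -> 2, rest -> 3.
print(select_member(probs, 0.05), select_member(probs, 0.5))   # 0 1

# In the GA the number would be drawn pseudo-randomly:
chosen = select_member(probs, random.random())
```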
4.3.2 Magnitude Response and Relative Error
The magnitude frequency response is the basis for comparison between the candidate multiplierless coefficient sets and the specified ideal filter. This decision comes as the result of extensive testing of various fitness functions. A fitness grading that includes only the magnitude of the frequency response is sufficient for our imaging applications. A system of relative error tracks the progression of the GA population. This relative error is defined as the difference between the magnitude frequency responses of the ideal and the fittest multiplierless systems, summed over all samples. Comparing this value over the total number of iterations of the GA allows us to monitor how the fittest member converges to a fittest solution. The fittest member is a sufficient gauge of the GA's progression since we are ultimately only interested in the quality of the fittest member's response. If the relative error of the fittest member decreases iteratively over the life cycle of the GA, then we know that the GA is indeed optimizing towards a fittest possible solution. The input parameters described below have a profound effect on how the GA progresses and how the relative error decreases over the optimization period.
4.3.3 GA Parameters
The genetic algorithm is designed to be able to optimize several different types of filters as well as to adapt and modify its population in different ways. To do this, the GA incorporates a variety of different variables and parameters that can be altered depending on the application. The first parameter is the number of genetic iterations. This is an important variable as it determines how long the population breeds in an attempt to improve the fittest
member. Generally, the higher the number of iterations chosen, the fitter the members of the population become. A second and equally important input variable to the GA is the filter order. The filter order determines not only how many coefficients make up each member of the population, but also the filter's ability to approximate its ideal specified counterpart. Generally, higher-order filters are necessary in order to realize sharper responses. To accommodate this factor, it is necessary to vary the filter order depending on the application. With power-of-two coefficients, the need for higher filter order is more apparent, as it becomes increasingly difficult to realize sharp responses with a limited number of possible coefficients. A variable exists to control the frequency of population mutation. A mutation probability is created to allow for random mutation at a probabilistic frequency. A higher mutation probability forces the population to mutate more frequently. Likewise, a lower mutation probability forces the population to mutate less frequently. Wordlength is an important factor in the designed GA, in that it determines how many possible power-of-two coefficients can be used in the system. A higher wordlength generally allows for a more precise approximation of the ideal system. The tradeoff is the increase in hardware cost due to this wordlength increase (see Section 6). Making wordlength a variable in the GA allows us to determine the application effectiveness for multiplierless hardware implementation. The final input variable to the GA is sampling frequency. The number of samples used for comparison between the ideal filter and population members affects the degree of scrutiny in fitness evaluation. Using a small number of samples decreases the number
of comparison points between ideal and member systems during fitness calculation. This has an advantage in terms of the speed of each GA iteration. The tradeoff is in the limited representation of the ideal system when fitness is evaluated.
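The input parameters described in this section could be gathered as in the sketch below. The key names are hypothetical, as are the default values for wordlength, sample count and elite count. The coefficient alphabet of plus or minus 2^-k for k from 0 to wordlength - 1 is likewise an assumption made for illustration, since the exact mapping from wordlength to the allowed coefficients is not stated here.

```python
import random

# Hypothetical parameter set; the first three values mirror Fig. 5.1.
DEFAULT_PARAMETERS = {
    "iterations": 2000,             # number of genetic iterations
    "filter_order": 10,             # coefficients per A and B set
    "mutation_probability": 0.001,
    "wordlength": 8,                # assumed value
    "n_samples": 128,               # assumed frequency-sample count
    "n_elite": 2,                   # assumed number of held-over members
}


def power_of_two_values(wordlength):
    """Assumed alphabet: +/- 2**-k for k = 0 .. wordlength - 1."""
    magnitudes = [2.0 ** -k for k in range(wordlength)]
    return sorted([-m for m in magnitudes] + magnitudes)


def random_coefficient_set(order, wordlength, rng):
    """Draw one random power-of-two coefficient set of the given order."""
    alphabet = power_of_two_values(wordlength)
    return [rng.choice(alphabet) for _ in range(order)]


rng = random.Random(0)
print(power_of_two_values(3))   # [-1.0, -0.5, -0.25, 0.25, 0.5, 1.0]
member = random_coefficient_set(DEFAULT_PARAMETERS["filter_order"],
                                DEFAULT_PARAMETERS["wordlength"], rng)
```

Under this assumption, each additional bit of wordlength adds two candidate values per coefficient, which illustrates why a longer wordlength enlarges the search space and permits closer approximations.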
5. RESULTS
This section includes significant results obtained during GA development and filtering processes. We begin by reviewing the various methods utilized in order to assess GA performance and reliability.
5.1 Magnitude Frequency Analysis
There are several important considerations in any filter design. The initial, and perhaps most important condition, is that the filter be stable. Since the GA incorporates pervasive stability checking, however, this is an unnecessary consideration during filter analysis. Throughout the development phase of the GA, we utilize frequency analysis in order to compare the approximated and ideal magnitude responses of three distinct filters. The first is an ideal high-pass filter (HPF), displayed in Figure 5.1.
Fig. 5.1: Frequency responses (a) and relative error plot (b) of ideal and approximated filters after 2000 GA iterations with m=10, n=10, mutation probability = 0.001.
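The stability condition noted at the start of this section can be tested without computing pole locations. The sketch below uses the Schur-Cohn step-down recursion, a standard textbook test; the report does not state which test the GA itself applies, so this is an assumed stand-in. The function name is_stable is hypothetical, and the input is taken to be the denominator polynomial A(z) = 1 + a1 z^-1 + ... + aN z^-N. If the feedback coefficients instead appear on the right-hand side of the difference equation, their signs must be negated first.

```python
def is_stable(a):
    """Schur-Cohn test for A(z) = a[0] + a[1] z^-1 + ... + a[N] z^-N.

    Returns True when every reflection coefficient has magnitude
    below one, i.e. all poles lie strictly inside the unit circle.
    """
    a = [c / a[0] for c in a]            # normalize so the leading term is 1
    while len(a) > 1:
        k = a[-1]                        # reflection coefficient of this order
        if abs(k) >= 1.0:
            return False
        m = len(a) - 1
        denom = 1.0 - k * k
        # Step down to the polynomial of order m - 1.
        a = [(a[i] - k * a[m - i]) / denom for i in range(m)]
    return True

# y(n) = 0.5 y(n-1) - 0.25 y(n-2) + ...  ->  A(z) = 1 - 0.5 z^-1 + 0.25 z^-2
print(is_stable([1.0, -0.5, 0.25]))
# Poles at 2 and 0.25: A(z) = 1 - 2.25 z^-1 + 0.5 z^-2
print(is_stable([1.0, -2.25, 0.5]))
```

A check of this kind costs on the order of N^2 operations per member, which makes it cheap enough to apply to every coefficient set in every generation.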
Given the specified parameters, the GA produces an accurate and sharp approximation of the ideal HPF with minimal relative error after 2000 iterations. As
anticipated, however, the generated filter deviates somewhat within the transition band. We subsequently experiment with the various GA input parameters in hopes of generating a sharper multiplierless approximation. This is ultimately realized with a significantly higher filter order (50 m and n coefficients), as observed in Figure 5.2.
Fig. 5.2: Magnitude frequency responses of ideal and approximated filters after 15,000 GA iterations with m=50, n=50, mutation probability = 0.001.
Although the obtained response is noticeably sharper than the previous instance, its generation requires substantially more GA iterations (15,000) due to the fact that it is 50th as opposed to 10th order. Since most image processing applications do not require such sharp frequency responses, there is little practical merit within the scope of this project in using such high-order filters. We therefore concentrate on 10th-order systems with fewer GA iterations in order to realize specific filtering applications. Upon subsequent trials with varying parameter sets, ideal HPF (and LPF) experimentation demonstrates the consistent robustness of the GA as well as the accuracy of its output for a variety of inputs. After establishing the GA's capability of generating high-precision, multiplierless approximations of ideal HPFs, it is necessary to test it for the case of other image filters. The second application is derivative-based Canny edge detection (see section 3.3). To test the multiplierless approximation of the Canny operator, we execute the GA with
the Gaussian and derivative-Gaussian masks as the desired responses. Since these are much smoother responses, the GA requires fewer iterations to generate fair approximations of them, as indicated in Figure 5.3 and Figure 5.4 below:
Fig. 5.3: Magnitude frequency responses of ideal and approximated Gaussian (a) and derivative-Gaussian (b) filters with σ = 1, after 1000 GA iterations and m=10, n=10, mutation probability = 0.001; (c) relative error plot.
Fig. 5.4: Magnitude frequency response (a) and relative error plot (b) of ideal and approximated Gaussian filter with σ = 2, after 300 GA iterations and m=10, n=10, mutation probability = 0.001.
The above results corroborate the GA's ability to generate precise and relatively low-order approximations in the frequency domain for specific target applications. In both
Canny trials, we note how the relative error is effectively minimized after the 200th iteration. The third case considered is that of an image-blurring system, which is most easily represented as an ideal LPF, with degree of blurring inversely proportional to cutoff frequency. To experiment with these systems, we follow steps similar to those undertaken during trials with the initial HPF. The plots shown below in Figure 5.5 are characteristic of those achieved throughout this phase of experimentation. A higher order and increased number of iterations are necessary in order to produce the optimal response seen below, since the GA approximated a much sharper filter.
Fig. 5.5: Magnitude frequency response (a) and relative error plot (b) of ideal and approximated blurring filter (ideal LPF), after 2000 GA iterations and m=15, n=15, mutation probability = 0.001, cutoff frequency = 0.66 radians.
It appears that the relative error is effectively minimized after 600 iterations, yielding fair approximations of both pass and stop bands with some minor ripple.
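The comparison between an approximated and an ideal magnitude response can be sketched as follows. The report defines its own relative error measure in section 4.3.2; since that definition is not reproduced here, the sketch assumes a normalized root-mean-square deviation over equally spaced frequency samples. The function names and the first-order example coefficients are hypothetical and chosen only because they are powers of two.

```python
import cmath
import math

def magnitude_response(b, a, w):
    """|H(e^{jw})| for H(z) = B(z)/A(z) with b = [b0..bM], a = [1, a1..aN]."""
    z_inv = cmath.exp(-1j * w)
    num = sum(bk * z_inv ** k for k, bk in enumerate(b))
    den = sum(ak * z_inv ** k for k, ak in enumerate(a))
    return abs(num / den)

def ideal_lowpass(w, cutoff):
    """Ideal LPF magnitude: unity in the pass band, zero in the stop band."""
    return 1.0 if w <= cutoff else 0.0

def relative_error(b, a, cutoff, n_samples):
    """Assumed error measure: RMS deviation normalized by the ideal response."""
    ws = [math.pi * i / (n_samples - 1) for i in range(n_samples)]
    ideal = [ideal_lowpass(w, cutoff) for w in ws]
    approx = [magnitude_response(b, a, w) for w in ws]
    num = sum((h - d) ** 2 for h, d in zip(approx, ideal))
    den = sum(d ** 2 for d in ideal)
    return math.sqrt(num / den)

# Hypothetical first-order example with power-of-two coefficients:
# y(n) = 0.5 y(n-1) + 0.25 x(n) + 0.25 x(n-1)
b = [0.25, 0.25]
a = [1.0, -0.5]
print(relative_error(b, a, cutoff=0.66, n_samples=64))
```

Reducing n_samples shortens each fitness evaluation at the expense of a coarser picture of the ideal response, which is the tradeoff described for the sampling parameter in section 4.3.3.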
5.2 Spatial Analysis
Although frequency analysis is a critical stage, one must also consider other response characteristics in order to produce a reliable filter. It was therefore necessary to confirm that the GA-approximated system had an impulse response comparable to that of the ideal filter. We considered impulse responses for the Canny system, which is equivalent
to the derivative-Gaussian mask (σ = 1) in the spatial domain. The impulse responses of the normalized ideal and GA-approximated 10th order systems are displayed in Figure 5.6.
Fig. 5.6: Ideal (a) and approximated (b) Canny impulse responses (derivative-Gaussian) in spatial domain for σ = 1.
Inspection reveals that the ideal and approximated impulse responses are very similar in shape. The latter is of the same waveform and has almost equal positive and negative amplitudes. Both ideal and approximated responses have a width of 5.
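The shape of the ideal response can be reproduced with a short sketch. It samples the first derivative of a Gaussian at five integer positions and normalizes by the peak magnitude; the width parameter is taken to be 1 as in the trial above, and the normalization and the function name derivative_gaussian_mask are assumptions made for illustration.

```python
import math

def derivative_gaussian_mask(sigma, half_width):
    """Samples of d/dx exp(-x^2 / (2 sigma^2)) at x = -half_width..half_width,
    scaled so that the largest magnitude equals one."""
    xs = range(-half_width, half_width + 1)
    g = [-x / sigma ** 2 * math.exp(-x * x / (2 * sigma ** 2)) for x in xs]
    peak = max(abs(v) for v in g)
    return [v / peak for v in g]

mask = derivative_gaussian_mask(1.0, 2)   # width of 5 samples
print([round(v, 3) for v in mask])
```

The mask is antisymmetric about its center, which is consistent with the equal positive and negative amplitudes noted for both responses.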
5.3 Sample Filter Output
Upon generating a reliable set of multiplierless filters, it is necessary to confirm their effectiveness through trials on suitable testbench images. As an example of filter performance, the results from several trials of the multiplierless Canny operator are displayed in the figures below; they are representative of this phase of experimentation.
Fig. 5.7: Sample input (a) and output (b) of 10th order, multiplierless Canny edge detection filter (σ = 1).
We see how the Canny filter accurately detects all of the edges within the black and white bitmap image above. The edges are precise and very narrow, accentuated as single-pixel contours. The next trial involves two grayscale photographs (Figures 5.8 and 5.9).
Fig. 5.8: Sample input (a) and output (b) of 10th order, multiplierless Canny edge detection filter (σ = 1).
Fig. 5.9: Sample input (a) and output (b) of 10th order, multiplierless Canny edge detection filter (σ = 1).
Qualitative analysis of these selected test images indicates that the 10th order multiplierless filter functions as a suitable edge detector. Next, we show how the approximated blurring filter performs on test images. The filter output in Figure 5.10 is clearly a blurred version of the input image. We can increase the degree of blurriness by decreasing the cutoff frequency specified as input to the GA. This would require a higher order and increased number of iterations to approximate, however. All of the images presented here suggest that the generated filters are not only fair approximations of their corresponding ideals, but are likewise capable of producing consistent and qualitatively accurate output for various applications.
Fig. 5.10: Sample input (a) and output (b) of 15th order, multiplierless blurring filter, cutoff frequency = 0.66 radians.
6. COMPARATIVE ANALYSIS OF COMPUTATIONAL COMPLEXITY BETWEEN MULTIPLIERLESS AND CONVENTIONAL DESIGN METHODOLOGIES
We now attempt to quantify the differences in complexity between the multiplier and multiplierless-based implementations of the filtering algorithms discussed in section 3. The two systems are compared through hypothetical implementations on identical target devices. Since we are interested in relative performance, we only consider the costs of
those components of the algorithms that are different from one another. This greatly simplifies the analysis, as the proposed algorithms differ only by their multiply/shift subsystems (see difference equation block diagram in Figure 6.1). Hence, we obtain a relative hardware cost figure by eliminating any constants between the two systems. We base all calculations upon the requirements of a 10th order IIR filter throughout this procedure.
Fig. 6.1: Block diagram of multiplierless IIR system (cascade model)
6.1 FPGA Case
We initially consider a generic field programmable gate array device (FPGA) implementation of the filter. A standard FPGA chip integrates thousands of macro cells in order to realize custom combinatorial and sequential logic. Macro cells are logic blocks that may consist of flip flops, AND gates and look-up tables (LUTs). They are linked together via switch matrices to implement programmed logic functions. Since they represent granular logic units within an FPGA, our primary analysis is conducted at the macro cell level. The ultimate objective in any FPGA filter implementation is to minimize the number of required macro cells. A conventional IIR filter requires a series of adder, multiplier and delay operators, each comprising a number of linked macro cells. The number of macro cells required for a given operator is a function of word length (n), as indicated in table 6.1. Since word
length is a function of the degree of filter precision and order (number of coefficients), it is an important gauge of overall filter quality in terms of digital logic. Hence, we use word length as the independent variable for subsequent calculations.
Table 6.1: Macro cells required per operator based on word length, n.
Operator      Required Number of Macro Cells
Shifter       n
Multiplier    2n^2
Adder         2n
Unit Delay    n
Since unit time delays are implemented by standard n-bit registers in both multiplierless and multiplier-based configurations, we may assume that each requires only a single macro cell (for word lengths up to 32 bits). Analysis of the block diagram in Figure 6.1 indicates that we require 19 shift/multiplier, 18 adder and 18 delay operators to implement the 10th-order difference equation. So, for the multiplier-based system we require:
$$\mathrm{Cost}_{\mathrm{multiplier},10} = 19 \cdot 2n^2 + 18 \cdot 2n + 18 = 38n^2 + 36n + 18 \ \text{macro cells}$$
Similarly, for the shift-based multiplierless system, we require:

$$\mathrm{Cost}_{\mathrm{multiplierless},10} = 19 \cdot n + 18 \cdot 2n + 18 = 55n + 18 \ \text{macro cells}$$
In general, when comparing the multiply/shift subsystems only, the multiplierless system requires $19n / 38n^2 = 1/(2n)$ as many macro cells as the multiplier-based implementation. Therefore, our 16-bit multiplier system (n = 16) would require a total of $38 \cdot 16^2 + 36 \cdot 16 + 18 = 10{,}322$ macro cells, whereas the equivalent multiplierless system would require only $19 \cdot 16 + 18 \cdot 32 + 18 = 898$ macro cells (8.7%). The multiplier and shift subsystems would require $38 \cdot 16^2 = 9{,}728$ and $19 \cdot 16 = 304$ macro cells,
respectively. Hence, there is a substantial relative hardware cost benefit with the multiplierless FPGA implementation. Within these calculations, it is important to account for the fact that a multiplierless system will generally require a higher order than that of a multiplier-based system in order to realize a given magnitude frequency response. Therefore, we also compute the hardware cost for a multiplier-based system with a significantly lower order (3) to form a fairer basis of comparison. Accordingly, for a 3rd-order multiplier-based system, it can be shown that we require a total of:
$$\mathrm{Cost}_{\mathrm{multiplier},3} = 5 \cdot 2n^2 + 4 \cdot 2n + 4 = 10n^2 + 8n + 4 \ \text{macro cells}$$
From the above formulas (and table 6.2), we see that a 10th order multiplierless system still outperforms a lower order (3) multiplier-based system. The hardware costs in number of macro cells for different variants of this IIR system are recorded in table 6.2.
Table 6.2: Hardware costs (in macro cell counts) for different implementations of IIR system.
               Multiplier-based                                       Multiplierless
Word Length    Total Multiplier Cost      Total Cost                  Total Shifter Cost    Total Cost
               3rd Order    10th Order    3rd Order    10th Order     10th Order            10th Order
n = 8          640          2,432         708          2,738          152                   458
n = 16         2,560        9,728         2,692        10,322         304                   898
n = 24         5,760        21,888        5,956        22,770         456                   1,338
The multiplierless system cost is a linear function (order of n) of word length, whereas that of the multiplier-based approach is quadratic (order of n^2). Hence, a multiplierless implementation represents an even better alternative for applications that require larger operands or higher levels of precision.
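The cost formulas of this section can be evaluated directly. The following sketch encodes the per-operator macro cell counts used in the formulas above (2n^2 per multiplier, n per shifter, 2n per adder, one per delay); the function names are hypothetical.

```python
def multiplier_cost(n, n_mult, n_add, n_delay):
    """Macro cells for a multiplier-based IIR filter with word length n."""
    return n_mult * 2 * n * n + n_add * 2 * n + n_delay

def multiplierless_cost(n, n_shift, n_add, n_delay):
    """Macro cells for a shift-based (multiplierless) IIR filter."""
    return n_shift * n + n_add * 2 * n + n_delay

for n in (8, 16, 24):
    third = multiplier_cost(n, 5, 4, 4)            # 3rd order, multiplier-based
    tenth = multiplier_cost(n, 19, 18, 18)         # 10th order, multiplier-based
    shifted = multiplierless_cost(n, 19, 18, 18)   # 10th order, multiplierless
    print(n, third, tenth, shifted, f"{shifted / tenth:.1%}")
```

For n = 16 this yields 10,322 and 898 macro cells for the two 10th-order systems, the figures quoted in the text, and the ratio between them shrinks as n grows because one cost is linear and the other quadratic in word length.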
6.2 ASIC Case
Although FPGA devices serve as highly practical filter implementations, a fixed-logic realization may be required for more complex, higher bandwidth, real-time applications. While designing for application-specific integrated circuits (ASICs), one can optimize the logic at the gate or transistor level, as opposed to the macro cell level in FPGAs. ASICs are hard-coded, however, and are therefore limited to specific filtering applications. A different filter could be realized only through the use of a memory lookup table, which would hold the coefficient values. The tradeoff in this case is that the fixed algorithm would need to accept a dynamic range of coefficient values, thereby limiting the degree of potential performance optimization. Since there are various full-custom, transistor-level optimization techniques available, it is difficult to fully quantify the performance differences between our multiplierless and multiplier-based algorithms for the case of a fixed ASIC implementation. We can, however, perform some basic analysis on hypothetical systems. Assuming that power-of-two coefficients are integrated into the hard-coded circuitry of the chip, we completely eliminate the need for any gate logic to perform the multiplications. Since a given multiply operation is merely a constant shift, the bit signals of the multiplicand (input) can be rerouted to effectively realize a wired shift by the amount specified in the static coefficient. This represents a tremendous boost in chip performance, since the rerouted (shifted) input can be sent directly to the output register without the need for any shift logic or additional clock cycles. This system is illustrated below in Figure 6.2, with a sample power-of-two multiplication using the previous 8-bit input and 16-bit word lengths.
[Figure: the 8-bit input x(n), multiplicand = 234, is routed through wired shift-by-3 logic (power-of-two coefficient = 0.125, shift right 3) into the 16-bit product/output register y(n); product = 29.25.]
Fig. 6.2: Demonstration of wired shift in fixed ASIC configuration
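The arithmetic of the wired shift in Figure 6.2 can be checked in software. This sketch assumes the 16-bit output word holds 8 integer and 8 fractional bits; the report gives the word lengths but not the position of the binary point, so that layout and the name wired_shift_multiply are assumptions.

```python
FRACTION_BITS = 8   # assumed position of the binary point in the 16-bit word

def wired_shift_multiply(x, shift_right):
    """Multiply an 8-bit unsigned input by 2**-shift_right.

    In hardware the shift is a fixed rerouting of wires; here it is
    modeled as an integer shift on a 16-bit fixed-point word.
    """
    assert 0 <= x < 256 and 0 <= shift_right <= FRACTION_BITS
    widened = x << FRACTION_BITS          # place the input above the binary point
    return (widened >> shift_right) & 0xFFFF

word = wired_shift_multiply(234, 3)       # 234 * 0.125
print(f"{word:016b}", word / 2 ** FRACTION_BITS)   # 0001110101000000 29.25
```

No bits are lost as long as the shift amount does not exceed the number of fractional bits, so the result is exact for this example.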
The multiplier-based fixed ASIC implementation could likewise be optimized to achieve much greater performance than that of an FPGA. It would, however, require much more than a single wired shift to realize a multiplication, since most binary coefficients would contain multiple 1s. Hence, a multiplication would still require a series of shifts and additions. The number of required additions for a multiplication would be equal to j = number of 1s within the coefficient. Therefore, in order to implement a single real multiplier subsystem as part of the 10th-order difference equation, we would require j additions and j wired shifts. The multiplierless system would require only one wired shift, saving j additions. Assuming that J is an array of the number of 1s in each binary coefficient, we can easily derive formulas for the approximate hardware costs of the two 10th-order ASIC IIR systems:
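$$\mathrm{Cost}_{\mathrm{multiplier}} = \left[ \left( \sum_{i=1}^{19} J(i) \right) + 18 \right] \text{adders} + 18 \ \text{delay registers}$$
$$\mathrm{Cost}_{\mathrm{multiplierless}} = 18 \ \text{adders} + 18 \ \text{delay registers}$$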
The relative performance gain of multiplierless over multiplier-based systems is directly proportional to the J values. It is evident, however, that regardless of J, the multiplierless implementation is significantly less complicated.
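The dependence on J can be made concrete with a short sketch that counts the 1s in each quantized coefficient and evaluates the adder term of the multiplier-based cost. The 8 fractional bits, the function names, and the example coefficient value are hypothetical choices for illustration and do not come from the filters designed in this report.

```python
def ones_count(coefficient, fraction_bits):
    """Number of 1s in the magnitude of a coefficient quantized to
    `fraction_bits` fractional bits."""
    q = round(abs(coefficient) * 2 ** fraction_bits)
    return bin(q).count("1")

def adders_multiplier_based(coefficients, fraction_bits, n_adders=18):
    """Adder count following the cost formula: sum of J(i) plus 18."""
    J = [ones_count(c, fraction_bits) for c in coefficients]
    return sum(J) + n_adders

# Hypothetical coefficient 0.7109375 = 0.10110110 in binary (five 1s),
# repeated for all 19 multiplier positions of the 10th-order system.
general = [0.7109375] * 19
print(adders_multiplier_based(general, 8))     # multiplier-based adder count
print(18)                                      # multiplierless adder count
```

In this hypothetical case the multiplier-based system needs 113 adders against 18 for the multiplierless one, and the gap widens as the coefficients contain more 1s.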
For the programmable lookup table (LUT) case, we would also realize a substantial reduction in complexity using the multiplierless approach, although the chip would be unable to utilize the instantaneous wired shifts depicted in Figure 6.2 and would instead require dynamic shifts. In this implementation, we could simply replace each multiplier operator with a shift register. The relative performance gain between the two systems depends primarily upon the transistor-level implementations of multiplier operators and shift registers. Regardless of the implementation, however, shifts would be much less intensive than their corresponding multiplications. Hence, the multiplierless ASIC implementation is far more efficient than the multiplier-based ASIC for both fixed and programmable coefficients.
7. FUTURE WORK
The majority of this research focused upon the development of a customized genetic algorithm suitable for the generation of power-of-two IIR filter coefficients. After designing such an algorithm, we are able to approximate a range of desired filter responses to facilitate the efficient implementation of image processing systems. There exists the potential, however, for a substantial amount of continued research within several areas of this project. Within the scope of the genetic algorithm itself, there remain several sub-elements, which can be further investigated and ultimately enhanced. Foremost, in order to generate more precise magnitude response approximations, we could initially consider the effects of frequency band weighting throughout the evolutionary cycle. By dividing the magnitude frequency response into several (e.g. pass, transition and stop) bands, each
could be evaluated and perhaps optimized independently. The fitness of each frequency band could be weighted differently according to application-specific criteria. Member fitness would be subsequently computed as the aggregate of the weighted band fitness levels. This would likely prove advantageous for filtering applications in which certain frequency band responses are more critical than others. For instance, in an application where the transition band must be very sharp but ripple in the stop and pass bands is of little consequence, one could choose to weight the transition band as a more substantial contributor to overall member fitness. Another possible GA enhancement involves redefining the crossover process, which deviates significantly from those of classical algorithms. The current scheme is unconventional in that it crosses over entire decimal coefficients as opposed to segments of binary strings. An efficient multiplierless implementation of traditional, binary string crossover could therefore be attempted. This would require additional logic to assure that any two binary coefficients selected to cross over either exchange strings of all 0s or strings with single 1s. Without this, some coefficients would obtain additional 1s, thereby losing their multiplierless form. Successful realization of this would allow for the implicit generation of power-of-2 coefficients in the crossover process and could possibly eliminate the need for elitism. In order to improve GA efficiency, we propose some minor changes to the decision block at the end of the loop where the number of remaining iterations is evaluated. In all of our experiments we observe that the fittest member relative error eventually converges to some minimum level. It would therefore be beneficial to replace the existing for-loop structure with a while-loop, which would evaluate the rate of error decline (slope) at the
end of each iteration. Once this rate falls below a parameterized value, the loop would terminate and the fittest member at that point would be displayed (additional iterations have a marginal effect on overall fitness). This would prevent the occurrence of extraneous iterations and consequently decrease GA run time. Considering the GA's marked success with smooth magnitude frequency responses, there remains tremendous potential for the further development of lower order filter applications employing our methodology. Therefore, in order to fully exploit the GA's potential, its capacity to approximate responses beyond those considered within this paper should be investigated. The algorithm's demonstrated ability to generate very sharp magnitude responses motivates further experimentation with more precise imaging and non-imaging applications. Since these systems require more extensive and complicated circuitry to implement, it would prove most useful to utilize our GA to approximate them through multiplierless reduction.
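Two of the proposals in this section, frequency band weighting and slope-based loop termination, can be sketched as follows. Neither is part of the existing GA; the function names, the weighting scheme (a weighted mean of per-band errors), and the convergence rule (error improvement over a sliding window) are assumptions, and the demonstration uses a synthetic error curve in place of real GA iterations.

```python
def weighted_band_error(band_errors, weights):
    """Combine per-band errors (e.g. pass, transition, stop) into one
    value using application-specific weights (assumed weighted mean)."""
    total = sum(weights[band] for band in band_errors)
    return sum(weights[band] * band_errors[band] for band in band_errors) / total

def run_until_converged(step, min_improvement, window, max_iterations):
    """Call step() once per iteration; step() returns the fittest member's
    error. Stop when the error has improved by less than min_improvement
    over the last `window` iterations, or at max_iterations."""
    history = []
    while len(history) < max_iterations:
        history.append(step())
        if (len(history) > window
                and history[-window - 1] - history[-1] < min_improvement):
            break
    return history

def make_demo_step():
    """Synthetic stand-in for one GA iteration: error decays as 1/i."""
    state = {"i": 0}
    def step():
        state["i"] += 1
        return 1.0 / state["i"]
    return step

errors = {"pass": 0.1, "transition": 0.4, "stop": 0.1}
weights = {"pass": 1.0, "transition": 4.0, "stop": 1.0}   # emphasize transition band
print(weighted_band_error(errors, weights))

history = run_until_converged(make_demo_step(), min_improvement=0.01,
                              window=5, max_iterations=1000)
print(len(history), history[-1])
```

Comparing errors across a window rather than between consecutive iterations is one way to avoid stopping on a short plateau, since the fittest member's error in a GA often stays flat for several generations before improving again.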
8. CONCLUSIONS
Our work saw the successful development of a genetic algorithm with several unique, application-oriented attributes, which is capable of optimizing filter coefficients such that the corresponding filter frequency response matches that of an ideal system with the constraint that all coefficients are powers-of-two and the resulting filter is stable. It has been shown how the genetically optimized multiplierless filters consistently yield image results comparable to those of their ideal counterparts. In many cases, these multiplierless systems have a definite advantage in efficiency while maintaining desired responses. In every case, we have demonstrated how the multiplierless approach allows
for substantial reductions in hardware cost and computational intensity. We may therefore conclude that multiplierless-based image filtering is a viable design alternative, which is reliably implemented through the use of genetic algorithms.
REFERENCES
Goldberg, D. E., 1989, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Boston.
Mitchell, M., 1996, An Introduction to Genetic Algorithms, M.I.T. Press, Cambridge, MA.
Parker, J. R., 1997, Algorithms For Image Processing And Computer Vision, John Wiley and Sons Inc., New York.
Proakis, J. G. and Manolakis, D. G., 1996, Digital Signal Processing: Principles, Algorithms, and Applications, Third Edition, Prentice Hall Inc., Upper Saddle River, New Jersey.
Sarkar, S. and Boyer, K. L., 1989, On Optimal Infinite Impulse Response Edge Detection Filters, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, No. 11, pp. 1154-1164.
Seppänen, J., 1999, Audio Signal Processing Basics, Introductory Lectures, Audio Research Group, Tampere University of Technology, Web Resource http://www.cs.tut.fi/sgn/arg/intro/basics.html accessed March 3, 2003.
