
Function Minimization

By Andrew McLennan
Supervisor: Lorenzo Moneta

Index
Introduction to Minimization
Project Description
Minimization for Histograms
Examining Areas of Convergence
Introduction of extra noise
Pure Minimization
Next Step
Conclusion
Appendix 1
Appendix 2a
Appendix 2b
Appendix 3
Appendix 4
Appendix 5a
Appendix 5b
Appendix 6


I would like to take this opportunity to thank everyone who has helped me to get this far at CERN. Special thanks go to my supervisor Lorenzo Moneta for his teaching and direction during this project; to Andras Zsenei for his consistent help in the office; to everyone in my department for their support during this summer; and to my university tutors Geraint Jones and Andy Wathen for their teaching and references. Thanks also to all my friends here at CERN who have made my time here so enjoyable and taught me so many new things about their cultures, especially Jan, Nino, Dana, David, Carlos, John, Stian, Diana, Florian, Laura, Lucia, Cristina, Aoife and Laura, whom I hope to visit soon. Finally, I would like to thank my friends and family back in England for always being there for me.

10/3/2005

Andrew McLennan University of Oxford

Page 2 of 13

Introduction to Minimization
A large class of problems in many different fields of research can be reduced to the problem of finding the smallest value taken on by a function of one or more variable parameters. For example, the minimum of f(x) = (x - 3)^2 is zero and is obtained at x = 3.

The classic example of minimization which occurs so often in scientific research here at CERN, however, is the estimation of unknown parameters in a theory by minimizing the difference between theory and experimental data.
Before we can tackle the minimization problem, we have to state what assumptions we are allowed to make. It is
assumed that the function F(x) is not actually known analytically but is only defined by the value it takes at a given
position. It is also assumed that we are allowed to specify a range which the parameters are allowed to take. Any
additional information we wish to provide such as the numerical values of the derivatives dF/dx at any point should be
given when available, but in general should not be assumed.
The function is evaluated repeatedly at different points until its minimum is found, and the method which finds the minimum (within a given tolerance) after the fewest evaluations is generally considered to be the best.*
There are several difficult situations which the minimization methods have to overcome. These include:
- finding the global minimum as opposed to a local minimum;
- finding the minimum point located somewhere within a large plateau;
- finding the minimum point without having to check every point in the allowable range.
We will test our methods to check whether the above situations are handled correctly.
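The black-box setting described above can be illustrated with a minimal one-dimensional example (a sketch in Python, rather than the ROOT/C++ macros used in the project): golden-section search locates the minimum of f(x) = (x - 3)^2 using only function evaluations, never the formula or its derivative.

```python
import math

def golden_section_min(f, a, b, tol=1e-8):
    """Minimize a black-box unimodal f on [a, b] using only function evaluations."""
    gr = (math.sqrt(5.0) - 1.0) / 2.0          # inverse golden ratio, ~0.618
    c, d = b - gr * (b - a), a + gr * (b - a)
    n = 0                                       # number of bracketing iterations
    while abs(b - a) > tol:
        if f(c) < f(d):                         # minimum lies in [a, d]
            b, d = d, c
            c = b - gr * (b - a)
        else:                                   # minimum lies in [c, b]
            a, c = c, d
            d = a + gr * (b - a)
        n += 1
    return (a + b) / 2.0, n

x_min, n = golden_section_min(lambda x: (x - 3.0) ** 2, 0.0, 10.0)
print(x_min, n)   # x_min is ~3, found without ever inspecting the formula
```

Each iteration shrinks the bracketing interval by a constant factor, so the cost is a predictable number of evaluations, which is exactly the quantity by which methods are compared above.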

Project Description
The goal of my project here at CERN is to examine how much initial information we need to supply in order for the minimization methods to converge to the correct result, as well as to compare the minimization methods Minuit, GMinuit and Fumili. Minuit comes from the class TMinuit, and GMinuit is the name of the package for the new C++ version of Minuit. Both Minuit and GMinuit are general minimization methods based on Fletcher's unified approach to Variable Metric Methods (VMM), which combines rank-one and rank-two formulas to deal with different types of minimization problems.** Fumili comes from the class TFumili and is a specialized minimization technique based on chi-squared minimization, designed to work very quickly for certain functions provided we supply good starting information.
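The chi-squared quantity that Fumili minimizes can be written down directly. The sketch below (plain Python, with a made-up Gaussian-plus-linear model and parameters, not Fumili's actual implementation) shows the objective that such a fit drives toward its minimum:

```python
import math

def chi_square(bin_centers, counts, model, params):
    # Sum over bins of (data - model)^2 / sigma^2, with sigma^2 ~ n for Poisson counts.
    chi2 = 0.0
    for x, n in zip(bin_centers, counts):
        if n > 0:
            r = n - model(x, params)
            chi2 += r * r / n
    return chi2

def gauss_plus_line(x, p):
    # p = (amplitude, mean, sigma, intercept, slope) -- an illustrative parametrization.
    amp, mu, sig, a, b = p
    return amp * math.exp(-0.5 * ((x - mu) / sig) ** 2) + a + b * x

centers = [float(i) for i in range(10)]
true_p = (50.0, 4.0, 1.5, 5.0, 0.2)
data = [gauss_plus_line(x, true_p) for x in centers]   # noiseless toy "counts"
print(chi_square(centers, data, gauss_plus_line, true_p))                      # 0 at the truth
print(chi_square(centers, data, gauss_plus_line, (50.0, 5.0, 1.5, 5.0, 0.2)))  # larger elsewhere
```

A minimizer varies `params` to push this sum down; specialized methods like Fumili exploit the specific quadratic structure of chi-squared, which is why good starting information pays off so much for them.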
My aim has been to produce software tools to systematically analyze the length of time each method takes to run, as well as graphical displays of the regions in which each method converges. For backwards compatibility it is important that both Minuit and GMinuit work when the same initial conditions apply, but ideally GMinuit should work faster in general and over a larger range of input parameters. It is also important to ensure that no islands are formed within the range of input parameters.***
Minimization within ROOT**** can be used in several ways:
- to produce a line of best fit on data within a histogram;
- for pure minimization of a function of our choosing.
Given a histogram, the production of a line of best fit is very useful. Minimization methods are designed to take this data and use it to efficiently produce a function which best matches the data points. This is a difficult mathematical problem, which is why many different methods exist. Pure function minimization is what lies behind the production of a line of best fit and hence should be tested also. I therefore restricted my investigation to these two cases, using non-trivial functions to get the most out of the methods.
Initially I didn't know what results I would produce, and so a lot of experimenting was required. This allowed me to become very familiar with the methods I was examining and helped me to evolve and tailor my programs at each stage, so that I could produce interesting and useful results to aid in the fixing of bugs within GMinuit.
*
Occasionally other considerations may be important, such as the amount of storage required by the method or the amount of computation required to implement it, but normally the dominating factor will be the time spent evaluating the function.
**
A general minimization method is one which can handle all functions. More details about these and many more minimization methods can
be found on the Minuit website under Documentation. http://www.cern.ch/minuit
***
An island is a region of input parameters of non-convergence for the method, located inside another region of convergence.
****
ROOT was produced at CERN for use in analyzing the data from high energy physics. See http://root.cern.ch for more details.


Minimization for histograms


The first example of minimization I considered was the fitting of a histogram in order to produce a line of best fit. I
constructed a function consisting of many Gaussian peaks together with some linear background noise and used this to
populate a one-dimensional histogram.* The function ranged over 0-1000 with the peaks located at random places. This
type of histogram is regularly seen here at CERN and so is a good place to start my testing.
As the histogram was filled using randomly chosen points, I decided that the best way to test the pure efficiency of the
fitting algorithms would be to pre-locate each of the peaks using a TSpectrum object. This information was then passed
to each method as their initial search parameters and the length of time each one took to converge was recorded.
The output of this program can be seen in Appendix 1 and Table 1 below contains the CPU running times of each
method.
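The histogram construction and peak pre-location described above can be sketched in a few lines (illustrative Python, not the ROOT code; the peak positions, widths and thresholds here are made up, and the crude local-maximum search only stands in for ROOT's TSpectrum):

```python
import random

def sample_point(rng):
    # Two Gaussian peaks on a flat background -- an illustrative stand-in for the
    # report's many-peak-plus-linear-noise function (positions and widths invented here).
    u = rng.random()
    if u < 0.4:
        return rng.gauss(350.0, 10.0)
    if u < 0.8:
        return rng.gauss(750.0, 10.0)
    return rng.uniform(0.0, 1000.0)

rng = random.Random(1)
hist = [0] * 100                       # 100 bins of width 10 over the range 0-1000
for _ in range(10000):
    x = sample_point(rng)
    if 0.0 <= x < 1000.0:
        hist[int(x // 10)] += 1

# Crude peak pre-location in the spirit of TSpectrum: local maxima well above background.
peaks = [i * 10 + 5 for i in range(1, 99)
         if hist[i] > hist[i - 1] and hist[i] >= hist[i + 1] and hist[i] > 200]
print(peaks)
```

Passing such pre-located peak positions to the fitters as starting parameters is what lets the comparison measure the fitting algorithms themselves rather than the quality of the initial guess.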

Table 1: CPU running times

Method | CPU time
TMinuit | 0.64 s
Fumili | 0.15 s
GMinuit (new interface) | 0.47 s
GMinuit (interface as TMinuit) | 0.66 s

As can be seen from the output, all three methods produced a good fit to the data, but more importantly each one produced the same fit, implying that this fit is actually the best fit available.
Examining the lengths of time each method took to run reveals something interesting. As suspected, Fumili works by far the fastest. This is because we supplied good initial information and Fumili is a specialized method for problems such as this, while Minuit and GMinuit are general methods. GMinuit ran faster than TMinuit when used in combination with an optimized fitter interface in the ROOT framework, exploiting the object-oriented features of the new C++ version of Minuit. When the same interface as in TMinuit is used, comparable running times are obtained.
Further tests are now needed to fully examine the differences between the running times and ranges of convergence of the two methods before Minuit can be retired in favor of the new GMinuit.

Examining areas of convergence


From the above program and the general requirements of each minimization method, we can expect that when good information is provided, all methods will converge to the same, correct results. If, however, the information supplied is not good, then it is not known whether the methods will fit correctly or not. To test the methods when different initial information is supplied, I produced a program to systematically cycle through the input parameters in order to see at which points each method works and at which it fails.
To begin with I decided to start with a simpler example than the one above, using only two peaks located at 8.0 and 11.0 on the range 0-20. The program was created in such a way that changing the method of minimization involved only changing the name within TVirtualFitter's SetDefaultFitter() method. For each iteration, my program set the initial suggested location of the means of the two Gaussian peaks before it performed the fit. It cycled through each of the suggested means from (-5,-5) to (25,25), going up in integer intervals. The real means were located at 8.0 and 11.0, and so I expected that, at least around these regions, all three methods should work.
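This kind of grid scan can be mimicked with a toy minimizer whose outcome depends on the starting point (a Python sketch, not the actual fitter loop): plain gradient descent on a double-well function has four minima, and scanning starting points over a grid maps out the basin of each one, just as the program maps regions of convergence.

```python
def grad_descent(start, grad, lr=0.05, steps=500):
    """Plain gradient descent -- a stand-in minimizer whose result depends on the start."""
    x, y = start
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

# A double well in each coordinate: f = (x^2-1)^2 + (y^2-1)^2, minima at (+-1, +-1).
grad = lambda x, y: (4.0 * x * (x * x - 1.0), 4.0 * y * (y * y - 1.0))

# Scan starting points on a grid from (-2,-2) to (2,2) and record which ones reach
# the minimum at (1,1) -- the analogue of the report's convergence maps.
converged_to_target = {}
for i in range(-20, 21):
    for j in range(-20, 21):
        x0, y0 = i / 10.0, j / 10.0
        xf, yf = grad_descent((x0, y0), grad)
        converged_to_target[(x0, y0)] = abs(xf - 1.0) < 1e-3 and abs(yf - 1.0) < 1e-3
print(sum(converged_to_target.values()), len(converged_to_target))
```

Even for this simple toy, the map splits into clean basins of attraction; the interesting finding in the report is that the real fitters produce much less regular maps, with islands and bands.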
Together with identifying where each method converged correctly, I also produced two histograms of the length of time each method took for a given input parameter: one for when the fit was good and the other for when the method didn't fit the histogram correctly. I chose to display the reciprocals of the actual times in order to be able to see the places of very fast convergence more easily.
The results of this program for Minuit and GMinuit provided me with very useful information. As can be seen clearly in the two pictures below, Minuit converged to the correct results in far more places than GMinuit did. This suggested there was a major bug which needed to be fixed, otherwise GMinuit wouldn't be fully backwards compatible with the older version. Another worrying result was the existence of islands within areas of convergence, and of bands of convergence and non-convergence. This wasn't expected, especially for such a simple example with parameters so close to the actual minimum.
Looking at the lengths of time each method took to converge, we find that for GMinuit we almost always knew within 2 seconds whether the method had converged correctly or not. Minuit does in some cases work faster than GMinuit, which is a problem, but generally there are more places where Minuit converges much more slowly.
*
To populate the histogram I allowed the computer to randomly choose points from this function and used these values to fill the histogram.

By looking more closely at the points where Minuit performs better than GMinuit, it was possible to track down what was causing some of these problems in the code and hence fix them. Running the program again using the fixed version of GMinuit produced results far closer to those of Minuit, as can be seen below and in Appendix 2b.

Minuit

Old GMinuit

Fixed GMinuit

Islands and bands still exist, but solving this will require a much more in-depth look at the paths the algorithm takes during its execution and the reasons behind them. The full output results of my program for Minuit and GMinuit can be found in Appendix 2a. More research into this problem was therefore definitely needed.
Fumili, on the other hand, converged correctly in a far smaller region than either Minuit or GMinuit, but for this function at least no islands or bands existed. The more interesting result, however, is that when Fumili does fit correctly, its running time is far less than either of the other two algorithms; but when bad initial information is supplied, the running time can be exceptionally large. Hence, as suspected, provided we supply good information Fumili will work very well for functions such as this, but if we don't have adequate information then a general minimization method will usually work faster. The full output results for Fumili can be found in Appendix 3.

Introduction of extra noise


To further my investigation into these methods, I decided that generalizing my code would allow me much more freedom in the future when changing the function to be examined. I increased the range of the function to 0-1000, with the peaks now located at 350.0 and 750.0 respectively. I also realized that the programs were currently taking much too long to run using the standard ROOT interface, and hence I adapted the code to produce compiled code which was, in turn, more stable.
The main step I took, however, was to have the comparison of Minuit and GMinuit both located on the same results output. I decided this would make it easier to spot where the methods differed and hence help my investigation. Locations which are coloured black indicate parameters where neither Minuit nor GMinuit converged correctly; blue represents areas where both converged correctly; green, areas where GMinuit outperformed Minuit; and red, where GMinuit failed but the original Minuit worked.
Another feature I added to my program was the ability to add some extra random noise to the histogram. By being able to select and compare different percentage levels of noise, I was able to compare the methods under more realistic experimental conditions. My final adaptation was to have two histograms displaying the frequency of convergence at different distances away from the actual means. These would show that as the initial parameters move away from the actual means, it becomes less likely that either method converges.


The above picture represents a section of the output from the updated program. As can be seen, when no extra noise is introduced, and even when using the version of GMinuit we had previously fixed, there are still several regions and bands where the method fails.
When we run the program with different levels of extra noise, we get some very interesting results. Not only are the regions of convergence reduced as the amount of noise is increased, but points where the methods previously didn't converge correctly now, in some cases, do converge. Bands of non-convergence still exist for this function even as the level of noise is increased, but the position of the bands actually changes. For 5% extra noise we see that GMinuit performs with greater uniformity than Minuit, as fewer islands are created. Finally, we notice, for this example at least, that as the amount of extra random noise is increased, GMinuit starts to outperform Minuit. Table 2 shows the actual number of points where the two methods converged.

Table 2: Number of converging points

Noise | Minuit | GMinuit
0% | 847 | 814
2.5% | 805 | 806
5.0% | 165 | 176

Hence, not only does the initial information we supply have to become more accurate as the noise level increases, but it seems GMinuit was designed to work well as the noise level increases. It would be interesting to carry out further investigations with different levels and types of noise, to see whether GMinuit really does work better as noise increases. The full output results for the three different noise levels can be found in Appendix 4.
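The "percentage extra noise" step can be sketched in a few lines (an illustrative Python reading of the idea -- the report does not specify its exact noise model, so the bin-by-bin fractional perturbation below is an assumption):

```python
import random

def add_noise(hist, level, rng):
    """Perturb each bin by up to +/- `level` (fractional) of its content -- one plausible
    reading of 'percentage extra noise'; the report's real definition may differ."""
    return [max(0, round(n * (1.0 + rng.uniform(-level, level)))) for n in hist]

rng = random.Random(42)
hist = [100] * 10                      # a flat toy histogram
noisy = add_noise(hist, 0.05, rng)     # 5% extra noise
print(noisy)
```

Re-running the same convergence scan on the noisy histogram is then a drop-in change, which is what makes the noise-level comparison in Table 2 possible.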

Pure Minimization
The final part of my project was to move away from using minimization for fitting histograms and to test the minimization methods of Minuit and GMinuit directly. I did this by again adapting my program to deal with more complicated two-dimensional functions, such as Rosenbrock's Curved Valley problem*, with the aim of being able to change the function under investigation easily. I designed the program so that it was easy to change the viewing range of the function, the amount of detail to print out, the position on which to concentrate the investigation, and also whether I wanted just the basic information about this position or the extra graphical outputs. All these parameters can be set at runtime, allowing the user to view general information around the actual minimum and then target specific regions or points without having to go and change any of the underlying code.
Running the program without setting any of the input parameters produced the output results in Appendix 5a. As can be seen, there are several places where GMinuit should work but doesn't. Some of these places are actually quite close to the actual minimum, and hence even though we had fixed one problem in the code, others must exist. Again, the bottom two histograms represent the number of correctly and incorrectly minimized positions at different distances away from the actual minimum. The second output screen shows the proportional number of correctly and incorrectly minimized points with respect to the total number of points available at that distance. It can clearly be seen that as the supplied information diverges from the actual minimum, the number of points which converge correctly tends to zero. This works quite well for Minuit, but GMinuit again struggles.
Using this information and the code, it was possible to go into detail for the points where GMinuit failed and fix two more problems, making a total of three fixes. Running the same program again, but now using this fixed version of GMinuit, results in the output in Appendix 5b. It can now clearly be seen that GMinuit works well for this function, and that it even outperforms Minuit in several places. This encouraging result shows that my investigation was going in the correct direction at least, and hopefully it should be possible to find all the problems causing the discrepancies seen.

* More information about Rosenbrock's Curved Valley problem can be found in Appendix 5a.


Next step
To determine whether all the differences between Minuit and GMinuit had been found, I tested one further function. The Goldstein and Price function with four minima* is another difficult problem for minimization methods, due to it having more than one local minimum. My program was again adapted, allowing me to specify more exactly how much information I would like outputted, as well as having an extra indicator for the case when Minuit and GMinuit both failed but converged to different values. I felt that this was an important case to consider, as it shows that the methods must have taken different routes to reach their final results. Upon running my program for this function, I expected there to be only a few places where the fit failed. Unfortunately, however, it was obvious there are still major problems with GMinuit, due to the large quantity of red markers dotted all over the input region. Not only did Minuit outperform GMinuit, but both methods left many islands and bands of non-convergence within areas of convergence, and vice versa. Also, looking at the second output results, the distance away from the actual minimum did not seem to affect the probability of whether a set of input parameters would converge.
To get the most out of my program, it should be run with input parameters. The following is an example:
root Project5GoldsteinAndPriceFunction.C+( range , extra , print , X , Y )
where
- range is the distance either side of our chosen point to be displayed,
- extra is whether we would like only the basic output or the full detailed program output,
- print decides how much information each method should display, going from 0 to 3,
- X & Y are the (x,y) coordinates about which my program should look.
The default execution parameters would effectively be equivalent to
root Project5GoldsteinAndPriceFunction.C+( 10 , 0 , 0 , 0 , -1.0 )

Conclusion
During this project I have been able to investigate the properties of different minimization strategies, to show graphically where each method works, and to compare the new and old versions of Minuit. Even though three fixes have resulted from my investigation, I have been able to show that more bugs must still be hidden within the code somewhere. The most important result I have found, however, is that both the new and old versions of Minuit fail in places where they were not expected to. It would be very interesting to look at the code in far more detail and follow the routes the methods take for these islands, compared to the surrounding areas, to find out the reason for the differences. Then it may be possible to produce more rigorous methods for finding minima and fits to histogrammed data.
Thus, to take this project further, I would initially look at fixing the problems which cause GMinuit to fail in so many places where Minuit actually worked for the Goldstein and Price function. This would be done in the same way as before: by looking at the locations which caused anomalies, seeing what results they were actually giving, and using this information together with stepping through the code until the problem is discovered.
Next I would adapt my program to higher dimensions, specifically four dimensions, as there exist several difficult test functions for which it would be interesting to see how GMinuit performs. As it is only really possible to display two dimensions effectively, I would want to be able to fix certain dimensions and vary others; it should therefore be possible to specify in the program's input parameters which dimensions are fixed and at which values. The information I would then get for these new test functions would either provide more points which need to be compared (so as to fix bugs within GMinuit) or increase our confidence in its algorithm. Finally, the last extension I would make would be to display both the function and its region of convergence on the same graph, as this would make the investigation easier to visualize: each comparison point would then be tied directly to an area of the function being minimized.
As this investigation was very open-ended, with no previous information about what results to expect, I had to produce my program so that it could evolve and develop as more information was discovered. This can be seen in the progression of the programs I produced. The final program is thus stable and provides a firm base from which to continue the investigation. In the end I decided to compare only versions of Minuit, leaving the Fumili investigation for another time; however, this wouldn't be hard to incorporate into my program if someone so wished. I felt that comparing the Minuit versions (which are general methods) to Fumili (which is a specialized method) would not actually produce comparable results.
*
More information about the Goldstein and Price function with four minima can be found in Appendix 6.


Appendix 1
The output of my program for comparing Minuit, Fumili and GMinuit when good initial information is supplied. As can be seen, each method fits the function, but the running times differ greatly.
The horizontal axis contains the bins over the range of the function.
The vertical axis is the number of entries in each bin.


Appendix 2a

Minuit

Old GMinuit

Appendix 2b

New GMinuit


Appendix 3

Fumili


Appendix 4
[Figure: results of minimization for 0%, 2.5% and 5.0% extra noise. Key: neither converged / both converged / Minuit only / GMinuit only.]

Noise | Minuit | GMinuit
0% | 847 | 814
2.5% | 805 | 806
5.0% | 165 | 176

Appendix 5a
Rosenbrock's Curved Valley

F(x, y) = 100(y - x^2)^2 + (1 - x)^2

Minimum:
F(1.0, 1.0) = 0

Possible starting point:
F(-1.2, 1.0) = 24.2

This problem is probably the best-known test problem for minimization methods. It consists of a narrow parabolic valley with very steep sides. The floor of the valley follows approximately the parabola y = x^2 + 1/200, and stepping methods tend to perform at least as well as gradient methods for this function.
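Taking the standard form of Rosenbrock's function, F(x, y) = 100(y - x^2)^2 + (1 - x)^2, the quoted values are easy to check with a few lines of Python:

```python
def rosenbrock(x, y):
    # Standard Rosenbrock curved-valley test function.
    return 100.0 * (y - x * x) ** 2 + (1.0 - x) ** 2

print(rosenbrock(1.0, 1.0))    # 0.0 at the minimum
print(rosenbrock(-1.2, 1.0))   # ~24.2 at the conventional starting point
```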

Before the two problems were fixed

Appendix 5b

After the two problems were fixed


Appendix 6

Goldstein and Price function with four minima

F(x, y) = [1 + (x + y + 1)^2 (19 - 14x + 3x^2 - 14y + 6xy + 3y^2)]
          x [30 + (2x - 3y)^2 (18 - 32x + 12x^2 + 48y - 36xy + 27y^2)]

Local minima:
F(1.2, 0.8) = 840
F(1.8, 0.2) = 84
F(-0.6, -0.4) = 30

Minimum:
F(0.0, -1.0) = 3

Possible starting point:
F(-0.4, -0.6) = 35

This is another standard test function for minimization methods. It is an eighth-order polynomial in two variables which is well behaved near each minimum, but has four minima. An interesting place to start looking would be the above starting point, as it lies between the two lowest minima.
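Taking the standard form of the Goldstein-Price function, the five quoted values can be verified directly (a quick Python check):

```python
def goldstein_price(x, y):
    # Standard Goldstein-Price test function: a product of two quadratic-in-quadratic factors.
    a = 1 + (x + y + 1) ** 2 * (19 - 14 * x + 3 * x * x - 14 * y + 6 * x * y + 3 * y * y)
    b = 30 + (2 * x - 3 * y) ** 2 * (18 - 32 * x + 12 * x * x + 48 * y - 36 * x * y + 27 * y * y)
    return a * b

for (x, y), v in [((1.2, 0.8), 840), ((1.8, 0.2), 84), ((-0.6, -0.4), 30),
                  ((0.0, -1.0), 3), ((-0.4, -0.6), 35)]:
    print((x, y), goldstein_price(x, y))   # matches the quoted value, up to rounding
```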

