By Andrew McLennan
Supervisor: Lorenzo Moneta
Index
Introduction to Minimization
Project Description
Minimization for Histograms
Examining Areas of Convergence
Introduction of extra noise
Pure Minimization
Next Step
Conclusion
Appendix 1
Appendix 2a
Appendix 2b
Appendix 3
Appendix 4
Appendix 5a
Appendix 5b
Appendix 6
I would like to take this opportunity to thank everyone who has helped me to get this far at CERN. Special thanks go to
my supervisor Lorenzo Moneta for his teaching and direction during this project. To Andras Zsenei for his consistent
help in the office. To everyone in my department for their support during this summer. To my university tutors Geraint
Jones and Andy Wathen for their teaching and references. To all my friends here at CERN who have made my time
here so enjoyable and taught me so many new things about their cultures. Especially Jan, Nino, Dana, David, Carlos,
John, Stian, Diana, Florian, Laura, Lucia, Cristina, Aoife and Laura who I will hopefully go and visit soon. And finally, I
would like to thank my friends and family back in England for always being there for me.
10/3/2005
Page 2 of 13
Introduction to Minimization
A large class of problems in many different fields of research can be reduced to the problem of finding the smallest value
taken on by a function of one or more variable parameters. For example, the minimum of $f(x) = (x - 3)^2$ is zero and is
obtained at x = 3.
The classic example of minimization which occurs so often in scientific research here at CERN, however, is the
estimation of unknown parameters in a theory, by minimizing the difference between theory and experimental data.
Before we can tackle the minimization problem, we have to state what assumptions we are allowed to make. It is
assumed that the function F(x) is not actually known analytically but is only defined by the value it takes at a given
position. It is also assumed that we are allowed to specify a range which the parameters are allowed to take. Any
additional information we wish to provide such as the numerical values of the derivatives dF/dx at any point should be
given when available, but in general should not be assumed.
The function is evaluated repeatedly at different points until its minimum is found, and the method which finds
the minimum (within a given tolerance) after the fewest evaluations is generally considered to be the best.*
There are several difficult situations which the minimization methods have to overcome. These include:
Finding the global minimum as opposed to a local minimum
Finding the minimum point located somewhere within a large plateau
Finding the minimum point without having to check every point in the allowable range
We will test our methods to check whether the above situations are handled correctly.
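The assumptions above (a function known only through its values, a user-supplied range, and a cost measured in function evaluations) can be illustrated with a minimal sketch. It uses golden-section search, which is not one of the methods studied in this report, on a quadratic with its minimum at x = 3. The function name and the tolerance are hypothetical choices made for illustration only.

```python
import math

def golden_section_minimize(f, lo, hi, tol=1e-6):
    """Minimize a unimodal f on [lo, hi] using only function values.

    Returns (x_min, f_min, n_evaluations).
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio, about 0.618
    evaluations = 0

    def evaluate(x):
        nonlocal evaluations
        evaluations += 1
        return f(x)

    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = evaluate(c), evaluate(d)
    while (b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new upper interior point.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = evaluate(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new lower interior point.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = evaluate(d)
    x_min = 0.5 * (a + b)
    return x_min, evaluate(x_min), evaluations

# The function is treated as a black box; only the range [0, 10] is supplied.
x_min, f_min, n_calls = golden_section_minimize(lambda x: (x - 3.0) ** 2, 0.0, 10.0)
```

The evaluation counter is the quantity by which competing methods would be ranked under the criterion stated above. Golden-section search only handles a single minimum inside the range, so it says nothing about the global-versus-local difficulty.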
Project Description
The goal of my project here at CERN is to examine how much information we need to supply for the minimization
methods to converge to the correct result, and to compare the minimization methods Minuit, GMinuit and Fumili.
Minuit comes from the class TMinuit and GMinuit is the name of the package for the new C++ version of Minuit.
Both Minuit and GMinuit are general minimization methods based on Fletcher's unified approach to Variable Metric
Methods (VMM), which combines rank-one and rank-two formulas to deal with different types of minimization
problems.** Fumili comes from the class TFumili and is a specialized minimization technique based on chi-squared
minimization. It is designed to work very quickly for certain functions provided we supply good starting information.
My aim has been to produce software tools that systematically measure the length of time each method takes to run and
that display graphically the regions in which each method converges. It is important for backwards compatibility that
both Minuit and GMinuit work when given the same initial conditions, but ideally GMinuit should work faster in general
and over a larger range of input parameters. It is also important to ensure that no islands are formed within the range of
input parameters.***
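One way to make the notion of an island concrete is to count enclosed groups of non-converging points on a grid of starting values. The sketch below is an illustration under stated assumptions, not the tool written for this project: the grid encoding ('#' for converged, '.' for not converged), the function name, and the use of 4-connectivity are all hypothetical choices.

```python
def count_islands(grid):
    """grid: list of equal-length strings, '#' = converged, '.' = not converged.

    Counts 4-connected groups of '.' cells that do not touch the edge of the grid,
    i.e. regions of non-convergence fully enclosed by the scanned region.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    islands = 0
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if grid[r0][c0] != "." or seen[r0][c0]:
                continue
            # Flood-fill this group of non-converged cells.
            stack = [(r0, c0)]
            seen[r0][c0] = True
            touches_edge = False
            while stack:
                r, c = stack.pop()
                if r in (0, n_rows - 1) or c in (0, n_cols - 1):
                    touches_edge = True
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < n_rows and 0 <= cc < n_cols
                            and grid[rr][cc] == "." and not seen[rr][cc]):
                        seen[rr][cc] = True
                        stack.append((rr, cc))
            if not touches_edge:
                islands += 1
    return islands

# Hypothetical convergence map: two enclosed groups and one group on the edge.
example = ["######",
           "#.##.#",
           "#.####",
           "####..",
           "######"]
n_islands = count_islands(example)
```

Groups that reach the edge of the scanned region are not counted, since they may simply be the outer limit of the region of convergence rather than an island inside it.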
Minimization within ROOT**** can be used in several ways:
To produce a line of best fit to the data within a histogram
For pure minimization of a function of our choosing
Given a histogram, producing a line of best fit is very useful. Minimization methods are designed to take the data
and use it to efficiently find the function which best matches the data points. This is a difficult mathematical problem,
which is why many different methods exist. Pure function minimization is what lies behind the production of a line of
best fit and hence should also be tested. I therefore restricted my investigation to these two cases, using nontrivial
functions to get the most out of the methods.
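The link between fitting and pure minimization can be made concrete with a small sketch of a chi-squared objective: the fit parameters are the variables, and the quantity handed to the minimizer is the weighted squared difference between the bin contents and the model. The Gaussian model, the parameter names and the treatment of empty bins are assumptions made for illustration; the report does not state them.

```python
import math

def gaussian_model(x, amplitude, mean, sigma):
    # Hypothetical fit function; any parametric model could be used instead.
    return amplitude * math.exp(-0.5 * ((x - mean) / sigma) ** 2)

def chi_squared(params, bin_centers, bin_contents):
    """Sum over bins of (observed - expected)^2 / observed, skipping empty bins."""
    amplitude, mean, sigma = params
    total = 0.0
    for x, n in zip(bin_centers, bin_contents):
        if n <= 0:
            continue  # assumption: empty bins carry no weight
        expected = gaussian_model(x, amplitude, mean, sigma)
        total += (n - expected) ** 2 / n  # variance of a bin taken as its content
    return total

# Hypothetical histogram whose contents follow the model exactly, so the true
# parameters give chi-squared = 0 and any other guess gives a larger value.
true_params = (100.0, 5.0, 1.5)
centers = [0.5 + i for i in range(10)]
contents = [gaussian_model(x, *true_params) for x in centers]

chi2_true = chi_squared(true_params, centers, contents)
chi2_shifted = chi_squared((100.0, 6.0, 1.5), centers, contents)
```

A minimizer only ever sees `chi_squared` as a function of `params`, which is why testing pure minimization also tests the machinery behind histogram fits.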
Initially I didn't know what results I would produce and so a lot of experimenting was required. This allowed me to
become very familiar with the methods I was examining and helped me to evolve and tailor my programs at each stage so
that I could produce interesting and useful results to aid in the fixing of bugs within GMinuit.
* Occasionally other considerations may be important, such as the amount of storage required by the method or the amount of computation
required to implement the method, but normally the dominating factor will be the time spent in evaluating the function.
** A general minimization method is one which can handle all functions. More details about these and many more minimization methods can
be found on the Minuit website under Documentation. http://www.cern.ch/minuit
*** An island is a region of input parameters of non-convergence for the method, located inside another region of convergence.
**** ROOT was produced at CERN for use in analyzing the data from high energy physics. See http://root.cern.ch for more details.
Table 1: CPU Running Times
Method                            CPU time
TMinuit                           0.64 s
Fumili                            0.15 s
GMinuit (new interface)           0.47 s
GMinuit (interface as TMinuit)    0.66 s
As can be seen from the output, all three methods produced a good fit to the data but more importantly each one
produced the same fit to the data, implying that this fit is actually the best fit available.
Examining the lengths of time each method took to run reveals something interesting. As suspected, Fumili works by far
the fastest. This is because we supplied good initial information and Fumili is a specialized method for problems such as
this, while Minuit and GMinuit are general methods. GMinuit ran faster than TMinuit when used in combination with an
optimized fitter interface in the ROOT framework, exploiting object-oriented features of the new C++ version of Minuit.
When the same interface as in TMinuit is used, comparable running times are obtained.
Further tests on the range of convergence are now needed to fully examine the differences between the running times and
range of convergence of the two methods before Minuit can be retired in favor of the new GMinuit.
To populate the histogram I allowed the computer to randomly choose points from this function and used these values
to fill the histogram.
Andrew McLennan, University of Oxford
By looking more closely at the points where Minuit performs better than GMinuit, it was possible to track down what
was causing some of these problems in the code and hence fix it. Running this program again using the fixed version of
GMinuit produced results which were far closer to those of Minuit, as can be seen below and in Appendix 2b.
[Figure panels: Minuit, Old GMinuit, Fixed GMinuit]
Islands and bands still exist, but removing them will require a much more in-depth look at the paths the algorithm
takes during its execution and the reasons for them. The full output results of my program for Minuit and GMinuit can be
found in Appendix 2a. More research into this problem was therefore definitely needed.
Fumili, on the other hand, converged correctly in a far smaller region than either Minuit or GMinuit, but for this function
at least no islands or bands existed. The more interesting result, however, is that when Fumili does fit correctly the running
time of the method is far less than that of either of the other two algorithms, but when bad initial information is supplied
the running time can be exceptionally large. Hence, as suspected, provided we supply good information to Fumili it will
work very well for functions such as this, but if we don't have adequate information then a general minimization method
will usually work faster. The full output results for Fumili can be found in Appendix 3.
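The idea of mapping a region of convergence can be sketched as follows: start the minimizer from every point of a grid of initial values and record whether it reaches the known correct minimum. This is an illustration under stated assumptions only. A naive fixed-step gradient descent stands in for Minuit, GMinuit and Fumili, and the test function, step size, grid and tolerance are all hypothetical.

```python
def gradient(x, y):
    # Analytic gradient of the hypothetical test function
    # f(x, y) = (x^2 - 1)^2 + y^2 + 0.3 x, which has a global minimum near
    # x = -1.04 and a higher local minimum near x = +0.96.
    return 4.0 * x * (x * x - 1.0) + 0.3, 2.0 * y

def descend(x, y, step=0.02, n_steps=1000):
    """Fixed-step gradient descent; a deliberately naive stand-in for a real minimizer."""
    for _ in range(n_steps):
        gx, gy = gradient(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y

def convergence_map(starts_x, starts_y, target, tol=1e-3):
    """Return one string per y value: '#' where the start reached target, '.' otherwise."""
    rows = []
    for y0 in starts_y:
        row = ""
        for x0 in starts_x:
            x, y = descend(x0, y0)
            hit = abs(x - target[0]) < tol and abs(y - target[1]) < tol
            row += "#" if hit else "."
        rows.append(row)
    return rows

global_minimum = descend(-1.0, 0.0)        # reference answer, found from a good start
grid = [-2.0 + 0.5 * i for i in range(9)]  # starting values from -2 to 2
region = convergence_map(grid, grid, global_minimum)
```

For this simple function the map splits cleanly into a converging half and a half that falls into the local minimum. The islands and bands described in this report arise with real minimizers on harder problems and do not appear in this sketch.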
The above picture represents a section of the output from the updated program. As can be seen, when no extra noise is
introduced and even when using the version of GMinuit we had previously fixed, there are still several regions and bands
where the method fails.
When we next run the program with different levels of extra noise, we get some very interesting results. Not only are the
regions of convergence for the methods reduced as the amount of noise is increased, but points where the method
previously didn't converge correctly now in some cases do converge. Bands of non-convergence still exist for this
function even as the level of noise is increased, but the position of the bands actually changes. For 5% extra noise we see
that GMinuit performs with greater uniformity than Minuit, as fewer islands are created. Finally, we notice, for this
example at least, that as the amount of extra random noise is increased, GMinuit starts to outperform Minuit. Table 2
shows the actual number of points where the two methods converge.
Table 2
Noise    Minuit   GMinuit
0%       847      814
2.5%     805      806
5.0%     165      176
Hence, not only must the initial information we supply be more accurate as the noise level increases, but it also seems
that GMinuit copes slightly better than Minuit as the noise level increases. It would be interesting to carry out further
investigations with different levels and types of noise to see whether GMinuit actually does work better as noise
increases. The full output results for the 3 different noise levels can be found in Appendix 4.
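The report does not specify how the extra noise was generated. Purely as an assumption, the sketch below perturbs each bin content by a uniform random fraction of up to the stated percentage, with a fixed seed so that a scan over starting points sees the same noisy histogram each time. The function name and the bin contents are hypothetical.

```python
import random

def add_relative_noise(bin_contents, percent, seed=0):
    """Perturb each bin by a uniform random fraction in [-percent, +percent] percent."""
    rng = random.Random(seed)  # fixed seed so that repeated scans are reproducible
    scale = percent / 100.0
    return [n * (1.0 + rng.uniform(-scale, scale)) for n in bin_contents]

clean = [12.0, 40.0, 95.0, 41.0, 10.0]  # hypothetical bin contents
noisy = add_relative_noise(clean, 5.0)
```

Other noise models (for example Gaussian or Poisson fluctuations) would be equally plausible readings of "extra noise", which is one reason further tests with different types of noise would be worthwhile.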
Pure Minimization
The final part of my project was to move away from using minimization for fitting histograms to testing the minimization
methods for Minuit and GMinuit directly. I did this by again adapting my program to deal with more complicated
two-dimensional functions such as Rosenbrock's Curved Valley problem*, with the aim of being able to change the
function under investigation easily. I designed the program so that it was easy to change the viewing range of the
function, the amount of detail to print out, the position about which to concentrate the investigation and also whether I
wanted just the basic information about this position or the extra graphical outputs. All these parameters can be set at
runtime, allowing the user to view general information around the actual minimum and then to target specific regions or
points without having to go and change any of the underlying code.
Running the program without setting any of the input parameters produced the output results in Appendix 5a. As can be
seen, there are several places where GMinuit should work but doesn't. Some of these places are actually quite close to
the actual minimum and hence, even though we have fixed one problem in the code, others must exist. Again the bottom
two histograms represent the number of correctly and incorrectly minimized positions at different distances from the
actual minimum. The second output screen shows the proportion of correctly and incorrectly minimized points with
respect to the total number of points available at that distance. It can be clearly seen that as the information you supply
moves further from the actual minimum, the number of points which will actually converge correctly tends to zero. Minuit
follows this pattern quite well, but GMinuit again struggles.
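The proportional histogram described above can be sketched as a simple binning of starting points by their distance from the true minimum. The function name, the bin width and the scan results below are hypothetical and only illustrate the bookkeeping, not the actual program output.

```python
import math

def converged_fraction_by_distance(results, minimum, bin_width, n_bins):
    """results: iterable of ((x0, y0), converged) pairs.

    Returns one fraction per distance bin, or None for a bin with no points.
    """
    good = [0] * n_bins
    total = [0] * n_bins
    for (x0, y0), converged in results:
        distance = math.hypot(x0 - minimum[0], y0 - minimum[1])
        index = int(distance // bin_width)
        if index >= n_bins:
            continue  # start point lies beyond the last bin
        total[index] += 1
        if converged:
            good[index] += 1
    return [g / t if t else None for g, t in zip(good, total)]

# Hypothetical scan results around the minimum of Rosenbrock's function at (1, 1).
scan = [((1.0, 1.5), True), ((1.5, 1.0), True),
        ((2.5, 1.0), True), ((1.0, 2.5), False),
        ((1.0, 4.5), False)]
fractions = converged_fraction_by_distance(scan, (1.0, 1.0), bin_width=1.0, n_bins=3)
```

Dividing by the number of points in each bin matters because a square scan region contains more grid points at some distances than at others.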
Using this information and the code, it was possible to go into detail for the points where GMinuit failed and fix two more
problems, making a total of three fixes. Running the same program again but now using this fixed version of GMinuit
results in the output in Appendix 5b. It can now clearly be seen that GMinuit works well for this function and that it even
outperforms Minuit in several places. This encouraging result shows that my investigation was going in the correct
direction at least, and hopefully it should be possible to find all the problems causing the discrepancies seen.
* More information about Rosenbrock's Curved Valley problem can be found in Appendix 5a.
Next Step
To determine whether all the differences between Minuit and GMinuit had been found, I tested one further function.
Goldstein and Price's function with four minima* is another difficult problem for minimization methods because it has
more than one local minimum. I again adapted my program, allowing me to specify more exactly how much information
I would like to see in the output, and adding an extra indicator for the case when Minuit and GMinuit both failed but
failed to different values. I felt that this was also an important case to consider, as it shows that the two methods must
have taken different routes to reach their final results. Upon running my program for this function I expected there to be
only a few places where the methods failed. Unfortunately, however, it was obvious that there are still major problems
with GMinuit, due to the large quantity of red markers dotted all over the input region. Not only did Minuit outperform
GMinuit, but both methods left many islands and bands of non-convergence within areas of convergence and vice versa.
Also, looking at the second output results, the distance from the actual minimum did not seem to affect the probability
that a set of input parameters would converge.
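The categories used when comparing the two methods at one starting point, including the extra case where both fail but reach different values, can be sketched as a small classification function. The function name, the labels and the tolerance are hypothetical. The example values are the minima of Goldstein and Price's function, where 3 is the true minimum and 30 and 84 are local minima.

```python
def classify(minuit_result, gminuit_result, true_minimum, tol=1e-3):
    """Label one start point given the function value reached by each method."""
    minuit_ok = abs(minuit_result - true_minimum) < tol
    gminuit_ok = abs(gminuit_result - true_minimum) < tol
    if minuit_ok and gminuit_ok:
        return "both"
    if minuit_ok:
        return "Minuit only"
    if gminuit_ok:
        return "GMinuit only"
    if abs(minuit_result - gminuit_result) < tol:
        return "neither (same value)"
    # Both failed but ended in different places, so they took different routes.
    return "neither (different values)"

labels = [classify(3.0, 3.0, 3.0),
          classify(3.0, 30.0, 3.0),
          classify(84.0, 3.0, 3.0),
          classify(30.0, 30.0, 3.0),
          classify(30.0, 84.0, 3.0)]
```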
To get the most out of my program, it should be run with input parameters. The general form of the command is:
root Project5GoldsteinAndPriceFunction.C+( range , extra , print , X , Y ), where
range is the distance either side of our chosen point to be displayed,
extra is whether we would like only the basic output or the full detailed program output,
print decides how much information each method should display, going from 0 to 3,
X and Y are the (x,y) coordinates about which my program should look.
The default execution parameters would effectively be equivalent to
root Project5GoldsteinAndPriceFunction.C+( 10 , 0 , 0 , 0 , -1.0 )
Conclusion
During this project, I have been able to investigate the properties of different minimization strategies. I have been able to
show graphically where each method works and to compare the new and old versions of Minuit. Even though three fixes
have resulted from my investigation, I have been able to show that there must be more problems still hidden somewhere
within the code. The most important result I have found, however, is that both the new and old versions of Minuit fail in
places where they were not expected to. It would be very interesting to look at the code in far more detail and follow the
routes the methods take for these islands compared to the surrounding areas, to find out the reason for the differences.
Then it may be possible to produce more rigorous methods for finding minima and fits to histogrammed data.
Thus, to take this project further I would initially look at fixing the problems which are causing GMinuit to fail in so
many places where Minuit actually worked for Goldstein and Price's function. This would be done in the same way
as before, by looking at the locations which caused anomalies, seeing what results they were actually giving and
using this information together with stepping through the code until the problem is discovered.
Next I would adapt my program to higher dimensions, specifically 4 dimensions, as there exist several difficult test
functions for which it would be interesting to see how GMinuit performs. As it is only really possible to display 2
dimensions effectively, I would want it to be possible to fix certain dimensions and vary others. It should therefore be
possible to specify in the program's input parameters which dimensions are fixed and the values at which they are held.
The information I would then get for these new test functions would either provide more points which need to be
compared (so as to fix bugs within GMinuit) or increase our confidence in its algorithm. Finally, the
last extension I would make would be to display both the function and region of convergence on the same graph, as this
would make the investigation easier to visualize. Each comparison point would then be tied directly to an area of the
function being minimized.
As this investigation was very open ended, with no previous information about what results to expect, I had to write
my program so that it could evolve and develop as more information was discovered. This can be seen in the
progression of the programs I produced. The final program I produced is thus stable and provides a firm
base from which to continue the investigation. In the end I decided to compare only the versions of Minuit, leaving the
Fumili investigation for another time. However, this wouldn't be hard to incorporate into my program if someone so
wished. I felt that comparing the Minuit versions (which are general methods) to Fumili (which is a specialized method)
would not actually produce any comparable results.
* More information about Goldstein and Price's function with four minima can be found in Appendix 6.
Appendix 1
The output of my program for comparing Minuit, Fumili and GMinuit when good initial information is supplied. As can
be seen, each method fits the function but the running time of each method differs greatly.
The horizontal axis contains the bins for the range of the function.
The vertical axis is the number of elements for each bin.
Appendix 2a
Minuit
Old GMinuit
Appendix 2b
New GMinuit
Appendix 3
Fumili
Appendix 4
Results of minimization (legend): Neither, Both, Minuit only, GMinuit only
Figure panels: 0% noise, 2.5% noise, 5.0% noise
Noise    Minuit   GMinuit
0%       847      814
2.5%     805      806
5.0%     165      176
Appendix 5a
Rosenbrock's Curved Valley
$F(x, y) = 100\,(y - x^2)^2 + (1 - x)^2$
Minimum:
F(1.0, 1.0) = 0
Possible starting point:
F(-1.2, 1.0) = 24.2
This problem is probably the best known test problem for minimization methods. It consists of a narrow parabolic valley
with very steep sides. The floor of the valley follows approximately the parabola $y = x^2$.
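As a quick check on the quoted values, here is a minimal sketch of the function in code, assuming the standard form of Rosenbrock's function, F(x, y) = 100 (y - x^2)^2 + (1 - x)^2. The function name is a hypothetical choice.

```python
def rosenbrock(x, y):
    # Standard two-dimensional Rosenbrock function (assumed form).
    return 100.0 * (y - x * x) ** 2 + (1.0 - x) ** 2

value_at_minimum = rosenbrock(1.0, 1.0)   # the minimum quoted above
value_at_start = rosenbrock(-1.2, 1.0)    # the commonly used starting point

# On the valley floor (y = x^2) only the small (1 - x)^2 term remains, while a
# step of 1 in y away from the floor adds 100 to the function value.
on_floor = rosenbrock(0.5, 0.25)
off_floor = rosenbrock(0.5, 1.25)
```

The large difference between the last two values is what is meant by a narrow valley with very steep sides.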
Appendix 5b
Appendix 6
Goldstein and Price's Function with Four Minima
$F(x, y) = \left[1 + (x + y + 1)^2 \left(19 - 14x + 3x^2 - 14y + 6xy + 3y^2\right)\right] \left[30 + (2x - 3y)^2 \left(18 - 32x + 12x^2 + 48y - 36xy + 27y^2\right)\right]$
Local minima:
F(1.2, 0.8) = 840
F(1.8, 0.2) = 84
F(-0.6, -0.4) = 30
Minimum:
F(0, -1.0) = 3
Possible starting point:
F(-0.4, -0.6) = 35
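The values listed for this function can be checked with a short sketch, assuming the standard form of the Goldstein and Price function, a product of two bracketed polynomial factors. The function and variable names are hypothetical.

```python
def goldstein_price(x, y):
    # Standard Goldstein and Price function (assumed form).
    first = 1.0 + (x + y + 1.0) ** 2 * (
        19.0 - 14.0 * x + 3.0 * x * x - 14.0 * y + 6.0 * x * y + 3.0 * y * y)
    second = 30.0 + (2.0 * x - 3.0 * y) ** 2 * (
        18.0 - 32.0 * x + 12.0 * x * x + 48.0 * y - 36.0 * x * y + 27.0 * y * y)
    return first * second

# Points and values as listed in this appendix.
listed_points = {(1.2, 0.8): 840.0, (1.8, 0.2): 84.0, (-0.6, -0.4): 30.0,
                 (0.0, -1.0): 3.0, (-0.4, -0.6): 35.0}
max_error = max(abs(goldstein_price(x, y) - value)
                for (x, y), value in listed_points.items())
```

The three local minima are far higher than the global minimum of 3, so a method that stops at one of them is easy to distinguish from one that converged correctly.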