CSCE 478/878 Lecture 7:
Bagging and Boosting
Stephen Scott
(Adapted from Ethem Alpaydin and Rob Schapire and Yoav Freund)
sscott@cse.unl.edu
Introduction
Outline
Bagging
Boosting
Bagging
[Breiman, ML Journal, 1996]
Bagging Experiment
[Breiman, ML Journal, 1996]
Bagging Experiment
Results
Data Set        e_S    e_B    Decrease
waveform        29.0   19.4   33%
heart           10.0    5.3   47%
breast cancer    6.0    4.2   30%
ionosphere      11.2    8.6   23%
diabetes        23.4   18.8   20%
glass           32.0   24.9   22%
soybean         14.5   10.6   27%
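The Decrease column can be recomputed from the two error columns. The sketch below assumes that e_S and e_B are error rates in percent and that Decrease is the relative reduction (e_S - e_B) / e_S; the slide does not state the formula, and the helper name `relative_decrease` is made up for illustration.

```python
# Hypothetical helper: relative error reduction, in percent.
# Assumes Decrease = (e_S - e_B) / e_S, which the slide does not state explicitly.
def relative_decrease(e_s: float, e_b: float) -> float:
    return 100.0 * (e_s - e_b) / e_s

# (e_S, e_B) pairs copied from the results table.
errors = {
    "waveform": (29.0, 19.4),
    "heart": (10.0, 5.3),
    "breast cancer": (6.0, 4.2),
    "ionosphere": (11.2, 8.6),
    "diabetes": (23.4, 18.8),
    "glass": (32.0, 24.9),
    "soybean": (14.5, 10.6),
}

# Round to whole percentages, as in the table.
decreases = {name: round(relative_decrease(e_s, e_b))
             for name, (e_s, e_b) in errors.items()}

for name, dec in decreases.items():
    print(f"{name:14s} {dec}%")
```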
Bagging Experiment
(contd)
e_S    e_B    Decrease
26.1   26.1   0%
 6.3    6.3   0%
 4.9    4.9   0%
35.7   35.7   0%
16.4   16.4   0%
16.4   16.4   0%
Boosting
[Schapire & Freund Book]
Boosting
Algorithm Idea [$p_j = D_j$; $d_j = h_j$]
Repeat for $j = 1, \ldots, L$:
Boosting
Algorithm Pseudocode (Fig 17.2)
On each round, the weights of incorrectly classified examples are increased so that, effectively,
hard examples get successively higher weight, forcing the base learner to focus its attention
on them.
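The reweighting described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the pseudocode from the figure: the base learner is a one-dimensional decision stump, labels are +1/-1, and the function names (`fit_stump`, `adaboost`, `predict`) and the ten-point toy data set are made up for this example.

```python
import math

def stump_predict(x, thr, pol):
    """Decision stump on a scalar input: predict pol if x <= thr, else -pol."""
    return pol if x <= thr else -pol

def fit_stump(xs, ys, D):
    """Return (weighted error, thr, pol) of the stump with the lowest error under D."""
    best = (float("inf"), None, None)
    for thr in sorted(set(xs)):
        for pol in (1, -1):
            err = sum(d for x, y, d in zip(xs, ys, D)
                      if stump_predict(x, thr, pol) != y)
            if err < best[0]:
                best = (err, thr, pol)
    return best

def adaboost(xs, ys, T):
    """Run T boosting rounds; return a list of (alpha, thr, pol) triples."""
    m = len(xs)
    D = [1.0 / m] * m                        # start from uniform weights
    ensemble = []
    for _ in range(T):
        eps, thr, pol = fit_stump(xs, ys, D)
        eps = min(max(eps, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)
        # Misclassified examples are scaled by e^{alpha}, correct ones by e^{-alpha}.
        D = [d * math.exp(-alpha * y * stump_predict(x, thr, pol))
             for x, y, d in zip(xs, ys, D)]
        Z = sum(D)                           # normalization factor
        D = [d / Z for d in D]
        ensemble.append((alpha, thr, pol))
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, thr, pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# Illustrative toy data (not from the slides): no single stump separates it.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, T=3)
print([predict(ensemble, x) for x in xs] == ys)
```

On this toy set a single stump misclassifies at least three points, while three boosted stumps fit all ten, because each round shifts weight onto the points the previous stump got wrong.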
Boosting
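$$D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } h_t(x_i) = y_i \\ e^{\alpha_t} & \text{if } h_t(x_i) \neq y_i \end{cases} = \frac{D_t(i)\, \exp(-\alpha_t y_i h_t(x_i))}{Z_t}$$

$$H(x) = \operatorname{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right).$$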
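The usual choice of $\alpha_t$ can be motivated by a short calculation. This is the standard AdaBoost argument rather than something taken from the slide; $\epsilon_t$ is assumed to denote the weighted error of $h_t$ under $D_t$, and $Z_t$ the factor that normalizes the updated weights.

```latex
\begin{align*}
Z_t &= \sum_{i} D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}
     = (1 - \epsilon_t)\, e^{-\alpha_t} + \epsilon_t\, e^{\alpha_t}, \\
\frac{\partial Z_t}{\partial \alpha_t}
    &= -(1 - \epsilon_t)\, e^{-\alpha_t} + \epsilon_t\, e^{\alpha_t} = 0
     \;\Longrightarrow\;
     \alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}, \\
Z_t &= 2 \sqrt{\epsilon_t (1 - \epsilon_t)} \quad \text{at this } \alpha_t .
\end{align*}
```

For instance, $\epsilon_t = 0.30$ gives $\alpha_t \approx 0.42$ and $Z_t \approx 0.92$.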
Boosting
Schapire & Freund Example [$D_j = p_j$; $h_j = d_j$; $\alpha_j = \frac{1}{2}\ln(1/\beta_j) = \frac{1}{2}\ln\frac{1-\epsilon_j}{\epsilon_j}$]
[Figure 1.1 panels: $h_1$, $D_2$, $h_2$]
Boosting
Schapire & Freund Example [$D_j = p_j$; $h_j = d_j$; $\alpha_j = \frac{1}{2}\ln(1/\beta_j) = \frac{1}{2}\ln\frac{1-\epsilon_j}{\epsilon_j}$]
Table 1.1
The numerical calculations corresponding to the toy example in figure 1.1
i                                     1     2     3     4     5     6     7     8     9     10
$D_1(i)$                              0.10  0.10  0.10  0.10  0.10  0.10  0.10  0.10  0.10  0.10
$e^{-\alpha_1 y_i h_1(x_i)}$          1.53  1.53  1.53  0.65  0.65  0.65  0.65  0.65  0.65  0.65
$D_1(i) e^{-\alpha_1 y_i h_1(x_i)}$   0.15  0.15  0.15  0.07  0.07  0.07  0.07  0.07  0.07  0.07
$\epsilon_1 = 0.30$, $\alpha_1 \approx 0.42$, $Z_1 \approx 0.92$

$D_2(i)$                              0.17  0.17  0.17  0.07  0.07  0.07  0.07  0.07  0.07  0.07
$e^{-\alpha_2 y_i h_2(x_i)}$          0.52  0.52  0.52  0.52  0.52  1.91  1.91  0.52  1.91  0.52
$D_2(i) e^{-\alpha_2 y_i h_2(x_i)}$   0.09  0.09  0.09  0.04  0.04  0.14  0.14  0.04  0.14  0.04
$\epsilon_2 \approx 0.21$, $\alpha_2 \approx 0.65$, $Z_2 \approx 0.82$

$D_3(i)$                              0.11  0.11  0.11  0.05  0.05  0.17  0.17  0.05  0.17  0.05
$e^{-\alpha_3 y_i h_3(x_i)}$          0.40  0.40  0.40  2.52  2.52  0.40  0.40  2.52  0.40  0.40
$D_3(i) e^{-\alpha_3 y_i h_3(x_i)}$   0.04  0.04  0.04  0.11  0.11  0.07  0.07  0.11  0.07  0.02
$\epsilon_3 \approx 0.14$, $\alpha_3 \approx 0.92$, $Z_3 \approx 0.69$
Calculations are shown for the ten examples as numbered in the figure. Examples on which hypothesis ht makes
a mistake are indicated by underlined figures in the rows marked Dt .
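The per-round quantities in Table 1.1 can be recomputed from the mistake sets alone. The sketch below assumes the mistakes are examples {1, 2, 3}, {6, 7, 9}, and {4, 5, 8} in rounds 1 to 3, which is read off the table as the entries whose multiplier exceeds 1; the variable names are illustrative.

```python
import math

# Examples misclassified in rounds 1-3, inferred from Table 1.1
# (entries whose multiplier e^{-alpha y h} is greater than 1).
mistakes = [{1, 2, 3}, {6, 7, 9}, {4, 5, 8}]

m = 10
D = {i: 1.0 / m for i in range(1, m + 1)}    # D_1 is uniform
rounds = []                                  # (epsilon_t, alpha_t, Z_t) per round

for wrong in mistakes:
    eps = sum(D[i] for i in wrong)           # weighted error under D_t
    alpha = 0.5 * math.log((1 - eps) / eps)
    # Multiply by e^{alpha} on mistakes and e^{-alpha} on correct examples.
    unnorm = {i: D[i] * math.exp(alpha if i in wrong else -alpha) for i in D}
    Z = sum(unnorm.values())                 # normalization factor Z_t
    rounds.append((eps, alpha, Z))
    D = {i: w / Z for i, w in unnorm.items()}  # D_{t+1}

for t, (eps, alpha, Z) in enumerate(rounds, start=1):
    print(f"t={t}: eps={eps:.2f} alpha={alpha:.2f} Z={Z:.2f}")
# t=1: eps=0.30 alpha=0.42 Z=0.92
# t=2: eps=0.21 alpha=0.65 Z=0.82
# t=3: eps=0.14 alpha=0.92 Z=0.69
```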
Boosting
Schapire & Freund Example [$D_j = p_j$; $h_j = d_j$; $\alpha_j = \frac{1}{2}\ln(1/\beta_j) = \frac{1}{2}\ln\frac{1-\epsilon_j}{\epsilon_j}$]
[Figure 1.1 panels: $D_3$, $h_3$]
Figure 1.1
An illustration of how AdaBoost behaves on a tiny toy problem with m = 10 examples. Each row depicts one
round, for t = 1, 2, 3. The left box in each row represents the distribution $D_t$, with the size of each example
scaled in proportion to its weight under that distribution. Each box on the right shows the weak hypothesis $h_t$,
where darker shading indicates the region of the domain predicted to be positive. Examples that are misclassified
by $h_t$ have been circled.
Boosting
Example (contd)
$H_{\text{final}} = \operatorname{sign}(0.42\, h_1 + 0.65\, h_2 + 0.92\, h_3)$
Not in original hypothesis class!
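The behavior of this weighted vote can be checked by enumerating every combination of outputs from the three weak hypotheses. This is a small sketch that uses only the three weights from the example; the function name `h_final` is made up for illustration.

```python
from itertools import product

ALPHAS = (0.42, 0.65, 0.92)   # alpha_1, alpha_2, alpha_3 from the example

def h_final(votes):
    """Sign of the alpha-weighted sum of weak-hypothesis outputs (each +1 or -1)."""
    score = sum(a * v for a, v in zip(ALPHAS, votes))
    return 1 if score > 0 else -1

# Enumerate all 8 combinations of (h_1(x), h_2(x), h_3(x)).
for votes in product((1, -1), repeat=3):
    print(votes, "->", h_final(votes))
```

No single weight exceeds the sum of the other two (0.92 < 0.42 + 0.65), so with these values any two hypotheses that agree outvote the third, and the combined classifier acts as a majority vote of the three.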
Boosting
Experimental Results
Scatter plot: percent classification error of non-boosted vs. boosted on 27 learning tasks
[Figure 1.3 panels: Stumps vs. Boosting stumps (left, axes 0 to 80); C4.5 vs. Boosting C4.5 (right, axes 0 to 30)]
Figure 1.3
Comparison of two base learning algorithms (decision stumps and C4.5) with and without boosting. Each point
in each scatterplot shows the test error rate of the two competing algorithms on one of 27 benchmark learning
problems. The x-coordinate of each point gives the test error rate (in percent) using boosting, and the y-coordinate
gives the error rate without boosting when using decision stumps (left plot) or C4.5 (right plot). All error rates
have been averaged over multiple runs.
Boosting
Experimental Results (contd)
[Figure 1.4 panels: C4.5 vs. Boosting stumps (left); Boosting C4.5 vs. Boosting stumps (right)]
Figure 1.4
Comparison of boosting using decision stumps as the base learner versus unboosted C4.5 (left plot) and boosted
C4.5 (right plot).
Boosting
Miscellany