Math Essentials
Jeff Howbert
Winter 2012
HOWEVER
- To get really useful results, you need good mathematical intuitions about certain general machine learning principles, as well as the inner workings of the individual algorithms.
Notation
- a ∈ A    a is a member of set A
- | B |    cardinality: number of items in set B
- || v ||    norm: length of vector v
- Σ    summation
- ∫    integral
- ℝⁿ    set of real-valued n-vectors (n-space)
Notation
x, y, z, vector (bold, lower case)
u, v
z A, B, X matrix (bold, upper case)
z y = f( x ) function (map): assigns unique value in
range of y to each value in domain of x
z dy
y / dx derivative of y with respect
p
to single
g
variable x
z y = f(( x ) function on multiple
p variables, i.e. a
vector of variables; function in n-space
z y / xi
partial derivative of y with respect to
element i of vector x
z
Jeff Howbert
Winter 2012
Intuition:
- In some process, several outcomes are possible. When the process is repeated a large number of times, each outcome occurs with a characteristic relative frequency, or probability. If a particular outcome happens more often than another outcome, we say it is more probable.
Probability spaces
A probability space is a random process or experiment with
three components:
, the set of possible outcomes O
F,
F the
th sett off possible
ibl events
t E
Jeff Howbert
Winter 2012
Axioms of probability
1. Non-negativity: for any event E ∈ F, p( E ) ≥ 0
2. All possible outcomes: p( Ω ) = 1
3. Additivity of disjoint events: for any events E, E′ ∈ F where E ∩ E′ = ∅, p( E ∪ E′ ) = p( E ) + p( E′ )
- Discrete space: | Ω | is finite; analysis involves summations ( Σ )
- Continuous space: | Ω | is infinite; analysis involves integrals ( ∫ )
example: p( O = 58 ) > p( O = 62 )
Probability distributions
- Discrete: example: probability distribution for the sum of two fair dice
- Continuous: example: probability density of the waiting time between eruptions of Old Faithful (minutes)
Random variables
- example: p( X = 1 ) = p( X = 2 ) = 3 / 8
Scenario
- Several random processes occur (doesn't matter whether in parallel or in sequence)
- Want to know probabilities for each possible combination of outcomes
- Can describe as joint probability of several random variables
- Example: two processes whose outcomes are represented by random variables X and Y. Probability that process X has outcome x and process Y has outcome y is denoted as:

  p( X = x, Y = y )
[Figure: joint probability bar chart; X = model type ( SUV, minivan, sedan, sport ), Y = manufacturer ( American, Asian, European ), probability axis from 0 to 0.2]
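A joint distribution like the one in the figure can be stored as a table keyed by outcome pairs. This is a minimal sketch in Python; the probability values are hypothetical (chosen only so the table sums to 1), not the values from the figure:

```python
# Joint distribution p(X = model type, Y = manufacturer).
# The probabilities below are hypothetical, chosen only to sum to 1.
joint = {
    ("SUV",     "American"): 0.10, ("SUV",     "Asian"): 0.05, ("SUV",     "European"): 0.05,
    ("minivan", "American"): 0.10, ("minivan", "Asian"): 0.10, ("minivan", "European"): 0.05,
    ("sedan",   "American"): 0.15, ("sedan",   "Asian"): 0.15, ("sedan",   "European"): 0.10,
    ("sport",   "American"): 0.05, ("sport",   "Asian"): 0.05, ("sport",   "European"): 0.05,
}

# A valid probability distribution must sum to 1 over all outcome pairs.
assert abs(sum(joint.values()) - 1.0) < 1e-9

def p(x, y):
    """p( X = x, Y = y ): probability that both outcomes occur together."""
    return joint[(x, y)]
```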
Expected value
Given:
- A discrete random variable X, with possible values x = x1, x2, …, xn
- Probabilities p( X = xi ) that X takes on the various values of xi
- A function yi = f( xi ) defined on X

The expected value of f is the probability-weighted average value of f( xi ):

  E( f ) = Σi p( xi ) ⋅ f( xi )
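The definition above translates directly into code. A minimal Python sketch, using a fair six-sided die as an assumed example distribution:

```python
# Expected value E(f) = sum_i p(x_i) * f(x_i) for a discrete random variable.
def expected_value(pmf, f):
    """pmf: dict mapping each value x_i to its probability p(x_i)."""
    return sum(p * f(x) for x, p in pmf.items())

# Example: a fair six-sided die, each face with probability 1/6.
die = {x: 1 / 6 for x in range(1, 7)}

e_x = expected_value(die, lambda x: x)        # E(X) = 3.5
e_x2 = expected_value(die, lambda x: x * x)   # E(X^2) = 91/6
```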
Mean ( μ )

  f( xi ) = xi
  μ = E( f ) = Σi p( xi ) ⋅ xi

- Average value of X = xi, taking into account probability of the various xi
- Most common measure of center of a distribution
- Compare to formula for mean of an actual sample:

  μ = ( 1 / N ) Σi=1..N xi
Variance ( σ² )

  f( xi ) = ( xi − μ )²
  σ² = Σi p( xi ) ( xi − μ )²

- Average value of squared deviation of X = xi from mean μ, taking into account probability of the various xi
- Most common measure of spread of a distribution
- σ is the standard deviation
- Compare to formula for variance of an actual sample:

  σ² = ( 1 / ( N − 1 ) ) Σi=1..N ( xi − μ )²
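Both notions of mean and variance above can be sketched in a few lines of Python; the fair die is again an assumed example distribution:

```python
# Distribution mean and variance from a pmf, vs. sample variance from data.
def dist_mean(pmf):
    return sum(p * x for x, p in pmf.items())

def dist_var(pmf):
    mu = dist_mean(pmf)
    return sum(p * (x - mu) ** 2 for x, p in pmf.items())

die = {x: 1 / 6 for x in range(1, 7)}
mu = dist_mean(die)    # 3.5
var = dist_var(die)    # 35/12, about 2.917

# Sample variance weights each observation equally and divides by N - 1:
def sample_var(xs):
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)
```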
Covariance

  f( xi ) = ( xi − μx ),  g( yi ) = ( yi − μy )
  cov( x, y ) = Σi p( xi , yi ) ( xi − μx ) ( yi − μy )

[Figures: scatter plots illustrating high (positive) covariance and no covariance]

- Compare to formula for covariance of an actual sample:

  cov( x, y ) = ( 1 / ( N − 1 ) ) Σi=1..N ( xi − μx )( yi − μy )
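A minimal Python sketch of the sample formula, including the normalized version (correlation); the data values are made-up illustrations chosen to be perfectly linear:

```python
import math

# Sample covariance and correlation, using the 1/(N-1) convention.
def sample_cov(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def sample_corr(xs, ys):
    sx = math.sqrt(sample_cov(xs, xs))   # standard deviation of x
    sy = math.sqrt(sample_cov(ys, ys))   # standard deviation of y
    return sample_cov(xs, ys) / (sx * sy)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]          # exactly linear in xs
cov = sample_cov(xs, ys)      # 2.0
corr = sample_corr(xs, ys)    # 1.0: perfect positive correlation
```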
Correlation

  corr( x, y ) = cov( x, y ) / ( σx σy )
Complement rule
Given: event A, which can occur or not

  p( not A ) = 1 − p( A )
Product rule
Given: events A and B, which can co-occur (or not)

  p( A, B ) = p( A | B ) p( B )

(same expression given previously to define conditional probability)
Sum rule

  p( A ) = p( A, B ) + p( A, not B )

(same expression given previously to define marginal probability)
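The marginal probability p( A ) can be computed by summing a joint table over B. A short Python sketch with hypothetical joint probabilities over two binary events:

```python
# Sum rule: p(A) = p(A, B) + p(A, not B).
# Hypothetical joint probabilities over two binary events A and B:
joint = {
    (True,  True):  0.20,   # p( A, B )
    (True,  False): 0.10,   # p( A, not B )
    (False, True):  0.30,   # p( not A, B )
    (False, False): 0.40,   # p( not A, not B )
}

# Marginalize out B by summing every entry where A occurred:
p_A = sum(p for (a, _), p in joint.items() if a)   # 0.20 + 0.10 = 0.30
```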
Independence
Given: events A and B, which can co-occur (or not)

  p( A | B ) = p( A )

or

  p( A, B ) = p( A ) p( B )
Independence:
- Outcomes on multiple rolls of a die
- Outcomes on multiple flips of a coin
- Height of two unrelated individuals
- Probability of getting a king on successive draws from a deck, if card from each draw is replaced

Dependence:
- Height of two related individuals
- Duration of successive eruptions of Old Faithful
- Probability of getting a king on successive draws from a deck, if card from each draw is not replaced
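The king-drawing example above can be checked with plain arithmetic; a minimal Python sketch:

```python
# Probability of drawing a king on each of two successive draws
# from a standard 52-card deck (4 kings).

# With replacement the draws are independent, so p(A, B) = p(A) * p(B):
p_both_replaced = (4 / 52) * (4 / 52)

# Without replacement the second draw depends on the first
# (only 3 kings remain among 51 cards):
p_both_not_replaced = (4 / 52) * (3 / 51)

# The product rule p(A, B) = p(A | B) p(B) holds in both cases, but
# p(A, B) = p(A) p(B) holds only in the independent (replaced) case:
assert p_both_not_replaced < p_both_replaced
```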
Bayes rule
A way to find conditional probabilities for one variable when conditional probabilities for another variable are known.

  p( B | A ) = p( A | B ) p( B ) / p( A )

  where p( A ) = p( A, B ) + p( A, not B )
Bayes rule

  p( B | A ) = p( A | B ) p( B ) / p( A )

- p( B | A ): posterior probability
- p( A | B ): likelihood
- p( B ): prior probability
Event A: The weatherman forecasts rain.
Event B: It rains.
We know:
- p( B ) = 5 / 365 = 0.0137 [ It rains 5 days out of the year. ]
- p( not B ) = 360 / 365 = 0.9863
- p( A | B ) = 0.9 [ When it rains, the weatherman has forecast rain 90% of the time. ]
- p( A | not B ) = 0.1 [ When it does not rain, the weatherman has forecast rain 10% of the time. ]
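Plugging these numbers into Bayes rule, with the sum rule supplying p( A ), takes a few lines of Python:

```python
# Bayes rule on the forecast example: B = it rains, A = rain is forecast.
p_B = 5 / 365            # prior: it rains 5 days out of the year
p_notB = 360 / 365
p_A_given_B = 0.9        # likelihood: rain forecast when it rains
p_A_given_notB = 0.1     # rain forecast when it does not rain

# Sum rule gives the denominator p(A) = p(A, B) + p(A, not B):
p_A = p_A_given_B * p_B + p_A_given_notB * p_notB

# Posterior: probability it actually rains, given a rain forecast.
p_B_given_A = p_A_given_B * p_B / p_A   # = 1/9, about 0.111
```

Despite the 90% accurate forecast, the low prior means rain follows a rain forecast only about 11% of the time.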
Each row of a data table is a vector; the entire table is a matrix:

  Refund   Marital Status   Taxable Income   Cheat
  Yes      Single           125K             No
  No       Married          100K             No
  No       Single           70K              No
  Yes      Married          120K             No
  No       Divorced         95K              Yes
  No       Married          60K              No
  Yes      Divorced         220K             No
  No       Single           85K              Yes
  No       Married          75K              No
  No       Single           90K              Yes
Vectors
- Definition: an n-tuple of values (usually real numbers).
  - n is referred to as the dimension of the vector
  - n can be any positive integer, from 1 to infinity
- Can be written in column form or row form
  - Column form is conventional
  - Vector elements referenced by subscript

  x = ( x1 … xn )T (column form)      xT = ( x1 ⋯ xn ) (row form)
Vector arithmetic
- Vector addition:

  z = x + y = ( x1 + y1 ⋯ xn + yn )    result is a vector

- Scalar multiplication:

  y = a x = ( a x1 ⋯ a xn )    result is a vector
Vector arithmetic
- Dot product:

  a = x ⋅ y = Σi=1..n xi yi    result is a scalar

- Dot product alternate form:

  a = x ⋅ y = || x || || y || cos( θ )
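The vector operations above can be sketched with plain Python lists (no assumed libraries beyond the standard `math` module); the example vectors are chosen so the norms come out as whole numbers:

```python
import math

# Vector addition, scalar multiplication, dot product, and norm.
def add(x, y):
    return [xi + yi for xi, yi in zip(x, y)]

def scale(a, x):
    return [a * xi for xi in x]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))   # ||x|| = sqrt(x . x)

x = [1.0, 2.0, 2.0]   # ||x|| = 3
y = [3.0, 0.0, 4.0]   # ||y|| = 5

z = add(x, y)          # [4.0, 2.0, 6.0]
w = scale(2.0, x)      # [2.0, 4.0, 4.0]
a = dot(x, y)          # 3 + 0 + 8 = 11

# Alternate form: x . y = ||x|| ||y|| cos(theta)
cos_theta = a / (norm(x) * norm(y))   # 11 / 15
```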
Matrices
- Definition: an m × n two-dimensional array of values (usually real numbers).
  - m rows, n columns
- Matrix referenced by two-element subscript
  - first element in subscript is row
  - second element in subscript is column
  - example: A24 or a24 is the element in second row, fourth column of A

       ( a11 ⋯ a1n )
  A =  (  ⋮   ⋱  ⋮ )
       ( am1 ⋯ amn )
Matrices
- A vector can be regarded as a special case of a matrix, where one of the matrix dimensions = 1.
- Matrix transpose (denoted T)
  - swap columns and rows
  - if A is m × n, then AT is n × m, with ( AT )ij = aji
Matrix arithmetic
- Addition (matrices must have the same dimensions):

  C = A + B, where cij = aij + bij

- Scalar multiplication:

  B = d ⋅ A, where bij = d ⋅ aij
Matrix arithmetic
- Matrix-matrix multiplication
  - vector-matrix multiplication is just a special case
  - TO THE BOARD!!
- Multiplication is associative: A ( B C ) = ( A B ) C
- Multiplication is not commutative: A B ≠ B A (generally)
- Transposition rule: ( A B )T = BT AT
Matrix arithmetic
- RULE: In any chain of matrix multiplications, the column dimension of one matrix in the chain must match the row dimension of the following matrix in the chain.
- Examples, with A3×5, B5×5, C3×1:
  - Right:  A B AT    CT A B    AT A B    C CT A
  - Wrong:  A B A    C A B    A AT B    CT C A
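A minimal Python sketch of matrix multiplication with the dimension rule enforced explicitly, plus a check of the non-commutativity and transposition properties from the previous slide:

```python
# Matrix-matrix multiplication over lists of lists, with an explicit
# dimension check: columns of A must match rows of B.
def matmul(A, B):
    rows_a, cols_a = len(A), len(A[0])
    rows_b, cols_b = len(B), len(B[0])
    if cols_a != rows_b:
        raise ValueError(f"dimension mismatch: {cols_a} != {rows_b}")
    return [[sum(A[i][k] * B[k][j] for k in range(cols_a))
             for j in range(cols_b)]
            for i in range(rows_a)]

def transpose(A):
    return [list(row) for row in zip(*A)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

AB = matmul(A, B)    # [[2, 1], [4, 3]]
BA = matmul(B, A)    # [[3, 4], [1, 2]]: AB != BA, not commutative

# Transposition rule: (A B)^T == B^T A^T
assert transpose(AB) == matmul(transpose(B), transpose(A))
```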
Vector projection

[Figure: vectors x and y with angle θ between them; || y || cos( θ ) is the length of projx( y ) along x]

- Orthogonal projection of y onto x is the vector

  projx( y ) = x ⋅ || y || cos( θ ) / || x ||
             = [ ( x ⋅ y ) / || x ||² ] x    (using dot product alternate form)
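The closed form above needs only dot products; a short Python sketch, with a residual check confirming the projection is orthogonal:

```python
# Orthogonal projection: proj_x(y) = [ (x . y) / ||x||^2 ] x
def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def project(y, x):
    c = dot(x, y) / dot(x, x)        # scalar coefficient (x . y) / ||x||^2
    return [c * xi for xi in x]

y = [1.0, 1.0]
x = [2.0, 0.0]
p = project(y, x)    # [1.0, 0.0]: the "shadow" of y along x

# The residual y - proj_x(y) is orthogonal to x:
r = [yi - pi for yi, pi in zip(y, p)]
assert abs(dot(r, x)) < 1e-12
```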