
Statistical Techniques For Robotics

Shivam Gautam
sgautam@andrew.cmu.edu
Section 2
A2.1 Realizability: there exists a hypothesis in the hypothesis space that is always correct at every time instance. Realizability ensures that:

- all answers are generated by a target mapping, and
- there always exists a perfect hypothesis.

It is important because, if we assume realizability, we can place bounds on the number of mistakes made by the learner. If the assumption of realizability is relaxed, then we have to find bounds on the regret instead.
A2.2 Hypothesis Class: it is a set of hypotheses, each of which represents a belief about the correct answer. Formally, it is a set of target mapping functions, which could be classifiers, predictors, or regressors. This mapping from observations to possible outcomes can be represented as

h : X → Y,    H = { h : X → Y }

where X is the set of observations and Y the set of possible outcomes. Since the hypothesis class includes predictors and regressors, it can be infinite.
It is important because it restricts the search space in which to find the correct hypothesis for a problem. Selecting a hypothesis space biases the search for the correct hypothesis within the class and enables learning. The question of a finite versus infinite hypothesis space is important because it can determine whether we obtain the perfect realizable hypothesis (in the case of a finite hypothesis space) or the learner merely estimates the hypothesis closest to the correct one (in the case of an infinite hypothesis space).
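As an illustration (the threshold rules below are my own example, not from the assignment), a finite hypothesis class can be enumerated explicitly, while an infinite one is indexed by a real-valued parameter:

```python
# A finite hypothesis class: four fixed threshold classifiers over X = R.
finite_H = [lambda x, t=t: int(x > t) for t in (0.0, 0.25, 0.5, 0.75)]

# An infinite hypothesis class: one hypothesis h_t : X -> {0, 1}
# for every real-valued threshold t.
def make_hypothesis(t: float):
    return lambda x: int(x > t)

print([h(0.3) for h in finite_H])   # [1, 1, 0, 0]
```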
A2.3 Regret can be negative in the instance where the learning algorithm performs better than the hypothesis it is compared against. The regret with respect to a single hypothesis h can be defined as

R_T(h) = Σ_{t=1}^{T} ℓ(ŷ_t, y_t) − Σ_{t=1}^{T} ℓ(h(x_t), y_t)

where ŷ_t is the learner's prediction at round t. If the hypothesis h is selected such that it performs worse than our learner, then the loss of h will be greater than that of the learner, leading to negative regret.
Example:

Iteration | Actual Value | Expert 1 Prediction | Learner Prediction | Learner Regret w.r.t. Expert 1
    1     |      1       |          0          |         1          |               -1
    2     |      1       |          0          |         1          |               -2
    3     |      0       |          1          |         0          |               -3

Since the expert makes a mistake at every iteration and the learner does the exact opposite of what the expert says, the regret of the learner is negative.
The regret with respect to the best expert can be negative if the realizability constraint is relaxed, since the best-performing expert is then not guaranteed to always be right, which increases its loss. If the learner performs better than the best-performing expert, the value of regret that follows from the formula is negative.

Iteration | Actual Value | Expert 1 Prediction | Expert 2 Prediction | Learner Prediction | Learner Regret w.r.t. Expert 1
    1     |      1       |          1          |          0          |         1          |                0
    2     |      1       |          0          |          0          |         1          |               -1
    3     |      0       |          0          |          1          |         0          |               -1

Here we can see that the best-performing expert, Expert 1, gets two out of three predictions correct, making it much better than Expert 2. However, the learner, which gets everything correct (by simply negating what Expert 2 says), performs better than the best-performing expert. If this were the realizable case, the learner could at best get as many correct outputs as the expert who is always correct, which effectively lower-bounds the regret at 0.
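The cumulative regret columns in both tables can be checked with a few lines of Python, using the 0/1 loss (here, the numbers from the second table):

```python
# 0/1 losses; cumulative regret = learner's total loss - expert's total loss.
actual  = [1, 1, 0]
expert1 = [1, 0, 0]
learner = [1, 1, 0]

regret = 0
for y, e, l in zip(actual, expert1, learner):
    regret += int(l != y) - int(e != y)
    print(regret)   # prints 0, -1, -1, as in the table above
```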

A2.4
Consider the worst case: the learner makes a mistake at every time step, since at every step there exists at least one hypothesis that would have made a mistake. With N hypotheses, after every N mistakes of the learner, the best hypothesis must have made at least one more mistake. Therefore, by the time the best hypothesis has made its m* mistakes, the learner has made at most N × m* mistakes, and the number of further time steps before the best hypothesis errs again can be at most N − 1.
Hence the bound is M ≤ N·m* + N − 1.
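For example, with N = 3 experts whose best member makes m* = 2 mistakes, the bound gives M ≤ 3 × 2 + 3 − 1 = 8 mistakes for the learner.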

A2.5
(1) The mistake bound of the algorithm has the form

M ≤ (1 + η)·m* + (ln N)/η

Setting the derivative with respect to η to zero, m* − (ln N)/η² = 0, gives the optimal learning rate η* = √(ln N / m*).

(2) If the number of prediction rounds T is much smaller than the number of experts N, we should choose a high value of η. This follows from the optimal value of η derived above: if the number of rounds is small, the number of mistakes m* committed so far is also small, and since ln N is very large compared to m*, the optimal η is much larger.

(3) To ensure that the average regret tends to 0 as time tends to infinity when m* is of the order O(T), we want η to decay with T; a value of η = 1/√T is a good choice. The η·m* term then contributes O(√T), and the (ln N)/η term, assuming N is polynomial in T, also contributes Õ(√T), so the total regret is sublinear in T and the average regret vanishes.

If m* is sublinear in T, we want the sum of the orders of the η·m* and (ln N)/η terms to be less than 1, which keeps the regret of the entire algorithm sublinear in T.
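A quick numeric check of these claims, assuming the bound form M ≤ (1 + η)·m* + (ln N)/η used above:

```python
import math

def optimal_eta(m_star, num_experts):
    # Minimizer of (1 + eta) * m_star + ln(N) / eta over eta > 0.
    return math.sqrt(math.log(num_experts) / m_star)

# Part (2): few rounds (small m*), many experts -> large optimal eta.
print(optimal_eta(m_star=5, num_experts=10_000))          # ~1.36

# Part (3): m* = O(T) with eta = 1/sqrt(T) -> average regret vanishes.
T, N = 1_000_000, 100
eta = 1 / math.sqrt(T)
avg_regret_bound = (eta * (T // 2) + math.log(N) / eta) / T
print(avg_regret_bound)                                   # ~0.005
```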
A2.6
(1) The adversary can select a strategy in which it uses the experts' weights to compute the learner's weighted prediction and then always selects the true answer to be the opposite of this predicted weighted answer.

(2) The loss of the WMA algorithm against this worst-case adversary will be 1 at every round, since the adversary can deterministically predict WMA's answer. The RWMA algorithm, however, does not stick to the hypothesis with the maximum weight, but instead selects a hypothesis based on a probability distribution over the weights. The probability of selecting the highest-weight hypothesis is high, but it is not the one that is always picked. This can help us defeat the worst-case adversary, because he cannot guess our randomness (unless he has access to our random number generator!).
The expected loss is strictly better for RWMA because we have assumed that at least one of the experts is correct at least once, i.e., there is a nonzero probability of selecting this correct expert. Without this assumption, both RWMA and WMA would have the same maximum loss, as no expert is ever correct.
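For example, if two experts hold weights 0.9 and 0.1 and disagree on a round, WMA deterministically follows the 0.9 expert, so the adversary labels against it and the loss is 1. RWMA follows that expert only with probability 0.9, so the worst label the adversary can pick still yields an expected loss of at most 0.9 < 1.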

Section 3
3.3 Weighted Majority Algorithm
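A minimal sketch of one WMA round as used in these experiments (binary labels and a halving penalty are assumed here; the submitted code may differ in details):

```python
import numpy as np

def wma_round(weights, expert_preds, true_label, penalty=0.5):
    """One round of the deterministic Weighted Majority Algorithm:
    predict the weighted-majority label, then down-weight wrong experts."""
    vote_for_one = weights[expert_preds == 1].sum()
    prediction = int(vote_for_one >= weights.sum() / 2)
    weights = np.where(expert_preds == true_label, weights, penalty * weights)
    return prediction, weights

weights = np.ones(3)          # optimistic, pessimistic, odd-even experts
preds = np.array([1, 0, 1])   # each expert's prediction this round
prediction, weights = wma_round(weights, preds, true_label=1)
```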

Deterministic Nature

The deterministic nature is chosen to be optimistic: it always agrees with Expert 0 (the optimistic expert). Since the learner starts off with equal weights on all experts and receives wrong predictions from the pessimistic and odd-even experts, the initial regret is 1.
After these two experts are down-weighted, the regret of the algorithm decreases. When the odd-even hypothesis agrees with the nature on odd examples, its weight is unchanged (relatively up-weighted with respect to the pessimistic one). However, this expert makes a mistake in the next iteration, which produces the kinks in the graph.
The optimistic expert, meanwhile, has always been correct, so its weight remains unchanged. After a few mistakes by the odd-even expert, the optimistic expert's weight dominates the WMA vote, and the regret of the algorithm decreases as that weight share increases.

Adversarial Nature

The adversarial nature is built to defeat the weighted majority algorithm: it always produces the counter to the WMA prediction, as sketched below. As a result, the loss of the learner keeps increasing linearly, since it makes a mistake at every iteration.
The average regret is computed with respect to the best expert, which is why oscillations can be observed in the graph: since at every iteration a minimum of two of the three experts are incorrect, the identity of the best expert keeps shifting. The value appears to settle at 0.5 because the odd-even expert commits a mistake at every iteration, effectively reducing its weight to a negligible amount; the regret then ends up measured with respect to either the optimistic or the pessimistic expert, both of which are right half of the time.
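A sketch of such an adversarial nature, which peeks at the weights and flips WMA's vote (same conventions as the WMA sketch above):

```python
import numpy as np

def adversarial_label(weights, expert_preds):
    """Return the label opposite to WMA's weighted-majority prediction,
    forcing the deterministic learner to err at every round."""
    vote_for_one = weights[expert_preds == 1].sum()
    wma_prediction = int(vote_for_one >= weights.sum() / 2)
    return 1 - wma_prediction
```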
Stochastic Environment

Since this environment is randomly generated, none of the three experts has a notion of what the correct output is, so each of their losses keeps increasing erratically.
The average regret, however, keeps decreasing because it is computed with respect to the best expert: since all three experts perform poorly, their performance is comparable and the best hypothesis does not outperform the other two.

3.4 Randomized Weighted Majority Algorithm
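The only change from the WMA sketch above is the prediction step: RWMA samples an expert in proportion to its weight instead of taking a deterministic vote. A minimal sketch under the same assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def rwma_round(weights, expert_preds, true_label, penalty=0.5):
    """One round of the Randomized Weighted Majority Algorithm:
    follow an expert sampled with probability proportional to its weight."""
    probs = weights / weights.sum()
    chosen = rng.choice(len(weights), p=probs)
    prediction = int(expert_preds[chosen])
    weights = np.where(expert_preds == true_label, weights, penalty * weights)
    return prediction, weights
```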


Deterministic Environment

In the RWMA under the deterministic nature, the loss of the pessimistic expert keeps increasing as expected, whereas the optimistic expert is always correct. The learner is correct whenever the expert it samples agrees with the right answer, but it makes mistakes intermittently because RWMA sometimes probabilistically selects the incorrect pessimistic hypothesis. The average regret of the algorithm nonetheless keeps decreasing: as the number of iterations increases, the incorrect experts get down-weighted heavily, decreasing the probability with which they are selected.
Adversarial Environment

For the adversarial case, the RWMA performs much better than the WMA, since it can randomly select a hypothesis that is not the one the adversary has planned against.
Stochastic Environment

Since this environment is randomly generated, none of the three experts has a notion of what the correct output is, so each of their losses keeps increasing erratically.
The average regret, however, keeps decreasing because it is computed with respect to the best expert: since all three experts perform poorly, their performance is comparable and the best hypothesis does not outperform the other two.

3.5 More Experts and Observations and Features

The extra observation added to the nature was Form: the number of wins/losses in the last ten games.
The experts that I added were (a sketch of all three follows below):
1. Expert 3: a random expert which predicts the outcome randomly.
2. Expert 4: an expert that predicts based on the form of the team over the last ten games.
3. Expert 5: an expert that predicts by sampling from a multinomial distribution based on form.
As expected, Expert 3 fails drastically, as it uses no contextual information about the observations. Expert 4 always makes perfect predictions, so its loss is always zero. Expert 5 makes probabilistically correct predictions, but this does not stop it from making mistakes at certain time steps; it is still better than the optimistic, pessimistic, and odd-even experts, which make no use of the form that the deterministic nature relies on. The average regret of the learner is reduced by the addition of these extra experts, since form is what deterministically produces the label from the nature.
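A sketch of the three added experts; the form observation is assumed here to be an integer count of wins over the last ten games, and the names and thresholds are illustrative rather than the submitted code:

```python
import numpy as np

rng = np.random.default_rng(0)

def expert_random(form):
    """Expert 3: ignores the observation and guesses uniformly at random."""
    return int(rng.integers(0, 2))

def expert_form(form):
    """Expert 4: predicts a win iff the team won a majority of its last
    ten games (the rule the deterministic nature is assumed to follow)."""
    return int(form > 5)

def expert_form_sampled(form):
    """Expert 5: samples the prediction with probability given by form,
    so it is usually right but still errs at some time steps."""
    return int(rng.random() < form / 10.0)
```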

[Figure: Addition of 3 new experts for the deterministic WMA algorithm]

[Figure: Addition of 3 new experts for the deterministic RWMA algorithm]


Collaboration: Mohak Bhardwaj and Rushat Gupta Chadha
