Professional Documents
Culture Documents
• Easy to implement
• Gradient update:
1 X >
w+ = w (xi w yi )xi
m i
Requires an entire pass through the data!
CS 584 [Spring 2016] - Ho
Tackling Compute Problems: Scaling to Large n
• Streaming implementation
If n ! 1,
then "est ! 0
w+ = w rw L(fw (xi , yi ))
• Is convergence necessary?
• Batch: O(nd)
Batch is doable if n is moderate
• Stochastic: O(d) but not when n is huge
• Mini-batch: O(|Sk|d)
• Experiment with the learning rates using small sample of training set