
Statistical Learning

& Inference

Lecturer: Liqing Zhang


Dept. Computer Science & Engineering,
Shanghai Jiao Tong University
Books and References
– T. Hastie, R. Tibshirani & J. Friedman, The Elements of
Statistical Learning: Data Mining, Inference, and Prediction,
Springer-Verlag, 2001

– V. Cherkassky & F. Mulier, Learning From Data, Wiley, 1998


– Vladimir N. Vapnik, The Nature of Statistical Learning Theory,
2nd ed., Springer, 2000
– M. Vidyasagar, Learning and generalization: with applications to
neural networks, 2nd ed., Springer, 2003
– G. Casella & R. Berger, Statistical Inference, Thomson, 2002
– T. Cover & J. Thomas, Elements of Information Theory, Wiley

2018/10/25 Statistical Learning and Inference 2


Overview of the Course
 Introduction
 Overview of Supervised Learning
 Linear Methods for Regression and Classification
 Basis Expansions and Regularization
 Kernel Methods
 Model Selection and Inference
 Support Vector Machine
 Bayesian Inference
 Unsupervised Learning



Why Statistical Learning?
 "We are drowning in information, but starved for knowledge." ---- R. Roger
 "The quiet statisticians have changed our world; not by discovering new
facts or developing new techniques, but by changing the ways we reason,
experiment, and form our opinions." ---- I. Hacking
 Question: Why are today's computers so inefficient at processing
intelligent information?
– Images, video, audio
– Cognition, communication
– Language, speech, text
– Biology, genes, proteins



Cloud Computing
Cloud Computing Service Layers

Application focused:
– Services: complete business services such as PayPal, OpenID,
OAuth, Google Maps, Alexa
– Application: cloud-based software that eliminates the need for
local installation, such as Google Apps, Microsoft Online
– Development: software development platforms used to build custom
cloud-based applications (PaaS & SaaS), such as SalesForce

Infrastructure focused:
– Platform: cloud-based platforms, typically provided using
virtualization, such as Amazon EC2, Sun Grid
– Storage: data storage or cloud-based NAS such as CTERA, iDisk,
CloudNAS
– Hosting: physical data centers such as those run by IBM, HP,
NaviSite, etc.
Telemedicine: Remote ECG Diagnosis and Monitoring

(Figure: a remote diagnosis and monitoring center linked to community
hospitals. Community hospitals acquire ECG signals, perform preliminary
diagnosis, and send problematic ECGs and data to the center; the center
performs automatic and assisted diagnosis, shares data with remote
doctors through a consultation system, and feeds back treatment advice
and diagnosis results. Hospital doctors provide manual diagnosis and
treatment advice, and are themselves a new class of remote users.)

 Individual users need more functions: disease monitoring
(cardiopulmonary), rehabilitation training, fitness guidance, etc.
 Community hospitals also need more functions: ECG, respiration, and
blood pressure monitoring; chronic-disease rehabilitation training;
fitness guidance, etc.
 The system mobilizes idle resources of community hospitals.
ML: SARS Risk Prediction
(Figure: a diagnostic network relating SARS risk to patient attributes.
The attributes — white count, RBC count, albumin, gender, blood pO2,
chest X-ray, age, and blood pressure — are grouped into pre-hospital
and in-hospital attributes.)
ML: Auto Vehicle Navigation
(Figure: a learned mapping from road images to steering direction.)


Protein Folding



The Scale of Biomedical Data



General Procedure in SL

The ML procedure forms a cycle:
– Problem definition
– Data acquisition
– Feature analysis
– Model training
– Predictions
Example: Pattern Classification
 Objective: to recognize horses in images

 Procedure: Feature => Classifier => Cross-Validation
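The Feature => Classifier => Cross-Validation procedure can be sketched end to end. This is a minimal illustration: synthetic 2-D feature vectors and a nearest-centroid classifier stand in for real image features and a real classifier, and all names and parameters here are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "horse vs non-horse" features: 2-D vectors from two classes,
# synthetic stand-ins for features extracted from images.
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

def nearest_centroid_fit(X, y):
    """Classifier: store the mean feature vector of each class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(model, X):
    """Assign each sample to the class with the nearest centroid."""
    classes = sorted(model)
    d = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes])
    return np.array(classes)[np.argmin(d, axis=0)]

def cross_val_accuracy(X, y, n_folds=5):
    """k-fold cross-validation: hold out each fold once, train on the rest."""
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    accs = []
    for i in range(n_folds):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        model = nearest_centroid_fit(X[train], y[train])
        accs.append(np.mean(nearest_centroid_predict(model, X[test]) == y[test]))
    return float(np.mean(accs))
```

Cross-validation estimates how the trained classifier generalizes, which is why it closes the pipeline rather than a single train-set accuracy.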



Classifier

Horse

Non Horse



Function Estimation Model
 The Function Estimation Model of learning from
examples:
– Generator (G) generates observations x (typically in Rn),
independently drawn from some fixed distribution F(x)
– Supervisor (S) labels each input x with an output value y
according to some fixed distribution F(y|x)
– Learning Machine (LM) "learns" from an i.i.d. k-sample of
(x,y)-pairs output from G and S, by choosing a function that
best approximates S from a parameterised function class f(x, α),
where α ∈ Λ, the parameter set
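The Generator/Supervisor/Learning Machine loop can be simulated directly. A minimal sketch, assuming a one-parameter function class f(x, α) = 1{x > α} and a noisy threshold supervisor; the uniform distribution, the 0.6 threshold, and the 5% label noise are illustrative assumptions, not part of the model itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Generator G: draws x i.i.d. from a fixed distribution F(x).
x = rng.uniform(0.0, 1.0, size=100)

# Supervisor S: labels each x according to a fixed F(y|x); here a noisy
# threshold at 0.6 (threshold and 5% noise are assumptions for the sketch).
y_clean = (x > 0.6).astype(int)
flip = rng.random(100) < 0.05
y = np.where(flip, 1 - y_clean, y_clean)

# Learning Machine LM: from the parameterised class f(x, alpha) = 1{x > alpha},
# choose the alpha that best approximates S on the observed sample.
alphas = np.linspace(0.0, 1.0, 101)
errors = [float(np.mean((x > a).astype(int) != y)) for a in alphas]
alpha_hat = alphas[int(np.argmin(errors))]
```

The learned parameter alpha_hat lands near the supervisor's true threshold, even though LM only ever sees the (x, y) pairs.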



Function Estimation Model
(Diagram: G generates x; S assigns the label y; LM observes the (x, y)
pairs and produces its estimate ŷ.)

 Key concepts: F(x,y), an i.i.d. k-sample on F,
functions f(x, α), and the equivalent representation of
each f using its index α


The Problem of Risk Minimization
 The loss functional (L, Q)
– the error of a given function on a given example

L : (x, y, f) ↦ L(y, f(x, α))
Q : (z, α) ↦ L(z_y, f(z_x, α)), where z = (x, y)

 The risk functional (R)
– the expected loss of a given function on an example
drawn from F(x,y)
– the (usual concept of) generalisation error of a given
function

R(α) = ∫ Q(z, α) dF(z)
The Problem of Risk Minimization
 Three Main Learning Problems
– Pattern Recognition:
y ∈ {0, 1} and L(y, f(x, α)) = 1{y ≠ f(x, α)}

– Regression Estimation:
y ∈ ℝ and L(y, f(x, α)) = (y − f(x, α))²

– Density Estimation:
L(p(x, α)) = −log p(x, α)
General Formulation
 The Goal of Learning
– Given an i.i.d. k-sample z1, …, zk drawn from a fixed
distribution F(z)
– For a class of loss functionals Q(z, α), with α ∈ Λ
– We wish to minimise the risk, finding the function α*:

α* = argmin_{α ∈ Λ} R(α)


General Formulation
 The Empirical Risk Minimization (ERM) Inductive
Principle
– Define the empirical risk (sample/training error):

R_emp(α) = (1/k) Σ_{i=1}^{k} Q(z_i, α)

– Define the empirical risk minimiser:

α_k = argmin_{α ∈ Λ} R_emp(α)

– ERM approximates Q(z, α*) with Q(z, α_k), the R_emp
minimiser; that is, ERM approximates α* with α_k
– Least-squares and Maximum-likelihood are realisations
of ERM
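As the last point notes, least-squares is ERM with the squared loss. A minimal sketch on synthetic data (the linear function class and the data-generating slope/intercept/noise are assumptions for illustration): it minimises R_emp(α) = (1/k) Σ (y_i − f(x_i, α))² in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic i.i.d. k-sample z_i = (x_i, y_i); the true slope 2.0,
# intercept 0.5, and noise level 0.1 are illustrative assumptions.
k = 200
x = rng.uniform(-1.0, 1.0, size=k)
y = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=k)

def empirical_risk(alpha, x, y):
    """R_emp(alpha) = (1/k) * sum_i (y_i - f(x_i, alpha))^2 for the
    linear class f(x, alpha) = alpha[0]*x + alpha[1]."""
    return float(np.mean((y - (alpha[0] * x + alpha[1])) ** 2))

# Least-squares = ERM for squared loss: the minimiser alpha_k in closed form.
A = np.stack([x, np.ones_like(x)], axis=1)
alpha_k, *_ = np.linalg.lstsq(A, y, rcond=None)
```

The recovered alpha_k approaches the data-generating parameters as k grows, which is exactly the consistency question the next slides take up.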



4 Issues of Learning Theory
1. Theory of consistency of learning processes
• What are (necessary and sufficient) conditions for consistency
(convergence of Remp to R) of a learning process based on the ERM
Principle?
2. Non-asymptotic theory of the rate of convergence of learning
processes
• How fast is the rate of convergence of a learning process?
3. Generalization ability of learning processes
• How can one control the rate of convergence (the generalization ability)
of a learning process?
4. Constructing learning algorithms (e.g. the SVM)
• How can one construct algorithms that can control the generalization
ability?



Change in Scientific Methodology
TRADITIONAL:
 Formulate hypothesis
 Design experiment
 Collect data
 Analyze results
 Review hypothesis
 Repeat/Publish

NEW:
 Design large experiments
 Collect large data
 Put data in large database
 Formulate hypothesis
 Evaluate hypothesis on database
 Run limited experiments
 Review hypothesis
 Repeat/Publish
Learning & Adaptation
 Any method that incorporates information from training
samples in the design of a classifier employs learning.
 Because classification problems are complex, we cannot
guess the best classification decision ahead of time; we need
to learn it.
 Creating classifiers then involves positing some general
form of model, or form of the classifier, and using examples
to learn the complete classifier.



Supervised learning
 In supervised learning, a teacher provides a
category label for each pattern in a training set.
These are then used to train a classifier which can
thereafter solve similar classification problems by
itself.
– Such as Face Recognition, Text Classification, ……



Unsupervised learning
 In unsupervised learning, or clustering, there is no
explicit teacher or training data. The system forms
natural clusters of input patterns and classifies
them based on the clusters they belong to.

– Data Clustering, Data Quantization, Dimensional


Reduction, ……
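Cluster formation on unlabeled data can be illustrated with a minimal k-means sketch (the two Gaussian groups and all parameters here are assumptions for illustration, not part of any particular system):

```python
import numpy as np

rng = np.random.default_rng(3)

# Unlabeled input patterns with two natural groups; no teacher provides
# labels. The group centers (0 and 4) are illustrative assumptions.
X = np.vstack([rng.normal(0.0, 0.5, (40, 2)), rng.normal(4.0, 0.5, (40, 2))])

def kmeans(X, k=2, iters=20):
    """Alternate between assigning points to the nearest center and
    recomputing each center as the mean of its assigned points."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        centers = np.stack([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return labels, centers
```

With well-separated groups, the algorithm recovers the natural clusters from the inputs alone, which is the sense in which the system "forms natural clusters" without a teacher.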



Reinforcement learning
 In reinforcement learning, a teacher tells the
classifier only whether its suggested category for a
pattern is right or wrong; the teacher does not say
what the correct category is.

– Agent, Robot, ……



Classification
 The task of the classifier component is to use the feature
vector provided by the feature extractor to assign the object
to a category.
 Classification is the main topic of this course.
 The abstraction provided by the feature vector
representation of the input data enables the development of
a largely domain-independent theory of classification.
 Essentially the classifier divides the feature space into
regions corresponding to different categories.
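The division of feature space into category regions can be made concrete with a toy linear discriminant (the weights w and offset b below are arbitrary illustrative values, not learned from data):

```python
import numpy as np

# A hypothetical linear discriminant for a 2-D feature space;
# w and b are illustrative values chosen by hand.
w = np.array([1.0, 1.0])
b = -5.0

def S(x):
    """S(x) = w.x + b; the set {x : S(x) = 0} is the decision boundary
    that divides feature space into two regions."""
    return float(np.dot(w, x) + b)

def classify(x):
    # Region S(x) >= 0 maps to category A; region S(x) < 0 to category B.
    return "A" if S(x) >= 0 else "B"
```

Every feature vector falls on one side of the boundary or the other, so the single function S partitions the whole space into category regions.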



Classification
 The degree of difficulty of the classification problem
depends on the variability in the feature values for objects in
the same category relative to the feature value variation
between the categories.
 Variability may be natural or due to noise.
 Variability can be described through statistics leading to
statistical pattern recognition.



Classification
 Question: How to design a classifier that can cope
with the variability in feature values? What is the
best possible performance?
(Figure: objects represented in feature space, with X1 = perimeter and
X2 = area. The decision boundary S(x) = 0 separates Class A, where
S(x) ≥ 0, from Class B, where S(x) < 0. Noise and biological variation
cause class spread, and class overlap causes classification error.)
Examples
 User interfaces: modelling subjectivity and affect,
intelligent agents, transduction (input from camera,
microphone, or fish sensor)
 Recovering visual models: face recognition, model-
based video, avatars
 Dynamical systems: speech recognition, visual
tracking, gesture recognition, virtual instruments
 Probabilistic modeling: image compression, low
bandwidth teleconferencing, texture synthesis
 ……



Course Web
 http://bcmi.sjtu.edu.cn/statLearnig/

 Teaching Assistant:
招浩华<haoh.zhao@sjtu.edu.cn>



Assignment
 To write a report on the topic you are working on,
including:
– Problem definition
– Model and method
– Key issues to be solved
– Outcome

