
Statistical Learning

& Inference

Lecturer: Liqing Zhang


Dept. Computer Science & Engineering,
Shanghai Jiao Tong University
Books and References
– T. Hastie, R. Tibshirani & J. Friedman, The Elements of
Statistical Learning: Data Mining, Inference, and Prediction,
Springer-Verlag, 2001

– V. Cherkassky & F. Mulier, Learning From Data, Wiley, 1998


– Vladimir N. Vapnik, The Nature of Statistical Learning Theory,
2nd ed., Springer, 2000
– M. Vidyasagar, Learning and generalization: with applications to
neural networks, 2nd ed., Springer, 2003
– G. Casella & R. Berger, Statistical Inference, Thomson, 2002
– T. Cover & J. Thomas, Elements of Information Theory, Wiley

2018/10/25 Statistical Learning and Inference 2


Overview of the Course
 Introduction
 Overview of Supervised Learning
 Linear Methods for Regression and Classification
 Basis Expansions and Regularization
 Kernel Methods
 Model Selection and Inference
 Support Vector Machine
 Bayesian Inference
 Unsupervised Learning



Why Statistical Learning?
 "We are drowning in information, but starved for knowledge." ---- R. Roger
 "The quiet statisticians have changed our world; not by discovering new
facts or developing new techniques, but by changing the ways we reason,
experiment, and form our opinions." ---- I. Hacking
 Question: Why are today's computers so inefficient at processing
intelligent information?
– Images, video, audio
– Cognition, communication
– Language, speech, text
– Biology, genes, proteins



Cloud Computing
Cloud Computing Service Layers

Application focused:
– Services: complete business services such as PayPal, OpenID,
OAuth, Google Maps, Alexa
– Application: cloud-based software that eliminates the need for
local installation, such as Google Apps, Microsoft Online
– Development: software development platforms used to build custom
cloud-based applications (PaaS & SaaS), such as SalesForce

Infrastructure focused:
– Platform: cloud-based platforms, typically provided using
virtualization, such as Amazon EC2, Sun Grid
– Storage: data storage or cloud-based NAS such as CTERA, iDisk,
CloudNAS
– Hosting: physical data centers such as those run by IBM, HP,
NaviSite, etc.
Telemedicine: Remote ECG Diagnosis and Monitoring

(Figure: a remote diagnosis and monitoring center linked to community
hospitals. Community hospitals acquire ECG signals, perform preliminary
diagnosis, and send problematic ECGs and data to the center; the center
performs automatic and assisted diagnosis, shares data with remote
doctors through a consultation system, and feeds back treatment advice
and diagnosis results. Hospital doctors provide manual diagnosis and
treatment advice, and are themselves a new class of remote users.)

 Individual users need more functions: disease monitoring
(cardiopulmonary), rehabilitation training, fitness guidance, etc.
 Community hospitals also need more functions: ECG, respiration, and
blood pressure monitoring; chronic-disease rehabilitation training;
fitness guidance, etc.
 The system mobilizes idle resources of community hospitals.
ML: SARS Risk Prediction
(Figure: a diagnostic network relating SARS risk to patient attributes.
The attributes — white count, RBC count, albumin, gender, blood pO2,
chest X-ray, age, and blood pressure — are grouped into pre-hospital
and in-hospital attributes.)
ML: Auto Vehicle Navigation
(Figure: a learned mapping from road images to steering direction.)


Protein Folding



The Scale of Biomedical Data



General Procedure in SL

The ML procedure forms a cycle:
– Problem definition
– Data acquisition
– Feature analysis
– Model training
– Predictions
Example: Pattern Classification
 Objective: to recognize horses in images

 Procedure: Feature => Classifier => Cross-Validation
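The Feature => Classifier => Cross-Validation procedure can be sketched end to end. This is a minimal illustration: synthetic 2-D feature vectors and a nearest-centroid classifier stand in for real image features and a real classifier, and all names and parameters here are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "horse vs non-horse" features: 2-D vectors from two classes,
# synthetic stand-ins for features extracted from images.
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

def nearest_centroid_fit(X, y):
    """Classifier: store the mean feature vector of each class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(model, X):
    """Assign each sample to the class with the nearest centroid."""
    classes = sorted(model)
    d = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes])
    return np.array(classes)[np.argmin(d, axis=0)]

def cross_val_accuracy(X, y, n_folds=5):
    """k-fold cross-validation: hold out each fold once, train on the rest."""
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    accs = []
    for i in range(n_folds):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        model = nearest_centroid_fit(X[train], y[train])
        accs.append(np.mean(nearest_centroid_predict(model, X[test]) == y[test]))
    return float(np.mean(accs))
```

Cross-validation estimates how the trained classifier generalizes, which is why it closes the pipeline rather than a single train-set accuracy.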



Classifier

Horse

Non Horse



Function Estimation Model
 The Function Estimation Model of learning from
examples:
– Generator (G) generates observations x (typically in Rn),
independently drawn from some fixed distribution F(x)
– Supervisor (S) labels each input x with an output value y
according to some fixed distribution F(y|x)
– Learning Machine (LM) "learns" from an i.i.d. k-sample of
(x,y)-pairs output from G and S, by choosing a function that
best approximates S from a parameterised function class f(x, α),
where α ∈ Λ, the parameter set
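The Generator/Supervisor/Learning Machine loop can be simulated directly. A minimal sketch, assuming a one-parameter function class f(x, α) = 1{x > α} and a noisy threshold supervisor; the uniform distribution, the 0.6 threshold, and the 5% label noise are illustrative assumptions, not part of the model itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Generator G: draws x i.i.d. from a fixed distribution F(x).
x = rng.uniform(0.0, 1.0, size=100)

# Supervisor S: labels each x according to a fixed F(y|x); here a noisy
# threshold at 0.6 (threshold and 5% noise are assumptions for the sketch).
y_clean = (x > 0.6).astype(int)
flip = rng.random(100) < 0.05
y = np.where(flip, 1 - y_clean, y_clean)

# Learning Machine LM: from the parameterised class f(x, alpha) = 1{x > alpha},
# choose the alpha that best approximates S on the observed sample.
alphas = np.linspace(0.0, 1.0, 101)
errors = [float(np.mean((x > a).astype(int) != y)) for a in alphas]
alpha_hat = alphas[int(np.argmin(errors))]
```

The learned parameter alpha_hat lands near the supervisor's true threshold, even though LM only ever sees the (x, y) pairs.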



Function Estimation Model
(Diagram: G generates x; S assigns the label y; LM observes the (x, y)
pairs and produces its estimate ŷ.)

 Key concepts: F(x,y), an i.i.d. k-sample on F,
functions f(x, α), and the equivalent representation of
each f using its index α


The Problem of Risk Minimization
 The loss functional (L, Q)
– the error of a given function on a given example

L : (x, y, f) ↦ L(y, f(x, α))
Q : (z, α) ↦ L(z_y, f(z_x, α)), where z = (x, y)

 The risk functional (R)
– the expected loss of a given function on an example
drawn from F(x,y)
– the (usual concept of) generalisation error of a given
function

R(α) = ∫ Q(z, α) dF(z)
The Problem of Risk Minimization
 Three Main Learning Problems
– Pattern Recognition:
y ∈ {0, 1} and L(y, f(x, α)) = 1{y ≠ f(x, α)}

– Regression Estimation:
y ∈ ℝ and L(y, f(x, α)) = (y − f(x, α))²

– Density Estimation:
L(p(x, α)) = −log p(x, α)
General Formulation
 The Goal of Learning
– Given an i.i.d. k-sample z1, …, zk drawn from a fixed
distribution F(z)
– For a class of loss functionals Q(z, α), with α ∈ Λ
– We wish to minimise the risk, finding the function α*:

α* = argmin_{α ∈ Λ} R(α)


General Formulation
 The Empirical Risk Minimization (ERM) Inductive
Principle
– Define the empirical risk (sample/training error):

R_emp(α) = (1/k) Σ_{i=1}^{k} Q(z_i, α)

– Define the empirical risk minimiser:

α_k = argmin_{α ∈ Λ} R_emp(α)

– ERM approximates Q(z, α*) with Q(z, α_k), the R_emp
minimiser; that is, ERM approximates α* with α_k
– Least-squares and Maximum-likelihood are realisations
of ERM
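As the last point notes, least-squares is ERM with the squared loss. A minimal sketch on synthetic data (the linear function class and the data-generating slope/intercept/noise are assumptions for illustration): it minimises R_emp(α) = (1/k) Σ (y_i − f(x_i, α))² in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic i.i.d. k-sample z_i = (x_i, y_i); the true slope 2.0,
# intercept 0.5, and noise level 0.1 are illustrative assumptions.
k = 200
x = rng.uniform(-1.0, 1.0, size=k)
y = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=k)

def empirical_risk(alpha, x, y):
    """R_emp(alpha) = (1/k) * sum_i (y_i - f(x_i, alpha))^2 for the
    linear class f(x, alpha) = alpha[0]*x + alpha[1]."""
    return float(np.mean((y - (alpha[0] * x + alpha[1])) ** 2))

# Least-squares = ERM for squared loss: the minimiser alpha_k in closed form.
A = np.stack([x, np.ones_like(x)], axis=1)
alpha_k, *_ = np.linalg.lstsq(A, y, rcond=None)
```

The recovered alpha_k approaches the data-generating parameters as k grows, which is exactly the consistency question the next slides take up.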



4 Issues of Learning Theory
1. Theory of consistency of learning processes
• What are (necessary and sufficient) conditions for consistency
(convergence of Remp to R) of a learning process based on the ERM
Principle?
2. Non-asymptotic theory of the rate of convergence of learning
processes
• How fast is the rate of convergence of a learning process?
3. Generalization ability of learning processes
• How can one control the rate of convergence (the generalization ability)
of a learning process?
4. Constructing learning algorithms (e.g. the SVM)
• How can one construct algorithms that can control the generalization
ability?



Change in Scientific Methodology
TRADITIONAL:
 Formulate hypothesis
 Design experiment
 Collect data
 Analyze results
 Review hypothesis
 Repeat/Publish

NEW:
 Design large experiments
 Collect large data
 Put data in large database
 Formulate hypothesis
 Evaluate hypothesis on database
 Run limited experiments
 Review hypothesis
 Repeat/Publish
Learning & Adaptation
 Any method that incorporates information from training
samples in the design of a classifier employs learning.
 Because classification problems are complex, we cannot
guess the best classification decision ahead of time; we need
to learn it.
 Creating classifiers then involves positing some general
form of model, or form of the classifier, and using examples
to learn the complete classifier.



Supervised learning
 In supervised learning, a teacher provides a
category label for each pattern in a training set.
These are then used to train a classifier which can
thereafter solve similar classification problems by
itself.
– Such as Face Recognition, Text Classification, ……



Unsupervised learning
 In unsupervised learning, or clustering, there is no
explicit teacher or training data. The system forms
natural clusters of input patterns and classifies
them based on the clusters they belong to.

– Data Clustering, Data Quantization, Dimensional


Reduction, ……
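Cluster formation on unlabeled data can be illustrated with a minimal k-means sketch (the two Gaussian groups and all parameters here are assumptions for illustration, not part of any particular system):

```python
import numpy as np

rng = np.random.default_rng(3)

# Unlabeled input patterns with two natural groups; no teacher provides
# labels. The group centers (0 and 4) are illustrative assumptions.
X = np.vstack([rng.normal(0.0, 0.5, (40, 2)), rng.normal(4.0, 0.5, (40, 2))])

def kmeans(X, k=2, iters=20):
    """Alternate between assigning points to the nearest center and
    recomputing each center as the mean of its assigned points."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        centers = np.stack([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return labels, centers
```

With well-separated groups, the algorithm recovers the natural clusters from the inputs alone, which is the sense in which the system "forms natural clusters" without a teacher.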



Reinforcement learning
 In reinforcement learning, a teacher tells the
classifier only whether its suggested category for a
pattern is right or wrong; the teacher does not say
what the correct category is.

– Agent, Robot, ……



Classification
 The task of the classifier component is to use the feature
vector provided by the feature extractor to assign the object
to a category.
 Classification is the main topic of this course.
 The abstraction provided by the feature vector
representation of the input data enables the development of
a largely domain-independent theory of classification.
 Essentially the classifier divides the feature space into
regions corresponding to different categories.
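The division of feature space into category regions can be made concrete with a toy linear discriminant (the weights w and offset b below are arbitrary illustrative values, not learned from data):

```python
import numpy as np

# A hypothetical linear discriminant for a 2-D feature space;
# w and b are illustrative values chosen by hand.
w = np.array([1.0, 1.0])
b = -5.0

def S(x):
    """S(x) = w.x + b; the set {x : S(x) = 0} is the decision boundary
    that divides feature space into two regions."""
    return float(np.dot(w, x) + b)

def classify(x):
    # Region S(x) >= 0 maps to category A; region S(x) < 0 to category B.
    return "A" if S(x) >= 0 else "B"
```

Every feature vector falls on one side of the boundary or the other, so the single function S partitions the whole space into category regions.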



Classification
 The degree of difficulty of the classification problem
depends on the variability in the feature values for objects in
the same category relative to the feature value variation
between the categories.
 Variability may be natural or due to noise.
 Variability can be described through statistics leading to
statistical pattern recognition.



Classification
 Question: How to design a classifier that can cope
with the variability in feature values? What is the
best possible performance?
(Figure: objects represented in feature space, with X1 = perimeter and
X2 = area. The decision boundary S(x) = 0 separates Class A, where
S(x) ≥ 0, from Class B, where S(x) < 0. Noise and biological variation
cause class spread, and class overlap causes classification error.)
Examples
 User interfaces: modelling subjectivity and affect,
intelligent agents, transduction (input from camera,
microphone, or fish sensor)
 Recovering visual models: face recognition, model-
based video, avatars
 Dynamical systems: speech recognition, visual
tracking, gesture recognition, virtual instruments
 Probabilistic modeling: image compression, low
bandwidth teleconferencing, texture synthesis
 ……



Course Web
 http://bcmi.sjtu.edu.cn/statLearnig/

 Teaching Assistant:
招浩华<haoh.zhao@sjtu.edu.cn>



Assignment
 To write a report on the topic you are working on,
including:
– Problem definition
– Model and method
– Key issues to be solved
– Outcome

