
Hands-On Machine Learning

for Finance Professionals

Sandeep Khurana
Introduction to Machine Learning

What is machine learning? A brief history

What are the different machine learning models?

What do they do? When to use them?

How to identify features and how to tune parameters?

Machine learning case study using Azure ML Studio



Agenda

1. Motivation for ML
2. ML algorithms
   1. Classification
   2. Regression
3. Introduction to Azure ML Studio
4. Environment setup
   1. Data Preparation
   2. Modeling of the Problem
5. Introduction to Kaggle
6. Hands-on case study
   • Credit Risk classification


Sec 1: Introduction to Machine Learning
Intelligence
Human Intelligence (Core Processes)* | Artificial Intelligence**
Sensing and Perception | Sensing and interpreting
Hearing | Speech2Text
Speech | Text2Speech
Vision | Computer Vision
Language | NLP
Temperature, Texture etc. (Touch) | Haptics
Knowledge Representation | Knowledge Representation
Memory | Memory
Learning | Machine Learning
Reasoning/Calculation/Probabilistic Assessment and Judgment | Calculator/Programming/Decision trees/Bayesian
Control of Action | Robotics
Motor nerves | Motion control

*Cognitive Neuroscience (Gazzaniga, Ivry, Mangun)   **Artificial Intelligence: A Modern Approach (Russell and Norvig)
DE-CONSTRUCTING INTELLIGENCE

NATURAL LANGUAGE PROCESSING: to enable it to communicate successfully in English
KNOWLEDGE REPRESENTATION: to store what it knows or hears
AUTOMATED REASONING: to use the stored information to answer questions and to draw new conclusions
MACHINE LEARNING: to adapt to new circumstances and to detect and extrapolate patterns
COMPUTER VISION: to perceive objects
ROBOTICS: to manipulate objects and move about

Source: Artificial Intelligence: A Modern Approach (Russell and Norvig)
MACHINE LEARNING
Learning

Learning | Humans | Machines
Training | What is right and what is wrong; what works or does not work | Classified (labeled) text/images
Training | The more "solved examples" the better | Larger learning dataset > better accuracy
Testing | Solve unseen problems to assess learning | Test algorithm on unseen data
Decompose and Aggregate Text/Image | Apply past associations between components of image/text to the current image/text | Apply past associations between components of image/text to the current image/text
Matching/Classification | Recall from memory the closest association to the instance at hand | Recall from memory the closest association to the instance at hand
Matching/Classification | More clues to memory increase recall ability | More features increase accuracy

Learning Dimensions in Machine Learning

• Intuition: Common sense
• Theory: Math
• Execution: Programming
• Application: Domain knowledge
• Interpretation: Algorithm
Machine Learning

• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
Supervised Learning

• Classification
• Regression
• Dimensionality Reduction
• Association Rule
• Time Series Analysis
• Anomaly Detection
• Prediction
Classification Algorithms

• Naïve Bayes
• Logistic Regression
• Decision Trees
• Random Forest
• SVM
• K-NN
• Neural Networks
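As a quick illustration (not part of the original deck, which works in Azure ML Studio), the algorithms listed above can all be instantiated in a few lines of scikit-learn; the synthetic dataset below is a hypothetical stand-in for real credit data.

```python
# Illustrative sketch: the classification algorithms listed above, run on a
# synthetic dataset. The data is a made-up stand-in, not real credit data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "SVM": SVC(),
    "K-NN": KNeighborsClassifier(),
    "Neural Network": MLPClassifier(max_iter=1000),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```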
Unsupervised Learning

• Learning "what normally happens"
• No output
• Clustering: grouping similar instances
• Other applications: summarization, association analysis
• Example applications:
  • Customer segmentation in CRM
  • Image compression: color quantization
Reinforcement Learning

Topics:
• Policies: what actions should an agent take in a particular situation
• Utility estimation: how good is a state (used by the policy)
• No supervised output, but delayed reward
• Credit assignment problem (what was responsible for the outcome)
• Applications:
  • Game playing
  • Robot in a maze
  • Multiple agents, partial observability, ...
Classification

Example: Credit scoring

• Differentiating between low-risk and high-risk customers from their income and savings

Discriminant:
  IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
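A minimal sketch of the discriminant rule above; the threshold values for θ1 and θ2 used here are made-up placeholders that a classifier would normally learn from labeled data.

```python
# Minimal sketch of the rule-based discriminant above.
# The thresholds theta1 and theta2 are illustrative placeholders; in practice
# a classifier learns them from labeled credit data.
def credit_risk(income: float, savings: float,
                theta1: float = 50_000, theta2: float = 10_000) -> str:
    """IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"

print(credit_risk(income=80_000, savings=25_000))  # low-risk
print(credit_risk(income=30_000, savings=25_000))  # high-risk
```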
Classification: Applications

Binary/Categorical/Discrete classification: many problems can be structured as classification problems.

• Binary recursion
• Medical diagnosis: from symptoms to illnesses
• Predict voting: from variables on Facebook usage (Democrats vs Republicans)
• Web advertising: predict whether a user clicks on an ad on the Internet
• Finance/Accounting: Go/No-Go decision modeling (purchases, market entry, credit default)
Prediction: Regression

Example: Price of a used car

• x: car attributes
• y: price
• Model: y = g(x | θ), where g(·) is the model and θ are its parameters
• Linear model: y = wx + w0
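A minimal sketch (not from the deck) of fitting the linear model y = wx + w0 by ordinary least squares; the car-age and price figures below are invented for illustration.

```python
# Illustrative sketch: fit y = w*x + w0 for used-car price vs a single
# attribute (car age) with ordinary least squares. The data is made up.
import numpy as np

x = np.array([1, 2, 3, 5, 7, 9], dtype=float)          # car age in years (hypothetical)
y = np.array([18, 16, 14.5, 11, 8.5, 6], dtype=float)  # price in thousands (hypothetical)

# Solve for [w, w0] in y = w*x + w0 via least squares
A = np.column_stack([x, np.ones_like(x)])
(w, w0), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"y = {w:.2f} * x + {w0:.2f}")
print("predicted price of a 4-year-old car:", w * 4 + w0)
```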
Approaches

Terminology: Machine Learning

• Confusion Matrix
• ROC curve
• N-fold cross-validation
• Monte Carlo simulation
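The evaluation terms above can be made concrete with a short scikit-learn sketch (an illustration added here; the deck itself works in Azure ML Studio).

```python
# Illustrative sketch: confusion matrix, ROC AUC and N-fold cross-validation
# computed with scikit-learn on a synthetic binary classification problem.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import confusion_matrix, roc_auc_score

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Confusion matrix: rows = actual class, columns = predicted class
print(confusion_matrix(y_test, clf.predict(X_test)))

# Area under the ROC curve, from predicted probabilities
print("ROC AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))

# N-fold (here 5-fold) cross-validation accuracy
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```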
Sec 2: Azure Machine Learning Studio

Azure ML modeling

Steps:
1. Create account
2. Azure ML Studio setup
3. Build model
4. Deploy
Azure Machine Learning Studio
Common sense principles in ML problem formulation

Goal
• Often not clear or clearly articulated
• The "y"

Law of Minimum Force
• Occam's Razor
• Tools are means to an end: fancy algorithms don't impress, accurate results do
• Accuracy-simplicity tradeoff

Haste makes Waste
• Don't rush in on 'any' data
• Seek the data you need, not just the data you have
• Study underlying variable-target correlations
• Study summary statistics

Representativeness of Training Data
Formulating ML Problems: Data

Need for pre-processing
• Data format

Proxy variables
• e.g. clickstream data
• e.g. rental values for socio-economic classification

Calculated variables
• e.g. difference vs ratio

Features
• Use features that generalize across contexts
• e.g. industry-standard financial ratios (see the sketch below)
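A small pandas sketch of the calculated-variable and ratio-feature idea above; the column names and numbers are hypothetical.

```python
# Illustrative sketch: building calculated/ratio features that generalize
# across contexts, e.g. industry-standard financial ratios.
# Column names and values below are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "total_debt":   [120.0, 80.0, 300.0],
    "total_equity": [200.0, 50.0, 150.0],
    "net_income":   [30.0, 5.0, -10.0],
    "total_assets": [500.0, 140.0, 600.0],
})

# Ratios are often more comparable across firms of different sizes
# than raw differences or absolute levels.
df["debt_to_equity"]   = df["total_debt"] / df["total_equity"]
df["return_on_assets"] = df["net_income"] / df["total_assets"]

print(df[["debt_to_equity", "return_on_assets"]])
```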
Formulating ML Problems: Data (contd.)

How about forgetting some data?
• Unlearn rows
• Unlearn columns

Feature processing (see the sketch below)
• Missing data
• Interaction terms
• Transformations
• Domain-specific features
• Variable-specific features

Updating data sources and the model

Labeled data
• Train using data for all labels
• …and labels must be accurate. GIGO
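A small sketch of the feature-processing steps named above (missing-data imputation, an interaction term, a transformation); the columns and the median-imputation choice are illustrative assumptions, not a prescription.

```python
# Illustrative sketch of common feature-processing steps: imputing missing
# data, adding an interaction term, and a log transformation.
# Column names and values are hypothetical.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "income":  [50_000, np.nan, 80_000, 30_000],
    "savings": [10_000, 5_000, np.nan, 2_000],
})

# Missing data: fill with the column median (one simple strategy of many)
for col in ["income", "savings"]:
    df[col] = df[col].fillna(df[col].median())

# Interaction term: product of two variables
df["income_x_savings"] = df["income"] * df["savings"]

# Transformation: log scale for a skewed monetary variable
df["log_income"] = np.log1p(df["income"])

print(df)
```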
Creating value from data

Experiment
• The first model may not be the best
• Threshold decisions
• Multiple small modeling decisions
• Iterations

ML solves not just the same problem but similar problems
• Extracting the essence of a decision vs filtering

Model choice

Model integration
• Ensemble (see the sketch below)
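A minimal sketch of model integration via an ensemble, here a majority-vote combination in scikit-learn; the choice of base models and the synthetic data are arbitrary illustrations.

```python
# Illustrative sketch: integrating several models into a simple ensemble
# with a majority-vote classifier in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=12, random_state=1)

ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(random_state=1)),
    ("nb", GaussianNB()),
])

print("ensemble 5-fold CV accuracy:",
      cross_val_score(ensemble, X, y, cv=5).mean())
```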
Sec 3: Kaggle
Separate presentation
Sec 4: Support Vector Machines
Classification Tasks

• Learning Task
  - Given: Expression profiles of leukemia patients and healthy persons.
  - Compute: A model distinguishing if a person has leukemia from expression data.

• Classification Task
  - Given: Expression profile of a new patient + a learned model.
  - Determine: If the patient has leukemia or not.

Tennis Example

[Figure: scatter plot of Temperature vs Humidity, with points labeled "play tennis" and "do not play tennis"]
Introduction: Linear Separators

Binary classification can be viewed as the task of separating classes in feature space:

  wᵀx + b = 0
  wᵀx + b > 0
  wᵀx + b < 0

  f(x) = sign(wᵀx + b)
Linear Separators

Which of the linear separators is optimal?

• All hyperplanes in Rᵈ are parameterized by a vector w and a constant b.
• They can be expressed as w·x + b = 0 (remember the equation for a hyperplane from algebra!).
• Our aim is to find such a hyperplane, f(x) = sign(w·x + b), that correctly classifies our data.
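A minimal sketch of the decision rule f(x) = sign(w·x + b); the weight vector and bias below are arbitrary illustrative values rather than quantities learned from data.

```python
# Minimal sketch of the linear decision rule f(x) = sign(w.x + b).
# The weights and bias are arbitrary illustrative values, not learned ones.
import numpy as np

w = np.array([2.0, -1.0])   # hypothetical weight vector
b = -0.5                    # hypothetical bias

def f(x: np.ndarray) -> int:
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return int(np.sign(np.dot(w, x) + b))

print(f(np.array([1.0, 0.5])))   # +1 side
print(f(np.array([0.0, 2.0])))   # -1 side
```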
Selection of a Good Hyper-Plane

Objective: select a 'good' hyperplane using only the data.

Intuition (assuming linear separability):
(i) Separate the data
(ii) Place the hyperplane 'far' from the data
Maximizing the margin

Recall: the distance from a point (x₀, y₀) to the line Ax + By + c = 0 is

  |Ax₀ + By₀ + c| / √(A² + B²)

The distance from an example xᵢ to the separator is

  r = |wᵀxᵢ + b| / ‖w‖
Classification Margin

• Examples closest to the hyperplane are support vectors.
• The margin ρ of the separator is the distance between support vectors.
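For reference, the standard result behind these slides (not spelled out above): scaling w and b so that the support vectors satisfy yᵢ(wᵀxᵢ + b) = 1 gives the margin and the max-margin optimization problem

```latex
\rho = \frac{2}{\lVert w \rVert},
\qquad
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad\text{subject to}\quad
y_i\,(w^{\top} x_i + b) \ge 1 \quad \forall i .
```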
Maximum Margin Classification

• Maximizing the margin is good according to intuition.
• It implies that only support vectors matter; other training examples are ignorable.
Soft Margin Classification

• What if the training set is not linearly separable?
• Slack variables ξᵢ can be added to allow misclassification of difficult or noisy examples; the resulting margin is called soft.
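A small scikit-learn sketch of the soft margin: the parameter C controls how much slack (misclassification) is tolerated; the dataset and C values below are illustrative assumptions.

```python
# Illustrative sketch: a soft-margin linear SVM in scikit-learn.
# C trades margin width against slack (tolerance of noisy examples).
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=2, n_redundant=0,
                           flip_y=0.1, random_state=2)  # some label noise

for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {clf.n_support_.sum()} support vectors, "
          f"training accuracy {clf.score(X, y):.3f}")
```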
Non-linear SVMs

• Datasets that are linearly separable with some noise work out great.
• But what are we going to do if the dataset is just too hard?
• How about mapping the data to a higher-dimensional space?

[Figures: 1-D examples on the x axis; in the last panel the data are mapped to the (x, x²) plane]

Non-linear SVMs: Feature spaces

General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is separable:

  Φ: x → φ(x)
Nonlinear SVM - Overview

• SVM locates a separating hyperplane in the feature space and classifies points in that space.
• It does not need to represent the space explicitly; it simply defines a kernel function.
• The kernel function plays the role of the dot product in the feature space.
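A small sketch of the kernel trick in scikit-learn: on concentric-circle data (a stand-in example, not from the deck) a linear kernel fails while an RBF kernel separates the classes without ever constructing the higher-dimensional space explicitly.

```python
# Illustrative sketch: a non-linear SVM via the kernel trick.
# The RBF kernel implicitly computes dot products in a higher-dimensional
# feature space without representing that space explicitly.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: impossible to separate with a straight line in 2-D
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=3)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)

print("linear kernel training accuracy:", round(linear.score(X, y), 3))
print("RBF kernel training accuracy:   ", round(rbf.score(X, y), 3))
```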
Properties of SVM

• Flexibility in choosing a similarity function
• Sparseness of solution when dealing with large data sets: only support vectors are used to specify the separating hyperplane
• Ability to handle large feature spaces: complexity does not depend on the dimensionality of the feature space
• Overfitting can be controlled by the soft margin approach
• Nice math property: a simple convex optimization problem which is guaranteed to converge to a single global solution
• Feature selection
Weakness of SVM

• It is sensitive to noise: a relatively small number of mislabeled examples can dramatically decrease the performance.
• It only considers two classes. How to do multi-class classification with SVM?

  Answer:
  1) With output arity m, learn m SVMs:
     SVM 1 learns "Output == 1" vs "Output != 1"
     SVM 2 learns "Output == 2" vs "Output != 2"
     ...
     SVM m learns "Output == m" vs "Output != m"
  2) To predict the output for a new input, predict with each SVM and find out which one puts the prediction the furthest into the positive region.
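The one-vs-rest scheme described above is available directly in scikit-learn; the sketch below (an added illustration, not the deck's Azure ML workflow) trains one linear SVM per class and predicts the class whose SVM scores the point furthest into the positive region.

```python
# Illustrative sketch: the one-vs-rest scheme described above.
# OneVsRestClassifier trains one SVM per class and predicts the class whose
# SVM gives the largest decision-function value for the new point.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # 3 classes, so m = 3 SVMs are learned

ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

print("number of binary SVMs:", len(ovr.estimators_))
print("decision scores for one sample:", ovr.decision_function(X[:1]))
print("predicted class:", ovr.predict(X[:1]))
```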
Sec 5: Case Study
Hands-on exercise on Azure Machine Learning Studio

Overfitting
Thanks!
Contact us:

Sandeep Khurana
Founder
Quant-Leap Consulting
Hyderabad

QLCLLP@gmail.com
www.quantleapconsulting.com
