
Welcome

To

Faculty Development Programme


on
Recent Trends in Machine Learning

Sachin Subhash Patil


Q.I.P. Ph.D. Scholar, CSE Dept., WCE

Under the Guidance of


Dr. Shefali Pratap Sonavane
Assoc. Prof., IT Dept., WCE

Topics of Discussion

• Title and its Significance


• Challenges
• Research Contributions:
o Over Sampling Techniques
o Safe-Level based Synthetic Sample (SSS)
o Lowest versus Highest (LVH)
o Addressing Data Characteristics
• Outcomes
Title of Session

Handling of Imbalanced Big Data Sets Classification Using Enriched Over Sampling Techniques

[Figure: diagram with labels "I.H.T.", "Imbalanced Data Set", "B.D.S.", "Over Sampling Techniques", "Classifier", "Imprecise Classification", "Precise Classification"]
Imbalanced Data Sets
Significance

• Class Imbalance problem


• Classifiers ignore the minority instances while
forming rule sets
• Representation of boundaries within class
structures
• Skewed data partition
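The effect of a classifier ignoring the minority class can be illustrated with a small sketch. The 95:5 toy labels and the helper names `imbalance_ratio` and `recall` are assumptions made for illustration, not part of the session material.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def recall(y_true, y_pred, positive):
    """Fraction of actual `positive` examples that were predicted as `positive`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return hits / total

# Toy labels (assumed): 95 majority examples (0) and 5 minority examples (1).
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
ir = imbalance_ratio(y_true)
minority_recall = recall(y_true, y_pred, positive=1)
```

Here the accuracy looks high while no minority example is recognised, which is why accuracy alone is a poor measure on skewed data.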

Significance

• Numerous real-world applications are affected:
- Software defect detection
- Threat supervision
- Medical judgment
- Web authorization

• Misclassifying rare classes can result in heavy costs
Common Approaches

• At Data Level: Re-Sampling


- Oversampling
- Undersampling
- Active Sampling

• At Algorithmic Level:
- Adjusting the misclassification costs
- Adjusting the decision threshold at the
tree leaf
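As a minimal sketch of the data-level route, plain random oversampling duplicates minority examples until the classes are balanced. This is the simplest baseline only, not the SSS or LVH techniques named in the outline; the function name `random_oversample` and the toy data are assumptions.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen examples of each smaller class up to the majority size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # All original examples of this class form the pool to copy from.
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out

# Toy data (assumed): 8 majority examples and 2 minority examples.
X = [[i, i + 1] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_oversample(X, y)
```

Because the copies are exact duplicates, this adds no new information about the minority class, which is one motivation for synthetic-sample approaches.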
Quiz-1
• Which of these is not a type of imbalance scenario?
a) 95:5 b) 80:20
c) 75:25 d) 60:40
Ans.: b
• Data-level techniques? (multiple answers possible)
a) Cost Sensitive b) Under Sampling
c) Over Sampling d) Classifier based
Ans.: b and c
• Which is a real-world example of an extremely imbalanced data set?
a) Fraud detection b) Birth rate ratio
Ans.: a
Research Challenges

1. Analysing the structure of classes


2. Extreme class imbalance
3. Classifier’s output adjustment
4. Multi-class imbalanced classification
5. Multi-class Classifiers
6. Multi-instance imbalanced classification
Research Challenges

7. Regression in imbalanced scenarios


8. Semi-supervised and unsupervised
learning from imbalanced data
9. Learning from imbalanced data streams
10. Imbalanced Big Data
Research Challenges

1. Analysing the structure of classes:


o Predefined group based on neighbourhood:
- Safe
- Borderline
- Rare
- Outliers
o Incorporating the background knowledge about
objects into the training procedure of classifiers
o Selecting difficult samples to concentrate

Research Challenges

o Justifying the role of noisy/outlier samples


o Adaptive methods adjusting the size of analysed
neighborhood according to local densities
- K-NN strongly assumes a uniform distribution of data

2. Extreme class imbalance:


o Decomposing the problem into sub-problems, each
characterized by a reduced imbalance ratio
o Methods to reconstruct a potential class structure

Research Challenges

3. Classifier’s output adjustment:


o Analysing the characteristics of each classified
example
4. Multi-class imbalanced classification:
o Class overlapping with more than two groups
o Unclearly defined borders
o Change in difficulty of each sample w.r.t. different
classes
5. Multi-class Classifiers:
o Classification without decomposition/ resampling
(using algorithm-level solutions)
Research Challenges

o Enhanced distance-based classifiers and density-based
methods (e.g., Hellinger distance in decision trees)
o Exploring local competencies of classifiers and
creating sectional decision areas
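For the Hellinger distance mentioned under item 5, a minimal sketch of its use as a split score in a two-class decision tree is given below. For each branch j it compares the fraction of positives and the fraction of negatives sent to that branch: d_H = sqrt( sum_j ( sqrt(pos_j / pos_total) - sqrt(neg_j / neg_total) )^2 ). The function name and the toy counts are assumptions.

```python
import math

def hellinger_split_value(pos_counts, neg_counts):
    """Hellinger distance between the class-conditional branch distributions of a split.

    pos_counts[j] and neg_counts[j] are the numbers of positive and negative
    examples sent to branch j.
    """
    n_pos = sum(pos_counts)
    n_neg = sum(neg_counts)
    return math.sqrt(
        sum(
            (math.sqrt(p / n_pos) - math.sqrt(n / n_neg)) ** 2
            for p, n in zip(pos_counts, neg_counts)
        )
    )

# Toy counts (assumed): 10 positives versus 1000 negatives.
perfect = hellinger_split_value([10, 0], [0, 1000])    # classes fully separated
useless = hellinger_split_value([5, 5], [500, 500])    # both classes split evenly
```

The score depends only on within-class proportions, not on the class priors, so a large majority class does not dominate the choice of split.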
6. Multi-instance imbalanced classification:
o Labelling the bags of objects and handling bags
- A bag's label does not imply that the bag consists
only of objects from that class
o Global schemes for tackling between and within
class imbalance
Research Challenges

o New measures for assessing the quality of training
bags and selecting the most useful ones
7. Regression in imbalanced scenarios:
o A branch of ML yet to be explored from the
imbalance perspective
o Developing more flexible cost-sensitive regression
solutions
o Adapting penalty as per the degree of importance
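One simple way to adapt the penalty to the degree of importance is an importance-weighted squared error. This is a sketch of the general idea only; the function name `weighted_mse`, the weights, and the toy targets are assumptions.

```python
def weighted_mse(y_true, y_pred, weights):
    """Squared error in which each example's penalty is scaled by its importance weight."""
    total = sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    return total / sum(weights)

# Toy targets (assumed): three common values and one rare, important value.
y_true = [1.0, 1.0, 1.0, 10.0]
y_pred = [1.0, 1.0, 1.0, 6.0]   # the model misses only the rare value

plain = weighted_mse(y_true, y_pred, [1, 1, 1, 1])
cost_sensitive = weighted_mse(y_true, y_pred, [1, 1, 1, 5])
```

With uniform weights the error on the rare target is diluted by the well-predicted common targets; raising its weight makes the same miss cost more.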
8. Semi-supervised/unsupervised learning:
o Clustering imbalanced data with various perspectives:

Research Challenges

- Process of group discovery on its own, or
- Method for reducing the complexity of the problem, or
- Solution to analysis of the minority class structure
o New indexes to measure how well the discovered groups
reflect the actual skewed distributions
o Novel unsupervised methods for assessing the
distributions and potential difficulty of unlabeled objects
o Active learning strategies to identify the most difficult
objects affecting the learned decision boundaries

Research Challenges

9. Learning from imbalanced data streams:


o Adaptive methods for skewed real time objects
o Changes as the stream progresses (imbalance ratio, class status)
o New class emergence and/or fading of the old
ones
o Active learning methods reducing the cost of
supervision
o Algorithms to extract drift templates
(reappearing sources)
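The point that the imbalance ratio and class roles can change as a stream progresses can be sketched with a sliding window over recent labels. The class name `WindowedClassStats`, the window size, and the toy stream are assumptions made for illustration.

```python
from collections import Counter, deque

class WindowedClassStats:
    """Track class counts over the most recent `window_size` labels of a stream."""

    def __init__(self, window_size):
        # A bounded deque drops the oldest label once the window is full.
        self.window = deque(maxlen=window_size)

    def update(self, label):
        self.window.append(label)

    def majority(self):
        return Counter(self.window).most_common(1)[0][0]

    def imbalance_ratio(self):
        counts = Counter(self.window)
        if len(counts) < 2:
            return None  # only one class seen in the current window
        return max(counts.values()) / min(counts.values())

# Toy stream (assumed): class 0 dominates the first half, class 1 the second.
stream = ([0] * 9 + [1]) * 2 + ([1] * 9 + [0]) * 2
stats = WindowedClassStats(window_size=20)
snapshots = []
for t, label in enumerate(stream, start=1):
    stats.update(label)
    if t % 20 == 0:
        snapshots.append((t, stats.majority(), stats.imbalance_ratio()))
```

In this toy stream the ratio stays the same while the minority and majority classes swap roles, so a learner that fixes the minority class up front would be adapting to the wrong class.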

Research Challenges

10. Imbalanced Big Data:


o Big imbalanced data types such as graphs, tensors,
video sequences, XML structures, hyperspectral
images, associations, etc. (e.g., social networks or
computer vision)
o Designing both preprocessing and direct learning
algorithms
o Handling heterogeneous and atypical data
(Spark, Hadoop)

Research Challenges

o Global-scale data partitioning methods (supervising the
over sampling (O.S.) process)
o Interpretable classifiers handling massive and skewed data
o When dealing with imbalanced big data we face one of
two possible scenarios:
i. When majority class is massive and minority class is of a
small sample size
- Directly related to the problem of extreme imbalance
ii. When imbalance is present but representatives from
both classes are abundant

Research Challenges
- Need of an in-depth analysis of the structure of minority
class and its examples
- Analyse the appearance of new types of examples or
changes in properties of already described types
- To address complex scenarios requiring local analysis of
each difficult region and their individual solutions

Quiz-2
• Can a sample overlap with more than two classes?
a) false b) true
Ans.: b
• Does the skewness of streaming data change as the stream progresses?
a) May be b) May not
c) Yes d) No
Ans.: a and b (mostly)
• Multi-instance imbalanced classification relates to:
a) Multi-classifier b) Labelling the bags of objects
c) Both d) None of these
Ans.: b
Research Contributions
