
Data Science Analytics

Discussion
Anthony J. Scriffignano, Ph.D.
SVP/Chief Data Scientist
11-April, 2017
We need to seriously think
about the implications of
trivial inference from data

The Jetsons image is licensed under Creative Commons


2
Thinking About the Environment
What's changing?
What's new?
When do we have enough?

3
A lifetime journey in data

Reading about events that happened in the past, listening to people who are not present.
Information was created and shared in
Reading about things in the now: can't tell who is communicating with whom. Does anybody really know what is going on?
Courtesy Get Smart (1965-1970)

4
The burning platform
Too much data?
Different objectives?

5
When is enough enough?
DATA IN HAND
DISCOVERABLE DATA
EXISTING BUT INACCESSIBLE DATA

Identify the Scenario Assess Decision Elasticity


Relative size Dispositive Threshold Bias
Key question Estimate Opportunity cost
Triangulate

@Scriffignano1 6
Challenging assumptions: Taking a More Sophisticated View of Big Data

VOLUME
Data sensing
Data that exists but is unavailable, unstructured
More is not necessarily better

VELOCITY
Judging simultaneity of truth
Curation, data at rest vs. data in motion
The myth of real time data

VERACITY
Triangulation, non-regressive methods
Time of curation vs. time of creation
All true data is not necessarily simultaneously true

VARIETY
Entity extraction, discovery
Malfeasance innovation, regulatory
More data is being disregarded than used

VALUE
Disambiguation
Opportunity cost of curation, single use data
Value deteriorates at an alarming rate

7
Problem Formulation Examples
Bringing Data Science into the room
Black cat problems
Emerging relationships
Quantum observation / thinking

8
Emerging Digital Technologies: Behaviors
Implications
Emerging trend: Adoption of Natural Language Processing
Description: Semantic Disambiguation: automated parsing and analysis of unstructured and structured text, and spoken language or, in a more advanced state, across multiple languages
Opportunities: Detecting conversations of interest in the ocean of social chatter. Foundational capability for detecting relationships between entities of interest and their behaviors
Risks: Falling behind bad guys, who may use increasingly sophisticated cloaking techniques in digital communications

Emerging trend: Export of Fintech innovation into other areas
Description: Social decision-making. Blockchain and related technologies: promotes transparency and security of transactions
Opportunities: Supports regulatory efforts, e.g. privacy. Exporting techniques outside of finance
Risks: New malfeasance, especially with alternate digital identity. Dispositive threshold

Emerging trend: Emergence of AI in place of human decision makers
Description: Relegation of decision making to algorithms, e.g. in trading, but potentially in healthcare, transportation, energy, etc.
Opportunities: Significantly faster, cheaper and more efficient outcomes. Drives scale / consistency
Risks: Cybersecurity is even more critical. Economically optimal and socially beneficial? Perpetuating heuristic error
Focusing on smaller parts of the problem space

EPISTEMOLOGY
The John Smith problem
The Ann Taylor problem
The Sybil problem

CHALLENGES
Who is speaking?
About whom?
How do they feel?
In what context?

PROGRESSIVE DECOMPOSITION
Dispositive threshold

Caroline M Smith
302 N Liberty St.
Albion, IA
Addr. Type: Residential

Caroline Smith
University of Iowa
21 E Market St.
Iowa City, IA
Addr. Type: Commercial

Carrie Smith
Tenderheart Daycare
2635 Cleveland Dr.
Adel, IA
Addr. Type: Commercial

Carrie Smith
Monolith Corporation
1716 Locust St.
Des Moines, IA
Addr. Type: Commercial
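
One way to picture the Sybil problem and the role of a dispositive threshold is a small record-matching sketch. The four records are the ones shown on the slide; the nickname table, the similarity measure, and the 0.8 threshold are illustrative assumptions and not the speaker's method.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; a real system would use a curated reference set.
NICKNAMES = {"carrie": "caroline"}

# Records taken from the slide.
RECORDS = [
    {"name": "Caroline M Smith", "city": "Albion, IA", "type": "Residential"},
    {"name": "Caroline Smith", "city": "Iowa City, IA", "type": "Commercial"},
    {"name": "Carrie Smith", "city": "Adel, IA", "type": "Commercial"},
    {"name": "Carrie Smith", "city": "Des Moines, IA", "type": "Commercial"},
]


def normalize(name):
    """Lowercase, drop single-letter initials, and expand known nicknames."""
    tokens = [t.strip(".").lower() for t in name.split()]
    tokens = [t for t in tokens if len(t) > 1]
    return " ".join(NICKNAMES.get(t, t) for t in tokens)


def name_similarity(a, b):
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_pairs(records, threshold=0.8):
    """Return index pairs whose name similarity meets the assumed threshold."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            score = name_similarity(records[i]["name"], records[j]["name"])
            if score >= threshold:
                pairs.append((i, j, round(score, 2)))
    return pairs


if __name__ == "__main__":
    for i, j, score in candidate_pairs(RECORDS):
        print(RECORDS[i]["name"], "<->", RECORDS[j]["name"], score)
```

On names alone every pair is flagged, which is exactly why the name cannot be dispositive by itself: address, address type, and employer still have to be weighed before deciding whether these are one person or several.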

10
Challenging when the prior art needs to evolve.

11
Bricolage: it's a good thing.

Epistemology

THE CHALLENGE
Who is speaking?
About whom?
How do they feel?
In what context?

What would it look like?
The John Smith problem
The Ann Taylor problem
The Sybil problem

Caroline M Smith
302 N Liberty St.
Albion, IA
Addr. Type: Residential

Caroline Smith
University of Iowa
21 E Market St.
Iowa City, IA
Addr. Type: Commercial

Carrie Smith
Tenderheart Daycare
2635 Cleveland Dr.
Adel, IA
Addr. Type: Commercial

Carrie Smith
Monolith Corporation
1716 Locust St.
Des Moines, IA
Addr. Type: Commercial

12
End state vision

Simple mean of sentiment

Weighted mean of sentiment

Standard deviation of sentiment
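
The three measures named above can be sketched in a few lines. The scores and weights below are made-up illustrative values (for example, a weight might stand for a speaker's reach), and the choice of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def sentiment_summary(scores, weights):
    """Return (simple mean, weighted mean, population standard deviation)."""
    simple = mean(scores)
    weighted = sum(s * w for s, w in zip(scores, weights)) / sum(weights)
    spread = pstdev(scores)
    return simple, weighted, spread


# Hypothetical sentiment scores in [-1, 1] and hypothetical weights.
SCORES = [0.5, -0.5, 1.0, 0.0]
WEIGHTS = [1, 3, 2, 4]

if __name__ == "__main__":
    simple, weighted, spread = sentiment_summary(SCORES, WEIGHTS)
    print(f"simple={simple:.3f} weighted={weighted:.3f} stdev={spread:.3f}")
```

With these values the weighted mean falls below the simple mean because the negative score carries more weight, which is the kind of difference the two measures are meant to expose.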

13
New Answers

Are there identifiable modes? Do they change when speaking about their own enterprise?

Is the leadership leading or lagging mean sentiment? How does this change over time? How do leaders influence?
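
A minimal sketch of how the leading-or-lagging question might be tested: correlate a leader's sentiment series against the group mean at several time offsets and see which offset fits best. The series, the Pearson measure, and the lag window are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_lag(leader, group, max_lag=3):
    """Return (lag, correlation); a positive lag means the leader leads."""
    best = None
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = leader[:len(leader) - lag], group[lag:]
        else:
            x, y = leader[-lag:], group[:len(group) + lag]
        r = pearson(x, y)
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Hypothetical data: the group mean echoes the leader two periods later.
LEADER = [0.1, 0.4, -0.2, 0.3, 0.9, -0.5, 0.2, 0.0, 0.6, -0.1]
GROUP = [0.0, 0.0] + LEADER[:-2]

if __name__ == "__main__":
    lag, r = best_lag(LEADER, GROUP)
    print(f"best lag = {lag}, correlation = {r:.3f}")
```

Correlation at a lag shows only that one series tends to move before the other; it does not by itself establish influence.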

14
To avoid getting distracted by all things social, the science involves continuous evolution and focus on specific use cases that drive value

USE CASES
Entity Extraction
Sentiment Attribution
Context / Behavior

CONFOUNDING CHARACTERISTICS
Sarcasm: "ABC corporation is a wonderful company, if you don't do business with them."
Neologism: "Be sure to like us on FaceBook and use #shallow when you Tweet."
Grammar variations: "FBI is Hunting Terrorists With Explosives."
Punctuation / Intrusion of foreign language: "Hi mom!" vs. "Hi, mom?"
Intentional mis-spelling: "RU There?"

DERIVING EMPIRICAL MEASURES THAT INFORM USE CASES
Feedback

D&B proprietary information, do not distribute or copy without permission

15
Watch this space
Inter- and intra-language correlation : deciding when things mean the same thing
Inter- and intra-language transformation : transforming inference among languages
Changing behavior to attract/obviate grapheme analysis : reacting to changing language
Emerging metalanguage (e.g. textspeak) : reacting to language about language
A language of things : reacting to new languages used by automation
Using language to hide language : reacting to attempts to obscure via language
Unicode is not universal : understanding the limitations of automation
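
The last two points can be made concrete with Python's standard `unicodedata` module. Normalization makes some visually equivalent strings compare equal, but it does not unify look-alike letters from different scripts. The sample strings are illustrative choices.

```python
import unicodedata


def canonical(text):
    """NFKC-normalize and case-fold so equivalent encodings compare equal."""
    return unicodedata.normalize("NFKC", text).casefold()


COMPOSED = "caf\u00e9"            # e-acute as a single code point
DECOMPOSED = "cafe\u0301"         # "e" followed by a combining acute accent
FULLWIDTH = "\uff21\uff22\uff23"  # fullwidth Latin letters A, B, C
CYRILLIC_A = "\u0430"             # Cyrillic small letter a, looks like Latin "a"

if __name__ == "__main__":
    print(COMPOSED == DECOMPOSED)                        # raw strings differ
    print(canonical(COMPOSED) == canonical(DECOMPOSED))  # equal once normalized
    print(canonical(FULLWIDTH))                          # folded to plain ASCII
    print(canonical(CYRILLIC_A) == canonical("a"))       # homoglyph survives
```

The final comparison stays false after normalization, which is one way language can be used to hide language from an automated matcher.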

16
Emerging Digital Technologies: Relationships
Implications
Emerging trend: Connected Space
Description: Discovery of Complex Counterparty Relationships: construction of an n-dimensional space where dyadic relationships are established into a connected graph
Opportunities: Discovering relationships between entities of interest and their behavioral patterns
Risks: Changing the environment by measuring it (e.g. fraud) and creating smarter malfeasance. Confirmation bias based on available data

Emerging trend: Large Scale Machine Learning
Description: Intelligence models which rely on computational intelligence: automated learning, classification and storage
Opportunities: Extending discovery of relationships beyond rule-based approaches. New valuable insights, esp. counter-intuitive and paradoxical
Risks: Garbage in, garbage out. Exporting human bias into training sets

Emerging trend: Digital Everything
Description: Scientific concepts become computable as data is digitized in new and more nuanced ways (e.g. digital X-ray, fitness monitors, autonomous vehicles)
Opportunities: Discovery of patterns and solutions that are theoretically invisible
Risks: A powerful weapon in hands of a bad guy. Laws will always lag evolution
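
As a rough illustration of establishing dyadic relationships into a connected graph, the sketch below builds an undirected graph from pairs and groups entities into connected clusters. The entity names and relationships are hypothetical, and a plain depth-first search stands in for whatever graph tooling is actually used.

```python
from collections import defaultdict


def connected_components(pairs):
    """Group entities into clusters linked by dyadic relationships."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components


# Hypothetical dyadic relationships (for example, supplier-to-customer links).
PAIRS = [
    ("Supplier A", "Manufacturer B"),
    ("Manufacturer B", "Distributor C"),
    ("Shell X", "Shell Y"),
]

if __name__ == "__main__":
    for cluster in connected_components(PAIRS):
        print(sorted(cluster))
```

A small cluster with no links to the rest of the graph, like the two shell entities here, is the kind of pattern that may merit a closer look.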
The Black Cat Problem

Dealing with Black Cat problems


Signals
Systemic measures
Anomaly detection
Isotropism
Character / quality measures
Data sensing
Triggers
but also some very ominous

18
Visualizing extremely complex, changing relationships addresses questions never before feasible

Dyadic relationships across multiple perspectives
Observing key measures over time
Blending with similarly constructed graphs

Asking new questions never before feasible
Some examples
Understanding signals derived from changes to business information
Discovering and investigating clusters of unusual behavior
Exploring the impact of new regulation
Applying standard measures to a highly dynamic environment
Exploring the impact of new market forces
Studying the real or potential impact of supply chain interruptions
Investigating emerging capabilities (e.g. reputational risk)

Abstracting dimensions
Events
News
What If scenarios
Internal Data
Social signals
Time
Market Data

19
New Thinking: Relationships

Some examples
Understanding signals derived from changes to business information
Discovering and investigating clusters of unusual behavior
Exploring the impact of new regulation (e.g. privacy)
Understanding intra-regional opportunities (e.g. cross-border)
Exploring the impact of new market forces (e.g. Brexit)
Studying the real or potential impact of supply chain interruptions (e.g. disasters)
Investigating emerging capabilities (e.g. reputational risk)

Diagram labels
Partner 1 Cloud
Customer 1 Cloud
Gov Agency Cloud
Customer 2 Cloud
Partner 2 Cloud
Ink Blot, City 1: DUNS D1.1, D1.2, D1.3, D1.n
Ink Blot, City 2: DUNS D1.1, D1.2, D1.3, D1.n

20
Rhombohedral distortion: is that a thing?

Geofencing is often based on squares / rectangles and/or radius assumptions
Latitude / Longitude assume an abstracted spherical (non-Euclidean) pretext

There are slight differences between the abstraction and the real world
For certain types of analytics, correcting for the distortion yields more accurate results
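
One simple way to see the distortion is to measure a latitude/longitude "square" on the ground. The sketch below uses the haversine formula on a spherical Earth (itself an approximation) to show that a box of equal angular sides is roughly square at the equator but only about half as wide as it is tall at 60 degrees latitude. The box size and sample latitudes are arbitrary choices.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; the spherical model is an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def box_dimensions_km(lat, lon, half_side_deg):
    """Ground width and height of a box spanning +/- half_side_deg."""
    width = haversine_km(lat, lon - half_side_deg, lat, lon + half_side_deg)
    height = haversine_km(lat - half_side_deg, lon, lat + half_side_deg, lon)
    return width, height


if __name__ == "__main__":
    for lat in (0.0, 40.0, 60.0):
        w, h = box_dimensions_km(lat, 0.0, 0.5)
        print(f"lat {lat:4.1f}: width {w:6.1f} km, height {h:6.1f} km")
```

A geofence defined as a fixed number of degrees therefore covers a different ground area depending on where it sits, which is the correction the slide argues for.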

21
Understanding language as a changing phenomenon leads to greater computational capability with regard to changing behavior

TRADITIONAL VIEW
Money laundering
Bust-out
Shell Company

MORE NUANCED VIEW
Corporate Theft
Identify
Trade Rings
Cybersecurity - inside out/outside in
Data sovereignty
Permissible use
Discovering prior behavior vs. emerging behavior in extremely large sets of data

22
Manifesting new relationships in our behavior
Emergency preparedness
Border protection

Data sovereignty
National security: cyber-everything
Data Privacy
Balance of Trade
Cybersecurity
Transferring data across borders

Integrated global value chain


New data assets

Expressed Consent
Compliance with industry standards and best practices
Intellectual Property

23
The Evolving Mindset of Data Science
Where do we need to focus
Future challenges

24
Emerging Digital Technologies: Things
Implications
Emerging trend: Changing Nature of Computing
Description: Neuromorphic computing: Biologically-inspired techniques aiming to mimic human thought. Quantum computing: Non-Boolean physical and algorithmic devices. Computing as a commodity: ubiquitous access to unlimited computational power on a pay-as-you-go model. Analytics / Data Science as a Service: Attempts to deliver capabilities down-market or in a more scalable way
Opportunities: Solutions to previously intractable problems enabled by increased parallelism, and ability to compute on non-binary concepts
Risks: Bad guys get access to unprecedented computing power and techniques. Need to rethink cybersecurity

Emerging trend: Internet of Things
Description: Objects connected on a network, sending and receiving data, distinctions by use (e.g. Industrial)
Opportunities: Gleaning previously unobserved patterns of behaviors and relationships
Risks: Cybersecurity and privacy challenges. Authentication / validation
Quantum, Neuromorphic Computing
Technology to go beyond digital / Boolean computation
Neural Networks: integrated computing models inspired by biological analogs
Quantum Computing: Continuous, non-Boolean computational engines and algorithms
Substitution of Addition/Multiplication/AND & OR with Translation/Rotation & Expectation
Potential applications:
Quicker non-deterministic searches from Quantum Algorithms
Reconstruction of implied data/metadata models
Quantum Pattern Discovery (e.g. anisotropism, neosophism)

26
Quantum Computing
Information is stored in qubits; a qubit can be in a base state (|0>, |1>) or any superposition of these, and several qubits can be entangled, with their joint state living in the tensor product of the individual state spaces.
Each additional qubit doubles the number of basis states, so n qubits span 2^n of them.
Calculations are performed by unitary transformations, essentially rotations (norm-preserving linear maps) of the qubit states.
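
As a concrete illustration of a unitary transformation acting on a single qubit, the sketch below applies the Hadamard gate to |0>, giving an equal superposition, and then applies it again to return to |0>. It is a plain-Python simulation of the arithmetic and says nothing about real quantum hardware; the helper names are made up for this example.

```python
import math


def apply_gate(gate, state):
    """Multiply a 2x2 gate matrix by a 2-element state vector."""
    return [
        gate[0][0] * state[0] + gate[0][1] * state[1],
        gate[1][0] * state[0] + gate[1][1] * state[1],
    ]


def probabilities(state):
    """Measurement probabilities are the squared magnitudes of the amplitudes."""
    return [abs(amplitude) ** 2 for amplitude in state]


S = 1 / math.sqrt(2)
HADAMARD = [[S, S], [S, -S]]  # a standard single-qubit unitary gate
ZERO = [1.0, 0.0]             # the base state |0>

PLUS = apply_gate(HADAMARD, ZERO)  # equal superposition of |0> and |1>
BACK = apply_gate(HADAMARD, PLUS)  # the Hadamard gate is its own inverse

if __name__ == "__main__":
    print("after one gate:", probabilities(PLUS))
    print("after two gates:", probabilities(BACK))
```

The probabilities always sum to one because a unitary transformation preserves the length of the state vector.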
27
Food for Thought

Traditional thinking
Deterministic
Add/change/delete/search
Workflow monotonicity
Value Chain dynamics
Regressive analysis

Quantum thinking
Non-Deterministic approach to
Probabilistic workflow / heuristics
Value Chain order-n understanding
Non-Regressive analysis, especially with new data/new behavior

28
Machine Learning / Learning from Machines

29
What do you have to believe?

Can you vs. may you use information

unlearning

Veracity adjudication

Provenance / decision synthesis

Recreating prior conditions for forensics


Watch this space

31
Reflecting on the Data Journey of Large Multinational Organizations

Technology: From early adoption to integrated value chain
Process: From ingestion to curation
People: From analytic focus to holistic focus
Mindset: From focus on data to focus on meaning

32
The Evolving Leadership Mindset for Data-Inspired Organizations

Initial Focus
Awareness of a skills gap
Recognition of significant risk and opportunity
Some degree of feeling overwhelmed by the new
Silos of evolution, internal competition due to new roles/responsibilities

Emerging Mindset
Focus on evolving the existing workforce and hiring new skills
As landscape rapidly changes, skills required shift from tools to competencies
More focus on governance, provenance, security, malfeasance
More educated customers placing higher demands on the organization

Modern Mindset
Breaking down silos of data and innovation
Ability to measure and respond in a truly agile fashion
Reflective leaders who constantly re-evaluate and constantly learn
Analytical, data-based decisions evolve through new methods, learning

33
Reflecting on the Journey -- Things to Consider
Too Much Data?
Start with a problem or question, not a tool or dataset
Understand the going-in assumptions
Continuously evaluate how the environment is changing

New Types of Data
Select methods (especially non-regressive) carefully
Use new methods and visualizations for a reason, not for an expedient
Be aware of Black Cat problems

The Burning Platform
Continuously evaluate new skills and capabilities
Continuously evaluate new ways of knowing, breaking down problems into smaller pieces, reducing complexity
The decision to do nothing is a decision to sink further behind

34
Some examples of new methods to consider:
Evaluation Epistemologies

Heuristic Evaluation: computer algorithms based on empirical observation and data gathering to watch how a group of similarly instructed, similarly incented people perform a task.
Prescriptive Analytics: computer algorithms used in processing real-time transactional data to anticipate or refine the understanding of an event.
Identity fraud: understanding instances where an individual or organization is acting in the stead of another in order to realize financial or other gains.
Insurance claims fraud: understanding instances where individuals or organizations misstate the time, detail, or other specifics of an event or condition in order to make claims against an insurance instrument.
Progressive Decomposition: systematically and empirically reducing problems with high-order complexity into two or more lower-order complexity problems recursively.
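
The recursive shape of progressive decomposition can be pictured with a textbook divide-and-conquer routine. Merge sort is used here only as a familiar stand-in: the problem is split into two smaller problems of the same kind, each is solved recursively, and the partial answers are combined. It is not one of the methods described in the talk.

```python
def merge_sort(items):
    """Sort a list by recursively splitting it and merging the sorted halves."""
    if len(items) <= 1:  # a problem this small needs no further decomposition
        return list(items)

    mid = len(items) // 2
    left = merge_sort(items[:mid])   # first lower-order sub-problem
    right = merge_sort(items[mid:])  # second lower-order sub-problem

    # Combine the two partial solutions into one.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


if __name__ == "__main__":
    print(merge_sort([5, 2, 9, 1, 5, 6]))
```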

35
Some examples of new methods to consider:
Unstructured Data

Thematic coding: qualitative method for synthesizing text information in an empirically consistent way to derive emergent themes.
Entity Extraction: A treatment for data with unclear or non-existent ontologies whereby discoverable attributes can be sub-aggregated from the corpus of data.
Semantic Disambiguation: Transforming data which has unclear or ambiguous content into dispositive results for a particular use.
Ortholinguistic Disambiguation: Transforming data which has mixed writing systems, spellings, or other representations into a common representation.
Recursive Discovery: A learning process for ingesting data (typically from the web) while managing the adjudication of truth, retention of provenance, and permissible use.
Data Sensing: A collection of methods to react to data in motion to ensure ingestion when appropriate.
Geospatial Inference: Resolving data or metadata to a physical context (typically Cartesian)

36
Some examples of new methods to consider:
Non-Regressive Data Science Methods

Neural Networks (do not need to define target and predictor variables as required in regression models)
Machine Learning
Binary Classification Model: primarily concerned with dyadic relationships
Multiclass Classification Model: focused on complex interconnectedness and dependencies
Deep Learning (a branch of Machine Learning moving machine learning closer to its original goal: artificial intelligence for text mining, speech and image recognition, etc.)
Leading Deep Learning software tools (Theano, Torch, Pylearn2, Blocks, Caffe, etc.): focus on recursive processes to systematically refine automated inference
Deep Belief Networks (a generative graphical model, composed of multiple layers of latent variables; some applications for acoustic modeling and visual data classification): focus on adjudication of veracity
Quantum algorithms: model the behavior of not yet readily available multi-state machines that relax the binary concepts of 0 and 1. These algorithms use vast amounts of memory to simulate quantum machines; the result is a narrow set of capabilities that approach simultaneity in decision-making.
Cognitive Computing: learning methods focused on curation and synthesis, leaving the conclusions to human agents.

37
Innovating for good

Inspecting the flood footprint
Looking at known and derived data
Making data computable
Using existing and derived data
Assessing multiple hypotheses
Strict empirical process
Modular, reusable tools
Training
Detecting cars
Detecting non-cars
Detecting roads
Detecting radiation
Spatial inference

38
Uncovering Truth and Meaning from Data
Totally New Questions and Challenges

39
Thank You
Anthony Scriffignano, Ph.D., SVP / Chief Data Scientist
scriffignanoa@dnb.com

@SCRIFFIGNANO1
