
Oracle Data Mining for Text, Clustering, and Classification:

Case Study of a Recommendation Engine


Mark Hornick, Senior Manager, Development, mark.hornick@oracle.com
Pablo Tamayo, Consulting MTS, pablo.tamayo@oracle.com
Data Mining Technologies Group
Copyright © 2009 Oracle Corporation
Introduction

Recommendation Engine at Oracle OpenWorld Conference, 2008 and 2009
Recommend conference sessions to attendees

Enhance session enrollment application

Use Oracle Data Mining and Oracle Data Miner UI


K-means, Naïve Bayes, Text Mining, Code Generation



Agenda

Recommendation engine scenario


Overview
Technical problem and data
Methodology for OOW '08 and '09
Evaluating recommendation quality
New features for OOW '09
Demonstration
OOW '08 results and summary



High Level Objectives

Help attendees find relevant sessions

Maximize individual OOW experience / value

Increase session attendance



Technical Objectives and Constraints

Recommend 2009 sessions before any history exists of who registered for any 2009 sessions
Use no session ratings data from attendees
Recommend sessions by relative preference
Recommend exhibitors and demos for attendees
Identify top N related sessions to a given session

Use an automated data mining-based solution



Approach

Deduction
Query refinement
Users specify what they want to retrieve

Induction
Model-based recommendation engine
Recommend sessions most relevant to attendee profile
Improve likelihood of finding sessions of interest

…enhance Schedule Builder tool with Oracle Data Mining-generated session recommendations



Enrollment Application – Schedule Builder
Oracle Data Mining

Automatically sifts through data to find hidden patterns, discover new insights, and make predictions

Wide range of capabilities


Predict customer behavior (Classification)
Predict or estimate a value (Regression)
Group similar documents (Clustering and Text Mining)
Identify factors that determine an outcome (Attribute Importance)
Find profiles of targeted people or items (Decision Trees)
Determine important relationships and “market baskets” (Associations)
Extract higher-level text features (Feature Extraction)
Find fraud or “rare events” (Anomaly Detection)
…and others
Oracle Data Miner user interface supporting guided analytics



Approach – 30,000 ft.

Build: 2008 data (sessions, attendees, attendance) -> Build -> Model
Apply: 2009 data (sessions, attendees) -> Apply -> Ranked session recommendations for each attendee
Trigger: a new attendee registers and completes the survey

Approach – 30,000 ft. (continued)

Attendee logs into Schedule Builder
Ranked sessions are retrieved from the ranked session recommendations for attendees
2009 session recommendations are filtered by user criteria

Success Metrics

Conversion rate
% attendees who used at least 1 recommendation
Enrollment vs. actual attendance

Test Metrics
Enrichment curve
Global measure of merit



Conference Session
Recommendation Problem

Sessions are single use


No two are exactly alike conference to conference
Sessions have no history and no future
Don't know who will attend a given session until after the session
No rating information available, attendance only
Infer preferences using higher level projections
Session themes
Attendee profiles



Conference Data
OOW '08

Sessions (1850+)
Title, abstract, track(s)

Attendees (41700+)
Survey questions, position, product usage

Attendance (206700+)
Who attended which sessions



Attendee Interests
from OOW '08 registration survey

Applications
Fusion; Agile; BEA; EBS; Hyperion; Primavera; PeopleSoft; Siebel; JD Edwards; On Demand; App Integration Architecture; Development and Management; Strategy

Applications: Product Area
Customer Relationship Management; Governance, Risk, and Compliance; Master Data Management; Fulfillment (order management / logistics); Supply Chain Management / Planning; Human Capital Management; Procurement; Project Management; Business Intelligence; Product Lifecycle Management; Asset Lifecycle Management; Enterprise Performance Management; Financial Management

Technology
Business Intelligence; Security; SOA, BPM, Web Services, App Server; Content Management, Collaboration, Web 2.0; Predictive Analytics, Data Mining; Database; Enterprise Management; Identity Management; Warehousing; Performance / Scalability, GRID / RAC; High Availability; Middleware

Technology: Development
.Net; Database; Java; Fusion Development; Service-Oriented Architecture; Tools Development and Management

Oracle Services
Oracle Consulting; Oracle Support; Oracle University; Oracle Linux Support; Oracle Advanced Customer Services; Oracle On Demand

Industry
Automotive; Chemicals; Communications; Consumer Goods; Education and Research; Engineering, Construction and Real Estate; Financial Services; Healthcare; High Tech; Industrial Manufacturing; Life Sciences; Media and Entertainment; Natural Resources; Oil and Gas; Professional Services; Public Sector; Retail; Travel and Transportation

…and others

Data Preparation

Sessions
Concatenate relevant columns to facilitate text mining
Attendance
Remove duplicates
Attendees
Synonyms in attribute values, e.g., state = OH and Ohio
Incomplete data, e.g., region = null
Multi-valued attributes requiring parsing, e.g., user group memberships separated by ';' or '/'
Map data columns between 2008 and 2009, e.g., Advanced Customer Services split between Apps and Tech
Free-form columns, e.g., job title = Vice President, V.P., VP



Free Form Fields
Job Title Example

create table ATTENDEE09_PREP as
-- a.* is a placeholder: the slide elides the rest of the select list
select a.*,
  -- one 0/1 indicator column per keyword found in the free-form job title
  case when a.job_title like '%Manager%' then 1 else 0 end job_title_manager,
  -- 'V.P.' sets both the president and the vice indicators
  case when a.job_title like '%President%'
         or a.job_title like '%V.P.%' then 1 else 0 end job_title_president,
  case when a.job_title like '%Vice%'
         or a.job_title like '%V.P.%' then 1 else 0 end job_title_vice
from ATTENDEE09 a
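
The same keyword-flag idea can be sketched outside the database. The Python below is a minimal illustration only, not the code used for the conference: the pattern table and the function name job_title_flags are hypothetical, and unlike the SQL LIKE patterns the matching here ignores case and also accepts "VP" without periods.

```python
import re

# Hypothetical keyword patterns per indicator column (an assumption for
# illustration; the slide only shows Manager, President, Vice and V.P.).
VP_PATTERN = r"\bv\.?p\.?(?!\w)"  # matches "VP", "V.P" and "V.P."
TITLE_PATTERNS = {
    "job_title_manager": [r"manager"],
    "job_title_president": [r"president", VP_PATTERN],
    "job_title_vice": [r"vice", VP_PATTERN],
}


def job_title_flags(job_title):
    """Return a dict of 0/1 indicators for a free-form job title."""
    title = job_title or ""  # treat a missing title as empty text
    return {
        flag: int(any(re.search(p, title, re.IGNORECASE) for p in patterns))
        for flag, patterns in TITLE_PATTERNS.items()
    }


if __name__ == "__main__":
    for title in ["Vice President", "V.P.", "VP", "Senior Manager"]:
        print(title, job_title_flags(title))
```

Mapping all three spellings of the title to the same pair of indicators is what lets a downstream model treat them as one value.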



Methodology

1. Cluster the 2008 sessions to obtain 2008 session clusters (themes)
2. Build a classification model on 2008 attendees to predict clusters for attendees, then score attendees for each cluster
3. Score new 2009 sessions and new 2009 attendees to obtain a cluster scores vector for each
4. Vector multiply each attendee's cluster scores against each session's cluster scores for a total order ranking of recommendations (example ranked session scores: .86, .73, .66)

Model Building and Scoring Details

Cluster sessions
Concatenate all session-related text
Text Mining data preparation – create text index
Lexer with stemming
Custom “stopword” list



Session S291749
Title: Integrating Oracle Accounts Payable with Oracle Imaging and Process Management
Track Type: TECHNOLOGY; Content Management, Collaboration and Web 2.0; Content Management, Collaboration and Web 2.0

Abstract: In this session, learn how to integrate Oracle Imaging and Process Management with your Oracle Financials Accounts Payable system by utilizing Oracle Imaging and Process Management and Oracle BPEL Process Manager. See how a paperless, Web-based solution was developed to automate the processing of invoices.

1. Perform stemming (example)
Stems annotated on the slide: Integrating -> integrate, Accounts -> account, utilizing -> utilize, developed -> develop, processing -> process, invoices -> invoice

2. Remove stopwords
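
The two preparation steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the stem table is limited to the word pairs shown in the slide (a real lexer derives stems algorithmically), the stoplist is a hypothetical sample, and the function name preprocess is made up for this sketch.

```python
import re

# Stem pairs taken from the slide's example; anything not listed is left as-is.
STEMS = {
    "integrating": "integrate",
    "accounts": "account",
    "utilizing": "utilize",
    "developed": "develop",
    "processing": "process",
    "invoices": "invoice",
}

# Hypothetical stoplist for illustration only.
STOPWORDS = {"oracle", "your", "with", "and", "the", "of", "to", "a", "in"}


def preprocess(text):
    """Lowercase and tokenize, then (1) stem and (2) drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stemmed = [STEMS.get(token, token) for token in tokens]
    return [token for token in stemmed if token not in STOPWORDS]


if __name__ == "__main__":
    title = ("Integrating Oracle Accounts Payable with "
             "Oracle Imaging and Process Management")
    print(preprocess(title))
```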
Creating a Text Index, Stoplist, Lexer
Using Oracle Text

-- Create the lexer preference and the stoplist first; the index refers to both.
begin
  ctx_ddl.create_preference('oow_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('oow_lexer', 'index_stems', 'ENGLISH');
  ctx_ddl.set_attribute('oow_lexer', 'index_text', 'true');

  ctx_ddl.create_stoplist('oow_stoplist', 'BASIC_STOPLIST');
  ctx_ddl.add_stopword('oow_stoplist', 'your'); /*…*/
  ctx_ddl.add_stopword('oow_stoplist', 'oracle');
end;
/

-- Build the text index on the concatenated session text.
CREATE INDEX session09_txt_idx
  ON session09_txt (session_txt)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER OOW_LEXER STOPLIST OOW_STOPLIST');



Session Term Scores Example

Integrate .23
Account .04
Payable .26
Imaging .62
Process .09
Management .05
Technology .17
Content .08
Collaboration .43



TF-IDF
(term-frequency – inverse document frequency)

A statistical measure that evaluates the importance of a given word to a document in a corpus

Word importance increases proportionally to the number of times the word appears in the document, but is offset by the frequency of the word in the corpus



TF-IDF Example
One way to compute

Consider
A session, S1, with title and abstract containing 100 words
Word 'mining' appears 6 times in S1
Term frequency (TF) for 'mining' in S1 is 6/100, or 0.06

Of 1850 sessions, say 25 contain the word 'mining'
Inverse document frequency (IDF) is calculated as ln(1850 / 25) = 4.3

TF-IDF score for 'mining' in S1 is 0.06 * 4.3, or 0.26
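
The arithmetic of this example can be checked with a few lines of Python. This sketch follows the "one way to compute" variant given here (raw term frequency times the natural log of the inverse document frequency); the function name tf_idf is an assumption for illustration, and other TF-IDF variants would give different numbers.

```python
import math


def tf_idf(term_count, doc_length, n_docs, n_docs_with_term):
    """TF-IDF as in the example: (count / length) * ln(N / df)."""
    tf = term_count / doc_length
    idf = math.log(n_docs / n_docs_with_term)
    return tf * idf


if __name__ == "__main__":
    # 'mining' appears 6 times in a 100-word session; 25 of 1850 sessions contain it.
    idf = math.log(1850 / 25)
    score = tf_idf(6, 100, 1850, 25)
    print(f"IDF = {idf:.1f}, TF-IDF = {score:.2f}")
```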



Session Term Scores Example

Integrate .23
Account .04
Payable .26
Imaging .62
Process .09
Management .05
Technology .17
Content .08
Collaboration .43

Specify the maximum number of terms to represent the entire corpus
Specify the maximum number of terms to represent the document



Model Building and Scoring Details

Cluster sessions
Concatenate all session-related text
Text Mining data prep – create text index
Lexer with stemming
Custom stop word list
1000 max terms in corpus
30 max terms per document
Build k-Means model with 20 clusters (themes)
Score 2008 and 2009 sessions to identify theme probabilities



Clustering Results for 2008 Sessions

Theme (Cluster Name) ClusterID Count


INTELLIGENCE-HYPERION-ESSBASE-BUSINESS-PERFORMANCE 18 103
DEVELOP-JAVA-DEVELOPMENT-DATABASE-EDITION 19 94
CONTENT-2.0-COLLABORATION-WEB-MANAGEMENT 20 82
PLM-AGILE-PRODUCT-CONTACT-CENTER 23 53
SIEBEL-UTILITIES-CRM-CUSTOMER-INDUSTRIES 24 127
INDUSTRIES-SERVICES-PUBLIC-SECTOR-MANUFACTURING 25 148
DATABASE-11G-DATA-TECHNOLOGY-FEATURES 26 112
RAC-DATABASE-MANAGER-GRID-AVAILABILITY 27 92
ANALYTIC-INTELLIGENCE-APPLICATIONS-ANALYTICAL-BUSINESS 28 66
CHAIN-SUPPLY-PLANNING-FULFILLMENT-SUITE 29 77
CAPITAL-PEOPLESOFT-MANAGEMENT-TALENT-RELATIONSHIP 30 125
HYPERION-FINANCIAL-PERFORMANCE-9-PLANNING 31 62
SOA-BPM-SERVER-APPLICATION-FUSION 32 121
MEETING-SIG-IOUG-DATABASE-APPLICATION 33 33
EDWARDS-JD-ENTERPRISEONE-WORLD-A9.1 34 95
JD-EDWARDS-ENTERPRISEONE-QUEST-OOW 35 52
TOOLS-PEOPLESOFT-APPLICATIONS-PEOPLETOOLS-INTEGRATION 36 76
SECURITY-COMPLIANCE-RISK-GOVERNANCE-IDENTITY 37 80
12-SUITE-RELEASE-BUSINESS-PROCUREMENT 38 80
OAUG-SIG-SUITE-TRANSPORTATION-USERS 39 69
Model Building and Scoring Details

Classify attendee interests in themes


Build Naïve Bayes model using 2008 attendees
Predict 2009 attendee interest in each of the 20 themes

New 2009 Attendees



“Joe the DBA”

Joe's attribute values
DB_REL_ODB_10G 1
DEV_EN_TEXT_EDITOR 1
DEV_EN_VI 1
GEOGRAPHIC_REGION Americas
INDUSTRY Aerospace
ORACLE_PARTNER Yes
JOB_TITLE_DBA 1
JOB_TITLE_SENIOR 1

Predict themes (clusters) for “Joe”

Theme (Cluster Name) ClusterID Probability

INTELLIGENCE-HYPERION-ESSBASE-BUSINESS-PERFORMANCE 18 0.0005
DEVELOP-JAVA-DEVELOPMENT-DATABASE-EDITION 19 0.3997
CONTENT-2.0-COLLABORATION-WEB-MANAGEMENT 20 0.0002
PLM-AGILE-PRODUCT-CONTACT-CENTER 23 0.0005
SIEBEL-UTILITIES-CRM-CUSTOMER-INDUSTRIES 24 0.0005
INDUSTRIES-SERVICES-PUBLIC-SECTOR-MANUFACTURING 25 0.2190
DATABASE-11G-DATA-TECHNOLOGY-FEATURES 26 0.4245
RAC-DATABASE-MANAGER-GRID-AVAILABILITY 27 0.3010
ANALYTIC-INTELLIGENCE-APPLICATIONS-ANALYTICAL-BUSINESS 28 0.0502
CHAIN-SUPPLY-PLANNING-FULFILLMENT-SUITE 29 0.0009
CAPITAL-PEOPLESOFT-MANAGEMENT-TALENT-RELATIONSHIP 30 0.0098
HYPERION-FINANCIAL-PERFORMANCE-9-PLANNING 31 0.0031
SOA-BPM-SERVER-APPLICATION-FUSION 32 0.0000
MEETING-SIG-IOUG-DATABASE-APPLICATION 33 0.0038
EDWARDS-JD-ENTERPRISEONE-WORLD-A9.1 34 0.0031
JD-EDWARDS-ENTERPRISEONE-QUEST-OOW 35 0.0260
TOOLS-PEOPLESOFT-APPLICATIONS-PEOPLETOOLS-INTEGRATION 36 0.0188
SECURITY-COMPLIANCE-RISK-GOVERNANCE-IDENTITY 37 0.0278
12-SUITE-RELEASE-BUSINESS-PROCUREMENT 38 0.0075
OAUG-SIG-SUITE-TRANSPORTATION-USERS 39 0.0994

Attendee Attributes
ATTEND_ID, COMPANY_REVENUE, GEOGRAPHIC_REGION, INDUSTRY, ORACLE_PARTNER, PROFIT_MAGAZINE_SUBSCRIPTION
DB_REL_ODB_10G, DB_REL_ODB_8I, DB_REL_ODB_9I
DEV_EN_11G_PREVIEW, DEV_EN_BORLAND_JBUILDER, DEV_EN_ECLIPSE, DEV_EN_MS_DOT_NET, DEV_EN_MS_VISUAL_STUDIO, DEV_EN_ORA_APPS_EXPRES, DEV_EN_ORA_FORMS, DEV_EN_ORA_JDEV_10G, DEV_EN_ORA_SQL_DEV, DEV_EN_OTHER, DEV_EN_OTHER_JAVA_IDE, DEV_EN_SQL_EDITORS, DEV_EN_TEXT_EDITOR, DEV_EN_TOAD, DEV_EN_VI
ORA_EBS, ORA_JDE, ORA_PS, ORA_SIEBEL
UG_MEM_APOUC, UG_MEM_EOUC, UG_MEM_HEUG, UG_MEM_IOUG, UG_MEM_OAUG, UG_MEM_ODTUG, UG_MEM_OHUG, UG_MEM_QIUG
UG_INFO_APOUC, UG_INFO_EOUC, UG_INFO_HEUG, UG_INFO_IOUG, UG_INFO_OAUG, UG_INFO_ODTUG, UG_INFO_OHUG, UG_INFO_QIUG, UG_INFO_DO_NOT_SEND_ORA_INFO
JOB_TITLE_MANAGER, JOB_TITLE_PARTNER, JOB_TITLE_PROJECT_LEAD, JOB_TITLE_MARKETING, JOB_TITLE_PRESIDENT, JOB_TITLE_VICE, JOB_TITLE_DIRECTOR, JOB_TITLE_ARCHITECT, JOB_TITLE_ANALYST, JOB_TITLE_DBA, JOB_TITLE_DEVELOPER, JOB_TITLE_SALES, JOB_TITLE_PROD_MGR, JOB_TITLE_CHIEF_OFFICER, JOB_TITLE_CONSULTANT, JOB_TITLE_SENIOR, JOB_TITLE_STUDENT

How Does This Session Rank for Joe?

Title: Integrating Oracle Accounts Payable with Oracle Imaging and Process Management
Track Type: TECHNOLOGY; Content Management, Collaboration and Web 2.0; Content Management, Collaboration and Web 2.0

Abstract: In this session, learn how to integrate Oracle Imaging and Process Management with your Oracle Financials Accounts Payable system by utilizing Oracle Imaging and Process Management and Oracle BPEL Process Manager. See how a paperless, Web-based solution was developed to automate the processing of invoices.

Cluster Probabilities for Session S291749

Theme (Cluster Name) ClusterID Probability


INTELLIGENCE-HYPERION-ESSBASE-BUSINESS-PERFORMANCE 18 0.0023
DEVELOP-JAVA-DEVELOPMENT-DATABASE-EDITION 19 0.0021
CONTENT-2.0-COLLABORATION-WEB-MANAGEMENT 20 0.9534
PLM-AGILE-PRODUCT-CONTACT-CENTER 23 0.0020
SIEBEL-UTILITIES-CRM-CUSTOMER-INDUSTRIES 24 0.0020
INDUSTRIES-SERVICES-PUBLIC-SECTOR-MANUFACTURING 25 0.0027
DATABASE-11G-DATA-TECHNOLOGY-FEATURES 26 0.0018
RAC-DATABASE-MANAGER-GRID-AVAILABILITY 27 0.0032
ANALYTIC-INTELLIGENCE-APPLICATIONS-ANALYTICAL-BUSINESS 28 0.0018
CHAIN-SUPPLY-PLANNING-FULFILLMENT-SUITE 29 0.0022
CAPITAL-PEOPLESOFT-MANAGEMENT-TALENT-RELATIONSHIP 30 0.0026
HYPERION-FINANCIAL-PERFORMANCE-9-PLANNING 31 0.0049
SOA-BPM-SERVER-APPLICATION-FUSION 32 0.0037
MEETING-SIG-IOUG-DATABASE-APPLICATION 33 0.0015
EDWARDS-JD-ENTERPRISEONE-WORLD-A9.1 34 0.0016
JD-EDWARDS-ENTERPRISEONE-QUEST-OOW 35 0.0016
TOOLS-PEOPLESOFT-APPLICATIONS-PEOPLETOOLS-INTEGRATION 36 0.0027
SECURITY-COMPLIANCE-RISK-GOVERNANCE-IDENTITY 37 0.0022
12-SUITE-RELEASE-BUSINESS-PROCUREMENT 38 0.0037
OAUG-SIG-SUITE-TRANSPORTATION-USERS 39 0.0019
Computing this Session's Score
Specifically for Joe…

Theme (Cluster Name) ClusterID [Joe's Cluster Probability] x [Session S291749 Cluster Probability] = Product
INTELLIGENCE-HYPERION-ESSBASE-BUSINESS-PERFORMANCE 18 0.0005 x 0.0023 = 0.000001
DEVELOP-JAVA-DEVELOPMENT-DATABASE-EDITION 19 0.3997 x 0.0021 = 0.000848
CONTENT-2.0-COLLABORATION-WEB-MANAGEMENT 20 0.0002 x 0.9534 = 0.000216
PLM-AGILE-PRODUCT-CONTACT-CENTER 23 0.0005 x 0.0020 = 0.000001
SIEBEL-UTILITIES-CRM-CUSTOMER-INDUSTRIES 24 0.0005 x 0.0020 = 0.000001
INDUSTRIES-SERVICES-PUBLIC-SECTOR-MANUFACTURING 25 0.2190 x 0.0027 = 0.000587
DATABASE-11G-DATA-TECHNOLOGY-FEATURES 26 0.4245 x 0.0018 = 0.000780
RAC-DATABASE-MANAGER-GRID-AVAILABILITY 27 0.3010 x 0.0032 = 0.000960
ANALYTIC-INTELLIGENCE-APPLICATIONS-ANALYTICAL-BUSINESS 28 0.0502 x 0.0018 = 0.000088
CHAIN-SUPPLY-PLANNING-FULFILLMENT-SUITE 29 0.0009 x 0.0022 = 0.000002
CAPITAL-PEOPLESOFT-MANAGEMENT-TALENT-RELATIONSHIP 30 0.0098 x 0.0026 = 0.000025
HYPERION-FINANCIAL-PERFORMANCE-9-PLANNING 31 0.0031 x 0.0049 = 0.000015
SOA-BPM-SERVER-APPLICATION-FUSION 32 0.0000 x 0.0037 = 0.000000
MEETING-SIG-IOUG-DATABASE-APPLICATION 33 0.0038 x 0.0015 = 0.000006
EDWARDS-JD-ENTERPRISEONE-WORLD-A9.1 34 0.0031 x 0.0016 = 0.000005
JD-EDWARDS-ENTERPRISEONE-QUEST-OOW 35 0.0260 x 0.0016 = 0.000041
TOOLS-PEOPLESOFT-APPLICATIONS-PEOPLETOOLS-INTEGRATION 36 0.0188 x 0.0027 = 0.000051
SECURITY-COMPLIANCE-RISK-GOVERNANCE-IDENTITY 37 0.0278 x 0.0022 = 0.000062
12-SUITE-RELEASE-BUSINESS-PROCUREMENT 38 0.0075 x 0.0037 = 0.000028
OAUG-SIG-SUITE-TRANSPORTATION-USERS 39 0.0994 x 0.0019 = 0.000191
SCORE: 0.003908
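
The score is a dot product of the two probability vectors. The Python sketch below recomputes it from the four-decimal probabilities listed in the table; the variable and function names are assumptions for illustration. Because the inputs are rounded, the result comes out at about 0.00387 rather than exactly 0.003908, which suggests the slide multiplied unrounded probabilities.

```python
# Cluster ID -> probability, copied from the table (rounded to 4 decimals).
JOE = {
    18: 0.0005, 19: 0.3997, 20: 0.0002, 23: 0.0005, 24: 0.0005,
    25: 0.2190, 26: 0.4245, 27: 0.3010, 28: 0.0502, 29: 0.0009,
    30: 0.0098, 31: 0.0031, 32: 0.0000, 33: 0.0038, 34: 0.0031,
    35: 0.0260, 36: 0.0188, 37: 0.0278, 38: 0.0075, 39: 0.0994,
}
SESSION_S291749 = {
    18: 0.0023, 19: 0.0021, 20: 0.9534, 23: 0.0020, 24: 0.0020,
    25: 0.0027, 26: 0.0018, 27: 0.0032, 28: 0.0018, 29: 0.0022,
    30: 0.0026, 31: 0.0049, 32: 0.0037, 33: 0.0015, 34: 0.0016,
    35: 0.0016, 36: 0.0027, 37: 0.0022, 38: 0.0037, 39: 0.0019,
}


def recommendation_score(attendee_probs, session_probs):
    """Sum of per-cluster products over the clusters both vectors share."""
    shared = attendee_probs.keys() & session_probs.keys()
    return sum(attendee_probs[c] * session_probs[c] for c in shared)


if __name__ == "__main__":
    print(f"{recommendation_score(JOE, SESSION_S291749):.6f}")
```

Ranking every session by this score for a given attendee yields that attendee's ordered recommendation list.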
Recommendation Score Query

select attend_id, session_id, score
from (
  -- dot product of attendee and session cluster probabilities
  select a.attend_id, s.session_id,
         sum(a.probability * s.probability) score
  from SESSION_TXT09_SCORES_T20 s,
       ATTENDEE09_SCORES_T20 a
  where a.prediction = s.cluster_id
  group by a.attend_id, s.session_id
)
order by attend_id, score desc



Evaluating Recommendations
Producing Training (Build) and Test Datasets

The '08 session data and the '08 attendee data are each split into Build and Test partitions
Build the models using the Build datasets
Test the models using the Test datasets

The splits define three recommendation spaces:
Cross-sell / Up-sell Space: recommend new sessions to same attendees
Projection Mining Space: recommend new sessions to new attendees
Typical space for recommendations: recommend same sessions to new attendees

Evaluating Results:
Session Recommendation Curve

[Figure: model scores as a function of rank. Each dot is a scored session. Annotations mark the threshold separating high from low confidence recommendations and the linear behavior of recommendations. Tick marks represent the location of “hits” (attendee attended session).]

Enrichment Curve
Running calculation where enrichment is the maximum deviation from 0

[Figure: enrichment score by recommendation rank, with the point of maximum enrichment marked. Tick marks represent the location of “hits”.]
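
The slides do not give the formula behind the running calculation, so the Python below is only one plausible reading, stated as an assumption: a running sum over the model-ranked sessions that steps up at each hit and down at each miss (weighted so the sum returns to 0 at the end), with the enrichment taken as the value of largest absolute deviation from 0. It does not reproduce the normalized enrichment (NE) values reported for individual attendees.

```python
def enrichment_curve(ranked_hits):
    """Running sum over ranked sessions: +1/n_hits at a hit, -1/n_misses at a miss.

    ranked_hits is a list of booleans in model-rank order (best first),
    True where the attendee actually attended the session.
    """
    n_hits = sum(1 for hit in ranked_hits if hit)
    n_misses = len(ranked_hits) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("need at least one hit and one miss")
    running, curve = 0.0, []
    for hit in ranked_hits:
        running += 1.0 / n_hits if hit else -1.0 / n_misses
        curve.append(running)
    return curve


def enrichment_score(ranked_hits):
    """Value of the curve at its maximum absolute deviation from 0."""
    return max(enrichment_curve(ranked_hits), key=abs)


if __name__ == "__main__":
    hits_ranked_first = [True, True, False, False, False, False]
    print(enrichment_curve(hits_ranked_first))
    print(enrichment_score(hits_ranked_first))
```

Under this reading, a model that ranks attended sessions near the top produces a curve that rises early, while hits scattered at random keep the curve close to 0.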
Example attendees
[Figure panels for each attendee: model score plotted against model-ranked sessions.]
Attendee W1152645: NE = 2.88, Lift = 3.07, ROC = 0.79
Attendee W1144260: NE = 1.63, Lift = 2.47, ROC = 0.71
Attendee W1134872: NE = 1.07, Lift = 1.55, ROC = 0.51


Global Measure of Merit
Random recommendations obtain an enrichment score of 1

[Figure: distribution P(NE) of Normalized Enrichment (NE) for the PM Model and for a Random Model.]



Recommending Exhibitors and Demos

Use clustering model from session data


Score exhibitors and demo text against 20 themes
Use existing attendee theme scores to compute
recommendation scores for each exhibitor and demo

[Diagram: new 2009 attendees matched to 2009 exhibitors and demos]

Computing Related Sessions

Data preparation
Focus on tracks, tags, categories
Tokenize targeted terms from title and abstract fields
E.g., “Oracle Data Mining” -> “OracleDataMining”

Cluster sessions into 200 clusters using K-Means

Multiply cluster score vectors for similarity score
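
A small Python sketch of these steps follows. It is illustrative only: the three-cluster score vectors are made-up data standing in for the 200-cluster scores, and the helper names join_phrase and related_sessions are hypothetical.

```python
def join_phrase(text, phrase):
    """Collapse a targeted multi-word term into one token."""
    return text.replace(phrase, phrase.replace(" ", ""))


def related_sessions(target_id, cluster_scores, top_n=3):
    """Rank other sessions by the dot product of cluster score vectors."""
    target = cluster_scores[target_id]
    similarities = [
        (session_id, sum(t * v for t, v in zip(target, vector)))
        for session_id, vector in cluster_scores.items()
        if session_id != target_id
    ]
    similarities.sort(key=lambda pair: pair[1], reverse=True)
    return similarities[:top_n]


if __name__ == "__main__":
    print(join_phrase("Oracle Data Mining overview", "Oracle Data Mining"))
    # Hypothetical cluster probabilities for three sessions and three clusters.
    scores = {
        "S1": [0.90, 0.05, 0.05],
        "S2": [0.80, 0.10, 0.10],
        "S3": [0.05, 0.90, 0.05],
    }
    print(related_sessions("S1", scores))
```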



Computing Related Sessions

1. Cluster the 2009 sessions into 2009 themes (200 clusters)
2. Score each session against each theme (cluster)
3. Vector multiply each session's cluster scores against all other sessions' cluster scores for a total order ranking of related sessions (example ranked related session scores: .95, .81, .67)



OOW '08 Recommendation Engine Results

Distinct Schedule Builder visitors: 15667
Distinct visitors who signed up for a recommended session: 3266
Distinct visitors who attended a recommended session: 1775

Signup conversion rate: 20.8% (3266 / 15667)
Attended conversion rate: 11.3% (1775 / 15667)

Conversion rate: percentage of attendees who used at least 1 recommendation
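
The conversion rates are plain ratios of the visitor counts above and can be recomputed with a short Python sketch; the function name conversion_rate is an assumption for illustration.

```python
def conversion_rate(converted, visitors):
    """Percentage of visitors who used at least one recommendation."""
    return 100.0 * converted / visitors


if __name__ == "__main__":
    visitors = 15667
    print(f"Signup:   {conversion_rate(3266, visitors):.1f}%")
    print(f"Attended: {conversion_rate(1775, visitors):.1f}%")
```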



Conversion Rates in other Domains

[Chart: conversion rates in other domains (circa 2004) alongside OOW signup sessions (20.8%) and OOW attended sessions (11.3%)]
OOW '08 Recommendation Engine Results
Detail

Recommendations selected (signup)
1768 attendees (11.3%) selected exactly 1
820 (5.2%) selected 2 recommendations
678 attendees (4.3%) selected 3 or more
32 attendees selected between 8 and 10

Actually attended
1246 attendees (8%) attended exactly 1
382 (2.4%) attended 2 recommended sessions
147 attendees (0.9%) attended 3 or more
23 attendees attended between 5 and 9

[Chart: Recommendations: Selected vs. Attended. Selected count and attended count for the categories "Exactly 1", "Exactly 2", and "More than 3"; vertical axis from 0 to 2000.]



Summary

Oracle Data Mining provides a robust platform for Text Mining and building a Recommendation Engine

Oracle Data Mining with Oracle Data Miner code generation facilitated deployment of the mining solution

Recommendation evaluation techniques show the models were able to predict sessions of interest

OOW conversion rates show that session recommendations were perceived as useful by attendees



For More Information

Search for "Oracle Data Mining" on search.oracle.com or oracle.com

www.oracle.com/technology/products/bi/odm/index.html

The preceding is intended to outline our general
product direction. It is intended for information
purposes only, and may not be incorporated into any
contract. It is not a commitment to deliver any
material, code, or functionality, and should not be
relied upon in making purchasing decisions.
The development, release, and timing of any
features or functionality described for Oracle's
products remains at the sole discretion of Oracle.
