
Recommender Systems & Embeddings

Charles Ollion - Olivier Grisel


Outline
Recommender Systems

Embeddings

Architecture and Regularization



Recommender Systems


Recommender Systems

Recommend content and products

Movies on Netflix and YouTube, weekly playlists and related artists
on Spotify, books on Amazon, related apps on app stores, "Who to
Follow" on Twitter...

Prioritized social media status updates

Personalized search engine results

Personalized ads and RTB (real-time bidding)


RecSys 101

Content-based vs Collaborative Filtering (CF)

Content-based: user metadata (gender, age, location...) and item
metadata (year, genre, director, actors...)

Collaborative Filtering: past user/item interactions: stars, plays,
likes, clicks

Hybrid systems: CF + metadata to mitigate the cold-start problem


Explicit vs Implicit Feedback

Explicit: positive and negative feedback

Examples: review stars and votes
Regression metrics: Mean Squared Error, Mean Absolute Error...

Implicit: positive feedback only

Examples: page views, plays, comments...
Ranking metrics: ROC AUC, precision at rank, NDCG...


Explicit vs Implicit Feedback

Implicit feedback is much more abundant than explicit feedback

Explicit feedback does not always reflect actual user behavior:
a self-declared independent movie enthusiast may watch mostly
blockbusters

Implicit feedback can be negative:
a page view with a very short dwell time, a click on the "next" button

The distribution of implicit (and explicit) feedback is impacted by
UI/UX changes and by the deployment of the RecSys itself

Matrix Factorization for CF

L(U, V) = Σ_{(i,j) ∈ D} ||r_{i,j} − u_i^T v_j||_2^2 + λ (||U||_2^2 + ||V||_2^2)

where r_{i,j} is the observed rating of user i for item j, u_i the factor
vector of user i, and v_j the factor vector of item j

Train U and V on the observed ratings r_{i,j}

Use U^T V to find the missing entries of the sparse rating matrix R
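A rough Keras sketch of this model (the sizes n_users, n_items, d and the
regularization strength are assumptions, not values from the slides):

from tensorflow.keras.layers import Input, Embedding, Flatten, Dot
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2

n_users, n_items, d = 1000, 500, 32        # hypothetical sizes

user = Input(shape=(1,), dtype="int32")
item = Input(shape=(1,), dtype="int32")
# U and V as L2-regularized embedding tables
u = Flatten()(Embedding(n_users, d, embeddings_regularizer=l2(1e-6))(user))
v = Flatten()(Embedding(n_items, d, embeddings_regularizer=l2(1e-6))(item))
rating = Dot(axes=1)([u, v])               # predicted r_ij = u_i . v_j

model = Model(inputs=[user, item], outputs=rating)
model.compile(optimizer="adam", loss="mse")  # squared error on observed ratings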

Embeddings


Symbolic variables

Text: characters, words, bigrams...

Recommender Systems: item ids, user ids

Any categorical descriptor: tags, movie genres, visited
URLs, skills on a resume, product categories...

Notation: symbol s in vocabulary V


One-hot representation

onehot('salad') = [0, 0, 1, ..., 0] ∈ {0, 1}^|V|

[Figure: one-hot vector with a single 1 at the index of 'salad' in the
vocabulary: pig, carrot, salad, door, roof, cake]

Sparse, discrete, large dimension |V|

Each axis has a meaning

Symbols are equidistant from each other:
euclidean distance between any two distinct symbols = √2
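A minimal numpy sketch of this representation, reusing the toy vocabulary
from the figure:

import numpy as np

vocabulary = ["pig", "carrot", "salad", "door", "roof", "cake"]
index = {word: i for i, word in enumerate(vocabulary)}

def onehot(word):
    # sparse, discrete vector of dimension |V| with a single 1
    v = np.zeros(len(vocabulary))
    v[index[word]] = 1.0
    return v

# any two distinct symbols are at euclidean distance sqrt(2)
print(np.linalg.norm(onehot("salad") - onehot("cake")))  # 1.4142...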


Embedding

embedding('salad') = [3.28, −0.45, ..., 7.11] ∈ ℝ^d

Continuous and dense

Can represent a huge vocabulary in low dimension, typically:
d ∈ {16, 32, ..., 4096}

Axes have no meaning a priori

Embedding metric can capture semantic distance

Neural Networks compute transformations on continuous vectors


Implementation with Keras

Size of vocabulary n = |V|, size of embedding d

from tensorflow.keras.layers import Embedding

# input: batch of integers
Embedding(output_dim=d, input_dim=n, input_length=1)
# output: batch of float vectors

Equivalent to one-hot encoding multiplied by a weight matrix W ∈ ℝ^{n×d}:

embedding(x) = onehot(x) ⋅ W
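A quick numeric check of this equivalence (a sketch assuming TensorFlow's
bundled Keras):

import numpy as np
from tensorflow.keras.layers import Embedding

n, d = 6, 3
layer = Embedding(input_dim=n, output_dim=d)

x = np.array([2])                      # integer id of one symbol
emb = layer(x).numpy()[0]              # embedding lookup, shape (d,)
W = layer.get_weights()[0]             # weight matrix, shape (n, d)

onehot = np.eye(n)[2]                  # one-hot row vector
assert np.allclose(onehot @ W, emb)    # onehot(x) . W == embedding(x)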


Distance and similarity in Embedding space

Euclidean distance:

d(x, y) = ||x − y||_2

Simple, with good properties
Dependent on the norm (embeddings are usually unconstrained)

Cosine similarity:

cosine(x, y) = x ⋅ y / (||x|| ⋅ ||y||)

Angle between points, regardless of norm
cosine(x, y) ∈ [−1, 1]
Expected cosine similarity of random pairs of vectors is 0
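Both quantities in a few lines of numpy (the vectors are made up):

import numpy as np

x = np.array([3.28, -0.45, 7.11])
y = np.array([1.02, 0.75, 6.30])

euclidean = np.linalg.norm(x - y)                         # depends on the norms
cosine = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))  # in [-1, 1]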


Distance and similarity in Embedding space

If x and y both have unit norms:

||x − y||_2^2 = 2 ⋅ (1 − cosine(x, y))

or equivalently:

cosine(x, y) = 1 − ||x − y||_2^2 / 2

In practice, the (unnormalized) dot product is also used as a
pseudo-similarity
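The identity follows from expanding the squared norm:

$$\|x - y\|_2^2 = \|x\|_2^2 + \|y\|_2^2 - 2\, x \cdot y = 2 - 2 \cdot cosine(x, y)$$

since x ⋅ y = cosine(x, y) when ||x|| = ||y|| = 1.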


Visualizing Embeddings

Visualizing requires a projection in 2 or 3 dimensions
Objective: visualize which embedded symbols are similar

PCA

Limited by linear projection; embeddings usually have a complex,
high-dimensional structure

t-SNE

Visualizing Data using t-SNE, L. van der Maaten and G. Hinton, Journal of
Machine Learning Research, 2008


t-Distributed Stochastic Neighbor Embedding

Unsupervised, low-dimensional, non-linear projection
Optimized to preserve the relative distances between nearest neighbors

t-SNE projection is non-deterministic (depends on initialization)

Critical parameter: perplexity, usually set to 20-30

See http://distill.pub/2016/misread-tsne/
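A minimal scikit-learn sketch (the random array stands in for learned
embeddings):

import numpy as np
from sklearn.manifold import TSNE

embeddings = np.random.randn(1000, 64)    # placeholder for learned embeddings

# non-deterministic: fix random_state for reproducible plots
xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)
# xy has shape (1000, 2) and can be scatter-plotted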

Example word vectors

[Figure: 2D visualization of word vectors]

Visualizing MNIST

[Figure: t-SNE projection of MNIST digits]

Architecture and Regularization

RecSys with Explicit Feedback

[Figure: architecture with user and item inputs predicting a rating]

Deep RecSys Architecture

[Figure: deep architecture with user and item inputs]
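A possible Keras sketch of such an architecture (the concatenation of the two
embeddings and the layer sizes are assumptions about the figure, not
specified on the slide):

from tensorflow.keras.layers import Input, Embedding, Flatten, Concatenate, Dense
from tensorflow.keras.models import Model

n_users, n_items, d = 1000, 500, 32    # hypothetical sizes

user = Input(shape=(1,), dtype="int32")
item = Input(shape=(1,), dtype="int32")
u = Flatten()(Embedding(n_users, d)(user))
v = Flatten()(Embedding(n_items, d)(item))

h = Concatenate()([u, v])              # join user and item representations
h = Dense(64, activation="relu")(h)    # deep part of the model
rating = Dense(1)(h)                   # regression on the explicit rating

model = Model(inputs=[user, item], outputs=rating)
model.compile(optimizer="adam", loss="mse")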

Deep RecSys with metadata

[Figure: same architecture with additional user metadata and item metadata
inputs]

Implicit Feedback: Triplet loss

[Figure: triplet architecture with a user, a positive item (item+) and a
negative item (item-)]

Deep Triplet Networks

[Figure: deep triplet architecture with user, item+ and item- inputs]
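The loss itself is left to the figures; a common margin-based formulation for
this setup is sketched below (the dot-product similarity and the margin value
are assumptions, not taken from the slides):

import tensorflow as tf

def triplet_loss(user, item_pos, item_neg, margin=1.0):
    # similarity = dot product between user and item embeddings
    pos_sim = tf.reduce_sum(user * item_pos, axis=1)
    neg_sim = tf.reduce_sum(user * item_neg, axis=1)
    # hinge: push the positive item above the negative one by a margin
    return tf.reduce_mean(tf.maximum(0.0, margin + neg_sim - pos_sim))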


Training a Triplet Model

Gather a set of positive pairs: user i and item j

While the model has not converged:

Shuffle the set of pairs (i, j)
For each (i, j):
Sample an item k uniformly at random
Call item k a negative item for user i
Train the model on the triplet (i, j, k)
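One epoch of this loop might look like the following sketch (train_step, the
function that updates the model on a single triplet, is assumed to exist):

import numpy as np

def training_epoch(train_step, pairs, n_items):
    # pairs: array of observed positive (user i, item j) interactions
    np.random.shuffle(pairs)
    for i, j in pairs:
        k = np.random.randint(n_items)   # negative item, sampled uniformly
        train_step(i, j, k)              # one update on the triplet (i, j, k)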

Deep Neural Networks for YouTube Recommendations


https://research.google.com/pubs/pub45530.html


Regularization

Size of the embeddings

Depth of the network

L2 penalty on embeddings

Dropout

Randomly set activations to 0 with probability p
A Bernoulli mask is sampled for each forward/backward pass pair
Typically only enabled at training time
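As a numpy sketch of the training-time operation (the activation values are
made up):

import numpy as np

p = 0.5                                      # dropout probability
h = np.array([0.3, 1.2, 0.0, 2.1, 0.7])      # hypothetical activations

mask = np.random.binomial(1, 1 - p, size=h.shape)  # Bernoulli mask
h_train = h * mask                           # activations dropped with probability p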

Dropout

Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Srivastava
et al., Journal of Machine Learning Research, 2014

Dropout

Interpretation

Reduces the network's dependency on individual neurons
More redundant representation of the data

Ensemble interpretation

Equivalent to training a large ensemble of shared-parameter,
binary-masked models
Each model is only trained on a single data point

Dropout

At test time, multiply the weights by p to keep the same expected level of
activation

Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Srivastava
et al., Journal of Machine Learning Research, 2014

Overfitting Noise

A bit of Dropout

Too much: Underfitting

[Figures: fit of a toy model with no, moderate, and too much dropout]

Implementation with Keras

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# hidden_size, input_size and output_size are assumed to be defined
model = Sequential()
model.add(Dense(hidden_size, input_shape=(input_size,), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(hidden_size, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(output_size, activation='softmax'))

Ethical Considerations of Recommender Systems


Ethical Considerations

Amplification of existing discriminatory and unfair behaviors / biases

Example: gender bias in ad clicks (fashion / jobs)
Using the first name as a predictive feature

Amplification of the filter bubble and opinion polarization

People tend to unfollow people they don't agree with
Ranking / filtering systems can further amplify this issue


Call to action

Designing Ethical Recommender Systems

Wise modeling choices (e.g. use of "first name" as a feature)
Conduct internal audits to detect fairness issues
Learning representations that actively enforce fairness?

Transparency

Educate decision makers and the general public
How to allow users to assess fairness by themselves?
How to allow for independent audits while respecting the privacy of users?

Active Area of Research

Lab 2: Rooms C017 and F900 in 15 min!
