Personalized Topic-Based Tag Recommendation

2012

PRESENTED BY: MOSTAFA HEIDARY
Ralf Krestel, Peter Fankhauser
Keywords: Tag recommendation, Personalization, Language models, Topic models
Outline
Introduction
What is Folksonomy?
Topic Modeling
Introduction to LDA
Using LDA for Tag Recommendation
Personalized tag recommendation using LDA
Evaluation
Conclusion
What plan do I have?
M. Heidary
03:17
What is Folksonomy?[1]
A folksonomy is a system of classification derived from the practice and method of collaboratively creating and managing tags to annotate and categorize content.
Folksonomy, a term coined by Thomas Vander Wal, is a portmanteau of "folk" and "taxonomy".[2]
If you want to create a folksonomy and an associated tag cloud, you can set up a free account in a matter of minutes at Delicious or Diigo.
1- Folksonomy definition on Wikipedia: http://en.wikipedia.org/wiki/Folksonomy
2- [Van 2007] Vander Wal, T.: Folksonomy Coinage and Definition. Vanderwal.net, 2007.
Topic Modeling
Topic modeling provides methods for automatically organizing, understanding, searching, and summarizing large electronic archives.
1. Uncover the hidden topical patterns that pervade the collection.
2. Annotate the documents according to those topics.
3. Use the annotations to organize, summarize, and search the texts.
Model the evolution of topics over time
[Figure: example topics "Theoretical Physics" and "Neuroscience" tracked over time]
Introduction to LDA
Latent Dirichlet Allocation (LDA) is a common method of topic modeling.
The general idea: LDA is based on the hypothesis that a person writing a document has certain topics in mind. Writing about a topic then means picking a word with a certain probability from the pool of words of that topic. A whole document can then be represented as a mixture of different topics.
LDA helps to explain the similarity of data by grouping features of this data into unobserved sets.

Introduction to LDA
Suppose you have the following set of sentences:
1. I like to eat broccoli and bananas.
2. I ate a banana and spinach smoothie for breakfast.
3. Chinchillas and kittens are cute.
4. My sister adopted a kitten yesterday.
5. Look at this cute hamster munching on a piece of broccoli.
What is latent Dirichlet allocation? It's a way of automatically discovering the topics that these sentences contain. For example, given these sentences and asked for two topics, LDA might produce something like:
Sentences 1 and 2: 100% Topic A
Sentences 3 and 4: 100% Topic B
Sentence 5: 60% Topic A, 40% Topic B
Topic A: 30% broccoli, 15% bananas, 10% breakfast, 10% munching, ... (at which point, you could interpret topic A to be about food)
Topic B: 20% chinchillas, 20% kittens, 20% cute, 15% hamster, ... (at which point, you could interpret topic B to be about cute animals)

LDA Model
LDA assumes that documents are produced in the following fashion. When writing each document, you:
1. Decide on the number of words N the document will have.
2. Choose a topic mixture for the document.
3. Generate each word w_i in the document by:
   a. First picking a topic (according to the multinomial distribution that you sampled above).
   b. Using the topic to generate the word itself (according to the topic's multinomial distribution). For example, if we selected the food topic, we might generate the word broccoli with 30% probability, bananas with 15% probability, and so on.
LDA Model
Example:
1. Pick 5 to be the number of words in D.
2. Decide that D will be 1/2 about food and 1/2 about cute animals.
3. Pick the first word to come from the food topic, which then gives you the word "broccoli".
4. Pick the second word to come from the cute animals topic, which gives you "panda".
5. Pick the third word to come from the cute animals topic, giving you "adorable".
6. Pick the fourth word to come from the food topic, giving you "cherries".
7. Pick the fifth word to come from the food topic, giving you "eating".
So the document generated under the LDA model will be "broccoli panda adorable cherries eating" (note that LDA is a bag-of-words model).
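The generative story above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the topic-word probabilities in TOPICS are made up for the example, and the topic mixture and document length are passed in directly, whereas full LDA would draw the mixture from a Dirichlet prior.

```python
import random

# Hypothetical topic-word distributions, loosely following the slide's example.
TOPICS = {
    "food": {"broccoli": 0.30, "bananas": 0.15, "cherries": 0.25, "eating": 0.30},
    "cute animals": {"panda": 0.40, "adorable": 0.30, "kittens": 0.30},
}


def generate_document(n_words, topic_mixture, rng):
    """Generate a bag of words: for each position, pick a topic, then a word."""
    topic_names = list(topic_mixture)
    topic_weights = [topic_mixture[name] for name in topic_names]
    words = []
    for _ in range(n_words):
        # Step a: pick a topic according to the document's topic mixture.
        topic = rng.choices(topic_names, weights=topic_weights, k=1)[0]
        # Step b: pick a word according to that topic's word distribution.
        vocab = list(TOPICS[topic])
        probs = [TOPICS[topic][w] for w in vocab]
        words.append(rng.choices(vocab, weights=probs, k=1)[0])
    return words


rng = random.Random(0)
doc = generate_document(5, {"food": 0.5, "cute animals": 0.5}, rng)
print(" ".join(doc))
```

Because the model is a bag of words, the order of the generated words carries no meaning; only the counts matter.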
Learning
In reality, we only observe the documents; our goal is to infer the underlying structure.
Let K be some fixed number of topics. Go through each document and randomly assign each word in the document to one of the K topics. Notice that this random assignment already gives you both topic representations of all the documents and word distributions of all the topics (albeit not very good ones).
To improve on them, for each document d, go through each word w in d.
For each topic t, compute two things:
1. p(topic t | document d): the probability of picking a term from topic t in document d.
2. p(word w | topic t): the probability of w within topic t.
Reassign w a new topic, choosing topic t with probability p(topic t | document d) * p(word w | topic t) (this is essentially the probability that topic t generated word w). In this step, we're assuming that all topic assignments except for the current word in question are correct, and then updating the assignment of the current word.
After repeating the previous step a large number of times, you'll eventually reach a roughly steady state where your assignments are pretty good. Use these assignments to estimate the topic mixtures of each document and the words associated with each topic.
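The learning loop described above might look roughly like the following sketch. It is a simplified sampler written for illustration, not a reference implementation: the function name gibbs_lda, the default iteration count, and the small smoothing constants alpha and beta (which keep every probability above zero) are assumptions made here.

```python
import random


def gibbs_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Simplified Gibbs-style sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)

    # Count tables: words per topic, topics per document, tokens per topic.
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    doc_topic = [[0] * num_topics for _ in docs]
    topic_total = [0] * num_topics

    # Randomly assign every word to one of the K topics.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assign = []
        for w in doc:
            t = rng.randrange(num_topics)
            doc_assign.append(t)
            topic_word[t][w] += 1
            doc_topic[d][t] += 1
            topic_total[t] += 1
        assignments.append(doc_assign)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment; all others are held fixed.
                old = assignments[d][i]
                topic_word[old][w] -= 1
                doc_topic[d][old] -= 1
                topic_total[old] -= 1

                # Weight proportional to p(topic t | doc d) * p(word w | topic t).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                new = rng.choices(range(num_topics), weights=weights, k=1)[0]

                assignments[d][i] = new
                topic_word[new][w] += 1
                doc_topic[d][new] += 1
                topic_total[new] += 1

    # Estimate each document's topic mixture from the final counts.
    mixtures = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return mixtures, topic_word


sentences = [
    "i like to eat broccoli and bananas",
    "chinchillas and kittens are cute",
    "look at this cute hamster munching on a piece of broccoli",
]
mixtures, topic_word = gibbs_lda([s.split() for s in sentences], num_topics=2)
for mixture in mixtures:
    print([round(p, 2) for p in mixture])
```

Each row printed at the end is one document's estimated topic mixture; topic_word holds the per-topic word counts from which the word distributions can be read off.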
P(t_i | d) is the probability of the i-th term for a given document d, and z_i is the latent topic.
P(t_i | z_i = j) is the probability of t_i within topic j.
P(z_i = j | d) is the probability of picking a term from topic j in the document.
LDA estimates the topic-term distribution P(t | z) and the document-topic distribution P(z | d) from an unlabeled corpus of documents, using Dirichlet priors for the distributions and a fixed number of topics.
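Written out with the quantities defined above, the term probability is presumably the usual LDA topic mixture. The symbol Z for the number of topics is notation assumed here.

```latex
P(t_i \mid d) = \sum_{j=1}^{Z} P(t_i \mid z_i = j)\, P(z_i = j \mid d)
```

In words: the probability of a term in a document is the sum, over all topics, of the probability of the term within the topic times the weight of that topic in the document.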
Gibbs sampling is one possible approach to this end: it iterates multiple times over each term t_i in document d_i and samples a new topic j for the term based on the aforementioned probability.
C^TZ maintains a count of all topic-term assignments.
C^DZ counts the document-topic assignments.
Z_-i represents all topic-term and document-topic assignments except the current assignment z_i for term t_i.
α and β are the (symmetric) hyperparameters for the Dirichlet priors, serving as smoothing parameters for the counts.
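For orientation, the standard collapsed Gibbs sampling update for LDA can be written with these counts as follows. This is a sketch under stated assumptions: α is taken to be the prior on the document-topic distribution and β the prior on the topic-term distribution, T denotes the number of distinct terms, Z the number of topics, and both count tables exclude the current assignment z_i.

```latex
P(z_i = j \mid t_i, d_i, Z_{-i}) \propto
\frac{C^{TZ}_{t_i j} + \beta}{\sum_{t} C^{TZ}_{t j} + T\beta}
\cdot
\frac{C^{DZ}_{d_i j} + \alpha}{\sum_{z} C^{DZ}_{d_i z} + Z\alpha}
```

The first factor estimates the probability of term t_i within topic j, and the second estimates the probability of topic j in document d_i.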

Using LDA for Tag Recommendation
For tagging systems, the documents are resources r ∈ R, and each resource is described by tags t ∈ T assigned by users u ∈ U. Instead of documents composed of terms, we have resources composed of tags.
To build an LDA model, we need resources and associated tags previously assigned by users.
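As a small illustration of treating resources as documents made of tags, the sketch below groups (user, resource, tag) assignments into one bag of tags per resource. The function name, the by_user switch, and the sample bookmarks are hypothetical and only serve the example.

```python
from collections import defaultdict


def build_tag_documents(bookmarks, by_user=False):
    """Group tag assignments into bags of tags, keyed by resource or by user."""
    documents = defaultdict(list)
    for user, resource, tag in bookmarks:
        key = user if by_user else resource
        documents[key].append(tag)
    return dict(documents)


# Hypothetical (user, resource, tag) assignments.
bookmarks = [
    ("alice", "http://example.org/camera-guide", "photography"),
    ("bob", "http://example.org/camera-guide", "howto"),
    ("bob", "http://example.org/camera-guide", "photography"),
    ("alice", "http://example.org/python-intro", "programming"),
]

resource_docs = build_tag_documents(bookmarks)
user_docs = build_tag_documents(bookmarks, by_user=True)
print(resource_docs["http://example.org/camera-guide"])
```

The resulting bags of tags can be fed to an LDA implementation in place of tokenized text; grouping by user instead yields the user profiles.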
Example of using AR
Example of using LDA
Top terms composing the latent topics "photography" and "howto".

Personalized tag recommendation using LDA

We need to rank possible tags t, given a resource and a user. P(t) can be estimated via the relative frequency of tag t in all bookmarks.
On the one hand, we use simple language models; on the other hand, we use Latent Dirichlet Allocation, in order to also recommend tags for new resources and users that have only a few bookmarks available.
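One way to see where P(t) enters the ranking is the following Bayes-rule sketch. It assumes that the resource r and the user u are conditionally independent given the tag t; that assumption is introduced here for illustration and is not stated on the slide.

```latex
P(t \mid r, u)
  = \frac{P(r, u \mid t)\, P(t)}{P(r, u)}
  \approx \frac{P(r \mid t)\, P(u \mid t)\, P(t)}{P(r, u)}
  \propto \frac{P(t \mid r)\, P(t \mid u)}{P(t)}
```

The last step applies Bayes' rule to P(r | t) and P(u | t) and drops the factors that do not depend on t, so tags can be ranked using only the resource-based estimate, the user-based estimate, and the global tag frequency.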
Language Model
Here, c(t,r) is the count of tag t in resource r. Plm(t,u) of a user u using tag t is determined in a similar way from all tags the user has assigned.
For new resources and users with only a few bookmarks available, the simple language model does not suffice for tag recommendation.
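Assuming the simple language model is the usual relative-frequency estimate, that is, Plm(t,r) equals c(t,r) divided by the total number of tag assignments on r, it can be sketched as below. The function name and the sample tags are hypothetical.

```python
from collections import Counter


def language_model(tags):
    """Relative-frequency estimate: count of each tag / total tag assignments."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: count / total for tag, count in counts.items()}


# Hypothetical tags assigned to one resource by several users.
resource_tags = ["python", "programming", "python", "tutorial"]
p_lm = language_model(resource_tags)
print(p_lm["python"])  # 0.5
```

The same function applied to all tags a user has assigned gives the user-side model. With only one or two bookmarks the estimate covers very few tags, which is the weakness the LDA-based estimate is meant to address.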
Latent Dirichlet Allocation (LDA)
The estimation of Plda(t,u) proceeds in the same way as the estimation of Plda(t,r), by operating on the individual tag sets of users rather than resources.
based on resource profiles
Top tags composing the latent topics "tech news" and "Flickr" based on resource profiles.
based on user profiles
Top tags composing the latent topics "mac" and "do it yourself" based on user profiles.
Combining LM and LDA
We have experimented with a broad range of values for the mixing weight λ, and achieved consistently good results for λ in the range [0.2, 0.8].
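A common way to combine two such estimates is linear interpolation, sketched below. This is an assumption about the form of the combination: the sketch takes the mixing weight (lam in the code) to multiply the language-model estimate and 1 - lam to multiply the LDA estimate, and the function names and probabilities are made up for the example.

```python
def combine(p_lm, p_lda, lam=0.5):
    """Interpolate two tag distributions: lam * P_lm + (1 - lam) * P_lda."""
    tags = set(p_lm) | set(p_lda)
    return {
        t: lam * p_lm.get(t, 0.0) + (1.0 - lam) * p_lda.get(t, 0.0)
        for t in tags
    }


def recommend(p_lm, p_lda, lam=0.5, top_n=5):
    """Return the top_n tags by combined score (ties broken alphabetically)."""
    scores = combine(p_lm, p_lda, lam)
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Hypothetical estimates for one resource.
p_lm = {"python": 0.5, "tutorial": 0.5}
p_lda = {"python": 0.4, "programming": 0.4, "tutorial": 0.2}
print(recommend(p_lm, p_lda, lam=0.5, top_n=2))
```

A tag such as "programming", which the language model has never seen for this resource, still receives a non-zero score through the LDA estimate, which is the point of the combination.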
Evaluation
Results for one known bookmark and different algorithms on the Delicious dataset.
Conclusion
We have explored user-centered and resource-centered approaches for personalized tag recommendation.
We employed and compared a language modeling approach and an approach based on Latent Dirichlet Allocation.
Even for non-textual resources like videos or audio, additional metadata could be exploited.

What plan do I have?
First, implement this approach in C#. Implementing the LM is simple; for LDA, I will use an implementation available on the web. I will use some Farsi datasets, such as news datasets.
I plan to consider tag and time information in order to recommend appropriate tags over time. The article by N. Zheng and Q. Li, "A recommender system based on tag and time information for social tagging systems," Expert Systems with Applications, vol. 38, no. 4, pp. 4575-4587, Apr. 2011, will be useful for this approach.

Thanks
Any questions?
