Article

User profiling on the Web based on deep knowledge and sequential questioning
Silvano Mussi
CILEA (Interuniversity Consortium for Information and Communication Technologies), via R.
Sanzio 4, 20090 Segrate-Mi, Italy
E-mail: mussi@cilea.it

Abstract: User profiling on the Web is a topic that has attracted a great number of technological approaches
and applications. In most user profiling approaches the website learns profiles from data implicitly acquired from
user behaviours, i.e. observing the behaviours of users with a statistically significant number of accesses. This
paper presents an alternative approach. In this approach the website explicitly acquires data from users, user
interests are represented in a Bayesian network, and user profiles are enriched and refined over time. The profile
enrichment is achieved through a sequential asking algorithm based on the value-of-information theory using the
Shannon entropy concept. However, what mostly characterizes the approach is the fact that the user is involved in
a collaborative process of profile building. The approach has been tried out for over a year in a real application.
On the basis of the experimental results the approach turns out to be particularly suitable for applications where
the website is strongly based on deep domain knowledge (as for example is the case for scientific websites) and has
a community of users that share the same domain knowledge of the website and produce a ‘low’ number of
accesses (‘low’ compared to the high number of accesses of a typical commercial website). After presenting the
technical aspects of the approach, we discuss the underlying ideas in the light of the experimental results and the
literature on human–computer interaction and user profiling.
Keywords: user profiling, deep knowledge, value of information, Bayesian networks, sequential asking,
Shannon entropy

© 2006 The Authors. Journal Compilation © 2006 Blackwell Publishing Ltd. Expert Systems, February 2006, Vol. 23, No. 1

1. Introduction

In order for a website to be collaborative with its users (e.g. to be adaptive (Billsus et al., 2002; Brusilovsky & Maybury, 2002), to exhibit a personalized behaviour (Fink et al., 2002), to produce personalized alerting (Horvitz et al., 1999) etc.), the website must know the user interests at a sufficiently fine level of refinement. In other words, the website has to acquire the user profile. This paper presents an approach to the acquisition and construction of user profiles. A plain approach to this problem consists in making the website directly acquire from the user all the information it needs, e.g. by asking the user to fill in a sort of questionnaire such as: Are you interested in the topic X? If yes, to what extent? And so forth, for each possible topic X. However, such an approach could bother the user, both because there are too many questions and because some questions might turn out to be not very easy for the user. To overcome this difficulty, a website has to be able to infer hypotheses about multiple user interests on the basis of little and simple information easily provided by the user, i.e. information concerning simple facts (in the medical domain, simple facts provided by the patient are, for example, ‘I have fever’, ‘I have a pain in my stomach’ etc.) rather than information entailing a certain mental effort
from the user (e.g. information resulting from mental activities like judgment, introspection, abstraction, reasoning etc.). However, it may happen that after these initial questions the information collected is not sufficient to infer user interests with a low margin of uncertainty. A website has therefore to consider the possibility of asking additional questions, questions whose answers may be, in general, not very easy to provide. In other words, because these questions are not as easy as the initial ones, there is the possibility that a user may feel bothered when he/she is asked them. The questions should therefore be asked only if it is necessary, and they cannot be asked all and indistinctly at the same time, given that each question entails a trade-off between expected benefit and expected bother. To solve such a problem, the paradigm of sequential asking is proposed and applied.

The paper is organized as follows. Section 2 introduces, as a background topic, the role of deep knowledge in user profiling, while Section 3 introduces, as a foreground topic, the role of sequential asking in user profiling. Section 4 presents the technical aspects of the proposal focusing on the paradigm of sequential asking. Section 5 illustrates a significant experimental application, while Section 6 presents the experimental results and gives some remarks on them. In Section 7 some strengths of the proposed approach are pointed out. Section 8 discusses the proposal in the context of human–computer interaction research and related work. Finally, Section 9 draws some conclusions.

2. Background: deep-knowledge-based user profiling

Inferring user interests starting from acquired facts may be carried out by using a deep-knowledge-based approach. In the deep-knowledge-based approach to user profiling, deep knowledge is represented by causal knowledge linking user facts to user interests and organizing the user interests themselves in causal paths. In practice, user interests coincide with the topics of the domain knowledge. Let us notice that, in order to be realistic, interests inference cannot avoid taking uncertainty into account. As a consequence, Bayesian networks (Pearl, 1988; Jensen, 2001) turn out to be an ideal tool to represent deep knowledge and to reason with it under uncertainty. A site provided with deep knowledge does not need a statistically significant number of accesses before exhibiting a personalized behaviour, as is required from similarity-based approaches. Similarity-based approaches like content-based or collaborative filtering approaches (Hirsh et al., 2000) are, in a sense, based on shallow knowledge: an item is supposed to be relevant for the user if its content is similar to the content of the items he/she preferred in the past (content-based approach) or if other users with a similar taste preferred it in the past (collaborative filtering approach). So, a site using these approaches does not know the deep reasons why a user prefers an item – it lacks deep knowledge, i.e. it lacks causal knowledge linking the user preferences (effects) to the user goals (causes).

Turning to Bayesian networks, let us face the problem of building the Bayesian network of the interests of a user working, or, more generally, operating, in a well-defined domain of real life, characterized by precise domain knowledge. Let us think, for example, of the medical domain, the computer science domain etc. Inside a domain of knowledge (for short, in the following, domain), multiple topics may be identified. Let us adopt the basic assumption that domain topics represent possible user interests (in the respective topics). In general, working in a certain field involves pursuing certain goals, and pursuing a certain goal involves using certain tools. So we have topics concerning fields (field-topics), goals (goal-topics) and tools (tool-topics). As a consequence, being interested in a certain field-topic involves being interested in certain goal-topics, and being interested in a certain goal-topic involves being interested in certain tool-topics. We have therefore implicitly built a causal network of interests. What we need now is to quantify uncertainty. So let us ask the domain expert to provide conditional probabilities, i.e. P(interest in topic X | interest in topic Y). We

have therefore obtained the Bayesian network of interests in the domain topics for an abstract user. In order to be able to personalize the network to a specific user we need to add user-facts, i.e. facts that affect interests or are consequences of interests. Among user-facts let us distinguish between facts concerning tools (tool-facts) and facts concerning fields (field-facts). A tool-fact concerns the use of a tool. For example, ‘the user uses the software S’ is a tool-fact. A tool-fact is caused by the interest in the related tool, e.g. the fact that the user uses S results from the fact that the user is interested in S. Let us ask the expert to provide the probability of using a tool given the interest in it, e.g. P(‘the user uses the tool S’ | ‘the user is interested in the tool S’) [2]. A tool-fact plays the role of a symptom of the presence (in the mind of the user) of interest in the related tool-topic. A field-fact concerns a user context. For example, ‘the user works in the field of software development’ is a field-fact. A field-fact affects the related field-interest, e.g. working in the field of software development affects interest in the field of software development. Let us ask the expert to provide the probability that a user is interested in a field given the fact that the user works in a certain field, e.g. P(‘the user is interested in the field of software development’ | ‘the user works in the field of software development’). A field-fact plays the role of a situation conditioning the probability of the presence (in the mind of the user) of interest in the related field-topic (it is the case of anamnesis in the medical field). In the following, for short, interests in field-topics, goal-topics and tool-topics will be called field-interests, goal-interests and tool-interests respectively. Let us now summarize the considerations made so far by representing in Figure 1 the conceptual structure of the user-interests network. For simplicity let us consider two states for each node, i.e. y (‘yes’ state), n (‘no’ state). See Mussi (2003) for more details on a knowledge-based approach to user profiling.

[2] Let us note that in real-world applications the presence of interest in a tool is not sufficient to make the user use it, especially in cases in which the user has at his/her disposal possible alternative tools for pursuing the same goal.

Figure 1: Conceptual structure of the user-interests network. The pure interests network (represented by nodes of type field-interest, goal-interest, tool-interest) has been enriched with nodes of type field-fact and tool-fact. The figure shows an abstract network, with only two nodes for each type. Figure 3 will show a real instance. (Node layers, top to bottom: field_fact, field_interest, goal_interest, tool_interest, tool_fact.)

3. Foreground: question-asking-based user profiling

Among user-facts we can distinguish between facts provided by the user without any effort at all, i.e. facts definitely easy to provide (for short, easy-facts), and facts provided by the user after a certain mental process, a certain reasoning effort, i.e. facts less easy to provide (for short, less-easy-facts). For simplicity let us limit our analysis to consider less-easy-facts concerning tools (i.e. tool-facts less easy to provide). For example, let us consider the question ‘Do you use X?’. The answer is easy if X is an object such as for example a certain computer or a certain software product etc., but it might not be so easy if X denotes an approach to a problem or a framework etc. In fact the user might have developed a solution that is not just a mere instantiation of a well-known approach but, for example, takes some ideas from a certain approach, some other ideas from other approaches, and then introduces some original variants [3]. On the basis of these considerations we conclude that a website cannot avoid considering the possibility that a user may be bothered by having to answer a question that entails a certain mental effort (reasoning, analysis, assessment etc.). So, whereas easy-facts may be asked without restrictions, less-easy-facts should be asked only if it is worth doing it, taking into account the probability of bothering the user. In the following, a question for acquiring an easy-fact will be called, for short, an easy-question, whereas a question for acquiring a less-easy-fact will be called a less-easy-question.

[3] The distinction between easy-facts and less-easy-facts is typical in the medical domain. In fact, information provided by the patient at the beginning of a medical examination is an easy-fact, whereas information coming from a clinical test is a less-easy-fact. For example, the answer to the question ‘Do you have fever?’ is an easy-fact, but the answer to the question ‘Is the bilirubin level in your blood normal?’ is a less-easy-fact (you have to undergo a blood test, you have to pay for it, it takes some time etc.).

The probability of bothering the user because of a less-easy-question is affected by the topic of the question itself. If the question concerns a topic the user is not interested in, there is a higher probability that the user gets bothered by the question. The fact that the user is not interested in the question topic might be due to the fact that the field he/she works in does not concern that topic. In such a case he/she might perceive a sense of ‘logical discontinuity’ with his/her previously entered easy-facts and might not know that topic very well or even at all. So in conclusion he/she is more likely to get bothered if the question concerns a topic he/she is not interested in.

These considerations prompt us to consider a bother-effect network beside the user-interests network. The bother-effect network simply consists of three nodes (each one with two states y, n): ‘bother’, ‘bother_af’, ‘interested’. Let us examine their meanings and how they are connected. The ‘bother’ node (which stands for ‘the user is bothered’) and the ‘bother_af’ node (which stands for ‘after the question the user is bothered’) represent the current status (bothered or non-bothered) of the current user respectively before and after a question is asked. They are therefore linked by the cause–effect relation: ‘bother’ → ‘bother_af’. The ‘interested’ node (which stands for ‘the user is interested in the topic of the question’) plays the role of a conditioner node for the conditional probability table of the bother_af node. In other words, it modulates the probability that the user is in the state bother after the question. So, in formal terms, we have that P(bother_af = y | bother = n, interested = y) is lower than P(bother_af = y | bother = n, interested = n) [4]. Let us now summarize the considerations made so far by representing in Figure 2 the structure of the bother-effect network.

[4] Obviously, P(bother_af = y | bother = y) = 1.

Figure 2: Structure of the bother-effect network. (Nodes: bother, interested, bother_af.)

4. Enriching and refining user profiles

The website starts the profile building process by asking the user to enter a set of easy-facts and then by propagating them through the user-interests network, inferring in this way the user profile, or in other words inferring, for each topic, the probability that the interest in the topic is present in the user’s mind. At this stage the question arises: is the inferred profile sufficiently defined? That is, have the user-interests been inferred with a sufficiently low uncertainty margin? In practice, is it necessary or not to ask one or more less-easy-questions in order to capture further information from the user and infer a more accurate profile? The answer is that less-easy-questions should be asked only if ‘it is worth doing it’. Let us define the concept of ‘it is worth doing it’. In a broad sense, the purpose of asking questions is to know more about

user-interests, or, more precisely, to decrease uncertainty about user-interests. In order to simplify the problem we wonder if among all the user-interests there is a subset of interests for which the margin of uncertainty is particularly important to be low. To this end let us notice that, since goals are at the basis of user actions, goal-interest nodes (i.e. goal-topics) are strategic, and as a consequence we aim at having low uncertainty about these nodes. So a less-easy-question is worth asking if, in spite of the unavoidable risk of bothering the user, it is expected to produce a significant decrease of uncertainty about the goal-interest nodes. Uncertainty about the states of a node is well represented by the concept of entropy (Shannon & Weaver, 1949), which is the hub concept of information theory. So a less-easy-question is worth asking if the expected decrease in entropy of the goal-interest nodes is significant enough to compensate the unavoidable risk of bothering the user. The next section will formally define the problem in the decision theory framework (von Winterfeldt & Edwards, 1986) based on entropy.

4.1. Benefit expected from a less-easy-question

Let G be a node and let g denote a state of G, for short g ∈ G. The entropy of G is given by

\[ \mathrm{ENT}(G) = -\sum_{g \in G} P(g) \log_2 P(g) \qquad (1) \]

Let us use the value function −ENT, which increases with preference. Let Q be the set of less-easy-questions and let q be a less-easy-question, i.e. q ∈ Q. Let J_q be the set of possible answers to q, and let j_q be an answer to q, i.e. j_q ∈ J_q. The value, with respect to the node G, of the information j_q is given by V_G(j_q):

\[ V_G(j_q) = -\mathrm{ENT}(G \mid j_q) \qquad (2) \]

The expected value, with respect to the node G, of the question q is given by EV_G(q):

\[ EV_G(q) = \sum_{j_q \in J_q} V_G(j_q) \, P(j_q) \qquad (3) \]

So the expected benefit of q, with respect to the node G, is given by

\[ EB_G(q) = EV_G(q) - [-\mathrm{ENT}(G)] \qquad (4) \]

Let us consider a set of nodes G. If we assume that each node has the same weight of importance, the overall expected benefit from a question q is given by

\[ EB(q) = \sum_{G} EB_G(q) \qquad (5) \]

Let us now turn back to the user-interests network. Let G denote a goal-interest node. For each question q ∈ Q let the set J_q of the possible answers be {‘yes’, ‘no’}. If the user answers ‘yes’ (‘no’) to q, we set the related tool-fact node to ‘y’ (‘n’). For example, let us consider the less-easy-question ‘Do you use X?’ and the related tool-fact node ‘use_X’ (standing for ‘the user uses X’). If the user answers ‘yes’, ‘use_X’ is set to the state y; if he/she answers ‘no’, ‘use_X’ is set to the state n. So (2) becomes

\[ V_G(\text{‘Do you use X?’} = \text{yes}) = -\mathrm{ENT}(G \mid \text{‘use\_X’} = y) \qquad (2') \]

\[ V_G(\text{‘Do you use X?’} = \text{no}) = -\mathrm{ENT}(G \mid \text{‘use\_X’} = n) \qquad (2'') \]

Moreover, the probability of using X given the interest in X represents the probability of obtaining from the user the answer ‘yes’ to the question ‘Do you use X?’ given the interest in X, i.e. we do not consider lying. So (3) becomes

\[ EV_G(q) = -\mathrm{ENT}(G \mid \text{‘use\_X’} = y) \, P(\text{‘use\_X’} = y) - \mathrm{ENT}(G \mid \text{‘use\_X’} = n) \, P(\text{‘use\_X’} = n) \qquad (3') \]

4.2. Bother-effect expected from a less-easy-question

From the user-interests network we can obtain, through (5), the expected benefit of the question q: ‘Do you use X?’. However, in order to assess if q is worth asking or not, we have to take into account the probability that, after q is asked,


the mental state of the user is bothered (because of q). In order to properly calculate this probability we have to take into account two influence factors: the first concerns a sort of cumulative effect in cases of multiple questions, and the second concerns the influence of the question topic. Let us examine the first factor. The probability that the user is ‘bothered’ because of q is higher after the second question than after the first, and so forth. Let us model this cumulative effect in the following way. Let us initially start with P(bother = y) = 0; then each time a question is asked let us replace the probability distribution of the bother node with that of the bother_af node. Let us now pass to examine the second factor.

The probability that the user is interested in the question topic is given by the probability distribution of the related tool-interest node (Figure 1). So, given the question q, ‘Do you use X?’, let us replace the prior probability distribution of the node interested (Figure 2) with the probability distribution of the tool-interest node ‘the user is interested in X’ (Figure 1). After having assigned the proper probability distributions to the nodes bother and interested, let us propagate. The expected bother-effect of q is therefore given by the value of P(bother_af = y) resulting from the propagation. Finally, let us note that P(bother_af = y) > P(bother = y).

4.3. Profit expected from a less-easy-question

In order to obtain a single number quantifying the opportunity of asking a less-easy-question q, we have to combine, in terms of utility, the expected benefit (entropy decrease) with the expected bother-effect of q. Let us consider the utility function of the entropy of a node G. If the entropy of G is minimum, i.e. 0, the utility is maximum, i.e. 1. If the entropy of G is maximum, i.e. 1, the utility is minimum, i.e. 0. So, considering, for simplicity, a linear utility function, we have that the expected benefit in terms of utility is equal to the expected benefit in terms of pure entropy. Let us now pass to consider the utility function of the bother-effect. If the bother-effect is absent, i.e. the state of the node bother_af is n, the utility is maximum, i.e. 1. If the bother-effect is present, i.e. the state of the node bother_af is y, the utility is minimum, i.e. 0. So, the expected bother-effect in terms of utility is given by P(bother_af = y | q) * 0 + P(bother_af = n | q) * 1 = 1 − P(bother_af = y | q). Let us call the expected cost of q the quantity

\[ EC(q) = P(\text{bother\_af} = y \mid q) \qquad (6) \]

Finally, let us notice that the entropy utility and the bother-effect utility have, in general, different importance. Let us therefore consider the importance weight distributions k_ENT and k_dist, where k_ENT represents the importance weight assigned to the entropy and k_dist represents the importance weight assigned to the bother-effect [5]. On the basis of all these considerations, the expected net advantage, which will be called ‘expected profit’ (EP), from a question q is given by

\[ EP(q) = EB(q) \cdot k_{ENT} - EC(q) \cdot k_{dist} \qquad (7) \]

where k_ENT + k_dist = 1. In conclusion, a question is worth asking if EP(q) > 0. Moreover, the question with the maximum EP is the preferred one.

[5] The importance weights may be elicited with the probability indifference method. Considering for example k_ENT we have that the gambling situation is G: [p * {entropy min, bother absent}; (1 − p) * {entropy max, bother present}], whereas the certainty situation is I: [{entropy null, bother present} for certain].

4.4. Sequential asking algorithm

In general, there is a certain number (> 1) of less-easy-questions. So a website during the process of solving the target problem (i.e. inferring the user profile) has to solve the meta-problem of choosing, under trade-off between benefit and cost, the more opportune less-easy-question to ask next in order to acquire a new piece of information which will contribute to solving the target problem, taking into account also the fact that the answer provided by the user has the side-effect of changing the benefits expected from asking other questions. Let us establish that questions are asked one at a time. This strategy is called myopic. It addresses the following question: if you are allowed to ask at most one

question, which question would you choose? As is well known the analysis of all the possible sequences of questions is, in practice, intractable because the number of sequences grows exponentially with the number of tests (Heckerman et al., 1992). So, many real-world applications use the so-called myopic value-of-information approach. The myopic approach is in practice a good heuristic (Gorry & Barnett, 1985).

We are now ready, on the basis of the considerations made so far, to define at a conceptual level the basic algorithm of sequential asking. Let x denote the current available knowledge, i.e. x = ‘all the information entered so far’.

0. Ask all the easy-questions, acquire the answers and set the related user-fact nodes to the appropriate states.
1. Propagate the entered information through the user-interests network.
2. If all the less-easy-questions have been asked, then EXIT.
3. For each less-easy-question q not yet asked calculate EP(q | x).
4. If each EP(q | x) ≤ 0, then EXIT.
5. Ask the user the q with the maximum EP, acquire the answer and set the related tool-fact node to the appropriate state.
6. Go back to point 1.

4.5. Asking over time

After a certain number of questions the probability that the user is bothered (because of the questions), i.e. P(bother_af = y | q), increases so much that for each q it happens that EP(q | x) ≤ 0, and as a consequence the sequential asking stops. However, as time passes, it seems reasonable to suppose that the probability that the user is still in the state bothered (because of the questions) decreases. As a consequence, it might happen that an EP that was negative in the past is now positive. In other words, after a certain time interval there might be some questions that are worth asking, even if they were not in the past. Such a situation is modelled by dynamically calculating the values of the prior probability distribution of the node bother on the basis of an appropriate decreasing temporal function. The practical consequence of all this is that the user profiling process is not accomplished in a single and initial phase but in a sequence of sessions over time. In each session the sequential asking algorithm is activated and, as a consequence, the user profile is enriched and refined. In practice, when, after a certain time from the last user profiling session, the user accesses the website, the website evaluates if there are some less-easy-questions that are worth asking. If yes, the website personalizes the user home page with a message inviting the user to resume the dialogue in order to better refine his/her profile.

5. An application of the proposed approach within a supercomputing portal

In this section we will describe a significant application of the approach so far presented. The proposal has been applied within the CILEA [6] supercomputing portal [7]: a portal devoted to disseminating supercomputing culture and events in the community of researchers working in the supercomputing field. More precisely, the proposal has been applied in the context of a news recommender system embedded in the portal. The proposal has been implemented in a prototype (developed by the author) [8] tested on a PC off-line; then the prototype was integrated into the portal. Supercomputing news is recommended in a personalized manner both through a personalized presentation on the home page and by activating a personalized alerting service.

[6] CILEA is an Italian interuniversity consortium for information and communication technologies that provides supercomputing services to its users in order to promote supercomputing culture and services.
[7] www.supercomputing.it
[8] Tools used to implement the proposal: ASP pages, C language, MySQL database, Hugin inference engine.

The application regards a particular sub-field of supercomputing, that of computational fluid dynamics (CFD). CFD encompasses many heterogeneous application fields (aerospace, biomedical etc.), various phenomenological areas (turbulent flows, porous media flows etc.), various computational techniques (large eddy simulation, coupled transport equations etc.),


specialized software packages (Fluent, Fidap etc.) and so forth. A CFD user may be an aerospace engineer working in the aerospace industry; another may be a biophysicist working in the biomedical field etc. A biophysicist probably pursues the goal of developing computational models of porous media flows, uses the coupled transport equations technique and runs his/her models with the package Fidap. Conversely, an aerospace engineer probably focuses on modelling turbulent flows, and it is probable that he/she is not interested in the tools used by a biophysicist. So the need for building accurate user profiles for recommending the right news to the right users arises. Taking into account the general network structures of Figures 1 and 2, the CFD user-interests network and the bother-effect network illustrated in Figure 3 have been produced. Notice that the user-interests network encompasses six nodes (on the top) representing field-facts, 18 nodes (on the bottom) representing tool-facts and 30 nodes (in the middle) representing interests, i.e. topics.

Tool-facts have been distinguished as easy-facts and less-easy-facts. The facts concerning the use of software packages (i.e. the nodes whose names are prefixed by ‘use_s_’) have been considered as easy-facts, whereas the facts concerning the use of computational techniques (i.e. the nodes whose names are prefixed by ‘use_t_’) have been considered as less-easy-facts. Referring to the algorithm defined in Section 4.4, let us start (step 0) by asking the user to enter the set of easy-facts (Figure 4). The entered facts are then propagated through the network, inferring in this way the CFD profile of the current user (step 1). So now the portal knows, for each of the 30 topics, the probability that the current user is interested in it. The loop of the remaining steps is then performed, possibly asking less-easy-questions depending on the specific current case. For example, if the user declares that he/she works in the aerospace field and does not use any of the software packages listed in Figure 4, the portal finds that it is worth trying to decrease uncertainty about the goal-interests and begins to ask less-easy-questions (see Figure 5 for an example).

In this application, user profiles are used by the portal to recommend CFD news. Such news is classified by a CFD expert. More precisely, given a piece of news N, a CFD expert, for each topic T, defines (through subjective estimation) the probability that N concerns T. The classified piece of news is then entered in the website database. The expected utility EU of each piece of news N for a user U is calculated according to the following algorithm:

EU(N_U) = 0
For each T do
    EU(N_U) = EU(N_U) + EU(N_UT)
End for

where

\[ EU(N_{UT}) = U(N_{UT} \mid \text{‘N concerns T’} = \text{yes}, \text{‘interest in T’} = \text{yes}) \cdot P(\text{‘N concerns T’} = \text{yes}) \cdot P(\text{‘interest in T’} = \text{yes}) \]

where U(N_UT | ‘N concerns T’ = yes, ‘interest in T’ = yes) stands for ‘utility of N for the user U, given that N concerns T and the user U is interested in T’. For simplicity U(N_UT | ...) has been assumed to be equal to 1 for any piece of news and any user. The website has therefore at its disposal a personalized expected utility of each piece of news, calculated on the basis of the inferred user profile. So, when a user accesses the portal, the portal uses such EU values to recommend news to the user. Recommendation is carried out by presenting news with different emphasis and in decreasing order of relevance (Figure 6).

As stated in Section 4.5, user profiles are enriched and refined over time. In practice, when a user accesses the portal, the portal evaluates if there are some less-easy-questions that are worth asking [9] and, if that is the case, places at the beginning of the news list a message inviting the user to resume the dialogue (Figure 7).

News recommendation is also accomplished via alerting e-mail. Users can take advantage of

[9] The algorithm in Section 4.4 is performed by starting from step 2.

Figure 3: The figure shows the bother-effect network (top left) and the CFD user-interests network of the application of the proposal to the CILEA supercomputing portal. The interests network instantiates the abstract network of Figure 1. More precisely, starting from the top, the nodes of the top layer (whose names begin with Work_in) are of type field-fact. Coming down, the nodes of the following three layers represent interests and are of type field-interest, goal-interest, tool-interest respectively. The nodes of the bottom layer are of type tool-fact. In particular the eight nodes on the left concern the use of computational techniques (less-easy-facts), whereas the remaining ten on the right concern the use of software packages (easy-facts).

If you wish you may enrich your profile record by selecting, in the area of Computational Fluid Dynamics, the application field/s you work in

Aerospace (Environmental Control System, Propulsion, Pumps, Rotor-Airframe Interaction, External Aerodynamics ...)
Biomedical (Blood Handling Equipment, Physiological Flows, Toxicology Research ...)
Chemical Process (Combustion, Drying, Emission Control, Filtration, Reaction, Water Treatment ...)
Environment (Atmospheric Plume Dispersion, Coast Erosion, Hydro-Geological Applications, Irrigation Components,
Meteorological Applications, Petroleum Platforms, Sea Technologies, Service Reservoirs, Wastewater Pump-
stations, Weirs, ...)
Automotive (Engine Cooling, External Aerodynamics, Intake Valves, Underhood Flow Simulation, Vehicle Interior, Vapor
Dispersion, Windshield Washer Nozzles ...)
Energy (Boilers, Burners, Coal Transport and Classification, Combustor, Hydro-power, Incinerators, Nuclear Reactors,
Turbomachinery ...)

If you wish you may enrich your profile record by selecting the software package/s you use for running your models in the area of
Computational Fluid Dynamics

KIVA
FLUENT
FIDAP
ADINA
CFX
CFX_TASKFLOW
CFX_TURBOGRID
GAMBIT
TGRID
STAR_CD

Figure 4: The form used for entering the initial set of easy-facts, i.e. the subset of six field-facts (top)
followed by the subset of ten tool-facts (bottom).
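
Steps 0 and 1 of the algorithm (entering easy-facts, then propagating them to obtain interest probabilities) can be illustrated with a minimal sketch. It assumes a toy fragment with a single binary interest node and one binary tool-fact child, so that propagation reduces to Bayes' rule; the real network of Figure 3 has five layers and its probability tables are not reproduced here. The function name and every number below are hypothetical.

```python
def posterior_interest(prior_yes, p_fact_given_yes, p_fact_given_no, fact_observed):
    """Bayes' rule for a binary interest node with one binary fact child.

    prior_yes        -- P(interest = yes) before any fact is entered (assumed value)
    p_fact_given_yes -- P(fact = yes | interest = yes) (assumed value)
    p_fact_given_no  -- P(fact = yes | interest = no) (assumed value)
    fact_observed    -- True if the user ticked the fact, False otherwise
    """
    # Likelihood of the entered answer under each state of the interest node.
    like_yes = p_fact_given_yes if fact_observed else 1.0 - p_fact_given_yes
    like_no = p_fact_given_no if fact_observed else 1.0 - p_fact_given_no
    joint_yes = prior_yes * like_yes
    joint_no = (1.0 - prior_yes) * like_no
    return joint_yes / (joint_yes + joint_no)


# Hypothetical numbers: the user ticks one package in the form of Figure 4.
p_after_tick = posterior_interest(0.2, 0.8, 0.1, fact_observed=True)
p_after_no_tick = posterior_interest(0.2, 0.8, 0.1, fact_observed=False)
print(round(p_after_tick, 3), round(p_after_no_tick, 3))
```

In a full network the same update would be carried out by a general propagation algorithm over all layers, which this two-node sketch does not attempt.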

the intelligent alerting service: for each user, the portal considers the set of news not yet seen by the user, and for each piece of news calculates the expected utility the piece of news has for the user; if the expected utility is greater than a given threshold, the portal sends the relevant piece of news to the user via e-mail.

Do you use (in your work activity) the computational technique: Volume of Fluid Methods?
yes
no
I don't feel like answering this question at the moment
I would prefer not to answer this question, even in the future
SEND

Figure 5: An example of a less-easy-question.

6. Experimental results
The proposal implementation presented in Section 5 has been working for more than a year inside the supercomputing portal. In order to better test the proposal, several heterogeneous institutions (research centres, university departments, organizations operating in the environmental physics field, industries, CFD software providers etc.) working in the CFD field have been involved. During that period of time several researchers working with those institutions have used the proposal implementation. They have been interviewed, collecting their comments both about the proposed approach to user profiling and about the accuracy of various types of profiles.

6.1. User feedback about the approach
Most users have expressed a favourable impression about the approach. In particular, they have appreciated both the fact that their inferred profiles were ready soon after their initial login

GOOD NEWS FOR YOU!!!

Int. conf. on Aerospace supercomputing

FURTHER NEWS OF PROBABLE INTEREST FOR YOU

Tutorial course on Turbulent Flows modelling

OTHER NEWS YOU MIGHT BE INTERESTED IN

Summer school on Fluent

OTHER NEWS

Advanced course on porous media flows modelling

Int. Conf. on Biomedical supercomputing

Figure 6: An example of personalized news recommendation, given the fact that the user works in
the aerospace field.
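
The ranking with graded emphasis shown in Figure 6, together with the threshold-based e-mail alerting, can be sketched as follows. The sketch assumes that the expected utility of a piece of news is the sum over topics of P(‘N concerns T’ = yes) × P(‘interest in T’ = yes), with the utility term fixed at 1 as stated in the text. The emphasis cut-offs, the alert threshold, the topic names and all probabilities are invented for illustration and are not taken from the portal.

```python
def expected_utility(concerns, interests):
    """Sum over topics of P(news concerns T) * P(user is interested in T), with U = 1."""
    return sum(p * interests.get(topic, 0.0) for topic, p in concerns.items())


def emphasis(eu, cutoffs=(0.75, 0.5, 0.25)):
    """Map an expected utility to one of four emphasis levels (cut-offs are hypothetical)."""
    labels = (
        "GOOD NEWS FOR YOU",
        "FURTHER NEWS OF PROBABLE INTEREST",
        "OTHER NEWS YOU MIGHT BE INTERESTED IN",
    )
    for cutoff, label in zip(cutoffs, labels):
        if eu >= cutoff:
            return label
    return "OTHER NEWS"


def recommend(news, interests):
    """Return (title, expected utility, emphasis) in decreasing order of expected utility."""
    scored = [(title, expected_utility(c, interests)) for title, c in news.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(title, eu, emphasis(eu)) for title, eu in scored]


def to_alert(news, interests, seen, threshold=0.7):
    """Titles not yet seen whose expected utility exceeds the (hypothetical) threshold."""
    return [
        title
        for title, c in news.items()
        if title not in seen and expected_utility(c, interests) > threshold
    ]


# Hypothetical inferred profile and news items.
interests = {"aerospace": 0.9, "turbulence": 0.6, "biomedical": 0.1}
news = {
    "Int. conf. on Aerospace supercomputing": {"aerospace": 1.0},
    "Tutorial course on Turbulent Flows modelling": {"turbulence": 1.0},
    "Int. Conf. on Biomedical supercomputing": {"biomedical": 1.0},
}
for title, eu, level in recommend(news, interests):
    print(f"{level}: {title} (EU = {eu:.2f})")
```

Keeping the ranking and the alerting as separate functions over the same expected-utility values mirrors the text, where both services rely on the same inferred profile.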

Figure 7: An example of a message inviting the user to resume the dialogue.
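
The decision of whether a less-easy-question is worth asking can be illustrated with a minimal sketch of an entropy-based value-of-information test. This is only one common reading of the idea: it computes the expected reduction in Shannon entropy of a single binary interest node from a single yes/no question and compares it with a fixed cost of bothering the user. The algorithm of Section 4.4 and the bother-effect network of Figure 3 may differ; the function names, the cost value and all probabilities are hypothetical.

```python
import math


def entropy(p_yes):
    """Shannon entropy, in bits, of a binary variable with P(yes) = p_yes."""
    return -sum(p * math.log2(p) for p in (p_yes, 1.0 - p_yes) if p > 0.0)


def expected_entropy_reduction(prior_yes, p_fact_given_yes, p_fact_given_no):
    """Expected drop in entropy of the interest node after a yes/no answer about one fact."""
    p_answer_yes = prior_yes * p_fact_given_yes + (1.0 - prior_yes) * p_fact_given_no
    p_answer_no = 1.0 - p_answer_yes
    expected_after = 0.0
    if p_answer_yes > 0.0:
        posterior = prior_yes * p_fact_given_yes / p_answer_yes
        expected_after += p_answer_yes * entropy(posterior)
    if p_answer_no > 0.0:
        posterior = prior_yes * (1.0 - p_fact_given_yes) / p_answer_no
        expected_after += p_answer_no * entropy(posterior)
    return entropy(prior_yes) - expected_after


def worth_asking(prior_yes, p_fact_given_yes, p_fact_given_no, bother_cost=0.1):
    """Ask only if the expected information gain exceeds the (hypothetical) bother cost."""
    gain = expected_entropy_reduction(prior_yes, p_fact_given_yes, p_fact_given_no)
    return gain > bother_cost


# An uncertain interest and an informative question versus an uninformative one.
print(worth_asking(0.5, 0.9, 0.1))  # informative question
print(worth_asking(0.5, 0.5, 0.5))  # answer would not change the profile
```

A sequential asking loop would repeat this test after each answer, re-propagating the new fact before choosing the next question.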

and the fact of being directly involved (through specific knowledge-based questions) in the profile refining process over time. They have declared that they are not bothered by being asked questions over time by the website; on the contrary, they have declared that they are pleased to cooperate with the website. They have perceived the question-asking ‘attitude’ of the website as a sort of constant and competent attention from the website to their needs (the website behaves like an expert that looks after their specialist interests). In fact questions are asked in an unobtrusive manner, both because they are asked over time according to the sequential asking algorithm and because a user is not forced to answer them. A user is first asked if he/she agrees with the website resuming the dialogue. Moreover, for each question the user is offered the possibility of temporarily suspending the answer if he/she does not feel like answering at that moment, or telling the website not to ask that question any longer (see the choice options appearing in Figure 5).
We note that the type of user population that has been considered in this experimental year has the following characteristics. Users are researchers in the CFD field. In general they have used the alerting service to be automatically notified of possibly relevant newly created news, and have not accessed the portal very often. Both users and the website share a very specific field of knowledge (CFD knowledge), and it is just this knowledge on which the dialogue between users and the website is based.


6.2. User feedback about profile accuracy
Profile accuracy has been checked in an empirical manner. Initially the probability tables of the user-interests network were elicited, with the aid of suitable forms, from the CILEA CFD expert. Then they were tuned with the cooperation of a set of end-users who had used both ‘probability soundness’ tests and ‘recommendation soundness’ tests. In order to clearly explain what these soundness tests consist of, let us consider three sample sets whose definitions are illustrated in the following section.

6.2.1. Sample set definitions
Some end-users belonging to heterogeneous CFD sub-fields have been involved in providing feedback about profile accuracy. Let SEU be the sample set of such end-users.
Some typical hypothetical cases of user-facts have been defined. For example, let us consider a hypothetical case defined by the user-facts ‘working in the aerospace field’ and ‘using the software tool Fluent’ (e.g. the case of an aerospace engineer), another defined by the user-facts ‘working in the biomedical field’ and ‘using the software tool Fidap’ (e.g. the case of a bioengineer) and so on. Let SHC be the sample set of such hypothetical cases.
Some suitable hypothetical pieces of news have been created. More precisely, for each CFD topic T a hypothetical piece of news N_T, concerning T with probability 1 and the other topics with probability 0, has been created so that the expected utility of N_T is equal to the probability of ‘interest in T’ = yes. For example, for the topic ‘turbulence flow modelling’ news like ‘International Conference on Turbulence Flow Modelling’ has been created, for the topic ‘porous media flow modelling’ news like ‘Summer School on Porous Media Flow Modelling’ has been created, and so forth. Let SHN be the sample set of such hypothetical news.

6.2.2. Soundness tests
Soundness tests have been performed according to the following algorithm.
For each hypothetical case in SHC:
• enter the related user-facts and propagate them through the user-interests network, in this way producing the related inferred profile;
• activate the recommender system with the SHN set, in this way producing the list of hypothetical news recommended according to the inferred profile;
• ask each end-user in the SEU to examine
  - the inferred profile, to check if the inferred user-interest probabilities could be considered acceptable in the light of the domain common sense, given the entered facts (‘probability soundness’ test);
  - the emphasis given to each hypothetical piece of news by the recommender system, to check if the emphasis degree could be considered compatible with the inferred profile in the light of the domain common sense (‘recommendation soundness’ test).
The feedback collected from the end-users of the SEU was then used to tune the probability tables of the user-interests network.

7. Strengths of the proposed approach and suitable application domains
A basic human–computer interaction element underlying the proposed approach is represented by the fact that the main aim of the website is that of cooperating with a user (without bothering him/her) to build an accurate profile of him/her. The website aims at building an accurate user profile to create a collaborative user relationship over time, so that the user perceives the website as a collaborator watching over his/her interests and looking after his/her profile over time. In fact, even after the initial profile construction session, the website does not miss future favourable opportunities to invite the user to resume the dialogue in order to better refine his/her profile. This approach of involving the user as a partner in the process of building his/her profile by asking him/her questions even over time (without bothering him/her) is more

suitable to websites specialized in certain areas interesting a very specific community, such as, for example, a specific scientific community. In these cases, in general, the domain knowledge of the application expands more in depth than in breadth, so questions are not excessively numerous and are more specific, the answers are more significant because they refer to a specialized universe of knowledge, and the dialogue between the user and the website resembles a dialogue between two experts operating in the same field. As a consequence in these cases it is less probable that a user gets bored when the website asks him/her some competent questions.
Another strength of the proposed approach is represented by the fact that the website has at its disposal a complete inferred profile of the user soon after the initial session, where the user enters his/her user-facts. In other words, the website does not need a statistically significant number of accesses from the user to build a complete user profile. This is a very useful feature for those websites that have a low number of accesses, possibly because of the type of population accessing the website, the type of news or services provided by the website etc. For example, a deep-knowledge-based website, concerning a specific field and having the main purpose of promoting and disseminating the specific field culture, may have a user population mostly consisting of professional people operating in the same field (e.g. researchers in the field). In general, such users do not need to access the website for their everyday tasks. They access the website for reading recent news. News concerns, in general, cultural events in the field: conferences, courses, books etc. – infrequent events which do not occur daily. Moreover, the website has an alerting service which users might prefer to use instead of directly accessing the website. For all these reasons the website may have a low number of accesses, but nevertheless the website needs to have at its disposal an accurate user profile (for providing personalized services) from the beginning of the user–website relationship. The presented approach represents a solution to such a problem.
Summarizing the considerations made so far, let us conclude that the approach presented in this paper is suitable for websites characterized by the following facts:
• they are based on specialist knowledge (such as for example scientific websites);
• the dialogue with users is on the basis of domain knowledge that the website and the users have in common;
• they may have a low number of accesses, but nevertheless they need to have at their disposal a complete inferred user profile soon after the user enters his/her initial data.
Conversely, in cases of websites dealing with a broad and generic domain of knowledge, a very large population of visitors and a statistically significant number of accesses, such as for example popular commercial websites, famous book-store websites etc., alternative approaches are used. In general, these approaches are based on collaborative filtering techniques or other techniques of user profiling based on user behaviour observation (i.e. implicit data acquisition), i.e. techniques that avoid any type of active involvement from users. Some of such techniques will be reviewed in Section 8.

8. Related work and discussion
In this section we will discuss the ideas underlying the proposal in the wider context of human–computer interaction research with a particular focus on user profiling. User profiling on the Web is a topic which has received much attention in several international journals, conferences and books. The topic has been approached in the light of various technologies and has found application in several heterogeneous fields. An exhaustive overview of the state of the art is beyond the scope of this paper. We will limit ourselves to comparing the proposal to some significant approaches. User profiling is typically either knowledge-based or behaviour-based. In knowledge-based approaches information about users is acquired explicitly through questionnaires or single questions. Behaviour-based approaches use implicit observations of user behaviour, commonly using machine-learning techniques to discover useful patterns in the behaviour (data mining techniques are applied to log files to extract patterns). We can also classify works on user profiling from two orthogonal points of view: the technology used to build profiles and the target task the profiles are built for. For example, technologies may be Bayesian networks, case-based reasoning, ontologies, data mining etc., while target tasks may be information retrieval, e-learning, e-commerce, recommender systems etc. According to these points of view, the proposal presented in the present paper may be classified as knowledge-based, using the Bayesian network technology and addressing the recommender system target task.
Godoy et al. (2004) propose two intelligent agents, PersonalSearcher and NewsAgent, that assist users in the tasks of filtering and organizing information available on the Web. PersonalSearcher (a personalized Web searcher) is an interface agent that helps users who are searching the Web for relevant information by filtering a set of documents retrieved from several search engines according to users’ interests. NewsAgent (a personalized digital newspaper generator) is an interface agent that selects those articles that are relevant to a user from several online newspapers. The agents incrementally build a hierarchy (a tree) of users’ relevant topics by means of a textual case-based reasoning technique (a specialization of case-based reasoning for textual documents). Profiles are adapted as agents interact with users over time. Both agents observe users’ behaviour while they are reading Web documents, recording the main features characterizing these experiences. A user profile consists of a set of weighted topics relevant to a user. This approach is therefore behaviour-based, uses the case-based reasoning technology, and addresses the information retrieval and intelligent assistant target tasks. In our proposal, too, user profiles are represented in terms of topics, i.e. user interests. In our proposal, however, a topic hierarchy is represented by a network.
Wong and Butz (2000) propose a method for representing a user profile as a Bayesian network. Such a network is learned from a sample of documents that are judged by the user to be relevant or irrelevant. The proposed method addresses the information retrieval target task.
An approach addressing the information retrieval target task and integrating the case-based reasoning and Bayesian network techniques to build user profiles incrementally is presented in Schiaffino and Amandi (2000). Case-based reasoning provides a mechanism to acquire knowledge about user actions that are worth recording to determine his/her habits and preferences. Each case records the attributes or keywords used by a user to perform queries. A query is classified according to its similarity to previous recorded queries. The Bayesian network provides a tool to represent relationships between items of interest. It is built gradually as a user queries the database. Information stored in the form of cases is used to gradually build and update the Bayesian network as the interests of the user change over time. The user profile consists of a statistical profile (type of queries frequently made etc.) and an inferred (via the Bayesian network) profile. A profile is used to suggest the execution of relevant queries to a user at an appropriate moment. The authors focus their attention on users who need data stored in the database to fulfil their everyday tasks, or at least who often use the database. Conversely, our proposal focuses on a type of population of users who do not access a website very often. In our case the website has at its disposal a general Bayesian network (knowledge base) that is then instantiated to a specific user and generates a specific inferred profile, but does not need to be built incrementally. This is an advantage in applications where the user population does not produce a great number of accesses. In these cases the solution is just the knowledge-based one: asking users explicitly for some information, involving the user in a cooperative relationship. So, even soon after the first contact with a new user (registration phase and data entry of Figure 4) the website has at its disposal an inferred profile of the user. Another difference concerns the target task: our proposal addresses the task of news recommendation.

In Nokelainen et al. (2002) an adaptive online questionnaire system, EDUFORM, is presented. The proposal addresses the educational target task and uses Bayesian probabilistic models. The authors face the problem of ‘one size fits all’ on-line questionnaires equipped with numerous propositions. In particular, EDUFORM is a Web-based data gathering tool, which performs adaptive and dynamic optimization of the number of questionnaire propositions during the data gathering process. EDUFORM uses probabilistic Bayesian methods to create user profiles that are then used to dynamically optimize the set of propositions that are presented to a user in order to maximize information extraction. In our approach the goal of minimizing the number of questions and maximizing information extraction is achieved by using the value-of-information framework.
The approach presented in Adomavicius and Tuzhilin (1999, 2001) is well suited to the e-commerce target task. It is behaviour-based and uses data mining methods. More precisely, the authors present a method for constructing user behavioural profiles using data mining techniques. Profiles are specified with sets of rules learned from transactional histories. Since many rules can be spurious, irrelevant or trivial, a method for validating them, separating good rules from bad ones, is presented.
Data mining techniques are also used in Nasraoui et al. (2002) where the authors present a framework (based on fuzzy relational clustering) for mining typical user profiles from the vast amount of historical data stored in server access logs.
Soltysiak and Crabtree (1998) use a heuristics-based clustering method to generate user interest profiles. The method is applied to the e-mails a user sends or receives, and the WWW pages he/she browses. They use a keyword extraction technique for identifying relevant keywords within a document. A document is then represented as a vector of keywords. The vectors are used as the basis of grouping documents into clusters. A sufficiently large cluster of documents represents a user’s interest.
The same authors (Crabtree & Soltysiak, 1998) address the information retrieval target task, focusing in particular on tracking interest themes over time through measuring the similarity of interest themes across time periods.
Heuristics-based techniques are also used in Kostoff et al. (2001) and Rousseau et al. (2004) addressing the intelligent assistant and information retrieval target tasks respectively.
In Esposito et al. (2003) a comparison of the effectiveness of two supervised methods for learning user profiles, inductive logic programming and a Bayesian classifier, is carried out. The comparison focuses on the two different learning strategies to infer models of user interests from textual book descriptions. Experiments are conducted in the context of a content-based profiling system for a virtual bookshop on the Web.
Recently particular attention has been paid to the use of ontology-based technologies in user profiling. Middleton et al. (2004) explore an ontological approach to user profiling within recommender systems, working on the problem of recommending online academic research papers. They present two experimental systems that create user profiles from unobtrusively monitored behaviour and relevance feedback, representing the profiles in terms of a research-paper topics ontology. Papers are classified using ontological classes. The database of research papers is classified using a research-paper topics ontology and a set of training examples. Recorded Web browsing and relevance feedback elicited from users are used to compute daily profiles of users’ research interests. Interest profiles are represented in ontological terms, allowing other interests to be inferred. The interest profiles are visualized to allow elicitation of direct profile feedback.
The same authors face the theme of the use of ontologies in recommender systems (Middleton et al., 2001), and in Middleton et al. (2003) they explore the idea of profile visualization to capture further knowledge about user interests.
Let us classify in Table 1 the set of papers considered so far. The papers, along with the present proposal, are located in the table according to


Table 1: The works examined classified according to the technology they use and the target task they address

Technology                                   Work                            Target task
Ontologies                                   Middleton et al., 2004          Recommender system
                                             Middleton et al., 2001          Recommender system
                                             Middleton et al., 2003          Recommender system
Bayesian networks                            Wong & Butz, 2000               Information retrieval
                                             (a)                             Recommender system
Case-based reasoning                         Godoy et al., 2004              Information retrieval; Intelligent assistant
Case-based reasoning and Bayesian networks   Schiaffino & Amandi, 2000       Information retrieval
Bayesian probabilistic models                Nokelainen et al., 2002         e-Learning and education
Data mining and rule discovery               Adomavicius & Tuzhilin, 1999    e-Commerce
                                             Adomavicius & Tuzhilin, 2001    e-Commerce
Web log data mining and fuzzy clustering     Nasraoui et al., 2002
Heuristics-based clustering                  Crabtree & Soltysiak, 1998      Information retrieval
                                             Rousseau et al., 2004           Information retrieval
                                             Soltysiak & Crabtree, 1998      Intelligent assistant
                                             Kostoff et al., 2001            Intelligent assistant
Inductive logic programming                  Esposito et al., 2003
Bayesian classification                      Esposito et al., 2003

(a) Work presented in this paper.


the technology and the target task by which they are characterized. Although the set of applications that have been considered is not exhaustive, the table gives an idea of the variety of technological approaches and target tasks of user profiling. It also gives an idea of both the enormous amount of work done in the user profiling field and the difficulties underlying the problem of building accurate user profiles.

9. Conclusions
This paper has presented a user profiling approach based on a deep-knowledge model of user interests (i.e. domain topics) and a sequential asking algorithm using the value-of-information theory based on Shannon entropy. The approach has been implemented and tried out for over a year in a real context. Experimental results have indicated that the proposal is particularly suitable for websites addressing domains with specific deep knowledge (like scientific websites) and with a user population that is characterized by sharing the same knowledge and producing a ‘low’ number of accesses (i.e. ‘low’ compared with the high number of accesses of a typical commercial website). The approach has been discussed in the context of the scientific literature concerning human–computer interaction and in particular user profiling. The discussion has highlighted that the proposed approach differs from others mostly because of its emphasis on involving users in collaborative processes for building and refining their profiles.

Acknowledgements
In alphabetical order, thanks to Dr Enrico Cavalli for the integration of the prototype into the portal running on the server computer, thanks to Dr Paolo Ramieri for his role of CFD expert providing CFD knowledge, thanks to the anonymous reviewers for their valuable comments, and thanks to the users involved for their cooperation and feedback.

References
ADOMAVICIUS, G. and A. TUZHILIN (1999) User profiling in personalized applications through rule discovery and validation, in Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, 377–381.
ADOMAVICIUS, G. and A. TUZHILIN (2001) Using data mining methods to build customer profiles, IEEE Computer, 34 (2), 74–82.
BILLSUS, D., C.A. BRUNK, C. EVANS, B. GLADISH and M. PAZZANI (2002) Adaptive interfaces for ubiquitous Web access, Communications of the ACM, 45 (5), 34–38.
BRUSILOVSKY, P. and M.T. MAYBURY (2002) From adaptive hypermedia to the adaptive Web, Communications of the ACM, 45 (5), 30–33.
CRABTREE, I.B. and S.J. SOLTYSIAK (1998) Identifying and tracking changing interests, International Journal of Digital Libraries, 2, 38–53.
ESPOSITO, F., G. SEMERARO, S. FERILLI, M. DEGEMMIS, N. DI MAURO, T.M.A. BASILE and P. LOPS (2003) Evaluation and validation of two approaches to user profiling, in Proceedings of the ECML/PKDD-2003 First European Web Mining Forum, B. Berendt, A. Hotho, D. Mladenic, M. van Someren, M. Spiliopoulou and G. Stumme (eds).
FINK, J., J. KOENEMANN, S. NOLLER and I. SCHWAB (2002) Putting personalization into practice, Communications of the ACM, 45 (5), 41–42.
GODOY, D., S. SCHIAFFINO and A. AMANDI (2004) Interface agents personalizing Web-based tasks, Cognitive Systems Research, 5, 207–222.
GORRY, G.A. and G.O. BARNETT (1985) Experience with a model of sequential diagnosis, in Computer-assisted Medical Decision Making, J.A. Reggia and S. Tuhrim (eds), Berlin: Springer, Vol. 1, pp. 206–222.
HECKERMAN, D.E., E.J. HORVITZ and B.N. NATHWANI (1992) Toward normative expert systems, Part I: The Pathfinder project, Methods of Information in Medicine, 31 (2), 90–105.
HIRSH, H., C. BASU and B.D. DAVISON (2000) Learning to personalize, Communications of the ACM, 43 (8), 102–106.
HORVITZ, E., A. JACOBS and D. HOVEL (1999) Attention-sensitive alerting, in Proceedings of UAI ’99 – Conference on Uncertainty in Artificial Intelligence, 305–313.
JENSEN, F.V. (2001) Bayesian Networks and Decision Graphs, Berlin: Springer.
KOSTOFF, R.N., J.A. DEL RIO, J.A. HUMENIK, E.O. GARCIA and A.M. RAMIREZ (2001) Citation mining: integrating text mining and bibliometrics for research user profiling, Journal of the American Society for Information Science and Technology, 52 (13), 1148–1156.


MIDDLETON, S.E., D.C. DE ROURE and N.R. SHADBOLT (2001) Capturing knowledge of user preferences: ontologies in recommender systems, in Proceedings of the International Conference on Knowledge Capture K-CAP 2001, Victoria, B.C., Canada.
MIDDLETON, S.E., N.R. SHADBOLT and D.C. DE ROURE (2003) Capturing interest through inference and visualization: ontological user profiling in recommender systems, in Proceedings of the International Conference on Knowledge Capture K-CAP 2003, Sundial Beach Resort, Sanibel Island, FL.
MIDDLETON, S.E., N.R. SHADBOLT and D.C. DE ROURE (2004) Ontological user profiling in recommender systems, ACM Transactions on Information Systems, 22 (1), 54–88.
MUSSI, S. (2003) Providing websites with capabilities of one-to-one marketing, Expert Systems, 20 (1), 8–19.
NASRAOUI, O., R. KRISHNAPURAM, A. JOSHI and T. KAMDAR (2002) Automatic Web user profiling and personalization using robust fuzzy relational clustering, in e-Commerce and Intelligent Methods, J. Segovia, P. Szczepaniak and M. Niedzwiedzinski (eds), Studies in Fuzziness and Soft Computing, Berlin: Springer.
NOKELAINEN, P., H. TIRRI, M. MIETTINEN and T. SILANDER (2002) Optimizing and profiling users online with Bayesian probabilistic modeling, in Proceedings of the NL 2002 Conference.
PEARL, J. (1988) Probabilistic Reasoning in Intelligent Systems, San Mateo, CA: Morgan Kaufmann.
ROUSSEAU, B., P. BROWNE, P. MALONE and M. O’FOGHLÚ (2004) User profiling for content personalization in information retrieval, in Proceedings – 19th ACM Symposium on Applied Computing, SAC (Nicosia Cyprus).
SCHIAFFINO, S.N. and A. AMANDI (2000) User profiling with case-based reasoning and Bayesian networks, in Proceedings – International Joint Conference IBERAMIA-SBIA, 12–21.
SHANNON, C.E. and W. WEAVER (1949) The Mathematical Theory of Communication, Urbana, IL: University of Illinois Press.
SOLTYSIAK, S.J. and I.B. CRABTREE (1998) Automatic learning of user profiles – towards the personalization of agent services, BT Technology Journal, 16 (3).
VON WINTERFELDT, D. and W. EDWARDS (1986) Decision Analysis and Behavioral Research, Cambridge: Cambridge University Press.
WONG, S.K.M. and C.J. BUTZ (2000) A Bayesian approach to user profiling in information retrieval, Technology Letters, 4 (1), 50–56.

The author
Silvano Mussi
Silvano Mussi graduated in physics in 1975 from the University of Milan, Italy. He worked at ITALTEL for ten years in the fields of software engineering and functional discrete simulations of real-time systems. He has been with CILEA since 1981. He has cooperated in research activities with Milan Polytechnic and Brescia University where for three academic years he was contract-professor of artificial intelligence. For over ten years he has been doing research in the fields of knowledge engineering and expert systems. His current research interests address methods for providing websites with capabilities of decision-making and reasoning under conditions pervaded with uncertainty.
