
Collaborative filtering recommendation with threshold value of the equipotential plane in implication field

Hoang Tan Nguyen
Department of Information and Communications of Dong Thap
12 Tran Phu street, ward 1, Cao Lanh City, Dong Thap, Viet Nam
+84913794800
hoangntdt@gmail.com

Hung Huu Huynh
University of Science and Technology
54, Nguyen Luong Bang street, Lien Chieu District, Da Nang City, Viet Nam
+84905444669
hhhung@dut.udn.vn

Hiep Xuan Huynh
Can Tho University
3/2 Street, Ninh Kieu District, Can Tho City, Viet Nam
+84985796067
hxhiep@ctu.edu.vn

ABSTRACT
Collaborative filtering is one of the most popular and effective techniques in today's recommender systems. However, most collaborative filtering methods use symmetric similarity measures, so by default the effect and role of the two users in a pair are assumed to be the same, which may not hold in practice. In addition, these methods only demonstrate logically that a priority relationship exists between two users, not how strong that relationship is in practice. In this paper, we propose a new collaborative filtering approach based on analyzing the variation of the implication index, an asymmetric measure, in the implication field. Information is ranked and filtered according to the variation of the implication index with respect to the counter-examples, which yields meaningful recommendations with a given level of implication and addresses these issues of traditional recommender systems.

Keywords
Implication index; implication field; collaborative filtering; implication threshold; equipotential plane.

1. INTRODUCTION
Because of the rapid growth of data in today's era of information explosion, recommender systems [1][2] have become an essential tool, widely used in e-commerce and online services such as Amazon, Pandora and Netflix. The objective of a recommender system is to filter useful information from a large amount of data in order to predict how a user will rate an item, and thereby to recommend items (products, services, etc.) suitable for that user. Algorithms for recommender systems have attracted the attention of researchers because of their practical applications. Among them, collaborative filtering algorithms [17] are the most widely used and successful. Most of these algorithms rely on symmetric measures to filter information and make recommendations. Recently, several solutions that use asymmetric similarity in recommender systems have been proposed, such as asymmetric similarity for collaborative filtering via matrix factorization [3][4][6] and recommendation with asymmetric user influence and global importance value [18], to address the asymmetric effects of users in recommender systems. Another new trend is the use of statistical implication analysis in recommender systems, which addresses the problem of asymmetric user influence and assesses the occurrence of, or functional relationship in, the interactions between users and data items in practice. An example is the recommender system model based on association rules combined with an implicative measure [15]. To overcome the disadvantage of traditional recommender systems, which focus on logic that only demonstrates the existence or absence of a priority relationship between the user and the data item or product, the authors of this model are particularly interested in the degree of the implicative relationship between the user and the data item in a particular context, in order to make recommendations more effective. Another application of statistical implication analysis to recommender systems is the user-based collaborative filtering recommender system using association rules combined with the implication cohesion measure [16], which calculates the similarity of each pair of users in collaborative filtering. Recently, [5] proposed recommendation based on the variance of the implication index in the implication field.

In this paper, we also use statistical implication analysis to propose a new collaborative filtering approach based on the threshold value of the equipotential plane in the implication field [8], in order to address the problem of asymmetric user influence and the implication relationship between users in recommender systems.

The paper is organized in five parts. The first part introduces the context and the issues to be solved, and outlines the proposed approach. The second part presents related work on statistical implication analysis and extended studies of the implication field. The third part presents the recommender system model based on the variance of the implication index in the implication field. The fourth part presents the experiments and their scenarios, and the last part concludes.

2. IMPLICATION STATISTICAL FIELD
2.1 Implication statistical analysis theory
2.1.1 Implication statistical analysis
Statistical implication analysis (SIA) theory [9][11][13][14], proposed by Régis Gras, studies implication relationships between data variables. Its measures, the implication index (also known as the Gras implication index) and the implication intensity, are used to detect rules or R-rules (rules of rules) with a strong implicative relationship between the two sides of the rule, or
to measure the correlation between two variables (individuals, attributes, ...); these measures are asymmetric. In addition, statistical implication analysis focuses on the analysis of counter-examples. It can be presented as follows.

Let $E$ be a finite set of $n$ objects and let $a$ and $b$ be two binary variables on $E$. Let $A$ and $B$ be the subsets of $E$ containing the elements for which $a = true$ and $b = true$, respectively, and let $\bar A$ and $\bar B$ be their complements. Let $n_a = card(A)$ and $n_b = card(B)$ be the cardinalities of $A$ and $B$, $n_{\bar a} = card(\bar A)$ and $n_{\bar b} = card(\bar B)$ the cardinalities of $\bar A$ and $\bar B$, and $n_{a\bar b} = card(A \cap \bar B)$ the cardinality of $A \cap \bar B$, the set of elements that satisfy $a = true$ and $b = false$; $n_{a\bar b}$ is called the number of counter-examples. Subsets $X$ and $Y$ of $E$ are also selected randomly and independently with the same cardinalities as $A$ and $B$, i.e. $card(X) = n_a$ and $card(Y) = n_b$. Let $\bar X$ and $\bar Y$ be the complements of $X$ and $Y$ in $E$, with cardinalities $n_{\bar a} = n - n_a$ and $n_{\bar b} = n - n_b$.

The implication relationship between $A$ and $B$ is modeled in statistical implication analysis as shown in Figure 1.

Figure 1. Venn diagram of the components of statistical implication analysis

The implication intensity $\varphi(a,b)$ of the rule $A \to B$ is defined by [11][13]:

$$
\varphi(a,b) = 1 - \Pr\big(Q(a,\bar b) \le q(a,\bar b)\big) =
\begin{cases}
1 - \displaystyle\sum_{s=0}^{n_{a\bar b}} \frac{\lambda^s}{s!}\, e^{-\lambda} \approx \frac{1}{\sqrt{2\pi}} \int_{q(a,\bar b)}^{\infty} e^{-\frac{t^2}{2}}\, dt, & \text{if } n_b < n \\
0, & \text{otherwise}
\end{cases}
\tag{1}
$$

where $q(a,\bar b)$ is the implication index, defined by:

$$
q(a,\bar b) = \frac{n_{a\bar b} - \frac{n_a n_{\bar b}}{n}}{\sqrt{\frac{n_a n_{\bar b}}{n}}}
\quad \text{and} \quad
\lambda = \frac{n_a n_{\bar b}}{n}
\tag{2}
$$

For large enough $\lambda$ (e.g. $\lambda \ge 4$), $Q(a,\bar b)$ approximately follows the standard normal distribution $N(0,1)$, which gives the integral form in (1). The implication rule $A \to B$ is admissible at the confidence level $\alpha$ if and only if $\varphi(a,b) \ge 1 - \alpha$ [11][13].

2.1.2 Implication index variation
Consider small variations in the neighborhood of the four observed values $n$, $n_a$, $n_b$, $n_{a\bar b}$. These variables are treated as real numbers, and $q$ as a continuously differentiable function of them, subject to the constraints $0 \le n_a \le n_b$, $n_{a\bar b} \le \inf\{n_a, n_{\bar b}\}$ and $\sup\{n_a, n_b\} \le n$. The Fréchet differential of $q$ is:

$$
dq = \frac{\partial q}{\partial n}\,dn + \frac{\partial q}{\partial n_a}\,dn_a + \frac{\partial q}{\partial n_b}\,dn_b + \frac{\partial q}{\partial n_{a\bar b}}\,dn_{a\bar b} = \operatorname{grad} q \cdot dM
\tag{3}
$$

where $M = (n, n_a, n_b, n_{a\bar b})$ is a point of the scalar field $C$, $dM$ is the vector of differentials of the variables, and $\operatorname{grad} q$ is the vector of partial derivatives of $q$.

From (3), the differential of $q$ is the scalar product of $\operatorname{grad} q$ and the displacement $dM$ on the surface representing $q(n, n_a, n_b, n_{a\bar b})$. The vector $\operatorname{grad} q$ describes the variability of this function of four variables (the cardinalities of the sets $E$, $A$, $B$ and $A \cap \bar B$) and points in the direction in which $q$ changes in the four-dimensional space. In practice, the value of this differential lies in estimating the increase (positive or negative) $\Delta q$ of $q$ caused by the variations $\Delta n$, $\Delta n_a$, $\Delta n_b$ and $\Delta n_{a\bar b}$:

$$
\Delta q = \frac{\partial q}{\partial n}\,\Delta n + \frac{\partial q}{\partial n_a}\,\Delta n_a + \frac{\partial q}{\partial n_b}\,\Delta n_b + \frac{\partial q}{\partial n_{a\bar b}}\,\Delta n_{a\bar b} + o(\Delta q)
\tag{4}
$$

where $o(\Delta q)$ is an infinitesimal of higher order.

To examine further the relationship between the implication index and the implication intensity, we differentiate (1) with respect to $q$:

$$
\frac{d\varphi}{dq} = -\frac{1}{\sqrt{2\pi}}\, e^{-\frac{q^2}{2}} < 0
\tag{5}
$$

This confirms that the implication intensity increases as $q$ decreases, and formula (5) gives the rate of this increase, which allows a more rigorous study of the variability of $\varphi$.

2.1.3 Implication Field
2.1.3.1 Implication statistical field
Consider the implication index $q(a,\bar b)$ in the four-dimensional space whose points $M$ have as coordinates the parameters $(n, n_a, n_b, n_{a\bar b})$ associated with the binary variables $a$ and $b$. Then $q(a,\bar b)$ is a scalar field, i.e. a mapping from $\mathbb{R}^4$ to $\mathbb{R}$. The vector field $\operatorname{grad} q$, whose components are the partial derivatives of $q$ with respect to $n$, $n_a$, $n_b$, $n_{a\bar b}$, is a special gradient field called the implication field, because it satisfies Schwarz's criterion for mixed derivatives: the mixed derivatives of each pair of variables are equal [8], for example

$$
\frac{\partial}{\partial n_{a\bar b}}\left(\frac{\partial q}{\partial n_b}\right) = \frac{\partial}{\partial n_b}\left(\frac{\partial q}{\partial n_{a\bar b}}\right)
\tag{6}
$$

and similarly for every other pair of the variables $(n, n_a, n_b, n_{a\bar b})$. The field $\operatorname{grad} q$ therefore derives from the potential $q$. The vector $\operatorname{grad} q$ points from lower values of $q$ toward higher values: at each point, it shows the direction in which the implication index grows and how fast it changes under the influence of one or more parameters.

2.1.3.2 Implication index equipotential plane
Consider the implication index as a function of four variables $q(n, n_a, n_b, n_{a\bar b})$. An equipotential line or plane of the field $C$ is a curve in this ordered four-dimensional space along which the point $M$ keeps the same value of the potential $q$. With $q(a,\bar b)$ fixed at a constant value, the equation of this curve is [8]:

$$
q(a,\bar b) - \frac{n_{a\bar b} - \frac{n_a n_{\bar b}}{n}}{\sqrt{\frac{n_a n_{\bar b}}{n}}} = 0
\tag{7}
$$

On that curve, the scalar product of $\operatorname{grad} q$ and $dM$ is 0 (from (3) and (7)). This means that the gradient is orthogonal to the tangent, or tangent hyperplane, of the curve, that is, to the equipotential plane. To illustrate, consider a potential $F$ that depends on only two variables: Figure 2 shows the gradient oriented orthogonally to the equipotential lines, along which $F$ is constant, as the potential changes from $F_7$ to $F_{10}$ [8].
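The definitions in (1) to (3) can be checked numerically. The following is a minimal Python sketch, assuming $n_a > 0$ and $n_b < n$ so that $\lambda > 0$; the helper names (`implication_index`, `intensity_poisson`, `intensity_normal`, `grad_q`) are hypothetical and are not taken from the implicativefield toolkit. It evaluates $q$ and $\lambda$ from (2), the exact Poisson form and the normal approximation of $\varphi$ in (1), and estimates $\operatorname{grad} q$ in (3) by central differences, using $n_{\bar b} = n - n_b$.

```python
import math


def implication_index(n, n_a, n_b, n_ab_bar):
    """Implication index q(a, b-bar) and lambda from formula (2)."""
    n_b_bar = n - n_b                  # objects for which b is false
    lam = n_a * n_b_bar / n            # expected counter-examples under independence
    return (n_ab_bar - lam) / math.sqrt(lam), lam


def intensity_poisson(n, n_a, n_b, n_ab_bar):
    """Implication intensity from (1): 1 - P(Poisson(lambda) <= n_ab_bar), 0 if n_b >= n."""
    if n_b >= n:
        return 0.0                     # the 'otherwise' branch of (1)
    _, lam = implication_index(n, n_a, n_b, n_ab_bar)
    cdf = sum(lam ** s / math.factorial(s) for s in range(int(n_ab_bar) + 1)) * math.exp(-lam)
    return 1.0 - cdf


def intensity_normal(q):
    """Normal approximation in (1): P(N(0, 1) > q)."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


def grad_q(n, n_a, n_b, n_ab_bar, h=1e-6):
    """Central-difference estimate of grad q in (3), with n_b_bar = n - n_b."""
    point = [n, n_a, n_b, n_ab_bar]
    grad = []
    for i in range(4):
        up, down = list(point), list(point)
        up[i] += h
        down[i] -= h
        grad.append((implication_index(*up)[0] - implication_index(*down)[0]) / (2 * h))
    return grad
```

For example, with $n = 100$, $n_a = 20$, $n_b = 70$ and $n_{a\bar b} = 3$ we get $\lambda = 6$ and $q = -3/\sqrt{6} \approx -1.22$; the exact intensity is $1 - 61e^{-6} \approx 0.849$, the normal approximation gives about 0.89, and the last component of the gradient equals $1/\sqrt{\lambda}$.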
Figure 2. Illustration of a potential $F$ that depends on only two variables $x$ and $y$: equipotential lines $F_7$ to $F_{10}$ and the gradient of $F$ at the points $M$ and $M'$

In this case, the potential $q$ forms the equipotential planes (shown in two dimensions in Figure 2 for ease of representation). The implication varies strongly where the equipotential planes are close together and weakly where they are sparse. To obtain a value of $q$ in this case, fix three variables, e.g. $n$, $n_a$, $n_b$, and choose values of $q$ compatible with the constraints of the field.

3. RECOMMENDATION BASED ON EQUIPOTENTIAL PLANE IN IMPLICATION FIELD
3.1 Implication statistical rules
Let $D$ be a database consisting of a set $T$ of transactions, where each transaction $T_i$ consists of items that appear in transactions (products, services, ...). The itemset $I$ is the set of all $m$ items, and $\delta(X)$ denotes the number of transactions in $D$ that contain the itemset $X$. The support of $X$, denoted $support(X)$, is the percentage of transactions in $D$ that contain $X$. An association rule is a rule of the form $X \to Y$, where $X, Y \subset I$ are itemsets, $X$ is called the premise and $Y$ the consequence. Association rules are usually evaluated by two metrics, support ($S$) and confidence ($C$). The support of the rule $X \to Y$, denoted $sup(X \to Y)$, is the ratio of the number of transactions containing both $X$ and $Y$ to the total number of transactions:

$$
sup(X \to Y) = \frac{\delta(X \cup Y)}{|T|}
$$

The confidence of the rule $X \to Y$, denoted $conf(X \to Y)$, is the probability that a transaction containing $X$ also contains $Y$:

$$
conf(X \to Y) = \frac{\delta(X \cup Y)}{\delta(X)} = \frac{sup(X \to Y)}{sup(X)} = P(Y \mid X)
$$

A rule in statistical implication analysis is defined as follows. Let $A \subset I$ be the set of items rated by user $u_a$ and $\bar A$ its complement, and let $B \subset I$ be the set of items rated by user $u_b$ and $\bar B$ its complement. Then $n_a = card(A)$ is the number of items rated by $u_a$ (the number of elements of $A$), $n_b = card(B)$ is the number of items rated by $u_b$ (the number of elements of $B$), and $n_{a\bar b} = card(A \cap \bar B)$ is the number of items rated by $u_a$ but not by $u_b$. Beyond the usual measures above, the measure specific to implicative rules is the implication index, which expresses a degree of implication that association rules do not capture. An implication rule is characterized by the four variables $(n, n_a, n_b, n_{a\bar b})$, called the cardinalities of the rule. In other words, the relationship between users $u_a$ and $u_b$ is the relationship between the item set $A$ liked by $u_a$ and the item set $B$ liked by $u_b$, represented by the four values $(n, n_a, n_b, n_{a\bar b})$.

3.2 Threshold implication variability between equipotential planes
As discussed in the previous section, statistical implication analysis focuses on counter-examples. A rule is not easily abandoned when a few counter-examples appear; only when the counter-examples become numerous does the confidence in the rule decrease, so that the rule can be rejected. Conversely, when examples are numerous and counter-examples rare, the rule becomes stronger and is accepted. For example, consider the acceptable rule "Ferrari cars are red". Even if one or two counter-examples appear (Ferrari cars that are not red), the rule is maintained, and it is confirmed again as new examples appear. Thus, unlike mathematics, where rules admit no exceptions, the rules considered here remain acceptable as long as the number of counter-examples stays within an "acceptable" threshold, because in these situations the rules are still active and effective. In data analysis, the problem is to define a consensus standard that quantifies the confidence threshold of a rule according to user requirements.

In this section, we propose recommendation based on the variation of the implication index as the number of counter-examples varies in the implication field. This variation determines the equipotential planes of the set of implication indices, from which the item (or top-$k$ item list) suitable for the user is recommended with a definite implication threshold. The threshold $\theta$ is the tolerance of $q$ within the same equipotential plane, and it depends on the variable chosen as byFactor. To determine $\theta$, we consider how the dependent variable $q(a,\bar b)$ varies when an element $x$ is added to (or removed from) the data sample, in four possible cases. More specifically, the partial derivatives with respect to $n$, $n_a$, $n_b$, $n_{a\bar b}$ are:

$$
\frac{\partial q}{\partial n} = \frac{1}{2\sqrt{n\, n_a n_{\bar b}}}\left(n_{a\bar b} + \frac{n_a n_{\bar b}}{n}\right)
\tag{8a}
$$

$$
\frac{\partial q}{\partial n_a} = -\frac{1}{2}\, n_{a\bar b} \left(\frac{n}{n_{\bar b}}\right)^{\frac{1}{2}} n_a^{-\frac{3}{2}} - \frac{1}{2}\sqrt{\frac{n_{\bar b}}{n\, n_a}}
\tag{8b}
$$

$$
\frac{\partial q}{\partial n_b} = \frac{1}{2}\, n_{a\bar b} \left(\frac{n_a}{n}\right)^{-\frac{1}{2}} (n - n_b)^{-\frac{3}{2}} + \frac{1}{2}\left(\frac{n_a}{n}\right)^{\frac{1}{2}} (n - n_b)^{-\frac{1}{2}}
\tag{8c}
$$

$$
\frac{\partial q}{\partial n_{a\bar b}} = \frac{1}{\sqrt{\frac{n_a n_{\bar b}}{n}}} = \frac{1}{\sqrt{\frac{n_a (n - n_b)}{n}}}
\tag{8d}
$$

From (8d), when $n_{a\bar b}$ increases, the implication index increases, and thus the implication intensity decreases.

Let $\lambda_1, q_1$ and $\lambda_2, q_2$ be the values of $\lambda$ and $q$ for the original data sample and for the modified data sample, respectively, as in Table 1A (the value ±1 is +1 when the element $x$ is added to the dataset and -1 when it is removed), and let $\Delta q = q_2 - q_1$ be the variation of $q$ given in Table 1B.

TABLE 1A. VARIABILITY OF $n_a$, $n_{\bar b}$, $n_{a\bar b}$ AND $\lambda$ ($a$, $b$: values of the two binary variables for the element $x$; the sign of $\Delta\lambda$ is given for the addition case)
Case | $a$ | $b$ | $\Delta n_a$ | $\Delta n_{\bar b}$ | $\Delta n_{a\bar b}$ | $\lambda_1$ | $\lambda_2$ | $\Delta\lambda$
(i) | 0 | 0 | 0 | ±1 | 0 | $\frac{n_a n_{\bar b}}{n}$ | $\frac{n_a (n_{\bar b} \pm 1)}{n \pm 1}$ | > 0
(ii) | 0 | 1 | 0 | 0 | 0 | $\frac{n_a n_{\bar b}}{n}$ | $\frac{n_a n_{\bar b}}{n \pm 1}$ | < 0
(iii) | 1 | 0 | ±1 | ±1 | ±1 | $\frac{n_a n_{\bar b}}{n}$ | $\frac{(n_a \pm 1)(n_{\bar b} \pm 1)}{n \pm 1}$ | > 0
(iv) | 1 | 1 | ±1 | 0 | 0 | $\frac{n_a n_{\bar b}}{n}$ | $\frac{(n_a \pm 1)\, n_{\bar b}}{n \pm 1}$ | > 0

TABLE 1B. VARIABILITY OF $q$ (ADDITION CASE), where $q_1 = \frac{n_{a\bar b} - \frac{n_a n_{\bar b}}{n}}{\sqrt{\frac{n_a n_{\bar b}}{n}}}$

$$
\text{(i)}\quad \Delta q = \frac{n_{a\bar b} - \frac{n_a (n_{\bar b}+1)}{n+1}}{\sqrt{\frac{n_a (n_{\bar b}+1)}{n+1}}} - q_1
\tag{a}
$$

$$
\text{(ii)}\quad \Delta q = \frac{n_{a\bar b} - \frac{n_a n_{\bar b}}{n+1}}{\sqrt{\frac{n_a n_{\bar b}}{n+1}}} - q_1
\tag{b}
$$

$$
\text{(iii)}\quad \Delta q = \frac{(n_{a\bar b}+1) - \frac{(n_a+1)(n_{\bar b}+1)}{n+1}}{\sqrt{\frac{(n_a+1)(n_{\bar b}+1)}{n+1}}} - q_1
\tag{c}
$$

$$
\text{(iv)}\quad \Delta q = \frac{n_{a\bar b} - \frac{(n_a+1)\, n_{\bar b}}{n+1}}{\sqrt{\frac{(n_a+1)\, n_{\bar b}}{n+1}}} - q_1
\tag{d}
$$

To determine the variation threshold $\theta$ of the implication index between equipotential planes, let $\frac{\partial q}{\partial \xi}$ and $\frac{\Delta q}{\Delta \xi}$ be, respectively, the partial derivative and the increment of $q$ with respect to $\xi$, where $\xi \in \{n, n_a, n_b, n_{a\bar b}\}$. A variation of $q$ caused by adding (or removing) one individual in the dataset can affect $k$ implication rules of the dataset, which leads to the threshold $\theta = k\frac{\Delta q}{\Delta \xi}$, that is:

$$
\frac{\partial q}{\partial \xi} = k\,\frac{\Delta q}{\Delta \xi} + o(q)
\tag{10}
$$

where $o(q)$ is an infinitesimal, and $\frac{\partial q}{\partial \xi}$ and $\frac{\Delta q}{\Delta \xi}$ are given by formulas (8a) to (8d) and (a) to (d) of Table 1B. The threshold $\theta$ is defined as $k\frac{\Delta q}{\Delta \xi}$ from (10).

3.3 Recommendation based on variation implication index with threshold value of equipotential plane
To give a formal definition of the recommendation task, we introduce some concepts of the recommender system. The set of users is denoted by $U$ and the set of items by $I$. The set of ratings in the system is denoted by $R$, and the set of possible values of a rating by $\mathcal{S}$ (e.g. $\mathcal{S} = [1, 5]$ or $\mathcal{S} = \{like, dislike\}$). We assume that each user $u \in U$ rates a particular item $i \in I$ at most once, and we denote this rating by $r_{ui} \in R$. The subset of users who have rated an item $i$ is denoted $U_i$; likewise, $I_u$ denotes the subset of items rated by user $u$. The set of items rated by both users $u$ and $v$, i.e. $I_u \cap I_v$, is an important concept in this paper and is denoted $I_{uv} = I_u \cap I_v$. Similarly, $U_{ij} = U_i \cap U_j$ denotes the set of users who have rated both items $i$ and $j$.

One of the most important problems of a recommender system is to recommend the best item, or a list of the $n$ best items, to a user. This problem consists in finding, for a particular user $u$, the new items $i \in I \setminus I_u$ in which $u$ is most likely to be interested. When ratings are available, this task is often defined as a regression or (multi-class) classification problem whose purpose is to learn a function

$$
f: U \times I \to \mathcal{S}
$$

that predicts the rating $f(u, i)$ of a user $u$ for a new item $i$. This function is used to recommend to the active user $u_a$ the item $i^*$ with the highest estimated rating [1][2]:

$$
i^* = \arg\max_{j \in I \setminus I_{u_a}} f(u_a, j)
\tag{9}
$$

Based on the above results on the implication field, we propose the following recommendation algorithms.

Algorithm 1. IRG (Implication Rules Generator)
Input: set of transactions
Output: set of implicative rules and their cardinalities $(n, n_a, n_b, n_{a\bar b})$
Step 1: Generate the rule set from the set of transactions using a data mining algorithm (such as Apriori).
Step 2: Calculate the cardinalities of the implication rules as follows:
    Count the number of transactions $n$.
    Generate two binary (True/False) matrices lhsRules and rhsRules, whose entry for rule $i$ and item $j$ is true if item $j$ belongs to the left-hand side (respectively, the right-hand side) of rule $i$.
    $lhsProduct = lhsRules \times (data)^T$ ¹
    $rhsProduct = rhsRules \times (data)^T$
    For each rule[$i$]:
        $n_a[i] = rowSum(lhsProduct[i] = |lhs_i|)$, the number of transactions that contain every item of the left-hand side of rule $i$ ($|lhs_i|$ is the number of those items)
        $n_b[i] = rowSum(rhsProduct[i] = |rhs_i|)$, computed in the same way on the right-hand side
        $n_{ab}[i]$: computed in the same way as $n_a[i]$ and $n_b[i]$, but on the left- and right-hand sides together
        $n_{a\bar b}[i] = n_a[i] - n_{ab}[i]$
Step 3: Return $(n[i], n_a[i], n_b[i], n_{a\bar b}[i])$.

Algorithm 2. RBEP (Recommendation by Equipotential Plane of implication index)
Input: dataset, threshold $\theta$, variation factor byFactor
Output: recommendation: an item or a top-$k$ item list
Step 1: Call IRG(dataset) to generate the rule set and calculate $n$, $n_a$, $n_b$, $n_{a\bar b}$.
Step 2: For each rule($i$), calculate the implication index $q(i)$ according to formula (2). Then calculate the partial derivative of $q(i)$ with respect to byFactor according to formulas (8a) to (8d).
Step 3: Determine the set recSet of rules whose values of $q$ lie on the same equipotential plane with respect to byFactor, i.e. $|\Delta q(a,\bar b)| \le \theta$, where $\theta$ is defined by equation (10).
Step 4: Return the recommended item, or $k$ items, from recSet.

The algorithms described above serve as the basis of a recommendation model for a statistics-based recommender system, RSIF (Recommender System based on Implication Field), as shown in Figure 3.

¹ $(data)^T$ is the transpose of the data matrix.
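Algorithms 1 and 2 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: transactions are Python sets instead of the binary data matrix, so subset tests play the role of the products with $(data)^T$; rule generation (IRG Step 1, e.g. Apriori) is assumed to be done already; and the equipotential planes are formed by a hypothetical greedy grouping in which $q$ stays within $\theta$ of the first (lowest-$q$) rule of each plane. All function names are hypothetical.

```python
import math


def irg_cardinalities(transactions, rules):
    """IRG Step 2 (sketch): cardinalities (n, n_a, n_b, n_ab_bar) of each rule lhs -> rhs."""
    n = len(transactions)
    cards = []
    for lhs, rhs in rules:
        n_a = sum(1 for t in transactions if lhs <= t)            # t contains the whole lhs
        n_b = sum(1 for t in transactions if rhs <= t)            # t contains the whole rhs
        n_ab = sum(1 for t in transactions if (lhs | rhs) <= t)   # t contains both sides
        cards.append((n, n_a, n_b, n_a - n_ab))                   # n_ab_bar = n_a - n_ab
    return cards


def implication_index(n, n_a, n_b, n_ab_bar):
    """Implication index q of formula (2); +inf when lambda = 0 (no counter-example possible)."""
    lam = n_a * (n - n_b) / n
    if lam == 0:
        return math.inf
    return (n_ab_bar - lam) / math.sqrt(lam)


def equipotential_planes(qs, theta):
    """Greedy grouping (assumption): q varies by at most theta inside one plane."""
    order = sorted(range(len(qs)), key=lambda i: qs[i])  # lowest q = strongest implication first
    planes = []
    for i in order:
        if planes and qs[i] - qs[planes[-1][0]] <= theta:
            planes[-1].append(i)
        else:
            planes.append([i])
    return planes


def rbep_recommend(transactions, rules, user_items, theta, top_k=1):
    """RBEP Steps 1-4 (sketch): unseen consequents of applicable rules, plane by plane."""
    qs = [implication_index(*c) for c in irg_cardinalities(transactions, rules)]
    recommended = []
    for plane in equipotential_planes(qs, theta):
        for i in plane:
            lhs, rhs = rules[i]
            if lhs <= user_items:                      # the user has rated the premise
                for item in sorted(rhs - user_items):  # keep only items the user has not rated
                    if item not in recommended:
                        recommended.append(item)
        if len(recommended) >= top_k:
            break                                      # stop after the plane that fills the list
    return recommended[:top_k]
```

For instance, with eight transactions in which item A appears four times and B is missing from one of them, the rule A → B gets the cardinalities (8, 4, 4, 1) and $q = -1/\sqrt{2}$, and a user who has rated only A is recommended B. Scanning planes in increasing order of $q$ follows (5): a lower index means a higher implication intensity.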


4. EXPERIMENT
4.1 Dataset
With the system model proposed above, we conducted experiments on the MovieLens dataset collected by the GroupLens research project from the MovieLens website, which contains approximately 100,000 ratings of 1,682 films by 943 users. Ratings range from 1 to 5, from the lowest to the highest. The dataset was preprocessed to make the experiment more accurate, as follows:
- Standardization of data: some users rate all their films high (or low), which can introduce bias. We eliminate this effect by normalizing the data so that the average rating of each user is on the same scale.
- Selection of relevant data: to reduce bias and speed up computation, we ignore films that have been rated only a few times, because the ratings of these films may be biased by a lack of data, and users who have rated only a few films, because their ratings may be biased.

On the preprocessed dataset, to avoid overfitting and to obtain better accuracy, we ran the experiments in k-fold cross-validation mode rather than with the splitting and bootstrapping methods.

4.2 Experimental tools
The experiments were conducted with the implicativefield toolkit, which we developed in the R language. It includes statistical analysis tools for recommender systems based on the variance of the implication index in the statistical implication field.

4.3 Scenario 1
(Recommendation based on the variance of the implication index with respect to counter-examples in the implication field)
We applied the recommender system model based on the variability of the implication index in the statistical implication field to the dataset described above. The rule set was generated with a support threshold of 0.4 and a confidence threshold of 0.4, giving a total of 119 rules. After eliminating meaningless rules (rules whose left-hand side is empty) and keeping only the rules whose implication intensity is greater than 0.5, 84 rules remain. With the threshold $\theta = 0.337565$, we obtain the updated values of the implication index $q$ under the variation of any element of $(n, n_a, n_b, n_{a\bar b})$. In this scenario, byFactor $= n_{a\bar b}$, and we collected 25 sets of equipotential planes in the three dimensions $(n, n_a, n_b)$. These hyperplanes carry uneven potentials of the implication index, as listed in Table 2. Equipotential plane number 1 is composed of the rule set {21, 35, 16, 17}, plane number 2 of the rule set {38, 29, 15}, and so on, up to plane number 27 with the rule set {150, 128, 125, 135, 211}. The rules on each hyperplane have implication index values that are equal up to the tolerance $\theta$, as shown in Table 2.

TABLE 2. THE INTENSITY OF THE IMPLICATION FIELD ON EQUIPOTENTIAL PLANES AND THEIR IMPLICATION INDEXES, byFactor = $n_{a\bar b}$
Eq. plane | Number of rules | $q$ | Eq. plane | Number of rules | $q$ | Eq. plane | Number of rules | $q$
1 | 4 | -9.04347 | 10 | 1 | -6.73528 | 19 | 1 | -3.83082
2 | 3 | -8.80657 | 11 | 5 | -5.97182 | 20 | 1 | -3.5129
3 | 1 | -8.69112 | 12 | 5 | -5.70381 | 21 | 2 | -3.13998
4 | 2 | -8.1998 | 13 | 4 | -5.43779 | 22 | 5 | -2.80447
5 | 5 | -7.75697 | 14 | 4 | -5.1421 | 23 | 5 | -2.58721
6 | 4 | -7.45843 | 15 | 4 | -4.8574 | 24 | 2 | -2.36298
7 | 4 | -7.26654 | 16 | 2 | -4.59922 | 25 | 2 | -2.16484
8 | 3 | -6.97816 | 17 | 4 | -4.27338 | 26 | 3 | -1.78044
9 | 1 | -6.75977 | 18 | 2 | -3.9652 | 27 | 5 | -1.3093

TABLE 3. IMPLICATION RULES AND IMPLICATION INDEXES IN EQUIPOTENTIAL PLANE NO. 1
No. | Description of the rule | Implication index
138 | {Star Wars (1977), Empire Strikes Back, The (1980)} => {Raiders of the Lost Ark (1981)} | -9.149508
90 | {Star Wars (1977), Raiders of the Lost Ark (1981), Return of the Jedi (1983)} => {Empire Strikes Back, The (1980)} | -9.053185
226 | {Star Wars (1977), Empire Strikes Back, The (1980), Return of the Jedi (1983)} => {Raiders of the Lost Ark (1981)} | -9.000696
86 | {Empire Strikes Back, The (1980), Return of the Jedi (1983)} => {Raiders of the Lost Ark (1981)} | -8.970471

Regarding the trend of variability of the implication index with respect to byFactor: here the element $n_{a\bar b}$ plays a major role in strengthening or rejecting a rule (see the theory of implicative statistics in the previous sections). An increase of this factor increases the implication index, which means a reduction of the implication intensity; however, when the reduction is not significant, the rule is treated equally and its implication remains at the previous level.

The implication field has an uneven density. The equipotential planes with the highest implicative density, whose index values vary slightly more and are more concentrated, are planes 5, 11, 12, 22, 23 and 27. The planes with the lowest density are planes 3, 9, 10, 19 and 20, as shown in Table 2. This shows the suitability of the rules. When the implication index of a rule varies so much that the rule is no longer accepted at a specified threshold, the rule moves to another equipotential plane whose implication threshold is more appropriate. This helps recommend to users the items with the most appropriate level of implication.

A target user is recommended a movie, or a list of movies, that he or she would like to watch, based on the movies previously viewed, as shown in Table 3. For example, the movie "Raiders of the Lost Ark (1981)" can be recommended to users who have seen "Empire Strikes Back, The (1980)" and "Return of the Jedi (1983)" (rule No. 86).

4.4 Scenario 2
(Comparison with user-based collaborative filtering)
To compare the accuracy of the proposed model with user-based collaborative filtering models (UBCF) using the Cosine and Pearson measures, we also carried out the experiment in this scenario.

TABLE 4. ERROR INDEXES OF THE ISF MODEL COMPARED WITH THE IBCF AND UBCF MODELS
Model | RMSE | MSE | MAE
ISF | 0.9434059 | 0.8900147 | 0.7419290
IBCFcosine | 1.2372211 | 1.5307160 | 0.9264473
UBCFcosine | 0.9857491 | 0.9717012 | 0.7785217
IBCFPearson | 1.2204847 | 1.4895830 | 0.9094559
UBCFPearson | 0.9987563 | 0.9975141 | 0.7919161
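The error indexes in Table 4 are the usual RMSE, MSE and MAE between predicted and observed ratings, and the per-user standardization described in Section 4.1 can be done, for example, by mean-centering each user's ratings. The following is a minimal Python sketch under these assumptions; mean-centering is one common choice, the paper does not state which normalization was used, and the function names `center_by_user` and `error_indexes` are hypothetical.

```python
import math


def center_by_user(ratings):
    """Subtract each user's mean rating: one common way to put all users on the same scale."""
    centered = {}
    for user, items in ratings.items():
        mean = sum(items.values()) / len(items)
        centered[user] = {item: r - mean for item, r in items.items()}
    return centered


def error_indexes(predicted, actual):
    """Return (RMSE, MSE, MAE) for aligned sequences of predicted and observed ratings."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - a for p, a in zip(predicted, actual)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return math.sqrt(mse), mse, mae
```

Since MSE is the square of RMSE, the rows of Table 4 can be cross-checked this way, e.g. $0.9434059^2 \approx 0.8900147$ for ISF.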
The results are recorded in Figures 4 and 5. In terms of the ROC and Precision-Recall curves, the ISF model performs better than UBCF with the Pearson measure but worse than UBCF with the Cosine measure. In terms of errors, ISF has lower error indexes than both UBCF models (Table 4).

Figure 4. ROC curves comparing ISF and the UBCF models
Figure 5. Precision and Recall comparison between ISF and the UBCF models

4.5 Scenario 3
(Comparison with item-based collaborative filtering)
In this scenario, ISF is compared with IBCF using the Pearson and Cosine measures through the ROC and Precision-Recall curves. As shown in Figures 6 and 7, the ISF model performs better. ISF also has lower error indexes than both the Cosine and the Pearson IBCF models (Table 4).

Figure 6. ROC curves comparing ISF and the IBCF models
Figure 7. Precision and Recall comparison between ISF and the IBCF models

5. CONCLUSIONS
The approach based on the variation of the implication index was applied to model the ISF recommender system. The proposed model was tested on the MovieLens 100K dataset with the implicativefield toolkit for user recommendation, and was evaluated against collaborative filtering models that use symmetric similarity measures; the results were mostly better than those of the models using common symmetric measures. These contributions are intended to increase the effectiveness of online recommendations (to improve the accuracy of ranking predictions) and to show trends in rules.

6. REFERENCES
[1] Gediminas Adomavicius, Alexander Tuzhilin, Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions, IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 6, pp. 734-749, 2005.
[2] Gediminas Adomavicius, Alexander Tuzhilin, Context-aware recommender systems, Springer US, pp. 217-253, 2011.
[3] Bin Cao, Qiang Yang, Jian-Tao Sun, Zheng Chen, Learning bidirectional asymmetric similarity for collaborative filtering via matrix factorization, Data Mining and Knowledge Discovery, Vol. 22, Issue 3, pp. 393-418, 2011.
[4] Hoang Tan Nguyen, Hung Huu Huynh, Hiep Xuan Huynh, Recommender system based on analysis implicative statistical user preferences over time, IX International Conference A.S.I. Analyse Statistique Implicative - Statistical Implicative Analysis (ASI9), France, 2017 (in Vietnamese) (Accepted).
[5] Hoang Tan Nguyen, Hung Huu Huynh, Hiep Xuan Huynh, Recommendation based on the variance of implication index in statistical implication field, Proceedings of the X National Conference on Fundamental and Applied IT Research (FAIR'17), Da Nang, 2017 (in Vietnamese) (Accepted).
[6] Mukund Deshpande, George Karypis, Item-based top-N recommendation algorithms, ACM Transactions on Information Systems 22(1), pp. 143-177, 2004.
[7] Rahul Katarya, Om Prakash Verma, Effective collaborative movie recommender system using asymmetric user similarity and matrix factorization, The 2016 IEEE International Conference on Computing, Communication and Automation (ICCCA'16), DOI: 10.1109/CCAA.2016.7813692, pp. 1-12, 2016.
[8] Régis Gras, Pascale Kuntz, Nicolas Greffard, Notion de champ implicatif en analyse statistique implicative, The 8th International Meeting on Statistical Implicative Analysis, Tunisia, pp. 1-21, 2015 (in French).
[9] Régis Gras, Dominique Lahanier-Reuter, Duality between variables space and subjects space of the statistic implicative analysis (Dualité entre espace des variables et espace des sujets en analyse statistique implicative), The VI International Conference ASI Analyse Statistique Implicative - Implicative Statistical Analysis (ASI6), Caen, France, pp. 1-28, 2012.
[10] Régis Gras, Pascale Kuntz, Discovering R-rules with a directed hierarchy, Soft Computing - A Fusion of Foundations, Methodologies and Applications, Vol. 10, Issue 5, Springer-Verlag, pp. 453-460, 2006.
[11] Régis Gras, Einoshin Suzuki, Fabrice Guillet, Filippo Spagnolo (Eds.), Statistical Implicative Analysis, Theory and Application, Springer-Verlag Berlin Heidelberg, 2008.
[12] Régis Gras, Pascale Kuntz, H. Briand, Les fondements de l'analyse statistique implicative et quelques prolongements pour la fouille de données, Mathématiques et Sciences Humaines 39, pp. 9-29, 2001.
[13] Régis Gras, Raphaël Couturier, Spécificités de l'Analyse Statistique Implicative (A.S.I.) par rapport à d'autres mesures de qualité de règles d'association, Quaderni di Ricerca in Didattica - GRIM (ISSN 1592-4424), pp. 19-57, 2010 (in French).
[14] Dominique Lahanier-Reuter, Didactics of Mathematics and Implicative Statistical Analysis, Statistical Implicative Analysis - Studies in Computational Intelligence, pp. 277-298, 2008.
[15] Nghia Quoc Phan, Ky Minh Nguyen, Hoang Tan Nguyen, Hiep Xuan Huynh, Recommender system based approach combining law and implies statistical measure, Proceedings of the VIII National Conference on Fundamental and Applied IT Research (FAIR'15), Ha Noi, 2015 (in Vietnamese).
[16] Lan Phuong Phan, Trang Uyen Tran, Hung Huu Huynh, Hiep Xuan Huynh, The user-based collaborative filtering recommender system using association rules combined implication statistical cohesion measure, Proceedings of the IX National Conference on Fundamental and Applied IT Research (FAIR'16), Cần Thơ, 2016 (in Vietnamese).
[17] Francesco Ricci, Lior Rokach, Bracha Shapira, Introduction to Recommender Systems Handbook, Springer-Verlag and Business Media LLC, pp. 1-35, 2011.
[18] Zhi-Lin Zhao, Chang-Dong Wang, Jian-Huang Lai, AUI&GIV: Recommendation with asymmetric user influence and global importance value, Public Library of Science ONE, 2016.