Classic Models Enforce Independence of Index Terms. For The Vector Model

Generalized Vector Model
Classic models enforce independence of index terms. For the Vector model:
Set of term vectors {k1, k2, ..., kt} are linearly independent and form a basis for the subspace of interest. ki kj = 0
Frequently, this is interpreted as:

i,j
In 1985, Wong, Ziarko, and Wong proposed an interpretation in which the set of terms is linearly independent, but not pairwise orthogonal.
Key Idea:
In
the generalized vector model, two index terms might be non-orthogonal and are represented in terms of smaller components (minterms). As before let,
wij be the weight associated with [ki,dj] {k1, k2, ..., kt} be the set of all terms
If
these weights are all binary, all patterns of occurrence of terms within documents can be represented by the minterms:
m1 = (0,0, ..., 0) m5 = (0,0,1, ..., 0) m2 = (1,0, ..., 0) m3 = (0,1, ..., 0) t m4 = (1,1, ..., 0) m2 = (1,1,1, ..., 1) In here, m2 indicates documents in which solely the term k1 occurs.
Key Idea:
The
basis for the generalized vector model is formed by a set of 2 vectors defined over the set of minterms, as follows: 0 1 2 ...
m1
= (1, 0, 0, ..., 0, 0) m2 = (0, 1, 0, ..., 0, 0) m3 = (0, 0, 1, ..., 0, 0) t m2 = (0, 0, 0, ..., 0, 1)

Notice
i,j
that,
mi mj = 0 i.e., pairwise orthogonal
Key Idea:
Minterm
The
vectors are pairwise orthogonal. But, this does not mean that the index terms are independent:
minterm m4 is given by: m4 = (1, 1, 0, ..., 0, 0) This minterm indicates the occurrence of the terms k1 and k2 within a same document. If such document exists in a collection, we say that the minterm m4 is active and that a dependency between these two terms is induced. The generalized vector model adopts as a basic foundation the notion that cooccurence of terms within documents induces dependencies among them.
Forming the Term Vectors

The
ki
vector associated with the term ki is computed as: ci,r mr

=
r, g i(m r )=1
sqrt( r, g
ci,r
(m r
)=1 ci,r )
dj | l, gl(dj)=gl(mr)
wij
The
weight c i,r associated with the pair [ki,mr] sums up the weights of the term ki in all the documents which have a term occurrence pattern given by mr. Notice that for a collection of size N, only N minterms t affect the ranking (and not 2 ).
Dependency between Index Terms

A degree
of correlation between the terms ki and kj can now be computed as: ki kj = r, g

(m )=1 g (mr )=1 i r
j
c i,r * c j,r
This
degree of correlation sums up (in a weighted form) the dependencies between ki and kj induced by the documents in the collection (represented by the mr minterms).
The Generalized Vector Model: An Example k1

d2 d4 d1 d6 d5 d7 d3
k2
k3
d1 d2 d3 d4 d5 d6 d7 q
k1 2 1 0 2 1 1 0 1
k2 0 0 1 0 2 2 5 2
k3 1 0 3 0 4 0 0 3
Computation of C i,r
wij
d1 d2 d3 d4 d5 d6 d7 q k1 2 1 0 2 1 0 0 1 k2 0 0 1 0 2 2 5 2 k3 1 0 3 0 4 2 0 3
d1 = m6 d2 = m2 d3 = m7 d4 = m2 d5 = m8 d6 = m7 d7 = m3 q = m8
k1 1 1 0 1 1 0 0 1
k2 0 0 1 0 1 1 1 1
k3 1 0 1 0 1 1 0 1
c i,r = dj | l, gl(dj)=gl(mr) wij

m1 m2 m3 m4 m5 m6 m7 m8 c1,r 0 3 0 0 0 2 0 1 c2,r 0 0 5 0 0 0 3 2 c3,r 0 0 0 0 0 1 5 4
Computation of Index Term Vectors

m1 m2 m3 m4 m5 m6 m7 m8 c1,r 0 3 0 0 0 2 0 1 c2,r 0 0 5 0 0 0 3 2 c3,r 0 0 0 0 0 1 5 4
k1
1 (3 m2 + 2 m6 + m8 ) sqrt(32 + 22 + 12 ) k2 = 1 (5 m3 + 3 m7 + 2 m8 ) 2 2 2 sqrt(5 + 3 + 2 ) k3 = 1 (1 m6 + 5 m7 + 4 m8 ) 2 2 2 sqrt(1 + 5 + 4 )
Computation of Document Vectors

d1 d2 d3 d4 d5 d6 d7 q k1 2 1 0 2 1 0 0 1 k2 0 0 1 0 2 2 5 2 k3 1 0 3 0 4 2 0 3
d1 d2 d3 d4 d5 d6 d7 q
= 2 k1 + k3 = k1 = k2 + 3 k3 = 2 k1 = k1 + 2 k2 + 4 k3 = 2 k2 + 2 k3 = 5 k2 = k1 + 2 k2 + 3 k3
Ranking Computation
k1
1 (3 m2 + 2 m6 + m8 ) 2 2 2 sqrt(3 + 2 + 1 ) k2 = 1 (5 m3 + 3 m7 + 2 m8 ) sqrt(52 + 32 + 22 ) k3 = 1 (1 m6 + 5 m7 + 4 m8 ) 2 2 2 sqrt(1 + 5 + 4 )

d1 d2 d3 d4 d5 d6 d7 q
= 2 k1 + k3 = k1 = k2 + 3 k3 = 2 k1 = k1 + 2 k2 + 4 k3 = 2 k2 + 2 k3 = 5 k2 = k1 + 2 k2 + 3 k3
Conclusions
Model considers correlations among index terms Not clear in which situations it is superior to the standard Vector model Computation costs are higher Model does introduce interesting new ideas

Classic Models Enforce Independence of Index Terms. For The Vector Model

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Classic Models Enforce Independence of Index Terms. For The Vector Model

Uploaded by

Copyright:

Available Formats

Generalized Vector Model

Frequently, this is interpreted as:

= (1, 0, 0, ..., 0, 0) m2 = (0, 1, 0, ..., 0, 0) m3 = (0, 0, 1, ..., 0, 0) t m2 = (0, 0, 0, ..., 0, 1)

Forming the Term Vectors

vector associated with the term ki is computed as: ci,r mr

Dependency between Index Terms

of correlation between the terms ki and kj can now be computed as: ki kj = r, g

The Generalized Vector Model: An Example k1

c i,r = dj | l, gl(dj)=gl(mr) wij

Computation of Index Term Vectors

1 (3 m2 + 2 m6 + m8 ) sqrt(32 + 22 + 12 ) k2 = 1 (5 m3 + 3 m7 + 2 m8 ) 2 2 2 sqrt(5 + 3 + 2 ) k3 = 1 (1 m6 + 5 m7 + 4 m8 ) 2 2 2 sqrt(1 + 5 + 4 )

Computation of Document Vectors

1 (3 m2 + 2 m6 + m8 ) 2 2 2 sqrt(3 + 2 + 1 ) k2 = 1 (5 m3 + 3 m7 + 2 m8 ) sqrt(52 + 32 + 22 ) k3 = 1 (1 m6 + 5 m7 + 4 m8 ) 2 2 2 sqrt(1 + 5 + 4 )

You might also like