Abstract
The unified modelling language (UML) is extensively used by developers and researchers for modelling
user requirements. The use case diagram (UCD) is a prominent UML diagram: it depicts the functional
requirements of the system and also acts as a starting point for the rest of the UML diagrams. As the
size and complexity of the software being developed increase, the use case diagram also becomes
increasingly complicated. The goal of this paper is to enhance the quality of the use case driven
development process. An approach is proposed to find the key (important) use cases in the UCD using
two graph ranking algorithms, namely Page Rank and HITS (Hypertext Induced Topic Search).
Identification of key use cases guides the software developer in prioritizing the implementation of
use cases in the UCD. On the example studied, the proposed approach gives results that agree with
human judgment.
Key Words: Use Case Diagram, Use Case Control Flow Graph, Page Rank and HITS Algorithm
analysis. HITS (Hypertext Induced Topic Search) algorithm [14] uses the web's hyperlink structure to
find the authority and hub scores for each page. The main objectives of our work are: (i) to determine
the structural cohesion among the use cases using dependency relationships that exist in the UCD and
represent them through a control flow graph called Use Case Control Flow Graph (UCFG), (ii) to find
the important use cases by applying Page Rank algorithm and weighted Page Rank algorithm on the UCFG,
(iii) to find the important use cases by computing hub scores by applying HITS algorithm on the UCFG,
(iv) to compare and analyze the results obtained by the application of Page Rank and HITS algorithms.
Application of these algorithms helps in identifying important use cases, thereby prioritizing the
underlying requirements of the system to be developed. Error free and efficient implementation of the
important use case(s), thus identified, is essential due to the dependency of several other use cases
on it. This information about the key use case(s) can help the developers to prioritize the use cases
in order to effectively implement them.
The paper consists of five sections. Section 2 describes the related work and background about Page
Rank and HITS algorithm. Section 3 describes the proposed approach to identify key use cases in the
UCD and illustrates it with an example. In section 4 results are discussed. Finally, section 5
describes the conclusion and the future work.

2. Related Work and Background

Page Rank is a web page ranking algorithm proposed by Page et al. [3] based on the concept of
arranging pages in a structured graph. Page Rank algorithm and its different versions are used in many
domains. In the field of software engineering, Page Rank has been applied to analyze different
software artifacts. A software metric based on Page Rank, known as Coderank [4], is computed for
artifacts like classes of a system to measure their importance along with their visualization and
interpretation. Component rank technique [5] finds the most useful component available for reuse in
multiple software libraries. Page Rank has also been applied to class diagram to measure complexity of
relationships among the classes in [6]. Page Rank along with HITS is used by Yuming and Baowen in
order to measure importance of classes in a class diagram [7]. Page Rank is also used to assess the
impact of concern changes in [8] by identifying concerns at requirements level and deriving a concern
relationship graph. A rank directed method based on Page Rank captures the difference in relationships
among classes to identify significant classes [9]. An approach to find improved coupling metrics by
considering weight of methods based on Page Rank is also proposed by Park et al. in [10]. The
importance of objects, classes, and methods is determined by applying Page Rank to software artifacts
and their relationships by Perin et al. in [11]. Huang et al. [12] proposed an algorithm for ranking
classes in class diagram using Page Rank algorithm. Yadav and Khan [13] proposed a coupling complexity
normalization (CCN) metric to minimize complexity of object-oriented designs and used Page Rank
algorithm to measure relationship among classes.
HITS (Hypertext Induced Topic Search) algorithm has been proposed by Kleinberg [14]. It is a web
mining algorithm which represents the internet as a large directed graph to find the most important
web pages for a query. Zaidman and Demeyer [15] used HITS to find the important classes in the system.
They applied HITS on the ArgoUML case tool to analyze dynamic coupling among classes and found that
for early system understanding, 10% of the classes of any system should be considered as key classes.
HITS has also been applied by Tong [16] to find the class authoritative complexity in class diagram,
which can be used to sort classes. The HITS algorithm is applied by Alexander in [17] to evaluate the
quality of class diagram.
Though we can find several approaches in literature where Page Rank and HITS algorithms are applied
on the class diagram, to the best of our knowledge, these have not been applied to other UML diagrams
such as use case diagram. In this paper, we have proposed an approach to apply Page Rank and HITS
algorithm on UCD to identify the key use case(s) of the system.

2.1 Page Rank and HITS Algorithm
A brief description of the Page Rank algorithm and the HITS algorithm is given in this section.

2.1.1 Page Rank Algorithm
Google search engine is widely used for searching
Applying Page Rank and HITS Algorithm to Identify Key Use Cases 657
web pages on the internet [3]. It uses the link structure of the web to calculate the rank of each web
page, and this ranking is called the Page Rank of a page. Each web page in the Page Rank algorithm [3]
is represented by a node of the graph and edges show the links among the pages. The Page Rank
algorithm measures the importance of a web page. A web page is considered important if it has a large
number of links pointing towards it from other pages. The rank of a web page is calculated by adding
the ranks of the pages that point towards it. The combined rank of all web pages is 1.
The basic Page Rank algorithm [3] is described by the following equation

PR(u) = (1 - d) + d \sum_{v \in P(u)} \frac{PR(v)}{N(v)}    (1)

where u and v represent nodes i.e. web pages. PR(u) is the Page Rank of page u and P(u) is the set of
pages that point to u. PR(v) is the Page Rank of page v. N(v) is the total number of outgoing links of
page v. 'd' is the damping factor having value between 0 and 1 and is usually 0.85. The pages having
no outgoing links affect the rank score adversely.

2.1.2 HITS Algorithm
HITS (Hypertext Induced Topic Search), proposed by Kleinberg, is a graph-based web mining algorithm
where the internet is considered as a large directed graph [14]. For a particular query, the most
relevant and important web pages are identified based on the hyperlink structure. Each page is a node
and each edge is a hyperlink in the directed graph. The authority of a page reflects the number of
in-links of the page and shows how informative the page is. The hub value, on the other hand, reflects
the number of out-links of a page i.e. its links to other informative pages. So, each page has two
scores - an authority score and a hub score - that are used to rank each page on the network for
importance. A page has a good authority score if several hubs point to it and similarly a good hub
score if it points to many authorities. For a page i, the hub score h_i and the authority score a_i
are calculated using equation (2)

a_i^{(k)} = \sum_{j \in I_i} h_j^{(k-1)},    h_i^{(k)} = \sum_{j \in O_i} a_j^{(k-1)}    (2)

where I_i are the in-links to page i and O_i are the out-links from page i.
The directed graph is next converted to the connectivity matrix M. If there is an edge from node i to
node j in the directed graph, value 1 is written in the matrix and 0 otherwise. The next step is to
execute equation (3) iteratively with the matrix M and normalise the h and a scores to get the final
hub and authority score for a page i. The initial value assigned to the h and a scores is 1.

a^{(k)} = M^T h^{(k-1)},    h^{(k)} = M a^{(k-1)}    (3)

3. Identifying Key Use Cases Using Page Rank Algorithm

As the use case diagram is the starting point of our proposed approach, we have followed the syntax
and semantics for creating use case diagrams having elements Actor, Use Case, Association,
Generalization, <include> and <extend> relationships, given in [1,2].
In our approach, we transform the UCD into a directed graph called use case control flow graph (UCFG)
showing the flow of control among different use cases and actors in the use case diagram. The UCFG has
a set of nodes N and a set of edges E where each node corresponds to a use case/actor and each edge
corresponds to a relationship among use cases or between a use case and an actor in the use case
diagram. Following are the steps of the proposed approach.
1. Assign a number to each use case in the UCD and an alphanumeric short form to each actor.
2. Convert UCD into UCFG.
3. Convert UCFG to connectivity matrix M. Make entries in the matrix M according to the following
   equation
   wt(i, j) = 1, if there is a relation between use case i and use case j or actor A and use case i.
            = 0, if no relation exists between use case i and use case j or actor A and use case i.
   The weight of the edge from j to i is also considered to be 1, because a use case j that is called
   by use case i returns control to the calling use case i after execution. Similarly, if an actor A
   calls a particular use case i, then after execution it returns some values to the actor.
   Therefore, we also consider the weight of the edge returning from the called use case in the
   connectivity matrix.
658 Sangeeta Sabharwal et al.
4. Calculate the stochastic transition matrix ST as given below

   ST(i, j) = \frac{wt(i, j)}{\sum_{k=1}^{n} wt(i, k)}    (4)

   where ST is an n \times n matrix.
5. Execute the Page Rank algorithm using equation (1) on the stochastic matrix ST to get the rank
   scores of all the use cases using Matlab.

3.1 Demonstration of the Proposed Approach
In this section, we illustrate the proposed approach with an example UCD as shown in Figure 1. It
consists of one actor and nineteen use cases. The UCD shows include and extend relationships among use
cases. For example, use case (6) has an include relationship with use cases (9) and (10) and an extend
relationship with use case (19).
The use case control flow graph (UCFG) for the UCD is shown in Figure 2. The control flow between the
actor and a use case is represented with an arrow from the actor to the use case and another arrow
from the use case to the actor to show return of control to the actor. Similarly, the flow of control
among the use cases having include and extend relationships is also shown in the UCFG.
The use case control flow graph is next transformed into matrix M as illustrated in Table 1. The UCFG
is transformed into matrix M by assigning 1 for the presence of an edge among actors and use cases and
0 for absence of an edge. For example, use case (1) has an association with the actor and is related
to use cases (5) and (6). Therefore, 1 is marked under columns A, 5 and 6 for row 1. Similarly, all
other entries are made in the matrix M.
Next, the stochastic matrix ST is computed from matrix M and is shown in Table 2. The ST matrix is an
n by n matrix computed using equation (4), given in the proposed approach.
The Page Rank algorithm was then applied on the stochastic matrix ST using equation (1) and the
results obtained are shown in Table 3. The results in Table 3 show rank values for three different
values of the damping factor d i.e. d = 0.85, d = 0.70 and d = 0.50.
It is observed from Table 3 that use case (6) has the highest Page Rank value for all three values of
d. Hence, use case (6) is the key use case in the UCD. Generalization is not shown here. The rank of a
generalized use case will be derived in a similar way as described above.

Table 1. Matrix M corresponding to example use case diagram
      A  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
 A    0  1  1  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 1    1  0  0  0  0  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0
 2    1  0  0  0  0  0  0  0  0  0  0  1  1  0  0  0  0  0  0  0
 3    1  0  0  0  0  0  0  0  0  0  0  0  0  1  1  0  0  0  0  0
 4    1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1  0  0  0
 5    …  …  …  …  …  …  …  …  …  …  …  …  …  …  …  …  …  …  …  …
19    0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0

Table 3. Rank of the use cases/actor in the example use case diagram
Use case/actor   Rank score (d = 0.85)   Rank score (d = 0.70)   Rank score (d = 0.50)
A                0.0905                  0.0801                  0.0678
1                0.0701                  0.0633                  0.0564
2                0.0760                  0.0739                  0.0701
3                0.0760                  0.0739                  0.0701
4                0.0724                  0.0670                  0.0606
5                0.0767                  0.0753                  0.0711
6                0.1007                  0.0967                  0.0883
7                0.0293                  0.0325                  0.0369
8                0.0293                  0.0325                  0.0369
9                0.0289                  0.0318                  0.0360
10               0.0289                  0.0318                  0.0360
11               0.0289                  0.0323                  0.0367
12               0.0289                  0.0323                  0.0367
13               0.0289                  0.0323                  0.0366
14               0.0289                  0.0323                  0.0366
15               0.0535                  0.0545                  0.0544
16               0.0535                  0.0545                  0.0544
17               0.0302                  0.0340                  0.0386
18               0.0302                  0.0340                  0.0386
19               0.0289                  0.0318                  0.0360

3.2 Identifying Key Use Cases Using Weighted Page Rank
In this section, we find the key use case using weighted Page Rank and assign weights as
wt(i, j) = 1, if there exists an include dependency between use case i and use case j or an
              association between actor A and use case i.
         = 0, if there is no relation between use case i and use case j or actor A and use case i.
         = 0.85, if there exists an extend dependency between use case i and j.
The edge between two use cases i and j related through an extend dependency is given a weight less
than 1 because use case j will not always be called or executed; it will be called conditionally.
Therefore, the weight assigned to the extend relation is 0.85. In this method, we have enhanced the
Page Rank algorithm by integrating semantic information among the use cases in the UCD. The weight
associated with the extend relationship was determined after performing experiments with different
values such as 0.80, 0.70, and 0.50, for which the results obtained were not correct and cohesive. The
condition of the Page Rank algorithm, \sum Page Rank = 1, is not satisfied in the case of the weighted
Page Rank algorithm.
The weighted Page Rank algorithm is applied on the UCD to find the most important use case(s) in a
similar manner as described in section 3. Firstly, the matrix M is created from the UCFG, which is
then converted to the stochastic matrix ST using equation (4), with the difference that the weight
assigned for an extend dependency among use cases is 0.85 instead of 1. Then the Page Rank algorithm
is applied using equation (1) and the rank of all the use cases in the UCD is computed as shown in
Table 4. The results in Table 4 also show the rank values for three different values of the damping
factor d i.e. d = 0.85, d = 0.70 and d = 0.50. Use case (6) has the highest rank among the use cases;
therefore it is the key use case of the example UCD.

Table 4. Rank of the use cases/actor in the example use case diagram using weighted Page Rank
Use case/actor   Rank score (d = 0.85)   Rank score (d = 0.70)   Rank score (d = 0.50)
A                0.0223                  0.0209                  0.0226
1                0.0150                  0.0154                  0.0182
2                0.0191                  0.0195                  0.0234
3                0.0191                  0.0195                  0.0234
4                0.0183                  0.0177                  0.0203
5                0.0134                  0.0159                  0.0209
6                0.0210                  0.0231                  0.0280
7                0.0050                  0.0070                  0.0112
8                0.0050                  0.0070                  0.0112
9                0.0063                  0.0079                  0.0118
10               0.0063                  0.0079                  0.0118
11               0.0072                  0.0084                  0.0122
12               0.0072                  0.0084                  0.0122
13               0.0072                  0.0084                  0.0122
14               0.0072                  0.0084                  0.0122
15               0.0136                  0.0143                  0.0181
16               0.0136                  0.0143                  0.0181
17               0.0076                  0.0089                  0.0128
18               0.0076                  0.0089                  0.0128
19               0.0056                  0.0073                  0.0112

3.3 Identifying Key Use Cases Using HITS Algorithm
In this section, we discuss our proposed approach for finding the key use cases in the UCD using the
HITS algorithm. To find the important use case(s) we consider the hub score ranking, as a hub is
considered to be a use case that calls several other use cases. The steps of the proposed approach are
as follows.
1. Assign a number to each use case in UCD starting with 1 and an alphanumeric name for each actor.
2. Convert UCD into UCFG.
3. Convert UCFG to connectivity matrix M. Calculate the weight of the relationship or link between the
   use cases according to the following equation and make entries in matrix M.
   wt(i, j) = 1, if there is a relation between use case i and use case j or actor A and use case i.
            = 0, if no relation exists between use case i and use case j or actor A and use case i.
4. Apply the HITS algorithm using equation (3) on the matrix M iteratively and refine the scores to get
   the final hub and authority scores for the UCD. Initial value 1 is given to both the authority and
   hub scores.
The matrix M corresponding to the example use case diagram is shown in Table 5. The results are shown
in Table 6 after applying the HITS algorithm on the matrix M. It is again observed that use case (6)
is the most important use case in the UCD as it has the maximum hub score (but less than the hub score
of the actor in UCD).

4. Results and Discussion

In this section we compare and discuss the results. The application of Page Rank on the use case
diagram shows the importance of a use case, based on the relationships of a particular use case with
actor and other use
cases. We have applied the Page Rank algorithm on the example UCD shown in Figure 1. As a result, the
largest value of Page Rank is for use case (6) as observed from Table 3 (but less than the rank of
actor in the UCD). Hence, we conclude that this is the most important use case among all the use cases
defined in the UCD in Figure 1.
The results are verified by application of weighted Page Rank, where variation in the weight
associated with the extend relation is incorporated. The weight of the edge connecting two use cases
is taken as 1 for association and include relations and 0.85 for the extend relation, as the extending
use case is not always called. The results are shown in Table 4 after applying weighted Page Rank for
damping values 0.85, 0.70 and 0.50. We can conclude from the results that use case (6) is the most
important use case in the example UCD (but less than rank of the actor in UCD).
The HITS algorithm is also applied on the UCD as discussed in section 3.3. The HITS algorithm is
applied by taking 1 for existence of an edge in the UCFG and 0 otherwise. The results are shown in
Table 6. They suggest that the hub score for use case (6) is maximum and therefore it is the most
important use case in the selected example use case diagram.

Table 6. Rank of the use cases/actor in the example use case diagram using HITS algorithm
Use case/actor   Authority score   Hub score
A                0.05              0.2105
1                0.05              0.1053
2                0.05              0.1053
3                0.05              0.1053
4                0.05              0.1053
5                0.05              0.1053
6                0.05              0.1579
7                0.05              0
8                0.05              0
9                0.05              0
10               0.05              0
11               0.05              0
12               0.05              0
13               0.05              0
14               0.05              0
15               0.05              0.0526
16               0.05              0.0526
17               0.05              0
18               0.05              0
19               0.05              0

The successful coding and execution of the highest ranking use case is necessary as many other use
cases are either called by it or calling it. The use cases are appropriately ranked and our objective
to find the use case(s) which are most important in the UCD is satisfied. The key use case(s) will
play a significant role in the performance of any software. Therefore, the ranking or priority of the
use cases is important while making design decisions.

5. Conclusion and Future Work

In this paper, approaches based on the Page Rank algorithm, the weighted Page Rank algorithm and the
HITS algorithm are proposed to find and prioritize the key use case(s) of the system. This approach
provides information about the key requirements i.e. which part of the initially unknown problem
domain should be looked at first, by identifying the key use cases in the UCD. At the same time, it
also gives us some insight about the use cases or requirements getting the lowest ranks. This
information can be used by requirement engineers and software developers to make better decisions
related to the implementation of functional requirements and hence develop quality software. In
future, we propose to extend this work by applying this technique to other requirement engineering
artifacts. Further, we would like to perform more experiments with large real-life systems to show
that the results are reliable and effectively guide the software developer during early requirements
analysis.