
February 16, 2006

mst.nw

An Implementation of Kruskal's Algorithm


Yu Wang (wangy22@mcmaster.ca)

Contents

1 Introduction
2 Data Structures
    2.1 Node and Edge
    2.2 Graph and Minimum Spanning Tree
    2.3 Input and Output
3 Algorithms
    3.1 Edge Sorting
    3.2 Kruskal's Algorithm
    3.3 Main Body
4 Testing
    4.1 Automated Testing
    4.2 Test Cases
A Defined Chunks
B Index

1 Introduction

Written in Pascal, mst.pas implements Kruskal's algorithm to find the minimum spanning tree T = (V_o, E_o) of a connected, undirected graph G = (V_i, E_i), where V and E denote sets of nodes¹ and edges. It reads G from standard input and, if possible, displays the edges of the resulting T on standard output. This article explains mst.pas in detail and discusses issues such as exception handling, efficiency, automated testing and the selection of test cases. The general specification for this program can be found in the assignment section at [Sek06], and we assume that the reader has background knowledge of Kruskal's Algorithm.

Following the concept of literate programming introduced in class, this article is prepared with noweb and LaTeX. The presented code is to be compiled with fpc (Free Pascal Compiler)².

¹ We use node instead of vertex in this article.
² fpc is not compatible with gpc.

2 Data Structures

Before going into detail, we first define our global variables:

⟨global variables 1⟩≡
    i, j, k, l : integer;   // for looping
    input      : string;    // for input
    num_edges  : integer;
    num_nodes  : integer;
    size_of_T  : integer;

Defines:
    num_edges, used in chunks 4, 5, 6, 11, and 13.
    num_nodes, used in chunks 4, 5, 6, 12, and 13.
    size_of_T, used in chunks 4, 7, and 13.
This code is used in chunk 14.

These variables are used throughout the program. i, j, k and l are loop counters and do not store any substantial information about the graph. The string variable input buffers a line from standard input, which is converted to a numerical value later. In Pascal, if a numerical value is expected from standard input, entering a non-numerical value terminates the program. In our case, this exception is handled. num_edges and num_nodes store |E_i| and |V_i| of G respectively. size_of_T keeps track of |E_o| in T while the algorithm runs.
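As a minimal sketch of how such input can be checked without terminating the program, the example below reads a value as a string and converts it with val(). It uses the three-argument form of val(), whose last argument reports the position of the first offending character (0 on success); mst.pas itself calls val() with two arguments and relies on the > 0 test of its repeat loops. The program name and the helper ParsePositive are hypothetical and not part of mst.pas.

```pascal
program valdemo;

{ Hypothetical helper: returns true and stores the parsed value in n
  when s holds a positive integer. }
function ParsePositive(const s : string; var n : integer) : boolean;
var
  code : integer;
begin
  val(s, n, code);              { code = 0 means the whole string converted }
  ParsePositive := (code = 0) and (n > 0);
end;

var
  n : integer;
begin
  writeln(ParsePositive('14', n));    { a valid edge count }
  writeln(ParsePositive('abc', n));   { not a number }
  writeln(ParsePositive('2.5', n));   { a real where an integer is expected }
end.
```

Only the first call succeeds, so the program prints TRUE, FALSE and FALSE. A prompt loop can repeat the question for as long as such a helper returns false.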

2.1 Node and Edge

Since V is a set and its nodes have no inherent ordering, we represent each node v ∈ V by an integer, which is enough to identify it uniquely. Each edge e ∈ E consists of two nodes r, s ∈ V and a weight c ∈ ℝ, as shown in ⟨structured type 2⟩:

⟨structured type 2⟩≡
    edge_t = record
      r, s : integer;   { end nodes of the edge }
      c    : real;      { weight of the edge }
    end;

Defines:
    edge_t, used in chunks 3, 8, and 10.
This code is used in chunk 14.

2.2 Graph and Minimum Spanning Tree

Assuming that G and T are connected and undirected, they can be represented in our program as arrays of edges, as shown in ⟨global arrays 3⟩ below. All information about the nodes can then be recovered from the edges of the relevant graph. Besides the arrays G and T, we also define an array D of integers for grouping the nodes when detecting cycles in the graph, which is explained in Section 3.2, Kruskal's Algorithm.

⟨global arrays 3⟩≡
    G : array of edge_t;
    T : array of edge_t;
    D : array of integer;

Defines:
    D, used in chunks 12 and 13.
    G, used in chunks 4, 5, 6, 11, and 13.
    T, used in chunks 4, 7, and 13.
Uses edge_t 2.
This code is used in chunk 14.

Note that the arrays are defined as dynamic arrays, which is a feature of fpc but not of gpc. A dynamic array can be resized with the internal procedure setlength(), as shown in ⟨initialization 4⟩:

⟨initialization 4⟩≡
    num_edges := 0;
    num_nodes := 0;
    setlength (G, num_edges);
    size_of_T := 1;
    setlength (T, size_of_T);

Uses G 3, T 3, num_edges 1, num_nodes 1, and size_of_T 1.
This code is used in chunk 14.


With num_edges and num_nodes initialized to zero, G is reset to an empty array. This corresponds to its abstract structure G, which is an empty set for now. We treat 1 as the base index of every array. Notice that T is initialized with a length of 1. This is because setlength() assumes that the array passed to it has a base index of 0, so element 0 is allocated but never used. Since array T changes while Kruskal's Algorithm runs but G does not, we can safely ignore this effect on G.
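The indexing convention can be illustrated with a minimal sketch. The program name and the variables a and n are illustrative only and do not appear in mst.pas. One element more than needed is allocated, so that indices 1..n can be used while index 0 stays idle:

```pascal
program lendemo;

var
  a : array of integer;   { dynamic array: fpc always indexes it from 0 }
  n : integer;
begin
  n := 3;
  setlength(a, 0);
  writeln(length(a));                   { an empty array has length 0 }
  setlength(a, n + 1);                  { valid indices are now 0..n }
  a[1] := 10; a[2] := 20; a[3] := 30;   { index 0 is allocated but unused }
  writeln(low(a), ' ', high(a));        { lowest and highest valid index }
end.
```

The program prints 0 on the first line and 0 3 on the second. The same reasoning explains why T starts with a length of 1 and why G is later given a length of num_edges + 1.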

2.3 Input and Output

As specified, our program reads G from standard input and prints T to standard output. The code is shown in ⟨graph input 5⟩, ⟨graph output 6⟩ and ⟨minimum spanning tree output 7⟩. For graph input, the internal procedure val() converts the textual input to a numerical value. While the input is invalid, the same prompt is repeated. Again, G is set to an array of length num_edges + 1, as setlength() assumes the base index of the array passed to it to be 0.

⟨graph input 5⟩≡
    repeat
      write ('Enter # of edges (>0): ');
      readln (input);
      val (input, num_edges);
    until num_edges > 0;
    setlength (G, num_edges + 1);
    for i := 1 to num_edges do
    begin
      repeat
        write ('index (>0) of r node for edge ', i, ' ? ');
        readln (input);
        val (input, G[i].r);
      until G[i].r > 0;
      { the largest node index seen so far is the number of nodes }
      if G[i].r > num_nodes then num_nodes := G[i].r;
      repeat
        write ('index (>0) of s node for edge ', i, ' ? ');
        readln (input);
        val (input, G[i].s);
      until G[i].s > 0;
      if G[i].s > num_nodes then num_nodes := G[i].s;
      repeat
        write ('weight (>0) of edge ', i, '? ');
        readln (input);
        val (input, G[i].c);
      until G[i].c > 0;
      writeln;
    end;

Uses G 3, num_edges 1, and num_nodes 1.
This code is used in chunk 14.

⟨graph output 6⟩≡
    writeln ('num_edges = ', num_edges, ', num_nodes = ', num_nodes);
    for i := 1 to num_edges do
      writeln ('weight of edge (', G[i].r, ', ', G[i].s, ') = ', G[i].c:1:3);
    writeln;

Uses G 3, num_edges 1, and num_nodes 1.
This code is used in chunk 14.

⟨minimum spanning tree output 7⟩ checks size_of_T to decide what to print. If it is larger than 1, a minimum spanning tree has been found; otherwise, no minimum spanning tree exists.

⟨minimum spanning tree output 7⟩≡
    if size_of_T > 1 then
    begin
      writeln ('The minimum spanning tree is found:');
      for i := 1 to size_of_T - 1 do
        writeln ('weight of edge (', T[i].r, ', ', T[i].s, ') = ', T[i].c:1:3);
    end
    else
    begin
      writeln ('The minimum spanning tree does not exist.');
    end;

Uses T 3 and size_of_T 1.
This code is used in chunk 14.

3 Algorithms

3.1 Edge Sorting

One of the preconditions for Kruskal's Algorithm is that array G is sorted by edge weight in increasing order. We choose quick sort as our sorting algorithm.

⟨local variables in quick sort 8⟩≡
    i, j : integer;
    t, v : edge_t;

Uses edge_t 2.
This code is used in chunk 10.

As in ⟨global variables 1⟩, i and j in ⟨local variables in quick sort 8⟩ are loop counters. Defined as edge_t, v is the pivot element, which is compared with the other edges in the array. In our implementation, v is always the rightmost element of array a, which is passed to quick sort. When partitioning a, edge t is a temporary used for swapping two elements, as shown below:

⟨partitioning in quick sort 9⟩≡
    if left < right then
    begin
      v := a[right];                 { pivot: rightmost element }
      i := left - 1;
      j := right;
      repeat
        repeat i := i + 1; until a[i].c >= v.c;
        repeat j := j - 1; until a[j].c <= v.c;
        t := a[i]; a[i] := a[j]; a[j] := t;
      until j <= i;
      { undo the last swap and move the pivot to its final position i }
      a[j] := a[i];
      a[i] := a[right];
      a[right] := t;

Uses left 10 and right 10.
This code is used in chunk 10.


With a and its left and right boundary indices passed to ⟨partitioning in quick sort 9⟩, quick sort is executed recursively. Note that quick sort is an in-place sorting algorithm with an average time complexity of O(n lg n) [Cor03].

⟨quick sort 10⟩≡
    quickSort(var a : array of edge_t; left, right : integer);
    var
      ⟨local variables in quick sort 8⟩
    begin
      ⟨partitioning in quick sort 9⟩
        quickSort(a, left, i-1);
        quickSort(a, i+1, right);
      end;
    end;

Defines:
    left, used in chunks 9 and 10.
    quickSort, used in chunks 10 and 11.
    right, used in chunks 9 and 10.
Uses edge_t 2.
This code is used in chunk 14.
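To see the partitioning scheme at work, the following sketch applies the same steps to plain weights instead of edge_t records. The program name, the array w and the chosen weights are illustrative. Note that the inner j scan has no explicit lower bound: in this sketch it is stopped by w[0], which is set to 0.0 and is therefore no larger than any weight. Whether mst.pas relies on the unused element G[0] in the same way is an assumption here, not something the text states.

```pascal
program sortdemo;

{ Same partitioning steps as in the chunks above, on plain weights.
  Assumption: a[left - 1] is never larger than the pivot, so the j scan
  always stops inside the array. }
procedure quickSort(var a : array of real; left, right : integer);
var
  i, j : integer;
  t, v : real;
begin
  if left < right then
  begin
    v := a[right];                  { pivot: rightmost element }
    i := left - 1;
    j := right;
    repeat
      repeat i := i + 1 until a[i] >= v;
      repeat j := j - 1 until a[j] <= v;
      t := a[i]; a[i] := a[j]; a[j] := t;
    until j <= i;
    a[j] := a[i];                   { undo the last swap }
    a[i] := a[right];               { put the pivot in its final place }
    a[right] := t;
    quickSort(a, left, i - 1);
    quickSort(a, i + 1, right);
  end;
end;

var
  w : array of real;
  k : integer;
begin
  setlength(w, 4);                  { indices 0..3, index 0 is the sentinel }
  w[0] := 0.0;
  w[1] := 1.2; w[2] := 2.3; w[3] := 0.1;
  quickSort(w, 1, 3);
  for k := 1 to 3 do
    writeln(w[k]:1:3);
end.
```

For these three weights the program prints 0.100, 1.200 and 2.300, one per line.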

3.2 Kruskal's Algorithm

Our implementation of Kruskal's Algorithm is based on the pseudocode found in [Sol06]. To sort the edges in G, simply pass G, together with the indices of its leftmost and rightmost elements, which are 1 and num_edges, to the quickSort() procedure:

⟨edge sorting by weight 11⟩≡
    quickSort (G, 1, num_edges);

Uses G 3, num_edges 1, and quickSort 10.
This code is used in chunk 14.

Besides ⟨edge sorting by weight 11⟩, another precondition of the algorithm is that each node starts as a group by itself. [Cor03] suggests treating these groups as sets. However, implementing sets in Pascal is inconvenient and inefficient. Instead, we use array D for node grouping as shown in [Sol06], which is much easier to implement and more efficient. D can be seen as a mapping from a node v_m ∈ V_i to a node v_r ∈ V_i, where v_r is the representative node of a group and v_m is a member node of that group.

⟨initialize D 12⟩≡
    setlength (D, num_nodes + 1);
    for i := 1 to num_nodes do
      D[i] := i;

Uses D 3 and num_nodes 1.
This code is used in chunk 14.


After D is initialized, there are num_nodes groups, each containing a single member node. Each member node is therefore also the representative node of its group. While Kruskal's Algorithm runs, nodes are grouped conditionally. To tell whether an edge e = (r, s) ∈ E_i forms a cycle with the edges already in the minimum spanning tree T, we check whether the representative nodes of r and s are the same node. If they are not, e can safely be added to T.

⟨Kruskal's Algorithm 13⟩≡
    for i := 1 to num_edges do
    begin
      if D[G[i].r] <> D[G[i].s] then
      begin
        { r and s lie in different groups: accept the edge }
        size_of_T := size_of_T + 1;
        setlength (T, size_of_T);
        T[size_of_T - 1] := G[i];
        { merge the group of s into the group of r }
        k := D[G[i].r];
        l := D[G[i].s];
        for j := 1 to num_nodes do
          if D[j] = l then D[j] := k;
      end;
    end;

Uses D 3, G 3, T 3, num_edges 1, num_nodes 1, and size_of_T 1.
This code is used in chunk 14.


Because all edges are sorted by weight, the algorithm traverses the edges in that order and repeatedly checks the condition above. Its loop invariant is that T forms a minimum spanning tree from the edges it contains.
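The grouping step can be tried in isolation with a minimal sketch. The program name, the constant NODES, the type group_t and the function TryJoin are hypothetical and not part of mst.pas; a fixed-size array stands in for the dynamic array D. The three calls correspond to the edges (3, 2), (1, 2) and (2, 3), already in increasing order of weight:

```pascal
program groupdemo;

const
  NODES = 3;

type
  group_t = array[1..NODES] of integer;   { D[v] = representative of v }

{ Hypothetical helper: merges the groups of r and s if they differ.
  Returns true when the edge (r, s) can be added without closing a cycle. }
function TryJoin(var D : group_t; r, s : integer) : boolean;
var
  j, k, l : integer;
begin
  k := D[r];
  l := D[s];
  TryJoin := k <> l;
  if k <> l then
    for j := 1 to NODES do
      if D[j] = l then D[j] := k;   { relabel the whole group of s }
end;

var
  D : group_t;
  i : integer;
begin
  for i := 1 to NODES do D[i] := i;   { every node starts in its own group }
  writeln(TryJoin(D, 3, 2));          { nodes 2 and 3 were separate }
  writeln(TryJoin(D, 1, 2));          { node 1 joins the group of 2 and 3 }
  writeln(TryJoin(D, 2, 3));          { both ends are already grouped }
end.
```

The program prints TRUE, TRUE and FALSE: the first two edges are accepted, and the third is rejected because both of its end nodes already share a representative. Relabeling costs one pass over D for every accepted edge, which keeps the code short at the price of linear work per merge.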

3.3 Main Body

Combining all the code chunks shown above, we obtain the main body of our implementation:

⟨mst.pas 14⟩≡
    program mst;
    type
      ⟨structured type 2⟩
    var
      ⟨global variables 1⟩
      ⟨global arrays 3⟩
    procedure ⟨quick sort 10⟩
    begin
      ⟨initialization 4⟩
      ⟨graph input 5⟩
      ⟨edge sorting by weight 11⟩
      ⟨initialize D 12⟩
      ⟨Kruskal's Algorithm 13⟩
      ⟨graph output 6⟩
      ⟨minimum spanning tree output 7⟩
    end.

Root chunk (not used in this document).

4 Testing

4.1 Automated Testing

Since our program reads the graph from standard input and displays the result on standard output, we can test it using I/O redirection under UNIX. Assuming the compiled executable is named mst, we can test it with the command

    $ ./mst < test.mst

where test.mst is a test file containing the case to be tested. Its format is as follows:

    - each line contains one value;
    - the first line contains the number of edges, |E_i|;
    - the following lines are divided into |E_i| groups;
    - each group contains 3 lines, where the first two lines are r and s, representing edge (r, s), and the third line is the weight of (r, s);
    - the file ends with an empty line.

For example, suppose we are given a graph in the shape of a triangle. The data file in this case is:

    $ cat test.mst
    3
    1
    2
    1.2
    2
    3
    2.3
    3
    2
    0.1

4.2 Test Cases

There are different ways to choose test cases for a program. Usually we use three kinds of cases: (a) normal cases, (b) extreme cases and (c) invalid cases.


For normal cases, we often choose input data from the real world, with the expected output already obtained by other means. The number of such cases can be large. For extreme cases, we choose input data with special values. For our program, we can test it with a graph containing zero or one edge, edges with zero weight, or other rarely seen situations. For invalid cases, we want to test the error handling of our program by giving inputs of different types, for example entering a real number where an integer is expected.

The normal case we test here is taken from [Cor03], as shown below.

    $ ./mst < test2.mst
    (input sequence is suppressed to save space)
    num_edges = 14, num_nodes = 9
    weight of edge (7, 8) = 1.000
    weight of edge (6, 7) = 2.000
    weight of edge (3, 9) = 2.000
    weight of edge (3, 6) = 4.000
    weight of edge (1, 2) = 4.000
    weight of edge (7, 9) = 6.000
    weight of edge (8, 9) = 7.000
    weight of edge (3, 4) = 7.000
    weight of edge (2, 3) = 8.000
    weight of edge (9, 1) = 8.000
    weight of edge (4, 5) = 9.000
    weight of edge (5, 6) = 10.000
    weight of edge (2, 8) = 11.000
    weight of edge (4, 6) = 14.000

    The minimum spanning tree is found:
    weight of edge (7, 8) = 1.000
    weight of edge (6, 7) = 2.000
    weight of edge (3, 9) = 2.000
    weight of edge (3, 6) = 4.000
    weight of edge (1, 2) = 4.000
    weight of edge (3, 4) = 7.000
    weight of edge (2, 3) = 8.000
    weight of edge (4, 5) = 9.000

Checked against the expected result, our program returns the correct answer.

The extreme case being tested consists of two sets of data. The first one is

    $ ./mst < test3.mst
    (input sequence is suppressed to save space)
    num_edges = 3, num_nodes = 3
    weight of edge (2, 3) = 0.000
    weight of edge (1, 2) = 0.000
    weight of edge (1, 3) = 100000000000000000000000000.000

    The minimum spanning tree is found:
    weight of edge (2, 3) = 0.000
    weight of edge (1, 2) = 0.000

and the second one is

    $ ./mst < test3.mst
    (input sequence is suppressed to save space)
    num_edges = 3, num_nodes = 3
    weight of edge (2, 3) = 0.000
    weight of edge (1, 2) = 100000000000000000000000000.000
    weight of edge (1, 3) = 100000000000000000000000000.000

    The minimum spanning tree is found:
    weight of edge (2, 3) = 0.000
    weight of edge (1, 2) = 100000000000000000000000000.000

Checked against the expected results, our program returns the correct answers.

The invalid case to be tested is shown here:

    $ ./mst
    Enter # of edges (>0): This is a test for invalid case.
    Enter # of edges (>0): Give some number here:
    Enter # of edges (>0): 1
    index (>0) of r node for edge 1 ? What about
    index (>0) of r node for edge 1 ? 2?
    index (>0) of r node for edge 1 ? If we are still prompted for input,
    index (>0) of r node for edge 1 ? then it means the error handling of
    index (>0) of r node for edge 1 ? our program is working.
    index (>0) of r node for edge 1 ? Give a set of valid input, to end the program:
    index (>0) of r node for edge 1 ? 1
    index (>0) of s node for edge 1 ? 2
    weight (>0) of edge 1? 1

    num_edges = 1, num_nodes = 2
    weight of edge (1, 2) = 1.000

    The minimum spanning tree is found:
    weight of edge (1, 2) = 1.000

Since the program exits normally, we conclude that our program is stable.

Appendices

A Defined Chunks

    ⟨Kruskal's Algorithm 13⟩ 13, 14
    ⟨edge sorting by weight 11⟩ 11, 14
    ⟨global arrays 3⟩ 3, 14
    ⟨global variables 1⟩ 1, 14
    ⟨graph input 5⟩ 5, 14
    ⟨graph output 6⟩ 6, 14
    ⟨initialization 4⟩ 4, 14
    ⟨initialize D 12⟩ 12, 14
    ⟨local variables in quick sort 8⟩ 8, 10
    ⟨minimum spanning tree output 7⟩ 7, 14
    ⟨mst.pas 14⟩ 14
    ⟨partitioning in quick sort 9⟩ 9, 10
    ⟨quick sort 10⟩ 10, 14
    ⟨structured type 2⟩ 2, 14

B Index

    D: 3, 12, 13
    G: 3, 4, 5, 6, 11, 13
    T: 3, 4, 7, 13
    edge_t: 2, 3, 8, 10
    left: 9, 10
    num_edges: 1, 4, 5, 6, 11, 13
    num_nodes: 1, 4, 5, 6, 12, 13
    quickSort: 10, 11
    right: 9, 10
    size_of_T: 1, 4, 7, 13

References

[Cor03] Thomas H. Cormen. Introduction to Algorithms. The MIT Press, second edition, 2003. Page 145.

[Sek06] Emil Sekerinski. Computing and Software 703 (2006 winter term), website, 2006. Available at http://www.cas.mcmaster.ca/~emil/cas703/assignments.html.

[Sol06] Michael Soltys. Computer Science 2ME3 (2006 winter term), lecture notes, 2006. Available at http://www.cas.mcmaster.ca/~soltys/cs2me3w06/pages1-3.pdf.
