
Introduction

Sorting and searching are fundamental operations in computer science.

Sorting refers to the operation of arranging data in some given order. Searching refers to the operation of finding a particular record in the existing information. Normally, information retrieval involves searching, sorting and merging. In this chapter we will discuss searching and sorting techniques in detail.

Sorting

Sorting is very important in every computer application. Sorting refers to arranging data elements in some given order. Many sorting algorithms are available to sort a given set of elements. We will now discuss two sorting techniques and analyze their performance. The two techniques are:

internal sorting
external sorting

Internal sorting

Internal sorting takes place in the main memory of a computer. Internal sorting methods are applied to small collections of data: the entire collection of data to be sorted is small enough that the sorting can take place within main memory. We will study the following methods of internal sorting:

Insertion sort
Selection sort
Merge sort
Radix sort
Quick sort
Heap sort
Bubble sort

Insertion sort

In this sorting we read the given elements from 1 to n, inserting each element into its proper position. An example is a card player arranging the cards dealt to him: the player picks up each card and inserts it into its proper position. At every step, we insert the item into its proper place. This sorting algorithm is frequently used when n is small. The insertion sort algorithm scans A from A[1] to A[N], inserting each element A[K] into its proper position in the previously sorted subarray A[1], A[2], ..., A[K-1]. That is:
Pass 1. A[1] by itself is trivially sorted.
Pass 2. A[2] is inserted either before or after A[1] so that A[1], A[2] is sorted.
Pass 3. A[3] is inserted into its proper place in A[1], A[2], that is, before A[1], between A[1] and A[2], or after A[2], so that A[1], A[2], A[3] is sorted.
Pass 4. A[4] is inserted into its proper place in A[1], A[2], A[3] so that A[1], A[2], A[3], A[4] is sorted.
...
Pass N. A[N] is inserted into its proper place in A[1], A[2], ..., A[N-1] so that A[1], A[2], ..., A[N] is sorted.

Algorithm INSERTION(A, N)
This algorithm sorts the array A with N elements.
1. Set A[0] := -∞. [Initializes the sentinel element]
2. Repeat Steps 3 to 5 for K = 2, 3, ..., N:
3.   Set TEMP := A[K] and PTR := K - 1.
4.   Repeat while TEMP < A[PTR]:
       (a) Set A[PTR + 1] := A[PTR]. [Moves element forward]
       (b) Set PTR := PTR - 1.
     [End of loop]
5.   Set A[PTR + 1] := TEMP. [Inserts element in proper place]
   [End of Step 2 loop]
6. Return.
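The steps above can be transcribed directly into a working routine. The following is a minimal Python sketch; since Python lists are 0-indexed, the A[0] sentinel of the algorithm is replaced by a bounds check in the inner loop:

```python
def insertion_sort(a):
    """Sort the list a in place by inserting each element into the sorted prefix."""
    for k in range(1, len(a)):
        temp = a[k]           # element to insert (TEMP)
        ptr = k - 1
        # shift larger elements one position to the right
        while ptr >= 0 and temp < a[ptr]:
            a[ptr + 1] = a[ptr]
            ptr -= 1
        a[ptr + 1] = temp     # insert into its proper place
    return a
```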

Selection sort

In this sorting we find the smallest element in the list and put it in the first position. Then we find the second smallest element in the list and put it in the second position, and so on.

Pass 1. Find the location LOC of the smallest in the list of N elements A[1], A[2], ..., A[N], and then interchange A[LOC] and A[1]. Then A[1] is sorted.
Pass 2. Find the location LOC of the smallest in the sublist of N - 1 elements A[2], A[3], ..., A[N], and then interchange A[LOC] and A[2]. Then A[1], A[2] is sorted, since A[1] < A[2].
Pass 3. Find the location LOC of the smallest in the sublist of N - 2 elements A[3], A[4], ..., A[N], and then interchange A[LOC] and A[3]. Then A[1], A[2], A[3] is sorted, since A[2] < A[3].
...
Pass N - 1. Find the location LOC of the smaller of the elements A[N - 1], A[N], and then interchange A[LOC] and A[N - 1]. Then A[1], A[2], ..., A[N] is sorted, since A[N - 1] < A[N].

Thus A is sorted after N - 1 passes.

Example Suppose an array A contains 8 elements as follows: 77, 33, 44, 11, 88, 22, 66, 55

Algorithm

1. To find the minimum element: MIN(A, K, N, LOC)
An array A is in memory. This procedure finds the location LOC of the smallest element among A[K], A[K+1], ..., A[N].
1. Set MIN := A[K] and LOC := K. [Initializes pointers]
2. Repeat for J = K + 1, K + 2, ..., N:
     If MIN > A[J], then: Set MIN := A[J] and LOC := J.
   [End of loop]
3. Return.

2. To sort the elements: SELECTION(A, N)
1. Repeat Steps 2 and 3 for K = 1, 2, ..., N - 1:
2.   Call MIN(A, K, N, LOC).
3.   [Interchange A[K] and A[LOC]] Set TEMP := A[K], A[K] := A[LOC] and A[LOC] := TEMP.
   [End of Step 1 loop]
4. Exit.
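The two procedures can be sketched in Python (0-indexed) as follows, and tried on the 8-element example above:

```python
def find_min(a, k):
    """MIN: return the index of the smallest element among a[k:]."""
    loc = k
    for j in range(k + 1, len(a)):
        if a[j] < a[loc]:
            loc = j
    return loc

def selection_sort(a):
    """SELECTION: repeatedly swap the minimum of the unsorted tail into place."""
    for k in range(len(a) - 1):
        loc = find_min(a, k)
        a[k], a[loc] = a[loc], a[k]    # interchange A[K] and A[LOC]
    return a
```

For example, `selection_sort([77, 33, 44, 11, 88, 22, 66, 55])` performs 7 passes and yields the sorted list.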

Merge sort

Combining two lists is called merging. For example, suppose A is a sorted list with r elements and B is a sorted list with s elements. The operation that combines the elements of A and B into a single sorted list C with n = r + s elements is called merging. After combining the two lists, the elements are sorted by using the following merging algorithm. Suppose one is given two sorted decks of cards. The decks are merged as in the figure: at each step, the two front cards are compared and the smaller one is placed in the combined deck. When one of the decks is empty, all of the remaining cards in the other deck are put at the end of the combined deck. Similarly, suppose we have two lines of students sorted by increasing heights, and suppose we want to merge them into a single sorted line. The new line is formed by choosing, at each step, the shorter of the two students who are at the head of their respective lines. When one of the lines has no more students, the remaining students line up at the end of the combined line.

The above discussion will now be translated into a formal algorithm which merges a sorted r-element array A and a sorted s-element array B into a sorted array C, with n = r + s elements. First of all, we must always keep track of the locations of the smallest elements of A and B which have not yet been placed in C. Let NA and NB denote these locations, respectively. Also, let PTR denote the location in C to be filled. Thus, initially, we set NA := 1, NB := 1 and PTR := 1. At each step of the algorithm, we compare A[NA] and B[NB] and assign the smaller element to C[PTR]. Then we increment PTR by setting PTR := PTR + 1, and we either increment NA by setting NA := NA + 1 or increment NB by setting NB := NB + 1, according to whether the new element in C has come from A or from B. Furthermore, if NA > r, then the remaining elements of B are assigned to C; or if NB > s, then the remaining elements of A are assigned to C.

Algorithm MERGING(A, R, B, S, C)
Let A and B be sorted arrays with R and S elements. This algorithm merges A and B into an array C with N = R + S elements.
1. [Initialize] Set NA := 1, NB := 1 and PTR := 1.
2. [Compare] Repeat while NA <= R and NB <= S:
     If A[NA] < B[NB], then:
       (a) [Assign element from A to C] Set C[PTR] := A[NA].
       (b) [Update pointers] Set PTR := PTR + 1 and NA := NA + 1.
     Else:
       (a) [Assign element from B to C] Set C[PTR] := B[NB].
       (b) [Update pointers] Set PTR := PTR + 1 and NB := NB + 1.
   [End of loop]
3. [Assign remaining elements to C]
   If NA > R, then:
     Repeat for K = 0, 1, 2, ..., S - NB: Set C[PTR + K] := B[NB + K]. [End of loop]
   Else:
     Repeat for K = 0, 1, 2, ..., R - NA: Set C[PTR + K] := A[NA + K]. [End of loop]
4. Exit.
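The merging algorithm translates into Python along these lines, with 0-indexed lists standing in for the arrays A, B and C:

```python
def merge(a, b):
    """MERGING: combine two sorted lists into one sorted list."""
    c = []
    na = nb = 0
    # compare the front elements and take the smaller one
    while na < len(a) and nb < len(b):
        if a[na] < b[nb]:
            c.append(a[na])
            na += 1
        else:
            c.append(b[nb])
            nb += 1
    # one list is exhausted; append the remainder of the other
    c.extend(a[na:])
    c.extend(b[nb:])
    return c
```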

Quick sort Algorithm

Quick sort is an example of the "divide and conquer" approach to solving problems. The quick sort algorithm works by placing the last element of the array in its proper position, comparing it against the other elements starting from the first end of the array. The steps followed by the quick sort algorithm are as follows:
1. Choose the dividing (pivot) element, i.e. the last element of the array.
2. Compare each element of the array from the beginning: if the element is less than the pivot element, place it on the left-hand side by exchanging elements; elements greater than the pivot remain on the right-hand side.
3. After completing an iteration, exchange the pivot element into the exact position where all elements on its left-hand side are smaller and all elements on its right-hand side are greater; this placement divides the array into two parts.
4. Choose a pivot element in each of the two parts and sort them separately by repeating steps 1-3.
5. Repeat the process until every subarray is sorted; the recursively sorted subarrays combine into one sorted array.

The quick sort algorithm structure is as follows. First we set our pointers to get a partition of the array:

Quick_sort(Array, S, Piv)
  If S < Piv then
    q ← Partition(Array, S, Piv)
    Quick_sort(Array, S, q - 1)
    Quick_sort(Array, q + 1, Piv)

Partition(Array, S, Piv)
  x ← Array[Piv]
  i ← S - 1
  For j ← S to Piv - 1 do
    If Array[j] <= x then
      i ← i + 1
      Exchange Array[i] ↔ Array[j]
  Exchange Array[i + 1] ↔ Array[Piv]
  Return i + 1   (the final position of the pivot)
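The partition scheme above is the Lomuto scheme (pivot taken as the last element of the range). A runnable Python sketch, 0-indexed, might look like this:

```python
def partition(a, s, piv):
    """Lomuto partition: place the pivot a[piv] so that smaller elements lie to its left."""
    x = a[piv]               # pivot = last element of the range
    i = s - 1
    for j in range(s, piv):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[piv] = a[piv], a[i + 1]
    return i + 1             # final position of the pivot

def quick_sort(a, s=0, piv=None):
    """Sort a[s..piv] in place by recursive partitioning."""
    if piv is None:
        piv = len(a) - 1
    if s < piv:
        q = partition(a, s, piv)
        quick_sort(a, s, q - 1)
        quick_sort(a, q + 1, piv)
    return a
```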

Bubble sort Algorithm

Bubble sort is sometimes called sinking sort. The bubble sort algorithm simply sorts step by step, comparing each element with the next element and swapping them if they are out of order; this procedure repeats until all elements in the array are sorted into the required sequence. Bubble sort gets its name because each element moves only a short range at a time: just the next value in the array is checked and swapped, so elements gradually "bubble" into place. Since it works by comparing pairs of values, it is also a comparison sort. Now we will see the algorithm structure as follows:
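A minimal Python sketch of the usual bubble sort is given below; the early-exit flag is a common refinement rather than part of the description above:

```python
def bubble_sort(a):
    """Repeatedly compare adjacent elements and swap out-of-order pairs."""
    n = len(a)
    for i in range(n - 1):
        swapped = False
        # after pass i, the last i elements are already in place
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:      # no swaps made: the array is already sorted
            break
    return a
```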

Radix Sort

Radix sorting involves looking at a radix (or digit) of a number and placing the number in an array of linked lists to sort it.

Algorithm for radix sorting:
1. Look at the rightmost digit.
2. Assign the full number to that digit's index, appending it to the end of that linked list in the array.
3. Look at the next digit to the left from the current sorted array. If there is no digit, pad a 0.
4. Repeat step 3 until all numbers have been sorted.

Let's see a step-by-step example of a radix sort of the following set of unsorted numbers. The bold digits here represent the first digit to look at when attempting to sort the list.

212 21 72 5 431 898 616 24 9

Step 1:
0:
1: 21 -> 431
2: 212 -> 72
3:
4: 24
5: 05
6: 616
7:
8: 898
9: 09

Step 2: (working from Step 1)
0: 005 -> 009
1: 212 -> 616
2: 021 -> 024
3: 431
4:
5:
6:
7: 072
8:
9: 898

Step 3: (working from Step 2)
0: 5 -> 9 -> 21 -> 24 -> 72
1:
2: 212
3:
4: 431
5:
6: 616
7:
8: 898
9:

Step 3 is the final step, and the list is sorted. One benefit of a radix sort is that it can be done by pencil and paper. It also uses only a fixed data structure (an array of size 10). The downside of radix sort is that it takes time to carry out, since you may manually go through numerous steps depending on how many numbers you have to sort.

Here is another example of radix sort, this time using numbers up to 4 digits in length. You will notice something interesting here.

58 99 999 47 200 101 1002 12 1111

Step 1:
0: 200
1: 101 -> 1111
2: 1002 -> 12
3:
4:
5:
6:
7: 47
8: 58
9: 99 -> 999

Step 2: (working from Step 1)
0: 200 -> 101 -> 1002
1: 1111 -> 012
2:
3:
4: 047
5: 058
6:
7:
8:
9: 099 -> 999

Step 3: (working from Step 2)
0: 1002 -> 0012 -> 0047 -> 0058 -> 0099
1: 0101 -> 1111
2: 0200
3:
4:
5:
6:
7:
8:
9: 0999

Step 4: (working from Step 3)
0: 12 -> 47 -> 58 -> 99 -> 101 -> 200 -> 999
1: 1002 -> 1111
2:
3:
4:
5:
6:
7:
8:
9:

Step 4 is the final step here. Notice, however, that index 0 covers the numbers 0 to 999 while index 1 covers 1000 to 1999, and so on.
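The bucket-by-bucket process in the examples above can be sketched in Python. The helper below assumes non-negative integers, and uses arithmetic rather than string padding to read each digit:

```python
def radix_sort(nums):
    """LSD radix sort: bucket by each decimal digit, least significant first."""
    width = max(len(str(n)) for n in nums)    # number of passes needed
    for d in range(width):                    # d-th digit from the right
        buckets = [[] for _ in range(10)]     # the fixed array of 10 lists
        for n in nums:
            buckets[(n // 10 ** d) % 10].append(n)
        # read the buckets back out in order for the next pass
        nums = [n for bucket in buckets for n in bucket]
    return nums
```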

Heapsort

Heaps

The (binary) heap data structure is an array object that can be viewed as a nearly complete binary tree. A binary tree with n nodes and depth k is complete iff its nodes correspond to the nodes numbered from 1 to n in the full binary tree of depth k.

Attributes of a Heap

An array A that represents a heap has two attributes:
length[A]: the number of elements in the array.
heapsize[A]: the number of elements in the heap stored within array A.
Here heapsize[A] <= length[A].

Basic procedures

If a complete binary tree with n nodes is represented sequentially, then for any node with index i, 1 <= i <= n, we have:
A[1] is the root of the tree
the parent PARENT(i) is at floor(i/2) if i > 1
the left child LEFT(i) is at 2i
the right child RIGHT(i) is at 2i + 1

The LEFT procedure can compute 2i in one instruction by simply shifting the binary representation of i left one bit position. Similarly, the RIGHT procedure can quickly compute 2i + 1 by shifting the binary representation of i left one bit position and adding in a 1 as the low-order bit. The PARENT procedure can compute floor(i/2) by shifting i right one bit position.
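These index computations can be written directly with bit operations, for example (1-indexed, as in the text):

```python
def parent(i):
    return i >> 1        # floor(i / 2): shift right one bit position

def left(i):
    return i << 1        # 2 * i: shift left one bit position

def right(i):
    return (i << 1) | 1  # 2 * i + 1: shift left, set the low-order bit
```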

Heap properties

There are two kinds of binary heaps: max-heaps and min-heaps. In a max-heap, the max-heap property is that for every node i other than the root, A[PARENT(i)] >= A[i]. Thus:
the largest element in a max-heap is stored at the root
the subtree rooted at a node contains values no larger than that contained at the node itself
In a min-heap, the min-heap property is that for every node i other than the root, A[PARENT(i)] <= A[i]. Thus:
the smallest element in a min-heap is at the root
the subtree rooted at a node contains values no smaller than that contained at the node itself

The height of a heap

The height of a node in a heap is the number of edges on the longest simple downward path from the node to a leaf, and the height of the heap is the height of the root, which is Θ(lg n). For example, in the heap of the figure: the height of node 2 is 2, and the height of the heap is 3.

The MAX-HEAPIFY procedure

MAX-HEAPIFY is an important subroutine for manipulating max-heaps.
Input: an array A and an index i.
Output: the subtree rooted at index i becomes a max-heap.
Assume: the binary trees rooted at LEFT(i) and RIGHT(i) are max-heaps, but A[i] may be smaller than its children.
Method: let the value at A[i] float down in the max-heap.

MAX-HEAPIFY(A, i)
1. l ← LEFT(i)
2. r ← RIGHT(i)
3. if l <= heapsize[A] and A[l] > A[i]
4.   then largest ← l
5.   else largest ← i
6. if r <= heapsize[A] and A[r] > A[largest]
7.   then largest ← r
8. if largest != i
9.   then exchange A[i] ↔ A[largest]
10.       MAX-HEAPIFY(A, largest)

Building a Heap

We can use the MAX-HEAPIFY procedure to convert an array A[1..n] into a max-heap in a bottom-up manner. The elements in the subarray A[(floor(n/2)+1)..n] are all leaves of the tree, and so each is a 1-element heap. The procedure BUILD-MAX-HEAP goes through the remaining nodes of the tree and runs MAX-HEAPIFY on each one.

BUILD-MAX-HEAP(A)
1. heapsize[A] ← length[A]
2. for i ← floor(length[A]/2) downto 1
3.   do MAX-HEAPIFY(A, i)

The heapsort algorithm

Since the maximum element of the array is stored at the root A[1], we can exchange it with A[n]. If we now discard A[n], we observe that A[1..(n-1)] can easily be made into a max-heap. The children of the root A[1] remain max-heaps, but the new root element may violate the max-heap property, so we need to readjust the max-heap; that is, to call MAX-HEAPIFY(A, 1).

HEAPSORT(A)
1. BUILD-MAX-HEAP(A)
2. for i ← length[A] downto 2
3.   do exchange A[1] ↔ A[i]
4.      heapsize[A] ← heapsize[A] - 1
5.      MAX-HEAPIFY(A, 1)
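Putting the three procedures together, here is a 0-indexed Python sketch of heapsort; the heap size is passed explicitly rather than stored as an array attribute:

```python
def max_heapify(a, i, heapsize):
    """Float a[i] down until the subtree rooted at i is a max-heap (0-indexed)."""
    l, r = 2 * i + 1, 2 * i + 2
    largest = i
    if l < heapsize and a[l] > a[largest]:
        largest = l
    if r < heapsize and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heapsize)

def heapsort(a):
    n = len(a)
    # BUILD-MAX-HEAP: heapify the non-leaf nodes bottom-up
    for i in range(n // 2 - 1, -1, -1):
        max_heapify(a, i, n)
    for i in range(n - 1, 0, -1):
        a[0], a[i] = a[i], a[0]      # move the current maximum to the end
        max_heapify(a, 0, i)         # restore the heap on a[0..i-1]
    return a
```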

3 essential properties of algorithms:

In computer science, an in-place algorithm (in Latin, in situ) is an algorithm which transforms its input using a data structure with a small, constant amount of extra storage space. The input is usually overwritten by the output as the algorithm executes. An algorithm which is not in-place is sometimes called not-in-place or out-of-place.

In computer science, an online algorithm is one that can process its input piece-by-piece in a serial fashion, i.e., in the order that the input is fed to the algorithm, without having the entire input available from the start. In contrast, an offline algorithm is given the whole problem data from the beginning and is required to output an answer which solves the problem at hand.

A sorting algorithm is said to be stable if two objects with equal keys appear in the same order in the sorted output as they appear in the unsorted input array.
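Stability can be seen with a small illustrative example. Python's built-in sorted() is a stable sort, so records with equal keys keep their input order:

```python
# Records with equal keys: two pairs of students share a score.
records = [("alice", 90), ("bob", 85), ("carol", 90), ("dave", 85)]

# A stable sort by score keeps alice before carol, and bob before dave,
# because that is the order in which they appeared in the input.
by_score = sorted(records, key=lambda r: r[1])
print(by_score)
```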

Algorithm        In-place   Online   Stable
Insertion sort   Yes        Yes      Yes
Selection sort   Yes        No       No
Merge sort       No         Yes      Yes
Radix sort       No         No       Yes
Quick sort       Yes        Yes      No
Heap sort        Yes        No       No
Bubble sort      Yes        No       Yes

External sorting External sorting is a term for a class of sorting algorithms that can handle massive amounts of data. External sorting is required when the data being sorted do not fit into the main memory of a computing device (usually RAM) and instead they must reside in the slower external memory (usually a hard drive). External sorting typically uses a hybrid sort-merge strategy. In the sorting phase, chunks of data small enough to fit in main memory are read, sorted, and written out to a temporary file. In the merge phase, the sorted subfiles are combined into a single larger file.

Basic External Sorting Algorithm

Assume the unsorted data is on disk at the start. Let M = the maximum number of records that can be stored and sorted in internal memory at one time.

Algorithm:
Repeat:
1. Read M records into main memory and sort internally.
2. Write this sorted sub-list onto disk. (This is one run.)
Until all data is processed into runs.
Repeat:
1. Merge two runs into one sorted run twice as long.
2. Write this single run back onto disk.
Until all runs are processed into runs twice as long.
Merge runs again as often as needed until only one large run remains: the sorted list.
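The run/merge strategy can be simulated in memory, with Python lists standing in for disk files; external_sort and the chunk size m below are illustrative names, not from the text:

```python
import heapq

def external_sort(data, m):
    """Sketch of sort-merge external sorting, with lists standing in for runs on disk."""
    # Sorting phase: cut the input into runs of at most m records, sort each run.
    runs = [sorted(data[i:i + m]) for i in range(0, len(data), m)]
    # Merge phase: repeatedly merge pairs of runs into runs twice as long.
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            pair = runs[i:i + 2]
            merged.append(list(heapq.merge(*pair)))   # merge one or two sorted runs
        runs = merged
    return runs[0] if runs else []
```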
