Reason for need System dependent Most cost (time) comes from I/O Basic idea is merging BALANCED MULTIWAY MERGE Ex 2-WAY MERGE ASSUME - tape drive - 4 tapes - machine holds 2 elements at one time TAPE 1 2 3 4 5 1 3 9 0 2 4 8 3 9 4 6 7 1 5 Pass 1 Merge the first 2 elements into a sorted list and send them to tape 1. Merge the next 2 elements into a sorted list and send them to tape 2. Merge the next 2 elements into a sorted list and send them to tape 1. Merge the next 2 elements into a sorted list and send them to tape 2, etc. This produces sorted blocks of size 2. TAPE 1 1 5 0 2 3 9 1 7 2 3 9 4 8 4 6 5 3 4 Pass 1 complete. Pass 2 Repeat the same actions as in pass 1, only with sorted blocks rather than with individual elements. TAPE 1 2 3 1 3 5 9 3 4 6 9 4 0 2 4 8 1 5 7 Pass 2 complete TAPE 1 0 1 2 3 4 5 8 9 2 1 3 4 5 6 7 9 3 4 Pass 2 complete Pass 3 Merge the sorted blocks. TAPE 1 0 1 1 2 3 3 4 4 5 5 6 7 8 9 9 2 3 4 Pass 3 complete The list is now sorted. Analysis If we have N WORDS M INTERNAL MEMORY SIZE then NUMBER OF PASSES FOR P-WAY MERGE IS log p M/N Implementation of a P-way merge (replacement selection) to merge sorted blocks - sorted blocks form a heap - the first element of each block is the key - the smaller element is parent of the larger - as keys are removed, the heap is rebuilt Ex Implement a 3-way merge. 0,2 1,5 2 --> --> 1,5 3,9 2 39 5 39 39 5 9 Y --> --> --> 5 9 Proposition The number of runs is about 2 * (SIZE OF THE HEAP) Proposition If the size of a list is N, the size of internal memory is M, the number of taps is P + 1, then there are 1+log p (N/2M) passes. Note: By overlapping I/O we can save time. When a previous block leaves main memory entirely, read in the next block. Treat this block as if its key were larger than the keys of any previous blocks. Polyphase merging Multi-way merging uses either many tapes or much copying. In polyphase merging, distribute the elements on the tapes unevenly and merge until the list is sorted. Ex Suppose we have 3 tapes, and suppose that the list has 13 elements. We work backwards from the sorted list to the original list. Suppose that the sorted list is on Tape 1. We represent this situation by writing Tape 1 1 2 3 where the numbers on the tapes represent the number of sorted blocks. Then we must have had sorted blocks on Tapes 1 and 3, say. (We assume that we merge onto Tape 1.) Tape 1 2 1 3 1 Say the block on Tape 3 came from 2 other blocks. Then we must have had Tape 1 1 2 2 3 The previous step must have been Tape 1 3 2 3 2 The precious step must have been Tape 1 2 3 3 5 The previous step must have been Tape 1 5 2 8 3 Since the list has 13 elements, this is the way the list was originally distributed. In general, if we have 3 tapes, then the elements of the list are distributed on 2 tapes, and the number of elements on the tapes are given by two consecutive Fibonacci numbers. If the size of the list is not the sum of two consecutive Fibonacci numbers, then we can add dummy elements to the end of the list. Ex Consider the list 11 17 15 14 19 18 17 16 19 14 which has 10 elements. Add 3 dummy elements to the end of the list and distribute them. Tape 1 11 17 15 14 19 2 18 17 16 19 14 x x x 3 We assume that the dummy elements are bigger than all the other elements. Pass 1 Tape 1 2 x ! x ! x 3 11 18 ! 17 17 ! 15 16 ! 14 19 ! 14 19 (The exclamation points separate sorted blocks.) Pass 1 complete Pass 2 Tape 1 11 18 x ! 17 17 x ! 15 16 x 2 3 14 19 ! 14 19 Pass 2 complete Pass 3 Tape 1 15 16 x 2 11 14 18 19 x ! 14 17 17 19 x 3 Pass 3 complete Pass 4 Tape 1 2 14 17 17 19 x 3 11 14 15 16 18 19 x x Pass 4 complete Pass 5 Tape 1 11 14 14 15 16 17 17 18 19 19 x x x 2, 3 Pass 5 complete