Problems in Arrays, Searching, Sorting, Hashing

PROBLEM CALCULATE THE VALUE OF AN N × N DETERMINANT

Program

#include #include #define N 3 #define EPSILON 1e-10 typedef enum {FALSE, TRUE} bool; void print( double a[][N] ) { /* * print the matrix. */ int i, j; for( i=0; i EPSILON ) { printf( "factor=%g dividing row %d by %g... ", factor, i, temp ); divRow( a, i, temp ); print(a); factor *= temp; } temp = a[j][j]; if( fabs(temp) > EPSILON && fabs(temp-1.0) > EPSILON ) { printf( "factor=%g dividing row %d by %g... ", factor, j, temp ); divRow( a, j, temp ); print(a); factor *= temp; } if( fabs(a[i][j]) > EPSILON ) { printf( "factor=%g row[%d] -= row[%d]... ", factor, i, j ); subRow( a, i, j ); print(a); } if( anyZero(a) == TRUE ) return 0; } a[N-1][N-1] *= factor; // all but(?) last element of row N-1 are zero. return 1; } double multDia( double a[][N] ) { /* * returns multiplication of diagonal elements. */ int i; double factor = 1; for( i=0; i

Explanation

  1.  
  2. The usual way of finding the value of an N × N determinant is to take the first element of the first row and multiply it by the value of the determinant formed by removing that row and that column. This procedure is followed recursively at every step until only a single element remains whose value itself is the value of the 1 × 1 determinant. The sign of the element being multiplied may change depending on its row and column. In general, if the element has row i and column j, then its sign is determined by the formula (−1)^(i+j).
  3.  
  4. There is another way to solve the problem. We note that if we perform row or column transformations on the determinant, its value does not change. We take advantage of this fact to transform the matrix corresponding to the determinant into an upper or lower triangular matrix and then find its determinant value. It becomes clear that the determinant value of an upper or lower triangular matrix is the product of the diagonal elements. Thus the job of finding a determinant value is basically a matter of making a matrix upper or lower triangular.
  5.  
  6. To make a matrix upper triangular (makeUpper()), we perform row transformations on the matrix. Another important observation is that multiplication of a determinant D by a value v is a new determinant in which any of the rows of D are multiplied by v. Multiplication of a row by v means multiplying each element in the row by v. Thus we maintain a variable factor that keeps track of the multiplier as the rows of the determinant are multiplied or divided by various factors in the process of upper-triangulation. We assume all elements of the determinant to be floating-point numbers.
  7.  
  8. In the process of upper-triangulation, if any of the diagonal elements become 0 (anyZero()), that means the value of the determinant is 0 (remember that the value of the determinant is the multiplication of diagonal elements (multDia()).
  9.  
  10. Since here we assume that the elements are floating-point numbers, the calculations might not be exact. To account for this approximation, we maintain an error term EPSILON which is set to a sufficiently low value (tending to 0). Any value in the range +EPSILON…0…–EPSILON is considered to be 0.
  11.  
  12. The algorithm for making a matrix upper triangular is as follows:

    factor = 1.
    for i=1 to N-1 do
    for j=0 to i-1 do
     divide row i by D[i][j] // divRow().
     factor *= D[i][j].
     divide row j by D[j][j] // divRow().
     factor *= D[j][j].
     subtract elements of row j from corresponding elements of row i
    // subRow().
     check for any of the diagonal elements of D to be 0.
    // anyZero().
     determinant = factor*product of diagonal elements.
    
  13. Example: Let the determinant be

     | 8 0 1 |
     | 2 1 3 |
     | 5 3 9 |.
     factor = 1.
    

Snapshots of the algorithm when run on this determinant are shown here.

I

J

STEP

FACTOR

DETERMINANT

1

0

divide row 1 by

2

|8    0    1 |

       

|1 0.5 1.5 |

       

|5    3    9 |

1

0

divide row 0 by 8

16

| 1 0 0.125 |

       

| 1 0.5 1.5  |

       

| 5    3     9 |

1

0

row 1 −= row 0|

16

| 1  0   0.125 |

       

| 0 0.5 1.375 |

       

| 5 3 9 |

2

0

divide row 2 by 5

80

| 1 0 0.125 |

       

| 0 0.5 1.375 |

       

| 1 0.6 1.8 |

2

0

row 2 −= row 0

80

| 1 0 0.125 |

       

| 0 0.5 1.375 |

       

| 0 0.6 1.675 |

2

1

divide row 2 by 0.6

48

| 1 0 0.125 |

       

| 0 0.5 1.375 |

       

| 0 1 2.792 |

2

1

divide row 1 by 0.5

24

| 1 0 0.125 |

       

| 0 1 2.75 |

       

| 0 1 2.792 |

2

1

row 2 −= row 1

24

| 1 0 0.125 |

       

| 0 1 2.75 |

       

| 0 0 0.0417|

Thus, the determinant = 24*(1*1*0.0417) = 1.

Points to Remember

  1. The first way to solve the problem recursively was natural but clumsy, as it requires removal of a row and a column. So, when designing an algorithm, we should try different approaches and then select the most appropriate one.
  2. Note how we reduced the problem of finding a determinant value to making a matrix upper triangular. These reductions not only simplify a problem but can also help in reusing the code and analysis.
  3. An error term such as EPSILON should be used in floating point computations.

 

 

PROBLEM WRITE A PROGRAM TO FIND THE SADDLE POINT OF A MATRIX, IF IT EXISTS

Program

#include #define M 3 #define N 3 int findMin( int a[][N], int row ) { /* * find min value in row of a. * return the value. */ int min = a[row][0]; int j; for( j=1; j max ) max = a[i][col]; return max; } void saddle( int a[][N] ) { /* * finds ALL saddle points of a if exist. */ int i, j; for( i=0; i

Explanation

  1.  
  2. An M × N matrix is said to have a saddle point if some entry a[i][j] is the smallest in row i and the largest in column j. In the following example matrix, 7 is the saddle point.

     | 1 2 3 |
     a = | 4 5 6 |
     | 7 8 9 |
    
  3. The program is simple and could be completed in O(M × N × M) time by using nested for loops. The algorithm is as follows:

    for i=0 to M-1 {
    min = minimum value in row i. // findMin().
    for j=0 to N-1 {
     max = maximum value in column j. // findMax().
     if (min == max)
     print "saddle found at row ", i, "and column ", j.
     }
    }
    
  4. Finding the minimum in a row is O(N) and finding the maximum in a column is O(M). Thus the complexity of the algorithm becomes O(M*(N+N*M)), or O(M*N*M).

Points to Remember

  1. A saddle point is the value which is minimum in the row and maximum in the column.
  2. There can be more than one saddle point in a matrix.
  3. There may be no saddle point in a matrix.
  4. The complexity of the algorithm is O(M × M × N), where M × N is the size of the matrix.

 

 

PROBLEM MULTIPLY TWO SPARSE MATRICES

Write a function mmult() to multiply two sparse matrices and store the result in another sparse matrix. Each sparse matrix is represented by an array of triplets. Each triplet consists of a row, a column, and a value.

Program

#include #define M 10 // the matrices are MxP, PxN and MxN. #define P 7 #define N 8 void ftrans( int a[][3], int b[][3] ) { /* * finds fast-transpose of a in b. */ int t[P+1]; int i, j, n, terms; int temp, temp2; n = a[0][1]; terms = a[0][2]; b[0][0] = n; b[0][1] = a[0][0]; b[0][2] = terms; if( terms <= 0 ) return; for( i=0; i

Explanation

  1.  
  2. A sparse matrix is represented by an array of triplets. Each triplet consists of a row, a column, and a value corresponding to one nonzero element in the sparse matrix. Thus, we maintain an array of size N × 3 for each sparse matrix, where N is the number of non-zero elements in the array. The first row of each sparse matrix contains the number of rows, columns, and nonzero elements in the matrix. An example follows.

     
     

    I

    A[I][0]

    A[I][1]

    A[I][2]

     
     
       
     
         
     
     

    0

    7

    7

    8

     
     
     

    1

    1

    1

    15

     
     
     

    2

    1

    4

    22

     
     
     

    3

    1

    6

    −15

     
     
     

    4

    2

    2

    11

     
     
     

    5

    2

    3

    3

     
     
     

    6

    3

    4

    −6

     
     
     

    7

    5

    1

    91

     
     
     

    8

    6

    3

    28

     
     

    a[0][0]=7 is the number of rows of the matrix a. a[0][1]=7 is the number of columns of the matrix a. a[0][2]=8 is the number of nonzero elements in the matrix a. The nonzero elements are saved in rows i=1 to i=8 where a[i][0] is the row, a[i][1] is the column, and a[i][2] is the value in the matrix a. Note that the elements are sorted by row, but inside a row they are sorted by columns.

  3.  
  4. The function mmult(a, b, c) takes each element of a, with row arow and column acol, and multiplies with each element of row acol of b. The product is added to the element in c with row arow and column equal to the column of the element in b. Note that since a and b are sorted by (row, column), the new entries generated for c are also in the same sorted order. Thus we can easily insert the new entries in c at the end of the currently stored entries.
  5.  
  6. The only problem now is how to reach the first entry in b with row acol. One way is to use a binary search to reach the row, as it is sorted by row. But we note that the number of rows of b is fixed. So we maintain an array of indices pointing to the first entries for each row in b. Thus, if the matrix b is as shown previously, the indices maintained in a vector t are as shown here.

     
     

    I

    T[I]

     
     
       
     
         
     
     

    0

    0

     
     
     

    1

    1

     
     
     

    2

    4

     
     
     

    3

    6

     
     
     

    4

    0

     
     
     

    5

    7

     
     
     

    6

    8

     
     

    t[i] == 0 signifies that row i of b does not contain any elements. Because of this, the original binary search of order O(log n) can now be done in O(1) time. But this needs extra O(Nb) time to construct t[] where Nb is the number of non-zero entries in b.

  7.  
  8. Example: Let

     | 0 2 |
    a = | 3 0 |, b = | 0 0 5 |
     | 0 0 | | 7 0 6 |
     | 1 4 |
    

    Here M = 4, P = 2, and N = 3. Then we should get

     | 14 0 12 |
    c = | 0 0 15 |
     | 0 0 0 |
     | 28 0 29 |
    

The algorithm traverses each element of a, and for each element in a with row arow and column acol:

row acol of b is traversed

the element in a is multiplied by each element in the row-list of b with row acol and column bcol, and

the products are inserted in c as c[arow][bcol].

Thus in every step, an element in a contributes to c. If c[arow][bcol] exists, the product is added to the original value. The multiplication of a and b is given here for the previous example.

STEP

AROW

ACOL

BCOL

A[AROW][ACOL] *B[ACOL][BCOL]

C[AROW][BCOL]

1

0

1

0

14

14

2

0

1

2

12

12

3

1

0

2

15

15

4

3

0

2

5

5

5

3

1

0

28

28

6

3

1

2

24

29

Note that in step 6, the product is 24, while c[3][2] gets a value of 29. This happens because c[3][2] already contains a value 5 in step 4.

Points to Remember

  1. By storing the sparse matrix as an array of triplets, we save a considerable amount of space required for pointers, in the case of a sparse matrix represented by horizontal and vertical lists.
  2. Note how the matrix multiplication time reduces by storing the indices to the rows of b.
  3. The complexity of mmult() is O(Na*Cb*Nc*Nc) where Na is the number of non-zero elements in a, Cb is the number of columns in b and Nc is the number of entries in c. The factor Nc*Nc can be reduced by maintaining the maximum (row, column) inserted in c and by performing a binary search while searching for an existing (row, column), or by maintaining an array similar to t[] for b. By storing such extra indices, insertion in c can be done in O(1) which will make mmult() be O(Na*Cb).

 

 

PROBLEM MULTIPLICATION OF TWO SPARSE MATRICES (DIFFERENT VERSIONS)

Let a and b be two sparse matrices. Write a function sMatMul(a, b, c) to set up the structure for c=a*b.

Program

#include 

#define M 10
#define N 10
#define P 10
#define DEFAULTVAL 0
#define SUCCESS 0
#define ERROR -1

typedef int type;
typedef struct node node;

struct node {
 type data;
 int row;
 int col;
 node *hnext;
 node *vnext;
};

typedef struct {
 node *rows;
 node *cols;
} spmat;

void sInit( spmat *mat, int rows, int cols ) {
 /*
 * initialize the matrix.
 */
 int i;

 mat->rows = (node *)malloc( sizeof(node)*rows );
 mat->cols = (node *)malloc( sizeof(node)*cols );
 for( i=0; irows[i].hnext =
 mat->rows+i; mat->rows[i].row = i;
 }
 for( i=0; icols[i].vnext =
 mat->cols+i; mat->cols[i].col = i;
 }
}

int sAdd( spmat *mat, int row, int col, type data ) {
 /*
 * adds a new node to the sparse matrix.
 */
 node *ptr;

 if( data == DEFAULTVAL )
 return;
 ptr = (node *)malloc( sizeof(node) ); // freed in cColInsert() if reqd.
 ptr->data = data;
 ptr->row = row;
 ptr->col = col;
 cRowInsert( mat->rows+row, ptr );
 cColInsert( mat->cols+col, ptr );

 return SUCCESS;
}

int cRowInsert( node *head, node *dataptr ) {
 /*
 * inserts dataptr in appropriate row of sparse matrix.
 */
 node *ptr, *prev;
 for( prev=head, ptr=prev->hnext; ptr!=head && ptr->colcol; prev=ptr, ptr=ptr->hnext )
 ;
 if( ptr!=head && ptr->col == dataptr->col ) { // data already exists.
 ptr->data += dataptr->data; // this is for multiplication.
 return SUCCESS;
 }
 // dataptr should be added between prev and ptr.
 dataptr->hnext = ptr;
 prev->hnext = dataptr;

 return SUCCESS;
}

int cColInsert( node *head, node *dataptr ) {
 /*
 * inserts dataptr in appropriate col of sparse matrix.
 * Assume that cRowInsert() was called before.
 */
 node *ptr, *prev;
 for( prev=head, ptr=prev->vnext; ptr!=head && ptr->rowrow; prev=ptr, ptr=ptr->vnext )
 ;
 if( ptr!=head && ptr->row == dataptr->row ) { // data already exists.
 free(dataptr);
 return SUCCESS;
 }
 // dataptr should be added between prev and ptr.
 dataptr->vnext = ptr;
 prev->vnext = dataptr;

 return SUCCESS;
}

void cRowPrint( node *head ) {
 /*
 * print a row.
 */
 node *ptr;

 printf( "%2d : ", head->row );
 for( ptr=head->hnext; ptr!=head; ptr=ptr->hnext )
 printf( "%d(%d,%d) ", ptr->data, ptr->row, ptr->col );
 printf( "
" );
}

void cColPrint( node *head ) {
 /*
 * print a col.
 */
 node *ptr;
 printf( "%2d : ", head->col );
 for( ptr=head->vnext; ptr!=head; ptr=ptr->vnext )
 printf( "%2d(%d,%d) ", ptr->data, ptr->row, ptr->col );
 printf( "
" );
}

void sHPrint( spmat *mat, int rows ) {
 /*
 * print sparse matrix by traversing it row-wise.
 */
 int i;
 for( i=0; irows+i );
 printf( "
" );
}

void sVPrint( spmat *mat, int cols ) {
 /*
 * print sparse matrix by traversing it col-wise.
 */
 int i;
 for( i=0; icols+i );
 printf( "
" );
}

type sGetVal( spmat *a, int row, int col ) {
 /*
 * return a[row][col];
 */
 node *head = a->rows+row;
 node *ptr;

 for( ptr=head->hnext; ptr!=head; ptr=ptr->hnext )
 if( ptr->col == col )
 return ptr->data;
 return DEFAULTVAL; // entry absent in matrix : default value 0.
}

int sMatMulBad( spmat *a, spmat *b, spmat *c ) {
 /*
 * original inefficient implementation of matrix mult.
 */
 int i, j, k;

 for( i=0; irows[i].hnext; ptri!=a->rows+i; ptri=ptri->hnext )

 int row = ptri->col;
 for( ptrj=b->rows[row].hnext; ptrj!=b->rows+row;
ptrj=ptrj->hnext ) {
 sAdd( c, i, ptrj->col, ptri->data*ptrj->data );
 }
 }
 return SUCCESS;
}

int main() {
 spmat a, b, c;

 sInit(&a,M,P); sInit(&b,P,N); sInit(&c,M,N);
 sAdd( &a, 0,1, 2 );
 sAdd( &a, 1,0, 3 );
 sAdd( &a, 3,1, 4 );

 sAdd( &b, 0,2, 5 );
 sAdd( &b, 1,0, 7 );
 sAdd( &b, 1,1, 6 );

 sHPrint(&a,M);
 sHPrint(&b,P);

 sMatMul( &a, &b, &c );
 sHPrint(&c,M);
 sVPrint(&c,N);

 return 0;
}

Explanation

  1. A sparse matrix is represented as two arrays of pointers: one for rows and the other for columns. Each row and each column is represented by horizontal and vertical circular lists. A nonzero entry a[i][j] is added as a node to the horizontal list of row i and the same node in the vertical list of column j. A node represents an entry. Thus it contains row, column, and values along with horizontal and vertical pointers in the lists.
  2. Let the sizes of sparse matrices a, b be M×P, P×N. Thus the size of c is M×N. We describe the algorithm using an example. Let

     | 0 2 |
    a = | 3 0 |, b = | 0 0 5 |
     | 0 0 | | 7 0 6 |
     | 1 4 |
    

    Here M = 4, P = 2, and N = 3. Then we should get

    
     | 14 0 12 |
    c = | 0 0 15 |.
     | 0 0 0 |
     | 28 0 29 |
    
  3. The algorithm traverses each row of a and checks its horizontal list corresponding to each row for any elements in it. For each element in a with row arow and column acol, row acol of b is traversed and the element in a is multiplied by each element in the row-list of b with row acol and column bcol. The products are inserted in c as c[arow][bcol]. Thus, in every step, an element in a contributes to c. If c[arow][bcol] exists, then the product is added to the original value. The multiplication of a and b is given here for the previous example.

    STEP

    AROW

    ACOL

    BCOL

    A[AROW][ACOL]*B [ACOL][BCOL]

    C[AROW][BCOL]

    1

    0

    1

    0

    14

    14

    2

    0

    1

    2

    12

    12

    3

    1

    0

    2

    15

    15

    4

    2

           

    5

    3

    0

    2

    5

    5

    6

    3

    1

    0

    28

    28

    7

    3

    1

    2

    24

    29

    Step 4 signifies that a[4] is taken for traversal but is not traversed as it is empty. Note that the product is 24 in step 7, while c[3][2] gets a value of 29. This happens because c[3][2] already contains a value of 5 in step 5.

  4. Let na, nb be the number of nonzero entries in a and b. The basic step in this algorithm is adding the products of the elements in a and b to c. So even if the outer loop traverses from 0 to M–1, the sAdd() function gets invoked for each entry a[i][j]. This invoking is done for each nonzero element in row j of b. Thus at most, the number of multiplications for a[i][j] will be equal to N, number of columns of b. Thus the complexity of the matrix multiplication algorithm is O(na*N). However, if we add the complexity due to the outermost loop, the complexity is O(na*N+M*nb).

Points to Remember

  1. A slight modification to the usual add(matrix,row,col,value) function resulted in an easing of the implementation of the matrix multiplication. In general, if a[i][j] is inserted and it already exists, then we either overwrite the value or return an error. By adding the new value to the original value, we can add products of elements of a and b to c incrementally.
  2. The sparse matrix, as it is represented by a complicated data structure, should be initialized properly.
  3. An array-based matrix multiplication program has the complexity O(M*P*N). If we use a similar procedure in this representation as in sMatMulBad(), the complexity increases to O(M*P*N*P) as sGetVal is O(P).
  4. Insertions were simplified by the representation of empty rows and columns by head nodes rather than NULL lists.

 

 

PROBLEM IMPLEMENT K WAY SORT MERGE TO SORT A FILE CONTAINING RECORDS

Program

#include #include #define N 80 #define K 2 // K-way merge. #define DATAFILE "main.txt" #define TEMPFILE "temp.txt" typedef struct node node; typedef enum {FALSE, TRUE} bool; struct node { int val; char s[N]; }; node buf[K]; // buffer used for merging. int rec[K]; // record numbers of nodes in buffer. int getNRecords( char *filename ) { /* * returns no of records in file filename using size of the file. */ int off; FILE *fp = fopen( filename, "r" ); fseek (fp, 0, SEEK_END ); off = ftell(fp)/sizeof(node); // no of records. fclose(fp); return off; } void writeToFile( char *filename, node *n ) { /* * writes record n to file filename. */ FILE *fp = fopen( filename, "a" ); fwrite( n, sizeof(node), 1, fp ); fclose(fp); } void readFromFile( char *filename, node *n, int off ) { /* * reads a record at offset off from file filename into n. * off is number of records before the record in the file (NOT bytes). * off starts from 0. */ FILE *fp; //printf( "reading rec no %d... ", off ); if( off >= getNRecords(filename) ) { fprintf( stderr, "ERROR: reading beyond the file. " ); return; } printf("total records are %d ",getNRecords(filename)); fp = fopen( filename, "r" ); fseek( fp, off*sizeof(node), SEEK_CUR ); fread( n, sizeof(node), 1, fp ); fclose(fp); } void writeFun( char *filename ) { /* * writes some data to filename. */ node data[10] = { {5,"five"}, {3,"three"}, {4,"four"}, {8,"eight"}, {7,"seven"}, {6,"six"}, {9,"nine"}, {10,"ten"}, {1,"one"}, {2,"two"} }; int i; for( i=0; i<10; ++i ) writeToFile( filename, data+i ); } void readFun( char *filename ) { /* * reads filename and prints the data. */ node n; int i, nrec = getNRecords(filename); for( i=0; i= nrec ) break; rec[i] = startoff; printf( "buf[%d]=%d. ", i, startoff ); readFromFile( srcfile, buf+i, startoff ); } for( ; i−1; getchar(); } void updatebuf( node *buf, int *rec, int *rec2, int prevrec, int l, char *srcfile, int nrec ) { /* * updates buf+rec2 as rec2[prevrec] was output. * read appropriate record from srcfile if necessary. * rec still contains the original rec nos which can be used for * checking ends of runs. * l is runlength. */ if( rec2[prevrec] < nrec-1 && rec2[prevrec] < rec[prevrec]+l−1 ) { // rec2[prevrec] was NOT the last rec of that run. rec2[prevrec]++; readFromFile( srcfile, buf+prevrec, rec2[prevrec] ); } else { // rec2[prevrec] was the last rec of that run. rec2[prevrec] = −1; // job of this run is over. } } int getMin( node *buf, int *rec2 ) { /* * returns index in buf of that record which has min sorting value. * rec2 is needed for checking whether a buf entry is valid. */ int minval = 9999; int minindex = −1; int i; for( i=0; i−1 && buf[i].val < minval ) { minval = buf[i].val; minindex = i; } return minindex; } void merge( char *srcfile, char *dstfile, node *buf, int *rec2, int l, int nrec ) { /* * rec2 contains record numbers being compared; global rec also contains * the same at this point. * buf contains their actual data. * l is runlength. * srcfile is needed for reading next data. * the data is appended to dstfile. * total no of records being written is min(l*k,nrec-rec[0]). */ int totalrec = l*K; int i; int nrecremaining = nrec-rec[0]; // no of rec in srcfile yet to be written to dstfile. if( nrecremaining < totalrec ) totalrec = nrecremaining; printf( "totalrec=%d nrecremaining=%d. ", totalrec, nrecremaining ); for( i=0; i−1 ) { fprintf( stderr, "ERROR: merge(): all rec2 are −1! " ); return; } //printf( "min=%d. ", nextrec ); // this is the index in rec2 of next record to be output. writeToFile( dstfile, buf+nextrec ); // remove this written record. read new record from srcfile if needed. updatebuf( buf, rec, rec2, nextrec, l, srcfile, nrec ); //printf( "after updatebuf : rec2=%d %d %d. ", rec2[0], rec2[1], rec2[2] ); } } void mergedriver( char *srcfile, char *dstfile ) { /* * sort+merge srcfile and store in dstfile. */ int nrec = getNRecords(srcfile); int i, l; int rec2[K]; char tempname[N]; for( l=1; l

Explanation

  1.  
  2. The function main() creates a test file by using writeFun() and calls mergedriver(). mergedriver() calls the function merge() after reading (fillbuf()) K blocks of the file into memory.
  3.  
  4. The function merge() compares the blocks and sorts them on the predetermined key. The block with the minimum key (getMin()) is written to the file and its block is filled (updatebuf()) with the next record from the file. The sorting and merging thus proceeds simultaneously to successively sort the file.
  5.  
  6. The number of blocks being compared are called runs. Each run length increases with each iteration. It is 1 initially, then it becomes K, then K*K and so on. As it becomes greater than or equal to the number of records in the file (getNRecords()), the file is sorted because all the records in each run are sorted after each iteration.
  7.  
  8. Example: Assume the blocks are saved in a file as shown next and let K=3. Then the algorithm transforms the file as follows:
  9.  
 
 

STEP

FILE

RUNLENGTH

 
 
   
 
     
 
 

1

5 3 4 8 7 6 9 10 2 1

1

 
 
 

2

3 4 5 6 7 8 2 9 10 1

3

 
 
 

3

2 3 4 5 6 7 8 9 10 1

9

 
 
 

4

1 2 3 4 5 6 7 8 9 10

27

 
 

Points to Remember

  1.  
  2. he complexity of merge-sort is O(n logn) where n is the number of records in the file. However, in general, the statements that are added to the complexity are comparisons or a nested assignment. But in case of files, one needs to consider reading and writing of records in the file, as the time required for execution of one such operation is much more than a comparison in memory or a simple assignment.
  3.  
  4. Sorting in files is called external sorting while sorting in main memory is termed internal sort.
  5.  
  6. By exchanging the names of source and destination files in mergedriver(), we avoided copying of files in each iteration.
  7.  

PROBLEM FIND A PLATEAU IN A MATRIX

Find a rectangular region in a matrix with the maximum sum of its elements. The elements may be negative.

Program

#include 

#define COLS 5
#define MININT -99999

int filter(int a[][COLS], int i, int j, int k, int l, int rows, int cols) {
 /*
 * filter the matrix of size k*l starting from a[i][j].
 * size of the matrix is rows*cols.
 * k, l start with 1.
 */
 int iii, jjj;
 int sum = 0;

 if(i+k > rows || j+l > cols) // the matrix was already considered
 // with smaller k, l.
 return MININT;

 for(iii=0; iii maxsum) {
 maxsum=sum, maxrow1=i, maxcol1=j, maxrow2=i+k−1, maxcol2=j+l−1;
 }
 }
 printf("sum=%d.
", filter(a, maxrow1, maxcol1, maxrow2-maxrow1+1, maxcol2-maxcol1+1, rows, cols));
 printMatrix(a, maxrow1, maxcol1, maxrow2-maxrow1+1, maxcol2-maxcol1+1, rows, cols);

}

int main() {
 int a[][COLS] = {
 {5,−1,−2,−4,1},
 {−3,2,10,−6,3},
 {−1,9,−11,−7,9},
 {3,101,3,−2,−96},
 {−1,−2,0,3,−3},
 {−1,−2,−3,2,−3}
 };
 plateau(a, sizeof(a)/COLS/sizeof(int), COLS);

 return 0;
}

Explanation

  1. Considesr the following matrix. The rectangle of elements with the maximum sum is also given.

     | 5 −1 −2 −4 1 | | 5 −1 |
     | −3 2 10 −6 3 | | −3 2 |
     | −1 9 −11−7 9 | | −1 9 | sum = 115.
     | 3 101 3 −2 −96 | | 3 101|
     | −1 −2 0 3 −3 |
     | −1 −2 −3 2 −3 |
    

    An example matrix and its maximum-sum-plateau.

  2. A straightforward algorithm to find a maximum sum is as follows.

    for i=0 to rows-1
     for j=0 to columns-1 {
     maxsum = matrix[i][j].
     for height=1 to rows
     for width=1 to columns {
     let M = matrix of size height x width from matrix[i][j].
     let sum = sum of elements of M.
     if sum > maxsum
     maxsum = sum.
     }
    }
    

    The values of i, j, width, and height at any point represent the matrix. Those values can be saved to print the matrix at the end of the algorithm.

    The complexity of this procedure is O(m*m*m*n*n*n) where the size of the original matrix is m × n. The complexity of finding the sum of elements of M is O(m*n).

  3. This procedure can be made more efficient by noting that not every internal matrix M needs to be generated. For an element matrix[i][j], the maximum width of the matrix starting from the element can be columns-j and maximum height can be rows–i. Thus, the number of iterations of loops over width and height can be decreased. Also, for an element e, if the sum of elements of a rectangular region of size height × width starting from e is negative, then no rectangular region of size more than height × width starting from e can have a sum higher than the final result. We show this in the example matrix.

    Let i=1 and j=2. matrix[i][j] = 10. Let height = 2 and width = 1. Thus the matrix contains two elements in it, {10, −11} and the sum is −1. Since the sum is negative, there is no point in increasing the size of the matrix and considering other elements because whatever their sum (say s), by adding this matrix to it, the total sum is definitely going to be lower than s. The matrix containing sum s may be a candidate for the result. So our matrix cannot have a sum greater than the final result. Thus, if we increase the height of the matrix further so that the elements are {10, −11, 3}, the sum becomes 2 which is less than a 1 × 1 matrix starting from 3. Instead, if we increase the width so that the elements are {10, −11, −6, −7}, the sum is −14 which is even lower.

    The function filter() implements this strategy.

Points to Remember

  1. By considering only those matrices that start from an element e, that is those increasing in width to the right of e and increasing in height downwards from e, we select all the possible rectangular regions without any repetition.
  2. We need not increase the size of a submatrix once the sum of its elements becomes negative.
  3. If all the values in the input matrix are negative, then the result is a 1×1 matrix with the only element having a minimum magnitude.
  4. There can be multiple solutions to this problem.
  5. The complexity of finding the sum of elements of a submatrix can be increased by noting the sum incrementally.

 

 

PROBLEM IMPLEMENTATION OF A HASH SEARCH

Write a program that takes strings as inputs and stores them in a hash table. It should then ask the user for strings to be searched in the hash table. Use shift- folding as the hashing function and chaining for overflow handling.

Program

#include 
#include 
#include 

#define MAXLEN 80
#define HASHSIZE 23 // some prime val.
#define SHIFTBY 3 // each group size in hashing.

typedef struct node node;
typedef char *type;
typedef node *hashtable[HASHSIZE];

struct node {
 int val;
 char *key;
 node *next;
};

int hGetIndex(char *key) {
 /*
 * returns index into hashtable applying hash function.
 * uses shift-folding followed by mod function for hashing.
 */
 int i, n, finaln=0;
 char *keyptr;

 for(keyptr=key; *keyptr; finaln+=n)
 for(i=0, n=0; ikey = strdup(key);
 ptr->val = val;
 ptr->next = h[index];

 h[index] = ptr;
 printf("h[%d] = %s.
", index, key);
}

int hGetVal(hashtable h, char *key) {
 /*
 * returns val corresponding to key if present in h else −1.
 */
 node *ptr;

 for(ptr=h[hGetIndex(key)]; ptr && strcmp(ptr->key, key); ptr=ptr->next)
 ;
 if(ptr)
 return ptr->val;
 return −1;
}

void printHash(hashtable h) {
 /*
 * print the hashtable rowwise.
 */
 int i;
 node *ptr;

 for(i=0; inext)
 printf("%s=%d ", ptr->key, ptr->val);
 printf("
");
 }
}

int main() {
 char s[MAXLEN];
 int i = 0;
 hashtable h = {"abc"};

 printf("Enter the string to be hashed: ");
 gets(s);

 while(*s) {
 hInsert(h, s, i++);
 printf("Enter the string to be hashed(enter to end): ");
 gets(s);
 }
 printf("Enter the string to be searched: ");
 gets(s);

 while(*s) {
 printf("%s was inserted at number %d.
", s, hGetVal(h, s));
 printf("
Enter the string to be searched(enter to end): ");
 gets(s);
 }
 //printHash(h);

 return 0;
}

Explanation

  1. The hash table is maintained as an array of lists. Each list is either empty or contains nodes containing strings that map to the index in the hash table, after application of the hashing function. The string in the node is called the key. Each node also stores an integer that is the number at which the string was inserted in the hash table. Thus, each node contains a (key, value) pair. In a realistic situation, a value can be anything that has a key associated with it.
  2. The program contains two loops. In the first, it asks the user to enter a series of strings and calls hInsert() to insert the strings in the hash table. The second loop asks the user to enter a string and returns the number at which it was inserted. An insertion number of −1 indicates that the string is not present in the hash table. This is done by using the function hGetVal(). Both these functions make use of the hashing function hGetIndex(), which, given a string, returns its hashing index. It folds the string into a pattern of m characters (perhaps except the last), and forms an integer out of each m characters. It then adds all these integers to get another number. This is then divided by the size of the hash table array to get the remainder as an index into the hash table. hInsert() adds this new string and its insertion sequence to a node and this node is added to the start of the list in the index. hGetVal() searches the list at this index for the input string. If it finds such a string, its insertion sequence is returned, otherwise it returns −1.
  3. Since the complexity of insertion of a node in the list is O(1), the complexity of hInsert() is the complexity of the hashing function. The complexity of the hashing function is O(p) where p is the average length of the string. Thus the complexity of hInsert() is O(p). The complexity of hGetVal() is O(p+q) where q is the average number of nodes in each list. Chaining involves a linear search.

Points to Remember

  1. The complexity of insertion in the hash table is decided by the hashing function if simple chaining is used for overflow handling.
  2. The complexity of searching is decided by both the hashing function and the overflow handling technique.
  3. An ideal hash function maps every input string to a different index and thus has zero collisions. Assuming that the complexity of a hash function is O(1), the insertions and searching into an ideal hash table are O(1).
  4. Hash tables are used in compilers for symbol-table management. Hash tables have numerous other applications as well.
  5. Different overflow handling techniques such as linear probing, quadratic probing, random probing, rehashing, etc., are in use depending on the application requirement.

 

 

PROBLEM IMPLEMENTATION OF REHASHING

Write functions to insert and search in a hash table by using the rehashing technique. Use linear probing if the rehashing fails.

Program

#include 
#include 
#include 
#define MAXLEN 80
#define HASHSIZE 23 // some prime val.
 typedef struct node node;
 typedef char *type;
 typedef node *hashtable[HASHSIZE];

 struct node {
 int val;
 type key;
};

int hGetIndex1(type key) {
 /*
 * returns index into hashtable applying hash function.
 * uses sum of elements followed by mod function for hashing.
 */
 int n=0;
 char *keyptr;

 for(keyptr=key; *keyptr; ++keyptr)
 n += *keyptr;

 return n%HASHSIZE;
}

int hGetIndex2(type key) {
 /*
 * returns index into hashtable applying hash function.
 * sums the products of elements with their indices and then mod.
 */
 long n=0;
 int i;
 type keyptr;

 printf("Function 2:).
");
 for(keyptr=key, i=1; *keyptr; ++keyptr, ++i)
 n += i**keyptr;

 return n%HASHSIZE;
}
int hGetEmptySlot(hashtable h, int index) {
 /*
 * search for an empty slot in h starting from index+1.
 */
 int i;

 for(i=index+1; i−1;
}

int hLinearProbe(hashtable h, type key, int index) {
 /*
 * search for node having key in h starting from index+1.
 */
 int i;

 for(i=index+1; ikey, key))
 return i;
 else if(!h[i])
 return −1;
 for(i=0; ikey, key))
 return i;
 else if(!h[i])
 return −1;
 return −1;
}

void hInsert(hashtable h, type key, int val) {
 /*
 * insert s in hashtable h.
 * does NOT check for duplicate insertion.
 */
 node *ptr = (node *)malloc(sizeof(node));
 int index = hGetIndex1(key);

 if(h[index]) {
 index = hGetIndex2(key);
 if(h[index]) {
 index = hGetEmptySlot(h, index);
 if(index == −1) {
 printf("ERROR: Hashtable full.
");
 return;
 }
 }
 }
 ptr->key = strdup(key);
 ptr->val = val;

 h[index] = ptr;
 printf("h[%d] = %s.
", index, key);
}

int hGetVal(hashtable h, type key) {
 /*
 * returns val corresponding to key if present in h else −1.
 */
 int index = hGetIndex1(key);

 if(h[index] && strcmp(h[index]->key, key)) {
 index = hGetIndex2(key);
 if(h[index] && strcmp(h[index]->key, key)) {
 index = hLinearProbe(h, key, index);
 if(index == −1)
 return −1;
 }
 else if(!h[index])
 return −1;
 }
 else if(!h[index])
 return −1;

 printf("index=%d ", index);
 return h[index]->val;
}
void printHash(hashtable h) {
 /*
 * print the hashtable.
 */
 int i;
 for(i=0; ikey, h[i]->val);
 printf("
");
 }
 }

 int main() {
 char s[MAXLEN];
 int i = 0;
 hashtable h = {"asd"};

 printf("Enter the string to be hashed: ");
 gets(s);

 while(*s) {
 hInsert(h, s, i++);
 printf("Enter the string to be hashed(enter to end): ");
 gets(s);
 }
 printf("Enter the string to be searched: ");
 gets(s);

 while(*s) {
 printf("%s was inserted at number %d.
", s, hGetVal(h, s));
 printf("
Enter the string to be searched(enter to end): ");
 gets(s);
 }
 printHash(h);

 return 0;
}

Explanation

  1. Rehashing is one method of handling collisions. If a single application of the hash function results in a collision, and we detect that the space is already occupied by another element, we can use some other hash function to generate another index in the hash table. If this index is also already filled, we can either use another hash function to generate another index or we can use linear probing or chaining. We use linear probing here to guarantee full utilization of the hash table in a finite amount of time.
  2. We use simple mod() function as the first hash function (hGetIndex1()). We add the ASCII values of the input characters and apply mod() to get an index in the range of the hash table. We use a modification of this mod() function as the second hash function (hGetIndex2()). We add the products of the ASCII values of characters and their indices to get a final sum to which we apply the mod function to get the second index. If this slot is also filled, we use linear search (hLinearProbe()) over the hash table starting from index to the end of the hash table, and then starting from the start of the hash table to the index, to get an empty slot. If we do not find an empty slot, we output an error message. Otherwise the index of the empty slot is returned.
  3. An identical procedure is used during searching (hGetVal()). It applies the first hash function to get the first index. If this is empty, −1 is returned. If it contains the node containing the search key, the value corresponding to that key is returned. If the key is different, we apply the second hash function to get another index. If this is empty, −1 is returned. If it contains the node containing the search key, the value corresponding to that key is returned. If the key is different, we apply linear probing (hLinearProbe()) to search for the key in the hash table. If we get such a node, the value corresponding to the key is returned. If such a node does not exist, an error message is printed.
  4. Since linear probing (and not chaining) is used, the hash table is simply an array of pointers to nodes where each node contains only a key and value.
  5. Example:

    • Let HASHSIZE = 23.
    • Let strings to be inserted be as follows: ‘dj’, ‘na’, ‘id’, ‘q’.
    • hGetIndex1(‘dj’) returns 22, so it is inserted in hash table[22].
    • hGetIndex1(‘na’) returns 0, so it is inserted in hash table[0].
    • hGetIndex1(‘id’) returns 21, so it is inserted in hash table[21].
    • hGetIndex1(‘q’) returns 21. Since the space is already filled by ‘id’, hGetIndex2(‘q’) is called. It returns 22. But it is also filled by ‘dj’. So a linear search for an empty slot is done starting from index 22 (wrapping over the hash table). It finds that the next index 0 is filled with ‘na’ and its next index 1 is empty. Hence ‘q’ gets inserted in hash table[1].

Points to Remember

  1. Since no deletions are taking place in the hash table, we can reduce the complexity of searching for an element. This is because during searching, if we get an empty slot, that means the search key cannot be present beyond that location. This follows from the predicate that a value is inserted into the first empty slot we get.
  2. The hash table could have been an array of nodes instead of node pointers. But our implementation can save space if the val field is of a larger size and if many slots of the hash table are empty.
  3. The complexities of both hash functions is O(n) where n is the average length of the string to be inserted. The complexity of linear probing is O(HASHSIZE) where HASHSIZE is the hashtable size.
  4. The advantage of modular programming over hash.c is that the function main() remains the same even after changing the implementation of the hashing procedure.

 

 



C & Data Structures
C & Data Structures (Charles River Media Computer Engineering)
ISBN: 1584503386
EAN: 2147483647
Year: 2006
Pages: 232

Flylib.com © 2008-2020.
If you may any questions please contact us: flylib@qtcs.net