Using the Hash Object


Why Use the Hash Object?

The hash object provides an efficient, convenient mechanism for quick data storage and retrieval. The hash object stores and retrieves data based on lookup keys.

To use the DATA step Component Object Interface, follow these steps:

  1. Declare the hash object.

  2. Create an instance of (instantiate) the hash object.

  3. Initialize lookup keys and data.

After you declare and instantiate a hash object, you can perform many tasks, including the following:

  • Store and retrieve data.

  • Replace and remove data.

  • Output a data set that contains the data in the hash object.

For example, suppose that you have a large data set that contains numeric lab results, identified by patient number and including each patient's weight, and a small data set that contains a subset of those patient numbers. You can load the large data set into a hash object, using the patient number as the key and the weight as the data. You can then iterate over the small data set, use each patient number to look up that patient in the hash object, and write the patients whose weight exceeds a certain value to a different data set.
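The patient scenario can be sketched as follows. This is only an illustration: the data set names WORK.LABS and WORK.PATIENTS, the variable names PATIENT_ID and WEIGHT, the sample values, and the threshold of 200 are all assumptions made for the example.

```sas
/* Hypothetical "large" data set: one weight per patient number */
data work.labs;
   input patient_id weight;
   datalines;
101 180
102 150
103 240
;
run;

/* Hypothetical "small" data set: a subset of patient numbers, plus one unknown */
data work.patients;
   input patient_id;
   datalines;
102
103
999
;
run;

data work.heavy_patients;
   length patient_id 8 weight 8;
   if _N_ = 1 then do;
      /* Load the large data set: patient number is the key, weight is the data */
      declare hash labs(dataset: 'work.labs', hashexp: 10);
      rc = labs.defineKey('patient_id');
      rc = labs.defineData('weight');
      rc = labs.defineDone();
      /* Avoid uninitialized variable notes */
      call missing(patient_id, weight);
   end;
   /* Iterate over the small data set and look up each patient */
   set work.patients;
   rc = labs.find();
   /* Keep only patients that were found and exceed the assumed threshold */
   if rc = 0 and weight > 200 then output;
   drop rc;
run;
```

With this sample data, only patient 103 is found in the hash object with a weight above the threshold, so WORK.HEAVY_PATIENTS should contain a single observation.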

Depending on the number of lookup keys and the size of the data set, the hash object lookup can be significantly faster than a standard format lookup.

Declaring and Instantiating a Hash Object

You declare a hash object using the DECLARE statement. After you declare the new hash object, use the _NEW_ statement to instantiate the object.

   declare hash myhash;
   myhash = _new_ hash();

The DECLARE statement tells the compiler that the variable MYHASH is of type hash. At this point, you have only declared the variable MYHASH. It has the potential to hold a component object of type hash. You should declare the hash object only once. The _NEW_ statement creates an instance of the hash object and assigns it to the variable MYHASH.

As an alternative to the two-step process of using the DECLARE and the _NEW_ statement to declare and instantiate a component object, you can use the DECLARE statement to declare and instantiate the component object in one step.

 declare hash myhash(); 

The above statement is equivalent to the following code:

   declare hash myhash;
   myhash = _new_ hash();

For more information about the "DECLARE Statement" and the "_NEW_ Statement", see SAS Language Reference: Dictionary.

Initializing Hash Object Data Using a Constructor

When you create a hash object, you might want to provide initialization data. A constructor is a method that you can use to instantiate a hash object and initialize the hash object data.

The hash object constructor can have either of the following formats:

  • declare hash variable_name(argument_tag-1: value-1 <, ...argument_tag-n: value-n>);
  • variable_name = _new_ hash(argument_tag-1: value-1 <, ...argument_tag-n: value-n>);

These are the valid hash object argument tags:

hashexp: n

is the hash object's internal table size, where the size of the hash table is 2^n.

The value of hashexp is used as a power-of-two exponent to create the hash table size. For example, a value of 4 for hashexp equates to a hash table size of 2^4, or 16. The maximum value for hashexp is 16, which equates to a hash table size of 2^16, or 65536.

The hash table size is not equal to the number of items that can be stored. Think of the hash table as an array of containers. A hash table size of 16 would have 16 containers. Each container can hold any number of items, limited only by available memory. The efficiency of the hash table lies in the ability of the hash function to map items to and retrieve items from the containers.

In order to maximize the efficiency of the hash object lookup routines, you should set the hash table size according to the amount of data in the hash object. Try different hashexp values until you get the best result. For example, if the hash object contains one million items, a hash table size of 16 (hashexp = 4) would not be very efficient. A hash table size of 512 or 1024 (hashexp = 9 or 10) would result in better performance.

Default: 8, which equates to a hash table size of 2^8, or 256.
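The relationship between hashexp, the table size, and the load on each container can be checked with a little arithmetic. The sketch below is only illustrative: it assumes the one million items from the example above and an idealized, perfectly even distribution of items across containers, which a real hash function only approximates.

```sas
data _null_;
   items = 1000000;                    /* item count taken from the example above */
   do hashexp = 4, 8, 10, 16;
      table_size = 2 ** hashexp;       /* number of containers */
      avg_items = items / table_size;  /* idealized average items per container */
      put hashexp= table_size= avg_items=;
   end;
run;
```

Under that assumption, a hashexp of 4 leaves about 62,500 items in each of 16 containers, while a hashexp of 10 spreads the same items over 1024 containers, which is why the larger exponent is expected to perform better for large tables.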

dataset: 'dataset_name'

is the name of a SAS data set to load into the hash object.

The name of the SAS data set can be a literal or a character variable. The data set name must be enclosed in single or double quotation marks. Macro variables must be in double quotation marks.

Note  

If the data set contains duplicate keys, the first instance will be in the hash object; subsequent instances will be ignored.

ordered: 'option'

specifies whether or how the data is returned in key-value order if you use the hash object with a hash iterator object or if you use the hash object OUTPUT method. option can be one of the following values:

'ascending' | 'a'

Data is returned in ascending key-value order. Specifying 'ascending' is the same as specifying 'yes'.

'descending' | 'd'

Data is returned in descending key-value order.

'YES' | 'Y'

Data is returned in ascending key-value order. Specifying 'yes' is the same as specifying 'ascending'.

'NO' | 'N'

Data is returned in an undefined order.

Default: NO

The argument can also be enclosed in double quotation marks.
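As an illustration of the dataset and ordered argument tags used together, the following sketch loads a small data set into a hash object and writes it back out in ascending key order. The data set names WORK.SCORES and WORK.SCORES_SORTED, the variable names, and the sample values are assumptions made for this example.

```sas
/* Hypothetical input data, unsorted and with a duplicate key (Alice) */
data work.scores;
   input name $ score;
   datalines;
Carol 82
Alice 91
Bob 77
Alice 60
;
run;

data _null_;
   length name $8 score 8;
   if _N_ = 1 then do;
      /* Load WORK.SCORES and request ascending key order on output */
      declare hash h(dataset: 'work.scores', ordered: 'a');
      rc = h.defineKey('name');
      /* List the key as data too, so that it appears in the output data set */
      rc = h.defineData('name', 'score');
      rc = h.defineDone();
      call missing(name, score);
   end;
   rc = h.output(dataset: 'work.scores_sorted');
run;

proc print data=work.scores_sorted;
run;
```

Because only the first instance of a duplicate key is loaded, the second Alice record should be ignored, and the output data set should list Alice, Bob, and Carol in that order.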

For more information on the "DECLARE Statement" and the "_NEW_ Statement", see SAS Language Reference: Dictionary.

Defining Keys and Data

The hash object uses lookup keys to store and retrieve data. The keys and the data are DATA step variables that you use to initialize the hash object by using dot notation method calls. A key is defined by passing the key variable name to the DEFINEKEY method. Data is defined by passing the data variable name to the DEFINEDATA method. When all key and data variables have been defined, the DEFINEDONE method is called. Keys and data can consist of any number of character or numeric DATA step variables.

For example, the following code initializes a character key and a character data variable.

   length d $20;   /* character; the length of 20 is an assumed value */
   length k $20;
   if _N_ = 1 then do;
      declare hash h(hashexp: 4);
      rc = h.defineKey('k');
      rc = h.defineData('d');
      rc = h.defineDone();
   end;

You can have multiple key and data variables. You can store more than one data item with a particular key. For example, you could modify the previous example to store auxiliary numeric values with the character key and data. In this example, each key and each data item consists of a character value and a numeric value.

   length d1 8;
   length d2 $20;   /* character; the length of 20 is an assumed value */
   length k1 $20;
   length k2 8;
   if _N_ = 1 then do;
      declare hash h(hashexp: 4);
      rc = h.defineKey('k1', 'k2');
      rc = h.defineData('d1', 'd2');
      rc = h.defineDone();
   end;

For more information about the "DEFINEDATA Method", the "DEFINEDONE Method", and the "DEFINEKEY Method", see SAS Language Reference: Dictionary.

Note  

The hash object does not assign values to key variables (for example, h.find(key: 'abc')), and the SAS compiler cannot detect the implicit key and data variable assignments done by the hash object and the hash iterator. Therefore, if no explicit assignment to a key or data variable appears in the program, SAS will issue a note stating that the variable is uninitialized. To avoid receiving these notes, you can perform one of the following actions:

  • Set the NONOTES system option.

  • Provide an initial assignment statement (typically to a missing value) for each key and data variable.

  • Use the CALL MISSING routine with all the key and data variables as parameters. Here is an example.

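   length d $20;   /* character; the length of 20 is an assumed value */
   length k $20;
   if _N_ = 1 then do;
      declare hash h(hashexp: 4);
      rc = h.defineKey('k');
      rc = h.defineData('d');
      rc = h.defineDone();
      /* Avoid uninitialized variable notes */
      call missing(k, d);
   end;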

Storing and Retrieving Data

After you initialize the hash object's key and data variables, you can store data in the hash object using the ADD method, or you can use the dataset argument tag to quickly load a data set into the hash object.

You can then use the FIND method to search and retrieve data from the hash object.

For more information about the "ADD Method" and the "FIND Method", see SAS Language Reference: Dictionary.

Note  

You can also use the hash iterator object to retrieve the hash object data, one data element at a time, in forward and reverse order. For more information, see "Using the Hash Iterator Object".

Example 1: Using the ADD and FIND Methods to Store and Retrieve Data

The following example uses the ADD method to store the data in the hash object and associate the data with the key. The FIND method is then used to retrieve the data that is associated with the key value 'Homer'.

   data _null_;
      length d $20;   /* character; the length of 20 is an assumed value */
      length k $20;
      /* Declare the hash object and key and data variables */
      if _N_ = 1 then do;
         declare hash h(hashexp: 4);
         rc = h.defineKey('k');
         rc = h.defineData('d');
         rc = h.defineDone();
      end;
      /* Define constant value for key and data */
      k = 'Homer';
      d = 'Odyssey';
      /* Use the ADD method to add the key and data to the hash object */
      rc = h.add();
      if (rc ne 0) then
         put 'Add failed.';
      /* Define constant value for key and data */
      k = 'Joyce';
      d = 'Ulysses';
      /* Use the ADD method to add the key and data to the hash object */
      rc = h.add();
      if (rc ne 0) then
         put 'Add failed.';
      k = 'Homer';
      /* Use the FIND method to retrieve the data associated with 'Homer' key */
      rc = h.find();
      if (rc = 0) then
         put d=;
      else
         put 'Key Homer not found.';
   run;

The FIND method assigns the data value 'Odyssey', which is associated with the key value 'Homer', to the variable D.
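As a variation on Example 1, the ADD and FIND methods also accept key and data values directly through the KEY: and DATA: argument tags, in the style of the h.find(key: 'abc') call mentioned earlier. The following is a minimal sketch under that assumption; the key 'Virgil' is a hypothetical value chosen only to show a failed lookup.

```sas
data _null_;
   length k $20 d $20;
   if _N_ = 1 then do;
      declare hash h(hashexp: 4);
      rc = h.defineKey('k');
      rc = h.defineData('d');
      rc = h.defineDone();
      /* Avoid uninitialized variable notes: K is never assigned below */
      call missing(k, d);
   end;
   /* Pass the key and data values directly instead of assigning K and D first */
   rc = h.add(key: 'Homer', data: 'Odyssey');
   if (rc ne 0) then
      put 'Add failed.';
   /* On success, FIND copies the stored data value into D */
   rc = h.find(key: 'Homer');
   if (rc = 0) then
      put d=;
   /* A key that was never added returns a nonzero code */
   rc = h.find(key: 'Virgil');
   if (rc ne 0) then
      put 'Key Virgil not found.';
run;
```

In this form the key variable K keeps its missing value throughout, which is why the CALL MISSING routine (or another initial assignment) is still useful for suppressing the uninitialized-variable notes.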

Example 2: Loading a Data Set and Using the FIND Method to Retrieve Data

Assume the data set SMALL contains two numeric variables K (key) and S (data) and another data set, LARGE, contains a corresponding key variable K. The following code loads the SMALL data set into the hash object, and then searches the hash object for key matches on the variable K from the LARGE data set.

   data match;
      length k 8;
      length s 8;
      if _N_ = 1 then do;
         /* load SMALL data set into the hash object */
         declare hash h(dataset: "work.small", hashexp: 6);
         /* define SMALL data set variable K as key and S as value */
         h.defineKey('k');
         h.defineData('s');
         h.defineDone();
         /* avoid uninitialized variable notes */
         call missing(k, s);
      end;
      /* use the SET statement to iterate over the LARGE data set using */
      /* keys in the LARGE data set to match keys in the hash object */
      set large;
      rc = h.find();
      if (rc = 0) then output;
   run;

The dataset argument tag specifies the SMALL data set whose keys and data will be read and loaded by the hash object during the DEFINEDONE method. The FIND method is then used to retrieve the data.

Replacing and Removing Data

You can remove or replace data in the hash object.

In the following example, the REPLACE method replaces the data 'Odyssey' with 'Iliad' and the REMOVE method deletes the entire data entry associated with the 'Joyce' key from the hash object.

   data _null_;
      length d $20;   /* character; the length of 20 is an assumed value */
      length k $20;
      /* Declare the hash object and key and data variables */
      if _N_ = 1 then do;
         declare hash h(hashexp: 4);
         rc = h.defineKey('k');
         rc = h.defineData('d');
         rc = h.defineDone();
      end;
      /* Define constant value for key and data */
      k = 'Joyce';
      d = 'Ulysses';
      /* Use the ADD method to add the key and data to the hash object */
      rc = h.add();
      if (rc ne 0) then
         put 'Add failed.';
      /* Define constant value for key and data */
      k = 'Homer';
      d = 'Odyssey';
      /* Use the ADD method to add the key and data to the hash object */
      rc = h.add();
      if (rc ne 0) then
         put 'Add failed.';
      /* Use the REPLACE method to replace 'Odyssey' with 'Iliad' */
      k = 'Homer';
      d = 'Iliad';
      rc = h.replace();
      if (rc = 0) then
         put d=;
      else
         put 'Replace not successful.';
      /* Use the REMOVE method to remove the 'Joyce' key and data */
      k = 'Joyce';
      rc = h.remove();
      if (rc = 0) then
         put k 'removed from hash object';
      else
         put 'Deletion not successful.';
   run;

Note

If an associated hash iterator is pointing to the key, the REMOVE method will not remove the key or data from the hash object. An error message is issued.

For more information on the "REMOVE Method" and the "REPLACE Method", see SAS Language Reference: Dictionary.

Saving Hash Object Data in a Data Set

You can create a data set that contains the data in a specified hash object by using the OUTPUT method. In the following example, two entries, each consisting of two keys and two data items, are added to the hash object and then written to the WORK.OUT data set.

   data test;
      length d1 8;
      length d2 $20;   /* character; the length of 20 is an assumed value */
      length k1 $20;
      length k2 8;
      /* Declare the hash object and two key and data variables */
      if _N_ = 1 then do;
         declare hash h(hashexp: 4);
         rc = h.defineKey('k1', 'k2');
         rc = h.defineData('d1', 'd2');
         rc = h.defineDone();
      end;
      /* Define constant value for key and data */
      k1 = 'Joyce';
      k2 = 1001;
      d1 = 3;
      d2 = 'Ulysses';
      rc = h.add();
      /* Define constant value for key and data */
      k1 = 'Homer';
      k2 = 1002;
      d1 = 5;
      d2 = 'Odyssey';
      rc = h.add();
      /* Use the OUTPUT method to save the hash object data to the OUT data set */
      rc = h.output(dataset: "work.out");
   run;

   proc print data=work.out;
   run;

The following output shows the report that PROC PRINT generates.

Output 24.1: Data Set Created from the Hash Object

                 The SAS System                 1

   Obs    d1    d2
     1     5    Odyssey
     2     3    Ulysses


Note that the hash object keys are not stored as part of the output data set. If you want to include the keys in the output data set, you must define the keys as data in the DEFINEDATA method. In the previous example, the DEFINEDATA method would be written this way:

 rc = h.defineData('k1', 'k2', 'd1', 'd2'); 
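For illustration, here is a sketch of the previous example with the keys also defined as data. The output data set name WORK.OUT_WITH_KEYS and the character length of 20 are assumptions made for this example.

```sas
data _null_;
   length k1 $20 k2 8 d1 8 d2 $20;
   if _N_ = 1 then do;
      declare hash h(hashexp: 4);
      rc = h.defineKey('k1', 'k2');
      /* Define the keys as data as well, so that OUTPUT writes them */
      rc = h.defineData('k1', 'k2', 'd1', 'd2');
      rc = h.defineDone();
   end;
   k1 = 'Joyce';
   k2 = 1001;
   d1 = 3;
   d2 = 'Ulysses';
   rc = h.add();
   k1 = 'Homer';
   k2 = 1002;
   d1 = 5;
   d2 = 'Odyssey';
   rc = h.add();
   /* Each output observation now carries k1, k2, d1, and d2 */
   rc = h.output(dataset: 'work.out_with_keys');
run;

proc print data=work.out_with_keys;
run;
```

The printed report should then show the K1 and K2 columns alongside D1 and D2 for both entries.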

For more information on the "OUTPUT Method", see SAS Language Reference: Dictionary.




SAS 9.1.3 Language Reference: Concepts, Third Edition, Volumes 1 and 2
ISBN: 1590478401
Year: 2004
Pages: 258
