Procedure features:
CREATE TABLE statement
column-modifier
INSERT statement
VALUES clause
SELECT clause
FROM clause
Table: PROCLIB.PAYLIST
This example creates the table PROCLIB.PAYLIST and inserts data into it.
Declare the PROCLIB library. The PROCLIB library is used in these examples to store created tables.
libname proclib ' SAS-data-library ';
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Create the PROCLIB.PAYLIST table. The CREATE TABLE statement creates PROCLIB.PAYLIST with six empty columns. Each column definition indicates whether the column is character or numeric. The number in parentheses specifies the width of the column. INFORMAT= and FORMAT= assign date informats and formats to the Birth and Hired columns .
proc sql; create table proclib.paylist (IdNum char(4), Gender char(1), Jobcode char(3), Salary num, Birth num informat=date7. format=date7., Hired num informat=date7. format=date7.);
Insert values into the PROCLIB.PAYLIST table. The INSERT statement inserts data values into PROCLIB.PAYLIST according to the position in the VALUES clause. Therefore, in the first VALUES clause, 1639 is inserted into the first column, F into the second column, and so forth. Dates in SAS are stored as integers with 0 equal to January 1, 1960. Suffixing the date with a d is one way to use the internal value for dates.
insert into proclib.paylist values('1639','F','TA1',42260,'26JUN70'd,'28JAN91'd) values('1065','M','ME3',38090,'26JAN54'd,'07JAN92'd) values('1400','M','ME1',29769.'05NOV67'd,'16OCT90'd)
Include missing values in the data. The value null represents a missing value for the character column Jobcode. The period represents a missing value for the numeric column Salary.
values('1561','M',null,36514,'30NOV63'd,'07OCT87'd) values('1221','F','FA3',.,'22SEP63'd,'04OCT94'd);
Specify the title.
title 'PROCLIB.PAYLIST Table';
Display the entire PROCLIB.PAYLIST table. The SELECT clause selects columns from PROCLIB.PAYLIST. The asterisk (*) selects all columns. The FROM clause specifies PROCLIB.PAYLIST as the table to select from.
select * from proclib.paylist;
PROCLIB.PAYLIST
PROCLIB.PAYLIST Table Id Num Gender Jobcode Salary Birth Hired ------------------------------------------------- 1639 F TA1 42260 26JUN70 28JAN91 1065 M ME3 38090 26JAN54 07JAN92 1400 M ME1 29769 05NOV67 16OCT90 1561 M 36514 30NOV63 07OCT87 1221 F FA3 . 22SEP63 04OCT94
Procedure features:
CREATE TABLE statement
AS query-expression
SELECT clause
column alias
FORMAT= column-modifier
object-item
Other features:
data set option
OBS=
Tables:
PROCLIB.PAYROLL, PROCLIB.BONUS
This example builds a column with an arithmetic expression and creates the PROCLIB.BONUS table from the query s result.
PROCLIB.PAYROLL (Partial Listing)
PROCLIB.PAYROLL First 10 Rows Only Id Number Gender Jobcode Salary Birth Hired --------------------------------------------------- 1919 M TA2 34376 12SEP60 04JUN87 1653 F ME2 35108 15OCT64 09AUG90 1400 M ME1 29769 05NOV67 16OCT90 1350 F FA3 32886 31AUG65 29JUL90 1401 M TA3 38822 13DEC50 17NOV85 1499 M ME3 43025 26APR54 07JUN80 1101 M SCP 18723 06JUN62 01OCT90 1333 M PT2 88606 30MAR61 10FEB81 1402 M TA2 32615 17JAN63 02DEC90 1479 F TA3 38785 22DEC68 05OCT89
Declare the PROCLIB library. The PROCLIB library is used in these examples to store created tables.
libname proclib ' SAS-data-library ';
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Create the PROCLIB.BONUS table. The CREATE TABLE statement creates the table PROCLIB.BONUS from the result of the subsequent query.
proc sql; create table proclib.bonus as
Select the columns to include. The SELECT clause specifies that three columns will be in the new table: IdNumber, Salary, and Bonus. FORMAT= assigns the DOLLAR8. format to Salary. The Bonus column is built with the SQL expression salary*.025 .
select IdNumber, Salary format=dollar8., salary*.025 as Bonus format=dollar8. from proclib.payroll;
Specify the title.
title 'BONUS Information';
Display the first 10 rows of the PROCLIB.BONUS table. The SELECT clause selects columns from PROCLIB.BONUS. The asterisk (*) selects all columns. The FROM clause specifies PROCLIB.BONUS as the table to select from. The OBS= data set option limits the printing of the output to 10 rows.
select * from proclib.bonus(obs=10);
PROCLIB.BONUS
BONUS Information Id Number Salary Bonus -------------------------- 1919 ,376 9 1653 ,108 8 1400 ,769 4 1350 ,886 2 1401 ,822 1 1499 ,025 ,076 1101 ,723 8 1333 ,606 ,215 1402 ,615 5 1479 ,785 0
Procedure features:
ALTER TABLE statement
DROP clause
MODIFY clause
UPDATE statement
SET clause
CASE expression
Table: EMPLOYEES
This example updates data values in the EMPLOYEES table and drops a column.
data Employees; input IdNum . +2 LName . FName . JobCode . +1 Salary 5. +1 Phone .; datalines; 1876 CHIN JACK TA1 42400 212/588-5634 1114 GREENWALD JANICE ME3 38000 212/588-1092 1556 PENNINGTON MICHAEL ME1 29860 718/383-5681 1354 PARKER MARY FA3 65800 914/455-2337 1130 WOOD DEBORAH PT2 36514 212/587-0013 ;
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Display the entire EMPLOYEES table. The SELECT clause displays the table before the updates. The asterisk (*) selects all columns for display. The FROM clause specifies EMPLOYEES as the table to select from.
proc sql; title 'Employees Table'; select * from Employees;
Update the values in the Salary column. The UPDATE statement updates the values in EMPLOYEES. The SET clause specifies that the data in the Salary column be multiplied by 1.04 when the job code ends with a 1 and 1.025 for all other job codes. (The two underscores represent any character.) The CASE expression returns a value for each row that completes the SET clause.
update employees set salary=salary* case when jobcode like '__1' then 1.04 else 1.025 end;
Modify the format of the Salary column and delete the Phone column. The ALTER TABLE statement specifies EMPLOYEES as the table to alter. The MODIFY clause permanently modifies the format of the Salary column. The DROP clause permanently drops the Phone column.
alter table employees modify salary num format=dollar8. drop phone;
Specify the title.
title 'Updated Employees Table';
Display the entire updated EMPLOYEES table. The SELECT clause displays the EMPLOYEES table after the updates. The asterisk (*) selects all columns.
select * from employees;
Employees Table 1 Id Job Num LName FName Code Salary Phone ------------------------------------------------------------ 1876 CHIN JACK TA1 42400 212/588-5634 1114 GREENWALD JANICE ME3 38000 212/588-1092 1556 PENNINGTON MICHAEL ME1 29860 718/383-5681 1354 PARKER MARY FA3 65800 914/455-2337 1130 WOOD DEBORAH PT2 36514 212/587-0013
Updated Employees Table 2 Id Job Num LName FName Code Salary ---------------------------------------------- 1876 CHIN Jack TA1 ,096 1114 GREENWALD JANICE ME3 ,950 1556 PENNINGTON MICHAEL ME1 ,054 1354 PARKER MARY FA3 ,445 1130 WOOD DEBORAH PT2 ,427
Procedure features:
FROM clause
table alias
inner join
joined-table component
PROC SQL statement option
NUMBER
WHERE clause
IN condition
Tables: PROCLIB.STAFF, PROCLIB.PAYROLL
This example joins two tables in order to get more information about data that are common to both tables.
PROCLIB.STAFF (Partial Listing)
PROCLIB.STAFF First 10 Rows Only Id Num Lname Fname City State Hphone ---------------------------------------------------------------------------- 1919 ADAMS GERALD STAMFORD CT 203/781-1255 1653 ALIBRANDI MARIA BRIDGEPORT CT 203/675-7715 1400 ALHERTANI ABDULLAH NEW YORK NY 212/586-0808 1350 ALVAREZ MERCEDES NEW YORK NY 718/383-1549 1401 ALVAREZ CARLOS PATERSON NJ 201/732-8787 1499 BAREFOOT JOSEPH PRINCETON NJ 201/812-5665 1101 BAUCOM WALTER NEW YORK NY 212/586-8060 1333 BANADYGA JUSTIN STAMFORD CT 203/781-1777 1402 BLALOCK RALPH NEW YORK NY 718/384-2849 1479 BALLETTI MARIE NEW YORK NY 718/384-8816
PROCLIB.PAYROLL (Partial Listing)
PROCLIB.PAYROLL First 10 Rows Only Id Number Gender Jobcode Salary Birth Hired --------------------------------------------------- 1919 M TA2 34376 12SEP60 04JUN87 1653 F ME2 35108 15OCT64 09AUG90 1400 M ME1 29769 05NOV67 16OCT90 1350 F FA3 32886 31AUG65 29JUL90 1401 M TA3 38822 13DEC50 17NOV85 1499 M ME3 43025 26APR54 07JUN80 1101 M SCP 18723 06JUN62 01OCT90 1333 M PT2 88606 30MAR61 10FEB81 1402 M TA2 32615 17JAN63 02DEC90 1479 F TA3 38785 22DEC68 05OCT89
Declare the PROCLIB library. The PROCLIB library is used in these examples to store created tables.
libname proclib ' SAS-data-library ';
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=120 pagesize=40;
Add row numbers to PROC SQL output. NUMBER adds a column that contains the row number.
proc sql number;
Specify the title.
title 'Information for Certain Employees Only';
Select the columns to display. The SELECT clause selects the columns to show in the output.
select Lname, Fname, City, State, IdNumber, Salary, Jobcode
Specify the tables from which to obtain the data. The FROM clause lists the tables to select from.
from proclib.staff, proclib.payroll
Specify the join criterion and subset the query. The WHERE clause specifies that the tables are joined on the ID number from each table. WHERE also further subsets the query with the IN condition, which returns rows for only four employees.
where idnumber=idnum and idnum in ('1919', '1400', '1350', '1333');
Information for Certain Employees Only Id Row Lname Fname City State Number Salary Jobcode ------------------------------------------------------------------------ 1 ADAMS GERALD STAMFORD CT 1919 34376 TA2 2 ALHERTANI ABDULLAH NEW YORK NY 1400 29769 ME1 3 ALVAREZ MERCEDES NEW YORK NY 1350 32886 FA3 4 BANADYGA JUSTIN STAMFORD CT 1333 88606 PT2
Procedure features:
DELETE statement
IS condition
RESET statement option
DOUBLE
UNION set operator
Tables: PROCLIB.NEWPAY, PROCLIB.PAYLIST, PROCLIB.PAYLIST2
This example creates a new table, PROCLIB.NEWPAY, by concatenating two other tables: PROCLIB.PAYLIST and PROCLIB.PAYLIST2.
PROCLIB.PAYLIST
Information for Certain Employees Only Id Num Gender Jobcode Salary Birth Hired ------------------------------------------------- 1639 F TA1 42260 26JUN70 28JAN91 1065 M ME3 38090 26JAN54 07JAN92 1400 M ME1 29769 05NOV67 16OCT90 1561 M 36514 30NOV63 07OCT87 1221 F FA3 . 22SEP63 04OCT94
PROCLIB.PAYLIST2
PROCLIB.PAYLIST2 Table Id Num Gender Jobcode Salary Birth Hired ------------------------------------------------- 1919 M TA2 34376 12SEP66 04JUN87 1653 F ME2 31896 15OCT64 09AUG92 1350 F FA3 36886 31AUG55 29JUL91 1401 M TA3 38822 13DEC55 17NOV93 1499 M ME1 23025 26APR74 07JUN92
Declare the PROCLIB library. The PROCLIB library is used in these examples to store created tables.
libname proclib ' SAS-data-library ';
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the PROCLIB.NEWPAY table. The SELECT clauses select all the columns from the tables that are listed in the FROM clauses. The UNION set operator concatenates the query results that are produced by the two SELECT clauses.
proc sql; create table proclib.newpay as select * from proclib.paylist union select * from proclib.paylist2;
Delete rows with missing Jobcode or Salary values. The DELETE statement deletes rows from PROCLIB.NEWPAY that satisfy the WHERE expression. The IS condition specifies rows that contain missing values in the Jobcode or Salary column.
delete from proclib.newpay where jobcode is missing or salary is missing;
Reset the PROC SQL environment and double-space the output. RESET changes the procedure environment without stopping and restarting PROC SQL. The DOUBLE option double-spaces the output. (The DOUBLE option has no effect on ODS output.)
reset double;
Specify the title.
title 'Personnel Data';
Display the entire PROCLIB.NEWPAY table. The SELECT clause selects all columns from the newly created table, PROCLIB.NEWPAY.
select * from proclib.newpay;
Personnel Data Id Num Gender Jobcode Salary Birth Hired ------------------------------------------------- 1065 M ME3 38090 26JAN54 07JAN92 1350 F FA3 36886 31AUG55 29JUL91 1400 M ME1 29769 05NOV67 16OCT90 1401 M TA3 38822 13DEC55 17NOV93 1499 M ME1 23025 26APR74 07JUN92 1639 F TA1 42260 26JUN70 28JAN91 1653 F ME2 31896 15OCT64 09AUG92 1919 M TA2 34376 12SEP66 04JUN87
Procedure features:
DESCRIBE TABLE statement
DICTIONARY. table- name component
Table: DICTIONARY. MEMBERS
This example uses DICTIONARY tables to show a list of the SAS files in a SAS data library. If you do not know the names of the columns in the DICTIONARY table that you are querying, then use a DESCRIBE TABLE statement with the table.
Declare the PROCLIB library. The PROCLIB library is used in these examples to store created tables.
libname proclib ' SAS-data-library ';
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page. SOURCE writes the programming statements to the SAS log.
options nodate pageno=1 source linesize=80 pagesize=60;
List the column names from the DICTIONARY.MEMBERS table. DESCRIBE TABLE writes the column names from DICTIONARY.MEMBERS to the SAS log.
proc sql; describe table dictionary.members;
Specify the title.
title 'SAS Files in the PROCLIB Library';
Display a list of files in the PROCLIB library. The SELECT clause selects the MEMNAME and MEMTYPE columns. The FROM clause specifies DICTIONARY.MEMBERS as the table to select from. The WHERE clause subsets the output to include only those rows that have a libref of PROCLIB in the LIBNAME column.
select memname, memtype from dictionary.members where libname='PROCLIB';
277 options nodate pageno=1 source linesize=80 pagesize=60; 278 279 proc sql; 280 describe table dictionary.members; NOTE: SQL table DICTIONARY.MEMBERS was created like: create table DICTIONARY.MEMBERS ( libname char(8) label='Library Name', memname char(32) label='Member Name', memtype char(8) label='Member Type', engine char(8) label='Engine Name', index char(32) label='Indexes', path char(1024) label='Path Name' ); 281 title 'SAS Files in the PROCLIB Library'; 282 283 select memname, memtype 284 from dictionary.members 285 where libname='PROCLIB';
SAS Files in the PROCLIB Library Member Member Name Type ------------------------------------------ ALL DATA BONUS DATA BONUS95 DATA DELAY DATA HOUSES DATA INTERNAT DATA MARCH DATA NEWPAY DATA PAYLIST DATA PAYLIST2 DATA PAYROLL DATA PAYROLL2 DATA SCHEDULE DATA SCHEDULE2 DATA STAFF DATA STAFF2 DATA SUPERV DATA SUPERV2 DATA
Procedure features:
joined-table component
left outer join
SELECT clause
COALESCE function
WHERE clause
CONTAINS condition
Tables: PROCLIB.PAYROLL, PROCLIB.PAYROLL2
This example illustrates a left outer join of the PROCLIB.PAYROLL and PROCLIB.PAYROLL2 tables.
PROCLIB.PAYROLL (Partial Listing)
PROCLIB.PAYROLL First 10 Rows Only Id Number Gender Jobcode Salary Birth Hired --------------------------------------------------- 1009 M TA1 28880 02MAR59 26MAR92 1017 M TA3 40858 28DEC57 16OCT81 1036 F TA3 39392 19MAY65 23OCT84 1037 F TA1 28558 10APR64 13SEP92 1038 F TA1 26533 09NOV69 23NOV91 1050 M ME2 35167 14JUL63 24AUG86 1065 M ME2 35090 26JAN44 07JAN87 1076 M PT1 66558 14OCT55 03OCT91 1094 M FA1 22268 02APR70 17APR91 1100 M BCK 25004 01DEC60 07MAY88
PROCLIB.PAYROLL2
PROCLIB.PAYROLL2 Id Num Sex Jobcode Salary Birth Hired ---------------------------------------------- 1036 F TA3 42465 19MAY65 23OCT84 1065 M ME3 38090 26JAN44 07JAN87 1076 M PT1 69742 14OCT55 03OCT91 1106 M PT3 94039 06NOV57 16AUG84 1129 F ME3 36758 08DEC61 17AUG91 1221 F FA3 29896 22SEP67 04OCT91 1350 F FA3 36098 31AUG65 29JUL90 1369 M TA3 36598 28DEC61 13MAR87 1447 F FA1 22123 07AUG72 29OCT92 1561 M TA3 36514 30NOV63 07OCT87 1639 F TA3 42260 26JUN57 28JAN84 1998 M SCP 23100 10SEP70 02NOV92
Declare the PROCLIB library. The PROCLIB library is used in these examples to store created tables.
libname proclib ' SAS-data-library ';
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Limit the number of output rows. OUTOBS= limits the output to 10 rows.
proc sql outobs=10;
Specify the title for the first query.
title 'Most Current Jobcode and Salary Information';
Select the columns. The SELECT clause lists the columns to select. Some column names are prefixed with a table alias because they are in both tables. LABEL= and FORMAT= are column modifiers.
select p.IdNumber, p.Jobcode, p.Salary, p2.jobcode label='New Jobcode', p2.salary label='New Salary' format=dollar8.
Specify the type of join. The FROM clause lists the tables to join and assigns table aliases. The keywords LEFT JOIN specify the type of join. The order of the tables in the FROM clause is important. PROCLIB.PAYROLL is listed first and is considered the left table. PROCLIB.PAYROLL2 is the right table.
from proclib.payroll as p left join proclib.payroll2 as p2
Specify the join criterion. The ON clause specifies that the join be performed based on the values of the ID numbers from each table.
on p.IdNumber=p2.idnum;
As the output shows, all rows from the left table, PROCLIB.PAYROLL, are returned. PROC SQL assigns missing values for rows in the left table, PAYROLL, that have no matching values for IdNum in PAYROLL2.
Most Current Jobcode and Salary Information Id New New Number Jobcode Salary Jobcode Salary -------------------------------------------- 1009 TA1 28880 . 1017 TA3 40858 . 1036 TA3 39392 TA3 ,465 1037 TA1 28558 . 1038 TA1 26533 . 1050 ME2 35167 . 1065 ME2 35090 ME3 ,090 1076 PT1 66558 PT1 ,742 1094 FA1 22268 . 1100 BCK 25004 .
Specify the title for the second query.
title 'Most Current Jobcode and Salary Information';
Select the columns and coalesce the Jobcode columns. The SELECT clause lists the columns to select. COALESCE overlays the like-named columns. For each row, COALESCE returns the first nonmissing value of either P2.JOBCODE or P.JOBCODE. Because P2.JOBCODE is the first argument, if there is a nonmissing value for P2.JOBCODE, COALESCE returns that value. Thus, the output contains the most recent job code information for every employee. LABEL= assigns a column label.
select p.idnumber, coalesce(p2.jobcode,p.jobcode) label='Current Jobcode',
Coalesce the Salary columns. For each row, COALESCE returns the first nonmissing value of either P2.SALARY or P.SALARY. Because P2.SALARY is the first argument, if there is a nonmissing value for P2.SALARY, then COALESCE returns that value. Thus, the output contains the most recent salary information for every employee.
coalesce(p2.salary,p.salary) label='Current Salary' format=dollar8.
Specify the type of join and the join criterion. The FROM clause lists the tables to join and assigns table aliases. The keywords LEFT JOIN specify the type of join. The ON clause specifies that the join is based on the ID numbers from each table.
from proclib.payroll p left join proclib.payroll2 p2 on p.IdNumber=p2.idnum;
Most Current Jobcode and Salary Information Id Current Current Number Jobcode Salary ------------------------- 1009 TA1 ,880 1017 TA3 ,858 1036 TA3 ,465 1037 TA1 ,558 1038 TA1 ,533 1050 ME2 ,167 1065 ME3 ,090 1076 PT1 ,742 1094 FA1 ,268 1100 BCK ,004
Subset the query. The WHERE clause subsets the left join to include only those rows containing the value TA .
title 'Most Current Information for Ticket Agents'; select p.IdNumber, coalesce(p2.jobcode,p.jobcode) label='Current Jobcode', coalesce(p2.salary,p.salary) label='Current Salary' from proclib.payroll p left join proclib.payroll2 p2 on p.IdNumber=p2.idnum where p2.jobcode contains 'TA';
Most Current Information for Ticket Agents Id Current Current Number Jobcode Salary ------------------------- 1036 TA3 42465 1369 TA3 36598 1561 TA3 36514 1639 TA3 42260
Procedure features:
CREATE VIEW statement
GROUP BY clause
SELECT clause
COUNT function
HAVING clause
Other features:
AVG summary function
data set option
PW=
Tables: PROCLIB.PAYROLL, PROCLIB.JOBS
This example creates the PROC SQL view PROCLIB.JOBS from the result of a query-expression.
PROCLIB.PAYROLL (Partial Listing)
PROCLIB.PAYROLL First 10 Rows Only Id Number Gender Jobcode Salary Birth Hired --------------------------------------------------- 1009 M TA1 28880 02MAR59 26MAR92 1017 M TA3 40858 28DEC57 16OCT81 1036 F TA3 39392 19MAY65 23OCT84 1037 F TA1 28558 10APR64 13SEP92 1038 F TA1 26533 09NOV69 23NOV91 1050 M ME2 35167 14JUL63 24AUG86 1065 M ME2 35090 26JAN44 07JAN87 1076 M PT1 66558 14OCT55 03OCT91 1094 M FA1 22268 02APR70 17APR91 1100 M BCK 25004 01DEC60 07MAY88
Declare the PROCLIB library. The PROCLIB library is used in these examples to store created tables.
libname proclib ' SAS-data-library ';
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the PROCLIB.JOBS view. CREATE VIEW creates the PROC SQL view PROCLIB.JOBS. The PW= data set option assigns password protection to the data that is generated by this view.
proc sql; create view proclib.jobs(pw=red) as
Select the columns. The SELECT clause specifies four columns for the view: Jobcode and three columns, Number, AVGAGE, and AVGSAL, whose values are the products functions. COUNT returns the number of nonmissing values for each job code because the data is grouped by Jobcode. LABEL= assigns a label to the column.
select Jobcode, count(jobcode) as number label='Number',
Calculate the Avgage and Avgsal columns. The AVG summary function calculates the average age and average salary for each job code.
avg(int((today()-birth)/365.25)) as avgage format=2. label='Average Age', avg(salary) as avgsal format=dollar8. label='Average Salary'
Specify the table from which the data is obtained. The FROM clause specifies PAYROLL as the table to select from. PROC SQL assumes the libref of PAYROLL to be PROCLIB because PROCLIB is used in the CREATE VIEW statement.
from payroll
Organize the data into groups and specify the groups to include in the output. The GROUP BY clause groups the data by the values of Jobcode. Thus, any summary statistics are calculated for each grouping of rows by value of Jobcode. The HAVING clause subsets the grouped data and returns rows for job codes that contain an average age of greater than or equal to 30.
group by jobcode having avgage ge 30;
Specify the titles.
title 'Current Summary Information for Each Job Category'; title2 'Average Age Greater Than or Equal to 30';
Display the entire PROCLIB.JOBS view. The SELECT statement selects all columns from PROCLIB.JOBS. PW=RED is necessary because the view is password protected.
select * from proclib.jobs(pw=red);
Current Summary Information for Each Job Category Average Age Greater Than Or Equal to 30 Average Average Jobcode Number Age Salary ------------------------------------ BCK 9 36 ,794 FA1 11 33 ,039 FA2 16 37 ,987 FA3 7 39 ,934 ME1 8 34 ,500 ME2 14 39 ,577 ME3 7 42 ,411 NA1 5 30 ,032 NA2 3 42 ,383 PT1 8 38 ,908 PT2 10 43 ,925 PT3 2 54 ,505 SCP 7 37 ,309 TA1 9 36 ,721 TA2 20 36 ,575 TA3 12 40 ,680
Procedure features:
FROM clause
joined-table component
WHERE clause
Tables: PROCLIB.STAFF2, PROCLIB.SCHEDULE2, PROCLIB.SUPERV2
This example joins three tables and produces a report that contains columns from each table.
PROCLIB.STAFF2
PROCLIB.STAFF2 Id Num Lname Fname City State Hphone ---------------------------------------------------------------------------- 1106 MARSHBURN JASPER STAMFORD CT 203/781-1457 1430 DABROWSKI SANDRA BRIDGEPORT CT 203/675-1647 1118 DENNIS ROGER NEW YORK NY 718/383-1122 1126 KIMANI ANNE NEW YORK NY 212/586-1229 1402 BLALOCK RALPH NEW YORK NY 718/384-2849 1882 TUCKER ALAN NEW YORK NY 718/384-0216 1479 BALLETTI MARIE NEW YORK NY 718/384-8816 1420 ROUSE JEREMY PATERSON NJ 201/732-9834 1403 BOWDEN EARL BRIDGEPORT CT 203/675-3434 1616 FUENTAS CARLA NEW YORK NY 718/384-3329
PROCLIB.SCHEDULE2
PROCLIB.SCHEDULE2 Id Flight Date Dest Num --------------------------- 132 01MAR94 BOS 1118 132 01MAR94 BOS 1402 219 02MAR94 PAR 1616 219 02MAR94 PAR 1478 622 03MAR94 LON 1430 622 03MAR94 LON 1882 271 04MAR94 NYC 1430 271 04MAR94 NYC 1118 579 05MAR94 RDU 1126 579 05MAR94 RDU 1106
PROCLIB.SUPERV2
PROCLIB.SUPERV2 Supervisor Job Id State Category --------------------------- 1417 NJ NA 1352 NY NA 1106 CT PT 1442 NJ PT 1118 NY PT 1405 NJ SC 1564 NY SC 1639 CT TA 1126 NY TA 1882 NY ME
Declare the PROCLIB library. The PROCLIB library is used in these examples to store created tables.
libname proclib ' SAS-data-library ';
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Select the columns. The SELECT clause specifies the columns to select. IdNum is prefixed with a table alias because it appears in two tables.
proc sql; title 'All Flights for Each Supervisor'; select s.IdNum, Lname, City 'Hometown', Jobcat, Flight, Date
Specify the tables to include in the join. The FROM clause lists the three tables for the join and assigns an alias to each table.
from proclib.schedule2 s, proclib.staff2 t, proclib.superv2 v
Specify the join criteria. The WHERE clause specifies the columns that join the tables. The STAFF2 and SCHEDULE2 tables have an IdNum column, which has related values in both tables. The STAFF2 and SUPERV2 tables have the IdNum and SUPID columns, which have related values in both tables.
where s.idnum=t.idnum and t.idnum=v.supid;
All Flights for Each Supervisor Id Job Num Lname Hometown Category Flight Date ----------------------------------------------------------------- 1106 MARSHBURN STAMFORD PT 579 05MAR94 1118 DENNIS NEW YORK PT 132 01MAR94 1118 DENNIS NEW YORK PT 271 04MAR94 1126 KIMANI NEW YORK TA 579 05MAR94 1882 TUCKER NEW YORK ME 622 03MAR94
Procedure features:
FROM clause
in-line view
Tables: PROCLIB.STAFF2, PROCLIB.SCHEDULE2, PROCLIB.SUPERV2
This example shows an alternative way to construct the query that is explained in Example 9 on page 1159 by joining one of the tables with the results of an in-line view. The example also shows how to rename columns with an in-line view.
Declare the PROCLIB library. The PROCLIB library is used in these examples to store created tables.
libname proclib ' SAS-data-library ';
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Select the columns. The SELECT clause selects all columns that are returned by the in-line view (which will have the alias Three assigned to it), plus one column from the third table (which will have the alias V assigned to it).
proc sql; title 'All Flights for Each Supervisor'; select three.*, v.jobcat
Specify the in-line query. Instead of including the name of a table or view, the FROM clause includes a query that joins two of the three tables. In the in-line query, the SELECT clause lists the columns to select. IdNum is prefixed with a table alias because it appears in both tables. The FROM clause lists the two tables for the join and assigns an alias to each table. The WHERE clause specifies the columns that join the tables. The STAFF2 and SCHEDULE2 tables have an IdNum column, which has related values in both tables.
from (select lname, s.idnum, city, flight, date from proclib.schedule2 s, proclib.staff2 t where s.idnum=t.idnum)
Specify an alias for the query and names for the columns. The alias Three refers to the results of the in-line view. The names in parentheses become the names for the columns in the view.
as three (Surname, Emp_ID, Hometown, FlightNumber, FlightDate),
Join the results of the in-line view with the third table. The WHERE clause specifies the columns that join the table with the in-line view. Note that the WHERE clause specifies the renamed Emp_ID column from the in-line view.
proclib.superv2 v where three.Emp_ID=v.supid;
All Flights for Each Supervisor 1 Job Surname Emp_ID Hometown FlightNumber FlightDate Category ---------------------------------------------------------------------------- MARSHBURN 1106 STAMFORD 579 05MAR94 PT DENNIS 1118 NEW YORK 132 01MAR94 PT DENNIS 1118 NEW YORK 271 04MAR94 PT KIMANI 1126 NEW YORK 579 05MAR94 TA TUCKER 1882 NEW YORK 622 03MAR94 ME
Procedure features:
ORDER BY clause
SOUNDS-LIKE operator
Table: PROCLIB.STAFF
This example returns rows based on the functionality of the SOUNDS-LIKE operator in a WHERE clause.
Note: The SOUNDS-LIKE operator is based on the SOUNDEX algorithm for identifying words that sound alike. The SOUNDEX algorithm is English- biased and is less useful for languages other than English. For more information on the SOUNDEX algorithm, see SAS Language Reference: Dictionary .
PROCLIB.STAFF
PROCLIB.STAFF First 10 Rows Only Id Num Lname Fname City State Hphone ---------------------------------------------------------------------------- 1919 ADAMS GERALD STAMFORD CT 203/781-1255 1653 ALIBRANDI MARIA BRIDGEPORT CT 203/675-7715 1400 ALHERTANI ABDULLAH NEW YORK NY 212/586-0808 1350 ALVAREZ MERCEDES NEW YORK NY 718/383-1549 1401 ALVAREZ CARLOS PATERSON NJ 201/732-8787 1499 BAREFOOT JOSEPH PRINCETON NJ 201/812-5665 1101 BAUCOM WALTER NEW YORK NY 212/586-8060 1333 BANADYGA JUSTIN STAMFORD CT 203/781-1777 1402 BLALOCK RALPH NEW YORK NY 718/384-2849 1479 BALLETTI MARIE NEW YORK NY 718/384-8816
Declare the PROCLIB library. The PROCLIB library is used in these examples to store created tables.
libname proclib ' SAS-data-library '; options nodate pageno=1 linesize=80 pagesize=60;
Select the columns and the table from which the data is obtained. The SELECT clause selects all columns from the table in the FROM clause, PROCLIB.STAFF.
proc sql; title "Employees Whose Last Name Sounds Like 'Johnson'"; select idnum, upcase(lname), fname from proclib.staff
Subset the query and sort the output. The WHERE clause uses the SOUNDS-LIKE operator to subset the table by those employees whose last name sounds like Johnson . The ORDER BY clause orders the output by the second column.
where lname=*"Johnson" order by 2;
Employees Whose Last Name Sounds Like 'Johnson' 1 Id Num Fname -------------------------------------- 1411 JOHNSEN JACK 1113 JOHNSON LESLIE 1369 JONSON ANTHONY
SOUNDS-LIKE is useful, but there might be instances where it does not return every row that seems to satisfy the condition. PROCLIB.STAFF has an employee with the last name SANDERS and an employee with the last name SANYERS . The algorithm does not find SANYERS , but it does find SANDERS and SANDERSON .
title "Employees Whose Last Name Sounds Like 'Sanders'"; select * from proclib.staff where lname=*"Sanders" order by 2;
Employees Whose Last Name Sounds Like 'Sanders' 2 Id Num Lname Fname City State Hphone ---------------------------------------------------------------------------- 1561 SANDERS RAYMOND NEW YORK NY 212/588-6615 1414 SANDERSON NATHAN BRIDGEPORT CT 203/675-1715 1434 SANDERSON EDITH STAMFORD CT 203/781-1333
Procedure features:
GROUP BY clause
HAVING clause
SELECT clause
ABS function
FORMAT= column-modifier
LABEL= column-modifier
MIN summary function
** operator, exponentiation
SQRT function
Tables: STORES, HOUSES
This example joins two tables in order to compare and analyze values that are unique to each table yet have a relationship with a column that is common to both tables.
options ls=80 ps=60 nodate pageno=1 ; data stores; input Store $ x y; datalines; store1 5 1 store2 5 3 store3 3 5 store4 7 5 ; data houses; input House $ x y; datalines; house1 1 1 house2 3 3 house3 2 3 house4 7 7 ;
STORES and HOUSES
The tables contain X and Y coordinates that represent the location of the stores and houses.
STORES Table 1 Coordinates of Stores Store x y ---------------------------- store1 6 1 store2 5 2 store3 3 5 store4 7 5
HOUSES Table 2 Coordinates of Houses House x y ---------------------------- house1 1 1 house2 3 3 house3 2 3 house4 7 7
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the query. The SELECT clause specifies three columns: HOUSE, STORE, and DIST. The arithmetic expression uses the square root function (SQRT) to create the values of DIST, which contain the distance from HOUSE to STORE for each row. The double asterisk (**) represents exponentiation. LABEL= assigns a label to STORE and to DIST.
proc sql; title 'Each House and the Closest Store'; select house, store label='Closest Store', sqrt((abs(s.x-h.x)**2)+(abs(h.y-s.y)**2)) as dist label='Distance' format=4.2 from stores s, houses h
Organize the data into groups and subset the query. The minimum distance from each house to all the stores is calculated because the data are grouped by house. The HAVING clause specifies that each row be evaluated to determine if its value of DIST is the same as the minimum distance from that house to any store.
group by house having dist=min(dist);
Note that two stores are tied for shortest distance from house2.
Each House and the Closest Store 1 Closest House Store Distance ---------------------------- house1 store1 4.00 house2 store2 2.00 house2 store3 2.00 house3 store3 2.24 house4 store4 2.00
Procedure features:
CASE expression
joined-table component
Cross join
SELECT clause
DISTINCT keyword
Tables: PROCLIB.MARCH, FLIGHTS
This example joins a table with itself to get all the possible combinations of the values in a column.
PROCLIB.MARCH (Partial Listing)
PROCLIB.MARCH 1 First 10 Rows Only Flight Date Depart Orig Dest Miles Boarded Capacity ----------------------------------------------------------------- 114 01MAR94 7:10 LGA LAX 2475 172 210 202 01MAR94 10:43 LGA ORD 740 151 210 219 01MAR94 9:31 LGA LON 3442 198 250 622 01MAR94 12:19 LGA FRA 3857 207 250 132 01MAR94 15:35 LGA YYZ 366 115 178 271 01MAR94 13:17 LGA PAR 3635 138 250 302 01MAR94 20:22 LGA WAS 229 105 180 114 02MAR94 7:10 LGA LAX 2475 119 210 202 02MAR94 10:43 LGA ORD 740 120 210 219 02MAR94 9:31 LGA LON 3442 147 250
Declare the PROCLIB library. The PROCLIB library is used in these examples to store created tables.
libname proclib ' SAS-data-library ';
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the FLIGHTS table. The CREATE TABLE statement creates the table FLIGHTS from the output of the query. The SELECT clause selects the unique values of Dest. DISTINCT specifies that only one row for each value of city be returned by the query and stored in the table FLIGHTS. The FROM clause specifies PROCLIB.MARCH as the table to select from.
proc sql; create table flights as select distinct dest from proclib.march;
Specify the title.
title 'Cities Serviced by the Airline';
Display the entire FLIGHTS table.
select * from flights;
FLIGHTS Table
Cities Serviced by the Airline 1 Dest ---- FRA LAX LON ORD PAR WAS YYZ
Specify the title.
title 'All Possible Connections';
Select the columns. The SELECT clause specifies three columns for the output. The prefixes on DEST are table aliases to specify which table to take the values of Dest from. The CASE expression creates a column that contains the character string to and from .
select f1.Dest, case when f1.dest ne ' ' then 'to and from' end, f2.Dest
Specify the type of join. The FROM clause joins FLIGHTS with itself and creates a table that contains every possible combination of rows (a Cartesian product). The table contains two rows for each possible route, for example, PAR < - > WAS and WAS < - > PAR .
from flights as f1, flights as f2
Specify the join criterion. The WHERE clause subsets the internal table by choosing only those rows where the name in F1.Dest sorts before the name in F2.Dest. Thus, there is only one row for each possible route.
where f1.dest < f2.dest
Sort the output. ORDER BY sorts the result by the values of F1.Dest.
order by f1.dest;
All Possible Connections 2 Dest Dest ----------------------- FRA to and from LAX FRA to and from LON FRA to and from WAS FRA to and from ORD FRA to and from PAR FRA to and from YYZ LAX to and from LON LAX to and from PAR LAX to and from WAS LAX to and from ORD LAX to and from YYZ LON to and from ORD LON to and from WAS LON to and from PAR LON to and from YYZ ORD to and from WAS ORD to and from PAR ORD to and from YYZ PAR to and from WAS PAR to and from YYZ WAS to and from YYZ
Specify a cross join. Because a cross join is functionally the same as a Cartesian product join, the cross join syntax can be substituted for the conventional join syntax.
proc sql; title 'All Possible Connections'; select f1.Dest, case when f1.dest ne ' ' then 'to and from' end, f2.Dest from flights as f1 cross join flights as f2 where f1.dest < f2.dest order by f1.dest;
All Possible Connections 1 Dest Dest ----------------------- FRA to and from LAX FRA to and from LON FRA to and from WAS FRA to and from ORD FRA to and from PAR FRA to and from YYZ LAX to and from LON LAX to and from PAR LAX to and from WAS LAX to and from ORD LAX to and from YYZ LON to and from ORD LON to and from WAS LON to and from PAR LON to and from YYZ ORD to and from WAS ORD to and from PAR ORD to and from YYZ PAR to and from WAS PAR to and from YYZ WAS to and from YYZ
Procedure features:
joined-table component
Tables: MATCH_11 on page 1402, MATCH
This example uses a table that contains data for a case-control study. Each row contains information for a case or a control. To perform statistical analysis, you need a table with one row for each case-control pair. PROC SQL joins the table with itself in order to match the cases with their appropriate controls. After the rows are matched, differencing can be performed on the appropriate columns.
The input table MATCH_11 contains one row for each case and one row for each control. Pair contains a number that associates the case with its control. Low is 0 for the controls and 1 for the cases. The remaining columns contain information about the cases and controls.
MATCH_11 Table 1 First 10 Rows Only Pair Low Age Lwt Race Smoke Ptd Ht UI race1 race2 ----------------------------------------------------------------------------------------------------------- 1 0 14 135 1 0 0 0 0 0 0 1 1 14 101 3 1 1 0 0 0 1 2 0 15 98 2 0 0 0 0 1 0 2 1 15 115 3 0 0 0 1 0 1 3 0 16 95 3 0 0 0 0 0 1 3 1 16 130 3 0 0 0 0 0 1 4 0 17 103 3 0 0 0 0 0 1 4 1 17 130 3 1 1 0 1 0 1 5 0 17 122 1 1 0 0 0 0 0 5 1 17 110 1 1 0 0 0 0 0
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the MATCH table. The SELECT clause specifies the columns for the table MATCH. SQL expressions in the SELECT clause calculate the differences for the appropriate columns and create new columns.
proc sql; create table match as select one.Low, one.Pair, (one.lwt - two.lwt) as Lwt_d, (one.smoke - two.smoke) as Smoke_d, (one.ptd - two.ptd) as Ptd_d, (one.ht - two.ht) as Ht_d, (one.ui - two.ui) as UI_d
Specify the type of join and the join criterion. The FROM clause lists the table MATCH_11 twice. Thus, the table is joined with itself. The WHERE clause returns only the rows for each pair that show the difference when the values for control are subtracted from the values for case.
from match_11 one, match_11 two where (one.pair=two.pair and one.low>two.low);
Specify the title.
title 'Differences for Cases and Controls';
Display the first five rows of the MATCH table. The SELECT clause selects all the columns from MATCH. The OBS= data set option limits the printing of the output to five rows.
select * from match(obs=5);
MATCH Table
Differences for Cases and Controls 1 Low Pair Lwt_d Smoke_d Ptd_d Ht_d UI_d ------------------------------------------------------------------- 1 1 -34 1 1 0 0 1 2 17 0 0 0 1 1 3 35 0 0 0 0 1 4 27 1 1 0 1 1 5 -12 0 0 0 0
Procedure feature:
COUNT function
Table: SURVEY
This example uses a SAS macro to create columns. The SAS macro is not explained here. See SAS Macro Language: Reference for information on SAS macros.
SURVEY contains data from a questionnaire about diet and exercise habits. SAS enables you to use a special notation for missing values. In the EDUC column, the .x notation indicates that the respondent gave an answer that is not valid, and .n indicates that the respondent did not answer the question. A period as a missing value indicates a data entry error.
data survey; input id $ diet $ exer $ hours xwk educ; datalines; 1001 yes yes 1 3 1 1002 no yes 1 4 2 1003 no no . . .n 1004 yes yes 2 3 .x 1005 no yes 2 3 .x 1006 yes yes 2 4 .x 1007 no yes .5 3 . 1008 no no . . . ;
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Count the nonmissing responses. The COUNTM macro uses the COUNT function to perform various counts for a column. Each COUNT function uses a CASE expression to select the rows to be counted. The first COUNT function uses only the column as an argument to return the number of nonmissing rows.
%macro countm(col); count(&col) "Valid Responses for &col",
Count missing or invalid responses. The NMSS function returns the number of rows for which the column has any type of missing value: .n , .x , or a period.
nmiss(&col) "Missing or NOT VALID Responses for &col",
Count the occurrences of various sources of missing or invalid responses. The last three COUNT functions use CASE expressions to count the occurrences of the three notations for missing values. The count me character string gives the COUNT function a nonmissing value to count.
count(case when &col=.n then "count me" end) "Coded as NO ANSWER for &col", count(case when &col=.x then "count me" end) "Coded as NOT VALID answers for &col", count(case when &col=. then "count me" end) "Data Entry Errors for &col" %mend;
Use the COUNTM macro to create the columns. The SELECT clause specifies the columns that are in the output. COUNT(*) returns the total number of rows in the table. The COUNTM macro uses the values of the EDUC column to create the columns that are defined in the macro.
proc sql; title 'Counts for Each Type of Missing Response'; select count(*) "Total No. of Rows", %countm(educ) from survey;
Counts for Each Type of Missing Response 1 Missing Coded as or NOT Coded as NOT Data Total Valid VALID NO VALID Entry No. of Responses Responses ANSWER answers Errors Rows for educ for educ for educ for educ for educ ------------------------------------------------------------- 8 2 6 1 3 2