This section describes the components that are used in SQL procedure statements. Components are the items in PROC SQL syntax that appear in roman type.
Most components are contained in clauses within the statements. For example, the basic SELECT statement is composed of the SELECT and FROM clauses, where each clause contains one or more components. Components can also contain other components.
For easy reference, components appear in alphabetical order, and some terms are referred to before they are defined. Use the index or the See Also references to refer to other statement or component descriptions that may be helpful.
Selects rows where column values are within a range of values.
sql-expression <NOT> BETWEEN sql-expression
AND sql-expression
sql-expression
is described in sql-expression on page 1113.
The sql-expressions must be of compatible data types. They must be either all numeric or all character types.
Because a BETWEEN condition evaluates the boundary values as a range, it is not necessary to specify the smaller quantity first.
You can use the NOT logical operator to exclude a range of numbers, for example, to eliminate customer numbers between 1 and 15 (inclusive) so that you can retrieve data on more recently acquired customers.
PROC SQL supports the same comparison operators that the DATA step supports. For example:
x between 1 and 3 x between 3 and 1 1<=x<=3 x>=1 and x<=3
Removes blanks or specified characters from the beginning, the end, or both the beginning and end of a character string.
BTRIM (<< btrim-specification >< btrim-character FROM>> sql-expression)
btrim-specification
is one of the following:
LEADING
removes the blanks or specified characters from the beginning of the character string.
TRAILING
removes the blanks or specified characters from the end of the character string.
BOTH
removes the blanks or specified characters from both the beginning and the end of the character string.
Default: BOTH
btrim-character
is a single character that is to be removed from the character string. The default character is a blank.
sql-expression
must resolve to a character string or character variable and is described in sql-expression on page 1113.
The BTRIM function operates on character strings. BTRIM removes one or more instances of a single character (the value of btrim-character ) from the beginning, the end, or both the beginning and end of a string, depending whether LEADING, TRAILING, or BOTH is specified. If btrim-specification is not specified, then BOTH is used. If btrim-character is omitted, then blanks are removed.
Note: SAS adds trailing blanks to character values that are shorter than the length of the variable. Suppose you have a character variable Z, with length 10, and a value xxabcxx . SAS stores the value with three blanks after the last x (for a total length of 10). If you attempt to remove all the x characters with
btrim(both 'x' from z)
then the result is abcxx because PROC SQL sees the trailing characters as blanks, not the x character. In order to remove all the x characters, use
btrim(both 'x' from btrim(z))
The inner BTRIM function removes the trailing blanks before passing the value to the outer BTRIM function.
Refers to columns already calculated in the SELECT clause.
CALCULATED column-alias
column-alias
is the name that is assigned to the column in the SELECT clause.
CALCULATED enables you to use the results of an expression in the same SELECT clause or in the WHERE clause. It is valid only when used to refer to columns that are calculated in the immediate query expression.
Selects result values that satisfy specified conditions.
Featured in: Example 3 on page 1143 and Example 13 on page 1168
CASE < case-operand >
WHEN when-condition THEN result-expression
< WHEN when-condition THEN result-expression >
ELSE result-expression >
END
case-operand
is a valid sql-expression that resolves to a table column whose values are compared to all the when-conditions . See sql-expression on page 1113.
when-condition
When case-operand is specified, when-condition is a shortened sql-expression that assumes case-operand as one of its operands and that resolves to true or false.
When case-operand is not specified, when-condition is an sql-expression that resolves to true or false.
result-expression
is an sql-expression that resolves to a value.
The CASE expression selects values if certain conditions are met. A CASE expression returns a single value that is conditionally evaluated for each row of a table (or view). Use the WHEN-THEN clauses when you want to execute a CASE expression for some but not all of the rows in the table that is being queried or created. An optional ELSE expression gives an alternative action if no THEN expression is executed.
When you omit case-operand , when-condition is evaluated as a Boolean (true or false) value. If when-condition returns a nonzero, nonmissing result, then the WHEN clause is true. If case-operand is specified, then it is compared with when-condition for equality. If case-operand equals when-condition , then the WHEN clause is true.
If the when-condition is true for the row that is being executed, then the result-expression that follows THEN is executed. If when-condition is false, then PROC SQL evaluates the next when-condition until they are all evaluated. If every when-condition is false, then PROC SQL executes the ELSE expression, and its result becomes the CASE expression s result. If no ELSE expression is present and every when-condition is false, then the result of the CASE expression is a missing value.
You can use a CASE expression as an item in the SELECT clause and as either operand in an sql-expression.
The following two PROC SQL steps show two equivalent CASE expressions that create a character column with the strings in the THEN clause. The CASE expression in the second PROC SQL step is a shorthand method that is useful when all the comparisons are with the same column.
proc sql; select Name, case when Continent = 'North America' then 'Continental U.S.' when Continent = 'Oceania' then 'Pacific Islands' else 'None' end as Region from states; proc sql; select Name, case Continent when 'North America' then 'Continental U.S.' when 'Oceania' then 'Pacific Islands' else 'None' end as Region from states;
Note: When you use the shorthand method, the conditions must all be equality tests. That is, they cannot use comparison operators or other types of operators.
Returns the first nonmissing value from a list of columns.
Featured in: Example 7 on page 1152
COALESCE (column-name <, column-name>)
column-name
is described in column-name on page 1091.
COALESCE accepts one or more column names of the same data type. The COALESCE function checks the value of each column in the order in which they are listed and returns the first nonmissing value. If only one column is listed, the COALESCE function returns the value of that column. If all the values of all arguments are missing, the COALESCE function returns a missing value.
In some SQL DBMSs, the COALESCE function is called the IFNULL function. See PROC SQL and the ANSI Standard on page 1136 for more information.
Note: If your query contains a large number of COALESCE function calls, it might be more efficient to use a natural join instead. See Natural Joins on page 1102.
Defines PROC SQL s data types and dates.
See also: column-modifier on page 1090
Featured in: Example 1 on page 1139
column data-type <column-modifier < column-modifier>>
column
is a column name.
column-modifier
is described in column-modifier on page 1090.
data-type
is one of the following data types:
CHARACTERVARCHAR <( width )>
indicates a character column with a column width of width . The default column width is eight characters.
INTEGERSMALLINT
indicates an integer column.
DECIMALNUMERICFLOAT <( width <, ndec >)>
indicates a floating-point column with a column width of width and ndec decimal places.
REALDOUBLE PRECISION
indicates a floating-point column.
DATE
indicates a date column.
SAS supports many but not all of the data types that SQL-based databases support.
For all the numeric data types (INTEGER, SMALLINT, DECIMAL, NUMERIC, FLOAT, REAL, DOUBLE PRECISION, and DATE), the SQL procedure defaults to the SAS data type NUMERIC. The width and ndec arguments are ignored; PROC SQL creates all numeric columns with the maximum precision allowed by SAS. If you want to create numeric columns that use less storage space, then use the LENGTH statement in the DATA step. The various numeric data type names, along with the width and ndec arguments, are included for compatibility with other SQL software.
For the character data types (CHARACTER and VARCHAR), the SQL procedure defaults to the SAS data type CHARACTER. The width argument is honored.
The CHARACTER, INTEGER, and DECIMAL data types can be abbreviated to CHAR, INT, and DEC, respectively.
A column that is declared with DATE is a SAS numeric variable with a date informat or format. You can use any of the column-modifiers to set the appropriate attributes for the column that is being defined. See SAS Language Reference: Dictionary for more information on dates.
Sets column attributes.
See also: column-definition on page 1089 and SELECT Clause on page 1072
Featured in: Example 1 on page 1139 and Example 2 on page 1141
column-modifier
column-modifier
is one of the following:
INFORMAT= informatw.d
specifies a SAS informat to be used when SAS accesses data from a table or view. You can change one permanent informat to another by using the ALTER statement. PROC SQL stores informats in its table definitions so that other SAS procedures and the DATA step can use this information when they reference tables created by PROC SQL.
See SAS Language Reference: Dictionary for more information about informats.
FORMAT= formatw.d
specifies a SAS format for determining how character and numeric values in a column are displayed by the query-expression. If the FORMAT= modifier is used in the ALTER, CREATE TABLE, or CREATE VIEW statements, then it specifies the permanent format to be used when SAS displays data from that table or view. You can change one permanent format to another by using the ALTER statement.
See SAS Language Reference: Dictionary for more information about formats.
LABEL= label
specifies a column label. If the LABEL= modifier is used in the ALTER, CREATE TABLE, or CREATE VIEW statements, then it specifies the permanent label to be used when displaying that column. You can change one permanent label to another by using the ALTER statement.
A label can begin with the following characters: a through z, A through Z, 0 through 9, an underscore (_), or a blank space. If you begin a label with any other character, such as pound sign (#), then that character is used as a split character and it splits the label onto the next line wherever it appears. For example:
select dropout label= '#Percentage of#Students Who#Dropped Out' from educ(obs=5);
If a special character must appear as the first character in the output, then precede it with a space or a forward slash (/).
You can omit the LABEL= part of the column-modifier and still specify a label. Be sure to enclose the label in quotation marks, as in this example:
select empname "Names of Employees" from sql.employees;
If an apostrophe must appear in the label, then type it twice so that SAS reads the apostrophe as a literal. Alternatively, you can use single and double quotation marks alternately (for example, Date Rec d).
LENGTH= length
specifies the length of the column. This column modifier is valid only in the context of a SELECT statement.
TRANSCODE=YESNO
for character columns, specifies whether values can be transcoded. Use TRANSCODE=NO to suppress transcoding. Note that when you create a table by using the CREATE TABLE AS statement, the transcoding attribute for a given character column in the created table is the same as it is in the source table unless you change it with the TRANSCODE= column modifier. For more information about transcoding, see SAS National Language Support (NLS): User s Guide .
Default: YES
Restriction: Suppression of transcoding is not supported for the V6TAPE engine.
Interaction: If the TRANSCODE= attribute is set to NO for any character variable in a table, then PROC CONTENTS prints a transcode column that contains the TRANSCODE= value for each variable in the data set. If all variables in the table are set to the default TRANSCODE= value (YES), then no transcode column is printed.
If you refer to a labeled column in the ORDER BY or GROUP BY clause, then you must use either the column name (not its label), the column s alias, or its ordering integer (for example, ORDER BY 2 ). See the section on SAS statements in SAS Language Reference: Dictionary for more information about labels.
Specifies the column to select.
See also: column-modifier on page 1090 and SELECT Clause on page 1072
column-name
column-name
is one of the following:
column
is the name of a column.
table-name.column
is the name of a column in the table table-name .
table-alias.column
is the name of a column in the table that is referenced by table-alias .
view-name.column
is the name of a column in the view view-name .
view-alias.column
is the name of a column in the view that is referenced by view-alias .
A column can be referred to by its name alone if it is the only column by that name in all the tables or views listed in the current query-expression. If the same column name exists in more than one table or view in the query-expression, then you must qualify each use of the column name by prefixing a reference to the table that contains it. Consider the following examples:
SALARY /* name of the column */ EMP.SALARY /* EMP is the table or view name */ E.SALARY /* E is an alias for the table or view that contains the SALARY column */
Retrieves and uses DBMS data in a PROC SQL query or view.
Tip: You can use CONNECTION TO in the SELECT statement s FROM clause as part of the from-list.
See also: Connecting to a DBMS Using the SQL Procedure Pass-Through Facility on page 1129 and your SAS/ACCESS documentation.
CONNECTION TO dbms-name (dbms-query)
CONNECTION TO alias (dbms-query)
alias
specifies an alias, if one was defined in the CONNECT statement.
dbms-name
identifies the DBMS that you are using.
dbms-query
specifies the query to send to a DBMS. The query uses the DBMS s dynamic SQL. You can use any SQL syntax that the DBMS understands, even if that is not valid for PROC SQL. However, your DBMS query cannot contain a semicolon because that represents the end of a statement to SAS.
The number of tables that you can join with dbms-query is determined by the DBMS. Each CONNECTION TO component counts as one table toward the 32-table PROC SQL limit for joins.
See SAS/ACCESS for Relational Databases: Reference for more information about DBMS queries.
Tests whether a string is part of a column s value.
Alias: ?
Restriction: The CONTAINS condition is used only with character operands.
Featured in: Example 7 on page 1152
sql-expression <NOT> CONTAINS sql-expression
sql-expression
is described in sql-expression on page 1113.
Tests if a subquery returns one or more rows.
See also: Query Expressions (Subqueries) on page 1116
<NOT> EXISTS (query-expression)
query-expression
is described in query-expression on page 1107.
The EXISTS condition is an operator whose right operand is a subquery. The result of an EXISTS condition is true if the subquery resolves to at least one row. The result of a NOT EXISTS condition is true if the subquery evaluates to zero rows. For example, the following query subsets PROCLIB.PAYROLL (which is shown in Example 2 on page 1141) based on the criteria in the subquery. If the value for STAFF.IDNUM is on the same row as the value CT in PROCLIB.STAFF (which is shown in Example 4 on page 1145), then the matching IDNUM in PROCLIB.PAYROLL is included in the output. Thus, the query returns all the employees from PROCLIB.PAYROLL who live in CT .
proc sql; select * from proclib.payroll p where exists (select * from proclib.staff s where p.idnumber=s.idnum and state='CT');
Tests set membership.
Featured in: Example 4 on page 1145
sql-expression <NOT> IN (query-expression constant <, constant >)
constant
is a number or a quoted character string (or other special notation) that indicates a fixed value. Constants are also called literals .
query-expression
is described in query-expression on page 1107.
sql-expression
is described in sql-expression on page 1113.
An IN condition tests if the column value that is returned by the sql-expression on the left is a member of the set (of constants or values returned by the query-expression) on the right. The IN condition is true if the value of the left-hand operand is in the set of values that are defined by the right-hand operand.
Tests for a missing value.
Featured in: Example 5 on page 1148
sql-expression IS <NOT> NULL MISSING
sql-expression
is described in sql-expression on page 1113.
IS NULL and IS MISSING are predicates that test for a missing value. IS NULL and IS MISSING are used in the WHERE, ON, and HAVING expressions. Each predicate resolves to true if the sql-expression s result is missing and false if it is not missing.
SAS stores a numeric missing value as a period (.) and a character missing value as a blank space. Unlike missing values in some versions of SQL, missing values in SAS always appear first in the collating sequence. Therefore, in Boolean and comparison operations, the following expressions resolve to true in a predicate:
3>null -3>null 0>null
The SAS way of evaluating missing values differs from that of the ANSI Standard for SQL. According to the Standard, these expressions are NULL. See sql-expression on page 1113 for more information on predicates and operators. See PROC SQL and the ANSI Standard on page 1136 for more information on the ANSI Standard.
Joins a table with itself or with other tables or views.
Restrictions: Joins are limited to 32 tables.
See also: FROM Clause on page 1077 and query-expression on page 1107
Featured in: Example 4 on page 1145, Example 7 on page 1152, Example 9 on page 1159, Example 13 on page 1168, and Example 14 on page 1172
[1] table-name << AS > alias >, table-name << AS > alias > <, table-name << AS > alias >> [2]<(> table-name < INNER > JOIN table-name ON sql-expression<)> [3]<(> table-name LEFT JOIN RIGHT JOIN FULL JOIN table-name ON sql-expression<)> [4]<(> table-name CROSS JOIN table-name <)> [5]<(> table-name UNION JOIN table-name <)> [6]<(> table-name NATURAL < INNER FULL < OUTER > LEFT < OUTER > RIGHT < OUTER >> JOIN table-name <)>
alias
specifies an alias for table-name . The AS keyword is optional.
sql-expression
is described in sql-expression on page 1113.
table-name
can be one of the following:
the name of a PROC SQL table.
the name of a SAS data view or PROC SQL view.
a query-expression. A query-expression in the FROM clause is usually referred to as an in-line view . See FROM Clause on page 1077 for more information about in-line views.
a connection to a DBMS in the form of the CONNECTION TO component. See CONNECTION TO on page 1092 for more information.
table-name can be a one-level name, a two-level libref.table name, or a physical pathname that is enclosed in single quotation marks.
Note: If you include parentheses, then be sure to include them in pairs. Parentheses are not valid around comma joins (type 1).
[1] | Inner join. See Inner Joins on page 1097. |
[2] | Inner join. See Inner Joins on page 1097. |
[3] | Outer join. See Outer Joins on page 1099. |
[4] | Cross join. See Cross Joins on page 1101. |
[5] | Union join. See Union Joins on page 1102. |
[6] | Natural join. See Natural Joins on page 1102. |
When multiple tables, views, or query-expressions are listed in the FROM clause, they are processed to form one table. The resulting table contains data from each contributing table. These queries are referred to as joins .
Conceptually, when two tables are specified, each row of table A is matched with all the rows of table B to produce an internal or intermediate table. The number of rows in the intermediate table ( Cartesian product ) is equal to the product of the number of rows in each of the source tables. The intermediate table becomes the input to the rest of the query in which some of its rows may be eliminated by the WHERE clause or summarized by a summary function.
A common type of join is an equijoin , in which the values from a column in the first table must equal the values of a column in the second table.
PROC SQL can process a maximum of 32 tables for a join. If you are using views in a join, then the number of tables on which the views are based count toward the 32-table limit. Each CONNECTION TO component in the Pass-Through Facility counts as one table.
The WHERE clause or ON clause contains the conditions (sql-expression) under which the rows in the Cartesian product are kept or eliminated in the result table. WHERE is used to select rows from inner joins. ON is used to select rows from inner or outer joins.
The expression is evaluated for each row from each table in the intermediate table described earlier in Joining Tables on page 1096. The row is considered to be matching if the result of the expression is true (a nonzero, nonmissing value) for that row.
Note: You can follow the ON clause with a WHERE clause to further subset the query result. See Example 7 on page 1152 for an example.
Table aliases are used in joins to distinguish the columns of one table from those in the other table(s). A table name or alias must be prefixed to a column name when you are joining tables that have matching column names. See FROM Clause on page 1077 for more information on table aliases.
A single table can be joined with itself to produce more information. These joins are sometimes called reflexive joins . In these joins, the same table is listed twice in the FROM clause. Each instance of the table must have a table alias or you will not be able to distinguish between references to columns in either instance of the table. See Example 13 on page 1168 and Example 14 on page 1172 for examples.
An inner join returns a result table for all the rows in a table that have one or more matching rows in the other table(s), as specified by the sql-expression. Inner joins can be performed on up to 32 tables in the same query-expression.
You can perform an inner join by using a list of table-names separated by commas or by using the INNER, JOIN, and ON keywords.
The LEFTTAB and RIGHTTAB tables are used to illustrate this type of join:
Left Table - LEFTTAB Continent Export Country ----------------------------- NA wheat Canada EUR corn France EUR rice Italy AFR oil Egypt
Right Table - RIGHTTAB Continent Export Country ----------------------------- NA sugar USA EUR corn Spain EUR beets Belgium ASIA rice Vietnam
The following example joins the LEFTTAB and RIGHTTAB tables to get the Cartesian product of the two tables. The Cartesian product is the result of combining every row from one table with every row from another table. You get the Cartesian product when you join two tables and do not subset them with a WHERE clause or ON clause.
proc sql; title 'The Cartesian Product of'; title2 'LEFTTAB and RIGHTTAB'; select * from lefttab, righttab;
The Cartesian Product of LEFTTAB and RIGHTTAB Continent Export Country Continent Export Country ---------------------------------------------------------------- NA wheat Canada NA sugar USA NA wheat Canada EUR corn Spain NA wheat Canada EUR beets Belgium NA wheat Canada ASIA rice Vietnam EUR corn France NA sugar USA EUR corn France EUR corn Spain EUR corn France EUR beets Belgium EUR corn France ASIA rice Vietnam EUR rice Italy NA sugar USA EUR rice Italy EUR corn Spain EUR rice Italy EUR beets Belgium EUR rice Italy ASIA rice Vietnam AFR oil Egypt NA sugar USA AFR oil Egypt EUR corn Spain AFR oil Egypt EUR beets Belgium AFR oil Egypt ASIA rice Vietnam
The LEFTTAB and RIGHTTAB tables can be joined by listing the table names in the FROM clause. The following query represents an equijoin because the values of Continent from each table are matched. The column names are prefixed with the table aliases so that the correct columns can be selected.
proc sql; title 'Inner Join'; select * from lefttab as l, righttab as r where l.continent=r.continent;
Inner Join Continent Export Country Continent Export Country ---------------------------------------------------------------- NA wheat Canada NA sugar USA EUR corn France EUR corn Spain EUR corn France EUR beets Belgium EUR rice Italy EUR corn Spain EUR rice Italy EUR beets Belgium
The following PROC SQL step is equivalent to the previous one and shows how to write an equijoin using the INNER JOIN and ON keywords.
proc sql; title 'Inner Join'; select * from lefttab as l inner join righttab as r on l.continent=r.continent;
See Example 4 on page 1145, Example 13 on page 1168, and Example 14 on page 1172 for more examples.
Outer joins are inner joins that have been augmented with rows that did not match with any row from the other table in the join. The three types of outer joins are left, right, and full.
A left outer join, specified with the keywords LEFT JOIN and ON, has all the rows from the Cartesian product of the two tables for which the sql-expression is true, plus rows from the first (LEFTTAB) table that do not match any row in the second (RIGHTTAB) table.
proc sql; title 'Left Outer Join'; select * from lefttab as l left join righttab as r on l.continent=r.continent;
Left Outer Join Continent Export Country Continent Export Country ---------------------------------------------------------------- AFR oil Egypt EUR rice Italy EUR beets Belgium EUR corn France EUR beets Belgium EUR rice Italy EUR corn Spain EUR corn France EUR corn Spain NA wheat Canada NA sugar USA
A right outer join, specified with the keywords RIGHT JOIN and ON, has all the rows from the Cartesian product of the two tables for which the sql-expression is true, plus rows from the second (RIGHTTAB) table that do not match any row in the first (LEFTTAB) table.
proc sql; title 'Right Outer Join'; select * from lefttab as l right join righttab as r on l.continent=r.continent;
Right Outer Join Continent Export Country Continent Export Country ---------------------------------------------------------------- ASIA rice Vietnam EUR rice Italy EUR beets Belgium EUR rice Italy EUR corn Spain EUR corn France EUR beets Belgium EUR corn France EUR corn Spain NA wheat Canada NA sugar USA
A full outer join, specified with the keywords FULL JOIN and ON, has all the rows from the Cartesian product of the two tables for which the sql-expression is true, plus rows from each table that do not match any row in the other table.
proc sql; title 'Full Outer Join'; select * from lefttab as l full join righttab as r on l.continent=r.continent;
Full Outer Join Continent Export Country Continent Export Country ---------------------------------------------------------------- AFR oil Egypt ASIA rice Vietnam EUR rice Italy EUR beets Belgium EUR rice Italy EUR corn Spain EUR corn France EUR beets Belgium EUR corn France EUR corn Spain NA wheat Canada NA sugar USA
See Example 7 on page 1152 for another example.
A cross join returns as its result table the product of the two tables.
Using the LEFTTAB and RIGHTTAB example tables, the following program demonstrates the cross join:
proc sql; title 'Cross Join'; select * from lefttab as l cross join righttab as r;
Cross Join Continent Export Country Continent Export Country ---------------------------------------------------------------- NA wheat Canada NA sugar USA NA wheat Canada EUR corn Spain NA wheat Canada EUR beets Belgium NA wheat Canada ASIA rice Vietnam EUR corn France NA sugar USA EUR corn France EUR corn Spain EUR corn France EUR beets Belgium EUR corn France ASIA rice Vietnam EUR rice Italy NA sugar USA EUR rice Italy EUR corn Spain EUR rice Italy EUR beets Belgium EUR rice Italy ASIA rice Vietnam AFR oil Egypt NA sugar USA AFR oil Egypt EUR corn Spain AFR oil Egypt EUR beets Belgium AFR oil Egypt ASIA rice Vietnam
The cross join is not functionally different from a Cartesian product join. You would get the same result by submitting the following program:
proc sql; select * from lefttab, righttab;
Do not use an ON clause with a cross join. An ON clause will cause a cross join to fail. However, you can use a WHERE clause to subset the output.
A union join returns a union of the columns of both tables. The union join places in the results all rows with their respective column values from each input table. Columns that do not exist in one table will have null (missing) values for those rows in the result table. The following example demonstrates a union join.
proc sql; title 'Union Join'; select * from lefttab union join righttab;
Union Join Continent Export Country Continent Export Country ------------------------------------------------------------- NA sugar USA EUR corn Spain EUR beets Belgium ASIA rice Vietnam NA wheat Canada EUR corn France EUR rice Italy AFR oil Egypt
Using a union join is similar to concatenating tables with the OUTER UNION set operator. See query-expression on page 1107 for more information.
Do not use an ON clause with a union join. An ON clause will cause a union join to fail.
A natural join selects rows from two tables that have equal values in columns that share the same name and the same type. An error results if two columns have the same name but different types. If join-specification is omitted when specifying a natural join, then INNER is implied . If no like columns are found, then a cross join is performed.
The following examples use these two tables:
table1 x y z ----------------------------- 1 2 3 2 1 8 6 5 4 2 5 6
table2 x b z ----------------------------- 1 5 3 3 5 4 2 7 8 6 0 4
The following program demonstrates a natural inner join.
proc sql; title 'Natural Inner Join'; select * from table1 natural join table2;
Natural Inner Join x z b y --------------------------------------- 1 3 5 2 2 8 7 1 6 4 0 5
The following program demonstrates a natural left outer join.
proc sql; title 'Natural Left Outer Join'; select * from table1 natural left join table2;
Natural Left Outer Join x z b y --------------------------------------- 1 3 5 2 2 6 . 5 2 8 7 1 6 4 0 5
Do not use an ON clause with a natural join. An ON clause will cause a natural join to fail. When using a natural join, an ON clause is implied, matching all like columns.
Inner joins are usually performed on two or three tables, but they can be performed on up to 32 tables in PROC SQL. A join on three tables is described here to explain how and why the relationships work among the tables.
In a three-way join, the sql-expression consists of two conditions: one relates the first table to the second table and the other relates the second table to the third table. It is possible to break this example into stages, performing a two-way join into a temporary table and then joining that table with the third one for the same result. However, PROC SQL can do it all in one step as shown in the next example.
The example shows the joining of three tables: COMM, PRICE, and AMOUNT. To calculate the total revenue from exports for each country, you need to multiply the amount exported (AMOUNT table) by the price of each unit (PRICE table), and you must know the commodity that each country exports (COMM table).
COMM Table Continent Export Country ----------------------------- NA wheat Canada EUR corn France EUR rice Italy AFR oil Egypt
PRICE Table Export Price ------------------- rice 3.56 corn 3.45 oil 18 wheat 2.98
AMOUNT Table Country Quantity ------------------- Canada 16000 France 2400 Italy 500 Egypt 10000
proc sql; title 'Total Export Revenue'; select c.Country, p.Export, p.Price, a.Quantity,a.quantity*p.price as Total from comm c, price p, amount a where c.export=p.export and c.country=a.country;
Total Export Revenue Country Export Price Quantity Total -------------------------------------------------- Italy rice 3.56 500 1780 France corn 3.45 2400 8280 Egypt oil 18 10000 180000 Canada wheat 2.98 16000 47680
See Example 9 on page 1159 for another example.
You can often use a subquery or a join to get the same result. However, it is often more efficient to use a join if the outer query and the subquery do not return duplicate rows. For example, the following queries produce the same result. The second query is more efficient:
proc sql; select IDNumber, Birth from proclib.payroll where IDNumber in (select idnum from proclib.staff where lname like 'B%'); proc sql; select p.IDNumber, p.Birth from proclib.payroll p, proclib.staff s where p.idnumber=s.idnum and s.lname like 'B%';
Note: PROCLIB.PAYROLL is shown in Example 2 on page 1141.
Tests for a matching pattern.
sql-expression <NOT> LIKE sql-expression <ESCAPE character-expression >
sql-expression
is described in sql-expression on page 1113.
character-expression
is an sql-expression that evaluates to a single character. The operands of character-expression must be character or string literals; they cannot be column names.
Note: If you use an ESCAPE clause, then the pattern-matching specification must be a quoted string or quoted concatenated string; it cannot contain column names.
The LIKE condition selects rows by comparing character strings with a pattern-matching specification. It resolves to true and displays the matched string(s) if the left operand matches the pattern specified by the right operand.
The ESCAPE clause is used to search for literal instances of the percent (%) and underscore (_) characters, which are usually used for pattern matching.
Patterns are composed of three classes of characters:
underscore (_)
matches any single character.
percent sign (%)
matches any sequence of zero or more characters.
any other character
matches that character.
These patterns can appear before, after, or on both sides of characters that you want to match. The LIKE condition is case-sensitive.
The following list uses these values: Smith , Smooth , Smothers , Smart , and Smuggle .
'Sm%'
matches Smith , Smooth , Smothers , Smart , Smuggle .
'%th'
matches Smith , Smooth .
'S__gg%'
matches Smuggle .
'S_o'
matches a three-letter word, so it has no matches here.
'S_o%'
matches Smooth , Smothers .
'S%th'
matches Smith , Smooth .
'Z'
matches the single, uppercase character Z only, so it has no matches here.
Because the % and _ characters have special meaning in the context of the LIKE condition, you must use the ESCAPE clause to search for these character literals in the input character string.
These example use the values app , a_% , a__ , bbaa1 , and ba_1 .
The condition like 'a_%' matches app , a_% , and a__ , because the underscore (_) in the search pattern matches any single character (including the underscore), and the percent (%) in the search pattern matches zero or more characters, including % and _ .
The condition like 'a_^%' escape '^' matches only a_% , because the escape character (^) specifies that the pattern search for a literal % .
The condition like 'a_%' escape '_' matches none of the values, because the escape character (_) specifies that the pattern search for an a followed by a literal % , which does not apply to any of these values.
To search for mixed-case strings, use the UPCASE function to make all the names uppercase before entering the LIKE condition:
upcase(name) like 'SM%';
Note: When you are using the % character, be aware of the effect of trailing blanks. You may have to use the TRIM function to remove trailing blanks in order to match values.
Converts the case of a character string to lowercase.
See also: UPPER function on page 1128
LOWER (sql-expression)
sql-expression
must resolve to a character string and is described in sql-expression on page 1113.
The LOWER function operates on character strings. LOWER changes the case of its argument to all lowercase.
Note: The LOWER function is provided for compatibility with the ANSI SQL standard. You can also use the SAS function LOWCASE.
Retrieves data from tables.
See also: table-expression on page 1127, Query Expressions (Subqueries) on page 1116, and In-Line Views on page 1078
table-expression < set-operator table-expression> < set-operator table-expression>
table-expression
is described in table-expression on page 1127.
set-operator
is one of the following:
INTERSECT <CORRESPONDING> <ALL>
OUTER UNION <CORRESPONDING>
UNION <CORRESPONDING> <ALL>
EXCEPT <CORRESPONDING> <ALL>
A query-expression is one or more table-expressions. Multiple table expressions are linked by set operators. The following figure illustrates the relationship between table-expressions and query-expressions.
PROC SQL provides these set operators:
OUTER UNION
concatenates the query results.
UNION
produces all unique rows from both queries.
EXCEPT
produces rows that are part of the first query only.
INTERSECT
produces rows that are common to both query results.
A query-expression with set operators is evaluated as follows.
Each table-expression is evaluated to produce an (internal) intermediate result table.
Each intermediate result table then becomes an operand linked with a set operator to form an expression, for example, A UNION B.
If the query-expression involves more than two table-expressions, then the result from the first two becomes an operand for the next set operator and operand, such as (A UNION B) EXCEPT C, ((A UNION B) EXCEPT C) INTERSECT D, and so on.
Evaluating a query-expression produces a single output table.
Set operators follow this order of precedence unless they are overridden by parentheses in the expression(s): INTERSECT is evaluated first. OUTER UNION, UNION, and EXCEPT have the same level of precedence.
PROC SQL performs set operations even if the tables or views that are referred to in the table-expressions do not have the same number of columns. The reason for this behavior is that the ANSI Standard for SQL requires that tables or views that are involved in a set operation have the same number of columns and that the columns have matching data types. If a set operation is performed on a table or view that has fewer columns than the one(s) with which it is being linked, then PROC SQL extends the table or view with fewer columns by creating columns with missing values of the appropriate data type. This temporary alteration enables the set operation to be performed correctly.
The CORRESPONDING keyword is used only when a set operator is specified. CORR causes PROC SQL to match the columns in table-expressions by name and not by ordinal position. Columns that do not match by name are excluded from the result table, except for the OUTER UNION operator. See OUTER UNION on page 1109.
For example, when performing a set operation on two table-expressions, PROC SQL matches the first specified column-name (listed in the SELECT clause) from one table-expression with the first specified column-name from the other. If CORR is omitted, then PROC SQL matches the columns by ordinal position.
The set operators automatically eliminate duplicate rows from their output tables. The optional ALL keyword preserves the duplicate rows, reduces the execution by one step, and thereby improves the query-expression s performance. You use it when you want to display all the rows resulting from the table-expressions, rather than just the unique rows. The ALL keyword is used only when a set operator is also specified.
Performing an OUTER UNION is very similar to performing the SAS DATA step with a SET statement. The OUTER UNION concatenates the intermediate results from the table-expressions. Thus, the result table for the query-expression contains all the rows produced by the first table-expression followed by all the rows produced by the second table-expression. Columns with the same name are in separate columns in the result table.
For example, the following query expression concatenates the ME1 and ME2 tables but does not overlay like-named columns. Output 45.1 shows the result.
ME1 IDnum Jobcode Salary Bonus --------------------------------------- 1400 ME1 29769 587 1403 ME1 28072 342 1120 ME1 28619 986 1120 ME1 28619 986
ME2 IDnum Jobcode Salary ----------------------------- 1653 ME2 35108 1782 ME2 35345 1244 ME2 36925
proc sql; title 'ME1 and ME2: OUTER UNION'; select * from me1 outer union select * from me2;
ME1 and ME2: OUTER UNION IDnum Jobcode Salary Bonus IDnum Jobcode Salary --------------------------------------------------------------------- 1400 ME1 29769 587 . 1403 ME1 28072 342 . 1120 ME1 28619 986 . 1120 ME1 28619 986 . . . 1653 ME2 35108 . . 1782 ME2 35345 . . 1244 ME2 36925
Concatenating tables with the OUTER UNION set operator is similar to performing a union join. See Union Joins on page 1102 for more information.
To overlay columns with the same name, use the CORRESPONDING keyword.
proc sql; title 'ME1 and ME2: OUTER UNION CORRESPONDING'; select * from me1 outer union corr select * from me2;
ME1 and ME2: OUTER UNION CORRESPONDING IDnum Jobcode Salary Bonus -------------------------------------- 1400 ME1 29769 587 1403 ME1 28072 342 1120 ME1 28619 986 1120 ME1 28619 986 1653 ME2 35108 . 1782 ME2 35345 . 1244 ME2 36925 .
In the resulting concatenated table, notice the following:
OUTER UNION CORRESPONDING retains all nonmatching columns.
For columns with the same name, if a value is missing from the result of the first table-expression, then the value in that column from the second table-expression is inserted.
The ALL keyword is not used with OUTER UNION because this operator s default action is to include all rows in a result table. Thus, both rows from the table ME1 where IDnum is 1120 appear in the output.
The UNION operator produces a table that contains all the unique rows that result from both table-expressions. That is, the output table contains rows produced by the first table-expression, the second table-expression, or both.
Columns are appended by position in the tables, regardless of the column names. However, the data type of the corresponding columns must match or the union will not occur. PROC SQL issues a warning message and stops executing.
The names of the columns in the output table are the names of the columns from the first table-expression unless a column (such as an expression) has no name in the first table-expression. In such a case, the name of that column in the output table is the name of the respective column in the second table-expression.
In the following example, PROC SQL combines the two tables:
proc sql; title 'ME1 and ME2: UNION'; select * from me1 union select * from me2;
ME1 and ME2: UNION IDnum Jobcode Salary Bonus -------------------------------------- 1120 ME1 28619 986 1244 ME2 36925 . 1400 ME1 29769 587 1403 ME1 28072 342 1653 ME2 35108 . 1782 ME2 35345 .
In the following example, ALL includes the duplicate row from ME1. In addition, ALL changes the sorting by specifying that PROC SQL make one pass only. Thus, the values from ME2 are simply appended to the values from ME1.
proc sql; title 'ME1 and ME2: UNION ALL'; select * from me1 union all select * from me2;
ME1 and ME2: UNION ALL IDnum Jobcode Salary Bonus -------------------------------------- 1400 ME1 29769 587 1403 ME1 28072 342 1120 ME1 28619 986 1120 ME1 28619 986 1653 ME2 35108 . 1782 ME2 35345 . 1244 ME2 36925 .
See Example 5 on page 1148 for another example.
The EXCEPT operator produces (from the first table-expression) an output table that has unique rows that are not in the second table-expression. If the intermediate result from the first table-expression has at least one occurrence of a row that is not in the intermediate result of the second table-expression, then that row (from the first table-expression) is included in the result table.
In the following example, the IN_USA table contains flights to cities within and outside the USA. The OUT_USA table contains flights only to cities outside the USA. This example returns only the rows from IN_USA that are not also in OUT_USA:
proc sql; title 'Flights from IN_USA Only'; select * from in_usa except select * from out_usa;
IN_USA Flight Dest ------------------ 145 ORD 156 WAS 188 LAX 193 FRA 207 LON
OUT_USA Flight Dest ------------------ 193 FRA 207 LON 311 SJA
Flights from IN_USA Only Flight Dest ------------------ 145 ORD 156 WAS 188 LAX
The INTERSECT operator produces an output table that has rows that are common to both tables. For example, using the IN_USA and OUT_USA tables shown above, the following example returns rows that are in both tables:
proc sql; title 'Flights from Both IN_USA and OUT_USA'; select * from in_usa intersect select * from out_usa;
Flights from Both IN_USA and OUT_USA Flight Dest ------------------ 193 FRA 207 LON
Produces a value from a sequence of operands and operators.
operand operator operand
operand
is one of the following:
a constant , which is a number or a quoted character string (or other special notation) that indicates a fixed value. Constants are also called literals . Constants are described in SAS Language Reference: Dictionary .
a column-name, which is described in column-name on page 1091.
a CASE expression, which is described in CASE expression on page 1087.
a SAS function, which is any SAS function except LAG, DIF, and SOUND. Functions are described in SAS Language Reference: Dictionary .
the ANSI SQL functions COALESCE, BTRIM, LOWER, UPPER, and SUBSTRING.
a summary-function, which is described in summary-function on page 1121.
a query-expression, which is described in query-expression on page 1107.
the USER literal, which references the userid of the person who submitted the program. The userid that is returned is operating environment-dependent, but PROC SQL uses the same value that the &SYSJOBID macro variable has on the operating environment.
operator
is described in Operators and the Order of Evaluation on page 1114.
Note: SAS functions, including summary functions, can stand alone as SQL expressions. For example
select min(x) from table ; select scan(y,4) from table ;
PROC SQL supports the same SAS functions as the DATA step, except for the functions LAG, DIF, and SOUND. For example, the SCAN function is used in the following query:
select style, scan(street,1) format=. from houses;
See SAS Language Reference: Dictionary for complete documentation on SAS functions. Summary functions are also SAS functions. See summary-function on page 1121 for more information.
USER can be specified in a view definition, for example, to create a view that restricts access to those in the user s department. Note that the USER literal value is stored in uppercase, so it is advisable to use the UPCASE function when comparing to this value:
create view myemp as select * from dept12.employees where upcase(manager)=user;
This view produces a different set of employee information for each manager who references it.
The order in which operations are evaluated is the same as in the DATA step with this one exception: NOT is grouped with the logical operators AND and OR in PROC SQL; in the DATA step, NOT is grouped with the unary plus and minus signs.
Unlike missing values in some versions of SQL, missing values in SAS always appear first in the collating sequence. Therefore, in Boolean and comparison operations, the following expressions resolve to true in a predicate:
3>null -3>null 0>null
You can use parentheses to group values or to nest mathematical expressions. Parentheses make expressions easier to read and can also be used to change the order of evaluation of the operators. Evaluating expressions with parentheses begins at the deepest level of parentheses and moves outward. For example, SAS evaluates A+B*C as A+(B*C), although you can add parentheses to make it evaluate as (A+B)*C for a different result.
Higher priority operations are performed first: that is, group 0 operators are evaluated before group 5 operators. The following table shows the operators and their order of evaluation, including their priority groups.
Group | Operator | Description |
---|---|---|
| () | forces the expression enclosed to be evaluated first |
1 | case-expression | selects result values that satisfy specified conditions |
2 | ** | raises to a power |
unary +, unary - | indicates a positive or negative number | |
3 | * | multiplies |
/ | divides | |
4 | + | adds |
ˆ’ | subtracts | |
5 |
| concatenates |
6 | <NOT> BETWEEN condition | See BETWEEN condition on page 1085. |
<NOT> CONTAINS condition | see CONTAINS condition on page 1093. | |
<NOT> EXISTS condition | See EXISTS condition on page 1093. | |
<NOT> IN condition | See IN condition on page 1094. | |
IS <NOT> condition | See IS condition on page 1095. | |
<NOT> LIKE condition | See LIKE condition on page 1105. | |
7 | =, eq | equals |
=, ^=, < >, ne | does not equal | |
>, gt | is greater than | |
<, lt | is less than | |
>=, ge | is greater than or equal to | |
<=, le | is less than or equal to | |
=* | sounds like (use with character operands only). See Example 11 on page 1164. | |
eqt | equal to truncated strings (use with character operands only). See Truncated String Comparison Operators on page 1116. | |
gtt | greater than truncated strings | |
ltt | less than truncated strings | |
get | greater than or equal to truncated strings | |
let | less than or equal to truncated strings | |
net | not equal to truncated strings | |
8 | &, AND | indicates logical AND |
9 | , OR | indicates logical OR |
10 | , ^, NOT | indicates logical NOT |
Symbols for operators might vary, depending on your operating environment. See SAS Language Reference: Dictionary for more information on operators and expressions.
PROC SQL supports truncated string comparison operators (see Group 7 in Table 45.1 on page 1115). In a truncated string comparison, the comparison is performed after making the strings the same length by truncating the longer string to be the same length as the shorter string. For example, the expression 'TWOSTORY' eqt 'TWO' is true because the string 'TWOSTORY is reduced to TWO before the comparison is performed. Note that the truncation is performed internally; neither operand is permanently changed.
Note: Unlike the DATA step, PROC SQL does not support the colon operators (such as =:, >:, and <=:) for truncated string comparisons. Use the alphabetic operators (such as EQT, GTT, and LET).
A query-expression is called a subquery when it is used in a WHERE or HAVING clause. A subquery is a query-expression that is nested as part of another query-expression. A subquery selects one or more rows from a table based on values in another table.
Depending on the clause that contains it, a subquery can return a single value or multiple values. If more than one subquery is used in a query-expression, then the innermost query is evaluated first, then the next innermost query, and so on, moving outward.
PROC SQL allows a subquery (contained in parentheses) at any point in an expression where a simple column value or constant can be used. In this case, a subquery must return a single value , that is, one row with only one column.
The following is an example of a subquery that returns one value. This PROC SQL step subsets the PROCLIB.PAYROLL table based on information in the PROCLIB.STAFF table. (PROCLIB.PAYROLL is shown in Example 2 on page 1141, and PROCLIB.STAFF is shown in Example 4 on page 1145.) PROCLIB.PAYROLL contains employee identification numbers (IdNumber) and their salaries (Salary) but does not contain their names. If you want to return only the row from PROCLIB.PAYROLL for one employee, then you can use a subquery that queries the PROCLIB.STAFF table, which contains the employees identification numbers and their names (Lname and Fname).
options ls=64 nodate nonumber; proc sql; title 'Information for Earl Bowden'; select * from proclib.payroll where idnumber= (select idnum from proclib.staff where upcase(lname)='BOWDEN');
Information for Earl Bowden Id Number Gender Jobcode Salary Birth Hired ---------------------------------------------------- 1403 M ME1 28072 28JAN69 21DEC91
Subqueries can return multiple values . The following example uses the tables PROCLIB.DELAY and PROCLIB.MARCH. These tables contain information about the same flights and have the Flight column in common. The following subquery returns all the values for Flight in PROCLIB.DELAY for international flights. The values from the subquery complete the WHERE clause in the outer query. Thus, when the outer query is executed, only the international flights from PROCLIB. MARCH are in the output.
options ls=64 nodate nonumber; proc sql outobs=5; title 'International Flights from'; title2 'PROCLIB.MARCH'; select Flight, Date, Dest, Boarded from proclib.march where flight in (select flight from proclib.delay where destype='International');
International Flights from PROCLIB.MARCH Flight Date Dest Boarded ------------------------------- 219 01MAR94 LON 198 622 01MAR94 FRA 207 132 01MAR94 YYZ 115 271 01MAR94 PAR 138 219 02MAR94 LON 147
Sometimes it is helpful to compare a value with a set of values returned by a subquery. The keywords ANY or ALL can be specified before a subquery when the subquery is the right-hand operand of a comparison. If ALL is specified, then the comparison is true only if it is true for all values that are returned by the subquery. If a subquery returns no rows, then the result of an ALL comparison is true for each row of the outer query.
If ANY is specified, then the comparison is true if it is true for any one of the values that are returned by the subquery. If a subquery returns no rows, then the result of an ANY comparison is false for each row of the outer query.
The following example selects all those in PROCLIB.PAYROLL who earn more than the highest paid ME3 :
options ls=64 nodate nonumber ; proc sql; title ''Employees who Earn More than''; title2 ''All ME's''; select * from proclib.payroll where salary > all (select salary from proclib.payroll where jobcode='ME3');
Employees who Earn More than All ME's Id Number Gender Jobcode Salary Birth Hired ------------------------------------------------------- 1333 M PT2 88606 30MAR61 10FEB81 1739 M PT1 66517 25DEC64 27JAN91 1428 F PT1 68767 04APR60 16NOV91 1404 M PT2 91376 24FEB53 01JAN80 1935 F NA2 51081 28MAR54 16OCT81 1905 M PT1 65111 16APR72 29MAY92 1407 M PT1 68096 23MAR69 18MAR90 1410 M PT2 84685 03MAY67 07NOV86 1439 F PT1 70736 06MAR64 10SEP90 1545 M PT1 66130 12AUG59 29MAY90 1106 M PT2 89632 06NOV57 16AUG84 1442 F PT2 84536 05SEP66 12APR88 1417 M NA2 52270 27JUN64 07MAR89 1478 M PT2 84203 09AUG59 24OCT90 1556 M PT1 71349 22JUN64 11DEC91 1352 M NA2 53798 02DEC60 16OCT86 1890 M PT2 91908 20JUL51 25NOV79 1107 M PT2 89977 09JUN54 10FEB79 1830 F PT2 84471 27MAY57 29JAN83 1928 M PT2 89858 16SEP54 13JUL90 1076 M PT1 66558 14OCT55 03OCT91
Note: See the first item in Subqueries and Efficiency on page 1119 for a note about efficiency when using ALL.
In order to visually separate a subquery from the rest of the query, you can enclose the subquery in any number of pairs of parentheses.
In a correlated subquery, the WHERE expression in a subquery refers to values in a table in the outer query. The correlated subquery is evaluated for each row in the outer query. With correlated subqueries, PROC SQL executes the subquery and the outer query together.
The following example uses the PROCLIB.DELAY and PROCLIB.MARCH tables. A DATA step (PROCLIB.DELAY on page 1404) creates PROCLIB.DELAY. PROCLIB.MARCH is shown in Example 13 on page 1168. PROCLIB.DELAY has the Flight, Date, Orig, and Dest columns in common with PROCLIB.MARCH:
proc sql outobs=5; title 'International Flights'; select * from proclib.march where 'International' in (select destype from proclib.delay where march.Flight=delay.Flight);
The subquery resolves by substituting every value for MARCH.Flight into the subquery s WHERE clause, one row at a time. For example, when MARCH.Flight= 219 , the subquery resolves as follows:
PROC SQL retrieves all the rows from DELAY where Flight= 219 and passes their DESTYPE values to the WHERE clause.
PROC SQL uses the DESTYPE values to complete the WHERE clause:
where 'International' in ('International','International', ...)
The WHERE clause checks to see if International is in the list. Because it is, all rows from MARCH that have a value of 219 for Flight become part of the output.
The following output contains the rows from MARCH for international flights only.
International Flights Flight Date Depart Orig Dest Miles Boarded Capacity -------------------------------------------------------------------- 219 01MAR94 9:31 LGA LON 3442 198 250 622 01MAR94 12:19 LGA FRA 3857 207 250 132 01MAR94 15:35 LGA YYZ 366 115 178 271 01MAR94 13:17 LGA PAR 3635 138 250 219 02MAR94 9:31 LGA LON 3442 147 250
Use the MAX function in a subquery instead of the ALL keyword before the subquery. For example, the following queries produce the same result, but the second query is more efficient:
proc sql; select * from proclib.payroll where salary> all(select salary from proclib.payroll where jobcode='ME3'); proc sql; select * from proclib.payroll where salary> (select max(salary) from proclib.payroll where jobcode='ME3');
With subqueries, use IN instead of EXISTS when possible. For example, the following queries produce the same result, but the second query is usually more efficient:
proc sql; select * from proclib.payroll p where exists (select * from staff s where p.idnum=s.idnum and state='CT'); proc sql; select * from proclib.payroll where idnum in (select idnum from staff where state='CT');
Returns a part of a character expression.
SUBSTRING (sql-expression FROM start <FOR length >)
sql-expression must be a character string and is described in sql-expression on page 1113.
start is a number (not a variable or column name) that specifies the position, counting from the left end of the character string, at which to begin extracting the substring.
length is a number (not a variable or column name) that specifies the length of the substring that is to be extracted.
The SUBSTRING function operates on character strings. SUBSTRING returns a specified part of the input character string, beginning at the position that is specified by start . If length is omitted, then the SUBSTRING function returns all characters from start to the end of the input character string. The values of start and length must be numbers (not variables) and can be positive, negative, or zero.
If start is greater than the length of the input character string, then the SUBSTRING function returns a zero-length string.
If start is less than 1, then the SUBSTRING function begins extraction at the beginning of the input character string.
If length is specified, then the sum of start and length cannot be less than start or an error is returned. If the sum of start and length is greater than the length of the input character string, then the SUBSTRING function returns all characters from start to the end of the input character string. If the sum of start and length is less than 1, then the SUBSTRING function returns a zero-length string.
Note: The SUBSTRING function is provided for compatibility with the ANSI SQL standard. You can also use the SAS function SUBSTR.
Performs statistical summary calculations.
Restriction: A summary function cannot appear in an ON clause or a WHERE clause.
See also: GROUP BY on page 1079, HAVING Clause on page 1080, SELECT Clause on page 1072, and table-expression on page 1127
Featured in: Example 8 on page 1157, Example 12 on page 1166, and Example 15 on page 1174
summary-function (<DISTINCT ALL> sql-expression)
summary-function
is one of the following:
AVGMEAN
arithmetic mean or average of values
COUNTFREQN
number of nonmissing values
CSS
corrected sum of squares
CV
coefficient of variation (percent)
MAX
largest value
MIN
smallest value
NMISS
number of missing values
PRT
probability of a greater absolute value of Student s t
RANGE
range of values
STD
standard deviation
STDERR
standard error of the mean
SUM
sum of values
SUMWGT
sum of the WEIGHT variable values [*]
T
Student s t value for testing the hypothesis that the population mean is zero
USS
uncorrected sum of squares
VAR
variance
For a description and the formulas used for these statistics, see Appendix 1, SAS Elementary Statistics Procedures, on page 1353.
DISTINCT
specifies that only the unique values of sql-expression be used in the calculation.
ALL
specifies that all values of sql-expression be used in the calculation. If neither DISTINCT nor ALL is specified, then ALL is used.
sql-expression
is described in sql-expression on page 1113.
Summary functions produce a statistical summary of the entire table or view that is listed in the FROM clause or for each group that is specified in a GROUP BY clause. If GROUP BY is omitted, then all the rows in the table or view are considered to be a single group. These functions reduce all the values in each row or column in a table to one summarizing or aggregate value. For this reason, these functions are often called aggregate functions . For example, the sum (one value) of a column results from the addition of all the values in the column.
The COUNT function counts rows. COUNT(*) returns the total number of rows in a group or in a table. If you use a column name as an argument to COUNT, then the result is the total number of rows in a group or in a table that have a nonmissing value for that column. If you want to count the unique values in a column, then specify COUNT(DISTINCT column ).
If the SELECT clause of a table-expression contains one or more summary functions and that table-expression resolves to no rows, then the summary function results are missing values. The following are exceptions that return zeros:
COUNT(*)
COUNT(<DISTINCT> sql-expression)
NMISS(<DISTINCT> sql-expression)
See Example 8 on page 1157 and Example 15 on page 1174 for examples.
The number of arguments that is specified in a summary function affects how the calculation is performed. If you specify a single argument, then the values in the column are calculated. If you specify multiple arguments, then the arguments or columns that are listed are calculated for each row. For example, consider calculations on the following table.
proc sql; title 'Summary Table'; select * from summary;
Summary Table X Y Z ---------------------------- 1 3 4 2 4 5 8 9 4 4 5 4
If you use one argument in the function, then the calculation is performed on that column only. If you use more than one argument, then the calculation is performed on each row of the specified columns. In the following PROC SQL step, the MIN and MAX functions return the minimum and maximum of the columns they are used with. The SUM function returns the sum of each row of the columns specified as arguments:
proc sql; select min(x) as Colmin_x, min(y) as Colmin_y, max(z) as Colmax_z, sum(x,y,z) as Rowsum from summary;
Summary Table Colmin_x Colmin_y Colmax_z Rowsum --------------------------------------- 1 3 5 8 1 3 5 11 1 3 5 21 1 3 5 13
When you use a summary function in a SELECT clause or a HAVING clause, you might see the following message in the SAS log:
NOTE: The query requires remerging summary statistics back with the original data.
The process of remerging involves two passes through the data. On the first pass, PROC SQL
calculates and returns the value of summary functions. It then uses the result to calculate the arithmetic expressions in which the summary function participates.
groups data according to the GROUP BY clause.
On the second pass, PROC SQL retrieves any additional columns and rows that it needs to show in the output.
The following examples use the PROCLIB.PAYROLL table (shown in Example 2 on page 1141) to show when remerging of data is and is not necessary.
The first query requires remerging. The first pass through the data groups the data by Jobcode and resolves the AVG function for each group. However, PROC SQL must make a second pass in order to retrieve the values of IdNumber and Salary.
proc sql outobs=10; title 'Salary Information'; title2 '(First 10 Rows Only)'; select IdNumber, Jobcode, Salary, avg(salary) as AvgSalary from proclib.payroll group by jobcode;
Salary Information (First 10 Rows Only) Id Number Jobcode Salary AvgSalary ------------------------------------- 1704 BCK 25465 25794.22 1677 BCK 26007 25794.22 1383 BCK 25823 25794.22 1845 BCK 25996 25794.22 1100 BCK 25004 25794.22 1663 BCK 26452 25794.22 1673 BCK 25477 25794.22 1389 BCK 25028 25794.22 1834 BCK 26896 25794.22 1132 FA1 22413 23039.36
You can change the previous query to return only the average salary for each jobcode. The following query does not require remerging because the first pass of the data does the summarizing and the grouping. A second pass is not necessary.
proc sql outobs=10; title 'Average Salary for Each Jobcode'; select Jobcode, avg(salary) as AvgSalary from proclib.payroll group by jobcode;
Average Salary for Each Jobcode Jobcode AvgSalary ------------------ BCK 25794.22 FA1 23039.36 FA2 27986.88 FA3 32933.86 ME1 28500.25 ME2 35576.86 ME3 42410.71 NA1 42032.2 NA2 52383 PT1 67908
When you use the HAVING clause, PROC SQL may have to remerge data to resolve the HAVING expression.
First, consider a query that uses HAVING but that does not require remerging. The query groups the data by values of Jobcode, and the result contains one row for each value of Jobcode and summary information for people in each Jobcode. On the first pass, the summary functions provide values for the Number , Average Age , and Average Salary columns. The first pass provides everything that PROC SQL needs to resolve the HAVING clause, so no remerging is necessary.
proc sql outobs=10; title 'Summary Information for Each Jobcode'; title2 '(First 10 Rows Only)'; select Jobcode, count(jobcode) as number label='Number', avg(int((today()-birth)/365.25)) as avgage format=2. label='Average Age', avg(salary) as avgsal format=dollar8. label='Average Salary' from proclib.payroll group by jobcode having avgage ge 30;
Summary Information for Each Jobcode (First 10 Rows Only) Average Average Jobcode Number Age Salary ------------------------------------ BCK 9 36 ,794 FA1 11 33 ,039 FA2 16 37 ,987 FA3 7 39 ,934 ME1 8 34 ,500 ME2 14 39 ,577 ME3 7 42 ,411 NA1 5 30 ,032 NA2 3 42 ,383 PT1 8 38 ,908
In the following query, PROC SQL remerges the data because the HAVING clause uses the SALARY column in the comparison and SALARY is not in the GROUP BY clause.
proc sql outobs=10; title 'Employees who Earn More than the'; title2 'Average for Their Jobcode'; title3 '(First 10 Rows Only)'; select Jobcode, Salary, avg(salary) as AvgSalary from proclib.payroll group by jobcode having salary > AvgSalary;
Employees who Earn More than the Average for Their Jobcode (First 10 Rows Only) Jobcode Salary AvgSalary ---------------------------- BCK 26007 25794.22 BCK 25823 25794.22 BCK 25996 25794.22 BCK 26452 25794.22 BCK 26896 25794.22 FA1 23177 23039.36 FA1 23738 23039.36 FA1 23979 23039.36 FA1 23916 23039.36 FA1 23644 23039.36
Keep in mind that PROC SQL remerges data when
the values returned by a summary function are used in a calculation. For example, the following query returns the values of X and the percent of the total for each row. On the first pass, PROC SQL computes the sum of X, and on the second pass PROC SQL computes the percentage of the total for each value of X:
proc sql; title 'Percentage of the Total'; select X, (100*x/sum(X)) as Pct_Total from summary;
Percentage of the Total x Pct_Total -------------------- 32 14.81481 86 39.81481 49 22.68519 49 22.68519
the values returned by a summary function are compared to values of a column that is not specified in the GROUP BY clause. For example, the following query uses the PROCLIB.PAYROLL table. PROC SQL remerges data because the column Salary is not specified in the GROUP BY clause:
proc sql; select jobcode, salary, avg(salary) as avsal from proclib.payroll group by jobcode having salary > avsal;
a column from the input table is specified in the SELECT clause and is not specified in the GROUP BY clause. This rule does not refer to columns used as arguments to summary functions in the SELECT clause.
For example, in the following query, the presence of IdNumber in the SELECT clause causes PROC SQL to remerge the data because IdNumber is not involved in grouping or summarizing during the first pass. In order for PROC SQL to retrieve the values for IdNumber, it must make a second pass through the data.
proc sql; select IdNumber, jobcode, avg(salary) as avsal from proclib.payroll group by jobcode;
Defines part or all of a query-expression.
See also: query-expression on page 1107
SELECT < DISTINCT > object-item <, object-item >
INTO : macro-variable-specification
<, : macro-variable-specification >>
FROM from-list
< WHERE sql-expression>
< GROUP BY group-by -item <, group-by-item >>
< HAVING sql-expression>
See SELECT Statement on page 1071 for complete information on the SELECT statement.
A table-expression is a SELECT statement. It is the fundamental building block of most SQL procedure statements. You can combine the results of multiple table-expressions with set operators, which creates a query-expression. Use one ORDER BY clause for an entire query-expression. Place a semicolon only at the end of the entire query-expression. A query-expression is often only one SELECT statement or table-expression.
Converts the case of a character string to uppercase.
See also: LOWER function on page 1107
UPPER (sql-expression)
sql-expression must be a character string and is described in sql-expression on page 1113.
The UPPER function operates on character strings. UPPER converts the case of its argument to all uppercase.
[*] Currently, there is no way to designate a WEIGHT variable for a table in PROC SQL. Thus, each row (or observation) has a weight of 1.