SQL Procedure Component Dictionary


This section describes the components that are used in SQL procedure statements. Components are the items in PROC SQL syntax that appear in roman type.

Most components are contained in clauses within the statements. For example, the basic SELECT statement is composed of the SELECT and FROM clauses, where each clause contains one or more components. Components can also contain other components.

For easy reference, components appear in alphabetical order, and some terms are referred to before they are defined. Use the index or the See Also references to refer to other statement or component descriptions that may be helpful.

BETWEEN condition

Selects rows where column values are within a range of values.

sql-expression <NOT> BETWEEN sql-expression

  • AND sql-expression

Argument

sql-expression

  • is described in sql-expression on page 1113.

Details

  • The sql-expressions must be of compatible data types. They must be either all numeric or all character types.

  • Because a BETWEEN condition evaluates the boundary values as a range, it is not necessary to specify the smaller quantity first.

  • You can use the NOT logical operator to exclude a range of numbers, for example, to eliminate customer numbers between 1 and 15 (inclusive) so that you can retrieve data on more recently acquired customers.

  • PROC SQL supports the same comparison operators that the DATA step supports. For example:

     x between 1 and 3  x between 3 and 1  1<=x<=3  x>=1 and x<=3 

BTRIM function

Removes blanks or specified characters from the beginning, the end, or both the beginning and end of a character string.

BTRIM (<< btrim-specification >< btrim-character FROM>> sql-expression)

Arguments

btrim-specification

  • is one of the following:

  • LEADING

    • removes the blanks or specified characters from the beginning of the character string.

  • TRAILING

    • removes the blanks or specified characters from the end of the character string.

  • BOTH

    • removes the blanks or specified characters from both the beginning and the end of the character string.

  • Default: BOTH

btrim-character

  • is a single character that is to be removed from the character string. The default character is a blank.

sql-expression

  • must resolve to a character string or character variable and is described in sql-expression on page 1113.

Details

The BTRIM function operates on character strings. BTRIM removes one or more instances of a single character (the value of btrim-character ) from the beginning, the end, or both the beginning and end of a string, depending whether LEADING, TRAILING, or BOTH is specified. If btrim-specification is not specified, then BOTH is used. If btrim-character is omitted, then blanks are removed.

Note: SAS adds trailing blanks to character values that are shorter than the length of the variable. Suppose you have a character variable Z, with length 10, and a value xxabcxx . SAS stores the value with three blanks after the last x (for a total length of 10). If you attempt to remove all the x characters with

 btrim(both 'x' from z) 

then the result is abcxx because PROC SQL sees the trailing characters as blanks, not the x character. In order to remove all the x characters, use

 btrim(both 'x' from btrim(z)) 

The inner BTRIM function removes the trailing blanks before passing the value to the outer BTRIM function.

CALCULATED

Refers to columns already calculated in the SELECT clause.

CALCULATED column-alias

Argument

column-alias

  • is the name that is assigned to the column in the SELECT clause.

Referencing a CALCULATED Column

CALCULATED enables you to use the results of an expression in the same SELECT clause or in the WHERE clause. It is valid only when used to refer to columns that are calculated in the immediate query expression.

CASE expression

Selects result values that satisfy specified conditions.

Featured in: Example 3 on page 1143 and Example 13 on page 1168

CASE < case-operand >

  • WHEN when-condition THEN result-expression

  • < WHEN when-condition THEN result-expression >

  • ELSE result-expression >

  • END

Arguments

case-operand

  • is a valid sql-expression that resolves to a table column whose values are compared to all the when-conditions . See sql-expression on page 1113.

when-condition

  • When case-operand is specified, when-condition is a shortened sql-expression that assumes case-operand as one of its operands and that resolves to true or false.

  • When case-operand is not specified, when-condition is an sql-expression that resolves to true or false.

result-expression

  • is an sql-expression that resolves to a value.

Details

The CASE expression selects values if certain conditions are met. A CASE expression returns a single value that is conditionally evaluated for each row of a table (or view). Use the WHEN-THEN clauses when you want to execute a CASE expression for some but not all of the rows in the table that is being queried or created. An optional ELSE expression gives an alternative action if no THEN expression is executed.

When you omit case-operand , when-condition is evaluated as a Boolean (true or false) value. If when-condition returns a nonzero, nonmissing result, then the WHEN clause is true. If case-operand is specified, then it is compared with when-condition for equality. If case-operand equals when-condition , then the WHEN clause is true.

If the when-condition is true for the row that is being executed, then the result-expression that follows THEN is executed. If when-condition is false, then PROC SQL evaluates the next when-condition until they are all evaluated. If every when-condition is false, then PROC SQL executes the ELSE expression, and its result becomes the CASE expression s result. If no ELSE expression is present and every when-condition is false, then the result of the CASE expression is a missing value.

You can use a CASE expression as an item in the SELECT clause and as either operand in an sql-expression.

Example

The following two PROC SQL steps show two equivalent CASE expressions that create a character column with the strings in the THEN clause. The CASE expression in the second PROC SQL step is a shorthand method that is useful when all the comparisons are with the same column.

 proc sql;     select Name, case                  when Continent = 'North America' then 'Continental U.S.'                  when Continent = 'Oceania' then 'Pacific Islands'                  else 'None'                  end as Region        from states;  proc sql;     select Name, case Continent                  when 'North America' then 'Continental U.S.'                  when 'Oceania' then 'Pacific Islands'                  else 'None'                  end as Region        from states; 

Note: When you use the shorthand method, the conditions must all be equality tests. That is, they cannot use comparison operators or other types of operators.

COALESCE Function

Returns the first nonmissing value from a list of columns.

Featured in: Example 7 on page 1152

COALESCE (column-name <, column-name>)

Arguments

column-name

  • is described in column-name on page 1091.

Details

COALESCE accepts one or more column names of the same data type. The COALESCE function checks the value of each column in the order in which they are listed and returns the first nonmissing value. If only one column is listed, the COALESCE function returns the value of that column. If all the values of all arguments are missing, the COALESCE function returns a missing value.

In some SQL DBMSs, the COALESCE function is called the IFNULL function. See PROC SQL and the ANSI Standard on page 1136 for more information.

Note: If your query contains a large number of COALESCE function calls, it might be more efficient to use a natural join instead. See Natural Joins on page 1102.

column-definition

Defines PROC SQL s data types and dates.

See also: column-modifier on page 1090

Featured in: Example 1 on page 1139

column data-type <column-modifier < column-modifier>>

Arguments

column

  • is a column name.

column-modifier

  • is described in column-modifier on page 1090.

data-type

  • is one of the following data types:

  • CHARACTERVARCHAR <( width )>

    • indicates a character column with a column width of width . The default column width is eight characters.

  • INTEGERSMALLINT

    • indicates an integer column.

  • DECIMALNUMERICFLOAT <( width <, ndec >)>

    • indicates a floating-point column with a column width of width and ndec decimal places.

  • REALDOUBLE PRECISION

    • indicates a floating-point column.

  • DATE

    • indicates a date column.

Details

  • SAS supports many but not all of the data types that SQL-based databases support.

  • For all the numeric data types (INTEGER, SMALLINT, DECIMAL, NUMERIC, FLOAT, REAL, DOUBLE PRECISION, and DATE), the SQL procedure defaults to the SAS data type NUMERIC. The width and ndec arguments are ignored; PROC SQL creates all numeric columns with the maximum precision allowed by SAS. If you want to create numeric columns that use less storage space, then use the LENGTH statement in the DATA step. The various numeric data type names, along with the width and ndec arguments, are included for compatibility with other SQL software.

  • For the character data types (CHARACTER and VARCHAR), the SQL procedure defaults to the SAS data type CHARACTER. The width argument is honored.

  • The CHARACTER, INTEGER, and DECIMAL data types can be abbreviated to CHAR, INT, and DEC, respectively.

  • A column that is declared with DATE is a SAS numeric variable with a date informat or format. You can use any of the column-modifiers to set the appropriate attributes for the column that is being defined. See SAS Language Reference: Dictionary for more information on dates.

column-modifier

Sets column attributes.

See also: column-definition on page 1089 and SELECT Clause on page 1072

Featured in: Example 1 on page 1139 and Example 2 on page 1141

column-modifier

Arguments

column-modifier

  • is one of the following:

  • INFORMAT= informatw.d

    • specifies a SAS informat to be used when SAS accesses data from a table or view. You can change one permanent informat to another by using the ALTER statement. PROC SQL stores informats in its table definitions so that other SAS procedures and the DATA step can use this information when they reference tables created by PROC SQL.

    • See SAS Language Reference: Dictionary for more information about informats.

  • FORMAT= formatw.d

    • specifies a SAS format for determining how character and numeric values in a column are displayed by the query-expression. If the FORMAT= modifier is used in the ALTER, CREATE TABLE, or CREATE VIEW statements, then it specifies the permanent format to be used when SAS displays data from that table or view. You can change one permanent format to another by using the ALTER statement.

    • See SAS Language Reference: Dictionary for more information about formats.

  • LABEL= label

    • specifies a column label. If the LABEL= modifier is used in the ALTER, CREATE TABLE, or CREATE VIEW statements, then it specifies the permanent label to be used when displaying that column. You can change one permanent label to another by using the ALTER statement.

    • A label can begin with the following characters: a through z, A through Z, 0 through 9, an underscore (_), or a blank space. If you begin a label with any other character, such as pound sign (#), then that character is used as a split character and it splits the label onto the next line wherever it appears. For example:

       select dropout label=  '#Percentage of#Students Who#Dropped Out'     from educ(obs=5); 
    • If a special character must appear as the first character in the output, then precede it with a space or a forward slash (/).

    • You can omit the LABEL= part of the column-modifier and still specify a label. Be sure to enclose the label in quotation marks, as in this example:

       select empname "Names of Employees"     from sql.employees; 
    • If an apostrophe must appear in the label, then type it twice so that SAS reads the apostrophe as a literal. Alternatively, you can use single and double quotation marks alternately (for example, Date Rec d).

  • LENGTH= length

    • specifies the length of the column. This column modifier is valid only in the context of a SELECT statement.

  • TRANSCODE=YESNO

    • for character columns, specifies whether values can be transcoded. Use TRANSCODE=NO to suppress transcoding. Note that when you create a table by using the CREATE TABLE AS statement, the transcoding attribute for a given character column in the created table is the same as it is in the source table unless you change it with the TRANSCODE= column modifier. For more information about transcoding, see SAS National Language Support (NLS): User s Guide .

    • Default: YES

    • Restriction: Suppression of transcoding is not supported for the V6TAPE engine.

    • Interaction: If the TRANSCODE= attribute is set to NO for any character variable in a table, then PROC CONTENTS prints a transcode column that contains the TRANSCODE= value for each variable in the data set. If all variables in the table are set to the default TRANSCODE= value (YES), then no transcode column is printed.

Details

If you refer to a labeled column in the ORDER BY or GROUP BY clause, then you must use either the column name (not its label), the column s alias, or its ordering integer (for example, ORDER BY 2 ). See the section on SAS statements in SAS Language Reference: Dictionary for more information about labels.

column-name

Specifies the column to select.

See also: column-modifier on page 1090 and SELECT Clause on page 1072

column-name

column-name

  • is one of the following:

  • column

    • is the name of a column.

  • table-name.column

    • is the name of a column in the table table-name .

  • table-alias.column

    • is the name of a column in the table that is referenced by table-alias .

  • view-name.column

    • is the name of a column in the view view-name .

  • view-alias.column

    • is the name of a column in the view that is referenced by view-alias .

Details

A column can be referred to by its name alone if it is the only column by that name in all the tables or views listed in the current query-expression. If the same column name exists in more than one table or view in the query-expression, then you must qualify each use of the column name by prefixing a reference to the table that contains it. Consider the following examples:

 SALARY       /* name of the column */  EMP.SALARY   /* EMP is the table or view name */  E.SALARY     /* E is an alias for the table                  or view that contains the                  SALARY column */ 

CONNECTION TO

Retrieves and uses DBMS data in a PROC SQL query or view.

Tip: You can use CONNECTION TO in the SELECT statement s FROM clause as part of the from-list.

See also: Connecting to a DBMS Using the SQL Procedure Pass-Through Facility on page 1129 and your SAS/ACCESS documentation.

CONNECTION TO dbms-name (dbms-query)

CONNECTION TO alias (dbms-query)

Arguments

alias

  • specifies an alias, if one was defined in the CONNECT statement.

dbms-name

  • identifies the DBMS that you are using.

dbms-query

  • specifies the query to send to a DBMS. The query uses the DBMS s dynamic SQL. You can use any SQL syntax that the DBMS understands, even if that is not valid for PROC SQL. However, your DBMS query cannot contain a semicolon because that represents the end of a statement to SAS.

  • The number of tables that you can join with dbms-query is determined by the DBMS. Each CONNECTION TO component counts as one table toward the 32-table PROC SQL limit for joins.

  • See SAS/ACCESS for Relational Databases: Reference for more information about DBMS queries.

CONTAINS condition

Tests whether a string is part of a column s value.

Alias: ?

Restriction: The CONTAINS condition is used only with character operands.

Featured in: Example 7 on page 1152

sql-expression <NOT> CONTAINS sql-expression

Argument

sql-expression

  • is described in sql-expression on page 1113.

EXISTS condition

Tests if a subquery returns one or more rows.

See also: Query Expressions (Subqueries) on page 1116

<NOT> EXISTS (query-expression)

Argument

query-expression

  • is described in query-expression on page 1107.

Details

The EXISTS condition is an operator whose right operand is a subquery. The result of an EXISTS condition is true if the subquery resolves to at least one row. The result of a NOT EXISTS condition is true if the subquery evaluates to zero rows. For example, the following query subsets PROCLIB.PAYROLL (which is shown in Example 2 on page 1141) based on the criteria in the subquery. If the value for STAFF.IDNUM is on the same row as the value CT in PROCLIB.STAFF (which is shown in Example 4 on page 1145), then the matching IDNUM in PROCLIB.PAYROLL is included in the output. Thus, the query returns all the employees from PROCLIB.PAYROLL who live in CT .

 proc sql;    select *      from proclib.payroll p      where exists (select *                       from proclib.staff s                       where p.idnumber=s.idnum                             and state='CT'); 

IN condition

Tests set membership.

Featured in: Example 4 on page 1145

sql-expression <NOT> IN (query-expression constant <, constant >)

Arguments

constant

  • is a number or a quoted character string (or other special notation) that indicates a fixed value. Constants are also called literals .

query-expression

  • is described in query-expression on page 1107.

sql-expression

  • is described in sql-expression on page 1113.

Details

An IN condition tests if the column value that is returned by the sql-expression on the left is a member of the set (of constants or values returned by the query-expression) on the right. The IN condition is true if the value of the left-hand operand is in the set of values that are defined by the right-hand operand.

IS condition

Tests for a missing value.

Featured in: Example 5 on page 1148

sql-expression IS <NOT> NULL MISSING

Argument

sql-expression

  • is described in sql-expression on page 1113.

Details

IS NULL and IS MISSING are predicates that test for a missing value. IS NULL and IS MISSING are used in the WHERE, ON, and HAVING expressions. Each predicate resolves to true if the sql-expression s result is missing and false if it is not missing.

SAS stores a numeric missing value as a period (.) and a character missing value as a blank space. Unlike missing values in some versions of SQL, missing values in SAS always appear first in the collating sequence. Therefore, in Boolean and comparison operations, the following expressions resolve to true in a predicate:

 3>null  -3>null   0>null 

The SAS way of evaluating missing values differs from that of the ANSI Standard for SQL. According to the Standard, these expressions are NULL. See sql-expression on page 1113 for more information on predicates and operators. See PROC SQL and the ANSI Standard on page 1136 for more information on the ANSI Standard.

joined-table

Joins a table with itself or with other tables or views.

Restrictions: Joins are limited to 32 tables.

See also: FROM Clause on page 1077 and query-expression on page 1107

Featured in: Example 4 on page 1145, Example 7 on page 1152, Example 9 on page 1159, Example 13 on page 1168, and Example 14 on page 1172

 [1]  table-name  <<  AS  >  alias  >,  table-name  <<  AS  >  alias  >    <,    table-name  <<  AS  >  alias  >> [2]<(>  table-name  <  INNER  >  JOIN   table-name   ON  sql-expression<)> [3]<(>  table-name   LEFT JOIN   RIGHT JOIN   FULL JOIN   table-name   ON  sql-expression<)> [4]<(>  table-name   CROSS JOIN   table-name  <)> [5]<(>  table-name   UNION JOIN   table-name  <)> [6]<(>  table-name   NATURAL  <  INNER   FULL  <  OUTER  >  LEFT  <  OUTER  >  RIGHT  <  OUTER  >>  JOIN   table-name  <)> 

Arguments

alias

  • specifies an alias for table-name . The AS keyword is optional.

sql-expression

  • is described in sql-expression on page 1113.

table-name

  • can be one of the following:

    • the name of a PROC SQL table.

    • the name of a SAS data view or PROC SQL view.

    • a query-expression. A query-expression in the FROM clause is usually referred to as an in-line view . See FROM Clause on page 1077 for more information about in-line views.

    • a connection to a DBMS in the form of the CONNECTION TO component. See CONNECTION TO on page 1092 for more information.

  • table-name can be a one-level name, a two-level libref.table name, or a physical pathname that is enclosed in single quotation marks.

Note: If you include parentheses, then be sure to include them in pairs. Parentheses are not valid around comma joins (type 1).

Types of Joins

[1]  

Inner join. See Inner Joins on page 1097.

[2]  

Inner join. See Inner Joins on page 1097.

[3]  

Outer join. See Outer Joins on page 1099.

[4]  

Cross join. See Cross Joins on page 1101.

[5]  

Union join. See Union Joins on page 1102.

[6]  

Natural join. See Natural Joins on page 1102.

Joining Tables

When multiple tables, views, or query-expressions are listed in the FROM clause, they are processed to form one table. The resulting table contains data from each contributing table. These queries are referred to as joins .

Conceptually, when two tables are specified, each row of table A is matched with all the rows of table B to produce an internal or intermediate table. The number of rows in the intermediate table ( Cartesian product ) is equal to the product of the number of rows in each of the source tables. The intermediate table becomes the input to the rest of the query in which some of its rows may be eliminated by the WHERE clause or summarized by a summary function.

A common type of join is an equijoin , in which the values from a column in the first table must equal the values of a column in the second table.

Table Limit

PROC SQL can process a maximum of 32 tables for a join. If you are using views in a join, then the number of tables on which the views are based count toward the 32-table limit. Each CONNECTION TO component in the Pass-Through Facility counts as one table.

Specifying the Rows to Be Returned

The WHERE clause or ON clause contains the conditions (sql-expression) under which the rows in the Cartesian product are kept or eliminated in the result table. WHERE is used to select rows from inner joins. ON is used to select rows from inner or outer joins.

The expression is evaluated for each row from each table in the intermediate table described earlier in Joining Tables on page 1096. The row is considered to be matching if the result of the expression is true (a nonzero, nonmissing value) for that row.

Note: You can follow the ON clause with a WHERE clause to further subset the query result. See Example 7 on page 1152 for an example.

Table Aliases

Table aliases are used in joins to distinguish the columns of one table from those in the other table(s). A table name or alias must be prefixed to a column name when you are joining tables that have matching column names. See FROM Clause on page 1077 for more information on table aliases.

Joining a Table with Itself

A single table can be joined with itself to produce more information. These joins are sometimes called reflexive joins . In these joins, the same table is listed twice in the FROM clause. Each instance of the table must have a table alias or you will not be able to distinguish between references to columns in either instance of the table. See Example 13 on page 1168 and Example 14 on page 1172 for examples.

Inner Joins

An inner join returns a result table for all the rows in a table that have one or more matching rows in the other table(s), as specified by the sql-expression. Inner joins can be performed on up to 32 tables in the same query-expression.

You can perform an inner join by using a list of table-names separated by commas or by using the INNER, JOIN, and ON keywords.

The LEFTTAB and RIGHTTAB tables are used to illustrate this type of join:

 Left Table - LEFTTAB  Continent  Export    Country  ----------------------------- NA         wheat     Canada  EUR        corn      France  EUR        rice      Italy  AFR        oil       Egypt 
 Right Table - RIGHTTAB  Continent  Export    Country  ----------------------------- NA         sugar     USA  EUR        corn      Spain  EUR        beets     Belgium  ASIA       rice      Vietnam 

The following example joins the LEFTTAB and RIGHTTAB tables to get the Cartesian product of the two tables. The Cartesian product is the result of combining every row from one table with every row from another table. You get the Cartesian product when you join two tables and do not subset them with a WHERE clause or ON clause.

 proc sql;     title 'The Cartesian Product of';     title2 'LEFTTAB and RIGHTTAB';     select *        from lefttab, righttab; 
 The Cartesian Product of                         LEFTTAB and RIGHTTAB  Continent    Export     Country   Continent   Export    Country  ---------------------------------------------------------------- NA           wheat      Canada    NA          sugar     USA  NA           wheat      Canada    EUR         corn      Spain  NA           wheat      Canada    EUR         beets     Belgium  NA           wheat      Canada    ASIA        rice      Vietnam  EUR          corn       France    NA          sugar     USA  EUR          corn       France    EUR         corn      Spain  EUR          corn       France    EUR         beets     Belgium  EUR          corn       France    ASIA        rice      Vietnam  EUR          rice       Italy     NA          sugar     USA  EUR          rice       Italy     EUR         corn      Spain  EUR          rice       Italy     EUR         beets     Belgium  EUR          rice       Italy     ASIA        rice      Vietnam  AFR          oil        Egypt     NA          sugar     USA  AFR          oil        Egypt     EUR         corn      Spain  AFR          oil        Egypt     EUR         beets     Belgium  AFR          oil        Egypt     ASIA        rice      Vietnam 

The LEFTTAB and RIGHTTAB tables can be joined by listing the table names in the FROM clause. The following query represents an equijoin because the values of Continent from each table are matched. The column names are prefixed with the table aliases so that the correct columns can be selected.

 proc sql;     title 'Inner Join';     select *        from lefttab as l, righttab as r        where l.continent=r.continent; 
 Inner Join  Continent   Export      Country   Continent  Export    Country  ---------------------------------------------------------------- NA           wheat      Canada    NA         sugar     USA  EUR          corn       France    EUR        corn      Spain  EUR          corn       France    EUR        beets     Belgium  EUR          rice       Italy     EUR        corn      Spain  EUR          rice       Italy     EUR        beets     Belgium 

The following PROC SQL step is equivalent to the previous one and shows how to write an equijoin using the INNER JOIN and ON keywords.

 proc sql;     title 'Inner Join';     select *        from lefttab as l inner join             righttab as r        on l.continent=r.continent; 

See Example 4 on page 1145, Example 13 on page 1168, and Example 14 on page 1172 for more examples.

Outer Joins

Outer joins are inner joins that have been augmented with rows that did not match with any row from the other table in the join. The three types of outer joins are left, right, and full.

A left outer join, specified with the keywords LEFT JOIN and ON, has all the rows from the Cartesian product of the two tables for which the sql-expression is true, plus rows from the first (LEFTTAB) table that do not match any row in the second (RIGHTTAB) table.

 proc sql;     title 'Left Outer Join';     select *        from lefttab as l left join             righttab as r        on l.continent=r.continent; 
 Left Outer Join  Continent    Export     Country   Continent  Export    Country  ---------------------------------------------------------------- AFR          oil        Egypt  EUR          rice       Italy     EUR        beets     Belgium  EUR          corn       France    EUR        beets     Belgium  EUR          rice       Italy     EUR        corn      Spain  EUR          corn       France    EUR        corn      Spain  NA           wheat      Canada    NA         sugar     USA 

A right outer join, specified with the keywords RIGHT JOIN and ON, has all the rows from the Cartesian product of the two tables for which the sql-expression is true, plus rows from the second (RIGHTTAB) table that do not match any row in the first (LEFTTAB) table.

 proc sql;     title 'Right Outer Join';     select *        from lefttab as l right join             righttab as r        on l.continent=r.continent; 
 Right Outer Join  Continent   Export      Country   Continent  Export    Country  ----------------------------------------------------------------                                   ASIA       rice      Vietnam  EUR          rice       Italy     EUR        beets     Belgium  EUR          rice       Italy     EUR        corn      Spain  EUR          corn       France    EUR        beets     Belgium  EUR          corn       France    EUR        corn      Spain  NA           wheat      Canada    NA         sugar     USA 

A full outer join, specified with the keywords FULL JOIN and ON, has all the rows from the Cartesian product of the two tables for which the sql-expression is true, plus rows from each table that do not match any row in the other table.

 proc sql;     title 'Full Outer Join';     select *        from lefttab as l full join             righttab as r        on l.continent=r.continent; 
 Full Outer Join  Continent    Export     Country   Continent  Export    Country  ---------------------------------------------------------------- AFR          oil        Egypt                                    ASIA       rice      Vietnam  EUR          rice       Italy     EUR        beets     Belgium  EUR          rice       Italy     EUR        corn      Spain  EUR          corn       France    EUR        beets     Belgium  EUR          corn       France    EUR        corn      Spain  NA           wheat      Canada    NA         sugar     USA 

See Example 7 on page 1152 for another example.

Cross Joins

A cross join returns as its result table the product of the two tables.

Using the LEFTTAB and RIGHTTAB example tables, the following program demonstrates the cross join:

 proc sql;     title 'Cross Join';     select *        from lefttab as l cross join           righttab as r; 
 Cross Join  Continent    Export     Country   Continent  Export    Country  ---------------------------------------------------------------- NA           wheat      Canada    NA         sugar     USA  NA           wheat      Canada    EUR        corn      Spain  NA           wheat      Canada    EUR        beets     Belgium  NA           wheat      Canada    ASIA       rice      Vietnam  EUR          corn       France    NA         sugar     USA  EUR          corn       France    EUR        corn      Spain  EUR          corn       France    EUR        beets     Belgium  EUR          corn       France    ASIA       rice      Vietnam  EUR          rice       Italy     NA         sugar     USA  EUR          rice       Italy     EUR        corn      Spain  EUR          rice       Italy     EUR        beets     Belgium  EUR          rice       Italy     ASIA       rice      Vietnam  AFR          oil        Egypt     NA         sugar     USA  AFR          oil        Egypt     EUR        corn      Spain  AFR          oil        Egypt     EUR        beets     Belgium  AFR          oil        Egypt     ASIA       rice      Vietnam 

The cross join is not functionally different from a Cartesian product join. You would get the same result by submitting the following program:

 proc sql;     select *        from lefttab, righttab; 

Do not use an ON clause with a cross join. An ON clause will cause a cross join to fail. However, you can use a WHERE clause to subset the output.

Union Joins

A union join returns a union of the columns of both tables. The union join places in the results all rows with their respective column values from each input table. Columns that do not exist in one table will have null (missing) values for those rows in the result table. The following example demonstrates a union join.

 proc sql;     title 'Union Join';     select *        from lefttab union join righttab; 
 Union Join  Continent  Export    Country   Continent  Export    Country  -------------------------------------------------------------                                NA         sugar     USA                                 EUR        corn      Spain                                 EUR        beets     Belgium                                 ASIA       rice      Vietnam  NA         wheat     Canada  EUR        corn      France  EUR        rice      Italy  AFR        oil       Egypt 

Using a union join is similar to concatenating tables with the OUTER UNION set operator. See query-expression on page 1107 for more information.

Do not use an ON clause with a union join. An ON clause will cause a union join to fail.

Natural Joins

A natural join selects rows from two tables that have equal values in columns that share the same name and the same type. An error results if two columns have the same name but different types. If join-specification is omitted when specifying a natural join, then INNER is implied . If no like columns are found, then a cross join is performed.

The following examples use these two tables:

 table1         x         y         z  -----------------------------        1         2         3         2         1         8         6         5         4         2         5         6 
 table2         x         b         z  -----------------------------        1         5         3         3         5         4         2         7         8         6         0         4 

The following program demonstrates a natural inner join.

 proc sql;     title 'Natural Inner Join';     select *     from table1 natural join table2; 
 Natural Inner Join         x         z         b         y  ---------------------------------------        1         3         5         2         2         8         7         1         6         4         0         5 

The following program demonstrates a natural left outer join.

 proc sql;     title 'Natural Left Outer Join';     select *        from table1 natural left join table2; 
 Natural Left Outer Join         x         z         b         y  ---------------------------------------        1         3         5         2         2         6         .         5         2         8         7         1         6         4         0         5 

Do not use an ON clause with a natural join. An ON clause will cause a natural join to fail. When using a natural join, an ON clause is implied, matching all like columns.

Joining More Than Two Tables

Inner joins are usually performed on two or three tables, but they can be performed on up to 32 tables in PROC SQL. A join on three tables is described here to explain how and why the relationships work among the tables.

In a three-way join, the sql-expression consists of two conditions: one relates the first table to the second table and the other relates the second table to the third table. It is possible to break this example into stages, performing a two-way join into a temporary table and then joining that table with the third one for the same result. However, PROC SQL can do it all in one step as shown in the next example.

The example shows the joining of three tables: COMM, PRICE, and AMOUNT. To calculate the total revenue from exports for each country, you need to multiply the amount exported (AMOUNT table) by the price of each unit (PRICE table), and you must know the commodity that each country exports (COMM table).

 COMM Table  Continent  Export    Country  ----------------------------- NA         wheat     Canada  EUR        corn      France  EUR        rice      Italy  AFR        oil       Egypt 
 PRICE Table  Export       Price  ------------------- rice          3.56  corn          3.45  oil             18  wheat         2.98 
 AMOUNT Table  Country   Quantity  ------------------- Canada       16000  France        2400  Italy          500  Egypt        10000 
 proc sql;     title  'Total Export Revenue';     select c.Country, p.Export, p.Price,            a.Quantity,a.quantity*p.price            as Total        from comm c, price p, amount a        where c.export=p.export              and c.country=a.country; 
 Total Export Revenue  Country     Export       Price  Quantity    Total  -------------------------------------------------- Italy       rice          3.56       500     1780  France      corn          3.45      2400     8280  Egypt       oil             18     10000   180000  Canada      wheat         2.98     16000    47680 

See Example 9 on page 1159 for another example.

Comparison of Joins and Subqueries

You can often use a subquery or a join to get the same result. However, it is often more efficient to use a join if the outer query and the subquery do not return duplicate rows. For example, the following queries produce the same result. The second query is more efficient:

 proc sql;     select IDNumber, Birth        from proclib.payroll        where IDNumber in (select idnum                        from proclib.staff                        where lname like 'B%');  proc sql;     select   p.IDNumber, p.Birth        from proclib.payroll p, proclib.staff s        where p.idnumber=s.idnum              and s.lname like 'B%'; 

Note: PROCLIB.PAYROLL is shown in Example 2 on page 1141.

LIKE condition

Tests for a matching pattern.

sql-expression <NOT> LIKE sql-expression <ESCAPE character-expression >

Arguments

sql-expression

  • is described in sql-expression on page 1113.

character-expression

  • is an sql-expression that evaluates to a single character. The operands of character-expression must be character or string literals; they cannot be column names.

  • Note: If you use an ESCAPE clause, then the pattern-matching specification must be a quoted string or quoted concatenated string; it cannot contain column names.

Details

The LIKE condition selects rows by comparing character strings with a pattern-matching specification. It resolves to true and displays the matched string(s) if the left operand matches the pattern specified by the right operand.

The ESCAPE clause is used to search for literal instances of the percent (%) and underscore (_) characters, which are usually used for pattern matching.

Patterns for Searching

Patterns are composed of three classes of characters:

underscore (_)

  • matches any single character.

percent sign (%)

  • matches any sequence of zero or more characters.

any other character

  • matches that character.

These patterns can appear before, after, or on both sides of characters that you want to match. The LIKE condition is case-sensitive.

The following list uses these values: Smith , Smooth , Smothers , Smart , and Smuggle .

  • 'Sm%'

    • matches Smith , Smooth , Smothers , Smart , Smuggle .

  • '%th'

    • matches Smith , Smooth .

  • 'S__gg%'

    • matches Smuggle .

  • 'S_o'

    • matches a three-letter word, so it has no matches here.

  • 'S_o%'

    • matches Smooth , Smothers .

  • 'S%th'

    • matches Smith , Smooth .

  • 'Z'

    • matches the single, uppercase character Z only, so it has no matches here.

Searching for Literal % and _

Because the % and _ characters have special meaning in the context of the LIKE condition, you must use the ESCAPE clause to search for these character literals in the input character string.

These example use the values app , a_% , a__ , bbaa1 , and ba_1 .

  • The condition like 'a_%' matches app , a_% , and a__ , because the underscore (_) in the search pattern matches any single character (including the underscore), and the percent (%) in the search pattern matches zero or more characters, including % and _ .

  • The condition like 'a_^%' escape '^' matches only a_% , because the escape character (^) specifies that the pattern search for a literal % .

  • The condition like 'a_%' escape '_' matches none of the values, because the escape character (_) specifies that the pattern search for an a followed by a literal % , which does not apply to any of these values.

Searching for Mixed-Case Strings

To search for mixed-case strings, use the UPCASE function to make all the names uppercase before entering the LIKE condition:

 upcase(name) like 'SM%'; 

Note: When you are using the % character, be aware of the effect of trailing blanks. You may have to use the TRIM function to remove trailing blanks in order to match values.

LOWER function

Converts the case of a character string to lowercase.

See also: UPPER function on page 1128

LOWER (sql-expression)

Argument

sql-expression

  • must resolve to a character string and is described in sql-expression on page 1113.

Details

The LOWER function operates on character strings. LOWER changes the case of its argument to all lowercase.

Note: The LOWER function is provided for compatibility with the ANSI SQL standard. You can also use the SAS function LOWCASE.

query-expression

Retrieves data from tables.

See also: table-expression on page 1127, Query Expressions (Subqueries) on page 1116, and In-Line Views on page 1078

table-expression < set-operator table-expression> < set-operator table-expression>

Arguments

table-expression

  • is described in table-expression on page 1127.

set-operator

  • is one of the following:

    • INTERSECT <CORRESPONDING> <ALL>

    • OUTER UNION <CORRESPONDING>

    • UNION <CORRESPONDING> <ALL>

    • EXCEPT <CORRESPONDING> <ALL>

Query Expressions and Table Expressions

A query-expression is one or more table-expressions. Multiple table expressions are linked by set operators. The following figure illustrates the relationship between table-expressions and query-expressions.

click to expand

Set Operators

PROC SQL provides these set operators:

OUTER UNION

  • concatenates the query results.

UNION

  • produces all unique rows from both queries.

EXCEPT

  • produces rows that are part of the first query only.

INTERSECT

  • produces rows that are common to both query results.

A query-expression with set operators is evaluated as follows.

  • Each table-expression is evaluated to produce an (internal) intermediate result table.

  • Each intermediate result table then becomes an operand linked with a set operator to form an expression, for example, A UNION B.

  • If the query-expression involves more than two table-expressions, then the result from the first two becomes an operand for the next set operator and operand, such as (A UNION B) EXCEPT C, ((A UNION B) EXCEPT C) INTERSECT D, and so on.

  • Evaluating a query-expression produces a single output table.

Set operators follow this order of precedence unless they are overridden by parentheses in the expression(s): INTERSECT is evaluated first. OUTER UNION, UNION, and EXCEPT have the same level of precedence.

PROC SQL performs set operations even if the tables or views that are referred to in the table-expressions do not have the same number of columns. The reason for this behavior is that the ANSI Standard for SQL requires that tables or views that are involved in a set operation have the same number of columns and that the columns have matching data types. If a set operation is performed on a table or view that has fewer columns than the one(s) with which it is being linked, then PROC SQL extends the table or view with fewer columns by creating columns with missing values of the appropriate data type. This temporary alteration enables the set operation to be performed correctly.

CORRESPONDING (CORR) Keyword

The CORRESPONDING keyword is used only when a set operator is specified. CORR causes PROC SQL to match the columns in table-expressions by name and not by ordinal position. Columns that do not match by name are excluded from the result table, except for the OUTER UNION operator. See OUTER UNION on page 1109.

For example, when performing a set operation on two table-expressions, PROC SQL matches the first specified column-name (listed in the SELECT clause) from one table-expression with the first specified column-name from the other. If CORR is omitted, then PROC SQL matches the columns by ordinal position.

ALL Keyword

The set operators automatically eliminate duplicate rows from their output tables. The optional ALL keyword preserves the duplicate rows, reduces the execution by one step, and thereby improves the query-expression s performance. You use it when you want to display all the rows resulting from the table-expressions, rather than just the unique rows. The ALL keyword is used only when a set operator is also specified.

OUTER UNION

Performing an OUTER UNION is very similar to performing the SAS DATA step with a SET statement. The OUTER UNION concatenates the intermediate results from the table-expressions. Thus, the result table for the query-expression contains all the rows produced by the first table-expression followed by all the rows produced by the second table-expression. Columns with the same name are in separate columns in the result table.

For example, the following query expression concatenates the ME1 and ME2 tables but does not overlay like-named columns. Output 45.1 shows the result.

 ME1  IDnum     Jobcode     Salary     Bonus  --------------------------------------- 1400      ME1          29769       587  1403      ME1          28072       342  1120      ME1          28619       986  1120      ME1          28619       986 
 ME2  IDnum     Jobcode     Salary  ----------------------------- 1653      ME2          35108  1782      ME2          35345  1244      ME2          36925 
 proc sql;     title 'ME1 and ME2: OUTER UNION';     select *        from me1     outer union     select *        from me2; 
Output 45.1: OUTER UNION of ME1 and ME2 Tables
start example
 ME1 and ME2: OUTER UNION  IDnum     Jobcode     Salary     Bonus  IDnum     Jobcode      Salary  --------------------------------------------------------------------- 1400      ME1          29769       587                              .  1403      ME1          28072       342                              .  1120      ME1          28619       986                              .  1120      ME1          28619       986                              .                             .         .  1653      ME2           35108                             .         .  1782      ME2           35345                             .         .  1244      ME2           36925 
end example
 

Concatenating tables with the OUTER UNION set operator is similar to performing a union join. See Union Joins on page 1102 for more information.

To overlay columns with the same name, use the CORRESPONDING keyword.

 proc sql;     title 'ME1 and ME2: OUTER UNION CORRESPONDING';     select *        from me1     outer union corr     select *        from me2; 
 ME1 and ME2: OUTER UNION CORRESPONDING  IDnum     Jobcode     Salary     Bonus  -------------------------------------- 1400      ME1          29769       587  1403      ME1          28072       342  1120      ME1          28619       986  1120      ME1          28619       986  1653      ME2          35108         .  1782      ME2          35345         .  1244      ME2          36925         . 

In the resulting concatenated table, notice the following:

  • OUTER UNION CORRESPONDING retains all nonmatching columns.

  • For columns with the same name, if a value is missing from the result of the first table-expression, then the value in that column from the second table-expression is inserted.

  • The ALL keyword is not used with OUTER UNION because this operator s default action is to include all rows in a result table. Thus, both rows from the table ME1 where IDnum is 1120 appear in the output.

UNION

The UNION operator produces a table that contains all the unique rows that result from both table-expressions. That is, the output table contains rows produced by the first table-expression, the second table-expression, or both.

Columns are appended by position in the tables, regardless of the column names. However, the data type of the corresponding columns must match or the union will not occur. PROC SQL issues a warning message and stops executing.

The names of the columns in the output table are the names of the columns from the first table-expression unless a column (such as an expression) has no name in the first table-expression. In such a case, the name of that column in the output table is the name of the respective column in the second table-expression.

In the following example, PROC SQL combines the two tables:

 proc sql;     title 'ME1 and ME2: UNION';     select *        from me1     union     select *        from me2; 
 ME1 and ME2: UNION  IDnum     Jobcode     Salary     Bonus  -------------------------------------- 1120      ME1          28619       986  1244      ME2          36925         .  1400      ME1          29769       587  1403      ME1          28072       342  1653      ME2          35108         .  1782      ME2          35345         . 

In the following example, ALL includes the duplicate row from ME1. In addition, ALL changes the sorting by specifying that PROC SQL make one pass only. Thus, the values from ME2 are simply appended to the values from ME1.

 proc sql;     title 'ME1 and ME2: UNION ALL';     select *        from me1     union all     select *        from me2; 
 ME1 and ME2: UNION ALL  IDnum     Jobcode     Salary     Bonus  -------------------------------------- 1400      ME1          29769       587  1403      ME1          28072       342  1120      ME1          28619       986  1120      ME1          28619       986  1653      ME2          35108         .  1782      ME2          35345         .  1244      ME2          36925         . 

See Example 5 on page 1148 for another example.

EXCEPT

The EXCEPT operator produces (from the first table-expression) an output table that has unique rows that are not in the second table-expression. If the intermediate result from the first table-expression has at least one occurrence of a row that is not in the intermediate result of the second table-expression, then that row (from the first table-expression) is included in the result table.

In the following example, the IN_USA table contains flights to cities within and outside the USA. The OUT_USA table contains flights only to cities outside the USA. This example returns only the rows from IN_USA that are not also in OUT_USA:

 proc sql;      title 'Flights from IN_USA Only';      select * from in_usa      except      select * from out_usa; 
 IN_USA  Flight    Dest  ------------------ 145       ORD  156       WAS  188       LAX  193       FRA  207       LON 
 OUT_USA  Flight    Dest  ------------------ 193       FRA  207       LON  311       SJA 
 Flights from IN_USA Only        Flight    Dest        ------------------       145       ORD        156       WAS        188       LAX 

INTERSECT

The INTERSECT operator produces an output table that has rows that are common to both tables. For example, using the IN_USA and OUT_USA tables shown above, the following example returns rows that are in both tables:

 proc sql;      title 'Flights from Both IN_USA and OUT_USA';      select * from in_usa      intersect      select * from out_usa; 
 Flights from Both IN_USA and OUT_USA            Flight    Dest            ------------------           193       FRA            207       LON 

sql-expression

Produces a value from a sequence of operands and operators.

operand operator operand

Arguments

operand

  • is one of the following:

    • a constant , which is a number or a quoted character string (or other special notation) that indicates a fixed value. Constants are also called literals . Constants are described in SAS Language Reference: Dictionary .

    • a column-name, which is described in column-name on page 1091.

    • a CASE expression, which is described in CASE expression on page 1087.

    • a SAS function, which is any SAS function except LAG, DIF, and SOUND. Functions are described in SAS Language Reference: Dictionary .

    • the ANSI SQL functions COALESCE, BTRIM, LOWER, UPPER, and SUBSTRING.

    • a summary-function, which is described in summary-function on page 1121.

    • a query-expression, which is described in query-expression on page 1107.

    • the USER literal, which references the userid of the person who submitted the program. The userid that is returned is operating environment-dependent, but PROC SQL uses the same value that the &SYSJOBID macro variable has on the operating environment.

operator

  • is described in Operators and the Order of Evaluation on page 1114.

Note: SAS functions, including summary functions, can stand alone as SQL expressions. For example

 select min(x) from  table  ;  select scan(y,4) from  table  ; 

SAS Functions

PROC SQL supports the same SAS functions as the DATA step, except for the functions LAG, DIF, and SOUND. For example, the SCAN function is used in the following query:

 select style, scan(street,1) format=.     from houses; 

See SAS Language Reference: Dictionary for complete documentation on SAS functions. Summary functions are also SAS functions. See summary-function on page 1121 for more information.

USER Literal

USER can be specified in a view definition, for example, to create a view that restricts access to those in the user s department. Note that the USER literal value is stored in uppercase, so it is advisable to use the UPCASE function when comparing to this value:

 create view myemp as     select * from dept12.employees        where upcase(manager)=user; 

This view produces a different set of employee information for each manager who references it.

Operators and the Order of Evaluation

The order in which operations are evaluated is the same as in the DATA step with this one exception: NOT is grouped with the logical operators AND and OR in PROC SQL; in the DATA step, NOT is grouped with the unary plus and minus signs.

Unlike missing values in some versions of SQL, missing values in SAS always appear first in the collating sequence. Therefore, in Boolean and comparison operations, the following expressions resolve to true in a predicate:

 3>null  -3>null   0>null 

You can use parentheses to group values or to nest mathematical expressions. Parentheses make expressions easier to read and can also be used to change the order of evaluation of the operators. Evaluating expressions with parentheses begins at the deepest level of parentheses and moves outward. For example, SAS evaluates A+B*C as A+(B*C), although you can add parentheses to make it evaluate as (A+B)*C for a different result.

Higher priority operations are performed first: that is, group 0 operators are evaluated before group 5 operators. The following table shows the operators and their order of evaluation, including their priority groups.

Table 45.1: Operators and Order of Evaluation

Group

Operator

Description

()

forces the expression enclosed to be evaluated first

1

case-expression

selects result values that satisfy specified conditions

2

**

raises to a power

 

unary +, unary -

indicates a positive or negative number

3

*

multiplies

 

/

divides

4

+

adds

 

ˆ’

subtracts

5

concatenates

6

<NOT> BETWEEN condition

See BETWEEN condition on page 1085.

 

<NOT> CONTAINS condition

see CONTAINS condition on page 1093.

 

<NOT> EXISTS condition

See EXISTS condition on page 1093.

 

<NOT> IN condition

See IN condition on page 1094.

 

IS <NOT> condition

See IS condition on page 1095.

 

<NOT> LIKE condition

See LIKE condition on page 1105.

7

=, eq

equals

 

=, ^=, < >, ne

does not equal

 

>, gt

is greater than

 

<, lt

is less than

 

>=, ge

is greater than or equal to

 

<=, le

is less than or equal to

 

=*

sounds like (use with character operands only). See Example 11 on page 1164.

 

eqt

equal to truncated strings (use with character operands only). See Truncated String Comparison Operators on page 1116.

 

gtt

greater than truncated strings

 

ltt

less than truncated strings

 

get

greater than or equal to truncated strings

 

let

less than or equal to truncated strings

 

net

not equal to truncated strings

8

&, AND

indicates logical AND

9

, OR

indicates logical OR

10

, ^, NOT

indicates logical NOT

Symbols for operators might vary, depending on your operating environment. See SAS Language Reference: Dictionary for more information on operators and expressions.

Truncated String Comparison Operators

PROC SQL supports truncated string comparison operators (see Group 7 in Table 45.1 on page 1115). In a truncated string comparison, the comparison is performed after making the strings the same length by truncating the longer string to be the same length as the shorter string. For example, the expression 'TWOSTORY' eqt 'TWO' is true because the string 'TWOSTORY is reduced to TWO before the comparison is performed. Note that the truncation is performed internally; neither operand is permanently changed.

Note: Unlike the DATA step, PROC SQL does not support the colon operators (such as =:, >:, and <=:) for truncated string comparisons. Use the alphabetic operators (such as EQT, GTT, and LET).

Query Expressions (Subqueries)

A query-expression is called a subquery when it is used in a WHERE or HAVING clause. A subquery is a query-expression that is nested as part of another query-expression. A subquery selects one or more rows from a table based on values in another table.

Depending on the clause that contains it, a subquery can return a single value or multiple values. If more than one subquery is used in a query-expression, then the innermost query is evaluated first, then the next innermost query, and so on, moving outward.

PROC SQL allows a subquery (contained in parentheses) at any point in an expression where a simple column value or constant can be used. In this case, a subquery must return a single value , that is, one row with only one column.

The following is an example of a subquery that returns one value. This PROC SQL step subsets the PROCLIB.PAYROLL table based on information in the PROCLIB.STAFF table. (PROCLIB.PAYROLL is shown in Example 2 on page 1141, and PROCLIB.STAFF is shown in Example 4 on page 1145.) PROCLIB.PAYROLL contains employee identification numbers (IdNumber) and their salaries (Salary) but does not contain their names. If you want to return only the row from PROCLIB.PAYROLL for one employee, then you can use a subquery that queries the PROCLIB.STAFF table, which contains the employees identification numbers and their names (Lname and Fname).

 options ls=64 nodate nonumber;  proc sql;      title 'Information for Earl Bowden';      select *         from proclib.payroll         where idnumber=               (select idnum               from proclib.staff               where upcase(lname)='BOWDEN'); 
 Information for Earl Bowden  Id  Number   Gender  Jobcode    Salary    Birth    Hired  ---------------------------------------------------- 1403     M       ME1         28072  28JAN69  21DEC91 

Subqueries can return multiple values . The following example uses the tables PROCLIB.DELAY and PROCLIB.MARCH. These tables contain information about the same flights and have the Flight column in common. The following subquery returns all the values for Flight in PROCLIB.DELAY for international flights. The values from the subquery complete the WHERE clause in the outer query. Thus, when the outer query is executed, only the international flights from PROCLIB. MARCH are in the output.

 options ls=64 nodate nonumber;  proc sql outobs=5;     title 'International Flights from';     title2 'PROCLIB.MARCH';     select Flight, Date, Dest, Boarded        from proclib.march        where flight in              (select flight               from proclib.delay               where destype='International'); 
 International Flights from           PROCLIB.MARCH  Flight     Date  Dest   Boarded  ------------------------------- 219     01MAR94  LON        198  622     01MAR94  FRA        207  132     01MAR94  YYZ        115  271     01MAR94  PAR        138  219     02MAR94  LON        147 

Sometimes it is helpful to compare a value with a set of values returned by a subquery. The keywords ANY or ALL can be specified before a subquery when the subquery is the right-hand operand of a comparison. If ALL is specified, then the comparison is true only if it is true for all values that are returned by the subquery. If a subquery returns no rows, then the result of an ALL comparison is true for each row of the outer query.

If ANY is specified, then the comparison is true if it is true for any one of the values that are returned by the subquery. If a subquery returns no rows, then the result of an ANY comparison is false for each row of the outer query.

The following example selects all those in PROCLIB.PAYROLL who earn more than the highest paid ME3 :

 options ls=64 nodate nonumber ;  proc sql;  title ''Employees who Earn More than'';  title2 ''All ME's'';     select *       from proclib.payroll       where salary > all (select salary                           from proclib.payroll                           where jobcode='ME3'); 
 Employees who Earn More than                           All ME's  Id  Number     Gender  Jobcode    Salary    Birth     Hired  ------------------------------------------------------- 1333       M       PT2         88606  30MAR61   10FEB81  1739       M       PT1         66517  25DEC64   27JAN91  1428       F       PT1         68767  04APR60   16NOV91  1404       M       PT2         91376  24FEB53   01JAN80  1935       F       NA2         51081  28MAR54   16OCT81  1905       M       PT1         65111  16APR72   29MAY92  1407       M       PT1         68096  23MAR69   18MAR90  1410       M       PT2         84685  03MAY67   07NOV86  1439       F       PT1         70736  06MAR64   10SEP90  1545       M       PT1         66130  12AUG59   29MAY90  1106       M       PT2         89632  06NOV57   16AUG84  1442       F       PT2         84536  05SEP66   12APR88  1417       M       NA2         52270  27JUN64   07MAR89  1478       M       PT2         84203  09AUG59   24OCT90  1556       M       PT1         71349  22JUN64   11DEC91  1352       M       NA2         53798  02DEC60   16OCT86  1890       M       PT2         91908  20JUL51   25NOV79  1107       M       PT2         89977  09JUN54   10FEB79  1830       F       PT2         84471  27MAY57   29JAN83  1928       M       PT2         89858  16SEP54   13JUL90  1076       M       PT1         66558  14OCT55   03OCT91 

Note: See the first item in Subqueries and Efficiency on page 1119 for a note about efficiency when using ALL.

In order to visually separate a subquery from the rest of the query, you can enclose the subquery in any number of pairs of parentheses.

Correlated Subqueries

In a correlated subquery, the WHERE expression in a subquery refers to values in a table in the outer query. The correlated subquery is evaluated for each row in the outer query. With correlated subqueries, PROC SQL executes the subquery and the outer query together.

The following example uses the PROCLIB.DELAY and PROCLIB.MARCH tables. A DATA step (PROCLIB.DELAY on page 1404) creates PROCLIB.DELAY. PROCLIB.MARCH is shown in Example 13 on page 1168. PROCLIB.DELAY has the Flight, Date, Orig, and Dest columns in common with PROCLIB.MARCH:

 proc sql outobs=5;     title 'International Flights';     select *        from proclib.march        where 'International' in             (select destype              from proclib.delay              where march.Flight=delay.Flight); 

The subquery resolves by substituting every value for MARCH.Flight into the subquery s WHERE clause, one row at a time. For example, when MARCH.Flight= 219 , the subquery resolves as follows:

  1. PROC SQL retrieves all the rows from DELAY where Flight= 219 and passes their DESTYPE values to the WHERE clause.

  2. PROC SQL uses the DESTYPE values to complete the WHERE clause:

     where 'International' in     ('International','International', ...) 
  3. The WHERE clause checks to see if International is in the list. Because it is, all rows from MARCH that have a value of 219 for Flight become part of the output.

The following output contains the rows from MARCH for international flights only.

Output 45.2: International Flights for March
start example
 International Flights  Flight      Date    Depart  Orig  Dest     Miles   Boarded  Capacity  -------------------------------------------------------------------- 219      01MAR94      9:31  LGA   LON       3442       198       250  622      01MAR94     12:19  LGA   FRA       3857       207       250  132      01MAR94     15:35  LGA   YYZ        366       115       178  271      01MAR94     13:17  LGA   PAR       3635       138       250  219      02MAR94      9:31  LGA   LON       3442       147       250 
end example
 

Subqueries and Efficiency

  • Use the MAX function in a subquery instead of the ALL keyword before the subquery. For example, the following queries produce the same result, but the second query is more efficient:

     proc sql;     select * from proclib.payroll     where salary> all(select salary                       from proclib.payroll                       where jobcode='ME3');  proc sql;     select * from proclib.payroll     where salary> (select max(salary)                    from proclib.payroll                    where jobcode='ME3'); 
  • With subqueries, use IN instead of EXISTS when possible. For example, the following queries produce the same result, but the second query is usually more efficient:

     proc sql;     select *        from proclib.payroll p        where exists (select *                      from staff s                      where p.idnum=s.idnum                         and state='CT');  proc sql;     select *        from proclib.payroll        where idnum in (select idnum                           from staff                           where state='CT'); 

SUBSTRING function

Returns a part of a character expression.

SUBSTRING (sql-expression FROM start <FOR length >)

  • sql-expression must be a character string and is described in sql-expression on page 1113.

  • start is a number (not a variable or column name) that specifies the position, counting from the left end of the character string, at which to begin extracting the substring.

  • length is a number (not a variable or column name) that specifies the length of the substring that is to be extracted.

Details

The SUBSTRING function operates on character strings. SUBSTRING returns a specified part of the input character string, beginning at the position that is specified by start . If length is omitted, then the SUBSTRING function returns all characters from start to the end of the input character string. The values of start and length must be numbers (not variables) and can be positive, negative, or zero.

If start is greater than the length of the input character string, then the SUBSTRING function returns a zero-length string.

If start is less than 1, then the SUBSTRING function begins extraction at the beginning of the input character string.

If length is specified, then the sum of start and length cannot be less than start or an error is returned. If the sum of start and length is greater than the length of the input character string, then the SUBSTRING function returns all characters from start to the end of the input character string. If the sum of start and length is less than 1, then the SUBSTRING function returns a zero-length string.

Note: The SUBSTRING function is provided for compatibility with the ANSI SQL standard. You can also use the SAS function SUBSTR.

summary-function

Performs statistical summary calculations.

Restriction: A summary function cannot appear in an ON clause or a WHERE clause.

See also: GROUP BY on page 1079, HAVING Clause on page 1080, SELECT Clause on page 1072, and table-expression on page 1127

Featured in: Example 8 on page 1157, Example 12 on page 1166, and Example 15 on page 1174

summary-function (<DISTINCT ALL> sql-expression)

Arguments

summary-function

  • is one of the following:

  • AVGMEAN

    • arithmetic mean or average of values

  • COUNTFREQN

    • number of nonmissing values

  • CSS

    • corrected sum of squares

  • CV

    • coefficient of variation (percent)

  • MAX

    • largest value

  • MIN

    • smallest value

  • NMISS

    • number of missing values

  • PRT

    • probability of a greater absolute value of Student s t

  • RANGE

    • range of values

  • STD

    • standard deviation

  • STDERR

    • standard error of the mean

  • SUM

    • sum of values

  • SUMWGT

    • sum of the WEIGHT variable values [*]

  • T

    • Student s t value for testing the hypothesis that the population mean is zero

  • USS

    • uncorrected sum of squares

  • VAR

    • variance

    For a description and the formulas used for these statistics, see Appendix 1, SAS Elementary Statistics Procedures, on page 1353.

DISTINCT

  • specifies that only the unique values of sql-expression be used in the calculation.

ALL

  • specifies that all values of sql-expression be used in the calculation. If neither DISTINCT nor ALL is specified, then ALL is used.

sql-expression

  • is described in sql-expression on page 1113.

Summarizing Data

Summary functions produce a statistical summary of the entire table or view that is listed in the FROM clause or for each group that is specified in a GROUP BY clause. If GROUP BY is omitted, then all the rows in the table or view are considered to be a single group. These functions reduce all the values in each row or column in a table to one summarizing or aggregate value. For this reason, these functions are often called aggregate functions . For example, the sum (one value) of a column results from the addition of all the values in the column.

Counting Rows

The COUNT function counts rows. COUNT(*) returns the total number of rows in a group or in a table. If you use a column name as an argument to COUNT, then the result is the total number of rows in a group or in a table that have a nonmissing value for that column. If you want to count the unique values in a column, then specify COUNT(DISTINCT column ).

If the SELECT clause of a table-expression contains one or more summary functions and that table-expression resolves to no rows, then the summary function results are missing values. The following are exceptions that return zeros:

  • COUNT(*)

  • COUNT(<DISTINCT> sql-expression)

  • NMISS(<DISTINCT> sql-expression)

See Example 8 on page 1157 and Example 15 on page 1174 for examples.

Calculating Statistics Based on the Number of Arguments

The number of arguments that is specified in a summary function affects how the calculation is performed. If you specify a single argument, then the values in the column are calculated. If you specify multiple arguments, then the arguments or columns that are listed are calculated for each row. For example, consider calculations on the following table.

 proc sql;     title 'Summary Table';     select * from summary; 
 Summary Table         X         Y         Z  ----------------------------        1         3         4         2         4         5         8         9         4         4         5         4 

If you use one argument in the function, then the calculation is performed on that column only. If you use more than one argument, then the calculation is performed on each row of the specified columns. In the following PROC SQL step, the MIN and MAX functions return the minimum and maximum of the columns they are used with. The SUM function returns the sum of each row of the columns specified as arguments:

 proc sql;      select min(x) as Colmin_x,             min(y) as Colmin_y,             max(z) as Colmax_z,             sum(x,y,z) as Rowsum         from summary; 
 Summary Table  Colmin_x   Colmin_y  Colmax_z    Rowsum  ---------------------------------------        1          3         5         8         1          3         5        11         1          3         5        21         1          3         5        13 

Remerging Data

When you use a summary function in a SELECT clause or a HAVING clause, you might see the following message in the SAS log:

 NOTE: The query requires remerging summary        statistics back with the original        data. 

The process of remerging involves two passes through the data. On the first pass, PROC SQL

  • calculates and returns the value of summary functions. It then uses the result to calculate the arithmetic expressions in which the summary function participates.

  • groups data according to the GROUP BY clause.

On the second pass, PROC SQL retrieves any additional columns and rows that it needs to show in the output.

The following examples use the PROCLIB.PAYROLL table (shown in Example 2 on page 1141) to show when remerging of data is and is not necessary.

The first query requires remerging. The first pass through the data groups the data by Jobcode and resolves the AVG function for each group. However, PROC SQL must make a second pass in order to retrieve the values of IdNumber and Salary.

 proc sql outobs=10;     title 'Salary Information';     title2 '(First 10 Rows Only)';     select IdNumber, Jobcode, Salary,            avg(salary) as AvgSalary        from proclib.payroll        group by jobcode; 
 Salary Information           (First 10 Rows Only)  Id  Number   Jobcode    Salary  AvgSalary  ------------------------------------- 1704     BCK         25465   25794.22  1677     BCK         26007   25794.22  1383     BCK         25823   25794.22  1845     BCK         25996   25794.22  1100     BCK         25004   25794.22  1663     BCK         26452   25794.22  1673     BCK         25477   25794.22  1389     BCK         25028   25794.22  1834     BCK         26896   25794.22  1132     FA1         22413   23039.36 

You can change the previous query to return only the average salary for each jobcode. The following query does not require remerging because the first pass of the data does the summarizing and the grouping. A second pass is not necessary.

 proc sql outobs=10;     title 'Average Salary for Each Jobcode';     select Jobcode, avg(salary) as AvgSalary     from proclib.payroll     group by jobcode; 
 Average Salary for Each Jobcode         Jobcode  AvgSalary         ------------------        BCK       25794.22         FA1       23039.36         FA2       27986.88         FA3       32933.86         ME1       28500.25         ME2       35576.86         ME3       42410.71         NA1        42032.2         NA2          52383         PT1          67908 

When you use the HAVING clause, PROC SQL may have to remerge data to resolve the HAVING expression.

First, consider a query that uses HAVING but that does not require remerging. The query groups the data by values of Jobcode, and the result contains one row for each value of Jobcode and summary information for people in each Jobcode. On the first pass, the summary functions provide values for the Number , Average Age , and Average Salary columns. The first pass provides everything that PROC SQL needs to resolve the HAVING clause, so no remerging is necessary.

 proc sql outobs=10;  title 'Summary Information for Each Jobcode';  title2 '(First 10 Rows Only)';     select Jobcode,            count(jobcode) as number                label='Number',            avg(int((today()-birth)/365.25))                as avgage format=2.                label='Average Age',            avg(salary) as avgsal format=dollar8.                label='Average Salary'        from proclib.payroll        group by jobcode        having avgage ge 30; 
 Summary Information for Each Jobcode          (First 10 Rows Only)                     Average   Average  Jobcode    Number      Age    Salary  ------------------------------------ BCK             9       36   ,794  FA1            11       33   ,039  FA2            16       37   ,987  FA3             7       39   ,934  ME1             8       34   ,500  ME2            14       39   ,577  ME3             7       42   ,411  NA1             5       30   ,032  NA2             3       42   ,383  PT1             8       38   ,908 

In the following query, PROC SQL remerges the data because the HAVING clause uses the SALARY column in the comparison and SALARY is not in the GROUP BY clause.

 proc sql outobs=10;  title 'Employees who Earn More than the';  title2 'Average for Their Jobcode';  title3 '(First 10 Rows Only)';     select Jobcode, Salary,            avg(salary) as AvgSalary        from proclib.payroll        group by jobcode        having salary > AvgSalary; 
 Employees who Earn More than the       Average for Their Jobcode          (First 10 Rows Only)      Jobcode    Salary  AvgSalary      ----------------------------     BCK         26007   25794.22      BCK         25823   25794.22      BCK         25996   25794.22      BCK         26452   25794.22      BCK         26896   25794.22      FA1         23177   23039.36      FA1         23738   23039.36      FA1         23979   23039.36      FA1         23916   23039.36      FA1         23644   23039.36 

Keep in mind that PROC SQL remerges data when

  • the values returned by a summary function are used in a calculation. For example, the following query returns the values of X and the percent of the total for each row. On the first pass, PROC SQL computes the sum of X, and on the second pass PROC SQL computes the percentage of the total for each value of X:

     proc sql;     title 'Percentage of the Total';     select X, (100*x/sum(X)) as Pct_Total        from summary; 
     Percentage of the Total           x  Pct_Total   --------------------         32   14.81481          86   39.81481          49   22.68519          49   22.68519 
  • the values returned by a summary function are compared to values of a column that is not specified in the GROUP BY clause. For example, the following query uses the PROCLIB.PAYROLL table. PROC SQL remerges data because the column Salary is not specified in the GROUP BY clause:

     proc sql;     select jobcode, salary,            avg(salary) as avsal        from proclib.payroll        group by jobcode        having salary > avsal; 
  • a column from the input table is specified in the SELECT clause and is not specified in the GROUP BY clause. This rule does not refer to columns used as arguments to summary functions in the SELECT clause.

    For example, in the following query, the presence of IdNumber in the SELECT clause causes PROC SQL to remerge the data because IdNumber is not involved in grouping or summarizing during the first pass. In order for PROC SQL to retrieve the values for IdNumber, it must make a second pass through the data.

     proc sql;     select IdNumber, jobcode,            avg(salary) as avsal        from proclib.payroll        group by jobcode; 

table-expression

Defines part or all of a query-expression.

See also: query-expression on page 1107

SELECT < DISTINCT > object-item <, object-item >

  • INTO : macro-variable-specification

    • <, : macro-variable-specification >>

    FROM from-list

  • < WHERE sql-expression>

  • < GROUP BY group-by -item <, group-by-item >>

  • < HAVING sql-expression>

See SELECT Statement on page 1071 for complete information on the SELECT statement.

Details

A table-expression is a SELECT statement. It is the fundamental building block of most SQL procedure statements. You can combine the results of multiple table-expressions with set operators, which creates a query-expression. Use one ORDER BY clause for an entire query-expression. Place a semicolon only at the end of the entire query-expression. A query-expression is often only one SELECT statement or table-expression.

UPPER function

Converts the case of a character string to uppercase.

See also: LOWER function on page 1107

UPPER (sql-expression)

  • sql-expression must be a character string and is described in sql-expression on page 1113.

Details

The UPPER function operates on character strings. UPPER converts the case of its argument to all uppercase.

[*] Currently, there is no way to designate a WEIGHT variable for a table in PROC SQL. Thus, each row (or observation) has a weight of 1.




Base SAS 9.1.3 Procedures Guide (Vol. 2)
Base SAS 9.1 Procedures Guide, Volumes 1, 2, 3 and 4
ISBN: 1590472047
EAN: 2147483647
Year: 2004
Pages: 142

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net