Syntax


The TREE procedure is invoked by the following statements:

  • PROC TREE < options > ;

    • NAME variable ;

    • HEIGHT variable ;

    • PARENT variable ;

    • BY variables ;

    • COPY variables ;

    • FREQ variable ;

    • ID variable ;

If the input data set has been created by CLUSTER or VARCLUS, the only statement required is the PROC TREE statement. The BY, COPY, FREQ, HEIGHT, ID, NAME, and PARENT statements are described after the PROC TREE statement.
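As a minimal sketch of this case, the following statements build a tree data set with PROC CLUSTER and pass it to PROC TREE with no other statements. The SASHELP.CLASS sample data, the AVERAGE method, and the data set name TREE are illustrative assumptions, not part of the original text.

```sas
/* Build a tree data set; OUTTREE= writes _NAME_, _PARENT_, _HEIGHT_, and so on */
proc cluster data=sashelp.class method=average outtree=tree noprint;
   var height weight;
   id name;
run;

/* For CLUSTER output, no NAME, PARENT, or HEIGHT statement is needed */
proc tree data=tree;
run;
```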

PROC TREE Statement

  • PROC TREE < options > ;

The PROC TREE statement starts the TREE procedure.

The options that can appear in the PROC TREE statement are summarized in the following table.

Table 76.1: PROC TREE Statement Options

Task                        Option          Effect

Specify data sets           DATA=           specifies the input data set
                            DOCK=           does not count small clusters in OUT= data set
                            LEVEL=          defines disjoint clusters in OUT= data set
                            NCLUSTERS=      specifies the number of clusters in OUT= data set
                            OUT=            specifies the output data set
                            ROOT=           displays the root of a subtree

Specify cluster heights     HEIGHT=         specifies the variable for the height axis
                            DISSIMILAR      specifies that large values are far apart
                            SIMILAR         specifies that large values are close together

Display horizontal trees    HORIZONTAL      specifies that the height axis is horizontal

Control sort order          DESCENDING      reverses SORT order
                            SORT            sorts children by HEIGHT variable

Control displayed output    LIST            displays all nodes in the tree
                            NOPRINT         suppresses display of the tree
                            LINEPRINTER     displays tree using line printer style graphics

High resolution graphics    INC=            specifies the increment between tick values
                            MAXHEIGHT=      specifies the maximum value on axis
                            MINHEIGHT=      specifies the minimum value on axis
                            NTICK=          specifies the number of tick intervals
                            CFRAME=         specifies the color of the frame
                            DESCRIPTION=    specifies the catalog description
                            GOUT=           specifies the catalog name
                            HAXIS=          customizes horizontal axis
                            HORDISPLAY=     displays a horizontal tree with leaves on the right
                            HPAGES=         specifies the number of pages to expand tree horizontally
                            LINES=          specifies the line color and thickness, and dots at the nodes
                            NAME=           specifies the name of graph in the catalog
                            VAXIS=          customizes vertical axis
                            VPAGES=         specifies the number of pages to expand tree vertically

Line printer graphics       INC=            specifies the increment between tick values
                            MAXHEIGHT=      specifies the maximum value on axis
                            MINHEIGHT=      specifies the minimum value on axis
                            NTICK=          specifies the number of tick intervals
                            PAGES=          specifies the number of pages
                            POS=            specifies the number of column positions
                            SPACES=         specifies the number of spaces between objects
                            TICKPOS=        specifies the number of column positions between ticks
                            FILLCHAR=       specifies the fill character between unjoined leaves
                            JOINCHAR=       specifies the character to display between joined leaves
                            LEAFCHAR=       specifies the character to represent clusters with no children
                            TREECHAR=       specifies the character to represent clusters with children

CFRAME= color

  • specifies a color for the frame, which is the rectangle bounded by the axes.

DATA= SAS-data-set

  • specifies the input data set defining the tree. If you omit the DATA= option, the most recently created SAS data set is used.

DESCENDING

DES

  • reverses the sorting order for the SORT option.

DESCRIPTION= entry-description

  • specifies a description for the graph in the GOUT= catalog. The default is 'Proc Tree Graph Output.'

DISSIMILAR

DIS

  • implies that the values of the HEIGHT variable are dissimilarities; that is, a large height value means that the clusters are very dissimilar or far apart.

  • If neither the SIMILAR nor the DISSIMILAR option is specified, PROC TREE attempts to infer from the data whether the height values are similarities or dissimilarities. If PROC TREE cannot tell this from the data, it issues an error message and does not display a tree diagram.

DOCK= n

  • causes observations in the OUT= data set assigned to output clusters with a frequency of n or less to be given missing values for the output variables CLUSTER and CLUSNAME. If the NCLUSTERS= option is also specified, DOCK= also prevents clusters with a frequency of n or less from being counted toward the number of clusters requested by the NCLUSTERS= option. By default, DOCK=0.
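The interaction between DOCK= and NCLUSTERS= might be sketched as follows. The SASHELP.CLASS data, the data set names TREE and CLUS, and the values NCLUSTERS=3 and DOCK=2 are assumptions chosen only for illustration.

```sas
proc cluster data=sashelp.class method=average outtree=tree noprint;
   var height weight;
   id name;
run;

/* Request 3 clusters, not counting clusters with 2 or fewer observations. */
/* Observations in such small clusters get missing CLUSTER and CLUSNAME.   */
proc tree data=tree out=clus nclusters=3 dock=2 noprint;
run;

/* The MISSING option makes the docked observations visible in the table */
proc freq data=clus;
   tables cluster / missing;
run;
```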

FILLCHAR= 'c'

FC= 'c'

  • specifies the character to display between leaves that are not joined into a cluster. The character should be enclosed in single quotes. The default is a blank. The LINEPRINTER option must also be specified.

GOUT= < libref. > member-name

  • specifies the catalog in which the generated graph is stored. The default is WORK.GSEG.

HAXIS=AXIS n

  • specifies the AXIS n statement used to customize the appearance of the horizontal axis.

HEIGHT= name

H= name

  • specifies certain conventional variables to be used for the height axis of the tree diagram. For many situations, the only option you need is the HEIGHT= option. Valid values for name and their meanings are as follows:

    HEIGHT | H

    specifies the _HEIGHT_ variable.

    LENGTH | L

    defines the height of each node as its path length from the root. This can also be interpreted as the number of ancestors of the node.

    MODE | M

    specifies the _MODE_ variable.

    NCL | N

    specifies the _NCL_ (number of clusters) variable.

    RSQ | R

    specifies the _RSQ_ variable.

  • See also the 'HEIGHT Statement' section; the HEIGHT statement can specify any variable in the input data set to be used for the height axis. In rare cases, you may need to specify either the DISSIMILAR option or the SIMILAR option.
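For instance, a tree can be drawn against the number of clusters rather than the default _HEIGHT_ variable. This is a sketch that assumes the SASHELP.CLASS data and a tree data set named TREE; the LINEPRINTER option is used only so that the example does not depend on a graphics device.

```sas
proc cluster data=sashelp.class method=average outtree=tree noprint;
   var height weight;
   id name;
run;

/* HEIGHT=NCL (or H=N) puts the _NCL_ variable on the height axis */
proc tree data=tree height=ncl lineprinter;
run;
```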

HORDISPLAY=RIGHT

  • specifies that the graph is to be oriented horizontally, with the leaf nodes on the right side, when the HORIZONTAL option is also specified. By default, the leaf nodes are on the left side.

HORIZONTAL

HOR

  • orients the tree diagram with the height axis horizontal and the root at the left. The leaf nodes are on the side specified in the HORDISPLAY= option. If you do not specify the HORIZONTAL option, the height axis is vertical, with the root at the top. When the tree takes up more than one page and is viewed on a screen, horizontal orientation can make the tree diagram considerably easier to read.

HPAGES= n1

  • specifies that the original graph is to be enlarged to cover n1 pages. If you also specify the VPAGES=n2 option, the original graph is enlarged to cover n1 × n2 graphs. For example, if HPAGES=2 and VPAGES=3, then the original graph is generated followed by 2 × 3 = 6 more graphs. In these six graphs, the original is enlarged by a factor of 2 in the horizontal direction and by a factor of 3 in the vertical direction. The graphs are generated in left-to-right and top-to-bottom order.

INC= n

  • specifies the increment between tick values on the height axis. If the HEIGHT variable is _NCL_ , the default is usually 1, although a different value can be specified for consistency with other options. For any other HEIGHT variable, the default is some power of 10 times 1, 2, 2.5, or 5.

JOINCHAR= 'c'

JC= 'c'

  • specifies the character to display between leaves that are joined into a cluster. The character should be enclosed in single quotes. The default is X. The LINEPRINTER option must also be specified.

LEAFCHAR= 'c'

LC= 'c'

  • specifies a character to represent clusters having no children. The character should be enclosed in single quotes. The default is a period. The LINEPRINTER option must also be specified.

LEVEL= n

  • specifies the level of the tree defining disjoint clusters for the OUT= data set. The LEVEL= option also causes only clusters between the root and a height of n to be displayed. The clusters in the output data set are those that exist at a height of n on the tree diagram. For example, if the HEIGHT variable is _NCL_ (number of clusters) and LEVEL=5 is specified, then the OUT= data set contains five disjoint clusters. If the HEIGHT variable is _RSQ_ (R²) and LEVEL=0.9 is specified, then the OUT= data set contains the smallest number of clusters that yields an R² of at least 0.9.
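The first example in the description above could be written along these lines. The input data (SASHELP.CLASS) and the data set names TREE and FIVE are assumptions made for the sketch.

```sas
proc cluster data=sashelp.class method=average outtree=tree noprint;
   var height weight;
   id name;
run;

/* With _NCL_ as the height variable, LEVEL=5 cuts the tree at five clusters */
proc tree data=tree height=ncl level=5 out=five noprint;
run;

proc print data=five;
run;
```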

LINEPRINTER

  • specifies that the generated report is to be displayed using line printer graphics.

LINES=( < COLOR= color >< WIDTH= n >< DOTS > )

  • enables you to specify both the color and the thickness of the lines. In addition, a dot can be drawn at each leaf node. Note that if the frame and the lines are specified to be the same color, PROC TREE selects a different color for the lines.

LIST

  • lists all the nodes in the tree, displaying the height, parent, and children of each node.

MAXHEIGHT= n

MAXH= n

  • specifies the maximum value displayed on the height axis.

MINHEIGHT= n

MINH= n

  • specifies the minimum value displayed on the height axis.

NAME= name

  • specifies the entry name for the generated graph in the GOUT= catalog. Note that each time another graph is generated with the same name, the name is modified by appending a number to make it unique.

NCLUSTERS= n

NCL= n

N= n

  • specifies the number of clusters desired in the OUT= data set. The number of clusters obtained may not equal the number specified if (1) there are fewer than n leaves in the tree, (2) there are more than n unconnected trees in the data set, (3) a multi-way tree does not contain a level with the specified number of clusters, or (4) the DOCK= option eliminates too many clusters.

  • The NCLUSTERS= option uses the _NCL_ variable to determine the order in which the clusters are formed. If there is no _NCL_ variable, the height variable (as determined by the HEIGHT statement or HEIGHT= option) is used instead.

NTICK= n

  • specifies the number of tick intervals on the height axis. The default depends on the values of other options.

NOPRINT

  • suppresses the display of the tree. Specify the NOPRINT option if you want only to create an OUT= data set. Note that this option temporarily disables the Output Delivery System (ODS). For more information, see Chapter 14, 'Using the Output Delivery System.'

OUT= SAS-data-set

  • creates an output data set that contains one observation for each object in the tree or subtree being processed and variables called CLUSTER and CLUSNAME showing cluster membership at any specified level in the tree. If you specify the OUT= option, you must also specify either the NCLUSTERS= or LEVEL= option in order to define the output partition level. If you want to create a permanent SAS data set, you must specify a two-level name (refer to 'SAS Data Files' in SAS Language Reference: Concepts).
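A common pattern is to suppress the diagram and keep only the cluster assignments. The sketch below assumes the SASHELP.CLASS data and the names TREE and MEMBERS; the COPY statement carries the clustering variables into the OUT= data set so that the assignments can be inspected next to them.

```sas
proc cluster data=sashelp.class method=average outtree=tree noprint;
   var height weight;
   id name;
run;

/* OUT= requires NCLUSTERS= or LEVEL= to define the partition */
proc tree data=tree out=members nclusters=3 noprint;
   copy height weight;
run;

proc sort data=members;
   by cluster;
run;

proc print data=members;
   var cluster clusname height weight;
run;
```

To keep the result permanently, replace MEMBERS with a two-level name that uses a libref you have assigned.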

PAGES= n

  • specifies the number of pages over which the tree diagram (from root to leaves) is to extend. The default is 1. The LINEPRINTER option must also be specified.

POS= n

  • specifies the number of column positions on the height axis. The default depends on the value of the PAGES= option, the orientation of the tree diagram, and the values specified by the PAGESIZE= and LINESIZE= options. The LINEPRINTER option must also be specified.

ROOT= 'name'

  • specifies the value of the NAME variable for the root of a subtree to be displayed if you do not want to display the entire tree. If you also specify the OUT= option, the output data set contains only objects belonging to the subtree specified by the ROOT= option.
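As an illustration, PROC CLUSTER names the cluster that exists when n clusters remain CLn, so a subtree can be selected by such a name. The sketch assumes the SASHELP.CLASS data and that a node named CL3 exists in the resulting tree; substitute a name that appears in your own _NAME_ variable.

```sas
proc cluster data=sashelp.class method=average outtree=tree noprint;
   var height weight;
   id name;
run;

/* Display only the subtree whose root is the node named CL3 (assumed to exist) */
proc tree data=tree root='CL3' lineprinter;
run;
```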

SIMILAR

SIM

  • implies that the values of the HEIGHT variable are similarities; that is, a large height value means that the clusters are very similar or close together.

  • If neither the SIMILAR nor the DISSIMILAR option is specified, PROC TREE attempts to infer from the data whether the height values are similarities or dissimilarities. If PROC TREE cannot tell this from the data, it issues an error message and does not display a tree diagram.

SORT

  • sorts the children of each node by the HEIGHT variable, in the order of cluster formation. See also the DESCENDING option.

SPACES= s

S= s

  • specifies the number of spaces between objects on the output. The default depends on the number of objects, the orientation of the tree diagram, and the values specified by the PAGESIZE= and LINESIZE= options. The LINEPRINTER option must also be specified.

TICKPOS= n

  • specifies the number of column positions per tick interval on the height axis. The default value is usually between 5 and 10, although a different value can be specified for consistency with other options.

TREECHAR= 'c'

TC= 'c'

  • specifies a character to represent clusters with children. The character should be enclosed in single quotes. The default is X. The LINEPRINTER option must also be specified.
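The line printer character options can be combined in one PROC TREE statement. The particular characters and the SPACES= value below are arbitrary choices for the sketch, and the SASHELP.CLASS data and the name TREE are assumptions.

```sas
proc cluster data=sashelp.class method=average outtree=tree noprint;
   var height weight;
   id name;
run;

/* All of these character options require LINEPRINTER */
proc tree data=tree lineprinter horizontal spaces=2
          leafchar='*'    /* clusters with no children    */
          treechar='+'    /* clusters with children       */
          joinchar='-'    /* between joined leaves        */
          fillchar='.';   /* between leaves not yet joined */
run;
```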

VAXIS=AXIS n

  • specifies that the AXIS n statement be used to customize the appearance of the vertical axis.

VPAGES= n2

  • specifies that the original graph is to be enlarged to cover n2 pages. If you also specify the HPAGES=n1 option, the original graph is enlarged to cover n1 × n2 pages. For example, if HPAGES=2 and VPAGES=3, then the original graph is generated followed by 2 × 3 = 6 more graphs. In these six graphs, the original is enlarged by a factor of 2 in the horizontal direction and by a factor of 3 in the vertical direction. The graphs are generated in left-to-right and top-to-bottom order.
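The HPAGES=2, VPAGES=3 case described above corresponds to a call such as the following. This is a sketch that assumes high resolution graphics are available in your session and uses the SASHELP.CLASS data and the name TREE for illustration.

```sas
proc cluster data=sashelp.class method=average outtree=tree noprint;
   var height weight;
   id name;
run;

/* Original graph, then 2 x 3 = 6 enlarged graphs, left to right and top to bottom */
proc tree data=tree hpages=2 vpages=3;
run;
```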

BY Statement

  • BY variables ;

You can specify a BY statement with PROC TREE to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables.

If your input data set is not sorted in ascending order, use one of the following alternatives:

  • Sort the data using the SORT procedure with a similar BY statement.

  • Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the TREE procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.

  • Create an index on the BY variables using the DATASETS procedure.

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.
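The first alternative above might look like this in practice. The sketch assumes the SASHELP.CLASS data with SEX as the BY variable; the data set names CLASS and TREES are hypothetical.

```sas
/* Sort first so that every later step sees the BY groups in order */
proc sort data=sashelp.class out=class;
   by sex;
run;

/* One tree per BY group is written to the OUTTREE= data set */
proc cluster data=class method=average outtree=trees noprint;
   by sex;
   var height weight;
   id name;
run;

/* A separate tree diagram is produced for each value of SEX */
proc tree data=trees lineprinter;
   by sex;
run;
```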

COPY Statement

  • COPY variables ;

The COPY statement specifies one or more character or numeric variables to be copied to the OUT= data set.

FREQ Statement

  • FREQ variable ;

The FREQ statement specifies one numeric variable that tells how many clustering observations belong to the cluster. If the FREQ statement is omitted, PROC TREE looks for a variable called _FREQ_ to specify the number of observations per cluster. If neither the FREQ statement nor the _FREQ_ variable is present, each leaf is assumed to represent one clustering observation, and the frequency for each internal node is found by summing the frequencies of its children.

HEIGHT Statement

  • HEIGHT variable ;

The HEIGHT statement specifies the name of a numeric variable to define the height of each node (cluster) in the tree. The height variable can also be specified by the HEIGHT= option in the PROC TREE statement. If both the HEIGHT statement and the HEIGHT= option are omitted, PROC TREE looks for a variable called _HEIGHT_. If the data set does not contain _HEIGHT_, PROC TREE looks for a variable called _NCL_. If _NCL_ is not found either, the height of each node is defined to be its path length from the root.

ID Statement

  • ID variable ;

The ID variable is used to identify the objects (leaves) in the tree on the output. The ID variable can be a character or numeric variable of any length. If the ID statement is omitted, the variable in the NAME statement is used instead. If both the ID and NAME statements are omitted, PROC TREE looks for a variable called _NAME_. If the _NAME_ variable is not found in the data set, PROC TREE issues an error message and stops. The ID variable is copied to the OUT= data set.

NAME Statement

  • NAME variable ;

The NAME statement specifies a character or numeric variable identifying the node represented by each observation. The NAME variable and the PARENT variable jointly define the tree structure. If the NAME statement is omitted, PROC TREE looks for a variable called _NAME_. If the _NAME_ variable is not found in the data set, PROC TREE issues an error message and stops.

PARENT Statement

  • PARENT variable ;

The PARENT statement specifies a character or numeric variable identifying the node in the tree that is the parent of each observation. The PARENT variable must have the same formatted length as the NAME variable. If the PARENT statement is omitted, PROC TREE looks for a variable called _PARENT_. If the _PARENT_ variable is not found in the data set, PROC TREE issues an error message and stops.
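When the tree does not come from CLUSTER or VARCLUS, the NAME, PARENT, and HEIGHT statements identify the relevant variables. The following sketch builds a small five-node tree by hand; the data set MYTREE, the variables NODE, PARENTNODE, and DIST, and all of the values are hypothetical. Both character variables are given the same length so that the formatted-length requirement above is met.

```sas
data mytree;
   length node parentnode $ 8;   /* same length for NAME and PARENT variables */
   input node $ parentnode $ dist;
   datalines;
root  .     3
ab    root  1
c     root  0
a     ab    0
b     ab    0
;

/* The root has a missing parent; DIST is treated as a dissimilarity */
proc tree data=mytree dissimilar lineprinter;
   name node;
   parent parentnode;
   height dist;
run;
```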




SAS/STAT 9.1 User's Guide, Volumes 1-7
ISBN: 1590472438
EAN: 2147483647
Year: 2004
Pages: 132
