Examples


This section provides advanced examples of the PLOT statement.

Example 18.1. Using Box Plots to Compare Groups

In the following example, a box plot is used to compare the delay times for airline flights during the Christmas holidays with the delay times before the holiday period. The following statements create a data set named Times with the delay times in minutes for 25 flights each day. When a flight is canceled, the delay is recorded as a missing value.

  data Times;
     informat day date7.;
     format   day date7.;
     input day @;
     do flight=1 to 25;
        input delay @;
        output;
     end;
     datalines;
  16DEC88   4  12   2   2  18   5   6  21   0   0   0  14   3   .   2   3   5   0   6  19   7   4   9   5  10
  17DEC88   1  10   3   3   0   1   5   0   .   .   1   5   7   1   7   2   2  16   2   1   3   1  31   5   0
  18DEC88   7   8   4   2   3   2   7   6  11   3   2   7   0   1  10   2   3  12   8   6   2   7   2   4   5
  19DEC88  15   6   9   0  15   7   1   1   0   2   5   6   5  14   7  20   8   1  14   3  10   0   1  11   7
  20DEC88   2   1   0   4   4   6   2   2   1   4   1  11   .   1   0   6   5   5   4   2   2   6   6   4   0
  21DEC88   2   6   6   2   7   7   5   2   5   0   9   2   4   2   5   1   4   7   5   6   5   0   4  36  28
  22DEC88   3   7  22   1  11  11  39  46   7  33  19  21   1   3  43  23   9   0  17  35  50   0   2   1   0
  23DEC88   6  11   8  35  36  19  21   .   .   4   6  63  35   3  12  34   9   0  46   0   0  36   3   0  14
  24DEC88  13   2  10   4   5  22  21  44  66  13   8   3   4  27   2  12  17  22  19  36   9  72   2   4   4
  25DEC88   4  33  35   0  11  11  10  28  34   3  24   6  17   0   8   5   7  19   9   7  21  17  17   2   6
  26DEC88   3   8   8   2   7   7   8   2   5   9   2   8   2  10  16   9   5  14  15   1  12   2   2  14  18
  ;
  run;

In the following statements, the MEANS procedure counts the number of canceled flights (missing values of delay, requested with the NMISS= statistic) for each day. The count, stored in the variable ncancel, is then merged into the data set Times.

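  /* Count the missing delays (canceled flights) for each day */
  proc means data=Times noprint;
     var delay;
     by day;
     output out=Cancel nmiss=ncancel;
  run;

  /* One-to-many merge: attach each day's count to every flight */
  data Times;
     merge Times Cancel;
     by day;
  run;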
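As a cross-check on the PROC MEANS step, the following Python sketch (not SAS code) counts the missing delays for each day, which is what NMISS= reports with BY day, and then copies each day's count onto every flight of that day, mirroring the one-to-many MERGE. Missing values are represented as None, and count_cancellations and attach_counts are hypothetical helper names.

```python
def count_cancellations(rows):
    """Count missing delays (None) per day, like NMISS= with BY day."""
    counts = {}
    for day, delay in rows:
        counts.setdefault(day, 0)
        if delay is None:
            counts[day] += 1
    return counts


def attach_counts(rows, counts):
    """One-to-many merge: copy each day's count onto every row of that day."""
    return [(day, delay, counts[day]) for day, delay in rows]


# First two days of the Times data ("." marks a canceled flight).
raw = {
    "16DEC88": "4 12 2 2 18 5 6 21 0 0 0 14 3 . 2 3 5 0 6 19 7 4 9 5 10",
    "17DEC88": "1 10 3 3 0 1 5 0 . . 1 5 7 1 7 2 2 16 2 1 3 1 31 5 0",
}
rows = [(day, None if tok == "." else int(tok))
        for day, line in raw.items() for tok in line.split()]

counts = count_cancellations(rows)
merged = attach_counts(rows, counts)
print(counts)  # {'16DEC88': 1, '17DEC88': 2}
```

Applied to all eleven days of Times, the counts take only the values 0, 1, and 2, which is consistent with the three SYMBOL statements used for the box plot in this example.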

The following statements create a data set named Weather that contains information about possible causes for delays. This data set is then merged with the data set Times by day and flight.

  data Weather;
     informat day date7.;
     format   day date7.;
     length reason $ 16;
     /* The & modifier lets reason contain single embedded blanks */
     input day flight reason &;
     datalines;
  16DEC88  8   Fog
  17DEC88  18  Snow Storm
  17DEC88  23  Sleet
  21DEC88  24  Rain
  21DEC88  25  Rain
  22DEC88  7   Mechanical
  22DEC88  15  Late Arrival
  24DEC88  9   Late Arrival
  24DEC88  22  Late Arrival
  ;
  run;

  /* Match-merge by day and flight; unmatched flights get a blank reason */
  data Times;
     merge Times Weather;
     by day flight;
  run;
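Every day/flight pair in Weather also appears in Times, so for this data the match-merge acts like a lookup: flights with a recorded cause get a reason value, and all other flights keep a blank one. The Python sketch below illustrates that effect on three flights from December 16; merge_reasons is a hypothetical name, and the dictionary is simply the Weather rows keyed by (day, flight).

```python
# The Weather rows, keyed by (day, flight).
weather = {
    ("16DEC88", 8): "Fog",
    ("17DEC88", 18): "Snow Storm",
    ("17DEC88", 23): "Sleet",
    ("21DEC88", 24): "Rain",
    ("21DEC88", 25): "Rain",
    ("22DEC88", 7): "Mechanical",
    ("22DEC88", 15): "Late Arrival",
    ("24DEC88", 9): "Late Arrival",
    ("24DEC88", 22): "Late Arrival",
}


def merge_reasons(times_rows, weather):
    """Attach reason to each (day, flight, delay) row; '' when no match."""
    return [(day, flight, delay, weather.get((day, flight), ""))
            for day, flight, delay in times_rows]


# Flights 7-9 on 16DEC88 (delays 6, 21, and 0 in the Times data).
sample = [("16DEC88", 7, 6), ("16DEC88", 8, 21), ("16DEC88", 9, 0)]
print(merge_reasons(sample, weather))
# [('16DEC88', 7, 6, ''), ('16DEC88', 8, 21, 'Fog'), ('16DEC88', 9, 0, '')]
```

A general SAS match-merge also keeps BY groups that occur in only one of the data sets, which this lookup-style sketch does not attempt.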

The following statements create a box plot for the complete set of data.

  symbol1 v=plus     c=black;
  symbol2 v=square   c=black;
  symbol3 v=triangle c=black;
  title 'Box Plot for Airline Delays';
  proc boxplot data=Times;
     plot delay*day = ncancel /
        nohlabel
        symbollegend = legend1;
     legend1 label = ('Cancellations:');
     label delay = 'Delay in Minutes';
  run;

The box plot is shown in Output 18.1.1. The level of the symbol-variable ncancel determines the symbol marker for each group mean, and the SYMBOLLEGEND= option controls the appearance of the legend for the symbols. The NOHLABEL option suppresses the label for the horizontal axis.

Output 18.1.1: Box Plot for Airline Data

The delay distributions from December 22 through December 25 are drastically different from the delay distributions during the pre-holiday period. Both the mean delay and the variability of the delays are much greater during the holiday period.

Example 18.2. Creating Various Styles of Box-and-Whisker Plots

The following example uses the flight delay data of the preceding example to illustrate how you can create box plots with various styles of box-and-whisker plots. The following statements create a plot, shown in Output 18.2.1, that displays skeletal box-and-whisker plots:

  symbol1 v=plus c=black;
  title 'Analysis of Airline Departure Delays';
  title2 'BOXSTYLE=SKELETAL';
  proc boxplot data=Times;
     plot delay*day /
        boxstyle = skeletal
        nohlabel;
     label delay = 'Delay in Minutes';
  run;

Output 18.2.1: BOXSTYLE=SKELETAL

In a skeletal box-and-whisker plot, the whiskers are drawn from the quartiles to the extreme values of the group. The skeletal box-and-whisker plot is the default style; consequently, you can also request this style by omitting the BOXSTYLE= option.
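A skeletal box-and-whisker plot thus reduces each group to five numbers: the minimum, first quartile, median, third quartile, and maximum. The following Python sketch computes them for December 16. It assumes SAS percentile definition 5 (write n·p = j + g; take x(j+1) when g > 0 and the average of x(j) and x(j+1) when g = 0), the definition selected by PCTLDEF=5; the function names are hypothetical.

```python
import math


def pctl_def5(sorted_vals, p):
    """Percentile by SAS definition 5 (empirical distribution with averaging)."""
    n = len(sorted_vals)
    j = math.floor(n * p)
    g = n * p - j
    if g > 0:
        return sorted_vals[j]  # x(j+1) in 1-based notation
    return (sorted_vals[j - 1] + sorted_vals[j]) / 2  # mean of x(j), x(j+1)


def skeletal_summary(values):
    """Min, Q1, median, Q3, max of the nonmissing values (None = missing)."""
    x = sorted(v for v in values if v is not None)
    return (x[0], pctl_def5(x, 0.25), pctl_def5(x, 0.5),
            pctl_def5(x, 0.75), x[-1])


dec16 = [4, 12, 2, 2, 18, 5, 6, 21, 0, 0, 0, 14, 3, None,
         2, 3, 5, 0, 6, 19, 7, 4, 9, 5, 10]
print(skeletal_summary(dec16))  # (0, 2.0, 5.0, 9.5, 21)
```

The whiskers of the December 16 box therefore run from 0 up to 2 and from 9.5 up to 21.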

The following statements request a box plot with schematic box-and-whisker plots:

  symbol1 v=plus c=black;
  title 'Analysis of Airline Departure Delays';
  title2 'BOXSTYLE=SCHEMATIC';
  proc boxplot data=Times;
     plot delay*day /
        boxstyle = schematic
        nohlabel;
     label delay = 'Delay in Minutes';
  run;

The plot is shown in Output 18.2.2. When BOXSTYLE=SCHEMATIC is specified, the whiskers are drawn to the most extreme points in the group that lie within the fences. The upper fence is defined as the third quartile (represented by the upper edge of the box) plus 1.5 times the interquartile range (IQR). The lower fence is defined as the first quartile (represented by the lower edge of the box) minus 1.5 times the interquartile range. Observations outside the fences are identified with a special symbol. The default symbol is a square, and you can specify the shape and color for this symbol with the IDSYMBOL= and IDCOLOR= options. Serifs are added to the whiskers by default. For further details, see the entry for the BOXSTYLE= option in the PLOT statement syntax.
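The fence rule can be made concrete with the December 16 delays. This Python sketch assumes quartiles computed with SAS percentile definition 5 (PCTLDEF=5) and uses hypothetical function names; it returns the fences, the whisker ends, and the observations that would be drawn with the special symbol.

```python
import math


def pctl_def5(sorted_vals, p):
    """Percentile by SAS definition 5 (assumed here)."""
    n = len(sorted_vals)
    j = math.floor(n * p)
    g = n * p - j
    return sorted_vals[j] if g > 0 else (sorted_vals[j - 1] + sorted_vals[j]) / 2


def schematic(values):
    """Fences at Q1 - 1.5*IQR and Q3 + 1.5*IQR; whiskers stop at the most
    extreme observations that lie within the fences."""
    x = sorted(v for v in values if v is not None)
    q1, q3 = pctl_def5(x, 0.25), pctl_def5(x, 0.75)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in x if lo_fence <= v <= hi_fence]
    outside = [v for v in x if v < lo_fence or v > hi_fence]
    return (lo_fence, hi_fence), (inside[0], inside[-1]), outside


dec16 = [4, 12, 2, 2, 18, 5, 6, 21, 0, 0, 0, 14, 3, None,
         2, 3, 5, 0, 6, 19, 7, 4, 9, 5, 10]
fences, whiskers, outside = schematic(dec16)
print(fences, whiskers, outside)  # (-9.25, 20.75) (0, 19) [21]
```

For December 16 the IQR is 9.5 - 2 = 7.5, so the upper fence is 9.5 + 11.25 = 20.75; the upper whisker stops at 19, and the 21-minute delay is plotted with the special symbol.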

Output 18.2.2: BOXSTYLE=SCHEMATIC

The following statements create a box plot with schematic box-and-whisker plots in which the observations outside the fences are labeled:

  symbol1 v=plus c=black;
  title 'Analysis of Airline Departure Delays';
  title2 'BOXSTYLE=SCHEMATICID';
  proc boxplot data=Times;
     plot delay*day /
        boxstyle = schematicid
        nohlabel;
     id reason;
     label delay = 'Delay in Minutes';
  run;

The plot is shown in Output 18.2.3. If you specify BOXSTYLE=SCHEMATICID, schematic box-and-whisker plots are displayed in which the value of the first ID variable (in this case, reason) is used to label each observation outside the fences.
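The labeling rule can be illustrated for December 16, where the Weather data set records Fog for flight 8. The Python sketch below assumes SAS percentile definition 5 (PCTLDEF=5) for the quartiles; outlier_labels is a hypothetical helper that returns the (flight, delay, reason) records lying beyond the 1.5·IQR fences, which are the points that SCHEMATICID labels with the value of reason.

```python
import math


def pctl_def5(sorted_vals, p):
    """Percentile by SAS definition 5 (assumed here)."""
    n = len(sorted_vals)
    j = math.floor(n * p)
    g = n * p - j
    return sorted_vals[j] if g > 0 else (sorted_vals[j - 1] + sorted_vals[j]) / 2


def outlier_labels(records, k=1.5):
    """records: (flight, delay, reason) tuples. Returns those whose delay is
    below Q1 - k*IQR or above Q3 + k*IQR."""
    x = sorted(d for _, d, _ in records if d is not None)
    q1, q3 = pctl_def5(x, 0.25), pctl_def5(x, 0.75)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in records if r[1] is not None and (r[1] < lo or r[1] > hi)]


dec16 = [4, 12, 2, 2, 18, 5, 6, 21, 0, 0, 0, 14, 3, None,
         2, 3, 5, 0, 6, 19, 7, 4, 9, 5, 10]
reasons = {8: "Fog"}  # the 16DEC88 row of the Weather data set
records = [(f, d, reasons.get(f, "")) for f, d in enumerate(dec16, start=1)]
print(outlier_labels(records))  # [(8, 21, 'Fog')]
```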

Output 18.2.3: BOXSTYLE=SCHEMATICID

The following statements create a box plot with schematic box-and-whisker plots in which only the extreme observations outside the fences are labeled:

  title 'Analysis of Airline Departure Delays';
  title2 'BOXSTYLE=SCHEMATICIDFAR';
  symbol v=plus color=black;
  proc boxplot data=Times;
     plot delay*day /
        boxstyle = schematicidfar
        nohlabel;
     id reason;
     label delay = 'Delay in Minutes';
  run;

The plot is shown in Output 18.2.4. If you specify BOXSTYLE=SCHEMATICIDFAR, schematic plots are displayed in which the value of the first ID variable is used to label each observation outside the lower and upper far fences. The lower and upper far fences are located 3 times the IQR below the 25th percentile and above the 75th percentile, respectively. Observations between the fences and the far fences are identified with a symbol but are not labeled.
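To see which points SCHEMATICIDFAR labels, compare the 1.5·IQR fences with the 3·IQR far fences. The Python sketch below does this for December 17, assuming SAS percentile definition 5 (PCTLDEF=5) for the quartiles; beyond is a hypothetical helper, and the reason values are the 17DEC88 rows of the Weather data set.

```python
import math


def pctl_def5(sorted_vals, p):
    """Percentile by SAS definition 5 (assumed here)."""
    n = len(sorted_vals)
    j = math.floor(n * p)
    g = n * p - j
    return sorted_vals[j] if g > 0 else (sorted_vals[j - 1] + sorted_vals[j]) / 2


def beyond(records, k):
    """Records whose delay lies outside Q1 - k*IQR .. Q3 + k*IQR."""
    x = sorted(d for _, d, _ in records if d is not None)
    q1, q3 = pctl_def5(x, 0.25), pctl_def5(x, 0.75)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in records if r[1] is not None and (r[1] < lo or r[1] > hi)]


dec17 = [1, 10, 3, 3, 0, 1, 5, 0, None, None, 1, 5, 7, 1, 7,
         2, 2, 16, 2, 1, 3, 1, 31, 5, 0]
reasons = {18: "Snow Storm", 23: "Sleet"}
records = [(f, d, reasons.get(f, "")) for f, d in enumerate(dec17, start=1)]

outside_fences = beyond(records, 1.5)  # drawn with the special symbol
outside_far = beyond(records, 3.0)     # also labeled with the ID value
print(outside_fences)  # [(18, 16, 'Snow Storm'), (23, 31, 'Sleet')]
print(outside_far)     # [(23, 31, 'Sleet')]
```

Here Q1 = 1 and Q3 = 5, so the fences are at -5 and 11 and the far fences at -11 and 17. The Snow Storm flight (16 minutes) is outside the fences but inside the far fences, so it receives only the symbol; the Sleet flight (31 minutes) is beyond the upper far fence and is labeled.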

Output 18.2.4: BOXSTYLE=SCHEMATICIDFAR

Other options for controlling the display of box-and-whisker plots include the BOXWIDTH=, BOXWIDTHSCALE=, CBOXES=, CBOXFILL=, and LBOXES= options.

Example 18.3. Creating Notched Box-and-Whisker Plots

The following statements use the flight delay data of Example 18.1 to illustrate how to create box-and-whisker plots with notches:

  symbol1 v=plus c=black;
  title 'Analysis of Airline Departure Delays';
  title2 'Using the NOTCHES Option';
  proc boxplot data=Times;
     plot delay*day /
        boxstyle = schematicid
        nohlabel
        notches;
     id reason;
     label delay = 'Delay in Minutes';
  run;

The notches, requested with the NOTCHES option, measure the significance of the difference between two medians. The medians of two box plots are significantly different at approximately the 0.05 level if the corresponding notches do not overlap.

For example, in Output 18.3.1, the median for December 20 is significantly different from the median for December 24.
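The comparison of December 20 and December 24 can be checked numerically. This Python sketch assumes the commonly used notch limits median ± 1.58·IQR/√n and SAS percentile definition 5 (PCTLDEF=5) for the quartiles; notch and overlaps are hypothetical names, and the sketch is not presented as PROC BOXPLOT's exact computation.

```python
import math


def pctl_def5(sorted_vals, p):
    """Percentile by SAS definition 5 (assumed here)."""
    n = len(sorted_vals)
    j = math.floor(n * p)
    g = n * p - j
    return sorted_vals[j] if g > 0 else (sorted_vals[j - 1] + sorted_vals[j]) / 2


def notch(values):
    """(lower, upper) notch limits: median -/+ 1.58 * IQR / sqrt(n)."""
    x = sorted(v for v in values if v is not None)
    med = pctl_def5(x, 0.5)
    iqr = pctl_def5(x, 0.75) - pctl_def5(x, 0.25)
    half = 1.58 * iqr / math.sqrt(len(x))
    return med - half, med + half


def overlaps(a, b):
    """True if the intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


dec20 = [2, 1, 0, 4, 4, 6, 2, 2, 1, 4, 1, 11, None,
         1, 0, 6, 5, 5, 4, 2, 2, 6, 6, 4, 0]
dec24 = [13, 2, 10, 4, 5, 22, 21, 44, 66, 13, 8, 3, 4,
         27, 2, 12, 17, 22, 19, 36, 9, 72, 2, 4, 4]
n20, n24 = notch(dec20), notch(dec24)
print(n20, n24)
print(overlaps(n20, n24))  # False
```

December 20 has median 3, IQR 4, and n = 24, giving a notch from about 1.71 to 4.29; December 24 has median 12, IQR 18, and n = 25, giving about 6.31 to 17.69. The intervals do not overlap.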

Output 18.3.1: Notched Side-by-Side Box-and-Whisker Plots

Example 18.4. Creating Box-and-Whisker Plots with Varying Widths

The following example shows how to create a box plot with box-and-whisker plots whose widths vary proportionately with the group size. The following statements create a SAS data set named Times2 that contains flight departure delays (in minutes) recorded daily for eight consecutive days:

  data Times2;
     label delay = 'Delay in Minutes';
     informat day date7.;
     format   day date7.;
     input day @;
     do flight=1 to 25;
        input delay @;
        output;
     end;
     datalines;
  01MAR90  12   4   2   2  15   8   0  11   0   0   0  12   3   .   2   3   5   0   6  25   7   4   9   5  10
  02MAR90   1   .   3   .   0   1   5   0   .   .   1   5   7   .   7   2   2  16   2   1   3   1  31   .   0
  03MAR90   6   8   4   2   3   2   7   6  11   3   2   7   0   1  10   2   5  12   8   6   2   7   2   4   5
  04MAR90  12   6   9   0  15   7   1   1   0   2   5   6   5  14   7  21   8   1  14   3  11   0   1  11   7
  05MAR90   2   1   0   4   .   6   2   2   1   4   1  11   .   1   0   .   5   5   .   2   3   6   6   4   0
  06MAR90   8   6   5   2   9   7   4   2   5   1   2   2   4   2   5   1   3   9   7   8   1   0   4  26  27
  07MAR90   9   6   6   2   7   8   .   .  10   8   0   2   4   3   .   .   .   7   .   6   4   0   .   .   .
  08MAR90   1   6   6   2   8   8   5   3   5   0   8   2   4   2   5   1   6   4   5  10   2   0   4   1   1
  ;
  run;

The following statements create the box plot shown in Output 18.4.1:

  title 'Analysis of Airline Departure Delays';
  title2 'Using the BOXWIDTHSCALE= Option';
  symbol1 v=plus c=black;
  proc boxplot data=Times2;
     plot delay*day /
        nohlabel
        boxstyle      = schematic
        boxwidthscale = 1
        bwslegend;
  run;

Output 18.4.1: Box Plot with Box-and-Whisker Plots of Varying Widths

The BOXWIDTHSCALE=value option specifies that the widths of the box-and-whisker plots vary proportionately to a particular function of the group size n. The function is determined by value and is identified on the plot with a legend if the BWSLEGEND option is specified. The BOXWIDTHSCALE= option is useful in situations where the group sizes vary widely.
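For BOXWIDTHSCALE=1, as in this example, the width is proportional to n itself. The Python sketch below computes relative widths for Times2 from the number of nonmissing delays per day. It assumes the function is n**value for a positive value and that widths are scaled relative to the largest group; the exact scaling PROC BOXPLOT applies is not described here, and relative_widths is a hypothetical name.

```python
def relative_widths(sizes, value=1.0):
    """Width proportional to n**value (value > 0), scaled so the largest
    group gets width 1.0. The scaling step is an assumption."""
    if value <= 0:
        raise ValueError("this sketch only handles positive values")
    f = {day: n ** value for day, n in sizes.items()}
    top = max(f.values())
    return {day: v / top for day, v in f.items()}


# Nonmissing delays per day in Times2 (25 flights minus canceled ones).
sizes = {"01MAR90": 24, "02MAR90": 19, "03MAR90": 25, "04MAR90": 25,
         "05MAR90": 21, "06MAR90": 25, "07MAR90": 16, "08MAR90": 25}

for day, w in relative_widths(sizes).items():
    print(day, round(w, 2))
```

Under these assumptions, March 7, with only 16 recorded delays, gets the narrowest box (0.64 of the full width), and a value of 0.5 would shrink the differences by making the widths proportional to the square root of n.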




SAS.STAT 9.1 Users Guide (Vol. 1)
SAS/STAT 9.1 Users Guide, Volumes 1-7
ISBN: 1590472438
EAN: 2147483647
Year: 2004
Pages: 156
