Examples: MEANS Procedure


Example 1: Computing Specific Descriptive Statistics

Procedure features:

  • PROC MEANS statement options:

    • statistic keywords

    • FW=

  • VAR statement

This example

  • specifies the analysis variables

  • computes the statistics for the specified keywords and displays them in order

  • specifies the field width of the statistics.

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=80 pagesize=60; 

Create the CAKE data set. CAKE contains data from a cake-baking contest: each participant's last name, age, score for presentation, score for taste, cake flavor, and number of cake layers. The number of cake layers is missing for two observations. The cake flavor is missing for another observation.

 data cake;
    input LastName $ 1-12 Age 13-14 PresentScore 16-17
          TasteScore 19-20 Flavor $ 23-32 Layers 34;
    datalines;
Orlando     27 93 80  Vanilla    1
Ramey       32 84 72  Rum        2
Goldston    46 68 75  Vanilla    1
Roe         38 79 73  Vanilla    2
Larsen      23 77 84  Chocolate  .
Davis       51 86 91  Spice      3
Strickland  19 82 79  Chocolate  1
Nguyen      57 77 84  Vanilla    .
Hildenbrand 33 81 83  Chocolate  1
Byron       62 72 87  Vanilla    2
Sanders     26 56 79  Chocolate  1
Jaeger      43 66 74             1
Davis       28 69 75  Chocolate  2
Conrad      69 85 94  Vanilla    1
Walters     55 67 72  Chocolate  2
Rossburger  28 78 81  Spice      2
Matthew     42 81 92  Chocolate  2
Becker      36 62 83  Spice      2
Anderson    27 87 85  Chocolate  1
Merritt     62 73 84  Chocolate  1
;

Specify the analyses and the analysis options. The statistic keywords specify the statistics and their order in the output. FW= uses a field width of eight to display the statistics.

 proc means data=cake n mean max min range std fw=8; 

Specify the analysis variables. The VAR statement specifies that PROC MEANS calculate statistics on the PresentScore and TasteScore variables.

 var PresentScore TasteScore; 

Specify the title.

 title 'Summary of Presentation and Taste Scores';
 run;

Output

PROC MEANS lists PresentScore first because this is the first variable that is specified in the VAR statement. A field width of eight truncates the statistics to four decimal places.

                    Summary of Presentation and Taste Scores                  1

                              The MEANS Procedure

Variable        N        Mean     Maximum     Minimum       Range      Std Dev
------------------------------------------------------------------------------
PresentScore   20     76.1500     93.0000     56.0000     37.0000       9.3768
TasteScore     20     81.3500     94.0000     72.0000     22.0000       6.6116
------------------------------------------------------------------------------
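As a variation on this example, the following sketch sets the number of displayed decimal places directly with MAXDEC= (the option that Example 2 uses) instead of relying on the field width. It assumes that the CAKE data set created above exists in the WORK library. The title text is illustrative only.

```sas
/* Sketch: same statistics as Example 1, with MAXDEC=2 limiting */
/* the displayed values to two decimal places.                  */
/* Assumes WORK.CAKE was created by the DATA step above.        */
proc means data=cake n mean max min range std maxdec=2;
   var PresentScore TasteScore;
   title 'Summary of Presentation and Taste Scores (Two Decimal Places)';
run;
```

MAXDEC= changes only how the statistics are displayed; the computed values are unaffected.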

Example 2: Computing Descriptive Statistics with Class Variables

Procedure features:

  • PROC MEANS statement option:

    • MAXDEC=

  • CLASS statement

  • TYPES statement

This example

  • analyzes the data for the two-way combination of class variables and across all observations

  • limits the number of decimal places for the displayed statistics.

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=80 pagesize=60; 

Create the GRADE data set. GRADE contains each student's last name, gender, status of either undergraduate (1) or graduate (2), expected year of graduation, class section (A or B), final exam score, and final grade for the course.

 data grade;
    input Name $ 1-8 Gender $ 11 Status  Year $ 15-16
          Section $ 18 Score 20-21 FinalGrade 23-24;
    datalines;
Abbott    F 2 97 A 90 87
Branford  M 1 98 A 92 97
Crandell  M 2 98 B 81 71
Dennison  M 1 97 A 85 72
Edgar     F 1 98 B 89 80
Faust     M 1 97 B 78 73
Greeley   F 2 97 A 82 91
Hart      F 1 98 B 84 80
Isley     M 2 97 A 88 86
Jasper    M 1 97 B 91 93
;

Generate the default statistics and specify the analysis options. Because no statistics are specified in the PROC MEANS statement, all default statistics (N, MEAN, STD, MIN, MAX) are generated. MAXDEC= limits the displayed statistics to three decimal places.

 proc means data=grade maxdec=3; 

Specify the analysis variable. The VAR statement specifies that PROC MEANS calculate statistics on the Score variable.

 var Score; 

Specify subgroups for the analysis. The CLASS statement separates the analysis into subgroups. Each combination of unique values for Status and Year represents a subgroup.

 class Status Year; 

Specify which subgroups to analyze. The TYPES statement requests that the analysis be performed on all the observations in the GRADE data set as well as the two-way combination of Status and Year, which results in four subgroups (because Status and Year each have two unique values).

 types () status*year; 

Specify the title.

 title 'Final Exam Grades for Student Status and Year of Graduation';
 run;

Output

PROC MEANS displays the default statistics for all the observations (_TYPE_=0) and the four class levels of the Status and Year combination (Status=1, Year=97; Status=1, Year=98; Status=2, Year=97; Status=2, Year=98).

         Final Exam Grades for Student Status and Year of Graduation          1

                              The MEANS Procedure

                           Analysis Variable : Score

      N
    Obs     N            Mean         Std Dev         Minimum          Maximum
    --------------------------------------------------------------------------
     10    10          86.000           4.714          78.000           92.000
    --------------------------------------------------------------------------

                           Analysis Variable : Score

                N
Status  Year  Obs   N          Mean       Std Dev      Minimum         Maximum
------------------------------------------------------------------------------
1       97      3   3        84.667         6.506       78.000          91.000
        98      3   3        88.333         4.041       84.000          92.000

2       97      3   3        86.667         4.163       82.000          90.000
        98      1   1        81.000          .          81.000          81.000
------------------------------------------------------------------------------
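Because this example has only two class variables, the same pair of analyses can also be requested by the number of class variables that are combined rather than by name. The following sketch assumes the GRADE data set created above and uses the WAYS statement (shown in Example 6) in place of TYPES. WAYS 0 requests the analysis across all observations, and WAYS 2 requests every two-way combination, which here is only Status*Year.

```sas
/* Sketch: WAYS 0 2 as an alternative to TYPES () status*year.  */
/* Assumes WORK.GRADE was created by the DATA step above.       */
proc means data=grade maxdec=3;
   var Score;
   class Status Year;
   ways 0 2;   /* 0 = all observations, 2 = Status*Year */
   title 'Final Exam Grades for Student Status and Year of Graduation';
run;
```

With more than two class variables the two statements are no longer interchangeable: WAYS 2 would request every pair of class variables, whereas TYPES names the specific combinations.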

Example 3: Using the BY Statement with Class Variables

Procedure features:

  • PROC MEANS statement option:

    • statistic keywords

  • BY statement

  • CLASS statement

Other features:

  • SORT procedure

Data set: GRADE (created in Example 2)

This example

  • separates the analysis for the combination of class variables within BY values

  • shows the sort order requirement for the BY statement

  • calculates the minimum, maximum, and median.

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=80 pagesize=60; 

Sort the GRADE data set. PROC SORT sorts the observations by the variable Section. Sorting is required in order to use Section as a BY variable in the PROC MEANS step.

 proc sort data=Grade out=GradeBySection;
    by section;
 run;

Specify the analyses. The statistic keywords specify the statistics and their order in the output.

 proc means data=GradeBySection min max median; 

Divide the data set into BY groups. The BY statement produces a separate analysis for each value of Section.

 by Section; 

Specify the analysis variable. The VAR statement specifies that PROC MEANS calculate statistics on the Score variable.

 var Score; 

Specify subgroups for the analysis. The CLASS statement separates the analysis by the values of Status and Year. Because there is no TYPES statement in this program, analyses are performed for each subgroup, within each BY group.

 class Status Year; 

Specify the titles.

 title1 'Final Exam Scores for Student Status and Year of Graduation';
 title2 ' Within Each Section';
 run;

Output

         Final Exam Scores for Student Status and Year of Graduation          1
                              Within Each Section

---------------------------------- Section=A -----------------------------------

                              The MEANS Procedure

                           Analysis Variable : Score

                         N
      Status   Year    Obs         Minimum         Maximum           Median
      ---------------------------------------------------------------------
      1          97      1      85.0000000      85.0000000       85.0000000
                 98      1      92.0000000      92.0000000       92.0000000

      2          97      3      82.0000000      90.0000000       88.0000000
      ---------------------------------------------------------------------

---------------------------------- Section=B -----------------------------------

                           Analysis Variable : Score

                         N
      Status   Year    Obs         Minimum         Maximum           Median
      ---------------------------------------------------------------------
      1          97      2      78.0000000      91.0000000       84.5000000
                 98      2      84.0000000      89.0000000       86.5000000

      2          98      1      81.0000000      81.0000000       81.0000000
      ---------------------------------------------------------------------
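If a separate table for each section is not needed, one possible alternative is to treat Section as an additional class variable. CLASS variables do not require sorted input, so the PROC SORT step can be skipped. The following sketch assumes the unsorted GRADE data set from Example 2. The TYPES statement makes the requested three-way combination explicit, so the subgroups match those shown above, although they appear in a single table rather than in one table per section.

```sas
/* Sketch: Section as a CLASS variable instead of a BY variable. */
/* Assumes WORK.GRADE exists; no sorting is required for CLASS.  */
proc means data=grade min max median;
   class Section Status Year;
   types Section*Status*Year;   /* three-way combination only */
   var Score;
   title1 'Final Exam Scores for Student Status and Year of Graduation';
   title2 'By Section (CLASS Variable)';
run;
```

The BY statement remains the better choice when each group needs its own table or when the data is already sorted.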

Example 4: Using a CLASSDATA= Data Set with Class Variables

Procedure features:

  • PROC MEANS statement options:

    • CLASSDATA=

    • EXCLUSIVE

    • FW=

    • MAXDEC=

    • PRINTALLTYPES

  • CLASS statement

Data set: CAKE (created in Example 1)

This example

  • specifies the field width and decimal places of the displayed statistics

  • uses only the values in the CLASSDATA= data set as the levels of the combinations of class variables

  • calculates the range, median, minimum, and maximum

  • displays all combinations of the class variables in the analysis.

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=80 pagesize=60; 

Create the CAKETYPE data set. CAKETYPE contains the cake flavors and number of layers that must occur in the PROC MEANS output.

 data caketype;
    input Flavor $ 1-10 Layers 12;
    datalines;
Vanilla    1
Vanilla    2
Vanilla    3
Chocolate  1
Chocolate  2
Chocolate  3
;

Specify the analyses and the analysis options. The FW= option uses a field width of seven and the MAXDEC= option uses zero decimal places to display the statistics. CLASSDATA= and EXCLUSIVE restrict the class levels to the values that are in the CAKETYPE data set. PRINTALLTYPES displays all combinations of class variables in the output.

 proc means data=cake range median min max fw=7 maxdec=0
            classdata=caketype exclusive printalltypes;

Specify the analysis variable. The VAR statement specifies that PROC MEANS calculate statistics on the TasteScore variable.

 var TasteScore; 

Specify subgroups for analysis. The CLASS statement separates the analysis by the values of Flavor and Layers. Note that these variables, and only these variables, must appear in the CAKETYPE data set.

 class flavor layers; 

Specify the title.

 title 'Taste Score For Number of Layers and Cake Flavor';
 run;

Output

PROC MEANS calculates statistics for the 13 chocolate and vanilla cakes. Because the CLASSDATA= data set contains 3 as the value of Layers, PROC MEANS uses 3 as a class value even though the frequency is zero.

               Taste Score For Number of Layers and Cake Flavor               1

                              The MEANS Procedure

                        Analysis Variable : TasteScore

               N
             Obs      Range     Median    Minimum     Maximum
             ------------------------------------------------
              13         22         80         72          94
             ------------------------------------------------

                        Analysis Variable : TasteScore

                      N
         Layers     Obs      Range     Median    Minimum     Maximum
         -----------------------------------------------------------
              1       8         19         82         75          94
              2       5         20         75         72          92
              3       0          .          .          .           .
         -----------------------------------------------------------

                        Analysis Variable : TasteScore

                         N
      Flavor           Obs      Range     Median    Minimum     Maximum
      -----------------------------------------------------------------
      Chocolate          8         20         81         72          92
      Vanilla            5         21         80         73          94
      -----------------------------------------------------------------

                        Analysis Variable : TasteScore

                              N
 Flavor       Layers        Obs      Range     Median    Minimum     Maximum
 ---------------------------------------------------------------------------
 Chocolate         1          5          6         83         79          85
                   2          3         20         75         72          92
                   3          0          .          .          .           .

 Vanilla           1          3         19         80         75          94
                   2          2         14         80         73          87
                   3          0          .          .          .           .
 ---------------------------------------------------------------------------
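To see what EXCLUSIVE contributes, it can help to run the same step without it. The following sketch assumes the CAKE and CAKETYPE data sets created above. When EXCLUSIVE is omitted, the CLASSDATA= values are expected to be added to the class levels found in CAKE rather than to replace them, so the Rum and Spice cakes should also appear, alongside the zero-frequency three-layer rows for Chocolate and Vanilla. Observations with a missing Flavor or Layers value are still left out because the MISSING option is not specified.

```sas
/* Sketch: Example 4 without EXCLUSIVE.                          */
/* Assumes WORK.CAKE and WORK.CAKETYPE exist. The CLASSDATA=     */
/* combinations supplement the levels found in the input data.   */
proc means data=cake range median min max fw=7 maxdec=0
           classdata=caketype printalltypes;
   var TasteScore;
   class flavor layers;
   title 'Taste Score For Number of Layers and Cake Flavor (No EXCLUSIVE)';
run;
```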

Example 5: Using Multilabel Value Formats with Class Variables

Procedure features:

  • PROC MEANS statement options:

    • statistic keywords

    • FW=

    • NONOBS

  • CLASS statement options:

    • MLF

    • ORDER=

  • TYPES statement

Other features:

  • FORMAT procedure

  • FORMAT statement

Data set: CAKE (created in Example 1)

This example

  • computes the statistics for the specified keywords and displays them in order

  • specifies the field width of the statistics

  • suppresses the column with the total number of observations

  • analyzes the data for the one-way combination of cake flavor and the two-way combination of cake flavor and participant's age

  • assigns user-defined formats to the class variables

  • uses multilabel formats as the levels of class variables

  • orders the levels of the cake flavors by the descending frequency count and orders the levels of age by the ascending formatted values.

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=80 pagesize=64; 

Create the $FLVRFMT. and AGEFMT. formats. PROC FORMAT creates user-defined formats to categorize the cake flavors and ages of the participants. MULTILABEL creates a multilabel format for Age. A multilabel format is one in which multiple labels can be assigned to the same value, in this case because of overlapping ranges. Each value is represented in the output for each range in which it occurs.

 proc format;
    value $flvrfmt
                 'Chocolate'='Chocolate'
                 'Vanilla'='Vanilla'
                 'Rum','Spice'='Other Flavor';
    value agefmt (multilabel)
                   15 - 29='below 30 years'
                   30 - 50='between 30 and 50'
                   51 - high='over 50 years'
                   15 - 19='15 to 19'
                   20 - 25='20 to 25'
                   25 - 39='25 to 39'
                   40 - 55='40 to 55'
                   56 - high='56 and above';
 run;

Specify the analyses and the analysis options. FW= uses a field width of six to display the statistics. The statistic keywords specify the statistics and their order in the output. NONOBS suppresses the N Obs column.

 proc means data=cake fw=6 n min max median nonobs; 

Specify subgroups for the analysis. The CLASS statements separate the analysis by values of Flavor and Age. ORDER=FREQ orders the levels of Flavor by descending frequency count. ORDER=FMT orders the levels of Age by ascending formatted values. MLF specifies that multilabel value formats be used for Age.

 class flavor/order=freq;
 class age /mlf order=fmt;

Specify which subgroups to analyze. The TYPES statement requests the analysis for the one-way combination of Flavor and the two-way combination of Flavor and Age.

 types flavor flavor*age; 

Specify the analysis variable. The VAR statement specifies that PROC MEANS calculate statistics on the TasteScore variable.

 var TasteScore; 

Format the output. The FORMAT statement assigns user-defined formats to the Age and Flavor variables for this analysis.

 format age agefmt. flavor $flvrfmt.; 

Specify the title.

 title 'Taste Score for Cake Flavors and Participant''s Age';
 run;

Output

The one-way combination of class variables appears before the two-way combination. A field width of six truncates the statistics to two decimal places. For the two-way combination of Age and Flavor, the total number of observations is greater than the total for the one-way combination of Flavor. This situation arises because of the multilabel format for Age, which maps one internal value to more than one formatted value.

The order of the levels of Flavor is based on the frequency count for each level. The order of the levels of Age is based on the order of the user-defined formats.

              Taste Score for Cake Flavors and Participant's Age              1

                              The MEANS Procedure

                        Analysis Variable : TasteScore

           Flavor           N       Min       Max     Median
           -------------------------------------------------
           Chocolate        9     72.00     92.00      83.00
           Vanilla          6     73.00     94.00      82.00
           Other Flavor     4     72.00     91.00      82.00
           -------------------------------------------------

                        Analysis Variable : TasteScore

Flavor          Age                   N       Min       Max     Median
----------------------------------------------------------------------
Chocolate       15 to 19              1     79.00     79.00      79.00
                20 to 25              1     84.00     84.00      84.00
                25 to 39              4     75.00     85.00      81.00
                40 to 55              2     72.00     92.00      82.00
                56 and above          1     84.00     84.00      84.00
                below 30 years        5     75.00     85.00      79.00
                between 30 and 50     2     83.00     92.00      87.50
                over 50 years         2     72.00     84.00      78.00

Vanilla         25 to 39              2     73.00     80.00      76.50
                40 to 55              1     75.00     75.00      75.00
                56 and above          3     84.00     94.00      87.00
                below 30 years        1     80.00     80.00      80.00
                between 30 and 50     2     73.00     75.00      74.00
                over 50 years         3     84.00     94.00      87.00

Other Flavor    25 to 39              3     72.00     83.00      81.00
                40 to 55              1     91.00     91.00      91.00
                below 30 years        1     81.00     81.00      81.00
                between 30 and 50     2     72.00     83.00      77.50
                over 50 years         1     91.00     91.00      91.00
----------------------------------------------------------------------

Example 6: Using Preloaded Formats with Class Variables

Procedure features:

  • PROC MEANS statement options:

    • COMPLETETYPES

    • FW=

    • MISSING

    • NONOBS

  • CLASS statement options:

    • EXCLUSIVE

    • ORDER=

    • PRELOADFMT

  • WAYS statement

Other features:

  • FORMAT procedure

  • FORMAT statement

Data set: CAKE (created in Example 1)

This example

  • specifies the field width of the statistics

  • suppresses the column with the total number of observations

  • includes all possible combinations of class variable values in the analysis even if the frequency is zero

  • considers missing values as valid class levels

  • analyzes the one-way and two-way combinations of class variables

  • assigns user-defined formats to the class variables

  • uses only the preloaded range of user-defined formats as the levels of class variables

  • orders the results by the value of the formatted data.

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=80 pagesize=64; 

Create the LAYERFMT. and $FLVRFMT. formats. PROC FORMAT creates user-defined formats to categorize the number of cake layers and the cake flavors. NOTSORTED keeps $FLVRFMT. unsorted to preserve the original order of the format values.

 proc format;
    value layerfmt 1='single layer'
                   2-3='multi-layer'
                   .='unknown';
    value $flvrfmt (notsorted)
                   'Vanilla'='Vanilla'
                   'Orange','Lemon'='Citrus'
                   'Spice'='Spice'
                   'Rum','Mint','Almond'='Other Flavor';
 run;

Generate the default statistics and specify the analysis options. FW= uses a field width of seven to display the statistics. COMPLETETYPES includes class levels with a frequency of zero. MISSING considers missing values valid values for all class variables. NONOBS suppresses the N Obs column. Because no specific analyses are requested, all default analyses are performed.

 proc means data=cake fw=7 completetypes missing nonobs; 

Specify subgroups for the analysis. The CLASS statement separates the analysis by values of Flavor and Layers. PRELOADFMT and EXCLUSIVE restrict the levels to the preloaded values of the user-defined formats. ORDER=DATA orders the levels of Flavor and Layers by formatted data values.

 class flavor layers/preloadfmt exclusive order=data; 

Specify which subgroups to analyze. The WAYS statement requests one-way and two-way combinations of class variables.

 ways 1 2; 

Specify the analysis variable. The VAR statement specifies that PROC MEANS calculate statistics on the TasteScore variable.

 var TasteScore; 

Format the output. The FORMAT statement assigns user-defined formats to the Flavor and Layers variables for this analysis.

 format layers layerfmt. flavor $flvrfmt.; 

Specify the title.

 title 'Taste Score For Number of Layers and Cake Flavors';
 run;

Output

The one-way combination of class variables appears before the two-way combination. PROC MEANS reports only the level values that are listed in the preloaded range of user-defined formats even when the frequency of observations is zero (in this case, citrus). PROC MEANS rejects entire observations based on the exclusion of any single class value in a given observation. Therefore, when the number of layers is unknown, statistics are calculated for only one observation. The other observation is excluded because the flavor chocolate was not included in the preloaded user-defined format for Flavor.

The order of the levels is based on the order of the user-defined formats. PROC FORMAT automatically sorted the Layers format and did not sort the Flavor format.

              Taste Score For Number of Layers and Cake Flavors               1

                              The MEANS Procedure

                        Analysis Variable : TasteScore

         Layers           N       Mean    Std Dev    Minimum     Maximum
         ---------------------------------------------------------------
         unknown          1     84.000          .     84.000      84.000
         single layer     3     83.000      9.849     75.000      94.000
         multi-layer      6     81.167      7.548     72.000      91.000
         ---------------------------------------------------------------

                        Analysis Variable : TasteScore

         Flavor           N       Mean    Std Dev    Minimum     Maximum
         ---------------------------------------------------------------
         Vanilla          6     82.167      7.834     73.000      94.000
         Citrus           0          .          .          .           .
         Spice            3     85.000      5.292     81.000      91.000
         Other Flavor     1     72.000          .     72.000      72.000
         ---------------------------------------------------------------

                        Analysis Variable : TasteScore

Flavor           Layers           N       Mean    Std Dev    Minimum    Maximum
-------------------------------------------------------------------------------
Vanilla          unknown          1     84.000          .     84.000     84.000
                 single layer     3     83.000      9.849     75.000     94.000
                 multi-layer      2     80.000      9.899     73.000     87.000

Citrus           unknown          0          .          .          .          .
                 single layer     0          .          .          .          .
                 multi-layer      0          .          .          .          .

Spice            unknown          0          .          .          .          .
                 single layer     0          .          .          .          .
                 multi-layer      3     85.000      5.292     81.000     91.000

Other Flavor     unknown          0          .          .          .          .
                 single layer     0          .          .          .          .
                 multi-layer      1     72.000          .     72.000     72.000
-------------------------------------------------------------------------------

Example 7: Computing a Confidence Limit for the Mean

Procedure features:

  • PROC MEANS statement options:

    • ALPHA=

    • FW=

    • MAXDEC=

  • CLASS statement

This example

  • specifies the field width and number of decimal places of the statistics

  • computes a two-sided 90 percent confidence limit for the mean values of MoneyRaised and HoursVolunteered for the three years of data.

If this data is representative of a larger population of volunteers, then the confidence limits provide ranges of likely values for the true population means.

Program

Create the CHARITY data set. CHARITY contains information about high-school students' volunteer work for a charity. The variables give the name of the high school, the year of the fund-raiser, the first name of each student, the amount of money each student raised, and the number of hours each student volunteered. A DATA step on page 1392 creates this data set.

 data charity;
    input School $ 1-7 Year 9-12 Name $ 14-20 MoneyRaised 22-26
          HoursVolunteered 28-29;
    datalines;
Monroe  1992 Allison 31.65 19
Monroe  1992 Barry   23.76 16
Monroe  1992 Candace 21.11  5

     ...  more data lines  ...

Kennedy 1994 Sid     27.45 25
Kennedy 1994 Will    28.88 21
Kennedy 1994 Morty   34.44 25
;

Specify the analyses and the analysis options. FW= uses a field width of eight and MAXDEC= uses two decimal places to display the statistics. ALPHA=0.1 specifies a 90% confidence limit, and the CLM keyword requests two-sided confidence limits. MEAN and STD request the mean and the standard deviation, respectively.

 proc means data=charity fw=8 maxdec=2 alpha=0.1 clm mean std; 

Specify subgroups for the analysis. The CLASS statement separates the analysis by values of Year.

 class Year; 

Specify the analysis variables. The VAR statement specifies that PROC MEANS calculate statistics on the MoneyRaised and HoursVolunteered variables.

 var MoneyRaised HoursVolunteered; 

Specify the titles.

 title 'Confidence Limits for Fund Raising Statistics';
 title2 '1992-94';
 run;

Output

PROC MEANS displays the lower and upper confidence limits for both variables for each year.

                 Confidence Limits for Fund Raising Statistics                1
                                    1992-94

                              The MEANS Procedure

            N                       Lower 90%    Upper 90%
     Year Obs  Variable           CL for Mean  CL for Mean     Mean   Std Dev
     ------------------------------------------------------------------------
     1992  31  MoneyRaised              25.21        32.40     28.80     11.79
               HoursVolunteered         17.67        23.17     20.42      9.01

     1993  32  MoneyRaised              25.17        31.58     28.37     10.69
               HoursVolunteered         15.86        20.02     17.94      6.94

     1994  46  MoneyRaised              26.73        33.78     30.26     14.23
               HoursVolunteered         19.68        22.63     21.15      5.96
     ------------------------------------------------------------------------
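The displayed limits can be checked by hand with the usual two-sided t interval for a mean: the mean plus or minus the t quantile at 1 - ALPHA/2 with n - 1 degrees of freedom, multiplied by the standard deviation divided by the square root of n. The following DATA step is a minimal sketch that applies this formula to the 1992 MoneyRaised row. It assumes that N equals the N Obs value of 31 (no missing values), and it uses the rounded mean and standard deviation from the table, so the results (roughly 25.2 and 32.4) can differ from the displayed limits in the last decimal place.

```sas
/* Sketch: recompute the 90% limits for 1992 MoneyRaised from   */
/* the displayed summary values. Inputs are rounded, so the     */
/* results are approximate.                                     */
data _null_;
   n     = 31;      /* assumption: N equals N Obs (no missing values) */
   mean  = 28.80;   /* displayed mean                                 */
   std   = 11.79;   /* displayed standard deviation                   */
   alpha = 0.1;     /* same value as ALPHA= in the PROC MEANS step    */
   t     = tinv(1 - alpha/2, n - 1);   /* t quantile with n-1 df      */
   half  = t * std / sqrt(n);          /* half-width of the interval  */
   lower = mean - half;
   upper = mean + half;
   put lower= 8.2 upper= 8.2;          /* write the limits to the log */
run;
```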

Example 8: Computing Output Statistics

Procedure features:

  • PROC MEANS statement option:

    • NOPRINT

  • CLASS statement

  • OUTPUT statement options

    • statistic keywords

    • IDGROUP

    • LEVELS

    • WAYS

Other features:

  • PRINT procedure

Data set: GRADE (created in Example 2)

This example

  • suppresses the display of PROC MEANS output

  • stores the average final grade in a new variable

  • stores the name of the student with the best final exam score in a new variable

  • stores the number of class variables that are combined in the _WAY_ variable

  • stores the value of the class level in the _LEVEL_ variable

  • displays the output data set.

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=80 pagesize=60; 

Specify the analysis options. NOPRINT suppresses the display of all PROC MEANS output.

 proc means data=Grade noprint; 

Specify subgroups for the analysis. The CLASS statement separates the analysis by values of Status and Year.

 class Status Year; 

Specify the analysis variable. The VAR statement specifies that PROC MEANS calculate statistics on the FinalGrade variable.

 var FinalGrade; 

Specify the output data set options. The OUTPUT statement creates the SUMSTAT data set and writes the mean value for the final grade to the new variable AverageGrade. IDGROUP writes the name of the student with the top exam score to the variable BestScore; its OBS option also writes the number of the observation that contained the top score to the variable _OBS_. WAYS and LEVELS write information on how the class variables are combined.

 output out=sumstat mean=AverageGrade
        idgroup (max(score) obs out (name)=BestScore)
        / ways levels;
 run;

Print the output data set WORK.SUMSTAT. The NOOBS option suppresses the observation numbers.

 proc print data=sumstat noobs;
    title1 'Average Undergraduate and Graduate Course Grades';
    title2 'For Two Years';
 run;

Output

The first observation contains the average course grade and the name of the student with the highest exam score over the two-year period. The next four observations contain values for each class variable value. The remaining four observations contain values for the Year and Status combination. The variables _WAY_, _TYPE_, and _LEVEL_ show how PROC MEANS created the class variable combinations. The variable _OBS_ contains the observation number in the GRADE data set that contained the highest exam score.

                Average Undergraduate and Graduate Course Grades               1
                                 For Two Years

                                                     Average   Best
 Status   Year   _WAY_   _TYPE_   _LEVEL_   _FREQ_     Grade   Score      _OBS_

                     0        0         1       10   83.0000   Branford       2
            97       1        1         1        6   83.6667   Jasper        10
            98       1        1         2        4   82.0000   Branford       2
 1                   1        2         1        6   82.5000   Branford       2
 2                   1        2         2        4   83.7500   Abbott         1
 1          97       2        3         1        3   79.3333   Jasper        10
 1          98       2        3         2        3   85.6667   Branford       2
 2          97       2        3         3        3   88.0000   Abbott         1
 2          98       2        3         4        1   71.0000   Crandell       3
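
The _WAY_ variable makes it straightforward to keep only one kind of subgroup from SUMSTAT. The following is a minimal sketch, assuming that the SUMSTAT data set created by the program above still exists in the WORK library. It prints only the observations in which both class variables are combined (_WAY_=2). The title text is illustrative and not part of the original example.

```sas
/* Sketch: list only the Status-by-Year combinations from SUMSTAT.  */
/* Assumes WORK.SUMSTAT was created by the PROC MEANS step above.   */
proc print data=sumstat noobs;
   where _WAY_ = 2;   /* two class variables combined (_TYPE_=3)    */
   title1 'Average Grades for Each Status and Year Combination';
run;
```

Given the output shown above, this step would list the last four observations of SUMSTAT.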

Example 9: Computing Different Output Statistics for Several Variables

Procedure features:

  • PROC MEANS statement options:

    • DESCEND

    • NOPRINT

  • CLASS statement

  • OUTPUT statement options:

    • statistic keywords

Other features:

  • PRINT procedure

  • WHERE= data set option

Data set: GRADE on page 561

This example

  • suppresses the display of PROC MEANS output

  • stores the statistics for the class level and combinations of class variables that are specified by WHERE= in the output data set

  • orders observations in the output data set by descending _TYPE_ value

  • stores the mean exam scores and mean final grades without assigning new variable names

  • stores the median final grade in a new variable

  • displays the output data set.

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=80 pagesize=60; 

Specify the analysis options. NOPRINT suppresses the display of all PROC MEANS output. DESCEND orders the observations in the OUT= data set by descending _TYPE_ value.

 proc means data=Grade noprint descend; 

Specify subgroups for the analysis. The CLASS statement separates the analysis by values of Status and Year.

 class Status Year; 

Specify the analysis variables. The VAR statement specifies that PROC MEANS calculate statistics on the Score and FinalGrade variables.

 var Score FinalGrade; 

Specify the output data set options. The OUTPUT statement writes the mean for Score and FinalGrade to variables of the same name. The median final grade is written to the variable MedianGrade. The WHERE= data set option restricts the observations in SUMDATA. One observation contains overall statistics (_TYPE_=0). The remainder must have a status of 1.

 output out=Sumdata (where=(status='1' or _type_=0))
        mean= median(finalgrade)=MedianGrade;
 run;

Print the output data set WORK.SUMDATA.

 proc print data=Sumdata;
    title 'Exam and Course Grades for Undergraduates Only';
    title2 'and for All Students';
 run;

Output

The first three observations contain statistics for the class variable levels with a status of 1. The last observation contains the statistics for all the observations (no subgroup). Score contains the mean test score and FinalGrade contains the mean final grade.

                 Exam and Course Grades for Undergraduates Only                1
                              and for All Students

                                                      Final   Median
 Obs    Status   Year   _TYPE_   _FREQ_     Score     Grade    Grade

   1    1          97        3        3   84.6667   79.3333       73
   2    1          98        3        3   88.3333   85.6667       80
   3    1                    2        6   86.5000   82.5000       80
   4                         0       10   86.0000   83.0000       83
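
When the default names are not wanted, each statistic keyword can list its analysis variables in parentheses and assign explicit output names. The following is a sketch under the same assumptions as the program above (the GRADE data set with the Status, Year, Score, and FinalGrade variables). The names SUMDATA2, MeanScore, and MeanGrade are hypothetical and are chosen only for illustration.

```sas
proc means data=Grade noprint descend;
   class Status Year;
   var Score FinalGrade;
   /* Name each output variable explicitly instead of reusing  */
   /* the analysis variable names.                              */
   output out=Sumdata2 (where=(status='1' or _type_=0))
          mean(Score FinalGrade)=MeanScore MeanGrade
          median(FinalGrade)=MedianGrade;
run;
```

Explicit names can help when the output data set is later merged with data that already contains Score or FinalGrade.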

Example 10: Computing Output Statistics with Missing Class Variable Values

Procedure features:

  • PROC MEANS statement options:

    • CHARTYPE

    • NOPRINT

    • NWAY

  • CLASS statement options:

    • ASCENDING

    • MISSING

    • ORDER=

  • OUTPUT statement

Other features:

  • PRINT procedure

Data set: CAKE on page 559

This example

  • suppresses the display of PROC MEANS output

  • considers missing values as valid level values for only one class variable

  • orders observations in the output data set by the ascending frequency for a single class variable

  • stores observations for only the highest _TYPE_ value

  • stores _TYPE_ as binary character values

  • stores the maximum taste score in a new variable

  • displays the output data set.

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=80 pagesize=60; 

Specify the analysis options. NWAY limits the output data set to the observations with the highest _TYPE_ value, that is, the combination of all class variables. NOPRINT suppresses the display of all PROC MEANS output.

 proc means data=cake nway noprint; 

Specify subgroups for the analysis. The CLASS statements separate the analysis by Flavor and Layers. ORDER=FREQ and ASCENDING order the levels of Flavor by ascending frequency. MISSING uses missing values of Layers as a valid class level value.

 class flavor /order=freq ascending;
 class layers /missing;

Specify the analysis variable. The VAR statement specifies that PROC MEANS calculate statistics on the TasteScore variable.

 var TasteScore; 

Specify the output data set options. The OUTPUT statement creates the CAKESTAT data set and outputs the maximum value for the taste score to the new variable HighScore.

 output out=cakestat max=HighScore;
 run;

Print the output data set WORK.CAKESTAT.

 proc print data=cakestat;
    title 'Maximum Taste Score for Flavor and Cake Layers';
 run;

Output

The CAKESTAT output data set contains only observations for the combination of both class variables, Flavor and Layers. Therefore, the value of _TYPE_ is 3 for all observations. The observations are ordered by ascending frequency of Flavor. The missing value in Layers is a valid value for this class variable. PROC MEANS excludes the observation with the missing flavor because MISSING is not specified for Flavor, so a missing value is not a valid level of that class variable.

                 Maximum Taste Score for Flavor and Cake Layers                1

                                                High
 Obs    Flavor      Layers   _TYPE_   _FREQ_   Score

   1    Rum              2        3        1      72
   2    Spice            2        3        2      83
   3    Spice            3        3        1      91
   4    Vanilla          .        3        1      84
   5    Vanilla          1        3        3      94
   6    Vanilla          2        3        2      87
   7    Chocolate        .        3        1      84
   8    Chocolate        1        3        5      85
   9    Chocolate        2        3        3      92
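
The feature list for this example names CHARTYPE, although the PROC MEANS statement shown above does not specify it, which is why _TYPE_ appears as the numeric value 3. The following sketch repeats the step with CHARTYPE added. The output data set name CAKESTAT2 and the title are hypothetical. With two class variables and NWAY, _TYPE_ should then be stored as the character string 11, which is the binary form of 3.

```sas
/* Sketch: same analysis, but _TYPE_ is written as a character */
/* string of binary digits because CHARTYPE is specified.      */
proc means data=cake chartype nway noprint;
   class flavor / order=freq ascending;
   class layers / missing;
   var TasteScore;
   output out=cakestat2 max=HighScore;
run;

proc print data=cakestat2;
   title 'Maximum Taste Score with Character _TYPE_ Values';
run;
```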

Example 11: Identifying an Extreme Value with the Output Statistics

Procedure features:

  • CLASS statement

  • OUTPUT statement options:

    • statistic keyword

    • MAXID

Other features:

  • PRINT procedure

Data set: CHARITY on page 574

This example

  • identifies the observations with maximum values for two variables

  • creates new variables for the maximum values

  • displays the output data set.

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=80 pagesize=60; 

Specify the analyses. The statistic keywords specify the statistics and their order in the output. CHARTYPE writes the _TYPE_ values as binary character strings in the output data set.

 proc means data=Charity n mean range chartype; 

Specify subgroups for the analysis. The CLASS statement separates the analysis by School and Year.

 class School Year; 

Specify the analysis variables. The VAR statement specifies that PROC MEANS calculate statistics on the MoneyRaised and HoursVolunteered variables.

 var MoneyRaised HoursVolunteered; 

Specify the output data set options. The OUTPUT statement writes the new variables, MostCash and MostTime, which contain the names of the students who collected the most money and volunteered the most time, respectively, to the PRIZE data set.

 output out=Prize maxid(MoneyRaised(name)
        HoursVolunteered(name))= MostCash MostTime
        max= ;

Specify the title.

 title 'Summary of Volunteer Work by School and Year';
 run;

Print the WORK.PRIZE output data set.

 proc print data=Prize;
    title 'Best Results: Most Money Raised and Most Hours Worked';
 run;

Output

The first page of output shows the output from PROC MEANS with the statistics for six class levels: one for each combination of school (Kennedy High and Monroe High) and year (1992, 1993, and 1994).

                  Summary of Volunteer Work by School and Year                 1

                              The MEANS Procedure

                          N
 School           Year  Obs  Variable            N          Mean          Range
 ------------------------------------------------------------------------------
 Kennedy          1992   15  MoneyRaised        15    29.0800000     39.7500000
                             HoursVolunteered   15    22.1333333     30.0000000

                  1993   20  MoneyRaised        20    28.5660000     23.5600000
                             HoursVolunteered   20    19.2000000     20.0000000

                  1994   18  MoneyRaised        18    31.5794444     65.4400000
                             HoursVolunteered   18    24.2777778     15.0000000

 Monroe           1992   16  MoneyRaised        16    28.5450000     48.2700000
                             HoursVolunteered   16    18.8125000     38.0000000

                  1993   12  MoneyRaised        12    28.0500000     52.4600000
                             HoursVolunteered   12    15.8333333     21.0000000

                  1994   28  MoneyRaised        28    29.4100000     73.5300000
                             HoursVolunteered   28    19.1428571     26.0000000
 ------------------------------------------------------------------------------

The output from PROC PRINT shows the maximum MoneyRaised and HoursVolunteered values and the names of the students who are responsible for them. The first observation contains the overall results, the next three contain the results by year, the next two contain the results by school, and the final six contain the results by School and Year.

             Best Results: Most Money Raised and Most Hours Worked             2

                                     Most     Most      Money        Hours
 Obs  School   Year  _TYPE_  _FREQ_  Cash     Time     Raised  Volunteered

   1              .    00       109  Willard  Tonya     78.65           40
   2           1992    01        31  Tonya    Tonya     55.16           40
   3           1993    01        32  Cameron  Amy       65.44           31
   4           1994    01        46  Willard  L.T.      78.65           33
   5  Kennedy     .    10        53  Luther   Jay       72.22           35
   6  Monroe      .    10        56  Willard  Tonya     78.65           40
   7  Kennedy  1992    11        15  Thelma   Jay       52.63           35
   8  Kennedy  1993    11        20  Bill     Amy       42.23           31
   9  Kennedy  1994    11        18  Luther   Che-Min   72.22           33
  10  Monroe   1992    11        16  Tonya    Tonya     55.16           40
  11  Monroe   1993    11        12  Cameron  Myrtle    65.44           26
  12  Monroe   1994    11        28  Willard  L.T.      78.65           33
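
Because CHARTYPE stores _TYPE_ as a character string, a subset of PRIZE has to compare _TYPE_ with a quoted value. The following is a minimal sketch, assuming that WORK.PRIZE from the program above is available. The title is illustrative.

```sas
/* Sketch: keep only the School-by-Year rows of PRIZE.          */
/* _TYPE_ is character here, so the comparison value is quoted. */
proc print data=Prize noobs;
   where _TYPE_ = '11';   /* both School and Year contribute */
   title 'Best Results for Each School and Year';
run;
```

With the output shown above, this step would list observations 7 through 12.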

Example 12: Identifying the Top Three Extreme Values with the Output Statistics

Procedure features:

  • PROC MEANS statement option:

    • NOPRINT

  • CLASS statement

  • OUTPUT statement options:

    • statistic keywords

    • AUTOLABEL

    • AUTONAME

    • IDGROUP

  • TYPES statement

Other features:

  • FORMAT procedure

  • FORMAT statement

  • PRINT procedure

  • RENAME= data set option

Data set: CHARITY on page 574

This example

  • suppresses the display of PROC MEANS output

  • analyzes the data for the one-way combination of the class variables and across all observations

  • stores the total and average amount of money raised in new variables

  • stores in new variables the top three amounts of money raised, the names of the three students who raised the money, the years when it occurred, and the schools the students attended

  • automatically resolves conflicts in the variable names when names are assigned to the new variables in the output data set

  • appends the statistic name to the label of the variables in the output data set that contain statistics that were computed for the analysis variable

  • assigns a format to the analysis variable so that the statistics that are computed from this variable inherit the attribute in the output data set

  • renames the _FREQ_ variable in the output data set

  • displays the output data set and its contents.

Program

Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.

 options nodate pageno=1 linesize=80 pagesize=60; 

Create the YRFMT. and $SCHFMT. formats. PROC FORMAT creates user-defined formats that assign the value of All to the missing levels of the class variables.

 proc format;
    value yrFmt . = " All";
    value $schFmt ' ' = "All   ";
 run;

Generate the default statistics and specify the analysis options. NOPRINT suppresses the display of all PROC MEANS output.

 proc means data=Charity noprint; 

Specify subgroups for the analysis. The CLASS statement separates the analysis by values of School and Year.

 class School Year; 

Specify which subgroups to analyze. The TYPES statement requests the analysis across all the observations and for each one-way combination of School and Year.

 types () school year; 

Specify the analysis variable. The VAR statement specifies that PROC MEANS calculate statistics on the MoneyRaised variable.

 var MoneyRaised; 

Specify the output data set options. The OUTPUT statement creates the TOP3LIST data set. RENAME= renames the _FREQ_ variable, which contains the frequency count for each class level, to NumberStudents. SUM= and MEAN= specify that the sum and mean of the analysis variable (MoneyRaised) are written to the output data set. IDGROUP writes 12 variables that contain the top three amounts of money raised and the three corresponding students, schools, and years. AUTOLABEL appends the statistic name to the label for the output variables that contain the sum and mean. AUTONAME resolves naming conflicts for these variables.

 output out=top3list(rename=(_freq_=NumberStudents)) sum= mean=
        idgroup( max(moneyraised) out[3] (moneyraised name
          school year)=)/autolabel autoname;

Format the output. The LABEL statement assigns a label to the analysis variable MoneyRaised. The FORMAT statement assigns user-defined formats to the Year and School variables and a SAS dollar format to the MoneyRaised variable.

 label MoneyRaised='Amount Raised';
 format year yrfmt. school $schfmt.
        moneyraised dollar8.2;
 run;

Print the output data set WORK.TOP3LIST.

 proc print data=top3list;
    title1 'School Fund Raising Report';
    title2 'Top Three Students';
 run;

Display information about the TOP3LIST data set. PROC DATASETS displays the contents of the TOP3LIST data set. NOLIST suppresses the directory listing for the WORK data library.

 proc datasets library=work nolist;
    contents data=top3list;
    title1 'Contents of the PROC MEANS Output Data Set';
 run;

Output

The output from PROC PRINT shows the top three values of MoneyRaised, the names of the students who raised these amounts, the schools the students attended, and the years when the money was raised. The first observation contains the overall results, the next three contain the results by year, and the final two contain the results by school. The missing class levels for School and Year are replaced with the value All.

The labels for the variables that contain statistics that were computed from MoneyRaised include the statistic name at the end of the label.

                           School Fund Raising Report                          1
                               Top Three Students

                                    Money    Money
                           Number Raised_  Raised_     Money     Money     Money
 Obs School  Year _TYPE_ Students     Sum     Mean  Raised_1  Raised_2  Raised_3

   1 All      All      0      109   92.75      .29       .65       .22       .44
   2 All     1992      1       31    2.92      .80       .16       .76       .63
   3 All     1993      1       32    7.92      .37       .44       .33       .23
   4 All     1994      1       46   91.91      .26       .65       .22       .87
   5 Kennedy  All      2       53   75.95      .73       .22       .63       .89
   6 Monroe   All      2       56   16.80      .87       .65       .44       .87

 Obs  Name_1  Name_2  Name_3  School_1 School_2 School_3 Year_1 Year_2 Year_3

   1  Willard Luther  Cameron Monroe   Kennedy  Monroe     1994   1994   1993
   2  Tonya   Edward  Thelma  Monroe   Monroe   Kennedy    1992   1992   1992
   3  Cameron Myrtle  Bill    Monroe   Monroe   Kennedy    1993   1993   1993
   4  Willard Luther  L.T.    Monroe   Kennedy  Monroe     1994   1994   1994
   5  Luther  Thelma  Jenny   Kennedy  Kennedy  Kennedy    1994   1992   1992
   6  Willard Cameron L.T.    Monroe   Monroe   Monroe     1994   1993   1994

                   Contents of the PROC MEANS Output Data Set                  2

                             The DATASETS Procedure

 Data Set Name         WORK.TOP3LIST                     Observations           6
 Member Type           DATA                              Variables              18
 Engine                V9                                Indexes                0
 Created               18:59 Thursday, March 14, 2002    Observation Length     144
 Last Modified         18:59 Thursday, March 14, 2002    Deleted Observations   0
 Protection                                              Compressed             NO
 Data Set Type                                           Sorted                 NO
 Label
 Data Representation   WINDOWS
 Encoding              wlatin1   Western (Windows)

                       Engine/Host Dependent Information

 Data Set Page Size           12288
 Number of Data Set Pages     1
 First Data Page              1
 Max Obs per Page             85
 Obs in First Data Page       6
 Number of Data Set Repairs   0
 File Name                    filename
 Release Created              9.0000B0
 Host Created                 WIN_PRO

                  Alphabetic List of Variables and Attributes

  #    Variable             Type    Len    Format       Label

  7    MoneyRaised_1        Num       8    DOLLAR8.2    Amount Raised
  8    MoneyRaised_2        Num       8    DOLLAR8.2    Amount Raised
  9    MoneyRaised_3        Num       8    DOLLAR8.2    Amount Raised
  6    MoneyRaised_Mean     Num       8    DOLLAR8.2    Amount Raised_Mean
  5    MoneyRaised_Sum      Num       8    DOLLAR8.2    Amount Raised_Sum
 10    Name_1               Char      7
 11    Name_2               Char      7
 12    Name_3               Char      7
  4    NumberStudents       Num       8
  1    School               Char      7    $SCHFMT.
 13    School_1             Char      7    $SCHFMT.
 14    School_2             Char      7    $SCHFMT.
 15    School_3             Char      7    $SCHFMT.
  2    Year                 Num       8    YRFMT.
 16    Year_1               Num       8    YRFMT.
 17    Year_2               Num       8    YRFMT.
 18    Year_3               Num       8    YRFMT.
  3    _TYPE_               Num       8

See the TEMPLATE procedure in The Complete Guide to the SAS Output Delivery System for an example of how to create a custom table definition for this output data set.
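
The TYPES statement in this example lists the requested combinations by name. As an alternative sketch, the WAYS statement selects combinations by the number of class variables involved, so WAYS 0 1 should request the same overall and one-way subgroups. The output data set name TOP3WAYS is hypothetical, and the IDGROUP specification is omitted to keep the sketch short.

```sas
/* Sketch: request subgroups by the number of class variables  */
/* combined rather than by naming each combination.            */
proc means data=Charity noprint;
   class School Year;
   ways 0 1;   /* 0 = all observations, 1 = each class variable alone */
   var MoneyRaised;
   output out=top3ways sum= mean= / autoname;
run;
```

WAYS can be convenient when there are many class variables, because the list of integers stays the same however many one-way combinations exist.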




Base SAS 9.1.3 Procedures Guide (Vol. 1)
Base SAS 9.1 Procedures Guide, Volumes 1, 2, 3 and 4
ISBN: 1590472047
EAN: 2147483647
Year: 2004
Pages: 260
