Professional Documents
Culture Documents
1)
Either type or cut-and-past this SAS program into the SAS
program editor. Click on the "running person" icon at the
top menu of SAS, and the program should run successfully.
DATA CELLPHONE;
INPUT GENDER $ WEIGHT;
CARDS;
M 12
M 14
M 10
M 9
M 13
F 11
F 15
F 13
F 12
F 11
;
PROC SORT;
BY GENDER;
PROC MEANS MEAN N STD STDERR;
PROC UNIVARIATE;
PROC MEANS MEAN N STD STDERR;
BY GENDER;
RUN;
An explanation of the various SAS statements follows:
The first statement, the DATA CELLPHONE;
The keyword DATA is recognized by SAS and it is usually the
first statement in a data set. The word that follows is one
you make up that is informative to you. It probably
shouldn't be too long, nor should it have spaces in it, nor
should it be a SAS keyword (like don't say DATA DATA;).
Note that SAS statements are typically all followed by a
semicolon ";". The lines of data are not.
INPUT statement.
INPUT is a SAS keyword that tells SAS how your data are
organized. So in my data set I had two things that described
each individual in the data set. One was their gender (M or
F) the other was the weight of the cellphone in grams. I
PROC SORT;
BY GENDER;
It is normally a good idea to sort your data and indeed a
number of SAS procedures require that it is sorted in a
particular way.
So the two statements above tells SAS to sort the data by
gender (even though I'd already input in a sorted form, I
did this any way).
PROC MEANS MEAN N STD STDERR;
This is from procedure MEANS and tells SAS to find the mean,
sample size standard deviation (STD) and standard error of
the mean (STDERR) for the whole data set. I did it for
purpose of illustration. The procedure written in this way,
will not do separate estimations for males and females.
PROC UNIVARIATE; automatically does even more descriptive
statistics including the quartiles, it measure skewness and
kurtosis (two measures of departure from a normal
distribution), it will test for whether the data follow a
normal distribution (we'll consider this later in the
course). Again, it will do it for all the data in the way I
have set this up (but you could do it separately for each
gender using the BY statement as in the statement below).
PROC MEANS MEAN N STD STDERR;
BY GENDER;
In the statement above I ran PROC MEANS again, this time
calculating the means etc separately for each gender, which
is perhaps more useful for this data set.
RUN;
End the SAS program code with the word RUN;
You still have to press the little running man icon at the
top of the screen to run the program.
NOTE: SAS is notorious for lengthy output.
Avoid just printing out the output it gives you as you'll
consume several forests of paper. I normally edit the output
in word or somewhere and print out what I need. You will see
in the output window below, that I've also reduced the font
size to squeeze everything in.
DATA CELLPHONE;
INPUT GENDER $ WEIGHT;
CARDS;
Std Error
0.5773503
10
12
1.82574186
0
1470
15.2145155
Sum Weights
Sum Observations
Variance
Kurtosis
Corrected SS
Std Error Mean
10
120
3.33333333
-0.45
30
0.57735027
Location
Mean
12.00000
Median
12.00000
Mode
11.00000
Test
Student's t
Sign
Signed Rank
THIS FINAL PORTION IS FOR PROC MEANS WHERE THE ANALYSIS HAVE BEEN DONE SEPARATELY
FOR FEMALES (F) AND MALES (M)
Std Error
0.7483315
Std Error
0.9273618
GENDER=M
Mean
11.6000000
The SAS program plots the distribution and provides a table of frequencies and
cumulative frequencies.
DATA STUFF;
INPUT Y;
CARDS;
1
1
2
2
2
2
3
3
3
3
3
3
3
4
4
4
4
4
4
4
4
4
4
4
4
5
5
5
5
6
;
PROC FREQ;
TABLES Y / PLOTS=FREQPLOT;
RUN;
DATA STUFF;
INPUT Y;
CARDS;
;
PROC FREQ;
TABLES Y / PLOTS=FREQPLOT;
RUN;
Percent
Cumulative
Cumulative
Frequency
1
2
3
4
5
6
2
4
7
12
4
1
6.67
13.33
23.33
40.00
13.33
3.33
2
6
13
25
29
30
Percent
6.67
20.00
43.33
83.33
96.67
100.00
Plot a histogram of these data (do this by hand just to see why computers were
invented, and then using sas).
26.5
61.7
54.5
50.2
57
55.1
81.3
49.3
66.1
61.8
60.2
45.7
53.7
66.3
67.2
60.7
64.9
27.7
49.4
68.8
48
61.7
71.8
85.9
49.9
59.3
35.3
39.7
67.4
70.2
39.3
64.6
66.4
49.6
84.7
75.8
53.5
60.6
58
76.2
53.3
30.3
66.5
88.4
88.1
66.7
62.7
67.5
42.9
67.7
66.3
80.7
34.5
34.5
73.9
49.9
53.2
65.5
78.7
86.2
51.1
79.1
54.2
66.1
75.5
68.9
62
64.7
92.7
44.1
56.4
49.9
40.8
72.2
35.9
53
58.7
88.5
54.5
57.1
39.5
86.8
45.2
80.9
50.7
73
68.5
62.7
80
45.5
68.3
50.9
50.8
52.8
55.7
62.1
49.7
42.5
76.5
73.4
82
65.8
59.4
87.6
70.9
80.2
56.5
21.5
49.4
88.2
75.5
31.5
69.3
46.1
71.2
68.6
57.3
59.4
88.8
24.4
61.8
60.8
43.3
63.6
67
69.5
44.5
47.6
62.6
36.3
81.3
60.5
64
54.8
59.7
65.8
56
58.6
33
74.5
61.9
75.6
44.4
58.8
45.2
77.6
73.8
49.8
45.1
67.4
42.6
54.9
62.4
75.9
81.5
87.9
68.1
45.8
45.1
56.3
82.7
68.1
51.3
65.9
85.1
52.9
70.9
40
64.4
56.4
42.5
44.7
56.1
64.8
89.8
83
92.3
72.9
66.1
56.2
59.5
53.2
61.1
92.3
79.8
70.5
79.6
77.6
79.6
75.9
47.2
76.7
73.5
55
68.3
73.7
56
60.4
41
83.8
72
56.2
30.4
77.3
40.9
86.2
48.1
53.3
50.1
52
51.6
58
64.4
50.8
31.5
59.6
61.8
81.3
61.5
71.2
54.2
61.8
57.5
68.1
51
57.6
64.1
65.7
54.5
69.2
52
45.2
86.6
33.4
77.9
39
48.7
50.8
54.8
59.1
41
52.9
73.3
49.9
53.8
51.7
76.7
67.2
72.2
49.5
53.8
70.8
46.5
46
44
Plot a histogram of these data using sas. Since these are continuous
numerical data we use a different procedure to plot them.
You can set up the sas data set as before.
Note that if you want to put the data in exactly as above your input
statement should read as follows:
INPUT GRADES @@;
The "@@" simbols tells SAS that the data for GRADES are not in a single
column but rather follow one after the other. This form of input can be
used when you are input the values of just a single variable and you
don't want to have the data in a single large column of numbers.
Use proc univariate to plot the histogram.
Use the statements:
PROC UNIVARIATE;
HISTOGRAM / VSCALE = COUNT;
80.7
34.5
34.5
73.9
49.9
53.2
65.5
78.7
86.2
51.1
79.1
54.2
66.1
75.5
68.9
62
64.7
92.7
44.1
56.4
49.9
40.8
72.2
35.9
53
58.7
88.5
54.5
57.1
39.5
86.8
45.2
80.9
50.7
73
68.5
62.7
80
45.5
68.3
50.9
50.8
52.8
55.7
62.1
49.7
42.5
76.5
73.4
82
65.8
59.4
87.6
70.9
80.2
56.5
21.5
49.4
88.2
75.5
31.5
69.3
46.1
71.2
68.6
57.3
59.4
88.8
24.4
61.8
60.8
43.3
63.6
67
69.5
44.5
47.6
62.6
36.3
81.3
60.5
64
54.8
59.7
65.8
56
58.6
33
74.5
61.9
75.6
44.4
58.8
45.2
77.6
73.8
49.8
45.1
67.4
42.6
54.9
62.4
75.9
81.5
87.9
68.1
45.8
45.1
56.3
82.7
68.1
51.3
65.9
85.1
52.9
70.9
40
64.4
56.4
42.5
44.7
56.1
64.8
89.8
83
92.3
72.9
66.1
56.2
59.5
53.2
61.1
92.3
79.8
70.5
79.6
77.6
79.6
75.9
47.2
76.7
73.5
55
68.3
73.7
56
60.4
41
83.8
72
56.2
30.4
77.3
40.9
86.2
48.1
53.3
50.1
52
51.6
58
64.4
50.8
31.5
59.6
61.8
81.3
61.5
71.2
54.2
61.8
57.5
68.1
51
57.6
64.1
65.7
54.5
69.2
52
45.2
86.6
33.4
77.9
39
48.7
50.8
54.8
59.1
41
52.9
73.3
49.9
53.8
51.7
76.7
67.2
72.2
49.5
53.8
70.8
46.5
46
44
PROC UNIVARIATE;
HISTOGRAM / VSCALE = COUNT;
RUN;
255
60.8345098
14.8328875
-0.0364244
999597.28
24.3823573
Sum Weights
Sum Observations
Variance
Kurtosis
Corrected SS
Std Error Mean
255
15512.8
220.014552
-0.4133202
55883.6963
0.92887145
Location
Mean
60.83451
Median
60.60000
Mode
49.90000
Test
Student's t
Sign
Signed Rank
24.4
26.5
27.7
30.3
The SAS System
The UNIVARIATE Procedure
88
1
86
206
89.8
92.3
92.3
92.7
109
119
154
87
Below you will find the final letter grades for a course entitled
"Advanced technobabble".
Plot the distribution of these data by hand and using SAS. Using SAS you will need
to use proc FREQ as we did in an earlier example.
F
A
F
F
B+
F
D
D+
C+
A
A
D
A
D+
F
C+
B+
B
C
C+
F
A+
E
D+
F
D
E
B
F
F
D+
C
A+
D+
F
F
D+
E
A
D
F
A
D
B+
C+
C
F
A
D
B
F
D
D
D
D+
F
C
D
F
E
F
F
F
B+
B+
A
F
C+
A
F
C+
B
A
B
A
F
C
D+
D
D+
D+
A
F
D
C+
C
C
F
D
D+
C+
C+
C
C+
F
D
B
D
C
F
B
F
A
F
D
F
C
F
E
F
C+
B
E
F
C+
C+
D
A
F
B+
D+
C
C
F
B+
F
D+
F
C+
A+
A+
F
C+
C
C+
E
C+
C+
C+
E
B+
A
A+
A
C
A
F
B
A
D+
F
D
A+
B+
F
F
B
D
B
B
F
F
D+
C
F
A+
F
F
C
C
E
C+
C+
B
D
D
C
F
F
A
F
C
C+
D+
C
C+
D+
C
F
B+
C
B+
F
D
C
D
F
F
B+
F
B+
D
D
C+
E
F
F
F
D+
C
C
F
B
E
D+
A
F
F
D+
F
B+
A
A
B
D
D
D+
A
B
D
C+
A
D
B
E
C+
D+
E
D
D+
C+
A+
A
A+
B
C+
D+
C
D+
C
RUN;
7
2
3
3
3
4
6
6
3
13
Percent
14.00
4.00
6.00
6.00
6.00
8.00
12.00
12.00
6.00
26.00
Cumulative
Frequency
7
9
12
15
18
22
28
34
37
50
Cumulative
Percent
14.00
18.00
24.00
30.00
36.00
44.00
56.00
68.00
74.00
100.00
NOTE THIS IS ONE WAY (SOMEWHAT CLUMSY) OF GETTING THE CORRECT ORDER OF
GRADES ALONG THE X AXIS FROM PROBLEM SET 1
Note that I dont expect you to know how to do this for this course,
although you should know how to use an IF statement that well consider
later in the course.
Essentially what is done below is to create a new variable I called G,
which hss the value 1 when you have an F, 2 for E, and so one.
After the data I sort the it by G. I then indicate PROC FREQ ORDER=DATA
Which tells proc freq to plot things in the order they appear in the
now sorted data.
DATA TECHNO;
INPUT LETGRADE $ @@;
IF LETGRADE ='F' THEN G=1;
IF LETGRADE
IF LETGRADE
IF LETGRADE
IF LETGRADE
IF LETGRADE
IF LETGRADE
IF LETGRADE
IF LETGRADE
IF LETGRADE
CARDS;
F
F
A
D
F
D
F
D
B+
D+
F
F
D
C
D+
D
C+
F
A
E
A
F
D
F
A
F
D+
B+
F
B+
C+
A
B+
F
B
C+
C
A
C+
F
F
C+
A+
B
E
A
D+
B
F
A
D
F
E
C
B
D+
F
D
F
D+
D+
D+
C
A
A+
F
D+
D
F
C+
F
C
D+
C
E
F
A
D
D
D+
F
C+
A
C+
D
C
B+
C+
C+
F
C
D
F
B
F
D
A+
B+
F
F
B
D
B
B
F
F
D+
C
F
A+
F
F
C
C
E
C+
C+
B
D
D
C
F
F
A
F
C
C+
D+
C
C+
D+
C
F
B+
C
B+
F
D
C
D
F
B+
D
D
C+
E
F
F
F
D+
C
C
F
B
E
D+
A
F
F
D+
F
B+
A
A
B
D
D
D+
A
B
D
C+
A
D
B
E
C+
D+
E
D
D+
C+
A+
A
A+
B
C+
D+
A
D
B
D
C
F
B
A
D+
F
B+
F
C
D+
C
;
PROC SORT;
BY G;
PROC FREQ ORDER=DATA;
TABLES LETGRADE / PLOTS=FREQPLOT;
RUN;
Here are some additional SAS statements and examples that allow other kinds of
plots/graphs to be drawn.
Scatter plots:
Data humans;
input sex $ height weight;
cards;
M 150 150
M 160 170
M 180 175
M 190 200
F 140 100
F 145 110
F 150 115
F 155 120
;
proc sgplot;
scatter x=height y=weight / group=sex;
run;
80.7
34.5
34.5
73.9
49.9
53.2
65.5
78.7
86.2
51.1
79.1
54.2
66.1
75.5
68.9
62
64.7
92.7
59.4
87.6
70.9
80.2
56.5
21.5
49.4
88.2
75.5
31.5
69.3
46.1
71.2
68.6
57.3
59.4
88.8
24.4
75.9
81.5
87.9
68.1
45.8
45.1
56.3
82.7
68.1
51.3
65.9
85.1
52.9
70.9
40
64.4
56.4
42.5
40.9
86.2
48.1
53.3
50.1
52
51.6
58
64.4
50.8
31.5
59.6
61.8
81.3
61.5
71.2
54.2
61.8
49.4
68.8
48
61.7
71.8
85.9
49.9
59.3
35.3
39.7
67.4
70.2
39.3
64.6
66.4
49.6
84.7
75.8
53.5
60.6
58
76.2
53.3
30.3
66.5
88.4
88.1
66.7
62.7
67.5
42.9
67.7
66.3
;
PROC UNIVARIATE;
cdf grade;
RUN;
44.1
56.4
49.9
40.8
72.2
35.9
53
58.7
88.5
54.5
57.1
39.5
86.8
45.2
80.9
50.7
73
68.5
62.7
80
45.5
68.3
50.9
50.8
52.8
55.7
62.1
49.7
42.5
76.5
73.4
82
65.8
61.8
60.8
43.3
63.6
67
69.5
44.5
47.6
62.6
36.3
81.3
60.5
64
54.8
59.7
65.8
56
58.6
33
74.5
61.9
75.6
44.4
58.8
45.2
77.6
73.8
49.8
45.1
67.4
42.6
54.9
62.4
44.7
56.1
64.8
89.8
83
92.3
72.9
66.1
56.2
59.5
53.2
61.1
92.3
79.8
70.5
79.6
77.6
79.6
75.9
47.2
76.7
73.5
55
68.3
73.7
56
60.4
41
83.8
72
56.2
30.4
77.3
57.5
68.1
51
57.6
64.1
65.7
54.5
69.2
52
45.2
86.6
33.4
77.9
39
48.7
50.8
54.8
59.1
41
52.9
73.3
49.9
53.8
51.7
76.7
67.2
72.2
49.5
53.8
70.8
46.5
46
44
Note that you can plot the theoretical normal distribution as a curve on
top of your cumulative frequency distribution to see how closely (or
not) your data follow the theoretical normal distribution given the
particular mean and standard deviation of your data).
To do this use the following statements:
PROC UNIVARIATE;
cdf grade/normal;
RUN;
RUN;