
LSB Chapter 2

Topic - Reading data into SAS

Raw Data
→ Data Step: Read in Data, Process Data (Create new variables), Output Data (Create SAS Dataset)
→ PROCs: Analyze Data Using Statistical Procedures

Raw Data Sources


You type it in the SAS program
Text file (e.g. .csv or .txt file)
Spreadsheet (Excel)
Database (Access, Oracle)
SAS dataset

Data in Text Files

Text files are simple character files that you can create or view in a text editor like Notepad. They may also be exported from spreadsheet programs such as Excel.
Delimited data: variables are separated by a special character (e.g. a comma).
Fixed position data: variables are organized into columns.

Data delimited with spaces:

C 84 138 93 143
D 89 150 91 140
A 78 116 100 162
A . . 86 155
C 81 145 86 140

Note: Missing data is identified with a period.

Data delimited with commas

C,84,138,93,143
D,89,150,91,140
A,78,116,100,162
A,.,.,86,155
C,81,145,86,140

Note: Missing data is identified with a period.

Data delimited by commas (.csv file)

C,84,138,93,143
D,89,150,91,140
A,78,116,100,162
A,,,86,155
C,81,145,86,140

Note: Missing data is identified by consecutive commas (an empty field between two commas).

Column Data

C084138093143
D089150091140
A078116100162
A      086155
C081145086140
Note: Missing data values are blank.

INFILE and INPUT Statements


When you write a SAS program to read in raw data, you'll use two key statements:
The INFILE statement tells SAS where to find the data and how it is organized.
The INPUT statement names the variables to bring in and tells SAS how they are formatted.
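The programs in this chapter use INFILE DATALINES, which reads data typed into the program itself. For data stored in a text file, INFILE names the file instead, usually as a quoted path to your file. The sketch below stays self-contained by first writing a small .csv file to a temporary location (the fileref bpcsv is a name made up for this example) and then reading it back; FIRSTOBS=2 tells SAS to start reading at line 2, skipping the header row.

```sas
* Hypothetical example: bpcsv is a made-up fileref pointing to a temporary file;
FILENAME bpcsv TEMP;

* Write a header row and two data rows so the example runs on its own;
DATA _NULL_;
   FILE bpcsv;
   PUT 'clinic,dbp6,sbp6,dbpbl,sbpbl';
   PUT 'C,84,138,93,143';
   PUT 'A,,,86,155';
RUN;

* Read the file back: DSD handles the commas and empty fields, FIRSTOBS=2 skips the header;
DATA bp;
   INFILE bpcsv DLM = ',' DSD FIRSTOBS = 2;
   INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
RUN;

TITLE 'Reading a .csv file with a header row';
PROC PRINT DATA=bp;
RUN;
```

With a real file, only the INFILE line changes: replace the fileref bpcsv with the quoted path to your file.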

Program 1
* List Directed Input: Reading data values
separated by spaces;
DATA bp;
INFILE DATALINES;
INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
DATALINES;
C 84 138 93 143
D 89 150 91 140
A 78 116 100 162
A . . 86 155
C 81 145 86 140
;
RUN ;
TITLE 'Data Separated by Spaces';
PROC PRINT DATA=bp;
RUN;
Obs    clinic    dbp6    sbp6    dbpbl    sbpbl

 1       C        84      138      93      143
 2       D        89      150      91      140
 3       A        78      116     100      162
 4       A         .        .      86      155
 5       C        81      145      86      140

PARTIAL SAS LOG
1    DATA bp;
2    INFILE DATALINES;
3    INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
4    DATALINES;
NOTE: The data set WORK.BP has 5 observations and 5 variables.
NOTE: DATA statement used:
      real time           0.39 seconds
      cpu time            0.03 seconds
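One limitation of list input: a character variable read with just a $ gets a default length of 8, so longer values are cut off. A LENGTH statement placed before INPUT sets a longer length. The clinic names and the data set name clinics below are invented for illustration; the slides use one-letter clinic codes.

```sas
* Hypothetical clinic names longer than 8 characters;
DATA clinics;
   * Without this LENGTH statement, Northside would be stored as Northsid;
   LENGTH clinic $ 12;
   INFILE DATALINES;
   INPUT clinic $ dbp6;
   DATALINES;
Northside 84
Eastgate 89
;
RUN;

TITLE 'Character values longer than 8 characters';
PROC PRINT DATA=clinics;
RUN;
```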

* List Directed Input: Reading data values separated by commas;
DATA bp;
INFILE DATALINES DLM = ',' ;
INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
DATALINES;
C,84,138,93,143
D,89,150,91,140
A,78,116,100,162
A,.,.,86,155
C,81,145,86,140
;
RUN ;
TITLE 'Data separated by a comma';
PROC PRINT DATA=bp;
RUN;

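* List Directed Input: Reading .csv files;
DATA bp;
INFILE DATALINES DLM = ',' DSD ;
INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
DATALINES;
C,84,138,93,143
D,89,150,91,140
A,78,116,100,162
A,,,86,155
C,81,145,86,140
;
RUN;
TITLE 'Reading in Data using the DSD Option';
PROC PRINT DATA=bp;
RUN;
Note: Consecutive commas (as in the fourth data line) indicate missing data. The comment on the first line must end with a semicolon; otherwise SAS treats the DATA statement as part of the comment.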
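To see what DSD changes, the sketch below reads the fourth record with DLM=',' alone. Without DSD, SAS treats consecutive commas as a single delimiter, so 86 and 155 shift left into dbp6 and sbp6. MISSOVER is added here only so that the short line sets the remaining variables to missing instead of letting INPUT continue onto the next record; the data set name noDSD is made up for this example.

```sas
* Without DSD, consecutive commas collapse into one delimiter;
DATA noDSD;
   INFILE DATALINES DLM = ',' MISSOVER;
   INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
   DATALINES;
A,,,86,155
;
RUN;

TITLE 'Consecutive commas without DSD: values shift left';
PROC PRINT DATA=noDSD;
RUN;
```

The printed observation shows dbp6 = 86 and sbp6 = 155, which is wrong for this record; adding DSD restores the intended layout.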

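* List Directed Input: Reading data values separated by tabs (.txt files);
DATA bp;
INFILE DATALINES DLM = '09'x DSD;
INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
DATALINES;
C	84	138	93	143
D	89	150	91	140
A	78	116	100	162
A			86	155
C	81	145	86	140
;
RUN;
TITLE 'Reading in Data separated by a tab';
PROC PRINT DATA=bp;
RUN;
Note: '09'x is the hexadecimal code for the tab character, so the values on each data line must be separated by actual tab characters, not spaces. With DSD, consecutive tabs mark missing values.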

* Column Input: Data in fixed columns;
DATA bp;
INFILE DATALINES ;
INPUT clinic $ 1-1 dbp6 2-4 sbp6 5-7 dbpbl 8-10 sbpbl 11-13 ;
DATALINES;
C084138093143
D089150091140
A078116100162
A      086155
C081145086140
;
RUN;
Title 'Reading in Data using Column Input';
PROC PRINT DATA=bp;
RUN;
Note: Missing data is blank (columns 2-7 of the fourth data line).
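Because column input locates each value by its column numbers, you can read just the fields you need, in any order. The sketch below reads only the clinic code and the baseline measurements from the same kind of records (the data set name bpbase is made up for this example).

```sas
* Column input: read only selected fields, skipping columns 2-7;
DATA bpbase;
   INFILE DATALINES;
   INPUT sbpbl 11-13 dbpbl 8-10 clinic $ 1;
   DATALINES;
C084138093143
A      086155
;
RUN;

TITLE 'Baseline values only';
PROC PRINT DATA=bpbase;
RUN;
```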

* Reading data using Pointers and Informats;
DATA bp;
INFILE DATALINES ;
INPUT @1 clinic $1. @2 dbp6 3. @5 sbp6 3. @8 dbpbl 3. @11 sbpbl 3. ;
DATALINES;
C084138093143
D089150091140
A078116100162
A      086155
C081145086140
;
RUN;
Title 'Reading in Data using Pointers/Informats';
PROC PRINT DATA=bp;
RUN;
Note: Informats must end with a period (e.g. $1. and 3.).
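The flow chart at the start of the chapter notes that the DATA step can also process data by creating new variables. As a sketch (the names bpchange, dbpchg and sbpchg are invented here), the change from baseline can be computed with assignment statements after INPUT. When either value is missing, the result is missing too, and SAS writes a note to the log about missing values being generated.

```sas
DATA bpchange;
   INFILE DATALINES;
   INPUT @1 clinic $1. @2 dbp6 3. @5 sbp6 3. @8 dbpbl 3. @11 sbpbl 3.;
   * New variables: change from baseline to 6 months;
   dbpchg = dbp6 - dbpbl;
   sbpchg = sbp6 - sbpbl;
   DATALINES;
C084138093143
A      086155
;
RUN;

TITLE 'Change from baseline';
PROC PRINT DATA=bpchange;
RUN;
```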
