
Business Econometrics

using SAS Tools (BEST)


Class III: Data Management 101

Creating a Dataset from Existing Data
*We use the SET command;
PROC IMPORT DATAFILE
='c:\SASData\stocks.xls' OUT = stocks
REPLACE;
run;
data stocks1;
*you can use 'c:\SASData\stocks1' instead, to
create a permanent file;
set stocks;
Run;

Creating Variables
*generating PE ratios for stocks;
PROC IMPORT DATAFILE
='c:\SASData\stocks.xls' OUT = stocks
REPLACE;
run;
data stocks1;
set stocks;
PE = Price1/EPS;
AvgPr = Mean(Price1, Price2);
run;
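
A minimal sketch of the same calculation on inline data, for trying the step without the Excel file. The three rows are hypothetical stand-ins for stocks.xls, and the EPS > 0 guard is an added assumption, not part of the slide code. It illustrates that MEAN skips missing arguments and that a zero or missing EPS leaves PE missing instead of triggering a division error.

```sas
* Hypothetical inline data standing in for stocks.xls;
data stocks;
   input Stock $ Price1 Price2 EPS;
   datalines;
AAA 100 110 5
BBB 50 . 2
CCC 20 22 0
;
run;

data stocks1;
   set stocks;
   * Added guard (assumption): only divide when EPS is positive;
   if EPS > 0 then PE = Price1 / EPS;
   * MEAN ignores missing arguments, so BBB gets AvgPr = 50;
   AvgPr = mean(Price1, Price2);
run;

proc print data = stocks1;
run;
```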

Creating Variables Conditionally: the Notion of If-Then-Else
*generating PE ratios for stocks;
PROC IMPORT DATAFILE ='c:\SASData\stocks.xls'
OUT = stocks REPLACE;
run;
data stocks1;
set stocks;
PE = Price1/EPS;
AvgPr = Mean(Price1, Price2);
IF sector = "REIT" THEN Note = "DONT USE PE";
run;

More IFs
IF-THEN with AND (you can also use OR):
IF sector = "REIT" AND Price1 > Price2
THEN Note = 'REIT Price Up';
One THEN can also trigger several actions, grouped in a DO ... END block:
IF sector = "FIN" AND Price1 < Price2
THEN DO;
Note1 = 'FIN stocks mean revert';
Note2 = 'This Fin Stock is Down';
END;
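
The fragments above can be placed in a complete DATA step, sketched below. The inline rows and the 'Rate sensitive' note are hypothetical, and the LENGTH statement is an addition: SAS fixes the length of a character variable at its first assignment, so declaring the length up front avoids truncating longer values assigned later.

```sas
* Hypothetical inline data with the columns the slides read from stocks.xls;
data stocks;
   input Stock $ sector $ Price1 Price2;
   datalines;
AAA REIT 100 90
BBB FIN 40 45
CCC TECH 70 70
;
run;

data stocks1;
   set stocks;
   * Declare lengths first so longer values are not truncated;
   length Note $ 20 Note1 Note2 $ 30;
   * OR is true when either condition holds;
   if sector = "REIT" or sector = "FIN" then Note = 'Rate sensitive';
   * DO ... END groups several actions under one condition;
   if sector = "FIN" and Price1 < Price2 then do;
      Note1 = 'FIN stocks mean revert';
      Note2 = 'This Fin Stock is Down';
   end;
run;

proc print data = stocks1;
run;
```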

IF THEN ELSE
*generating PE ratios for stocks;
PROC IMPORT DATAFILE ='c:\SASData\stocks.xls'
OUT = stocks REPLACE;
run;
data stocks1;
set stocks;
PE = Price1/EPS;
AvgPr = Mean(Price1, Price2);
IF sector = "REIT" THEN Note = "DONT USE PE";
ELSE Note = "PE OK";
run;

Dates
A SAS date is a numeric value equal to the number of days since January 1, 1960.
The table below lists four dates and their values as SAS dates:

Date         SAS date value
01-Jan-59    -365
01-Jan-60    0
01-Jan-61    366
01-Jan-08    17532
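
The values in the table can be checked with date literals, as in this short sketch. The variable names are arbitrary; the step writes to the log and creates no data set.

```sas
data _null_;
   * A date literal is a quoted date followed by the letter d;
   d1959 = '01JAN1959'd;
   d1960 = '01JAN1960'd;
   d1961 = '01JAN1961'd;
   d2008 = '01JAN2008'd;
   * Unformatted, the values print as day counts: -365, 0, 366, 17532;
   put d1959= d1960= d1961= d2008=;
   * With a format, the same number displays as a calendar date;
   put d2008= date9.;
run;
```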

More Dates!
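Informats: there are a variety, but ANYDTDTE9.
is usually smart enough to understand most date entries.
INPUT BirthDate ANYDTDTE9.;
You can tell SAS how to interpret two-digit years:
OPTIONS YEARCUTOFF = 1995;
YEARCUTOFF sets the first year of the 100-year window
used for two-digit years. With 1995, a year entered as 98
is read as 1998 and 08 as 2008. Dates with four-digit years
are not affected, so this option does not by itself reject
dates before 1995. To enforce a minimum date (for example,
no one below age 18 can enter the site), test the value
with an IF statement.
There are also functions that are useful. This code
calculates age:
AGE = INT (YRDIF (BirthDate, TODAY(), 'ACTUAL') );
YRDIF with the 'ACTUAL' argument returns the difference in
years based on the actual number of days in each year;
INT drops the fractional part.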
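
A minimal sketch that puts the informat, YEARCUTOFF and YRDIF together. The names and birth dates are hypothetical, and the colon modifier on the informat is an addition so that date values of different widths are read up to the next blank. Because TODAY() changes daily, no expected ages are shown.

```sas
options yearcutoff = 1995;

data ages;
   * The colon modifier reads each value up to the next blank;
   input Name $ BirthDate : anydtdte9.;
   * With YEARCUTOFF = 1995 the two-digit year 98 is read as 1998;
   Age = int(yrdif(BirthDate, today(), 'ACTUAL'));
   format BirthDate date9.;
   datalines;
Asha 15MAR1985
Ravi 02JUL98
;
run;

proc print data = ages;
run;
```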

Setting Datasets
Use a case with data from accounts (Source: LSB).
How much do you make on a train ride, given that adults,
seniors and children are charged differently?
* Create permanent SAS data set trains;
DATA 'c:\SASData\trains';
INFILE 'c:\SASData\Train.dat';
INPUT Time TIME5. Cars People;
RUN;

Combining Datasets
Data for two entrances of a train station

DATA southentrance;
INFILE 'c:\SASData\South.dat';
INPUT Entrance $ PassNumber PartySize Age;
RUN;
DATA northentrance;
INFILE 'c:\SASData\North.dat';
INPUT Entrance $ PassNumber PartySize Age Lot;
RUN;

Combining Datasets (contd.)
* Create a data set combining northentrance and southentrance;
* Create a variable, AmountPaid, based on value of variable Age;
* Rules: fare is Rs. 35, free for children under 3 and Rs. 27 for
senior citizens aged 65 and over;

DATA both;
SET southentrance northentrance;
IF Age = . THEN AmountPaid = .;
ELSE IF Age < 3 THEN AmountPaid = 0;
ELSE IF Age < 65 THEN AmountPaid = 35;
ELSE AmountPaid = 27;
PROC PRINT DATA = both;
TITLE 'Both Entrances';
RUN;

Combining when the data is sorted and we want the sorting to remain
*Initially need to sort the smaller datasets by the important variable;
*The South data is sorted, we need to only worry about the
North data;
DATA northentrance;
INFILE 'c:\SASData\North.dat';
INPUT Entrance $ PassNumber PartySize Age Lot;
PROC SORT DATA = northentrance;
BY PassNumber;
RUN;
* Interleave observations by PassNumber;
DATA interleave;
SET northentrance southentrance;
BY PassNumber;
RUN;
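
To see what the BY statement changes, the sketch below interleaves and then simply stacks two small data sets. The inline rows and the data set names south, north and stacked are hypothetical; both inputs are already in PassNumber order.

```sas
* Hypothetical inline data, each already in PassNumber order;
data south;
   input Entrance $ PassNumber;
   datalines;
S 43
S 72
;
run;

data north;
   input Entrance $ PassNumber;
   datalines;
N 21
N 87
;
run;

* SET with BY interleaves: PassNumber order is 21, 43, 72, 87;
data interleave;
   set north south;
   by PassNumber;
run;

* Without BY the sets are stacked: 21, 87, 43, 72;
data stacked;
   set north south;
run;
```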

Merging Datasets
Usually, not all the data you need is in the same file.
Example: multiple data sources for the same economy.
You might get your GDP data from Datastream and the
Primary Market Index data from Bloomberg.
To analyze them together, you need to merge these datasets.

Syntax
DATA new-data-set;
MERGE data-set-1 data-set-n;
BY variable-list;

If you merge two data sets, and they have variables with
the same names (besides the BY variables), variables
from the second data set will overwrite any variables
having the same name in the first data set.
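
The overwrite behaviour, and one way around it, can be sketched as follows. The data sets first and second, their values, and the names Price_first and Price_second are hypothetical. The RENAME= data set option is a standard way to keep both columns.

```sas
data first;
   input Stock $ Price;
   datalines;
AAA 10
BBB 20
;
run;

data second;
   input Stock $ Price Volume;
   datalines;
AAA 11 500
BBB 21 700
;
run;

* Price from second replaces Price from first (11 and 21 survive);
data overwritten;
   merge first second;
   by Stock;
run;

* Renaming on the way in keeps both columns;
data kept;
   merge first (rename = (Price = Price_first))
         second (rename = (Price = Price_second));
   by Stock;
run;
```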

One-to-One

*code to merge data sets in SAS;


PROC IMPORT DATAFILE ='c:\SASData\stocks.xls' OUT = stocks REPLACE;
PROC SORT DATA = stocks;
BY Stock;
RUN;
PROC IMPORT DATAFILE ='c:\SASData\stockdata.xls' OUT = stockdata REPLACE;
PROC SORT DATA = stockdata;
BY Stock;
RUN;
* Merge data sets by Stock;
DATA allstocks;
MERGE stocks stockdata;
BY Stock;
PROC PRINT DATA = allstocks;
TITLE 'Complete Stocks Data';
RUN;

One-to-Many/Many-to-One
Same syntax.
Ex: If we want to add another file
that has only 2 columns, Stock and
Note.
The file has data on only a subset of
the stocks; here we use the same
MERGE statement.
Ex: Another file contains data on
Sectors, but not on individual stocks.
Use MERGE, but BY Sector, not BY Stock.
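
A minimal sketch of the sector case, assuming hypothetical data sets sectors (one row per sector) and stocks (several stocks per sector), with an illustrative Beta column. Both inputs must be sorted by Sector before the merge.

```sas
* Hypothetical sector-level data: one row per sector, already sorted;
data sectors;
   input Sector $ Beta;
   datalines;
FIN 1.2
REIT 0.8
;
run;

* Hypothetical stock-level data: several stocks per sector;
data stocks;
   input Stock $ Sector $;
   datalines;
AAA REIT
BBB FIN
CCC FIN
;
run;

proc sort data = stocks;
   by Sector;
run;

* Each sector row is repeated for every stock in that sector;
data stocks_with_sector;
   merge sectors stocks;
   by Sector;
run;

proc print data = stocks_with_sector;
run;
```

Here both FIN stocks receive Beta = 1.2 and the REIT stock receives 0.8.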
