
HISTORY OF SAS

SAS (pronounced "sass") once stood for "Statistical Analysis System," but now it is simply SAS.

SAS began at North Carolina State University as a project to analyze agricultural research data. The company was founded in 1976 to serve all sorts of customers.

SAS is both software and a company.

It is the world's largest privately held software company.

SAS serves customers in various industries, such as:

Automotive
Communications
Education
Banking/Financial Services
Government
Health Insurance
Health Care Providers
Hospitality & Entertainment
Insurance
Life Sciences
Manufacturing
Media
Oil & Gas
Retail
Hotels
Utilities
It also offers solution lines such as:
Analytics
Business Intelligence
Customer Intelligence
Data Integration & ETL
Financial Intelligence
Foundation Tools
Fraud Management
Governance, Risk & Compliance
High-Performance Computing
Human Capital Intelligence
IT Management
On Demand Solutions
Performance Management
Risk Management
Supply Chain Intelligence
Sustainability Management

In 1966, there was no SAS.

But there was a need for a computerized statistics program to analyze vast amounts of agricultural data collected through United States Department of Agriculture (USDA) grants.

The University Statisticians of the Southern Experiment Stations, a group of eight land-grant universities that received the majority of their research funding from the USDA, came together under a grant from the National Institutes of Health (NIH) to develop a general-purpose statistical software package to analyze all the agricultural data they were generating.

North Carolina State University, located in the capital city of Raleigh, North Carolina, became the leader of the consortium.

North Carolina State University faculty members Jim Goodnight and Jim Barr emerged as the project leaders, with Barr creating the architecture and Goodnight implementing the features.

In 1972, when NIH stopped funding the project, the consortium members agreed to chip in $5,000 apiece each year to allow NCSU to continue developing and maintaining the system and supporting their statistical analysis needs.
Over the following years, SAS software was licensed by pharmaceutical companies, insurance companies and banks, as well as by the academic community that had given birth to the project.
Jane Helwig, another Statistics Department employee at NCSU, joined the project consortium as documentation writer.
John Sall, a graduate student and programmer, rounded out the core team.

Incorporation

In 1976, Goodnight, Barr, Helwig and Sall left NCSU and formed SAS Institute Inc., a private company "devoted to the maintenance and further development of SAS." They opened offices in a building at 2806 Hillsborough Street, across from the university.

By 1980, the growing company had outgrown the Hillsborough Street building and moved to the site of its present headquarters, just outside Raleigh in Cary, North Carolina. At that time, the company had 20 employees.

While SAS was growing, the entire computer hardware and software industry was changing, with new operating systems and platforms placing new demands on software developers. One of the first steps for SAS was to adapt the software to run on IBM's Disk Operating System (DOS).

Today it runs on many operating systems, such as Windows, DOS, z/OS, UNIX and various UNIX flavors.

By 1990, SAS had grown to a workforce of 7,000.

SAS celebrated its 25th anniversary in 2001, having come through various difficulties, including the millennium and the Y2K frenzy. It also introduced the logo and tagline that we still see today. The tagline is:

THE POWER TO KNOW

SAS has been named one of FORTUNE magazine's "100 Best Companies to Work For" every year since 1998, and FORTUNE ranked it the No. 1 company to work for in 2010.

SAS has a Multi Vendor Architecture

Multi Vendor Architecture is the foundation of the cross-platform portability and interoperability of the SAS System. It allows programs to be written once and run anywhere, regardless of hardware or operating system. This architecture provides customers with hardware independence and a flexible implementation.

SAS has a Multi Database Architecture

SAS can connect to many kinds of data sources to read data, which is why SAS is described as a multi-database architecture.

Data sources include databases (such as Oracle, SQL Server, DB2, Sybase, Teradata, Informix and MS Access) and files (such as Excel, CSV and plain-text Notepad files).

[Figure: SAS reading data from Oracle, SQL Server, DB2, Teradata, Sybase, Informix, MS Access, Excel, CSV, DAT and Notepad files]

The purpose of SAS

SAS is a flexible and extensible fourth-generation programming language designed for data access, transformation and reporting. It is used for:

-> Data access and data management
-> User interfaces
-> Application development
-> Business solutions
-> Analytics
-> Reports & graphics
SAS Functionality

The functionality of the SAS System is built around the four data-driven tasks.
1. Data access
2. Data management
3. Data analysis
4. Data presentation.
Data access: addresses the data required by the application, that is, reading raw data from a source into SAS.

Topics covered: the INFILE statement, PROC IMPORT, SQL pass-through, the LIBNAME statement, PROC ACCESS and the DBLOAD procedure.

Data management: shapes data into a form required by the application.

Topics covered: the SET, MERGE, FORMAT, INFORMAT and UPDATE statements, and functions.

Data analysis: analyzes data using various procedures to compute sums, means and other statistics, transforming raw data into meaningful and useful information.
Topics covered: statistical procedures for sums, means, frequencies, univariate statistics, ANOVA, chi-square, CMH, GLM, regression, correlation, standard deviation, etc., and reporting procedures such as PROC PRINT, PROC REPORT, PROC TABULATE and _NULL_ reports.

Data presentation: how the output is presented to the end user.

Topics covered: ODS (Output Delivery System) and, mainly, work delivery concepts.
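A minimal sketch of the data analysis task: PROC MEANS computes the sum and mean of SAL, and PROC FREQ counts observations by SEX. The data set name EMP is an assumption for illustration; the values reuse the employee data shown elsewhere in these notes.

```sas
/* Small example data set (name EMP is hypothetical) */
data emp;
  input id name $ sex $ age sal;
  datalines;
001 ABC M 23 50000
002 DEF F 27 45000
003 MNO F 21 70000
004 JKL F 23 44000
005 XYZ M 25 58000
;
run;

/* Sum and mean of salary over all observations */
proc means data=emp sum mean;
  var sal;
run;

/* One-way frequency table of SEX */
proc freq data=emp;
  tables sex;
run;
```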

Turning Data into Information

RAW DATA -> DATA STEP -> SAS DATASET -> PROC STEP -> REPORT/INFORMATION

Example:-

Data ds1;
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 ABC M 23 50000
002 DEF F 27 45000
003 MNO F 21 70000
004 JKL F 23 44000
005 XYZ M 25 58000
;
Run;
Data ds2 (drop=age);
Set ds1;
Format sal comma6. ;
Run;

Proc sort data=ds2;
By sex;
Run;

ODS pdf file="E:\sas\sas_class\target\employee.pdf";

Proc print data=ds2;


Var id name sex sal;
By sex;
Sumby sex;
Sum sal;
Run;

ODS pdf close;
Rules in SAS program

1) Every statement must end with a semicolon.

2) Every step (both DATA steps and PROC steps) should end with a RUN statement (PROC SQL ends with QUIT).

3) SAS program statements are not case sensitive.

4) A single statement can be written across multiple lines, or multiple statements can be written on a single line.
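A short sketch of rules 3 and 4 (the data set EMP and its values are made up for illustration): the INPUT statement is spread over three lines, and the PROC PRINT step puts three statements on one line in mixed case.

```sas
/* Rule 4: one INPUT statement written across several lines */
data emp;
  input id
        name $
        sal;
  datalines;
1 ABC 50000
2 DEF 45000
;
run;

/* Rules 3 and 4: three statements on one line, keywords in mixed case */
PROC PRINT DATA=emp; var id name sal; Run;
```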

SAS Names & Name rules:

Names must be 32 characters or fewer.
SAS names must start with a letter or an underscore (_), not a digit or any special character.
SAS names can contain only letters, digits and underscores (_), not any special character.
SAS names can contain upper- and lowercase letters and are not case sensitive, but data values are case sensitive.
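A sketch of the naming rules (all names here are hypothetical). The invalid names are kept inside a comment because they would cause errors if submitted.

```sas
data _emp_2010;          /* valid: starts with an underscore */
  Emp_Name   = 'ABC';    /* valid: letters and underscores */
  total_sal2 = 50000;    /* valid: digits allowed after the first character */
run;

/* Names are not case sensitive: EMP_NAME refers to Emp_Name */
proc print data=_EMP_2010;
  var EMP_NAME total_sal2;
run;

/* Invalid names (commented out because they cause errors):
   data 2010emp;    starts with a digit
   data emp#1;      contains a special character
*/
```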

Data types:

Character Data type:
Variable values can contain letters, digits and special characters.

Numeric Data type:
Variable values contain numbers. Dates are also stored as numbers: a SAS date value is the number of days since January 1, 1960.
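A minimal sketch showing that a date is stored as a number (the data set name DATES is an assumption). D1 should print as 0 and D2 as 1, while D3, which holds the same value as D1, prints as 01JAN1960 because a format only changes how the number is displayed.

```sas
data dates;
  d1 = '01JAN1960'd;   /* date constant: stored as 0 */
  d2 = '02JAN1960'd;   /* one day later: stored as 1 */
  d3 = d1;             /* same number as D1 */
  format d3 date9.;    /* display D3 as a calendar date */
run;

proc print data=dates;
run;
```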

Terminology:
Tables are called datasets
Columns are called variables
Rows are called observations

Example (the whole table is a dataset, also called a table):

                VARIABLES (COLUMNS)
                ID  NAME  SEX  AGE  SAL
OBSERVATIONS    1   ABC   M    23   50000
(ROWS)          2   DEF   F    27   45000
                3   MNO   F    21   70000
                4   XYZ   M    25   58000

Size of SAS Dataset:


Before version 9.1, a data set could have at most 32,767 variables (columns); now the number of variables is limited only by available memory.

Missing Data:

The values of particular variables may be missing for some observations. In that case:

Missing character values are represented by blanks, and
missing numeric values are represented by a period (.).

Example:-

ID NAME SEX AGE SAL
1  ABC  M   23  50000
2       F   27  .        (NAME is blank; SAL is a period)
3  MNO  F   .   70000    (AGE is a period)
4  XYZ      25  58000    (SEX is blank)
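A minimal sketch that rebuilds the example table above (data set names EMP and EMP_FLAGS are assumptions). In list input, a single period in the data is read as a missing value for both numeric and character variables, and the MISSING function returns 1 for a missing value and 0 otherwise.

```sas
data emp;
  input id name $ sex $ age sal;
  datalines;
1 ABC M 23 50000
2 . F 27 .
3 MNO F . 70000
4 XYZ . 25 58000
;
run;

data emp_flags;
  set emp;
  name_missing = missing(name);   /* 1 when NAME is blank */
  sal_missing  = missing(sal);    /* 1 when SAL is a period */
run;

proc print data=emp_flags;
run;
```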

We can invoke SAS in the following ways:

-> Interactive windowing mode (SAS windowing environment)
-> Interactive menu-driven mode (SAS Enterprise Guide, SAS/ASSIST, SAS/AF or SAS/EIS software, etc.)
-> Batch mode
-> Noninteractive mode

SAS Windowing Environment (Interactive windowing mode)


[Figure: SAS windowing environment showing the Output, Log, Editor, Explorer and Results windows]

The SAS windowing environment has five basic windows: (1) Editor, (2) Log, (3) Output, (4) Explorer and (5) Results.

Editor Window:-
-> To write the program
-> To modify the program
-> To submit the program for execution

In the Windows operating environment, the default editor is the Enhanced Editor. The Enhanced Editor is syntax sensitive and color codes your programs, making them easier to read and mistakes easier to find. The Enhanced Editor also allows you to collapse and expand the various steps in your program. In other operating environments, the default editor is the Program Editor.

Log Window:-
The Log window contains information about the program submitted from the Editor window.
It shows notes, warnings and errors, as well as how many observations and variables each data set has and in which library it is stored.

Output Window
If your program generates any printable results, they appear in the Output window.

Explorer Window
The Explorer window gives you easy access to your SAS libraries and files.

Results Window
The Results window is like a table of contents for your Output window. The results tree
lists each part of your results in an outline form.

[Figure: the command bar, toolbar and pull-down menus (menu bar)]

Command Bar:-

The command bar is where you can type SAS commands.

Most of the commands that you can type in the command bar are also accessible
through the pull-down menus or the toolbar.

Example:-
X Notepad
X Time
X Date
X SQL etc
Include 'sample.sas'
Tool Bar:-

Gives you quick access to commands that are already accessible through the pull-down
menus.

(New)-
To open a new Editor window

(Open)
To open a program saved in a server or PC location

(Save)
To save the program, or the contents of the Log or Output window, to a server or PC location.

(Print)
To print the program or the contents of the Log or Output window.

(Print Preview)
To preview the information before printing

(Cut)
To cut selected program lines in the Editor window
The same can be done from the keyboard with Ctrl X

(Copy)-
To copy selected program lines
The same can be done from the keyboard with Ctrl C

(Paste) -
To paste program lines
The same can be done from the keyboard with Ctrl V

(Undo)
To undo the last change, such as restoring program lines that were cut
The same can be done from the keyboard with Ctrl Z

(New Library) -
To create a new library for storing data sets:
Click this icon,
specify the new library name,
select Default as the engine,
select Enable at startup,
browse to the location where the data sets should be stored,
and click OK.

(SAS Explorer)-
To open SAS Explorer Window.

(Submit)-
To submit the program for execution.
This can be done in several ways:
-> Click this icon to execute the entire program in the Editor window.
-> Select some program lines and click this icon; only the selected lines are submitted for execution.
-> Select some program lines, right-click the selection and click Submit Selection to execute only those lines, or click Submit All to execute the entire program.
-> From the pull-down menus, click Run, then Submit.
-> Press F3 on the keyboard.

(Clear All)-
To clear the Editor window only.
Other ways to clear windows:

To clear the Editor window
-> Click the icon above
-> Or right-click anywhere in the Editor window and click Clear All
-> Or, on the pull-down menu bar, click Edit, then Clear All
-> Or execute the program below
DM'EDITOR' CLEAR;
To clear the Log window
-> Right-click anywhere in the Log window, click Edit, and click Clear All
-> Or, on the pull-down menu bar, click Edit, then Clear All
-> Or execute the program below
DM'LOG' CLEAR;
To clear the Output window
-> Right-click anywhere in the Output window, click Edit, and click Clear All
-> Or, on the pull-down menu bar, click Edit, then Clear All
-> Or execute the program below
DM'OUTPUT' CLEAR;
(Break)-
To stop a running program.
Click this icon and select Cancel Submitted Statements to stop
the execution,
or select Terminate the SAS System to close the session.

(Help)-
To open the SAS documentation and sample programs that help you learn.

Menu Bar:-

The menu bar at the top of the window contains the following pull-down menus:

File:-
-> To open a new Editor window
-> To open an existing program
-> To save a program
-> For print preview
-> For importing and exporting data

Edit:-
-> For undo, redo, cut, copy, paste, clear all, select all,
collapse all, expand all, and find and replace

View:-
-> To reopen any closed window, such as the Enhanced
Editor, Program Editor, Log, Explorer and Output windows

Tools:-
-> To create a new library, change the font type and font size,
and enable listing output and HTML output
Run:-
-> To submit a SAS program and to recall the last
submitted program
Solutions:-
For analysis and reporting

Window:-
To see which windows are open

Help:-
To get help from the SAS documentation
SHORTCUT KEYS

Help                                      F1
Execute (submit)                          F3
Recall                                    F4
Log                                       F6
Output                                    F7
Zoom off                                  F8
Shortcut keys                             F9
Underline first letters of menu names     F10
Command focus                             F11
Subtop                                    Shift F1
Horizontal zoom                           Shift F3
Vertical zoom                             Shift F4
Zoom one on another                       Shift F5
Left                                      Shift F7
Right                                     Shift F8
Wpopup (bring up word tip)                Shift F10
Hide the current word tip                 Esc
Libname                                   Ctrl B
Copy                                      Ctrl C
Directory                                 Ctrl D
Clear                                     Ctrl E
Find                                      Ctrl F
Go to line number                         Ctrl G
Replace                                   Ctrl H
SAS System options                        Ctrl I
Log                                       Ctrl L
Filename                                  Ctrl Q
RFind                                     Ctrl R
Title                                     Ctrl T
Paste                                     Ctrl V
Cut                                       Ctrl X
Redo                                      Ctrl Y
Undo                                      Ctrl Z
Open Explorer                             Ctrl W
Execute the last recorded macro           Ctrl F1
Move cursor to next case change           Alt Right
Move cursor to previous case change       Alt Left
Comment selected lines                    Ctrl /
Uncomment selected lines                  Ctrl Shift /
Convert selected text to lowercase        Ctrl Shift L
Convert selected text to uppercase        Ctrl Shift U

Note: Press F9 to open the KEYS window, which lists all the shortcut key definitions.

SAS Program

A SAS program is a sequence of statements executed in order.

A SAS program has two kinds of steps:
Data step
Proc step
DATA steps are typically used to retrieve data and create SAS data sets.
PROC steps are typically used to process SAS data sets
(that is, generate reports and graphs, edit data, and sort data).

RAW DATA -> DATA STEP -> SAS DATASET -> PROC STEP -> REPORT/INFORMATION

Example:-
/* DATA step */
Data ds1;
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 ABC M 23 50000
002 DEF F 27 45000
003 MNO F 21 70000
004 XYZ M 25 58000
;
Run;
/* PROC step */
Proc print data=ds1;
Run;

Ways to read data into SAS

Instream data: data entered in the SAS program itself, following the DATALINES statement.

Example:-
Data ds1;
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 ABC M 23 50000
002 DEF F 27 45000
003 MNO F 21 70000
004 XYZ M 25 58000
;
Run;

External files (flat files) can be read into SAS:

Examples:-
Data ds;
Infile "C:\Documents and Settings\Administrator\Desktop\SAMPLE.txt";
Input id name$ sex$ age sal;
Run;

Proc import datafile="C:\Documents and Settings\Administrator\Desktop\SAMPLE.csv"
Out=work.ds dbms=csv replace;
Run;

Proc import datafile="C:\Documents and Settings\Administrator\Desktop\SAMPLE.xls"
Out=work.ds dbms=excel replace;
Run;
Existing SAS data sets can be read with the SET and MERGE statements.

Example:-
Data ds2;
Set ds;
Run;

Data can be read from databases such as Oracle, DB2 and Sybase (this requires the matching SAS/ACCESS interface).
Example:-
Proc sql;
Connect to oracle (user=Scott password=tiger);
Create table ds3 as
Select * from connection to oracle
(
Select * from EMP
);
Disconnect from oracle;
Quit;

How the SAS System works with data

When a SAS session starts, in any mode, SAS creates the WORK library: a temporary directory (the default library) where data sets are stored for the session.
All data sets created in the session with a one-level name are stored in WORK and can be referred to with the WORK. prefix.
When the session is closed, all the data sets in the WORK library are deleted.
To keep data sets permanently, create your own library and store the data sets there.

How to create Library:

(Programming method)

LIBNAME Statement

Associates a libref with a SAS library and lists file attributes for a SAS library.
Syntax: - LIBNAME libref 'SAS-library';
LIBNAME MY_SAS 'E:\SAS_CLAS'; (a libref can be at most 8 characters long)

(Or)

(GUI method)
On the toolbar, click the New Library icon,
specify the library name (for example, MY_SAS), select Default as the engine, select Enable at startup,
browse to the location where the data sets will be stored permanently,
and click OK to create the library.

Any data sets that already exist in that location appear in the library.
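A minimal sketch of the programming method, assuming the folder E:\SAS_CLAS already exists (LIBNAME does not create it). A two-level name such as MY_SAS.EMP stores the data set permanently; a one-level name such as EMP_COPY stores it in the temporary WORK library. The data set names are hypothetical.

```sas
/* Assumes the folder E:\SAS_CLAS already exists */
libname my_sas 'E:\SAS_CLAS';

/* Two-level name: stored permanently in the MY_SAS library */
data my_sas.emp;
  input id name $ sal;
  datalines;
1 ABC 50000
2 DEF 45000
;
run;

/* One-level name: stored in WORK, deleted when the session ends */
data emp_copy;
  set my_sas.emp;
run;

/* WORK.EMP_COPY and EMP_COPY refer to the same data set */
proc print data=work.emp_copy;
run;
```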

DATA STEP PROCESS

When a DATA step is submitted for execution, SAS first checks it for syntax errors. If no errors are found, the DATA step is compiled and then executed. When executing a DATA step that reads in-stream data, SAS creates the following three items.
INPUT BUFFER:-
Each raw record of data is read into this area of memory when an INPUT statement is executed.
PROGRAM DATA VECTOR:-
SAS builds the data set one observation at a time in this area of memory as the program is executed: values are read from the input buffer or created by programming statements and assigned to the corresponding variables in the PDV. At the end of each iteration, the values in the PDV are written to the SAS data set as a single observation.

In the PDV, along with all the variables, there are two automatic variables: _N_ and _ERROR_.
_N_: counts how many times the DATA step has iterated.
It starts at 1 and increases by 1 with each iteration, so in a simple step that reads one record per iteration it shows how many observations have been read.
_ERROR_: its default value is 0; when an error is encountered it is set to 1.
Even if there are many errors, _ERROR_ is still only 1; it is a flag, not a count.
_ERROR_ = 1 signals a data error (for example, invalid data for a numeric variable), not a syntax error. Syntax errors do not set _ERROR_; they are shown in the log in red, and the code where the error occurs is underlined.
DESCRIPTOR INFORMATION:-
For each SAS data set, SAS creates and maintains descriptor information about the data set and its variable attributes, such as length, label, format, informat and data type. To see this information, use the CONTENTS procedure:
Proc contents data=Dataset_Name;
Run;
Example:-
Data ds;
Infile datalines;
Input id name age sex$ sal;
Datalines;
001 abc 23 m 5000
002 def 25 f 5600
003 mno 28 f 8000
004 xyz 21 m 6000
;
Run;

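(Run the program above and check the log. Because NAME is read without a $, its character values are invalid numeric data, so the log shows the _N_ and _ERROR_ values for each invalid record.)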
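A minimal sketch (data set name DS_PDV and its values are made up) that uses a PUT statement to write _N_, _ERROR_ and the other PDV values to the log on every iteration. The second record has a non-numeric value for AGE, so on that iteration AGE is missing and _ERROR_ is 1.

```sas
data ds_pdv;
  input id name $ age;
  /* Write the automatic variables and the current PDV values to the log */
  put _n_= _error_= id= name= age=;
  datalines;
1 abc 23
2 def x
3 mno 28
;
run;
```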

DATA STEP

A DATA step always starts with the keyword DATA.

A DATA step consists of a group of SAS statements that can read raw data or existing data and create SAS data sets.
Purpose of the DATA step:
Checking for errors, and validating and correcting data
Creating new variables and computing their values
Creating new data sets from existing SAS data sets
Manipulating and reshaping data
Generating and printing reports
Retrieving information
The program below shows the general group of statements in a DATA step:

DATA DATASET;
INFILE DATALINES;
INPUT ID NAME$ AGE SEX$ SALARY;
DATALINES;
001 ABC 23 M 23000
002 DEF 25 F 25000
003 XYZ 22 M 21000
;
RUN;

DATA STATEMENT
Begins a DATA step and provides names for output SAS data sets
Data set options:
KEEP:-
Specifies variables for processing or for writing to output SAS data sets
Syntax: - KEEP=variable(s)
Examples: -
Data ds1 (keep=name sal);
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;
Data ds;
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;
Data ds1a (keep=name sal);
Set ds;
Run;
Data ds1b (keep=name sal);
Infile "C:\Documents and Settings\Administrator\Desktop\sample.txt";
Input id name$ sex$ age sal;
Run;
DROP: -
Excludes variables from output SAS data sets
Syntax: - DROP=variable(s)
Examples: -
Data ds2 (Drop=name sal);
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;
Data ds;
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;
Data ds2a (drop=name sal);
Set ds;
Run;
Data ds2b (drop=name sal);
Infile "C:\Documents and Settings\Administrator\Desktop\sample.txt";
Input id name$ sex$ age sal;
Run;

RENAME: -
Specifies new names for variables in output SAS data sets
Syntax: -
RENAME=(old-name-1=new-name-1 <... old-name-N=new-name-N>)
Examples: -

Data ds3 (rename= (sex=gender sal=income));


Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;
Data ds;
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;
Data ds3a (rename= (sex=gender sal=income));
Set ds;
Run;
Data ds3b (rename= (sex=gender sal=income));
Infile "C:\Documents and Settings\Administrator\Desktop\sample.txt";
Input id name$ sex$ age sal;
Run;

RENAME= changes the variable names in the output data set; the input data set keeps its original names.

WHERE: -

Selects observations from SAS data sets that meet a particular condition

Syntax: - WHERE=(where-expression)

Examples: -

Data ds4 (where= (sex='m'));


Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;
Data ds;
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;
Data ds4a (where= (age>=23));
Set ds;
Run;
Data ds4b (where= (age=23 and sex='m'));
Set ds;
Run;
Data ds4c (where= (id in (001,004)));
Infile "C:\Documents and Settings\Administrator\Desktop\sample.txt";
Input id name$ sex$ age sal;
Run;

REPLACE: -

Controls replacement of like-named temporary or permanent SAS data sets

When you create a data set with a name that already exists in the SAS library, by default the existing data set is replaced when the second DATA step executes. If you don't want it to be replaced, use REPLACE=NO.

The default is REPLACE=YES.

Syntax: - REPLACE=NO | YES

Examples: -

Data ds5 (where= (sex='m'));


Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;
Data ds5 (replace=no);
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;

PW (password): -
Assigns a password to a data set.
Syntax: pw=password
Examples: -
Data ds6 (pw=sasadmin);
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;
Data ds6b (pw=sasadmin);
Infile "C:\Documents and Settings\Administrator\Desktop\sample.txt";
Input id name$ sex$ age sal;
Run;

A password can be at most 8 characters long. It can contain letters and digits but no special characters, and it cannot start with a digit.

Label: -
Assigns a descriptive label to a data set.
Syntax: LABEL='label'
Examples: -
Data ds7 (Label='sample');
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;

Creating Multiple Data Sets in a Single DATA Step

We can create multiple SAS data sets in one DATA step and use data set options on each of them, as shown below.
Syntax: DATA dataset-1 dataset-2 ... dataset-N;
Examples: -
Data ds16 DS17;
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;
Data ds18(KEEP=ID NAME SAL) ds19(WHERE=(SEX='f'));
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;
Data ds20 ds21;
Infile datalines;
Input id name$ sex$ age sal;
If sex='f' then output ds20;
else output ds21;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;

_Null_
The data set name _NULL_ is reserved for a special purpose: no data set is created, but the DATA step still executes.

Example:-

DATA _NULL_;
INFILE DATALINES;
INPUT ID NAME$ SEX$ AGE SAL;
DATALINES;
001 ABC M 20 2500
002 DEF M 22 3000
003 XYZ F 21 5000
;
RUN;

Use Put statement to see the variable information in log

Put Statement:-
It writes information to the SAS log.
Syntax: PUT variable(s);
DATA _NULL_;
INFILE DATALINES;
INPUT ID NAME$ SEX$ AGE SAL;
PUT ID NAME SEX AGE SAL;
DATALINES;
001 ABC M 20 2500
002 DEF M 22 3000
003 XYZ F 21 5000
;
RUN;
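Two other PUT forms can make the log easier to read (the data values are made up): named output writes variable=value pairs, so the first PUT below writes a line like id=1 name=ABC sal=2500, and PUT _ALL_ writes every variable in the PDV, including _N_ and _ERROR_.

```sas
data _null_;
  input id name $ sex $ age sal;
  put id= name= sal=;   /* named output: variable=value pairs */
  put _all_;            /* every PDV variable, including _N_ and _ERROR_ */
  datalines;
001 ABC M 20 2500
002 DEF M 22 3000
;
run;
```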

Using _NULL_, we can write reports to an external file, such as an RTF file.

See the example below of _NULL_ reporting.

Data Health;
Infile Datalines;
Input idno 1-4 name $ 6-24 team $ strtwght endwght;
Loss=strtwght-endwght;
Datalines;
1023 David Shaw          red    189 165
1049 Amelia Serrano      yellow 145 124
1219 Alan Nance          red    210 192
1246 Ravi Sinha          yellow 194 177
1078 Ashley McKnight     red    127 118
1221 Jim Brown           yellow 220 .
1095 Susan Stewart       blue   135 127
1157 Rose Collins        green  155 141
1331 Jason Schock        blue   187 172
1067 Kanoko Nagasaka     green  135 122
1251 Richard Rose        blue   181 166
1192 Charlene Armstrong  yellow 152 139
1352 Bette Long          green  156 137
1262 Yao Chen            blue   196 180
1124 Adrienne Fink       green  156 142
1197 Lynne Overby        red    138 125
1133 John VanMeter       blue   180 167
1057 Margie Vanhoy       yellow 146 132
1328 Hisashi Ito         red    155 142
1243 Deanna Hicks        blue   134 122
1177 Holly Choate        red    141 130
1259 Raoul Sanchez       green  189 172
1017 Jennifer Brooks     blue   138 127
1099 Asha Garg           yellow 148 132
;
Run;
Data _null_;
d=today();
t=time();
File 'E:\SAS\TARGET_DATA\DATASTEP_REPORT\Weight_Club4.rtf'
Linesize=200;
Put ' ';
Put ' ';
Put @2 'REQUEST # I11796-S'
@35 'NEERU TECHNOLOGIES'
@70 "PAGE 1 ";
Put @2 'RUN DATE:' d ddmmyys10.
@31 'INFORMATION CENTER REQUEST'
@70 'RUNTIME:' t time8.;
Put ' ';
Put ' ';
Run;
Data _null_;
File 'E:\SAS\TARGET_DATA\DATASTEP_REPORT\Weight_Club4.rtf'
Linesize=200 mod;
Put @2 'EMP_ID' @10 'EMP_NAME' @30 'TEAM' @42 'STRTWGHT' @55 'ENDWGHT';
Put ' ';
Run;

Data _null_;
Set Health;
File 'E:\SAS\TARGET_DATA\DATASTEP_REPORT\Weight_Club4.rtf'
Linesize=200 mod;
Put @2 idno @10 name @30 team
@42 strtwght @55 endwght;
Run;
Data _null_;
File 'E:\SAS\TARGET_DATA\DATASTEP_REPORT\Weight_Club4.rtf'
Linesize=200 mod;
Put;
Put;
Put;
Put @2 '*************END OF REPORT *************************';
Put @2 '**********GENERATED BY Mr.KRISHNA *******************';
Run;

In the example above, all the information is written to the RTF file, so creating data sets in a library would only waste space.
Using _NULL_ saves space on the workspace server.

You will learn more about this in future classes.


INFILE STATEMENT

The INFILE statement identifies an external file, or in-stream raw data, to read with an INPUT statement.
It can also refer to a file through a fileref assigned with the FILENAME statement.
Syntax:-
INFILE file-specification<options><operating-environment-options>;
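A minimal sketch of the FILENAME approach: the fileref RAWDATA is a hypothetical name, and the path is the sample file used in the examples below. After the FILENAME statement, INFILE can use the fileref instead of the quoted path.

```sas
/* Assign a fileref (RAWDATA is a hypothetical name) to the external file */
filename rawdata 'C:\Documents and Settings\Administrator\Desktop\sample.txt';

data ds_ref;
  infile rawdata;                 /* fileref used instead of the quoted path */
  input id name $ sex $ age sal;
run;
```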
Infile Statement Options: -

DSD (delimiter sensitive data):-

-> It sets the default delimiter to a comma.
-> It treats two consecutive delimiters as a missing value.
-> It removes quotation marks from character values, so a delimiter inside a quoted value is read as part of the value.
Examples:-
Data ds1;
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;
Data ds1a;
Infile "C:\Documents and Settings\Administrator\Desktop\sample.txt";
Input id name$ sex$ age sal;
Run;
In the examples above, the default delimiter (a space) is used, so no INFILE options are needed.
The examples below use a comma delimiter.
Data ds2;
Infile datalines dsd;
Input id name$ sex$ age sal;
Datalines;
001,abc,m,23,45000
002,def,f,34,67000
003,mno,m,21,36000
004,xyz,f,27,45000
;
Run;
Data ds2b;
Infile "C:\Documents and Settings\Administrator\Desktop\sample.txt" dsd;
Input id name$ sex$ age sal;
Run;
Data ds2c;
Infile datalines dsd;
Input id name$ sex$ age sal;
Datalines;
001,abc,m,23,
002,def,f,34,67000
003,mno,m,21,36000
004,xyz,f,27,45000
;
Run;
Data ds2d;
Infile datalines dsd;
Input id name$ sex$ age sal;
Datalines;
001,"abc",m,23,
002,"def",f,34,67000
003,"mno",m,21,36000
004,"xyz",f,27,45000
;
Run;
Data ds2e;
Infile datalines dsd;
Input Name: $9. Score Team: $25. Div $;
Datalines;
Joseph,76,"Red Racers, Washington",AAA
Mitchel,82,"Blue Bunnies, Richmond",AAA
Sue Ellen,74,"Green Gazelles, Atlanta",AA
;
Run;

DLM= (or DELIMITER=):-
When data values are separated by special characters in raw data or an external file, use the DLM= option to read the data.
The delimiter characters must be enclosed in quotation marks.
Examples:-
Data ds3;
Infile datalines dlm=',';
Input id name$ sex$ age sal;
Datalines;
001,abc,m,23,45000
002,def,f,34,67000
003,mno,m,21,36000
004,xyz,f,27,45000
;
Run;
Data ds3a;
Infile datalines dlm='*';
Input id name$ sex$ age sal;
Datalines;
001*abc*m*23*45000
002*def*f*34*67000
003*mno*m*21*36000
004*xyz*f*27*45000
;
Run;
Data ds3b;
Infile datalines dlm='*';
Input id name$ sex$ age sal;
Datalines;
001*abc*m*23*45000
002*def*f*34*67000
003*mno*m*21*36000
004*xyz*f*27*45000
;
Run;
Data ds3c;
Infile datalines dlm='* ,' ;
Input id name$ sex$ age sal;
Datalines;
001*abc,m*23*45000
002*def,f*34*67000
003*mno*m*21,36000
004*xyz*f*27,45000
;
Run;
Data ds3c1;
Infile datalines DELIMITER='* ,' ;
Input id name$ sex$ age sal;
Datalines;
001*abc,m*23*45000
002*def,f*34*67000
003*mno*m*21,36000
004*xyz*f*27,45000
;
Run;
Data ds3d;
Infile "C:\Documents and Settings\Administrator\Desktop\sample.txt" dlm='* ,';
Input id name$ sex$ age sal;
Run;

DLMSTR:-
When data values are separated by a string (more than one character) in raw data or an external file, use the DLMSTR= option to read the data.
The string must be enclosed in quotation marks and is case sensitive.
Example:-
Data ds3e;
Infile datalines dlm='a' ;
Input X Y Z;
Datalines;
1a2a3
4a5a6
7a8a9
;
Run;
Data ds3e1;
Infile datalines dlm='a' ;
Input X Y$ Z;
Datalines;
1ama3
4afa6
7ama9
;
Run;
Data ds3e2;
Infile datalines dlmstr='PRD' ;
Input X Y Z;
Datalines;
1PRD2PRD3
4PRD5PRD6
7PRD8PRD9
;
Run;
DLMSOPT=options:-
By default, the DLMSTR= string must appear in the same case everywhere. If it does not, use DLMSOPT='I' to read the data properly.
DLMSOPT='I'
specifies that case-insensitive comparisons will be done.
DLMSOPT='T'
specifies that trailing blanks of the string delimiter will be removed.
Example:-
Data ds3e4;
Infile datalines dsd dlmstr='PRD' dlmsopt='i';
Input X Y Z;
Datalines;
1PRD2PRd3
4PrD5Prd6
7pRd8pRD9
;
Run;

FIRSTOBS:-
Specifies the first record (line) at which reading starts.
Examples:-
Data ds4;
Infile datalines dlm='*' firstobs=2;
Input id name$ age sex$ sal;
Datalines;
001*Joseph*25*m*4500
002*Mitchel*24*m*3500
003*john*21*f*2500
004*miller*22*f*3000
005*brans*30*m*5000
;
Run;
Data ds4a;
Infile "C:\Documents and Settings\Administrator\Desktop\sample.txt" dlm='*' firstobs=2;
Input id name$ sex$ age sal;
Run;

OBS:-
Specifies the last record (line) to read.
Examples:-
Data ds5;
Infile datalines dlm='*' obs=3;
Input id name$ age sex$ sal;
Datalines;
001*Joseph*25*m*4500
002*Mitchel*24*m*3500
003*john*21*f*2500
004*miller*22*f*3000
005*brans*30*m*5000
;
Run;
Data ds5a;
Infile "C:\Documents and Settings\Administrator\Desktop\sample.txt" dlm='*' obs=3;
Input id name$ sex$ age sal;
Run;
Data ds5b;
Infile datalines dlm='*' firstobs=2 obs=4;
Input id name$ age sex$ sal;
Datalines;
001*Joseph*25*m*4500
002*Mitchel*24*m*3500
003*john*21*f*2500
004*miller*22*f*3000
005*brans*30*m*5000
;
Run;
Data ds5c;
Infile "C:\Documents and Settings\Administrator\Desktop\sample.txt" dlm='*' firstobs=2 obs=4;
Input id name$ sex$ age sal;
Run;

FLOWOVER
This is the default. It causes the INPUT statement to move to the next record if it
doesn't find values for all variables in the current record.
Examples:-
Data ds6;
Infile datalines flowover;
Input Id Type$ Amount;
Datalines;
101 x 3400
102 x 2000
103 y 3400
104 y 2500
105 x 3000
;
Run;

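Because FLOWOVER is the default, the result is the same whether or not you specify it. In the next example, some records are short, so INPUT reads the missing AMOUNT value from the start of the next record and skips the rest of that record.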


Data ds6a;
Infile datalines flowover;
Input Id Type$ Amount;
Datalines;
101 x
102 x 2000
103 y 3400
104 y
105 x 3000
;
Run;
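MISSOVER:-

When a record in the raw data or external file ends before values have been found for
all variables, use MISSOVER in the INFILE statement. MISSOVER prevents the INPUT
statement from going to the next record; the variables without values are set to missing.
Missing values are represented as follows:

Numeric values are denoted by a period (.)

Character values are denoted by a blank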

Examples:-

When data values are separated by a delimiter, we can specify the DLM= option and SAS
will read the missing values as well; but when data values are separated by blanks, we should use MISSOVER.

Data ds7;
Infile datalines Missover;
Input Id Type$ Amount;
Datalines;
101 x
102 x 2000
103 y 3400
104 y
105 x 3000
;
Run;

Data ds7a;
Infile datalines Missover;
Input id name$ sex$ age sal;
Datalines;
001 ABC M 23 50000
002 DEF F 27
003 MNO F 21 70000
004 XYZ M 25 58000
;
Run;
Data ds7b;
Infile datalines Missover ;
Input Lname$ Fname$ Emp_id$ Job_code$;
Datalines;
LANGKAMM SARAH E0045 Mechanic
TORRES JAN E0029 Pilot
SMITH MICHAEL E0065
LEISTNER COLIN E0116 Mechanic
TOMAS HARALD
WAUGH TIM E0204 Pilot
;
Run;
In the example below there is no need to specify MISSOVER, because the data values are
separated by a special character, so SAS reads the missing values without MISSOVER.
Data ds7c;
Infile datalines dlm='*' ;
Input id name$ sex$ age sal;
Datalines;
001*abc*m*23*45000
002* * *34*67000
003*mno*m*21*
004*xyz* *27*45000
;
Run;
STOPOVER
Stops the DATA step when it reads a short line: if the INPUT statement does not find
values for all variables in the current record, SAS stops executing the DATA step
immediately and writes messages to the SAS log.
Example:-
Data ds8;
Infile datalines stopover;
Input Id Type$ Amount;
Datalines;
101 x
102 x 2000
103 y 3400
104 y
105 x 3000
;
Run;

ERROR: INPUT statement exceeded record length.


INFILE CARDS OPTION STOPOVER specified.
RULE: ----+----1----+----2----+----3----+----4----+----5---
168 101 x
Id=101 Type=x Amount=. _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.DS6 may be incomplete. When this
step was stopped there were 0 observations and 3 variables.
WARNING: Data set WORK.DS6 was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
But it won't give an error if the data has no missing values, because then there is
no short line.
Data ds8a;
Infile datalines stopover;
Input Id Type$ Amount;
Datalines;
101 x 3400
102 x 2000
103 y 3400
104 y 2500
105 x 3000
;
Run;
FILENAME
Associates a fileref with an external file.
The INFILE statement can also refer to an external file through a fileref defined with the FILENAME statement.

Syntax:-FILENAME fileref 'external-file';

Filename krish "C:\Documents and Settings\Administrator\Desktop\sample.txt";


Data ds1;
Infile krish;
Input id name$ sex$ age sal;
Run;

INPUT STATEMENT

The INPUT statement describes the order in which data values are entered, the names of
the SAS variables and their type.

We should use the INPUT statement only for data values stored in external files or for
data immediately following a CARDS or DATALINES statement.

Syntax: INPUT <variables> <specifications>;

Data ds1;
Infile datalines;
Input idno name $ team $ strtwght endwght;
Cards;
1023 David red 189 165
1049 Amelia yellow 145 124
1219 Alan red 210 192
1246 Ravi yellow 194 177
1078 Ashley red 127 118
1221 Jim yellow 220 .
;
Run;

Types of Input Statement


1) Column Input
2) List Input
a) Simple list Input
b) Modifier List Input
3) Formatted Input
4) Named Input
5) Null Input
6) Mixed Input
When character values are longer than 8 characters (the default length with simple list input), we should specify one of these input methods so that the full value is read.
Column Input:

The column numbers follow the variable name in the INPUT statement; these numbers
indicate where the variable values are found in the input data records.
When to Use Column Input
With column input, the column numbers that contain the value follow a variable name in
the INPUT statement. To read with column input, data values must be in
the same columns in all the input data records
standard numeric form or character form
Useful features of column input are that
Character values can contain embedded blanks.
Character values can be from 1 to 32,767 characters long.
Input values can be read in any order, regardless of their position in the record.
EX: input name $ 1-10 Sal 11-15;

Data ds2;
Infile datalines;
Input idno 1-4 name $ 6-23 team $ 25-30 strtwght 32-34 endwght 36-38;
Datalines;
1023 David Shaw         red    189 165
1049 Amelia Serrano     yellow 145 124
1219 Alan Nance         red    210 192
1246 Ravi Sinha         yellow 194 177
1078 Ashley McKnight    red    127 118
1221 Jim Brown          yellow 220
;
Run;

List Input:

There are two types of list input.

1) Simple list input


2) Modified List input

Simple List Input:


The variable names are simply listed in the INPUT statement.
A $ follows the name of each character variable.
The input statement can read data values that are separated by blanks or aligned in
columns.

EX: input name $ team $ strtwght endwght;

Data DS3;
Infile datalines;
Input idno 4. Name $19. Team $7. Strtwght 4. Endwght 4. ;
Cards;
1023 David Shaw         red    189 165
1049 Amelia Serrano     yellow 145 124
1219 Alan Nance         red    210 192
1078 Ashley McKnight    red    127 118
1221 Jim Brown          yellow 220 232
;
Run;
Modified List Input

Modified list input makes the INPUT statement more versatile, because you can use a
format modifier to overcome several restrictions of simple list input. The format
modifiers are as follows:

Format Modifier - Purpose

$ Indicates to store a variable value as a character value.


& reads character values that contain embedded blanks.
: reads data values that need the additional instructions that informats can provide but
that are not aligned in columns.
~ reads delimiters within quoted character values as characters and retains the
quotation marks.
+n Moves the pointer n columns.
#n Moves the pointer to record n.
/ Advances the pointer to column 1 of the next input record.
(Strictly speaking, +n, #n and / are pointer controls rather than format modifiers.)
$
Indicates to store a variable value as a character value.
Data DS14a;
Infile Datalines;
Input name $ subject1 subject2 subject3 team $;
Datalines;
Joe 11 32 76 red
Mitchel 13 29 82 blue
Susan 14 27 74 green
;
Run;
&
Indicates that a character value can have one or more single embedded blanks.

This format modifier reads the value from the next non-blank column until the pointer
reaches two consecutive blanks, the defined length of the variable, or the end of the
input line, whichever comes first.
Data ds14b;
Infile datalines;
Input idno name &$ team $ strtwght endwght;
Datalines;
1023 David Shaw         red    189 165
1049 Amelia Serrano     yellow 145 124
1219 Alan Nance         red    210 192
1246 Ravi Sinha         yellow 194 177
1078 Ashley McKnight    red    127 118
1221 Jim Brown          yellow 220 .
;
Run;
Data ds14c;
Infile datalines;
Input idno name &$18. team $ strtwght endwght;
Cards;
1023 David Shaw         red    189 165
1049 Amelia Serrano     yellow 145 124
1219 Alan Nance         red    210 192
1246 Ravi Sinha         yellow 194 177
1078 Ashley McKnight    red    127 118
1221 Jim Brown          yellow 220 .
;
Run;

Restriction: The & modifier must follow the variable name and $ sign that it affects.

:
Enables you to specify an informat that the INPUT statement uses to read the
variable value.
For a character variable, this format modifier reads the value from the next non-blank
column until the pointer reaches the next blank column, the defined length of the
variable, or the end of the data line, whichever comes first. For a numeric variable, this
format modifier reads the value from the next non-blank column until the pointer
reaches the next blank column or the end of the data line, whichever comes first.

/*wrong*/
Data ds14d;
Infile datalines;
Input item $10. Amount;
Datalines;
Trucks 1382
Vans 1235
Sedans 2391
;
Run;
/*right*/
Data ds14d;
Infile datalines;
Input item: $10. Amount;
Datalines;
Trucks 1382
Vans 1235
Sedans 2391
;
Run;
~
Indicates to treat single quotation marks, double quotation marks, and
delimiters in character values in a special way. This format modifier reads
delimiters within quoted character values as characters instead of as delimiters and
retains the quotation marks when the value is written to a variable.
Restriction: You must use the DSD option in an INFILE statement. Otherwise, the INPUT
statement ignores this option.

Data ds14e;
Infile datalines dsd;
Input id name ~ $ sex$ age sal;
Datalines;
001,"abc",m,23,45000
002,"def",f,34,67000
003,"mno",m,21,36000
004,"xyz",f,27,45000
;
Run;
Data ds14f;
Infile datalines dsd;
Input Name: $9. Score1-Score3 Team ~ $25. Div $;
Datalines;
Joseph,11,32,76,"Red Racers, Washington",AAA
Mitchel,13,29,82,"Blue Bunnies, Richmond",AAA
Sue Ellen,14,27,74,"Green Gazelles, Atlanta",AA
;
Run;
+n Moves the pointer n columns.
Data dsl;
Infile datalines;
Input team $6. +6 points 2.;
Cards;
red         59
blue        95
yellow      63
green       76
Run;
Multiple Input Statements
We can write multiple INPUT statements, or use the # pointer control, to read the data
when the values for one observation are spread over several lines.
Data ds14g;
Infile datalines;
Input Idno 1-4 name $ 6-20;
Input team $1-6;
Input strtwght 1-3 endwght 5-7;
Cards;
1023 David Shaw
red
189 165
1049 Amelia Serrano
yellow
145 124
1219 Alan Nance
red
210 192
1078 Ashley McKnight
red
127 118
1221 Jim Brown
yellow
220 .
;
Run;

Data ds14h;
Input Idno 1-4;
Input;
Input strtwght 1-3 endwght 5-7;
Cards;
1023 David Shaw
red
189 165
1049 Amelia Serrano
yellow
145 124
1219 Alan Nance
red
210 192
1246 Ravi Sinha
yellow
194 177
1078 Ashley McKnight
red
127 118
1221 Jim Brown
yellow
220 .
;
Run;
#n Moves the pointer to record n.
Data ds14i;
Input #1 name $ 6-23 idno 1-4
#2 team $ 1-6
#3 strtwght 1-3 endwght 5-7;
Cards;
1023 David Shaw
red
189 165
1049 Amelia Serrano
yellow
145 124
1219 Alan Nance
red
210 192
1246 Ravi Sinha
yellow
194 177
1078 Ashley McKnight
red
127 118
1221 Jim Brown
yellow
220 .
;
Run;
Data ds14j;
Infile datalines;
Input #2 team $ 1-6
#1 name $ 6-23 idno 1-4
#3 strtwght 1-3 endwght 5-7;
Cards;
1023 David Shaw
red
189 165
1049 Amelia Serrano
yellow
145 124
1219 Alan Nance
red
210 192
1246 Ravi Sinha
yellow
194 177
1078 Ashley McKnight
red
127 118
1221 Jim Brown
yellow
220 .
;
Run;

/
Advances the pointer to column 1 of the next input record.
Data ds14k;
Infile datalines;
Input idno 1-4/ / strtwght 1-3 endwght 5-7;
Datalines;
1023 David Shaw
red
189 165
1049 Amelia Serrano
yellow
145 124
1219 Alan Nance
red
210 192
1246 Ravi Sinha
yellow
194 177
1078 Ashley McKnight
red
127 118
1221 Jim Brown
yellow
220 .
;
Run;

Formatted Input:
An informat follows the variable name in the INPUT statement.
The informat gives the data type and the field width of an input value. Informats are
also used to read data that are stored in nonstandard form, such as packed decimal, or
numbers that contain special characters such as commas.

Ex: input @1 name $10. @11 Sal 5.;

Data ds15;
Infile datalines;
Input @1 idno 4. @6 name $18. @25 team $6. @32 strtwght 3. @36 endwght 3.;
Datalines;
1023 David Shaw         red    189 165
1049 Amelia Serrano     yellow 145 124
1219 Alan Nance         red    210 192
1246 Ravi Sinha         yellow 194 177
1078 Ashley McKnight    red    127 118
1221 Jim Brown          yellow 220
;
Run;

Named input:
We specify the name of the variable followed by an equal sign. SAS looks for a
variable name and an equal sign in the input record.

EX: input city=$8. state=$6. date=mmddyy8.;

Data ds16;
Infile datalines;
Input Id= Name=$18. Team=$6. Strtwght= Endwght=3. ;
Cards;
ID=1023 NAME=David Shaw TEAM=red strtwght=189 endwght=165
ID=1049 NAME=Amelia Serrano TEAM=yellow strtwght=145 endwght=124
ID=1219 NAME=Alan Nance TEAM=red strtwght=210 endwght=192
ID=1246 NAME=Ravi Sinha TEAM=yellow strtwght=194 endwght=177
ID=1078 NAME=Ashley McKnight TEAM=red strtwght=127 endwght=118
ID=1221 NAME=Jim Brown TEAM=yellow strtwght=220
;
Run;

Data DS16a;
Infile datalines;
Input Id= Name=$18. Team=$6. Strtwght= Endwght=3. ;
Cards;
NAME=David Shaw TEAM=red strtwght=189 endwght=165 ID=1023
NAME=Amelia Serrano TEAM=yellow strtwght=145 endwght=124 ID=1049
NAME=Alan Nance TEAM=red strtwght=210 endwght=192 ID=1219
ID=1246 NAME=Ravi Sinha TEAM=yellow strtwght=194 endwght=177
ID=1078 NAME=Ashley McKnight TEAM=red strtwght=127 endwght=118
ID=1221 NAME=Jim Brown TEAM=yellow strtwght=220
;
Run;
When to Use Named Input:

Named input reads the input data records that contain a variable name followed by an
equal sign and a value for the variable

The INPUT statement reads the input data record at the current location of the input
pointer. If the input data records contain data values at the start of the record that the
INPUT statement cannot read with named input, use another input style to read
them.

Using Named Input with another Input Style


Data ds16b;
Infile datalines;
Input Id Name=$18. Team=$6. Strtwght= Endwght=3. ;
Cards;
1023 NAME=David Shaw TEAM=red strtwght=189 endwght=165
1049 NAME=Amelia Serrano TEAM=yellow strtwght=145 endwght=124
1219 NAME=Alan Nance TEAM=red strtwght=210 endwght=192
1246 NAME=Ravi Sinha TEAM=yellow strtwght=194 endwght=177
1078 NAME=Ashley McKnight TEAM=red strtwght=127 endwght=118
1221 NAME=Jim Brown TEAM=yellow strtwght=220
;
Run;

/*Not reading Team */


Data ds16c;
Infile datalines;
Input Id Name=$18. Team $6. Strtwght= Endwght=3.;
Cards;
1023 NAME=David Shaw red strtwght=189 endwght=165
1049 NAME=Amelia Serrano yellow strtwght=145 endwght=124
1219 NAME=Alan Nance red strtwght=210 endwght=192
1246 NAME=Ravi Sinha yellow strtwght=194 endwght=177
1078 NAME=Ashley McKnight red strtwght=127 endwght=118
1221 NAME=Jim Brown yellow strtwght=220
;
Run;
Data ds16d;
Infile datalines;
Input Id name=$18. Team $ 30-36 Strtwght= Endwght=3. ;
Cards;
1023 NAME=David Shaw red strtwght=189 endwght=165
1049 NAME=Amelia Serrano yellow strtwght=145 endwght=124
1219 NAME=Alan Nance red strtwght=210 endwght=192
1246 NAME=Ravi Sinha yellow strtwght=194 endwght=177
1078 NAME=Ashley McKnight red strtwght=127 endwght=118
1221 NAME=Jim Brown yellow strtwght=220
;
Run;
Reading Character Variables with Embedded Blanks
Data ds16e;
Informat Header $30. Name $15. ;
Input header= name=;
Datalines;
Header= age=60 AND UP Name=PHILIP
;
Run;

Null Input:
The INPUT statement with no arguments (variables) is called a null INPUT. It brings a
record into the input buffer without creating any SAS variables, so combined with FILE
and PUT _INFILE_ statements the DATA step can copy records from the input file to an
output file.
Data ds17;
Input;
Datalines;
1023 David Shaw red 189 165
1049 Amelia Serrano yellow 145 124
1219 Alan Nance red 210 192
1246 Ravi Sinha yellow 194 177
1078 Ashley McKnight red 127 118
1221 Jim Brown yellow 220
;
Run;
In the program above, each time the INPUT statement executes, one record is read into the
input buffer. Normally the values would then be moved from the input buffer into the PDV
and assigned to the corresponding variables, but because there are no variables, the step
creates a data set with zero variables.
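To show the record-copying use of a null INPUT, the sketch below pairs it with FILE and PUT _INFILE_. FILE LOG is chosen only so that the copied lines are visible in the log; in practice the FILE statement would point at an output file or fileref.

```sas
/* Null INPUT: each record is brought into the input buffer, but no */
/* variables are created. PUT _INFILE_ then writes the buffer as-is */
/* to the destination named in the FILE statement (here, the log).  */
Data _null_;
Infile datalines;
File log;
Input;
Put _infile_;
Datalines;
1023 David Shaw red 189 165
1049 Amelia Serrano yellow 145 124
;
Run;
```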

Mixed Input:

An INPUT statement that combines different input styles is called mixed input.
EX: input idno name $ 6-23 @25 team $7. strtwght 3.;
Data ds18;
Infile datalines;
Input Idno Name $ 6-23 @25 Team $7. Strtwght 3. Endwght 36-38;
Cards;
1023 David Shaw         red    189 165
1049 Amelia Serrano     yellow 145 124
1219 Alan Nance         red    210 192
1246 Ravi Sinha         yellow 194 177
1078 Ashley McKnight    red    127 118
1221 Jim Brown          yellow 220
;
Run;
INPUT SPECIFICATIONS

@ (Single Trailing)

Holds an input record for the execution of the next INPUT statement within the same
iteration of the DATA step.

This line-hold specifier is called trailing @.

Restriction: The trailing @ must be the last item in the INPUT statement.

Data redteam;
Infile datalines;
Input team $ 13-18@;
If team='red';
Input idno 1-4 strtwght 20-23 endwght 24-26;
Cards;
1023 David  red    189 165
1049 Amelia yellow 145 124
1219 Alan   red    210 192
1246 Ravi   yellow 194 177
1078 Ashley red    127 118
1221 Jim    yellow 220 .
;
Run;
Data ds;
Input id 1-3@;
If id in (101,103,105) ;/*if id=101; ---- Only for one value*/
Input name$ 5-7 sal 9-12;
Datalines;
101 abc 1000
102 asd 2000
103 dfg 3000
104 hjk 3400
105 xyz 5000
;
Run;

@@ (Double Trailing)

Holds an input record for the execution of the next INPUT statement across iterations of
the DATA step.

This line-hold specifier is called double trailing @@.


Restriction: The double trailing @ must be the last item in the INPUT statement.
Examples:-
Data zone_temp;
Infile datalines;
Input zone $ tempc tempf @@;
Cards;
East 34 89 west 25 78 north 33 82 south 42 98
;
Run;
Data city_temp;
Infile datalines;
Input city$ tempc tempf@@;
Cards;
Delhi 34 89 Kolkata 25 78
Mumbai 33 82 Chennai 42 98
Hyderabad 40 94 Bangalore 31 79
Pune 28 71
;
Run;

DATALINES STATEMENT
Indicates that data lines follow
Syntax:- DATALINES;

Use the DATALINES statement with an INPUT statement to read data that you enter
directly in the program.
The DATALINES statement is the last statement in the DATA step and immediately
precedes the first data line

/*By default (NOCARDIMAGE), SAS processes data lines longer than 80 columns in their entirety*/
/*If the CARDIMAGE system option is in effect, SAS processes data lines exactly like 80-byte
punched card images padded with blanks*/

Use the DATALINES statement whenever data does not contain semicolons
If data contains semicolons use DATALINES4 statement

Example: - (datalines)
Data health;
Infile datalines;
Input id name &$18. Sex$ RBC WBC;
Datalines;
1023 David Shaw       f 1900 120
1049 Amelia Serrano   m 2000 125
1219 Alan Nance       m 2100 130
1246 Ravi Sinha       f 2050 122
1078 Ashley McKnight  f 2200 150
;
Run;

Example: - (datalines4)
Data health;
Infile datalines;
Input id name &$18. Sex$ RBC WBC;
Datalines4;
1023 David Shaw       f 1900 120 ;
1049 Amelia Serrano   m 2000 125 ;
1219 Alan Nance       m 2100 130 ;
1246 Ravi Sinha       f 2050 122 ;
1078 Ashley McKnight  f 2200 150 ;
;;;;
Run;

Data health;
Infile datalines;
Input id name &$18. Sex$ RBC WBC 30-32;
Datalines4;
1023 David Shaw       f 1900 120;
1049 Amelia Serrano   m 2000 125;
1219 Alan Nance       m 2100 130;
1246 Ravi Sinha       f 2050 122;
1078 Ashley McKnight  f 2200 150;
;;;;
Run;

INFORMAT STATEMENT

An informat is an instruction that SAS uses to read data values into a variable.
Informats are usually specified in an INPUT statement. If coded with the INFORMAT
statement, the informat is attached to a variable for subsequent input.
Informats can also be user-written.
Syntax: -INFORMAT variable-1<informat-1>variable-N<informat-N>;
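As a sketch of a user-written informat, PROC FORMAT's INVALUE statement can turn letter grades into numbers while the data is being read. The informat name GRADEPT, the data set GRADES and the grade values are hypothetical, chosen only for illustration.

```sas
/* A user-written numeric informat created with PROC FORMAT INVALUE. */
/* GRADEPT and the grade values are hypothetical.                     */
Proc format;
invalue gradept 'A'=4 'B'=3 'C'=2;
Run;
Data grades;
Infile datalines;
/* The colon modifier applies GRADEPT. to a list-input field */
Input id points :gradept.;
Datalines;
1 A
2 C
3 B
;
Run;
```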
Categories of Informats:-
Character Informats: -
Reads character data into character variables.
Syntax: -$informatw.
Ex: - $
$10.
$20.
$Char.
Examples:-
Data infmt6;
Infile datalines;
Input id name$ age sex$ sal;
Datalines;
001 David 23 m 50000
002 Amelia 32 f 25000
003 Alan 31 f 30000
004 Ravi 21 m 45000
005 Jim 35 f 28000
;
Run;
Data informat6a;
Infile datalines;
Input idno name &$18. team $ strtwght endwght;
Cards;
1023 David Shaw         red    189 165
1049 Amelia Serrano     yellow 145 124
1219 Alan Nance         red    210 192
1246 Ravi Sinha         yellow 194 177
1078 Ashley McKnight    red    127 118
1221 Jim Brown          yellow 220 .
;
Run;

Numeric Informats: -
Reads numeric data values into numeric variables.
Syntax: -informatw.d
Ex: - comma12.
dollar10.2
Examples:-
Data infmt1;
Infile datalines;
Input id name$ age sex$ sal;
Datalines;
001 David 23 m 50000
002 Amelia 32 f 25000
003 Alan 31 f 30000
004 Ravi 21 m 45000
005 Jim 35 f 28000
;
Run;
In the example above the data is in standard form, so there is nothing extra to do. In the
program below, however, the SAL values contain commas. SAL is a numeric variable and the
comma is a special character, so list input alone cannot read these values; we specify an
informat instead. Whenever numeric data contains commas or dollar signs, we can use
numeric informats such as COMMAw. and DOLLARw., as below.
Data infmt2;
Infile datalines;
Input id name$ age sex$ sal comma6.;
Datalines;
001 David 23 m 50,000
002 Amelia 32 f 25,000
003 Alan 31 f 30,000
004 Ravi 21 m 45,000
005 Jim 35 f 28,000
;
Run;
Data infmt3;
Infile datalines;
Input id name$ age sex$ sal;
Informat sal comma6.;
Datalines;
001 David 23 m 50,000
002 Amelia 32 f 25,000
003 Alan 31 f 30,000
004 Ravi 21 m 45,000
005 Jim 35 f 28,000
;
Run;
Data infmt4;
Infile datalines;
Input id name$ age sex$ sal dollar6.;
Datalines;
001 David 23 m $5000
002 Amelia 32 f $2500
003 Alan 31 f $3000
004 Ravi 21 m $4500
005 Jim 35 f $28000
;
Run;
Data infmt5;
Infile datalines;
/*The informat can be written in the INPUT statement after the variable, or as a
separate INFORMAT statement like the commented-out line below*/
Input id name$ age sex$ sal dollar6.;
/*Informat sal dollar6.;*/
Datalines;
001 David 23 m $5,000
002 Amelia 32 f $2,500
003 Alan 31 f $3,000
004 Ravi 21 m $4,500
005 Jim 35 f $2,800
;
Run;
Date and time Informats:-
Reads date values into variables representing times, dates and datetimes.
Syntax: -informatw.
Ex: - date7. Ex: - 23Oct09
date9. Ex: - 23Oct2009
ddmmyy8. Ex: - 23/10/09
ddmmyy10. Ex: - 23/10/2009
anydtdte. If you do not know which date informat matches the data, you can use this
time. Ex: - 06:33:45
datetime. Ex: - 23Oct09:06:33:45
Examples:-
Data infmt7;
Infile datalines;
Input id name$ age sex$ sal dob;
Datalines;
001 David 23 m 50000 10Feb1983
002 Amelia 32 f 25000 15May1984
003 Alan 31 f 30000 21Jul1984
004 Ravi 21 m 45000 05Aug1984
005 Jim 35 f 28000 30Jan1985
;
Run;

In SAS, dates are stored as numbers, but the DOB values in the example above are written as
text (for example 10Feb1983), so list input cannot read them into a numeric variable. To
read dates we should use date informats like below.
Data infmt7a;
Infile datalines;
Input id name$ age sex$ sal dob date9.;
/*Informat dob date9.;*/
Datalines;
001 David 23 m 50000 10Feb1983
002 Amelia 32 f 25000 15May1984
003 Alan 31 f 30000 21Jul1984
004 Ravi 21 m 45000 05Aug1984
005 Jim 35 f 28000 30Jan1985
;
Run;
Data infmt7b;
Infile datalines;
Input id name$ age sex$ sal dob date7.;
/*Informat dob date7.;*/
Datalines;
001 David 23 m 50000 10Feb83
002 Amelia 32 f 25000 15May84
003 Alan 31 f 30000 21Jul84
004 Ravi 21 m 45000 05Aug84
005 Jim 35 f 28000 30Jan85
;
Run;
Data infmt7c;
Infile datalines;
Input id name$ age sex$ sal dob anydtdte.;
/*Informat dob date9.;*/
Datalines;
001 David 23 m 50000 10Feb1983
002 Amelia 32 f 25000 15May1984
003 Alan 31 f 30000 21Jul1984
004 Ravi 21 m 45000 05Aug1984
005 Jim 35 f 28000 30Jan1985
;
Run;
Data infmt8;
Infile datalines;
Input id name$ age sex$ sal dob date9. doj:ddmmyy10.;
/*Input id name$ age sex$ sal dob anydtdte9. doj:anydtdte10.;*/
Datalines;
001 David 23 m 50000 10Feb1983 12/01/2011
002 Amelia 32 f 25000 15May1984 15/01/2011
003 Alan 31 f 30000 21Jul1984 31/01/2011
004 Ravi 21 m 45000 05Aug1984 25/02/2011
005 Jim 35 f 28000 30Jan1985 08/03/2011
;
Run;
Data infmt8a;
Infile datalines;
Input id name$ age sex$ sal dob date9. doj:ddmmyy8. ;
/*Informat dob date9. doj ddmmyy8.;*/
Datalines;
001 David 23 m 50000 10Feb1983 12/01/11
002 Amelia 32 f 25000 15May1984 15/01/11
003 Alan 31 f 30000 21Jul1984 31/01/11
004 Ravi 21 m 45000 05Aug1984 25/02/11
005 Jim 35 f 28000 30Jan1985 08/03/11
;
Run;
Data infmt8b;
Infile datalines;
Input id name$ age sex$ sal dob doj;
Informat dob date9. doj ddmmyy10.;
Datalines;
001 David 23 m 50000 10Feb1983 12/01/2011
002 Amelia 32 f 25000 15May1984 15/01/2011
003 Alan 31 f 30000 21Jul1984 31/01/2011
004 Ravi 21 m 45000 05Aug1984 25/02/2011
005 Jim 35 f 28000 30Jan1985 08/03/2011
;
Run;
Data infmt9;
Infile datalines;
Input id name$ age sex$ sal dob ;
Informat dob datetime.;
Datalines;
001 David 23 m 50000 10Feb1983:10:30:15
002 Amelia 32 f 25000 15May1984:11:23:23
003 Alan 31 f 30000 21Jul1984:08:34:45
004 Ravi 21 m 45000 05Aug1984:12:43:56
005 Jim 35 f 28000 30Jan1985:03:35:12
;
Run;
Data infmt9a;
Infile datalines;
Input id name$ age sex$ sal dob ;
Informat dob time.;
Datalines;
001 David 23 m 50000 10:30:15
002 Amelia 32 f 25000 11:23:23
003 Alan 31 f 30000 08:34:45
004 Ravi 21 m 45000 12:43:56
005 Jim 35 f 28000 03:35:12
;Run;
Column binary Informats:-
Reads data stored in column-binary or multipunched form into character and numeric
variables
Ex: - row 12.3, $ cd4.

FORMAT STATEMENT

Format is an instruction that SAS uses to write data values.


The syntax of a format is the same as that of an informat.
In fact, most SAS-defined informats are also SAS-defined formats.
However, there are some informats, such as anydtdte., that are not defined as formats.
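A format changes only how a value is written, never the stored value. The sketch below stores one SAS date (the number of days since 1 January 1960) and writes it with several of the formats listed below; the date 10Feb1983 is just an illustrative value.

```sas
/* One stored value, several displays. DOB holds the number of days */
/* since 1 January 1960; each PUT writes it through a different     */
/* format, but the value itself never changes.                      */
Data _null_;
dob='10Feb1983'd;
Put dob=;              /* raw value: dob=8441      */
Put dob= date9.;       /* dob=10FEB1983            */
Put dob= ddmmyy10.;    /* dob=10/02/1983           */
Put dob= worddate20.;  /* month name, day and year */
Run;
```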

Syntax: -FORMAT variable-1<format-1>variable-N<format-N>;

Categories of Formats:-

Character Formats: -
Writes character data values from character variables.
Character informats and character formats are written the same way ($w.).
Syntax: -$ formatw.
Ex: - $
$10.
$20.
Examples:-
Data fmt5;
Infile datalines;
Input id name$ age sex$ sal;
Datalines;
001 David 23 m 50000
002 Amelia 32 f 25000
003 Alan 31 f 30000
004 Ravi 21 m 45000
005 Jim 35 f 28000
;
Run;
Data fmt5a;
Infile datalines;
Input idno name &$18. team $ strtwght endwght;
Datalines;
1023 David Shaw         red    189 165
1049 Amelia Serrano     yellow 145 124
1219 Alan Nance         red    210 192
1246 Ravi Sinha         yellow 194 177
1078 Ashley McKnight    red    127 118
1221 Jim Brown          yellow 220 .
;
Run;

Numeric Formats: -
Writes numeric data values from numeric variables.
Syntax: -formatw.d
Ex: - dollar10.
dollar10.2
comma10.
comma10.2
percent8.2
best10.

Examples:-
Data fmt1;
Infile datalines;
Input id name$ age sex$ sal comma6.;
Format sal comma6.;
/*Format sal comma9.2;*/
Datalines;
001 David 23 m 50,000
002 Amelia 32 f 25,000
003 Alan 31 f 30,000
004 Ravi 21 m 45,000
005 Jim 35 f 28,000
;
Run;
Data fmt2;
Infile datalines;
Input id name$ age sex$ sal;
Informat sal comma9.2;
Format sal comma9.2;
Datalines;
001 David 23 m 50,000.55
002 Amelia 32 f 25,000.00
003 Alan 31 f 30,000.60
004 Ravi 21 m 45,000.77
005 Jim 35 f 28,000.50
;
Run;
Data fmt3;
Infile datalines;
Input id name$ age sex$ sal dollar6. ;
Format sal dollar6.;
Datalines;
001 David 23 m $5000
002 Amelia 32 f $2500
003 Alan 31 f $3000
004 Ravi 21 m $4500
005 Jim 35 f $28000
;
Run;
Data fmt3a;
Infile datalines;
Input id name$ age sex$ sal dollar6. ;
/*Format sal dollar6.;*/
Format sal dollar9.2;
/*Format sal comma6.;*/
Datalines;
001 David 23 m $5,000
002 Amelia 32 f $2,500
003 Alan 31 f $3,000
004 Ravi 21 m $4,500
005 Jim 35 f $2,000
;
Run;

Date, Time and Datetime Formats: -


Write data values from variables representing Time, date and date time variables.
Syntax: -formatw.
Ex: - date7. Ex:- 23Oct09
date9. Ex:- 23Oct2009
ddmmyy8. Ex:-23/10/09
ddmmyy10. Ex:-23/10/2009
time5. Ex:- 8:20
time8. Ex:- 8:20:30
datetime20. Ex:-23Oct2009:08:20:30
worddate20. Ex:-October 23, 2009
weekdate29. Ex:-Friday, October 23, 2009
yymmddn8. Ex:-20091023
yymmdds8. Ex:-09/10/23
yymmdds10. Ex:-2009/10/23
yymmddD8. Ex:-09-10-23
yymmddD10. Ex:-2009-10-23
yymmddC8. Ex:-09:10:23
yymmddC10. Ex:-2009:10:23

Examples:-
Data fmt6;
Infile datalines;
Input id name$ age sex$ sal dob date9.;
Format dob date9.;
/*Format dob date7.;*/
/*Format dob date9.;*/
/*Format dob ddmmyy8.;*/
/*Format dob ddmmyy10.;*/
/*Format dob worddate20.;*/
/*Format dob weekdate30.;*/
/*Format dob yymmddN8.;*/
/*Format dob yymmddS8.;*/
/*Format dob yymmddS10.;*/
/*Format dob yymmddD8.;*/
/*Format dob yymmddD10.;*/
/*Format dob yymmddC8.;*/
/*Format dob yymmddC10.;*/
Datalines;
001 David 23 m 50000 10Feb1983
002 Amelia 32 f 25000 15May1984
003 Alan 31 f 30000 21Jul1984
004 Ravi 21 m 45000 05Aug1984
005 Jim 35 f 28000 30Jan1985
;
Run;
Data fmt7;
Infile datalines;
Input id name$ age sex$ sal dob date9. doj:ddmmyy10. ;
Format dob worddate20. doj weekdate30.;
Datalines;
001 David 23 m 50000 10Feb1983 12/01/2011
002 Amelia 32 f 25000 15May1984 15/01/2011
003 Alan 31 f 30000 21Jul1984 31/01/2011
004 Ravi 21 m 45000 05Aug1984 25/02/2011
005 Jim 35 f 28000 30Jan1985 08/03/2011
;
Run;
Data fmt8;
Infile datalines;
Input id name$ age sex$ sal dob ;
Informat dob datetime.;
Format dob datetime.;
/*Format dob datetime20.;*/
Datalines;
001 David 23 m 50000 10Feb1983:10:30:15
002 Amelia 32 f 25000 15May1984:11:23:23
003 Alan 31 f 30000 21Jul1984:08:34:45
004 Ravi 21 m 45000 05Aug1984:12:43:56
005 Jim 35 f 28000 30Jan1985:03:35:12
;
Run;
Data fmt9;
Infile datalines;
Input id name$ age sex$ sal dob ;
Informat dob time.;
Format dob time.;
/*Format dob time8.;*/
/*Format dob time5.;*/
Datalines;
001 David 23 m 50000 10:30:15
002 Amelia 32 f 25000 11:23:23
003 Alan 31 f 30000 08:34:45
004 Ravi 21 m 45000 12:43:56
005 Jim 35 f 28000 03:35:12
;
Run;

Column binary Formats: -


Writes data stored in column-binary or multipunched form into character and numeric
variables.
Ex: - row 12.3 $cd4.
User defined Formats: -
Created by using proc format.
(This topic is covered in the PROC FORMAT section.)
Example:-
Data fmt9a;
Infile datalines;
Input id name$ age sex$ sal dob ;
Informat dob time.;
Format dob time.;
Datalines;
001 David 23 m 50000 10:30:15
002 Amelia 32 f 25000 11:23:23
003 Alan 31 f 30000 08:34:45
004 Ravi 21 m 45000 12:43:56
005 Jim 35 f 28000 03:35:12
;
Run;

Proc format;
value $gen 'f'='Female'
'm'='Male';
Run;
Data fmt9b;
Set fmt9;
Format sex $gen.;
Run;
Proc report data=fmt9a nowd;
Column id name age sex sal dob ;
Define Sex / display format=$gen.;
Run;

LENGTH STATEMENT
Specifies the number of bytes for storing variable values.
We can assign length for variables.
Syntax: -LENGTH variable(s) <$> length;
Examples:-
Data DS1;
Length name $10.;
Infile datalines;
Input id name$ sex$ age sal ;
Datalines;
001 Ronald m 23 50000
002 Clark f 22 34500
003 Roopa f 26 45000
;
Run;
Data ds2;
Length name $10.;
Set ds1;
Run;

The LENGTH statement should always be written before the INPUT or SET statement.
Otherwise, when INPUT executes, the length the variable already has (8 by default for a
character variable read with list input) is what comes into the output.
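A minimal sketch of what goes wrong when LENGTH comes after INPUT. The name Christopher (11 characters) and the data set name ds1b are hypothetical, chosen only so that the value is longer than the default length of 8.

```sas
/* LENGTH after INPUT is too late: INPUT has already given NAME the */
/* default length of 8, so 'Christopher' is stored as 'Christop'    */
/* and SAS writes a warning about the length to the log.            */
Data ds1b;
Infile datalines;
Input id name$ sex$ age sal;
Length name $10;
Datalines;
001 Christopher m 23 50000
;
Run;
```

Moving the LENGTH statement above the INPUT statement (as in DS1 above) stores the full name.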

LABEL STATEMENT
Assigns descriptive labels to variables.
Syntax: - LABEL variable-1='label-1' . . . <variable-n='label-n'>;
LABEL variable-1=' ' . . . <variable-n=' '>;
Examples:-
Data DS1;
Infile datalines;
Input id name$ sex$ age sal ;
Label name='Emp Name' sex='Gender' sal='Income';
Datalines;
001 Ronald m 23 50000
002 Clark f 22 34500
003 Roopa f 26 45000
;
Run;
Data DS2;
Infile datalines;
Label name='Emp Name' sex='Gender' sal='Income';
Input id name$ sex$ age sal ;
Datalines;
001 Ronald m 23 50000
002 Clark f 22 34500
003 Roopa f 26 45000
;
Run;
If we write the LABEL statement after the INPUT statement, the variables in the output
data set are in INPUT order. But if we write it before INPUT, the output data set starts with
the variables in LABEL order, followed by any variables not in the LABEL statement in
INPUT order.
If we specify a blank label (' '), the variable name is shown in the output instead of a
label.
In the example below name=' ', so the output shows the variable name NAME.
Data DS3;
Infile datalines;
Input id name$ sex$ age sal ;
Label name=' ' sex='Gender' sal=' ';
Datalines;
001 Ronald m 23 50000
002 Clark f 22 34500
003 Roopa f 26 45000
;
Run;
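Labels are stored with the data set, but procedures decide whether to show them. A minimal sketch, assuming a small data set named EMP built from the same records: PROC PRINT shows variable names by default and shows labels only when the LABEL option is given.

```sas
/* Labels are stored with the data set. PROC PRINT prints variable */
/* names by default; the LABEL option makes it print the labels.   */
Data emp;
Infile datalines;
Input id name$ sex$ age sal;
Label name='Emp Name' sex='Gender' sal='Income';
Datalines;
001 Ronald m 23 50000
002 Clark f 22 34500
;
Run;
Proc print data=emp label;
Run;
```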
Note: we can specify attributes individually with the statements above (LENGTH, LABEL,
INFORMAT and FORMAT), or we can specify all of them with one statement, the ATTRIB
statement.
ATTRIB STATEMENT
Associates a format, informat, label, and/or length with one or more variables
Syntax: -ATTRIB variable-list(s) attribute-list(s) ;
Generally using Attrib statement we can change length, format, informat and label.
Examples:-
Data DS1;
Attrib name length=$10.;
Input id name$ sex$ age sal ;
Datalines;
001 Ronald m 23 50000
002 Clark f 22 34500
003 Roopa f 26 45000
;
Run;
Data DS2;
Attrib name length=$10. label='Emp Name';
Input id name$ sex$ age sal;
Datalines;
001 Ronald m 23 50000
002 Clark f 22 34500
003 Roopa f 26 45000
;
Run;
Data DS3;
Attrib name length=$10. label='Emp name'
sal format=comma6. label='Income';
Input id name$ sex$ age sal;
Datalines;
001 Ronald m 23 50000
002 Clark f 22 34500
003 Roopa f 26 45000
;
Run;
Data DS4;
Attrib name length=$10.
doj format=worddate30. Label='Date of joining'
sex label='gender' sal format=dollar7. ;
Infile datalines;
Input id name$ sex$ age sal doj date9.;
Datalines;
001 Ronald m 23 50000 02Jan2009
002 Clark f 22 34500 22Feb2010
003 Roopa f 26 45000 30Apr2010
;
Run;
Data DS5;
Attrib name length=$10.
dob informat=date9. format=ddmmyy8. Label='Date of Birth'
doj informat=anydtdte9. format=ddmmyy10. Label='Date of Joining'
sex label='gender'
sal format=dollar7. ;
Infile datalines;
Input id name$ sex$ age sal dob doj ;
Datalines;
001 Ronald m 23 50000 11Mar1986 02Jan2009
002 Clark f 22 34500 30Dec1986 22Feb2010
003 Roopa f 26 45000 06Aug1987 30Apr2010
;
Run;
Data DS6;
Infile datalines;
Input id name$ sex$ age sal dob doj ;
Attrib name label='Empname'
dob informat=date9. format=ddmmyy8. Label='Date of Birth'
doj informat=anydtdte9. format=ddmmyy10. Label='Date of Joining'
sex label='gender'
sal format=dollar7. ;
Datalines;
001 Ronald m 23 50000 11Mar1986 02Jan2009
002 Clark f 22 34500 30Dec1986 22Feb2010
003 Roopa f 26 45000 06Aug1987 30Apr2010
;
Run;
Data DS7;
Infile datalines;
Input id name$ sex$ age sal dob date9. doj:date9. ;
Datalines;
001 Ronald m 23 50000 11Mar1986 02Jan2009
002 Clark f 22 34500 30Dec1986 22Feb2010
003 Roopa f 26 45000 06Aug1987 30Apr2010
;
Run;

Data DS7a;
Attrib name length=$10.
dob informat=date9. format=ddmmyy8. Label='Date of Birth'
doj informat=anydtdte9. format=ddmmyy10. Label='Date of Joining'
sex label='gender'
sal format=dollar7. ;
Set DS6;
Run;
Note: When we specify length as an attribute in the Attrib statement, the Attrib statement
must come before the Input or Set statement. Otherwise the length that the Input or Set
statement assigns to the variable is the one used in the output dataset.
The variable order in the output dataset also follows the order of the variables in the
Attrib statement; we can rearrange it into the required order in Proc Print.
If we place the Attrib statement after the Input or Set statement, the dataset keeps the
Input statement order (we can do this only when we are not using Length in the Attrib
statement).
SUM STATEMENT
Adds the result of an expression to an accumulator variable
Syntax: -variable+expression;
Examples:-
Data summ1;
a=2;
b=3;
c=4;
d=5;
Run;
Data summ2;
Set summ1;
Total=a+b+c+d;
Run;
Data summ3;
a=2;
b=3;
c=4;
d=.;
e=5;
g=.;
Run;
Data summ4;
Set summ3;
Total=a+b+c+d+e+g;
Run;

Sum Function
Data summ5;
Set summ3;
Total=Sum(a,b,c,d,e,g);
Run;

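Difference between the + operator and the SUM Function

Note: Total=a+b+c+d in the examples above is an assignment statement that uses the + operator;
it is not a Sum statement of the form variable+expression;.
The + operator returns a missing value if any of the values is missing.
Ex: - x1=4+.+9+3+8+.;
Above example gives a missing value.
SUM Function returns the sum of the non-missing values.
Ex: - x2=sum(4,.,9,3,8,.);
It gives value 24.
A true Sum statement such as Total+Sales; treats a missing value as 0 and retains Total
across iterations of the data step.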

Data Loan_info;
Infile datalines dsd;
Input Loan_id$ Cust_Name : $15. Age Loan_amt1:dollar5.
Loan_amt2:dollar5. Loan_amt3:dollar5.;
Format Loan_amt1 dollar5. Loan_amt2 dollar5. Loan_amt3 dollar5.;
Datalines;
LP101,Ravi Sinha,23,$3000,$3500,$2000
LP102,Alan Nance,29,$2500,$1500,
LP103,Brown lee,31,$5000,$1000,$2000
LP104,Ashley McKnight,22,$1500, ,$3000
LP105,Jim Brown,25,$4500,$1000,$1200
;
Run;
Data Loan_info;
Set Loan_info;
Format Total1 dollar6. Total2 dollar6.;
Total1=Loan_amt1+Loan_amt2+Loan_amt3; /* + operator: missing if any amount is missing */
Total2=Sum(Loan_amt1,Loan_amt2,Loan_amt3); /* Sum Function */
Run;
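For contrast with the + operator and the SUM function above, here is a minimal sketch of an actual Sum statement (variable+expression;). The dataset running_sum and the variable Running_total are hypothetical names chosen for illustration. The accumulator is retained automatically, starts at 0, and a missing Sales value simply adds nothing.

```sas
/* Hypothetical example: running total built with a Sum statement */
Data running_sum;
   Input Mon $ Sales;
   Running_total+Sales;   /* Sum statement: retained, starts at 0, missing Sales adds 0 */
   Datalines;
Jan 230
Feb .
Mar 210
;
Run;
```

Running_total should be 230, 230 and 440 on the three rows. Writing Running_total=Running_total+Sales; instead (with no RETAIN) would give a missing value on every row.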

RETAIN STATEMENT:

Retains the values of the variables in subsequent iterations of the data step.
The Retain statement prevents SAS from re-initializing the values of new variables to missing
at the top of the data step, so it can be used to create an accumulator variable.
Syntax: -RETAIN <element-list(s) <initial-value(s)>>;
Examples:-
Data Ret1;
Input Id Mon$ Sales;
Datalines;
101 Jan 230
102 Feb 320
103 Mar 210
104 Apr 210
105 May 180
106 Jun 310
;
Run;
Data Ret2;
Set Ret1;
Retain total 0;
Total=Total+Sales;
Run;
Data Ret3;
Retain Row;
Set Ret1;
Row+1;
Run;
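To see why Retain matters in Ret2, here is a minimal side-by-side sketch. The datasets sales, no_retain and with_retain are hypothetical names; only the Retain statement differs between the two steps.

```sas
Data sales;
   Input Mon $ Sales;
   Datalines;
Jan 230
Feb 320
Mar 210
;
Run;

/* Without RETAIN: Total is reset to missing at the top of every iteration, */
/* so Total=Total+Sales is missing on every row.                            */
Data no_retain;
   Set sales;
   Total=Total+Sales;
Run;

/* With RETAIN: Total keeps its value between iterations (230, 550, 760). */
Data with_retain;
   Set sales;
   Retain Total 0;
   Total=Total+Sales;
Run;
```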

STOP STATEMENT
Stops execution of the current Data step.
When STOP executes before any observation is written, it creates a dataset with 0 observations.
Syntax: -STOP;
Examples:-
Data DS1;
Stop;
Infile datalines;
Input Loan_id$ Cust_Name : &$15. Loan_amt: dollar5.;
/*Stop;*/
Format Loan_amt dollar5.;
Datalines;
LP101 Ravi Sinha  $4500
LP102 Alan Nance  $7000
LP103 Brown lee  $6000
LP104 Jim Brown  $5000
LP105 McKnight  $8000
;
Run;
The data set WORK.DS1 has 0 observations and 3 variables.

Data DS2;
Stop;
Set sashelp.class;
Run;
The data set WORK.DS2 has 0 observations and 5 variables.
In the programs above, STOP executes before the Input/Set statement, so no data is ever read
into the input buffer and the PDV never assigns values to the variables. The variables are
still defined at compile time, which is why we get a dataset with 0 observations but with
3 (or 5) variables.
IF Statement, Subsetting
Continues processing only those observations that meet the condition.
Syntax:- IF expression;
Examples:-
Data ds1;
Input idno name $ team $ strtwght endwght;
Cards;
1023 David red 189 165
1049 Amelia yellow 145 124
1219 Alan red 210 192
1246 Ravi yellow 194 177
1078 Ashley red 127 118
1221 Jim yellow 220 .
;
Run;
Data ds2;
Set ds1;
If team='red';
Run;

IF-THEN Statement
Executes a SAS statement for observations that meet specific conditions
Syntax:-IF expression THEN statement;
Data ds3;
Set ds1;
If team='red' then team=1;
Run;
Data ds3a;
Set ds1;
If team='red' then team=1;
If team='yellow' then team=2;
If team='green' then team=3;
If team='blue' then team=4;
Run;
Data ds3b;
Set ds1;
If team='red' then team='R';
If team='yellow' then team='Y';
If team='green' then team='G';
If team='blue' then team='B';
Run;

Data ds3c;
Set ds1;
If team='red' then team1='R';
If team='yellow' then team1='Y';
If team='green' then team1='G';
If team='blue' then team1='B';
Run;

IF-THEN/ELSE Statement
Executes a SAS statement for observations that meet specific conditions
Syntax:-IF expression THEN statement; <ELSE statement;>
Data ds4;
Set ds1;
If team='red' then team=1;
Else team=2;
Run;
Data ds4a;
Set ds1;
If team='red' then team=1;
Else if team='yellow' then team=2;
Else if team='green' then team=3;
Else team=4;
Run;

IF-THEN/ELSE OUTPUT
Executes a SAS statement for observations that meet specific conditions
Using this we can create multiple datasets at a time based on conditions.
Data ds5 ds6;
Set ds1;
If team='red' then output ds5;
Else output ds6;
Run;
Data ds5 ds6 ds7 ds8;
Set ds1;
If team='red' then output ds5;
Else if team='yellow' then output ds6;
Else if team='green' then output ds7;
Else output ds8;
Run;

IF-THEN/ELSE DELETE
Executes a SAS statement for observations that meet specific conditions
Using this we can delete observations based on condition
Data ds9;
Set ds1;
If team='red' then delete;
Run;
Data ds9 ds10;
Set ds1;
If team='red'then delete;
Else output ds10;
Run;

WHERE Statement
Selects observations from SAS data sets that meet a particular condition
Syntax:-

WHERE where-expression-1 <logical-operator where-expression-n>;

Operator Type          Symbol or Mnemonic       Description

Arithmetic             *                        multiplication
                       /                        division
                       +                        addition
                       -                        subtraction
                       **                       exponentiation
Comparison             = or EQ                  equal to
                       ^=, ¬=, ~=, or NE        not equal to
                       > or GT                  greater than
                       < or LT                  less than
                       >= or GE                 greater than or equal to
                       <= or LE                 less than or equal to
                       IN                       equal to one of a list
Logical (Boolean)      & or AND                 logical and
                       | or OR                  logical or
                       ~, ^, ¬, or NOT          logical not
Other                  ||                       concatenation of character variables
                       ()                       indicate order of evaluation
                       + prefix                 positive number
                       - prefix                 negative number
WHERE Expression Only  BETWEEN-AND              an inclusive range
                       ? or CONTAINS            a character string
                       IS NULL or IS MISSING    missing values
                       LIKE                     match patterns
                       =*                       sounds-like
                       SAME-AND                 add clauses to an existing WHERE statement
                                                without retyping the original one
Examples:-
Data ds1;
Input pid drug$ visit_date date9.;
Format visit_date date9.;
Cards;
101 asp-05mg 12jan2005
102 asp-10mg 14jan2005
101 asp-05mg 18jan2005
102 asp-10mg 12jan2005
101 asp-05mg 21jan2005
103 asp-15mg 12jan2005
101 asp-05mg 30jan2005
102 asp-10mg 12jan2005
101 asp-05mg 23jan2005
102 asp-10mg 12jan2005
101 asp-05mg 11jan2005
103 asp-15mg 12jan2005
101 asp-05mg 15jan2005
104 asp-20mg 12jan2005
101 asp-05mg 16jan2005
102 asp-10mg 12jan2005
103 asp-15mg 12jan2005
103 asp-15mg 12jan2005
101 asp-05mg 15jan2005
;
Run;
Data ds2;
Set ds1;
Where pid=101;
Run;

Data ds2a;
Set ds1;
Where pid>=101;
Run;
Data ds2b;
Set ds1;
Where drug='asp-10mg';
Run;
Data ds2c;
Set ds1;
Where visit_date='15jan2005'd;
Run;
Where with Operators
WHERE AND
Data ds3a;
Set ds1;
where visit_date >'12jan2005'd and visit_date <'20jan2005'd ;
Run;
Data ds3b;
Set ds1;
Where pid >101 and pid <104 ;
Run;
WHERE BETWEEN
Data ds4;
Set ds1;
Where visit_date between '15jan2005'd and '21jan2005'd ;
Run;
WHERE IN
Data ds5;
Set ds1;
where pid in (102 103) ;
Run;

WHERE LIKE
The LIKE operator selects observations whose character values match a pattern: % matches any
number of characters (including none) and _ matches exactly one character.
Data ds6;
Input p_id drug_name$ visit_date date9.;
Format visit_date date9.;
Cards;
101 asp-05mg 12jan2005
102 asp-10mg 14jan2005
101 bsp-05mg 18jan2005
102 aap-10mg 12jan2005
101 csp-05mg 21jan2005
103 amp-15mg 12jan2005
101 dsp-05mg 30jan2005
102 dsp-10mg 12jan2005
;
Run;
Data ds6a;
Set ds6;
Where drug_name like 'c%' ;
Run;
Data ds6b;
Set ds6;
Where drug_name like '_a%' ;
Run;
Data ds6c;
Set ds6;
Where drug_name like '_____5%' ;
Run;
Data ds6d;
Set ds6;
Where drug_name like '%g' ;
Run;
Data ds6e;
Set ds6;
Where drug_name like '%m_' ;
Run;
Data ds6f;
Set ds6;
Where drug_name like '%0__' ;
Run;

WHERE CONTAINS(?)
Selects observations where the character variable contains the given string anywhere in its
value. CONTAINS works only on character variables, and the comparison is case sensitive.

Data ds1;
Infile datalines;
Length name $12.;
Input name$ sex$ sal dollar5.;
Format sal dollar6.;
Datalines;
Ramakrishna m $5000
pragna f $3500
Raju m $4500
Mohanprasad m $6000
;
Run;
Data ds2;
Set ds1;
Where name contains 'r';
Run;
Data ds3;
Set ds1;
Where name contains 'R';
Run;
Data ds4;
Set ds1;
Where name ? 'R';
Run;

WHERE NULL/MISSING
Selects only the observations where the variable has a null/missing value.
Data ds1;
Input p_id 3. +1 drug_name$8. +1 visit_date date9.;
Format visit_date date9.;
Cards;
101 asp-05mg 12jan2005
102 asp-10mg 14jan2005
101 bsp-05mg 18jan2005
102          12jan2005
101 csp-05mg 21jan2005
103 amp-15mg 12jan2005
101          30jan2005
102 dsp-10mg 12jan2005
;
Run;
Data ds1a;
Set ds1;
where drug_name is null;
run;
Data ds1b;
Set ds1;
where drug_name is missing;
run;

WHERE SOUNDS-LIKE
Selects observations whose values sound like the given value (SAS uses the Soundex
algorithm), so a value is selected even when its spelling differs, as long as it sounds alike.

Data ds1;
Input p_id p_name$ drug_name$ visit_date date9.;
Format visit_date date9.;
Cards;
101 john asp-05mg 12jan2005
102 smith asp-10mg 14jan2005
101 smit bsp-05mg 18jan2005
102 clark aap-10mg 12jan2005
101 manish csp-05mg 21jan2005
103 clarc amp-15mg 12jan2005
101 ronald dsp-05mg 30jan2005
102 ronold dsp-10mg 12jan2005
;
Run;
Data ds1a;
Set ds1;
where p_name='smith';
Run;
Data ds1b;
Set ds1;
where p_name=*'smith';
Run;

WHERE SAME AND


Data ds1;
Input p_id p_name$ drug_name$ visit_date date9.;
Format visit_date date9.;
Cards;
101 john asp-05mg 12jan2005
102 smith asp-10mg 14jan2005
101 smit bsp-05mg 18jan2005
102 clark aap-10mg 12jan2005
101 manish csp-05mg 21jan2005
103 clarc amp-15mg 12jan2005
101 ronald dsp-05mg 30jan2005
102 ronold dsp-10mg 12jan2005
;
Run;
data ds1a;
set ds1;
where visit_date >'12jan2005'd;
where same and visit_date <'20jan2005'd;
Run;
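WHERE SAME AND adds the second condition to the first one, so it works like a single WHERE
statement that joins both conditions with AND. Without SAME AND, the second WHERE statement
would replace the first one.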
COMBINING DATASETS
Ways to combine datasets
-> Concatenation
-> Interleaving
-> Merge
-> Update
-> Modify

Concatenation:
Combining two or more SAS Datasets into a single SAS Dataset one after other
using SET Statement.
The number of observations in the new SAS dataset is equal to the sum of the number of
observations in the original datasets.
Ex:
DS3(20) = DS1(10) + DS2(10)

The new data set contains all observations from DS1 followed by all observations from DS2


Syntax:-
Set dataset(s);
Examples:-
If the original datasets contain the same variables, the new dataset contains the
same variables as the original datasets.
Data ds1;
Infile datalines;
Input P_id Drug_name$ Visit_date;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 12Jan2011
102 asp-10mg 14Jan2011
101 asp-05mg 18Jan2011
102 asp-10mg 12Jan2011
101 asp-05mg 21Jan2011
103 asp-15mg 12Jan2011
101 asp-05mg 30Jan2011
102 asp-10mg 12Jan2011
101 asp-05mg 23Jan2011
102 asp-10mg 12Jan2011
;
Run;
Data ds2;
Infile datalines;
Input P_id Drug_name$ Visit_date;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 11Jan2011
103 asp-15mg 12Jan2011
101 asp-05mg 15Jan2011
104 asp-20mg 12Jan2011
101 asp-05mg 16Jan2011
102 asp-10mg 12Jan2011
;
Run;
Data ds3;
Set ds1 ds2;
Run;

If the original datasets contain different variables, the observations from one dataset
have missing values for the variables that exist only in the other dataset.
Data ds1;
Infile Datalines;
Input P_id Drug_name$ Visit_date Sex$;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 12Jan2011 m
102 asp-10mg 14Jan2011 f
101 asp-05mg 18Jan2011 f
102 asp-10mg 12Jan2011 f
101 asp-05mg 21Jan2011 m
103 asp-15mg 12Jan2011 m
101 asp-05mg 30Jan2011 f
102 asp-10mg 12Jan2011 m
101 asp-05mg 23Jan2011 f
102 asp-10mg 12Jan2011 f
;
Run;
Data ds2;
Infile Datalines;
Input P_id Drug_name$ Visit_date;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 11Jan2011
103 asp-15mg 12Jan2011
101 asp-05mg 15Jan2011
104 asp-20mg 12Jan2011
101 asp-05mg 16Jan2011
102 asp-10mg 12Jan2011
101 asp-05mg 17Jan2011
103 asp-15mg 12Jan2011
103 asp-15mg 12Jan2011
101 asp-05mg 15Jan2011
;
Run;
Data ds3;
Set ds1 ds2;
Run;
Data ds1;
Infile Datalines;
Input P_id Drug_name$ Visit_date Sex$;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 12Jan2011 m
102 asp-10mg 14Jan2011 f
101 asp-05mg 18Jan2011 f
102 asp-10mg 12Jan2011 f
101 asp-05mg 21Jan2011 m
103 asp-15mg 12Jan2011 m
101 asp-05mg 30Jan2011 f
102 asp-10mg 12Jan2011 m
101 asp-05mg 23Jan2011 f
102 asp-10mg 12Jan2011 f
;
Run;
Data ds2;
Infile Datalines;
Input P_id Drug_name$ Visit_date Age;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 11Jan2011 34
103 asp-15mg 12Jan2011 32
101 asp-05mg 15Jan2011 23
104 asp-20mg 12Jan2011 28
101 asp-05mg 16Jan2011 21
102 asp-10mg 12Jan2011 30
101 asp-05mg 17Jan2011 28
103 asp-15mg 12Jan2011 23
103 asp-15mg 12Jan2011 32
101 asp-05mg 15Jan2011 25
;
Run;
Data ds3;
Set ds1 ds2;
Run;
If the original datasets define a variable with different data types, the concatenation
fails with this error:
ERROR: Variable p_id has been defined as both character and numeric.
Use the Input function to convert P_id from character to numeric before
performing the concatenation.
Data ds1;
Infile Datalines;
Input P_id$ Drug_name$ Visit_date Sex$;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 12Jan2011 m
102 asp-10mg 14Jan2011 f
101 asp-05mg 18Jan2011 f
102 asp-10mg 12Jan2011 f
101 asp-05mg 21Jan2011 m
103 asp-15mg 12Jan2011 m
101 asp-05mg 30Jan2011 f
102 asp-10mg 12Jan2011 m
101 asp-05mg 23Jan2011 f
102 asp-10mg 12Jan2011 f
;
Run;
Data ds2;
Infile Datalines;
Input P_id Drug_name$ Visit_date Age;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 11Jan2011 34
103 asp-15mg 12Jan2011 32
101 asp-05mg 15Jan2011 23
104 asp-20mg 12Jan2011 28
101 asp-05mg 16Jan2011 21
102 asp-10mg 12Jan2011 30
101 asp-05mg 17Jan2011 28
103 asp-15mg 12Jan2011 23
103 asp-15mg 12Jan2011 32
101 asp-05mg 15Jan2011 25
;
Run;
Data ds3;
Set ds1 ds2;
Run;
Data ds1;
Infile Datalines;
Input P_id$ Drug_name$ Visit_date Sex$;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 12Jan2011 m
102 asp-10mg 14Jan2011 f
101 asp-05mg 18Jan2011 f
102 asp-10mg 12Jan2011 f
101 asp-05mg 21Jan2011 m
103 asp-15mg 12Jan2011 m
101 asp-05mg 30Jan2011 f
102 asp-10mg 12Jan2011 m
101 asp-05mg 23Jan2011 f
102 asp-10mg 12Jan2011 f
;
Run;
Converting P_id from character to numeric
Data ds1a (drop=p_id rename=(p_id1=p_id));
Set ds1;
p_id1=Input (p_id, 3.);
Run;

Data ds2;
Infile Datalines;
Input P_id Drug_name$ Visit_date Age;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 11Jan2011 34
103 asp-15mg 12Jan2011 32
101 asp-05mg 15Jan2011 23
104 asp-20mg 12Jan2011 28
101 asp-05mg 16Jan2011 21
102 asp-10mg 12Jan2011 30
101 asp-05mg 17Jan2011 28
103 asp-15mg 12Jan2011 23
103 asp-15mg 12Jan2011 32
101 asp-05mg 15Jan2011 25
;
Run;
Data ds3;
Set ds1a ds2;
Run;

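If the original datasets define a variable with different lengths, the concatenation still runs.
The variable in the new dataset gets the length it has in the first dataset listed in the
Set statement, so values from a later dataset with a longer length can be truncated.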
Data ds1;
Infile Datalines;
Input P_id Drug_name:$10. Visit_date Sex$;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 12Jan2011 m
102 asp-10mg 14Jan2011 f
101 asp-05mg 18Jan2011 f
102 asp-10mg 12Jan2011 f
101 asp-05mg 21Jan2011 m
103 asp-15mg 12Jan2011 m
101 asp-05mg 30Jan2011 f
102 asp-10mg 12Jan2011 m
101 asp-05mg 23Jan2011 f
102 asp-10mg 12Jan2011 f
;
Run;
Data ds2;
Infile Datalines;
Input P_id Drug_name$8. Visit_date Age;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 11Jan2011 34
103 asp-15mg 12Jan2011 32
101 asp-05mg 15Jan2011 23
104 asp-20mg 12Jan2011 28
101 asp-05mg 16Jan2011 21
102 asp-10mg 12Jan2011 30
101 asp-05mg 17Jan2011 28
103 asp-15mg 12Jan2011 23
103 asp-15mg 12Jan2011 32
101 asp-05mg 15Jan2011 25
;
Run;
Data ds3;
Set ds1 ds2;
Run;
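When the first dataset has the shorter length, one common way to avoid truncation is a Length statement placed before the Set statement. A minimal sketch, assuming two hypothetical datasets short_names ($3) and long_names ($8):

```sas
Data short_names;
   Input Drug_name $3.;
   Datalines;
asp
;
Run;

Data long_names;
   Input Drug_name $8.;
   Datalines;
asp-05mg
;
Run;

/* Without the Length statement Drug_name would get length 3 from short_names */
/* and 'asp-05mg' would be truncated to 'asp'.                                */
Data combined;
   Length Drug_name $8;   /* must come before Set to take effect */
   Set short_names long_names;
Run;
```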

Concatenation with options


Firstobs and Obs
Data ds1;
Infile Datalines;
Input P_id Drug_name$ Visit_date Sex$;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 12Jan2011 m
102 asp-10mg 14Jan2011 f
101 asp-05mg 18Jan2011 f
102 asp-10mg 12Jan2011 f
101 asp-05mg 21Jan2011 m
103 asp-15mg 12Jan2011 m
101 asp-05mg 30Jan2011 f
102 asp-10mg 12Jan2011 m
101 asp-05mg 23Jan2011 f
102 asp-10mg 12Jan2011 f
;
Run;
Data ds2;
Infile Datalines;
Input P_id Drug_name$ Visit_date Age;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 11Jan2011 34
103 asp-15mg 12Jan2011 32
101 asp-05mg 15Jan2011 23
104 asp-20mg 12Jan2011 28
101 asp-05mg 16Jan2011 21
102 asp-10mg 12Jan2011 30
101 asp-05mg 17Jan2011 28
103 asp-15mg 12Jan2011 23
103 asp-15mg 12Jan2011 32
101 asp-05mg 15Jan2011 25
;
Run;

Concatenation with dataset options

Data ds3;
Set ds1(firstobs=4) ds2;
Run;
Data ds3;
Set ds1(firstobs=4) ds2(obs=7);
Run;
Data ds3;
Set ds1 ds2(firstobs=4 obs=8);
Run;

Keep, Drop, Rename


Data ds3;
Set ds1(keep=drug_name visit_date) ds2;
Run;
Data ds3;
Set ds1(keep=drug_name visit_date) ds2(rename=(p_id=patient_id));
Run;
Data ds3;
Set ds1(drop=drug_name visit_date) ds2(rename=(p_id=patient_id));
Run;

Point=Slice
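The POINT= option reads observations directly by observation number, so we can use it to
select particular observations from a dataset. Because SAS never reaches the end of the file
when it reads with POINT=, a STOP statement is needed to end the DATA step; without it the step
would loop forever. An explicit OUTPUT is needed because STOP ends the step before the
implicit output.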
Data ds1;
Infile Datalines;
Input P_id Drug_name$ Visit_date ;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 12Jan2011
102 asp-10mg 14Jan2011
103 asp-10mg 12Jan2011
104 asp-05mg 21Jan2011
105 asp-15mg 12Jan2011
106 asp-10mg 12Jan2011
107 asp-15mg 12Jan2011
;
Run;
Data ds2;
Slice=2;
Set ds1 point=slice;
Output;
Stop;
Run;
Data ds3;
Do slice=2,4,5;
Set ds1 point=slice;
Output;
End;
Stop;
Run;
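The same POINT= pattern can be combined with the NOBS= option of the SET statement, which holds the total number of observations, to read every second observation without hard-coding the count. This is a minimal sketch; the datasets patients and every_second and the variable total_obs are hypothetical.

```sas
Data patients;
   Input P_id;
   Datalines;
101
102
103
104
105
106
107
;
Run;

Data every_second;
   Do slice=1 to total_obs by 2;              /* observation numbers 1, 3, 5, 7 */
      Set patients point=slice nobs=total_obs; /* total_obs is set at compile time */
      Output;
   End;
   Stop;                                      /* POINT= never hits end of file */
Run;
```

every_second should contain P_id 101, 103, 105 and 107; slice and total_obs are temporary and are not written to the dataset.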
Concatenation with multiple SET statements (one to one reading)
Combines observations from two or more SAS datasets into one observation
using two or more SET statements. The new Dataset contains all the variables
from all Input Datasets.

Syntax:-
Set dataset1;
Set dataset2;
Set datasetN;

Both datasets contain the same variables and the same number of observations

The values from the second Set statement overwrite the values read by the first Set
statement, so the new dataset contains the values of the second dataset. The DATA step stops
as soon as either Set statement runs out of observations.
Data ds1;
Infile Datalines;
Input a b c;
Datalines;
1 2 3
4 5 6
;
Run;
Data ds2;
Infile Datalines;
Input a b c;
Datalines;
3 4 5
6 7 8
;
Run;
Data ds3;
Set ds1;
Set ds2;
Run;
In this example the second dataset variables (a b c) are the same as the first dataset
variables, so the second dataset values overwrite the first dataset values.
Data ds1;
Infile Datalines;
Input P_id Drug_name$ Visit_date ;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 12Jan2011
102 asp-10mg 14Jan2011
102 asp-10mg 12Jan2011
101 asp-05mg 21Jan2011
103 asp-15mg 12Jan2011
103 asp-15mg 12Jan2011
;
Run;
Data ds2;
Infile Datalines;
Input P_id Drug_name$ Visit_date;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 11Jan2011
103 asp-15mg 12Jan2011
101 asp-05mg 15Jan2011
104 asp-20mg 12Jan2011
101 asp-05mg 16Jan2011
102 asp-10mg 12Jan2011
;
Run;
Data ds3a;
Set ds1;
Set ds2;
Run;

Data ds3b;
Set ds2;
Set ds1;
Run;

Both datasets contain different variables and the same number of observations

Variables that are not common to both datasets also come into the output dataset, and the
values of the common variables are overwritten by the second dataset.

Data ds1;
Infile Datalines;
Input a b c d;
Datalines;
1 2 3 4
4 5 6 7
;
Run;
Data ds2;
Infile Datalines;
Input a b c;
Datalines;
3 4 5
6 7 8
;
Run;
Data ds3;
Set ds1;
Set ds2;
Run;
In this example the second dataset variables (a b c) are the same as in the first dataset,
so a, b and c come from the second dataset and d comes from the first dataset. Common
variables are overwritten by the second dataset, and extra variables come from whichever
dataset contains them.
Data ds1;
Infile Datalines;
Input P_id Drug_name$ Visit_date Sex$ ;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 12Jan2011 m
102 asp-10mg 14Jan2011 m
102 asp-10mg 12Jan2011 f
101 asp-05mg 21Jan2011 m
103 asp-15mg 12Jan2011 f
103 asp-15mg 12Jan2011 m
;
Run;
Data ds2;
Infile Datalines;
Input P_id Drug_name$ Visit_date ;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 11Jan2011
103 asp-15mg 12Jan2011
101 asp-05mg 15Jan2011
104 asp-20mg 12Jan2011
101 asp-05mg 16Jan2011
102 asp-10mg 12Jan2011
;
Run;
Data ds3a;
Set ds1;
Set ds2;
Run;
Data ds3b;
Set ds2;
Set ds1;
Run;
Both datasets contain the same variables and a different number of observations
Because both datasets contain the same variables, the second dataset values overwrite the
first dataset values. Because the number of observations is different, the DATA step stops
when the smaller dataset runs out, so the output contains only as many observations as the
smaller dataset.
Data ds1;
Infile Datalines;
Input a b c ;
Datalines;
1 2 3
4 5 6
7 8 9
;
Run;

Data ds2;
Infile Datalines;
Input a b c;
Datalines;
3 4 5
6 7 8
;
Run;

Data ds3;
Set ds1;
Set ds2;
Run;
In this example the second dataset (ds2) has the same variables (a b c) as the first dataset
(ds1), so the values of a, b and c come from ds2. ds1 has 3 observations but ds2 has only 2,
so the output contains only 2 observations, both from ds2.

Data ds3;
Set ds2;
Set ds1;
Run;
In this example the second dataset (ds1) has the same variables (a b c) as ds2, so the values
come from ds1. ds2 has 2 observations and ds1 has 3, so the output again contains only 2
observations (the first two observations of ds1).

Data ds1;
Infile Datalines;
Input P_id Drug_name$ Visit_date ;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 12Jan2011
102 asp-10mg 14Jan2011
102 asp-10mg 12Jan2011
101 asp-05mg 21Jan2011
103 asp-15mg 12Jan2011
102 asp-10mg 12Jan2011
103 asp-15mg 12Jan2011
104 asp-20mg 15Jan2011
;
Run;
Data ds2;
Infile Datalines;
Input P_id Drug_name$ Visit_date ;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 11Jan2011
103 asp-15mg 12Jan2011
101 asp-05mg 15Jan2011
104 asp-20mg 12Jan2011
101 asp-05mg 16Jan2011
102 asp-10mg 12Jan2011
;
Run;
Data ds3a;
Set ds1;
Set ds2;
Run;
Data ds3b;
Set ds2;
Set ds1;
Run;
Both datasets contain different variables and a different number of observations
The common variables take their values from the dataset in the second Set statement, and
the variables that are not common come from whichever dataset contains them. In either
order, the DATA step stops when the smaller dataset runs out, so the output contains only as
many observations as the smaller dataset.
Data ds1;
Infile Datalines;
Input a b c ;
Datalines;
1 2 3
4 5 6
7 8 9
;
Run;
Data ds2;
Infile Datalines;
Input a b c d;
Datalines;
3 4 5 6
6 7 8 9
;
Run;
Data ds3;
Set ds1;
Set ds2;
Run;
In this example all the data comes from the second dataset (ds2), and the output contains
2 observations.

Data ds3;
Set ds2;
Set ds1;
Run;
In this example a, b and c come from the second dataset (ds1) and d comes from the first
dataset (ds2). The output contains 2 observations.

Data ds1;
Infile Datalines;
Input P_id Drug_name$ Visit_date Sex$ ;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 12Jan2011 m
102 asp-10mg 14Jan2011 m
102 asp-10mg 12Jan2011 f
101 asp-05mg 21Jan2011 m
103 asp-15mg 12Jan2011 f
102 asp-10mg 12Jan2011 f
103 asp-15mg 12Jan2011 m
104 asp-20mg 15Jan2011 m
;
Run;
Data ds2;
Infile Datalines;
Input P_id Drug_name$ Visit_date ;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 11Jan2011
103 asp-15mg 12Jan2011
101 asp-05mg 15Jan2011
104 asp-20mg 12Jan2011
101 asp-05mg 16Jan2011
102 asp-10mg 12Jan2011
;
Run;
Data ds3a;
Set ds1;
Set ds2;
Run;
Data ds3b;
Set ds2;
Set ds1;
Run;

Interleaving:

Use SET statement and BY statement to combine multiple Datasets into single Dataset.

The number of observations in new Dataset is equal to the sum of the number of
observations from original Datasets.

The observations in the new dataset are arranged by the values of the BY variables.
We can interleave datasets using a BY variable or using an index.

Note: To interleave, the BY variables should have the same name, data type and length in all
datasets, and every dataset must be sorted (or indexed) by the BY variables.

Syntax:-
Set Dataset(s);
By variable(s);
Examples:-
Data ds1;
Infile Datalines;
Input P_id Drug_name$ Visit_date ;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 02Jan2011
102 asp-10mg 02Jan2011
101 asp-05mg 10Jan2011
102 asp-10mg 10Jan2011
101 asp-05mg 21Jan2011
103 asp-15mg 02Jan2011
;
Run;
Data ds2;
Infile Datalines;
Input P_id Drug_name$ Visit_date ;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 30Jan2011
103 asp-15mg 10Jan2011
102 asp-10mg 20Jan2011
104 asp-05mg 02Jan2011
;
Run;
Data ds3;
Set ds1 ds2;
By p_id;
Run;
ERROR: BY variables are not properly sorted on Data set WORK.DS1.
Both datasets must be sorted by the BY variable before they can be interleaved:

Proc sort Data=ds1;
By p_id;
Run;

Proc sort Data=ds2;
By p_id;
Run;

Data ds3;
Set ds1 ds2;
By p_id;
Run;

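When we use a BY statement along with a SET statement (interleaving) in a DATA step, SAS
creates two temporary automatic variables for each BY variable, in addition to the raw data
variables: First.byVariable and Last.byVariable. They can be used during the DATA step but
are not written to the output dataset.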

First.Variable:-
Value is 1 for the first observation in the by group and value 0 for all other
observations in the by group
Last.Variable:-
Value is 1 for the last observation in the by group and value 0 for all other
observations in the by group

P_id Drug_name Visit_date First.P_id Last.P_id
101 asp-05mg 02JAN2011 1 0
101 asp-05mg 10JAN2011 0 0
101 asp-05mg 21JAN2011 0 0
101 asp-05mg 30JAN2011 0 1
102 asp-10mg 02JAN2011 1 0
102 asp-10mg 10JAN2011 0 0
102 asp-10mg 20JAN2011 0 1
103 asp-15mg 02JAN2011 1 0
103 asp-15mg 10JAN2011 0 1
104 asp-05mg 02JAN2011 1 1
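Because First. and Last. variables are not written to the output dataset, one way to produce a listing like the table above is to copy them into ordinary variables. A minimal sketch, assuming a small hypothetical dataset visits already sorted by P_id; First_pid and Last_pid are hypothetical variable names.

```sas
Data visits;
   Input P_id Visit_date :date9.;
   Format Visit_date date9.;
   Datalines;
101 02Jan2011
101 10Jan2011
102 02Jan2011
104 02Jan2011
;
Run;

Data flags;
   Set visits;
   By P_id;                /* visits must be sorted by P_id */
   First_pid=First.P_id;   /* copy the temporary flags into real variables */
   Last_pid=Last.P_id;
Run;
```

flags should show (1,0) and (0,1) for the two 101 rows, and (1,1) for 102 and for 104.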

Using First.Variable and Last.Variable we can, for example:

To report each subject's first visit information


Data ds3;
Set ds1 ds2;
By p_id;
If first.p_id=1;
Run;

To report each subject's last visit information


Data ds3;
Set ds1 ds2;
By p_id;
If last.p_id=1;
Run;
To report each subject's visit information except the first
and last visits
Data ds3;
Set ds1 ds2;
By p_id;
If first.p_id=0 and last.p_id=0;
Run;

To report the subjects who visited only once


Data ds3;
Set ds1 ds2;
By p_id;
If first.p_id=1 and last.p_id=1;
Run;
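First. and Last. can also be combined with a Sum statement to count visits per subject. This is a minimal sketch of that common pattern, assuming a hypothetical dataset visits sorted by P_id; the output dataset visit_counts and the variable Num_visits are hypothetical names.

```sas
Data visits;
   Input P_id Visit_date :date9.;
   Format Visit_date date9.;
   Datalines;
101 02Jan2011
101 10Jan2011
101 21Jan2011
102 02Jan2011
103 02Jan2011
103 10Jan2011
;
Run;

Data visit_counts (keep=P_id Num_visits);
   Set visits;
   By P_id;
   If First.P_id then Num_visits=0;   /* reset at the start of each subject */
   Num_visits+1;                      /* Sum statement: retained across rows */
   If Last.P_id then Output;          /* one row per subject */
Run;
```

visit_counts should hold one row per subject: 101 with 3 visits, 102 with 1 and 103 with 2.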

Banking Example

Data ds1;
Infile Datalines;
Input Loan_no$ 1-5 Customer $7-15 Loan_amt Loan_date ;
Informat Loan_date date9. Loan_amt dollar5. ;
Format Loan_date date9. Loan_amt dollar5. ;
Datalines;
LP101 RaviSinha 3000 02Jan2011
LP102 AlanNance 2500 02Jan2011
LP101 RaviSinha 5000 10Jan2011
LP102 AlanNance 1500 10Jan2011
LP101 RaviSinha 4500 20Jan2011
LP103 JimBrown 4500 02Jan2011
;
Run;
Proc sort data=ds1;
By Loan_no;
Run;
Data ds2;
Infile Datalines;
Input Loan_no$ 1-5 Customer $7-15 Loan_amt Loan_date ;
Informat Loan_date date9. Loan_amt dollar5. ;
Format Loan_date date9. Loan_amt dollar5. ;
Datalines;
LP101 RaviSinha $3000 30Jan2011
LP103 JimBrown $2500 10Jan2011
LP102 AlanNance $5000 20Jan2011
LP104 AshleyMcK $1500 01Jan2011
;
Run;
Proc sort data=ds2;
By Loan_no;
Run;

Data ds3;
Set ds1 ds2;
By Loan_no;
Run;

To report each loan's first record


Data ds3a;
Set ds1 ds2;
By Loan_no;
If First.Loan_no=1;
Run;
To report each loan's last record
Data ds3b;
Set ds1 ds2;
By Loan_no;
If Last.Loan_no=1;
Run;
To report each loan's records except the first
and last records
Data ds3c;
Set ds1 ds2;
By Loan_no;
If First.Loan_no=0 and Last.Loan_no=0;
Run;
To report the loans that have only one record
Data ds3d;
Set ds1 ds2;
By Loan_no;
If First.Loan_no=1 and Last.Loan_no=1;
Run;

MERGE

Joins/combines observations from two or more SAS datasets into a single
observation in a new dataset.
-> Merge usually joins datasets with different variables.
-> The output dataset contains all the variables from all datasets.
-> Up to 100 datasets can be merged in one step.
-> Without a BY statement, observations are joined one to one.
-> With a BY statement, observations are match-merged.
We can merge data in two ways
1) Merge in DATASTEP
2) Merge with SQL

1) Merge in DATASTEP

Merge without By statement


Combines observations from two or more SAS Datasets into a single
observation in a new dataset using the MERGE statement.
Combines first observations from all Datasets into the first observation in new Dataset.
The second observations from all Datasets into the second observation in new Dataset
etc...
The number of observations in the new Dataset is equal to the maximum number of
observations from original Datasets.
Syntax:-
Merge Dataset(s);
Examples:-
Same number of observations in both datasets
Data ds1;
Infile Datalines;
Input id name$ sex$ address$;
Datalines;
001 abc m bang
002 def m hyd
003 jkl f che
004 mno f bang
005 xyz m mum
;
Run;
Data ds2;
Infile Datalines;
Input dob date9. doj:date9. sal ;
Format dob date9. doj date9.;
Datalines;
01Feb1983 12Jan2011 45000
23Mar1983 20Jan2011 50000
12Oct1983 13Feb2011 34000
02Jan1984 19May2011 28000
28Apr1985 11Jun2011 29000
;
Run;

Data ds3;
Merge ds1 ds2;
Run;

Different number of observations in the datasets

Data ds1;
Infile Datalines;
Input id name$ sex$ address$;
Datalines;
001 abc m bang
002 def m hyd
003 jkl f che
004 mno f bang
005 xyz m mum
006 asd f hyd
;
Run;

Data ds2;
Infile Datalines;
Input dob date9. doj:date9. sal;
Format dob date9. doj date9.;
Datalines;
01Feb1983 12Jan2011 45000
23Mar1983 20Jan2011 50000
12Oct1983 13Feb2011 34000
02Jan1984 19May2011 28000
28Apr1985 11Jun2011 29000
;
Run;
Data ds3;
Merge ds1 ds2;
Run;

Match Merge (With By Statement):


Combines observations from two or more SAS Datasets into a single
observation in a Dataset according to values of the common variables from both
Datasets.
The number of observations in the new dataset is the sum, over all BY groups, of the largest
number of observations that any original dataset has in that BY group.
Before performing a match merge, all datasets must be sorted by the common variables.
To perform a match merge, we use the MERGE statement with a BY statement.
In the SAS match merge, the matching process is controlled by the BY variables. BY
variables are the variables listed in the BY statement.
BY variables should be key variables. Key variables are either character or numeric
variables that uniquely identify or label the records or observations within the input
data sets.

Syntax:-
Merge Dataset(s);
By variable(s);
Examples:-
Data ds1;
Infile Datalines;
Input id name$ sex$ address$;
Datalines;
001 abc m bang
002 def m hyd
003 jkl f che
004 mno f bang
005 xyz m mum
006 asd f hyd
;
Run;
Proc sort Data=ds1;
By id;
Run;
Data ds2;
Infile Datalines;
Input id dob date9. doj:date9. sal ;
Format dob date9. doj date9.;
Datalines;
001 01Feb1983 12Jan2011 45000
004 23Mar1983 20Jan2011 50000
005 12Oct1983 13Feb2011 34000
002 02Jan1984 19May2011 28000
003 28Apr1985 11Jun2011 29000
;
Run;
Proc sort Data=ds2;
By id;
Run;
Data ds3;
Merge ds1 ds2;
By id;
Run;
In the above example, ds1 and ds2 both contain the common variable id, so we sort both
datasets by id and then perform the match merge.
Data demo;
Infile datalines;
Input name$ 1-25 age 27-28 sex$30;
Datalines;
Vincent, Martina          34 F
Phillipon, Marie-Odile    28 F
Gunter, Thomas            27 M
Harbinger, Nicholas       36 M
Benito, Gisela            32 F
Rudelich, Herbert         39 M
Sirignano, Emily          12 F
Morrison, Michael         32 M
;
Run;
Proc sort data=demo;
By name;
Run;
Data finance;
Infile datalines;
Input ssn$ 1-11 name$ 13-40 salary;
Datalines;
074-53-9892 Vincent, Martina             35000
776-84-5391 Phillipon, Marie-Odile       29750
929-75-0218 Gunter, Thomas               27500
446-93-2122 Harbinger, Nicholas          33900
228-88-9649 Benito, Gisela               28000
029-46-9261 Rudelich, Herbert            35000
442-21-8075 Sirignano, Emily             5000
;
Run;
Proc sort data=finance;
By name;
Run;
Data new;
Merge demo (drop=age) finance;
By name;
Run;
Techniques, Tricks, and Traps in Match Merge
A common mistake in a match-merge is forgetting to include the BY statement.

Data ds1;
Infile datalines;
Input id$ name$;
Datalines;
A01 SUE
A02 TOM
A05 KAY
A10 JIM
;
Run;

Data ds2;
Infile datalines;
Input id$ age sex$;
Datalines;
A01 58 F
A02 20 M
A04 47 F
A10 11 M
;
Run;
Data ds3;
Merge ds1 ds2;
Run;

Even with this simple example, there is already a sign of trouble. The record A05 (the
3rd record in ds1) does not exist in ds2, so it is paired with A04 (the 3rd record in ds2).
The A05 ID is lost in this merge and the name KAY moves from ID=A05 to ID=A04, yet SAS
writes no note or error to the log to say that something is wrong. A merge like this
silently gives wrong output, so before a match-merge both datasets should be sorted by
the key and the BY statement must be used with the MERGE statement, as below:

Data ds1;
Infile datalines;
Input id$ name$;
Datalines;
A01 SUE
A02 TOM
A05 KAY
A10 JIM
;
Run;
Proc sort data=ds1;
By id;
Run;

Data ds2;
Infile datalines;
Input id$ age sex$;
Datalines;
A01 58 F
A02 20 M
A04 47 F
A10 11 M
;
Run;
Proc sort data=ds2;
By id;
Run;
Data ds3;
Merge ds1 ds2;
By id;
Run;
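Because SAS writes no message when IDs fail to match, a small check can be added to the merge. The sketch below assumes ds1 and ds2 are the sorted datasets from the corrected example; the IN= variable names in1 and in2 are illustrative choices. Each id found in only one dataset is written to the log (for this data, A04 and A05).

```sas
/* Sketch: ds1 and ds2 are the datasets sorted by id above */
Data ds3;
   Merge ds1(in=in1) ds2(in=in2);
   By id;
   /* in1 / in2 are 1 when the current id was read from ds1 / ds2 */
   If not (in1 and in2) then put 'Unmatched: ' id= in1= in2=;
Run;
```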

BY variables with different lengths


Data ds1;
Infile datalines;
Input id $1-3 x 6-7;
Datalines;
A22  12
A38  88
A51  33
;
Run;
Data ds2;
Infile datalines;
Input ID $1-4 y 6-7;
Datalines;
A22  72
A38  31
A41  11
A511 58
;
Run;
Data ds3;
Merge ds1 ds2;
By id;
Run;
In the above example, the variable ID has LENGTH=3 in the first dataset and LENGTH=4 in
the second dataset. At compile time, the attributes of each variable in the program data
vector (PDV) are taken from the first input dataset in which the variable appears. After
the first dataset in the MERGE statement is scanned, the PDV is (ID $3, X). When the
second dataset is scanned, ID is already defined, so only the new variable Y is added and
the final PDV is (ID $3, X, Y). Because ID has LENGTH=3 in the PDV, the value ID=A511 from
the second dataset is clipped to A51 and matched with the record A51 from the first
dataset. This is an example of how BY variables with different lengths can give undesired
results.
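One common remedy, sketched here with the ds1 and ds2 from the example above, is to place a LENGTH statement before MERGE so that id is already defined as $4 in the program data vector before either dataset is read, and A511 should then no longer be clipped. The dataset name ds3b is illustrative, and SAS may still write a warning about multiple lengths for the BY variable.

```sas
/* Sketch: give id the longest length before MERGE reads any dataset */
Data ds3b;
   Length id $4;
   Merge ds1 ds2;
   By id;
Run;
```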

Data ds3a;
Merge ds2 ds1;
By id;
Run;

The above example shows how reversing the order of the datasets in the MERGE statement
can change the values and records in the output dataset. Here the merge works correctly:
ds2 is scanned first, so id has length 4 in the PDV, A511 keeps all four characters, and
it is no longer matched with A51.

Types of Match Merge

1) zero-to-one
2) one-to-zero
3) one-to-one
4) one-to-many
5) many-to-one
6) few-to-many
7) many-to-few
8) many-to-many

Zero-to-One/ One-to-Zero/ One-to-One Match Merge


Data ds1;
Infile datalines;
Input id$ name$;
Datalines;
A01 SUE
A02 TOM
A05 KAY
A10 JIM
;
Run;
Proc sort data=ds1;
By id;
Run;
Data ds2;
Infile datalines;
Input id$ age sex$;
Datalines;
A01 58 F
A02 20 M
A04 47 F
A10 11 M
;
Run;
Proc sort data=ds2;
By id;
Run;
Data ds3;
Merge ds1 ds2;
By id;
Run;

The above example performs one-to-one, one-to-zero, and zero-to-one match-merges:

A01 in ds1 merges with A01 in ds2.
A02 in ds1 merges with A02 in ds2.
A10 in ds1 merges with A10 in ds2; these are One-to-One merges.
A05 exists in ds1 but not in ds2; that is a One-to-Zero merge.
A04 exists in ds2 but not in ds1; that is a Zero-to-One merge.
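The same classification can be recorded in the data itself with IN= dataset options. This is a sketch assuming ds1 and ds2 are sorted by id as above; in1, in2 and match_type are illustrative names. For this data it should label A01, A02 and A10 as One-to-One, A05 as One-to-Zero and A04 as Zero-to-One.

```sas
/* Sketch: label each BY value by the datasets that contributed to it */
Data ds3_types;
   Merge ds1(in=in1) ds2(in=in2);
   By id;
   Length match_type $11;
   If in1 and in2 then match_type='One-to-One';   /* id in both datasets */
   Else if in1 then match_type='One-to-Zero';     /* id only in ds1 */
   Else match_type='Zero-to-One';                 /* id only in ds2 */
Run;
```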

Many-to-Many Match Merge


The many-to-many type of match-merge occurs when, for a given BY group, every input
dataset has more than one record and all of them have the same number of records.
Data ds1;
Infile datalines;
Input id$ x;
Datalines;
A25 24
A25 22
A25 76
;
Run;
Data ds2;
Infile datalines;
Input id$ y;
Datalines;
A25 24
A25 22
A25 76
;
Run;
Data ds3;
Merge ds1 ds2;
By id;
Run;

The many-to-many match-merge is essentially a one-to-one merge (a merge without BY)
inside each BY group, and it has the same drawbacks and dangers. Specifically, one has
very little control over the actual order of the records within the BY group in each
input dataset.
For example, how does one know that the first value x=24 is supposed to be matched with
the first value y=24? Why shouldn't x=24 be matched with y=22 (the second value of y)?
Unless great care is taken, a many-to-many merge can result in arbitrary matching of
variable values, and SAS only writes the note "MERGE statement has more than one data
set with repeats of BY values." to the log.
Because a many-to-many merge is dangerous and often unreliable, the programmer has to
choose additional BY variables so that the records are matched properly.
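Before a match-merge it helps to know whether any BY values repeat. A sketch using FIRST. and LAST. variables, assuming ds1 is the many-to-many example dataset above (ds1_dups is an illustrative name); every id kept in ds1_dups occurs more than once, so here all three A25 records are kept.

```sas
/* Sketch: keep only the records whose id is not unique */
Proc sort data=ds1;
   By id;
Run;
Data ds1_dups;
   Set ds1;
   By id;
   /* a unique id is both the first and the last record of its BY group */
   If not (first.id and last.id);
Run;
```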

Few-to-Many or Many-to-Few Match Merge


The few-to-many type of match-merge occurs when, for a given BY group, there is more
than one record in the first input dataset and the second input dataset has even more
records than the first.
Data ds1;
Infile datalines;
Input id$ x;
Datalines;
A92 70
A92 46
;
Run;
Data ds2;
Infile datalines;
Input id$ Y;
Datalines;
A92 14
A92 72
A92 7
;
Run;
Data ds3a;
Merge ds1 ds2;
By id;
Run;
Data ds3b;
Merge ds2 ds1;
By id;
Run;
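A few-to-many or many-to-few merge is as dangerous as a many-to-many merge. In ds3a, for
example, the two x values are paired with the first two y values by position, and the
last x value (46) is retained and repeated for the third y value (7).
To avoid these problems, the programmer has to choose the correct BY variables.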
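When every pairing within a BY group really is wanted, rather than position-by-position matching, PROC SQL can produce it. This sketch assumes the ds1 and ds2 from the few-to-many example; for id A92 it should return 2 x 3 = 6 rows. The table name ds3_all is illustrative.

```sas
/* Sketch: pair every ds1 record with every ds2 record that has the same id */
Proc sql;
   Create table ds3_all as
   Select a.id, a.x, b.y
   From ds1 a, ds2 b
   Where a.id=b.id;
Quit;
```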
One-to-Many/Many-to-One Match Merge
The simplest and most useful merge after the one-to-one match-merge is the
one-to-many match-merge.
Data ds1;
Infile datalines;
Input id$ x;
Datalines;
A32 5
A35 3
;
Run;
Data ds2;
Infile datalines;
Input id$ Y;
Datalines;
A32 15
A32 22
A32 61
;
Run;
Data ds3a;
Merge ds1 ds2;
By id;
Run;
Data ds3b;
Merge ds2 ds1;
By id;
Run;

The above example has two BY groups. For A32, the first output record is the same as in a
one-to-one match-merge. For the second and third A32 records in ds2 there is no
corresponding ds1 record, so SAS retains the x value (5) from the single ds1 record and
passes it to those output records. For A35, the ds1 record has no match in ds2, so y is
missing.

JOINS :

1. LEFT JOIN
2. RIGHT JOIN
3. INNER JOIN
4. FULL JOIN.

Left Join in SAS

A left outer join keeps all the data from the left table and only the matching data from
the right table; right-table variables are missing where there is no match.
Data patdata;
Infile datalines;
Input p_id trt_code$;
Datalines;
101 A
102 A
103 B
104 B
;
Run;
Proc sort data=patdata;
By p_id;
Run;
Data adverse;
Infile datalines;
Input p_id event :$12.; /* :$12. stops 'headaches' being cut to the default 8 characters */
Datalines;
101 headaches
107 fever
103 fracture
109 nausea
;
Run;
Proc sort data=adverse;
By p_id;
Run;
/*Thru Data step*/
Data pat_adverse;
Merge patdata(in=a) adverse(in=b);
By p_id;
/*If a;*/
If a then output;
Run;
/*Thru SQL*/
Proc sql;
Create table pat_adverse as
/* b.event only: b.* would repeat p_id and trigger a warning */
Select a.*, b.event from patdata a left outer join adverse b on a.p_id=b.p_id;
Quit;

Right Join in SAS

A right outer join keeps all the data from the right table and only the matching data
from the left table; left-table variables are missing where there is no match.

Data patdata;
Infile datalines;
Input p_id trt_code$;
Datalines;
101 A
102 A
103 B
104 B
;
Run;
Proc sort data=patdata;
By p_id;
Run;
Data adverse;
Infile datalines;
Input p_id event :$12.;
Datalines;
101 headaches
107 fever
103 fracture
109 nausea
;
Run;
Proc sort data=adverse;
By p_id;
Run;
/*Thru Data step*/
Data pat_adverse;
Merge patdata(in=a) adverse(in=b);
By p_id;
/*If b;*/
If b then output;
Run;
/*Thru SQL*/
Proc sql;
Create table pat_adverse as
/* p_id comes from b, the table whose rows are all kept */
Select b.p_id, a.trt_code, b.event from patdata a right outer join adverse b on a.p_id=b.p_id;
Quit;

Inner Join in SAS

An inner join keeps only the records whose key values appear in both datasets.

Data patdata;
Infile Datalines;
Input p_id trt_code$;
Datalines;
101 A
102 A
103 B
104 B
;
Run;
Proc sort data=patdata;
By p_id;
Run;
Data adverse;
Infile Datalines;
Input p_id event :$12.;
Datalines;
101 headaches
107 fever
103 fracture
109 nausea
;
Run;

Proc sort data=adverse;
By p_id;
Run;
/*Thru Data step*/
Data pat_adverse;
Merge patdata(in=a) adverse(in=b);
By p_id;
/*If a and b;*/
If a and b then output;
Run;
/*Thru SQL*/
Proc sql;
Create table pat_adverse as
Select a.*, b.event from patdata a, adverse b where a.p_id=b.p_id;
Quit;
Proc sql;
Create table pat_adverse as
Select a.*, b.event from patdata a inner join adverse b on a.p_id=b.p_id;
Quit;

Full Join in SAS

A full outer join keeps all the data from all the tables; variables from a table are
missing where that table has no matching record.

Data patdata;
Infile datalines;
Input p_id trt_code$;
Datalines;
101 A
102 A
103 B
104 B
;
Run;
Proc sort data=patdata;
By p_id;
Run;
Data adverse;
Infile datalines;
Input p_id event :$12.;
Datalines;
101 headaches
107 fever
103 fracture
109 nausea
;
Run;
Proc sort data=adverse;
By p_id;
Run;
/*Thru Data step*/
Data pat_adverse;
Merge patdata(in=a) adverse(in=b);
By p_id;
/*If a or b;*/
If a or b then output;
Run;
/*Thru SQL*/
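Proc sql;
Create table pat_adverse as
/* COALESCE keeps p_id for rows that exist in only one table (107, 109 come from b) */
Select coalesce(a.p_id, b.p_id) as p_id, a.trt_code, b.event
from patdata a full outer join adverse b on a.p_id=b.p_id;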
Quit;

UPDATE

Applies the changes stored in one dataset to the values in another dataset.


To perform an update we need two datasets:
1) Master dataset: contains the original information.
2) Transaction dataset: contains the changed information.
Before performing an update, both datasets must be sorted by the BY variable, and the
master dataset should contain only one observation for each BY value.
Examples:-
Data ds1;
Infile datalines;
Input id 1-3 name $5-7 sex $8-9 sal 11-14;
Datalines;
001 abc m 5000
003 jkl f 7000
002 def m 6000
005 xyz m 5000
004 mno f 6000
006 asd f 8000
;
Run;
Proc sort data=ds1;
By id;
Run;
Data ds2;
Infile datalines;
Input id 1-3 name $5-7 sex $8-9 sal 11-14;
Datalines;
001       7500
006       9000
002       8000
005       8000
004       7900
003       6500
;
Run;
Proc sort data=ds2;
By id;
Run;
Data ds3;
Update ds1 ds2;
By id;
Run;
If a variable in the transaction dataset has a missing value, the output dataset keeps
the existing value from the master dataset.
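If a missing value in the transaction dataset should overwrite the master value, the UPDATE statement accepts the UPDATEMODE=NOMISSINGCHECK option. In this sketch (ds3b is an illustrative name) KEEP= limits the transaction data to id and sal, because name and sex are missing in every ds2 record and would otherwise blank out the master values. With the data below, ids 003 and 005 should end up with a missing sal.

```sas
/* Sketch: missing transaction values replace master values */
Data ds3b;
   Update ds1 ds2(keep=id sal) updatemode=nomissingcheck;
   By id;
Run;
```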
Data ds1;
Infile datalines;
Input id 1-3 name $5-7 sex $8-9 sal 11-14;
Datalines;
001 abc m 5000
003 jkl f 7000
002 def m 6000
005 xyz m 5000
004 mno f 6000
006 asd f 8000
;
Run;
Proc sort data=ds1;
By id;
Run;
Data ds2;
Infile datalines;
Input id 1-3 name $5-7 sex $8-9 sal 11-14;
Datalines;
001       7500
006       9000
002       8000
005       .
004       7900
003       .
;
Run;
Proc sort data=ds2;
By id;
Run;
Data ds3;
Update ds1 ds2;
By id;
Run;
Data master;
Infile datalines;
Input id 1-8 name $ 9-27 street $ 28-47 city $ 48-62 state $ 63-64 zip $ 67-71;
Datalines;
1001    Ericson, Jane      111 Clancey Court   Chapel Hill    NC  27514
1002    Dix, Martin        4 Shepherd St.      Norwich        VT  05055
1003    Gabrielli, Theresa 24 Ridgetop Rd.     Westboro       MA  01581
1004    Clayton, Aria      14 Bridge St.       San Francisco  CA  94124
1005    Archuleta, Ruby    Box 108             Milagro        NM  87429
1006    Misiewicz, Jeremy  43-C Lakeview Apts. Madison        WI  53704
1007    Ahmadi, Hafez      5203 Marston Way    Boulder        CO  80302
1008    Jacobson, Becky    1 Lincoln St.       Tallahassee    FL  32312
1009    An, Ing            95 Willow Dr.       Charlotte      NC  28211
1010    Slater, Emily      1009 Cherry St.     York           PA  17407
;
Run;
Data Trans;
Infile datalines;
Input id 1-8 name $ 9-27 street $ 28-47 city $ 48-62 state $ 63-64 zip $ 67-71;
Datalines;
1002    Dix-Rosen, Martin
1001                                                              27516
1006                       932 Webster St.
1009                       2540 Pleasant St.   Raleigh            27622
1011    Mitchell, Wayne    28 Morningside Dr.  New York       NY  10017
1002                       R.R. 2, Box 1850    Hanover        NH  03755
1012    Stavros, Gloria    212 Northampton Rd. South Hadley   MA  01075
;
Run;
Proc sort data=trans;
By id;
Run;
Data newlist;
Update master trans;
By id;
Run;

FUNCTIONS

-> Character Functions


-> Numeric Functions
-> Date Functions

Character Functions

/* ------------- Functions that change the case of characters--------------*/

Data ds;
Infile datalines;
Input year 1-4 pres $ 6-29 vicepres $ 31-55 result $ 60-64 ;
Datalines;
1920 James M. Cox             Franklin D. Roosevelt        lost
1924 John W. Davis            Charles W. Bryan             lost
1928 Alfred E. Smith          Joseph T. Robinson           lost
1932 Franklin D. Roosevelt    John N. Garner               won
1936 Franklin D. Roosevelt    John N. Garner               won
1940 Franklin D. Roosevelt    Henry A. Wallace             won
1944 Franklin D. Roosevelt    Harry S. Truman              won
1948 Harry S. Truman          Alben W. Barkley             won
1952 Adlai E. Stevenson       John J. Sparkman             lost
1956 Adlai E. Stevenson       Estes Kefauver               lost
1960 John F. Kennedy          Lyndon B. Johnson            won
1964 Lyndon B. Johnson        Hubert H. Humphrey           won
1968 Hubert H. Humphrey       Edmund S. Muskie             lost
1972 George S. McGovern       R. Sargent Shriver Jr.       lost
1976 Jimmy Carter             Walter F. Mondale            won
1980 Jimmy Carter             Walter F. Mondale            lost
1984 Walter F. Mondale        Geraldine Ferraro            lost
;
Run;

UPCASE
Converts all letters in an argument to uppercase
Syntax:-Upcase(string)
Example:-
Data ds1;
Set ds;
President=Upcase(pres);
Run;

/* overwrite pres itself, so the column order stays the same as in the base dataset */


Data ds1a;
Set ds;
Pres=Upcase(pres);
Run;

/* create new columns and drop the original ones */


Data ds2(drop=pres vicepres);
Set ds;
President=upcase(pres);
Vicepresident=upcase(vicepres);
Run;

LOWCASE
Converts all letters in an argument to lowercase
Syntax:-Lowcase(string)
Example:-
Data ds3;
Set ds2;
President=lowcase(president);
Vicepresident=lowcase(vicepresident);
Run;
PROPCASE
Converts all words in an argument to proper case (like I Am Krishna)
Syntax:-Propcase(string)
Example:-
Data ds4;
Set ds2;
President=propcase(president);
Vicepresident=propcase(vicepresident);
Run;
Data allcase;
a=lowcase('THIS IS A DOG');
b=propcase(a);
c=propcase(lowcase('THIS IS A DOG'));
d=upcase('this is a dog');
Put a=;
Put b=;
Put c=;
Put d=;
Run;

/* ------------------------Functions that extract part of strings---------------------*/

SCAN
Selects a given word from a character expression
Selects particular word from character string
Syntax:-SCAN(string ,n<, delimiter(s)>)
Examples:-
Data scn1;
Set ds;
President=scan(pres,2);
Run;
Data scn2;
Set ds;
President=scan(pres,3);
Run;

Data scn3;
Set ds;
President=scan(pres,-3);
Run;

Data scn4;
Set ds;
President=scan(pres,-1);
Run;
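SCAN also accepts an explicit delimiter list as its third argument. A sketch, assuming a name stored as "Last, First" (fullname, last and first are illustrative names):

```sas
/* Sketch: split "Last, First" using the comma as the only delimiter */
Data scn5;
   Length fullname $25 last first $15;
   fullname='Vincent, Martina';
   last=scan(fullname,1,',');          /* Vincent */
   first=strip(scan(fullname,2,','));  /* Martina: STRIP removes the blank after the comma */
Run;
```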

SUBSTR
Extracts part of a character string (a substring)
Selects a particular part of a character string; on the left side of an assignment it replaces characters instead
Syntax:-SUBSTR(string, position<, length>)
Examples:-
Data sbstr1;
Set ds;
President=substr(pres,1,5);
Run;
Data sbstr2;
a='Radhakrishna Reddy';
b=substr(a,6,7);
Run;

Data sbstr3;
a='Radhakrishna Reddy';
Substr(a,1,5)='Rama';
Run;
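SUBSTR is often combined with INDEX to cut a string at a delimiter. A sketch with illustrative names; it assumes the string contains at least one blank, since INDEX returns 0 otherwise and SUBSTR would then receive an invalid length.

```sas
/* Sketch: take the text before the first blank */
Data sbstr4;
   Length a $20 first $12;
   a='Radhakrishna Reddy';
   /* INDEX finds the blank at position 13, so this is substr(a,1,12) = Radhakrishna */
   first=substr(a,1,index(a,' ')-1);
Run;
```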

/* ------------Functions that join two or more strings together-----------------*/

CAT
Concatenates character strings without removing leading or trailing blanks
Syntax:-CAT(string-1<, ... string-n>)
Data cat1;
a=' The Olym';
b='pic Arts Festi';
c=' val includes works by D ';
d='ale Chihuly.';
Result=cat(a,b,c,d);
Put result $char.;
Run;
CATT
Concatenates character strings and removes trailing blanks
Syntax:-CATT(string-1<, ...string-n>)
Data cat2;
a=' The Olym';
b='pic Arts Festi';
c=' val includes works by D ';
d='ale Chihuly.';
Result=catt(a,b,c,d);
Put result $char.;
Run;
CATS
Concatenates character strings and removes leading and trailing blanks
Syntax:-CATS(string-1<, ...string-n>)
Data cat3;
a=' The Olym';
b='pic Arts Festi';
c=' val includes works by D ';
d='ale Chihuly.';
Result=cats(a,b,c,d);
Put result $char.;
Run;

CATX
Concatenates character strings, removes leading and trailing blanks, and inserts
separators
Syntax:-CATX(separator, string-1<, ...string-n>)
Data cat4;
Separator='*';
a='The Olympic';
b='Arts Festival';
c='includes works by';
d='Dale Chihuly.';
Result=catx(separator,a,b,c,d);
Put result $char.;
Run;
Data cat5;
Separator='%%$%%';
a=' The Olym';
b='pic Arts Festi';
c=' val includes works by D ';
d='ale Chihuly.';
Result=catx(separator,a,b,c,d);
Put result $char.;
Run;

/* ------------------------Functions that remove blanks from string-----------------------*/


LEFT
Left aligns a SAS character expression
Syntax:- LEFT(string)
Data remblank1;
a=' My Name Is Ram';
b=left(a);
Run;

RIGHT
Right aligns a character expression
Syntax:- RIGHT(string)
Data remblank2;
a='My Name Is Ram ';
b=right(a);
Run;

STRIP
Returns a character string with all leading and trailing blanks removed
Syntax:- STRIP(string)
Data remblank3;
Infile datalines;
Input string $char8.;
original = '*' || string || '*';
stripped = '*' || strip(string) || '*';
Datalines;
abcd
abcd
abcd
abcdefgh
xyz
;
Run;

TRIM
Removes trailing blanks from character expressions and returns one blank if the
expression is missing
Syntax:- TRIM(string)
Data remblank4;
Input part1 $ 1-10 part2 $ 11-20;
hasblank=part1||part2;
noblank=trim(part1)||part2;
Put hasblank;
Put noblank;
Datalines;
apple     sauce
;
Run;
Data remblank5;
Input part1$ part2$ ;
hasblank=part1||part2;
noblank=trim(part1)||part2;
Put hasblank;
Put noblank;
Datalines;
apple sauce
;
Run;

Data remblank6;
x=" ";
y=">"||trim(x)||"<";
Put y;
Run;

TRIMN
Removes trailing blanks from character expressions and returns a null string (zero
blanks) if the expression is missing
Syntax:- TRIMN(string)
Data remblank6a;
x=" ";
z=">"||trimn(x)||"<";
put z;
Run;

COMPRESS
Removes specified characters from a character string; with only the source argument it removes blanks
Syntax:- COMPRESS(<source><, chars><, modifiers>)
Data remblank7;
a='AB C D';
b=compress(a);
Run;
Data remblank8;
x='1 2 3 4 5';
y=compress(x);
Put y;
Run;
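The chars and modifiers arguments of COMPRESS can remove a chosen set of characters or keep only one class of characters. A sketch with an illustrative phone value; both calls should return 9195550123.

```sas
/* Sketch: the second and third arguments of COMPRESS */
Data remblank8a;
   phone='(919) 555-0123';
   a=compress(phone,'()- ');   /* remove the listed characters */
   b=compress(phone,,'kd');    /* modifiers: k = keep, d = digits */
Run;
```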
COMPBL
Replaces each run of two or more consecutive blanks with a single blank.
Syntax:-Compbl(source)
Data remblank9;
x='my name is ram';
y=compbl(x);
Run;
Data remblank9a;
x='My ';
y=' Name ';
z=' is Ram';
a=x||y||z;
b=compbl(a);
Run;
Data ds1;
Infile datalines;
Input id$ fname$ lname$ sal;
Datalines;
001 mohan arisela 60000
002 padma narni 45000
003 varma maddina 50000
;
Run;
Data remblank10;
Set ds1;
Name1=fname||lname;
Name2=cat(fname,lname);
Name2a=cat(trim(fname),lname);
Name3=compbl(fname||lname);
Run;

/* ---------------Functions that substitute letters or words in string------------------------*/

TRANSLATE
Replaces specific characters in a character expression
Syntax:-TRANSLATE(source,to-1,from-1<,...to-n,from-n>)
Data trns1;
x=translate('XYZW','AB','VW');
Put x;
Run;

Data trns2;
x=translate('abc','sh', 'cg');
Put x;
Run;
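A frequent slip with TRANSLATE is the argument order: the TO characters come before the FROM characters. A sketch with an illustrative date string:

```sas
/* Sketch: every '-' in d becomes '/' */
Data trns3;
   d='2010-11-15';
   x=translate(d,'/','-');   /* 2010/11/15 */
   Put x;
Run;
```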
TRANWRD
Replaces or removes all occurrences of a word in a character string
Syntax:-TRANWRD(source,target,replacement)
Data trnw1;
name='Mrs.Radhakrishna Reddy';
name1=tranwrd(name, "Mrs.", "Mr.");
put name name1;
run;
Data trnw2;
Infile datalines;
Input salelist $;
target='FISH';
replacement='NIP';
salelist1=tranwrd(salelist,target,replacement);
Datalines;
CATFISH
;
Run;
Data trnw2a;
Infile datalines;
Input salelist $;
length target $10 replacement $3;
target='FISH';
replacement='NIP';
salelist1=tranwrd(salelist,target,replacement);
Datalines;
CATFISH
;
Run;
The LENGTH statement gives TARGET a length of 10, so 'FISH' is padded with six trailing blanks.
This causes the TRANWRD function to search for the character string 'FISH      ' in SALELIST.
Because the search fails, SALELIST1 keeps the unchanged value CATFISH.
You can use the TRIM function to exclude trailing blanks from a target or replacement
variable. Use the TRIM function with TARGET:
Data trnw2b;
Infile datalines;
Input salelist $;
length target $10 replacement $3;
target='FISH';
replacement='NIP';
salelist1=tranwrd(salelist,trim(target),replacement);
Datalines;
CATFISH
;
Run;

/* --------------------Functions that searches for characters-------------------------*/

INDEX
Searches a character expression for a string of characters
Syntax:-INDEX(source,excerpt)
Data ind1;
a='ABC.DEF (X=Y)';
b='D';
x=index(a,b);
Put x;
Run;
Data ind2;
a='ABC.DEF (X=Y)';
b='X=Y';
x=index(a,b);
Put x;
Run;
Data ind3;
Infile datalines;
input name $ 1-12 age;
Datalines;
Harvey Smith 30
John West    35
Jim Cann     41
James Harvey 32
Harvy Adams  33
;
Run;
Now, let's use the index function to find the cases with "Harvey" in the name
Data ind3a;
Set ind3;
x = index(name, "Harvey");
Run;
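INDEX is case-sensitive, so index(name,'harvey') would return 0 for every record above. This sketch shows two case-insensitive alternatives on the ind3 dataset (ind3b, x1 and x2 are illustrative names); both should return 1 for Harvey Smith, 7 for James Harvey and 0 for Harvy Adams.

```sas
/* Sketch: case-insensitive searches */
Data ind3b;
   Set ind3;
   x1=index(upcase(name),'HARVEY');   /* upper-case the source before searching */
   x2=find(name,'harvey','i');        /* FIND with the i modifier ignores case */
Run;
```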
INDEXC
Searches a character expression for any of the specified characters and returns the
position of the first one found
Syntax:-INDEXC(source,excerpt-1<,... excerpt-n>)
Data indc1;
a='ABC.DEP (X2=Y1)';
x=indexc(a,'.');
Run;
Data indc2;
a='ABC.DEP (X2=Y1)';
b='=';
x=indexc(a,b);
Run;

INDEXW
Searches a character expression for a specified string as a word
Syntax:-INDEXW(source, excerpt<,delimiter>)
Data indw1;
s='asdf adog dog';
p='dog ';
x=indexw(s,p);
Run;
Data indw2;
s='abcdef x=y';
p='def';
x=indexw(s,p);
Run;

/* ----------------------------------- Other Functions --------------------------------*/

LENGTH
Returns the length of a string, not counting trailing blanks
Syntax:-LENGTH(string)
Data len;
a='Mr.Krishna';
b=length(a);
Run;
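LENGTH ignores trailing blanks and never returns 0. A sketch comparing it with the related LENGTHC and LENGTHN functions (dataset and variable names are illustrative):

```sas
/* Sketch: LENGTH, LENGTHC and LENGTHN on the same values */
Data len2;
   Length a $15;
   a='Mr.Krishna';
   b=length(a);      /* 10: trailing blanks are not counted */
   c=lengthc(a);     /* 15: trailing blanks are counted */
   d=length(' ');    /* 1: LENGTH returns 1 for a blank string */
   e=lengthn(' ');   /* 0: LENGTHN returns 0 for a blank string */
Run;
```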
REVERSE
Returns string in reverse order
Syntax:-REVERSE(string)
Data rev;
a='Mr.Krishna';
b=reverse(a);
Run;
QUOTE
Adds double quotation marks to a character value
Syntax:-QUOTE(string)
Data quot1;
a='Mr.Krishna';
b=quote(a);
Run;
DEQUOTE
Removes matching quotation marks from a character value
Syntax:-DEQUOTE(string)
Data quot2;
Set quot1;
c=dequote(a);
Run;
Data quot3;
Infile datalines;
Input id name$ sal;
Datalines;
001 abc 5000
002 def 6000
003 xyz 7000
;
run;
Data quot3a;
Set quot3;
name1=quote(name);
name2=quote(trim(name));
name3=dequote(name2);
Run;

RANK
Returns the position of a character in the ASCII or EBCDIC collating sequence.
Syntax:-RANK(x)
The RANK function returns an integer that represents the position of the first character in
the character expression. The result depends on your operating environment.
Data rnk1;
Infile datalines;
Input id name$ sal;
Rank_var=RANK(name);
Datalines;
001 clarc 5000
002 def 4000
003 clark 7000
;
Run;
Data rnk2;
a=Rank('A');
b=Rank('krishna'); /* It gives position of first character only*/
Run;

REPEAT
Returns a character value that consists of the first argument repeated n+1 times.
Syntax:- Repeat(Argument,n)
Data rep;
Infile datalines;
Input id name$ sal;
x=repeat(name,10);
Datalines;
001 clarc 5000
002 def 4000
003 clark 7000
;
Run;

SOUNDEX
Encodes a string to facilitate searching.
Strings that sound alike (such as clarc and clark) get the same code
Syntax:- SOUNDEX(Argument)
Data snd;
Infile datalines;
Input id name$ sal;
y=soundex(name);
Datalines;
001 clarc 5000
002 def 4000
003 clark 7000
;
Run;

COLLATE
Returns a character string in ASCII or EBCDIC collating sequence.
Syntax:- COLLATE(start-position<,end-position>) | COLLATE(start-position<,,length>)
Data col1;
x=collate(45,99);
put @1 x ;
Run;
Data col2;
x=collate(1,,49);
put @1 x ;
Run;
ASCII Result
Data col3;
x=collate(48,,10);/*start-position<,,length*/
y=collate(48,57);/*start-position<,end-position */
put @1 x @14 y;
Run;

EBCDIC Result
Data col4;
x=collate(240,,10); /*start-position<,,length*/
y=collate(240,249); /*start-position<,end-position */
put @1 x @14 y;
Run;

The maximum end-position for the EBCDIC collating sequence is 255.


ASCII collating sequences use end-position values between 0 and 127.

Numeric Functions

MEAN
Returns the arithmetic mean (average)
Argument is numeric. At least one nonmissing argument is required; otherwise, the
function returns a missing value.
Syntax: -MEAN(argument<,argument,...>)
Data ds1;
x1=mean(2,.,.,6);
x2=mean(2,4,5,6);
x3=mean(x1-x2); /*x3=mean(4-4.25)=-0.25/1=-0.25*/
x4=mean(of x1-x2); /*OF x1-x2 means x1 and x2: (4+4.25)/2=4.125*/
x5=mean(x1,x2);
Run;

MEDIAN
Computes the median value
Syntax: -MEDIAN(value1<, value2, ...>)
Data ds2;
x=median(2,4,1,3);
y=median(5,8,0,3,4);
z=median(5,.,0,.,4);
Run;
Difference between MEAN & MEDIAN
MEAN gives the average of the numeric values.
Ex:- x=mean(70,60,80,75,90)
it gives
x=(70+60+80+75+90)/5
x=375/5=75

MEDIAN sorts the values from lowest to highest and takes the middle value.
Ex:- x=median(70,60,80,75,90)
sorted: 60,70,75,80,90
75 is the middle value, so the median is 75.

With an even number of values, the median is the average of the two middle values.
Ex:-
x=median(2,4,1,3);
sorted: 1,2,3,4
the two middle values are 2 and 3
median value is (2+3)/2=2.5

MIN
Returns the smallest value
Syntax: -MIN(argument,argument,...)
Data ds3;
x1=min(7,4);
x2=min(2,.,6);
x3=min(2,-3,1,-1);
x4=min(0,4);
x6=min(of x1-x3);
x7=min(x1,x3);
Run;

MAX
Returns the largest value
Syntax:-MAX(argument,argument,...)
Data ds4;
x=max(8,3);
x1=max(2,6,.);
x2=max(2,-3,1,-1);
x3=max(3,.,-3);
x4=max(.,.,.);
x5=max(of x1-x3);
Run;
Argument
is numeric. At least two arguments are required. The argument list may consist of a
variable list, which is preceded by OF.
The MAX function returns a missing value (.) only if all arguments are missing.

RANGE
Returns the range of values
Syntax:- RANGE(argument,argument,...)

argument
is numeric. At least one nonmissing argument is required; otherwise, the function returns
a missing value. The argument list can consist of a variable list, which is preceded by OF.
The RANGE function returns the difference between the largest and the smallest of the
nonmissing arguments.
Data ds5;
x1=range(.,.);
x2=range(-2,6,3);
x3=range(2,6,3,.);
x4=range(1,6,3,1);
x5=range(of x1-x3);
run;

SUM
/*SUM Function*/
Returns the sum of the nonmissing arguments
Syntax:-SUM(argument,argument, ...)
argument
is numeric. If all the arguments have missing values, the result is a missing value.
The argument list can consist of a variable list, which is preceded by OF.
Data ds6a;
x1=sum(4,9,3,8);
x2=sum(4,9,3,8,.);
x3=sum(of x1-x2);
Run;
Data ds6b;
x1=5;
x2=6;
x3=4;
x4=9;
y1=34;
y2=12;
y3=74;
y4=39;
result=sum(of x1-x4, of y1-y5);
Run;
Data ds6c;
x1=55;
x2=35;
x3=6;
x4=sum(of x1-x3, 5);
Run;
Data ds6d;
x1=7;
x2=7;
x5=sum(x1-x2);
Run;
Data ds6e;
y1=20;
y2=30;
x6=sum(of y:);
Run;

/*Sum Statement*/

Adds the result of an expression to an accumulator variable


Syntax:-variable+expression;

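Data ds6;
x1=sum(4+9+3+8);     /* one argument: 4+9+3+8=24, so x1=24 */
x2=sum(4+.+9+3+8+.); /* the + operator makes the argument missing, so x2=. */
Run;
The SUM function returns the sum of the nonmissing arguments:
ex:- x2=sum(4,.,9,3,8,.);
it gives the value 24.
The + operator, by contrast, returns a missing value if any operand is missing:
ex:- x2=sum(4+.+9+3+8+.);
it gives the value . because the single argument 4+.+9+3+8+. is already missing.
The SUM statement (variable+expression;) is different again: the variable is
automatically retained across iterations, starts at 0, and a missing expression adds 0.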
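A sketch of the SUM statement itself, building a running total over the same values as the example above (dataset and variable names are illustrative); total should take the values 4, 4, 13, 16, 24, 24.

```sas
/* Sketch: a running total with the sum statement */
Data running;
   Input x @@;
   total+x;   /* retained, starts at 0, a missing x adds nothing */
   Datalines;
4 . 9 3 8 .
;
Run;
```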

CEIL
Returns the smallest integer that is greater than or equal to the argument, fuzzed to
avoid unexpected floating-point results
Syntax :-CEIL (argument)
Data ds7;
var1=2.1;
a=ceil(var1);
Run;
Data ds7a;
b=ceil(-2.4);
Run;

FLOOR
Returns the largest integer that is less than or equal to the argument, fuzzed to avoid
unexpected floating-point results
Syntax :-FLOOR (argument)
Data ds8;
var1=2.1;
a=floor(var1);
Run;
Data ds8a;
b=floor(-2.4);
Run;

ABS
Returns the absolute value
Syntax :-ABS (argument)
Data ds9;
x1=abs(2.4);
x2=abs(-3);
Run;

INT
Returns the integer value, fuzzed to avoid unexpected floating-point results.
Syntax:-INT(argument)
Data ds10;
x1=INT(2.4);
x2=INT(2.5);
x3=INT(2.8);
X4=INT(-2.4);
Run;
MOD
Returns the remainder from the division of the first argument by the second argument,
fuzzed to avoid most unexpected floating-point results.
Syntax:-MOD (argument-1, argument-2)
Data ds11;
X1=MOD(10,3);
Run;
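Data ds11a;
A=123456;
X=INT(A/1000);         /* 123: everything above the last three digits */
Y=MOD(A,1000);         /* 456: the last three digits */
Z=MOD(INT(A/100),100); /* INT(1234.56)=1234, MOD(1234,100)=34 */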
Run;

ROUND
Rounds the first argument to the nearest multiple of the second argument, or to the
nearest integer when the second argument is omitted.
Syntax:-ROUND (argument <,rounding-unit>)

Data ds12;
x1=ROUND(2.4);
x2=ROUND(2.5);
x3=ROUND(2.8);
X4=ROUND(-2.4);
X5=ROUND(-2.5);
Run;
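The examples above omit the rounding-unit argument. A sketch that passes it (dataset and variable names are illustrative):

```sas
/* Sketch: ROUND to the nearest multiple of the second argument */
Data ds12b;
   x1=round(1234.5678,0.01);   /* 1234.57 */
   x2=round(1234.5678,10);     /* 1230 */
   x3=round(17,5);             /* 15 */
Run;
```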

VAR
Returns the variance
Syntax:-VAR(argument,argument, ...)
argument
is numeric. At least two nonmissing arguments are required. Otherwise, the function
returns a missing value. The argument list can consist of a variable list, which is
preceded by OF.

Data ds13;
x1=Var(4,2,3.5,6);
x2=Var(4,6,.);
x3=Var(of x1-x2);
Run;

SQRT
Returns the square root of a value
Syntax :-SQRT(argument)
argument
is numeric and must be nonnegative

Data ds14;
x1=sqrt(36);
x2=sqrt(25);
x3=sqrt(4.4);
x4=sqrt(-49);
Run;

NMISS
Returns the number of missing values
Syntax :-NMISS(argument<,...argument-n>)
argument
is numeric. At least one argument is required. The argument list may consist of a
variable list, which is preceded by OF.

Data ds15;
x1=nmiss(1,0,.,2,5,.);
x2=nmiss(1,0);
x3=nmiss(of x1-x2); /*x1=2 and x2=0 are both nonmissing, so x3=0*/
Run;
N
Returns the number of nonmissing values
Syntax:-N(argument<,...argument-n>)
argument
is numeric. At least one argument is required. The argument list may consist of a
variable list, which is preceded by OF.

Data ds16;
X1=n(1,0,.,2,5,.);
X2=n(1,0);
X3=n(of x1-x2);
Run;

LAG
Returns values from a queue.
Syntax:-LAG<n>(argument)
Data lg1;
input x @@;
a=lag1(x);
b=lag2(x);
c=lag3(x);
d=lag(x);
datalines;
1 2 3 4 5 6
;
Run;

Data lg2;
input x @@;
y=lag1(x+10);
z=lag2(x);
datalines;
1 2 3 4 5 6
;
Run;
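Because LAG returns values from a queue, it only updates that queue on the iterations where it actually executes, so calling it inside an IF statement is a common trap. A sketch (lg3, prev, y and z are illustrative names): y should be . . . 2 . 4, while taking the lag on every row first gives the previous observation's x in z (1, 3 and 5 on the even rows).

```sas
/* Sketch: conditional versus unconditional LAG */
Data lg3;
   Input x @@;
   If mod(x,2)=0 then y=lag(x);   /* queue is updated only on even x */
   prev=lag(x);                   /* separate queue, executed on every row */
   If mod(x,2)=0 then z=prev;     /* previous observation's x */
   Datalines;
1 2 3 4 5 6
;
Run;
```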

ANYDIGIT
Searches a character string for a digit and returns the first position at which it is found
Syntax:-ANYDIGIT(string <,start>)
DATA SEARCH_NUM;
INPUT STRING $60.;
dg = ANYDIGIT(STRING);
DATALINES;
This line has a 56 in it
two numbers 123 and 456 in this line
No digits here
;
run;

ANYSPACE
Searches a character string for a whitespace character (such as a blank) and returns the first position at which it is found
Syntax:-ANYSPACE(string <,start>)
DATA SEARCH_SPACE;
INPUT STRING $60.;
sp= ANYSPACE(STRING);
DATALINES;
This line has a 56 in it
two numbers 123 and 456 in this line
No digits here
;
run;

How to separate a numeric value from an alphanumeric string

DATA EN;
INPUT STRING $60.;
START = ANYDIGIT(STRING);
END = ANYSPACE(STRING,START);
IF START NE 0 THEN
NUM = INPUT(SUBSTR(STRING,START,END-START),9.);
DATALINES;
This line has a 56 in it
two numbers 123 and 456 in this line
No digits here
;
run;

Data type Converting Functions

INPUT
Converts a character value to a numeric value by reading it with an informat
Syntax:-Input(variable, informat);
Example:-
Data ds1;
Infile datalines;
Input id$ name$ sal;
Datalines;
001 abc 60000
002 def 45000
003 xyz 50000
;
Run;

Data cn/*(drop=id rename=(id1=id))*/;


Set ds1;
id1=input(id, best.);
Run;
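INPUT can read any text that an informat understands, not only plain digits. A sketch with illustrative values and names:

```sas
/* Sketch: the informat tells INPUT how to read the text */
Data cn2;
   c='15Nov2010';
   d=input(c,date9.);           /* SAS date value (days since 01Jan1960) */
   n=input('1,234',comma5.);    /* 1234: the COMMA informat ignores the comma */
   Format d date9.;
Run;
```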

PUT
Converts a numeric value to a character value by writing it with a format
Syntax:-put(variable, format);
Example:-
Data ds2;
Infile datalines;
Input id name$ sal;
Datalines;
001 abc 60000
002 def 45000
003 xyz 50000
;
Run;
Data nc/*(drop=id rename=(id1=id))*/;
Set ds2;
id1=put(id, z3.); /* id is numeric, so use a numeric format; z3. keeps leading zeros (001) */
Run;

Date Functions

How Dates Works in SAS

The SAS system stores dates as the number of elapsed days since January 1, 1960.

Ex:-January 03,1960 is stored as 2
January 02,1960 is stored as 1
January 01,1960 is stored as 0
December 31,1959 is stored as -1
December 30,1959 is stored as -2
December 31,1960 is stored as 365

The SAS system stores Time as the number of elapsed seconds since midnight
of that particular day.

The SAS system stores datetime values as the number of elapsed seconds since midnight
(12:00 am) on January 1, 1960.

Dates before January 01,1960 are negative integers, after January 01, 1960 are positive
integers

SAS Dates are valid from A.D. 1582 to A.D. 19,900.

(Figure: How SAS Converts Calendar Dates to SAS Date Values)
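The stored values listed above can be checked directly. A sketch (datevals and its variables are illustrative names); the PUT statement should write d1=0 d2=2 d3=-1 d4=365 days=46 to the log.

```sas
/* Sketch: date literals and MDY() return the stored day counts */
Data datevals;
   d1='01Jan1960'd;                  /* 0 */
   d2=mdy(1,3,1960);                 /* 2 */
   d3='31Dec1959'd;                  /* -1 */
   d4='31Dec1960'd;                  /* 365 */
   days='31Dec2010'd-'15Nov2010'd;   /* 46: subtracting dates gives days */
   Put d1= d2= d3= d4= days=;
Run;
```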

/*-------------------- Date, Time & Date time Functions---------------------*/

DATE
Returns the current (today's) date as a SAS date value
Syntax: - DATE()
Data ds1;
date1=date();
Run;
Data ds1a;
date1=date();
Format date1 date9. ;
Run;

TODAY
Returns the current date as a SAS date value
Syntax:-TODAY()
Data ds2;
Day=today();
Format day date9.;
Run;

DATETIME
Returns the current date and time of day as a SAS datetime value
Syntax:-DATETIME()
Data ds3;
a=datetime();
Format a datetime20.;
Run;

TIME
Returns the current time of day
Syntax:-TIME()
For example, if the following statements are executed at exactly 3:32 PM, TIME()
returns the SAS time value that corresponds to 15:32:00.
The TIME. format displays it on a 24-hour clock.

Data ds4;
Time=time();
Format time time. ;
Run;

DAY
Returns the day of the month from a SAS date value
Syntax:-DAY(date)
Data ds5;
a='29Jan2010'd;
Day=day(a);
Run;

Data ds5a;
a=date();
b= day(a);
Format a date9.;
Run;

WEEK
Returns the week-number value
Syntax:-WEEK (<SAS_Date>, <descriptor>)
Data ds6;
X=week('29Jan2010'd);
Y= week('10Feb2010'd);
Z= week('31Dec2010'd);
Run;

Data ds6a;
X=date();
Y=week(x);
Format x date9. ;
Run;

WEEKDAY
Returns the day of the week from a SAS date value
WEEKDAY returns 1 for Sunday through 7 for Saturday; for example, 17Oct1991 was a Thursday, so it returns 5
Syntax:-WEEKDAY(date)
Data ds7;
week1=weekday('16Mar1997'd);
Run;

Data ds7a;
a=date();
week1=weekday(a);
Run;
MONTH
Returns the month from a SAS date value
Syntax:-MONTH (date)

Data ds8;
a='29Jan2010'd;
Mon=month(a);
Run;

Data ds8a;
a=today();
Mon=month(a);
Run;

QTR
Returns the quarter of the year from a SAS date value
Syntax:-QTR(date)
Data ds9;
a='29Jan2010'd;
Quarter=qtr(a);
Run;
Data ds9a;
a='15Nov2010'd;
b=today();
Quarter1=qtr(a);
Quarter2=qtr(b);
Run;

YEAR
Returns the year from a SAS date value
Returns a four-digit numeric value that represents the year
Syntax:-YEAR(date)
Data ds10;
Date='25dec97'd;
y=year(date);
Run;

DHMS
Returns a SAS datetime value from date, hour, minute, and second
Syntax: -DHMS (date, hour, minute, second)

Data ds11;
a=dhms('15Nov2010'd,10,02,15);
Format a datetime. ;
Run;
Data ds11a;
a=dhms('15Nov2010'd,10,02,61);
b=dhms('15Nov2010'd,10,02,61);
Format a datetime. ;
Format b datetime20. ;
Run;
Data ds11b;
a=dhms('15Nov2010'd,10,.2,11);
Format a datetime.;
Run;
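A datetime value is just seconds since midnight, January 1, 1960, so DHMS can be reproduced by hand as date*86400 + hour*3600 + minute*60 + second. The sketch below (illustrative names) shows that this is also why 61 seconds rolls over into the next minute.

```sas
/* Sketch: DHMS compared with the equivalent arithmetic */
Data ex_dhms;
   d = '15NOV2010'd;
   a = dhms(d, 10, 2, 61);               /* 61 seconds roll over to 10:03:01 */
   b = d*86400 + 10*3600 + 2*60 + 61;    /* the same value built by hand     */
   c = dhms(d, 10, 3, 1);                /* written directly as 10:03:01     */
   Format a b c datetime20.;
Run;
```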

HMS
Returns a SAS time value from hour, minute, and second values
Syntax: -HMS (hour, minute, second)
Data ds12;
a=HMS(10,02,15);
Format a time.;
Run;
Data ds12a;
a=HMS(10,02,15);
b=HMS(10,02,15);
c=HMS(10,02,15);
Format a time.;
Format b time5.;
Format c time8.;
Run;

HOUR
Returns the hour from a SAS time or datetime value
Syntax: - HOUR (<time | datetime>)
Data ds13;
a=hour('10:30't);
Run;
Data ds13a;
a='10:30:05't;
b=hour(a);
Format a time8. ;
Run;

MINUTE
Returns the minutes from a SAS time or datetime value
Syntax: - MINUTE (<time | datetime>)
Data ds14;
a='10:30:05't;
b=MINUTE(a);
Format a time5.;
Run;

SECOND
Returns the seconds from a SAS time or datetime value
Syntax: -SECOND (<time | datetime>)
Data ds14a;
a='10:30:05't;
b=second(a);
Format a time. ;
Run;
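HMS and the HOUR, MINUTE and SECOND functions are inverses of each other: HMS packs the three parts into seconds since midnight, and the extraction functions take them apart again. A short sketch with illustrative names:

```sas
/* Sketch: build a time value with HMS, then take it apart again */
Data ex_hms;
   t = hms(10, 2, 15);   /* 10*3600 + 2*60 + 15 = 36135 seconds since midnight */
   h = hour(t);          /* 10 */
   m = minute(t);        /* 2  */
   s = second(t);        /* 15 */
   Format t time8.;
Run;
```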

DATEJUL
Converts a Julian date to a SAS date value
Syntax: -DATEJUL(Julian-date)
Julian-date
Specifies a SAS numeric expression that represents a Julian date
A Julian date in SAS is a date in the form yyddd or yyyyddd, where yy or yyyy is a
two-digit or four-digit integer that represents the year and ddd is the day of the year
The value of ddd must be between 1 and 365 (or 366 for a leap year)
For example, 10365 and 2010365 both represent December 31, 2010 (a two-digit year is
interpreted according to the YEARCUTOFF= system option)
Data ds15;
a=Datejul(10001);
Format a date9.;
Run;
Data ds15a;
a=Datejul(10365);
Format a date9.;
Run;
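A minimal sketch comparing the two Julian forms. It sets YEARCUTOFF=1926 explicitly (an assumption, so that the two-digit year 10 is read as 2010 regardless of the site default); note that the OPTIONS statement changes the setting for the rest of the session. Names are illustrative.

```sas
/* Sketch: five-digit and seven-digit Julian dates for the same day */
Options yearcutoff=1926;
Data ex_datejul;
   a = datejul(10365);     /* yyddd: year 10 read as 2010 under YEARCUTOFF=1926 */
   b = datejul(2010365);   /* yyyyddd: year given explicitly                    */
   Format a b date9.;
Run;
```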

JULDATE
Returns the Julian date from a SAS date value
Syntax: -JULDATE (date)
The JULDATE function converts a SAS date value to a five- or seven-digit Julian date. If
date falls within the 100-year span defined by the system option YEARCUTOFF=, the
result has five digits:
the first two digits represent the year, and the next three digits represent the day of the
year (1 to 365, or 1 to 366 for leap years)
Otherwise, the result has seven digits: the first four digits represent the year, and the
next three digits represent the day of the year. For example, if YEARCUTOFF=1920,
JULDATE returns 97001 for January 1, 1997,
and 1878365 for December 31, 1878.

Data ds16;
a=juldate('01Jan2010'd);
Run;
Result: a = 10001
Data ds16a;
a=date();
b=juldate(a);
Format a date9.;
Run;
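The YEARCUTOFF= example in the text can be reproduced directly. This sketch sets YEARCUTOFF=1920 (as in the text; the option stays in effect for the session) and also passes the seven-digit result back through DATEJUL to show that the two functions reverse each other. Names are illustrative.

```sas
/* Sketch: JULDATE inside and outside the YEARCUTOFF= span */
Options yearcutoff=1920;
Data ex_juldate;
   j1   = juldate('01JAN1997'd);  /* inside 1920-2019: five digits, 97001    */
   j2   = juldate('31DEC1878'd);  /* outside the span: seven digits, 1878365 */
   back = datejul(j2);            /* DATEJUL converts it back to a SAS date  */
   Format back date9.;
Run;
```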

MDY
Returns a SAS date value from month, day, and year values
Syntax: - MDY (month,day,year)
Month
Specifies a numeric expression that represents an integer from 1 through 12.

Day
Specifies a numeric expression that represents an integer from 1 through 31.

Year
Specifies a two-digit or four-digit integer that represents the year
The YEARCUTOFF= system option defines the year value for two-digit dates
Data ds17;
x_birthday=mdy(8,27,90);
y_birthday=mdy(05,30,2009);
Format x_birthday worddate20. ;
Format y_birthday weekdate30. ;
Run;
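How YEARCUTOFF= resolves a two-digit year can be seen by calling MDY around the cutoff. With YEARCUTOFF=1926 (set explicitly here as an assumption), the 100-year span is 1926 to 2025, so 25 becomes 2025 while 26 becomes 1926. Names are illustrative.

```sas
/* Sketch: two-digit years in MDY under YEARCUTOFF=1926 (span 1926-2025) */
Options yearcutoff=1926;
Data ex_mdy;
   a = mdy(8, 27, 25);   /* 2025: 25 falls in the 2000s part of the span */
   b = mdy(8, 27, 26);   /* 1926: the first year of the span             */
   c = mdy(8, 27, 90);   /* 1990                                          */
   Format a b c date9.;
Run;
```

Passing four-digit years to MDY avoids depending on this option at all.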

YYQ
Returns a SAS date value for the first day of the specified quarter of the year
Year
Specifies a two-digit or four-digit integer that represents the year
The YEARCUTOFF= system option defines the year value for two-digit dates
Quarter
Specifies the quarter of the year (1, 2, 3, or 4)
Syntax: -YYQ(year,quarter)
Data ds18;
DateValue1=yyq(2001,3);
DateValue2=yyq(09,2);
Format DateValue1 date7.;
Format DateValue2 date7.;
Run;

TIMEPART
Extracts a time value from a SAS datetime value
Syntax: - TIMEPART (datetime)
Data ds19;
x=datetime();
y=timepart(x);
Format X datetime. Y time. ;
Run;

DATEPART
Extracts the date from a SAS datetime value
Syntax: -DATEPART(datetime)
Data ds20;
X=datetime();
Y=datepart(x);
Format x datetime. y ddmmyy10.;
Run;

Data ds20a;
x=datepart ('01Jan2010:05:30:26'dt);
Format x ddmmyy8.;
Run;
Data ds20b;
Infile datalines;
Input id$ fname$ lname$ sal dob :datetime18.;
Format dob datetime. date date9. time time8.;
Date=datepart(dob);
Time=timepart(dob);
Datalines;
001 mohan arisela 60000 10jan1983:10:30:15
002 padma narni 45000 22feb1983:20:23:52
003 varma maddina 50000 30mar1983:06:55:25
;
Run;
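Since a datetime value counts seconds from midnight, January 1, 1960, DATEPART and TIMEPART amount to splitting that count into whole days and leftover seconds. The sketch below compares them with FLOOR and MOD for a datetime after 1960 (the arithmetic form is shown only for positive values). Names are illustrative.

```sas
/* Sketch: DATEPART/TIMEPART versus splitting the seconds by hand */
Data ex_parts;
   dt = '10JAN1983:10:30:15'dt;
   d1 = datepart(dt);
   d2 = floor(dt / 86400);   /* whole days since 01JAN1960                */
   t1 = timepart(dt);
   t2 = mod(dt, 86400);      /* seconds left over after the whole days    */
   Format dt datetime20. d1 d2 date9. t1 t2 time8.;
Run;
```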

INTCK
Returns the integer count of the number of interval boundaries between two dates, two
times, or two datetime values
Syntax: - INTCK(interval, from, to)

Interval
Specifies a character constant, a variable, or an expression that contains a time interval
such as SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QTR, SEMIYEAR and YEAR

DATA ds21;
BDATE='10SEP2008'D;
EDATE='14SEP2010'D;
ACTDATE=INTCK('DAY', BDATE, EDATE);
RUN;
DATA ds21a;
BDATE='10SEP2008'D;
EDATE='14SEP2010'D;
ACTDATE=INTCK('month', BDATE, EDATE);
RUN;
DATA ds21b;
BDATE='10SEP2008'D;
EDATE='14SEP2010'D;
ACTDATE=INTCK('Semiyear', BDATE, EDATE);
RUN;
DATA ds21c;
y=trim('year ');
date1='1sep1991'd + 300;
date2='1sep2001'd - 300;
Years=intck (y,date1,date2);
RUN;
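Because INTCK counts interval boundaries crossed rather than elapsed time, two dates one day apart can be "one year" apart while two dates almost a year apart are "zero years" apart. The sketch below shows this; the y_c line assumes SAS 9.2 or later, where INTCK accepts a fourth method argument ('C' for continuous) that counts complete intervals from the start date instead. Names are illustrative.

```sas
/* Sketch: INTCK counts boundaries, not elapsed intervals */
Data ex_intck;
   d1 = '31DEC2009'd;
   d2 = '01JAN2010'd;
   d3 = '30DEC2010'd;
   y_a  = intck('year', d1, d2);        /* 1: one day apart, but Jan 1 is crossed    */
   y_b  = intck('year', d2, d3);        /* 0: almost a year apart, no Jan 1 crossed  */
   y_c  = intck('year', d1, d2, 'C');   /* 0: no complete year elapsed (SAS 9.2+)    */
   days = intck('day', d1, d3);         /* 364, the same as d3 - d1                  */
   Format d1 d2 d3 date9.;
Run;
```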

YRDIF
Returns the difference in years between two dates
Syntax: - YRDIF (sdate,edate,basis)
sdate
Specifies a SAS date value that identifies the starting date
edate
Specifies a SAS date value that identifies the ending date
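basis
Identifies a character constant or variable that describes how SAS calculates the date
difference. The following character strings are valid:
'30/360' specifies a 30-day month and a 360-day year in calculating the number of years:
each month is considered to have 30 days, and each year 360 days, regardless of the
actual number of days in each month or year
'ACT/ACT' (alias 'ACTUAL') uses the actual number of days between the dates: days that
fall in leap years are divided by 366 and the other days by 365
'ACT/360' uses the actual number of days between the dates divided by 360
'ACT/365' uses the actual number of days between the dates divided by 365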
DATA ds22;
BDATE='10SEP2000'D;
EDATE='14SEP2010'D;
ACTYEARS=YRDIF(BDATE, EDATE, 'ACTUAL');
Format BDATE date9. EDATE date9. ;
RUN;
DATA ds22a;
Sdate='16Oct1998'd;
Edate='16Feb2003'd;
y30360=yrdif(sdate, edate, '30/360');
Yactact=yrdif(sdate, edate, 'ACT/ACT');
yact360=yrdif(sdate, edate, 'ACT/360');
yact365=yrdif(sdate, edate, 'ACT/365');
Run;
DATA ds22b;
Sdate='16Oct1998'd;
Edate='16Feb2003'd;
YRDIFF=yrdif(sdate, edate, '30/360');
DAYDIFF=edate-sdate; /* number of days between the two dates */
Run;
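The ACT/360 and ACT/365 bases are simply the day count divided by a fixed year length, which the sketch below checks by hand for the dates used above (1,584 days lie between 16OCT1998 and 16FEB2003). Names are illustrative.

```sas
/* Sketch: ACT/365 and ACT/360 reproduced with plain arithmetic */
Data ex_yrdif;
   sdate = '16OCT1998'd;
   edate = '16FEB2003'd;
   days  = edate - sdate;                     /* 1584 days                */
   a365  = yrdif(sdate, edate, 'ACT/365');
   h365  = days / 365;                        /* ACT/365 by hand          */
   a360  = yrdif(sdate, edate, 'ACT/360');
   h360  = days / 360;                        /* ACT/360 by hand: 4.4     */
   Format sdate edate date9.;
Run;
```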

INTNX
Increments a date, time, or datetime value by a given interval or intervals, and returns a
date, time, or datetime value
Syntax: -
INTNX (interval<multiple><.shift-index>, start-from, increment<, alignment>)
Interval
Specifies a character constant, a variable, or an expression that contains a time interval
such as SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QTR, SEMIYEAR and YEAR
Alignment
Controls where in the target interval the result falls: 'BEGINNING' or 'B' (the default),
'MIDDLE' or 'M', 'END' or 'E', and 'SAME' or 'S'

Data ds23;
Yr=intnx('year','05feb94'd,3);
Format yr date7. ;
Run;
Data ds23a;
Next=intnx('semiyear','01jan97'd,1);
Format next date9.;
Run;
Data ds23b;
X1='month ';
X2=trim(x1);
Date='1jun1990'd - 100;
Next_month=intnx(x2,date,1);
Format Next_month date9.;
Run;

DATA DS23c;
FORMAT TODAY1 DATE9.;
TODAY1=TODAY();
CDATE=PUT (INTNX ('MONTH',TODAY1,0,'S'),DATE9.);
LMCDATE=PUT(INTNX('MONTH',TODAY1,-1,'S'),DATE9.);
BCDATE=PUT(INTNX('DAY',TODAY1,-1,'S'),DATE9.);
LMBCDATE=PUT(INTNX('MONTH',(TODAY1-1),-1,'S'),DATE9.);
BDATE=PUT(INTNX('MONTH',TODAY1,0,'B'),DATE9.);
EDATE=PUT(INTNX('MONTH',TODAY1,0,'E'),DATE9.);
RUN;
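DS23c depends on the run date, so its output changes from day to day. The sketch below uses a fixed date (15FEB2012, in a leap year) to show the alignment values deterministically. Names are illustrative.

```sas
/* Sketch: INTNX alignment on a fixed date */
Data ex_intnx;
   d = '15FEB2012'd;
   b = intnx('MONTH', d, 0, 'B');   /* 01FEB2012: beginning of the same month  */
   e = intnx('MONTH', d, 0, 'E');   /* 29FEB2012: end of the month (leap year) */
   s = intnx('MONTH', d, 1, 'S');   /* 15MAR2012: same day, one month later    */
   n = intnx('MONTH', d, 1);        /* 01MAR2012: default is the beginning     */
   Format d b e s n date9.;
Run;
```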

HOLIDAY
Returns a SAS date value for the holiday and year specified

Valid values for holiday are 'BOXING', 'CANADA', 'CANADAOBSERVED', 'CHRISTMAS',
'COLUMBUS', 'EASTER', 'FATHERS', 'HALLOWEEN', 'LABOR', 'MLK', 'MEMORIAL',
'MOTHERS', 'NEWYEAR', 'THANKSGIVING', 'THANKSGIVINGCANADA', 'USINDEPENDENCE',
'USPRESIDENTS', 'VALENTINES', 'VETERANS', 'VETERANSUSG', 'VETERANSUSPS', and
'VICTORIA'
For example: MOTHERS2011=HOLIDAY('MOTHERS', 2011);
Syntax: - HOLIDAY (holiday, year)
DATA DS24;
THANKSGIVING_2012=HOLIDAY('THANKSGIVING', 2012);
Format THANKSGIVING_2012 date9. ;
RUN;
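Holidays such as Mother's Day (second Sunday in May) and US Thanksgiving (fourth Thursday in November) are defined by weekday rules, which WEEKDAY can confirm. A short sketch with illustrative names:

```sas
/* Sketch: HOLIDAY results checked against their weekday rules */
Data ex_holiday;
   mothers2011 = holiday('MOTHERS', 2011);        /* second Sunday in May         */
   thanks2012  = holiday('THANKSGIVING', 2012);   /* fourth Thursday in November  */
   wd_mothers  = weekday(mothers2011);            /* 1 = Sunday                   */
   wd_thanks   = weekday(thanks2012);             /* 5 = Thursday                 */
   Format mothers2011 thanks2012 date9.;
Run;
```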
