SAS serves the following industries:
Automotive
Communications
Education
Banking/Financial Services
Government
Health Insurance
Health Care Providers
Hospitality & Entertainment
Insurance
Life Sciences
Manufacturing
Media
Oil & Gas
Retail
Hotels
Utilities
SAS offers the following solution lines:
Analytics
Business Intelligence
Customer Intelligence
Data Integration & ETL
Financial Intelligence
Foundation Tools
Fraud Management
Governance, Risk & Compliance
High-Performance Computing
Human Capital Intelligence
IT Management
On Demand Solutions
Performance Management
Risk Management
Supply Chain Intelligence
Sustainability Management
There was a need for a computerized statistics program to analyze the vast amounts of
agricultural data being collected.
North Carolina State University (NCSU), located in the capital city of Raleigh, North Carolina,
became the leader of the consortium.
In 1972 NIH stopped funding the team, so the consortium members agreed to chip in
$5,000 apiece each year to allow NCSU to continue developing and maintaining the
system and supporting their statistical analysis needs.
In the following years, SAS software was licensed by pharmaceutical companies,
insurance companies and banks, as well as by the academic community that had given
birth to the project.
Jane Helwig, another Statistics Department employee at NCSU, joined the project
as documentation writer.
John Sall, a graduate student and programmer, rounded out the core team.
Incorporation
In 1976 Goodnight, Barr, Helwig and Sall left NCSU and formed
SAS Institute Inc., a private company "devoted to the maintenance and further
development of SAS." They opened offices at 2806 Hillsborough Street,
across from the university.
By 1980 the growing company had outgrown the Hillsborough Street
building, and it moved to the site of its present headquarters just outside
Raleigh in Cary, North Carolina. At that time the company had 20 employees.
While SAS was growing, the entire computer hardware and software industry was
changing, with new operating systems and platforms placing new demands on software
developers. One of the first steps for SAS was to adapt the software to run on IBM's
Disk Operating System (DOS).
Today it runs on many operating systems, including Windows, DOS, z/OS, UNIX and
various UNIX flavors.
SAS came through various difficulties around the millennium and the Y2K frenzy, and
it introduced the new logo and tagline that we see today.
SAS has been named one of FORTUNE magazine's "100 Best Companies to Work For"
every year since 1998 and was ranked No. 1 in 2010.
SAS can connect to many kinds of data sources to read data, which is why SAS is said to have a Multi
Database Architecture.
Data sources include databases (such as Oracle, SQL Server, DB2, Sybase, Teradata, Informix
and MS Access).
[Figure: SAS reading from data sources: Oracle, SQL Server, DB2, Teradata, Sybase, Informix, MS Access, Excel, Notepad, CSV and DAT files]
The functionality of the SAS System is built around the four data-driven tasks.
1. Data access
2. Data management
3. Data analysis
4. Data presentation.
Data access: addresses the data required by the application.
It means reading raw data from a source into the SAS application.
Topics cover the Infile statement, Proc import, SQL pass-through, Libname, Proc access and the DBLOAD
procedure.
Data management: topics cover the Set, Merge, Format, Informat and Update statements, and functions.
Data analysis: analyzes data by using various procedures to find sums, means and
other statistical calculations,
or transforms raw data into meaningful and useful information.
Topics cover statistical procedures to find Sum, Means, Frequency, Univariate, ANOVA,
chi-square, CMH, GLM, Regression, Correlation, STD etc., and reporting procedures such as
Proc print, Report, Tabulate and the _Null_ report.
Data presentation: how you are going to present the output to the end user.
RAW DATA -> DATA STEP -> SAS DATASET -> PROC STEP -> REPORT/INFORMATION
Example:-
Data ds1;
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 ABC M 23 50000
002 DEF F 27 45000
003 MNO F 21 70000
004 JKL F 23 44000
005 XYZ M 25 58000
;
Run;
Data ds2 (drop=age);
Set ds1;
Format sal comma6. ;
Run;
Proc sort data=ds2;
By sex;
Run;
ODS pdf close;
Rules in SAS program
2) Every step (both DATA step and PROC step) ends with a Run statement.
Data types:
Terminology:
Tables are called datasets
Columns are called variables
Rows are called observations
Missing Data:
The values of particular variables may be missing for some observations in that case
Example:-
BLANK
[Figure: the SAS windowing environment, showing the Explorer, Results, Editor and Log windows]
Editor Window:-
In the Windows operating environment, the default editor is the Enhanced Editor. The Enhanced Editor is syntax sensitive and
color codes your programs, making it easier to read them and find mistakes. The
Enhanced Editor also allows you to collapse and expand the various steps in your
program. For other operating environments, the default editor is the
Program Editor.
Log Window:-
The Log window contains information about the program submitted in the Editor window.
Generally we get Notes, Warnings and Errors here.
It also reports how many observations and variables a dataset has and in which library
the dataset is stored.
Output Window
If your program generates any printable results, then it will appear in the Output
window.
Explorer Window
The Explorer window gives you easy access to your SAS libraries and files.
Results Window
The Results window is like a table of contents for your Output window. The results tree
lists each part of your results in an outline form.
Command Bar:-
The command bar is a place where you can type SAS commands.
Most of the commands that you can type in the command bar are also accessible
through the pull-down menus or the toolbar.
Example:-
X Notepad
X Time
X Date
X SQL etc
Include 'sample.sas'
Tool Bar:-
Gives you quick access to commands that are also accessible through the pull-down
menus.
(New)-
Opens a new window.
(Open)
Opens a program saved in a server or PC location.
(Save)
Saves the program, or the contents of the Log or Output window, to a
server location or PC location.
(Print)
Prints the program or the contents of the Log or Output window.
(Print Preview)
Shows a preview of the content before printing.
(Cut)
Cuts the selected program lines in the Editor window.
The same can be done from the keyboard with Ctrl X.
(Copy)-
Copies the selected program lines.
The same can be done from the keyboard with Ctrl C.
(Paste) -
Pastes the cut or copied program lines.
The same can be done from the keyboard with Ctrl V.
(Undo)
Restores program lines that were cut.
The same can be done from the keyboard with Ctrl Z.
(New Library) -
Creates a new library for storing datasets:
Click on this icon,
specify the new library name,
leave the Engine as Default,
select Enable at startup,
browse to the location where the datasets should be stored,
and click OK.
(SAS Explorer)-
Opens the SAS Explorer window.
(Submit)-
Submits the program for execution.
This can be done in several ways:
-> Click on this icon to execute the entire program.
-> Select some program lines and click on this icon; only the
selected program lines are submitted for execution.
-> Select some program lines, right-click on the selected
lines and click Submit Selection to execute the selected
lines, or click Submit All to execute the entire program.
-> In the pull-down menus, click Run, then click Submit.
-> Press F3 on the keyboard.
(Clear All)-
Clears only the Editor window.
Other ways to clean windows
(Help)-
Opens the documents and sample programs that help you learn.
Menu Bar:-
The menu bar, located at the top of the window, contains the following pull-down menus:
File-
Edit:-
-> Undo, redo, cut, copy, paste, clear all, select all,
collapse all, expand all, find and replace.
View:-
-> Reopens any window that has been closed, such as the Enhanced
Editor, Program Editor, Log, Explorer and Output windows.
Tools:-
-> Creates a new library, changes the font type and font size,
and enables listing output and HTML output.
Run:-
-> Submits the SAS program and recalls the last
submitted program.
Solutions:-
For analysis and reporting.
Window:-
Shows which windows are open.
Help:-
Opens the SAS documentation.
SHORT CUT KEYS
Help F1
Execute F3
Recall F4
Log F6
Output F7
Zoom off F8
Short cut keys F9
Underlines First letter of Menus in Menu bar F10
Command Focus F11
Subtop Shift F1
Horizontal zoom Shift F3
Vertical zoom Shift F4
Zoom one on another Shift F5
Left Shift F7
Right Shift F8
Wpopup (Bring up word tip) Shift F10
Hide the current word tip ESC
Libname Ctrl B
Copy Ctrl C
Directory Ctrl D
Clear Ctrl E
Find Ctrl F
Go to line number Ctrl G
Replace Ctrl H
SAS System Options Ctrl I
Log Ctrl L
File name Ctrl Q
RFind Ctrl R
Title Ctrl T
Paste Ctrl V
Cut Ctrl X
Redo Ctrl Y
Undo Ctrl Z
Open Explorer Ctrl W
Execute the last recorded macro Ctrl F1
Move cursor to next case change ALT Right
Move cursor to previous case change ALT Left
Commenting Ctrl /
Uncommenting Ctrl Shift /
Convert the selected text to lowercase Ctrl Shift L
Convert the selected text to uppercase Ctrl Shift U
Note: Press F9 on your keyboard to display the full list of shortcut keys.
SAS Program
RAW DATA -> DATA STEP -> SAS DATASET -> PROC STEP -> REPORT/INFORMATION
Example:-
/* DATA STEP */
Data ds1;
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 ABC M 23 50000
002 DEF F 27 45000
003 MNO F 21 70000
004 XYZ M 25 58000
;
Run;
/* PROC STEP */
Proc print data=ds1;
Run;
Example:-
Data ds1;
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 ABC M 23 50000
002 DEF F 27 45000
003 MNO F 21 70000
004 XYZ M 25 58000
;
Run;
Examples:-
Data ds;
Infile "C:\Documents and Settings\Administrator\Desktop\SAMPLE.txt";
Input id name$ sex$ age sal;
Run;
Example:-
Data ds2;
Set ds;
Run;
We can read data into SAS from different databases such as Oracle, DB2 and Sybase.
Example:-
Proc sql;
Connect to oracle (user=Scott password=tiger);
Create table ds3 as
Select * from connection to oracle
(
Select * from EMP
);
Disconnect from oracle;
Quit;
When a SAS session starts in any mode, there is a Work library. This is a temporary
directory (the default library) where datasets are stored for the SAS session.
All datasets created in the SAS session without a library name are referred to with the Work. prefix.
When the session is closed, all datasets in the Work library are lost.
To keep datasets permanently, create your own library and store the
datasets there.
(Programming method)
LIBNAME Statement
Associates a libref with a SAS library and lists file attributes for a SAS library.
Syntax: - LIBNAME libref 'SAS-library';
LIBNAME MY_SAS 'E:\SAS_CLAS'; (The library name must not exceed 8 characters.)
(Or)
(GUI method)
On the toolbar, click the New Library icon.
Specify the library name (MY_SAS), select Default as the Engine,
select Enable at startup, and browse to the location where the datasets are to be stored
permanently.
Click OK to create the library.
If datasets already exist in that location, they appear in the library.
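The programming method can be sketched as follows. This is a minimal illustration, assuming the folder E:\SAS_CLAS from the LIBNAME example exists on your machine; the dataset name ds1 and its values are made up for the example.

```sas
/* Assumption: E:\SAS_CLAS is an existing folder; change it to suit your system. */
Libname my_sas 'E:\SAS_CLAS';

/* The two-level name my_sas.ds1 stores the dataset permanently in that folder. */
Data my_sas.ds1;
  Input id name $ sex $ age sal;
  Datalines;
001 ABC M 23 50000
002 DEF F 27 45000
;
Run;

/* A one-level name would go to the temporary Work library instead. */
Proc contents data=my_sas.ds1;
Run;
```

After the session is closed and reopened, submitting the same Libname statement makes my_sas.ds1 available again, whereas anything written to Work is gone.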
When a data step is submitted for execution, it first undergoes a syntax check by the
SAS system. If no errors are found, the data step is then compiled and executed. When
executing a data step for in-stream data, the SAS system creates the following three
items.
INPUT BUFFER:-
Each raw record of data is read into this area of memory when an input statement is
executed.
PROGRAM DATA VECTOR:-
The SAS system builds the data set one observation at a time in this area of
memory as the program is executed; values are read from the input buffer or created by
programming statements and assigned to the corresponding variables in the PDV. The
values are then written to a SAS data set as a single observation.
Along with all the variables, the PDV holds 2 automatic variables:
_N_ and _ERROR_
_N_: indicates how many times the data step has iterated.
_N_ starts at 1 and increases by 1 with each iteration, so it can be used to find out how
many observations have been read.
_ERROR_: the default value is 0; when an error is encountered it is set to 1.
Even if there are 100 errors, _ERROR_ is still only 1.
_ERROR_=1 signals a data (logical) error, not a syntax error. A syntax error does not set
_ERROR_; syntax errors appear in the log in red, and the place where the error
occurs is underlined in red.
DESCRIPTOR INFORMATION:-
For each SAS data set, SAS creates and maintains information about the data set and
its variable attributes, such as Length, Label, Format, Informat and data type. To see this
information use the Proc contents procedure.
Proc contents data=Dataset_Name;
Run;
Example:-
Data ds;
Infile datalines;
Input id name age sex$ sal;
Datalines;
001 abc 23 m 5000
002 def 25 f 5600
003 mno 28 f 8000
004 xyz 21 m 6000
;
Run;
(Run above program and see the log for _n_ and _error_ values)
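One way to watch the automatic variables change is to write them to the log on every iteration. The sketch below is illustrative only: the records are made up, and the non-numeric age on the second record is there on purpose to trigger a data error.

```sas
/* _null_ is used so that no dataset is created; we only want the log output. */
Data _null_;
  Input id name $ age;
  /* Name= style output prints each variable with its name in the log. */
  Put _n_= _error_= id= name= age=;
  Datalines;
001 abc 23
002 def xx
003 mno 28
;
Run;
```

In the log, _N_ counts up by one per record, and _ERROR_ switches to 1 only for the iteration that reads the invalid age, where SAS also writes an invalid-data note.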
DATA STEP
DATA DATASET;
INFILE DATALINES;
INPUT ID NAME$ AGE SEX$ SALARY;
DATALINES;
001 ABC 23 M 23000
002 DEF 25 F 25000
003 XYZ 22 M 21000
;
RUN;
DATA STATEMENT
Begins a DATA step and provides names for output SAS data sets
Options
KEEP:-
Specifies variables for processing or for writing to output SAS data sets
Syntax: - KEEP=variable(s)
Examples: -
Data ds1 (keep=name sal);
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;
Data ds;
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;
Data ds1a (keep=name sal);
Set ds;
Run;
Data ds1b (keep=name sal);
Infile "C:\Documents and Settings\Administrator\Desktop\sample.txt";
Input id name$ sex$ age sal;
Run;
DROP: -
Excludes variables from output SAS data sets
Syntax: DROP=variable(s)
Examples: -
Data ds2 (Drop=name sal);
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;
Data ds;
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;
Data ds2a (drop=name sal);
Set ds;
Run;
Data ds2b (drop=name sal);
Infile "C:\Documents and Settings\Administrator\Desktop\sample.txt";
Input id name$ sex$ age sal;
Run;
RENAME: -
Specifies new names for variables in output SAS data sets
Syntax: -
RENAME=(old-name-1=new-name-1 <... old-name-N=new-name-N>)
Examples: -
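A minimal sketch of the RENAME= option, in the same style as the KEEP and DROP examples. The dataset name ds_rename and the new variable names salary and gender are chosen here for illustration only.

```sas
/* sal and sex are read under their original names and renamed on output. */
Data ds_rename (rename=(sal=salary sex=gender));
  Input id name $ sex $ age sal;
  Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;

Proc print data=ds_rename;
Run;
```

Inside the DATA step the variables are still called sal and sex; the new names exist only in the output dataset.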
WHERE: -
Selects observations from SAS data sets that meet a particular condition
Examples: -
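A minimal sketch of the WHERE= option on an output dataset. The dataset name ds_where and the condition are assumptions made for the example.

```sas
/* Only observations that satisfy the condition are written to ds_where. */
Data ds_where (where=(sex='f' and sal>40000));
  Input id name $ sex $ age sal;
  Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;

Proc print data=ds_where;
Run;
```

All four records are read, but only the two observations with sex='f' and a salary above 40000 reach the output dataset.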
REPLACE: -
When we create a dataset with a name that already exists in our SAS
library, by default the second data step replaces the first dataset when it executes. If we
do not want it replaced, use replace=No.
Default replace=Yes.
Examples: -
PW (password): -
To assign the password to data set.
Syntax: pw=password
Examples: -
Data ds6 (pw=sasadmin);
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;
Data ds6b (pw=sasadmin);
Infile "C:\Documents and Settings\Administrator\Desktop\sample.txt";
Input id name$ sex$ age sal;
Run;
Label: -
To assign the label to data set.
Syntax: Label=Name
Examples: -
Data ds7 (Label=sample);
Infile datalines;
Input id name$ sex$ age sal;
Datalines;
001 abc m 23 45000
002 def f 34 67000
003 mno m 21 36000
004 xyz f 27 45000
;
Run;
_Null_
The data set name _null_ is reserved for a special purpose: no data set is
created, but the programming step still executes.
Example:-
DATA _NULL_;
INFILE DATALINES;
INPUT ID NAME$ SEX$ AGE SAL;
DATALINES;
001 ABC M 20 2500
002 DEF M 22 3000
003 XYZ F 21 5000
;
RUN;
Put Statement:-
It writes information to the SAS log.
Syntax: Put Variable(s);
DATA _NULL_;
INFILE DATALINES;
INPUT ID NAME$ SEX$ AGE SAL;
PUT ID NAME$ SEX$ AGE SAL;
DATALINES;
001 ABC M 20 2500
002 DEF M 22 3000
003 XYZ F 21 5000
;
RUN;
Data Health;
Infile Datalines;
Input idno 1-4 name $ 6-24 team $ strtwght endwght;
Loss=strtwght-endwght;
Datalines;
1023 David Shaw         red 189 165
1049 Amelia Serrano     yellow 145 124
1219 Alan Nance         red 210 192
1246 Ravi Sinha         yellow 194 177
1078 Ashley McKnight    red 127 118
1221 Jim Brown          yellow 220 .
1095 Susan Stewart      blue 135 127
1157 Rose Collins       green 155 141
1331 Jason Schock       blue 187 172
1067 Kanoko Nagasaka    green 135 122
1251 Richard Rose       blue 181 166
1192 Charlene Armstrong yellow 152 139
1352 Bette Long         green 156 137
1262 Yao Chen           blue 196 180
1124 Adrienne Fink      green 156 142
1197 Lynne Overby       red 138 125
1133 John VanMeter      blue 180 167
1057 Margie Vanhoy      yellow 146 132
1328 Hisashi Ito        red 155 142
1243 Deanna Hicks       blue 134 122
1177 Holly Choate       red 141 130
1259 Raoul Sanchez      green 189 172
1017 Jennifer Brooks    blue 138 127
1099 Asha Garg          yellow 148 132
;
Run;
/* Report header: the first step creates the file */
Data _null_;
d=today();
t=time();
File 'E:\SAS\TARGET_DATA\DATASTEP_REPORT\Weight_Club4.rtf'
Linesize=200;
Put ' ';
Put ' ';
Put @2 'REQUEST # I11796-S'
@35 'NEERU TECHNOLOGIES'
@70 "PAGE 1 ";
Put @2 'RUN DATE:' d ddmmyys10.
@31 'INFORMATION CENTER REQUEST'
@70 'RUNTIME:' t time8.;
Put ' ';
Put ' ';
Run;
/* Column headings: MOD appends to the existing file */
Data _null_;
File 'E:\SAS\TARGET_DATA\DATASTEP_REPORT\Weight_Club4.rtf'
Linesize=200 mod;
Put @2 'EMP_ID' @10 'EMP_NAME' @30 'TEAM' @42 'STRTWGHT' @55 'ENDWGHT';
Put ' ';
Run;
/* Detail lines: one line per observation */
Data _null_;
Set My_SAS.wghtclub;
File 'E:\SAS\TARGET_DATA\DATASTEP_REPORT\Weight_Club4.rtf'
Linesize=200 mod;
Put @2 idno @10 name @30 team
@42 strtwght @55 endwght;
Run;
/* Report footer */
Data _null_;
File 'E:\SAS\TARGET_DATA\DATASTEP_REPORT\Weight_Club4.rtf'
Linesize=200 mod;
Put;
Put;
Put;
Put @2 '*************END OF REPORT *************************';
Put @2 '**********GENERATED BY Mr.KRISHNA *******************';
Run;
In the above example we are sending the information to an RTF file anyway, so creating
datasets in a library would be a waste of space.
Using _Null_ we can save space on the workspace server.
DLM (or) DELIMITER:-
When data values are separated by special characters in raw data or an external file,
use the DLM option to read the data.
The special character must be enclosed in quotes.
Examples:-
Data ds3;
Infile datalines dlm=',';
Input id name$ sex$ age sal;
Datalines;
001,abc,m,23,45000
002,def,f,34,67000
003,mno,m,21,36000
004,xyz,f,27,45000
;
Run;
Data ds3a;
Infile datalines dlm='*';
Input id name$ sex$ age sal;
Datalines;
001*abc*m*23*45000
002*def*f*34*67000
003*mno*m*21*36000
004*xyz*f*27*45000
;
Run;
Data ds3b;
Infile datalines dlm="*"; /* double quotes work as well as single quotes */
Input id name$ sex$ age sal;
Datalines;
001*abc*m*23*45000
002*def*f*34*67000
003*mno*m*21*36000
004*xyz*f*27*45000
;
Run;
Data ds3c;
Infile datalines dlm='* ,' ;
Input id name$ sex$ age sal;
Datalines;
001*abc,m*23*45000
002*def,f*34*67000
003*mno*m*21,36000
004*xyz*f*27,45000
;
Run;
Data ds3c1;
Infile datalines DELIMITER='* ,' ;
Input id name$ sex$ age sal;
Datalines;
001*abc,m*23*45000
002*def,f*34*67000
003*mno*m*21,36000
004*xyz*f*27,45000
;
Run;
Data ds3d;
Infile "C:\Documents and Settings\Administrator\Desktop\sample.txt" dlm='* ,';
Input id name$ sex$ age sal;
Run;
DLMSTR:-
When data values are separated by a string in raw data or an external file,
use the DLMSTR option to read the data.
The string must be enclosed in quotes and is case sensitive.
Example:-
Data ds3e;
Infile datalines dlm='a' ;
Input X Y Z;
Datalines;
1a2a3
4a5a6
7a8a9
;
Run;
Data ds3e1;
Infile datalines dlm='a' ;
Input X Y$ Z;
Datalines;
1ama3
4afa6
7ama9
;
Run;
Data ds3e2;
Infile datalines dlmstr='PRD' ;
Input X Y Z;
Datalines;
1PRD2PRD3
4PRD5PRD6
7PRD8PRD9
;
Run;
DLMSOPT=Options :-
A string delimiter must appear in the same case everywhere in the data;
if it does not, use dlmsopt='i' to read the data properly.
Options=i
specifies that case-insensitive comparisons will be done.
Options=t
specifies that trailing blanks of the string delimiter will be removed.
Example:-
Data ds3e4;
Infile datalines dsd dlmstr='PRD' dlmsopt='i';
Input X Y Z;
Datalines;
1PRD2PRd3
4PrD5Prd6
7pRd8pRD9
;
Run;
FIRSTOBS:-
Specifies the first record at which processing starts.
Examples:-
Data ds4;
Infile datalines dlm='*' firstobs=2;
Input id name$ age sex$ sal;
Datalines;
001*Joseph*25*m*4500
002*Mitchel*24*m*3500
003*john*21*f*2500
004*miller*22*f*3000
005*brans*30*m*5000
;
Run;
Data ds4a;
Infile "C:\Documents and Settings\Administrator\Desktop\sample.txt" dlm='*' firstobs=2;
Input id name$ sex$ age sal;
Run;
OBS:-
Specify the observation at which processing ends.
Examples:-
Data ds5;
Infile datalines dlm='*' obs=3;
Input id name$ age sex$ sal;
Datalines;
001*Joseph*25*m*4500
002*Mitchel*24*m*3500
003*john*21*f*2500
004*miller*22*f*3000
005*brans*30*m*5000
;
Run;
Data ds5a;
Infile "C:\Documents and Settings\Administrator\Desktop\sample.txt" dlm='*' firstobs=4;
Input id name$ sex$ age sal;
Run;
Data ds5b;
Infile datalines dlm='*' firstobs=2 obs=4;
Input id name$ age sex$ sal;
Datalines;
001*Joseph*25*m*4500
002*Mitchel*24*m*3500
003*john*21*f*2500
004*miller*22*f*3000
005*brans*30*m*5000
;
Run;
Data ds5c;
Infile "C:\Documents and Settings\Administrator\Desktop\sample.txt" dlm='*' firstobs=2 obs=4;
Input id name$ sex$ age sal;
Run;
FLOWOVER
This is the default. It causes the INPUT statement to jump to the next record if it
does not find values for all variables.
Examples:-
Data ds6;
Infile datalines flowover;
Input Id Type$ Amount;
Datalines;
101 x 3400
102 x 2000
103 y 3400
104 y 2500
105 x 3000
;
Run;
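The effect of FLOWOVER is easiest to see on a short record. The following sketch uses made-up data (the dataset name ds_flow is an assumption) in which the first record has no Amount.

```sas
Data ds_flow;
  Infile datalines flowover;
  Input Id Type $ Amount;
  Datalines;
101 x
102 x 2000
103 y 3400
;
Run;

Proc print data=ds_flow;
Run;
```

Because the first record is short, the INPUT statement moves to the second record to find Amount and reads 102 into it; the rest of that record is discarded. The result is two observations (101 x 102 and 103 y 3400), and the log notes that SAS went to a new line when the INPUT statement reached past the end of a line.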
MISSOVER
When values are missing at the end of a data record in raw data or an external file,
use Missover in the Infile statement so that the remaining variables are set to missing.
Missing values are represented by a period (.) for numeric variables and a blank for character variables.
When data is separated by a delimiter we can specify the DLM option, which also lets SAS read
missing values, but when data is separated by spaces we should use MISSOVER.
Examples:-
Data ds7;
Infile datalines Missover;
Input Id Type$ Amount;
Datalines;
101 x
102 x 2000
103 y 3400
104 y
105 x 3000
;
Run;
Data ds7a;
Infile datalines Missover;
Input id name$ sex$ age sal;
Datalines;
001 ABC M 23 50000
002 DEF F 27
003 MNO F 21 70000
004 XYZ M 25 58000
;
Run;
Data ds7b;
Infile datalines Missover ;
Input Lname$ Fname$ Emp_id$ Job_code$;
Datalines;
LANGKAMM SARAH E0045 Mechanic
TORRES JAN E0029 Pilot
SMITH MICHAEL E0065
LEISTNER COLIN E0116 Mechanic
TOMAS HARALD
WAUGH TIM E0204 Pilot
;
Run;
In the example below MISSOVER is not specified: the data is separated by a special
character and the missing fields are left blank between the delimiters, so they are read as missing values.
Data ds7c;
Infile datalines dlm='*' ;
Input id name$ sex$ age sal;
Datalines;
001*abc*m*23*45000
002* * *34*67000
003*mno*m*21*
004*xyz* *27*45000
;
Run;
STOPOVER
Stops the DATA step when it reads a short line:
it causes the DATA step to stop execution immediately and write a note to the SAS log.
Example:-
Data ds8;
Infile datalines stopover;
Input Id Type$ Amount;
Datalines;
101 x
102 x 2000
103 y 3400
104 y
105 x 3000
;
Run;
INPUT STATEMENT
Describes the order in which data values are entered, the names of the SAS variables and
their types.
Use the input statement only for data values stored in external files or for
data immediately following a cards or datalines statement.
Data ds1;
Infile datalines;
Input idno name $ team $ strtwght endwght;
Cards;
1023 David red 189 165
1049 Amelia yellow 145 124
1219 Alan red 210 192
1246 Ravi yellow 194 177
1078 Ashley red 127 118
1221 Jim yellow 220 .
;
Run;
Column Input:
The column numbers follow the variable name in the input statement; these
numbers indicate where the variable values are found in the input data records.
When to Use Column Input
With column input, the column numbers that contain the value follow a variable name in
the INPUT statement. To read with column input, data values must be in:
the same columns in all the input data records
standard numeric form or character form
Useful features of column input are that
Character values can contain embedded blanks.
Character values can be from 1 to 32,767 characters long.
Input values can be read in any order, regardless of their position in the record.
EX: input name $ 1-10 Sal 11-15;
Data ds2;
Infile datalines;
Input idno 1-4 name $ 6-23 team $ 25-30 strtwght 32-34 endwght 36-38;
Datalines;
1023 David Shaw         red    189 165
1049 Amelia Serrano     yellow 145 124
1219 Alan Nance         red    210 192
1246 Ravi Sinha         yellow 194 177
1078 Ashley McKnight    red    127 118
1221 Jim Brown          yellow 220
;
Run;
List Input:
Data DS3;
Infile datalines;
Input idno 4. Name $19. Team $7. Strtwght 4. Endwght 4. ;
Cards;
1023 David Shaw         red    189 165
1049 Amelia Serrano     yellow 145 124
1219 Alan Nance         red    210 192
1078 Ashley McKnight    red    127 118
1221 Jim Brown          yellow 220 232
;
Run;
Modified List Input
&
This format modifier lets a character value contain single embedded blanks: it reads the value
from the next non-blank column until the pointer reaches two consecutive blanks,
the defined length of the variable, or the end of the input line, whichever comes first.
Data ds14b;
Infile datalines;
Input idno name &$ team $ strtwght endwght;
Datalines;
1023 David Shaw  red 189 165
1049 Amelia Serrano  yellow 145 124
1219 Alan Nance  red 210 192
1246 Ravi Sinha  yellow 194 177
1078 Ashley McKnight  red 127 118
1221 Jim Brown  yellow 220 .
;
Run;
Data ds14c;
Infile datalines;
Input idno name &$18. team $ strtwght endwght;
Cards;
1023 David Shaw  red 189 165
1049 Amelia Serrano  yellow 145 124
1219 Alan Nance  red 210 192
1246 Ravi Sinha  yellow 194 177
1078 Ashley McKnight  red 127 118
1221 Jim Brown  yellow 220 .
;
Run;
Restriction: The & modifier must follow the variable name and $ sign that it affects.
:
Enables you to specify an informat that INPUT statement uses to read the
variable value.
For a character variable, this format modifier reads the value from the next non-blank
column until the pointer reaches the next blank column, the defined length of the
variable, or the end of the data line, whichever comes first. For a numeric variable, this
format modifier reads the value from the next non-blank column until the pointer
reaches the next blank column or the end of the data line, whichever comes first.
/*wrong*/
Data ds14d;
Infile datalines;
Input item $10. Amount;
Datalines;
Trucks 1382
Vans 1235
Sedans 2391
;
Run;
/*right*/
Data ds14d;
Infile datalines;
Input item: $10. Amount;
Datalines;
Trucks 1382
Vans 1235
Sedans 2391
;
Run;
~
Indicates to treat single quotation marks, double quotation marks, and
delimiters in character values in a special way. This format modifier reads
delimiters within quoted character values as characters instead of as delimiters and
retains the quotation marks when the value is written to a variable.
Restriction: You must use the DSD option in an INFILE statement. Otherwise, the INPUT
statement ignores this option.
Data ds14e;
Infile datalines dsd;
Input id name ~ $ sex$ age sal;
Datalines;
001,"abc",m,23,45000
002,"def",f,34,67000
003,"mno",m,21,36000
004,"xyz",f,27,45000
;
Run;
Data ds14f;
Infile datalines dsd;
Input Name: $9. Score1-Score3 Team ~ $25. Div $;
Datalines;
Joseph,11,32,76,"Red Racers, Washington",AAA
Mitchel,13,29,82,"Blue Bunnies, Richmond",AAA
Sue Ellen,14,27,74,"Green Gazelles, Atlanta",AA
;
Run;
+N: Moves the pointer N columns.
Data dsl;
Infile datalines;
Input team $6. +6 points 2.;
Cards;
red         59
blue        95
yellow      63
green       76
;
Run;
Multiple Input Statements
We can write multiple input statements, or use the # pointer control, to read the data when one record
is spread over multiple lines.
Data ds14g;
Infile datalines;
Input Idno 1-4 name $ 6-20;
Input team $1-6;
Input strtwght 1-3 endwght 5-7;
Cards;
1023 David Shaw
red
189 165
1049 Amelia Serrano
yellow
145 124
1219 Alan Nance
red
210 192
1078 Ashley McKnight
red
127 118
1221 Jim Brown
yellow
220 .
;
Run;
Data ds14h;
Input Idno 1-4;
Input;
Input strtwght 1-3 endwght 5-7;
Cards;
1023 David Shaw
red
189 165
1049 Amelia Serrano
yellow
145 124
1219 Alan Nance
red
210 192
1246 Ravi Sinha
yellow
194 177
1078 Ashley McKnight
red
127 118
1221 Jim Brown
yellow
220 .
;
Run;
#N: Moves the pointer to record N.
Data ds14i;
Input #1 name $ 6-23 idno 1-4
#2 team $ 1-6
#3 strtwght 1-3 endwght 5-7;
Cards;
1023 David Shaw
red
189 165
1049 Amelia Serrano
yellow
145 124
1219 Alan Nance
red
210 192
1246 Ravi Sinha
yellow
194 177
;
Run;
/
Advances the pointer to column 1 of the next input record.
Data ds14k;
Infile datalines;
Input idno 1-4/ / strtwght 1-3 endwght 5-7;
Datalines;
1023 David Shaw
red
189 165
1049 Amelia Serrano
yellow
145 124
1219 Alan Nance
red
210 192
1246 Ravi Sinha
yellow
194 177
1078 Ashley McKnight
red
127 118
1221 Jim Brown
yellow
220 .
;
Run;
Formatted Input:
An informat follows the variable name in the input statement.
The informat gives the data type and the field width of an input value. Informats also
make it possible to read data that are stored in non-standard form, such as packed decimals or numbers
that contain special characters such as commas.
Data ds15;
Infile datalines;
Input @1 idno 4. @6 name $18. @25 team $6. @32 strtwght 3. @36 endwght 3.;
Datalines;
1023 David Shaw         red    189 165
1049 Amelia Serrano     yellow 145 124
1219 Alan Nance         red    210 192
1246 Ravi Sinha         yellow 194 177
1078 Ashley McKnight    red    127 118
1221 Jim Brown          yellow 220
;
Run;
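To illustrate the point about numbers that contain commas, here is a small sketch using the COMMA informat. The dataset name ds_comma and the values are made up; the colon modifier is used so that the informat is applied to a value found by list input rather than to fixed columns.

```sas
Data ds_comma;
  /* comma7. strips the commas and stores sal as an ordinary number */
  Input name $ sal : comma7.;
  Datalines;
abc 45,000
def 1,250
;
Run;

Proc print data=ds_comma;
Run;
```

Without an informat, the value 45,000 would be treated as invalid numeric data and sal would be set to missing.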
Named input:
We specify the name of the variable followed by an equal sign. SAS looks for a
variable name and an equal sign in the input record.
Data ds16;
Infile datalines;
Input Id= Name=$18. Team=$6. Strtwght= Endwght=3. ;
Cards;
ID=1023 NAME=David Shaw TEAM=red strtwght=189 endwght=165
ID=1049 NAME=Amelia Serrano TEAM=yellow strtwght=145 endwght=124
ID=1219 NAME=Alan Nance TEAM=red strtwght=210 endwght=192
ID=1246 NAME=Ravi Sinha TEAM=yellow strtwght=194 endwght=177
ID=1078 NAME=Ashley McKnight TEAM=red strtwght=127 endwght=118
ID=1221 NAME=Jim Brown TEAM=yellow strtwght=220
;
Run;
Data DS16a;
Infile datalines;
Input Id= Name=$18. Team=$6. Strtwght= Endwght=3. ;
Cards;
NAME=David Shaw TEAM=red strtwght=189 endwght=165 ID=1023
NAME=Amelia Serrano TEAM=yellow strtwght=145 endwght=124 ID=1049
NAME=Alan Nance TEAM=red strtwght=210 endwght=192 ID=1219
ID=1246 NAME=Ravi Sinha TEAM=yellow strtwght=194 endwght=177
ID=1078 NAME=Ashley McKnight TEAM=red strtwght=127 endwght=118
ID=1221 NAME=Jim Brown TEAM=yellow strtwght=220
;
Run;
When to Use Named Input:
Named input reads the input data records that contain a variable name followed by an
equal sign and a value for the variable
The INPUT statement reads the input data record at the current location of the input
pointer. If the input data records contain data values at the start of the record that the
INPUT statement cannot read with named input, use another input style to read
them.
Null Input:
The INPUT statement with no arguments (variables) is called a null INPUT. The
DATA step copies records from the input file to the output file without creating any SAS
variables.
Data ds17;
Input;
Datalines;
1023 David Shaw red 189 165
1049 Amelia Serrano yellow 145 124
1219 Alan Nance red 210 192
1246 Ravi Sinha yellow 194 177
1078 Ashley McKnight red 127 118
1221 Jim Brown yellow 220
;
Run;
In the above program, when the input statement executes, one record at a time is stored in the
input buffer; the PDV would then pick data from the input buffer and assign it to the corresponding variables.
But there are no variables, so it creates a dataset with zero variables.
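A null INPUT is often paired with the automatic variable _INFILE_, which holds the current contents of the input buffer. The sketch below is illustrative only (the records are made up): it copies each record to the log without creating any variables or any dataset.

```sas
Data _null_;
  Input;            /* null INPUT: loads the next record into the input buffer */
  Put _infile_;     /* writes the whole record to the log */
  Datalines;
1023 David Shaw red 189 165
1049 Amelia Serrano yellow 145 124
;
Run;
```

Adding a FILE statement before the PUT would send the records to an external file instead of the log.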
Mixed Input:
An input statement that combines several input styles is called mixed input.
EX: input city = $1-8. State = $6. Date mmddyy8.
Data ds18;
Infile datalines;
Input Idno Name $ 6-23 @25 Team $7. Strtwght 3. Endwght 36-38;
Cards;
1023 David Shaw         red    189 165
1049 Amelia Serrano     yellow 145 124
1219 Alan Nance         red    210 192
1246 Ravi Sinha         yellow 194 177
1078 Ashley McKnight    red    127 118
1221 Jim Brown          yellow 220
;
Run;
INPUT SPECIFICATIONS
@ (Single Trailing)
Holds an input record for the execution of the next INPUT statement within the same
iteration of the DATA step.
Restriction: The trailing @ must be the last item in the INPUT statement.
Data redteam;
Infile datalines;
Input team $ 13-18@;
If team='red';
Input idno 1-4 strtwght 20-23 endwght 24-26;
Cards;
1023 David  red    189 165
1049 Amelia yellow 145 124
1219 Alan   red    210 192
1246 Ravi   yellow 194 177
1078 Ashley red    127 118
1221 Jim    yellow 220 .
;
Run;
Data ds;
Input id 1-3@;
If id in (101,103,105) ;/*if id=101; ---- Only for one value*/
Input name$ 5-7 sal 9-12;
Datalines;
101 abc 1000
102 asd 2000
103 dfg 3000
104 hjk 3400
105 xyz 5000
;
Run;
@@ (Double Trailing)
Holds an input record for the execution of the next INPUT statement across iterations of
the DATA step.
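A small sketch of the double trailing @@, assuming several id/score pairs are packed onto each data line. The dataset name scores and the values are hypothetical.

```sas
/* Hypothetical example: each data line holds several observations */
Data scores;
   Input id score @@;   /* @@ holds the line across iterations of the DATA step */
   Datalines;
101 85 102 90 103 78
104 88 105 92
;
Run;
```

Each iteration reads one id/score pair from the held line, so the two data lines above produce five observations.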
DATALINES STATEMENT
Indicates that data lines follow
Syntax:- DATALINES;
Use the DATALINES statement with an INPUT statement to read data that you enter
directly in the program.
The DATALINES statement is the last statement in the DATA step and immediately
precedes the first data line
/*By default (NOCARDIMAGE), SAS processes data lines longer than 80 columns in their entirety*/
/*If the CARDIMAGE system option is in effect, SAS processes data lines exactly like 80-byte*/
/*punched card images padded with blanks*/
Use the DATALINES statement whenever data does not contain semicolons
If data contains semicolons use DATALINES4 statement
Example: - (datalines)
Data health;
Infile datalines;
/*The & modifier lets name contain single blanks; two blanks end the value*/
Input id name & $18. Sex $ RBC WBC;
Datalines;
1023 David Shaw  f 1900 120
1049 Amelia Serrano  m 2000 125
1219 Alan Nance  m 2100 130
1246 Ravi Sinha  f 2050 122
1078 Ashley McKnight  f 2200 150
;
Run;
Example: - (datalines4)
Data health;
Infile datalines;
Input id name & $18. Sex $ RBC WBC;
Datalines4;
1023 David Shaw  f 1900 120 ;
1049 Amelia Serrano  m 2000 125 ;
1219 Alan Nance  m 2100 130 ;
1246 Ravi Sinha  f 2050 122 ;
1078 Ashley McKnight  f 2200 150 ;
;;;;
Run;
Data health;
Infile datalines;
Input id name & $18. Sex $ RBC WBC 30-32;
Datalines4;
1023 David Shaw       f 1900 120;
1049 Amelia Serrano   m 2000 125;
1219 Alan Nance       m 2100 130;
1246 Ravi Sinha       f 2050 122;
1078 Ashley McKnight  f 2200 150;
;;;;
Run;
INFORMAT STATEMENT
An informat is an instruction that SAS uses to read data values into a variable.
Informats are usually specified in an Input statement. When coded in an Informat
statement, they attach an informat to a variable for subsequent input.
Informats can also be user-written.
Syntax: - INFORMAT variable-1 <informat-1> ... variable-N <informat-N>;
Categories of Informats:-
Character Informats: -
Reads character data into character variables.
Syntax: -$informatw.
Ex: - $
$10.
$20.
$Char.
Examples:-
Data infmt6;
Infile datalines;
Input id name$ age sex$ sal;
Datalines;
001 David 23 m 50000
002 Amelia 32 f 25000
003 Alan 31 f 30000
004 Ravi 21 m 45000
005 Jim 35 f 28000
;
Run;
Data informat6a;
Infile datalines;
Input idno name & $18. team $ strtwght endwght;
Cards;
1023 David Shaw  red 189 165
1049 Amelia Serrano  yellow 145 124
1219 Alan Nance  red 210 192
1246 Ravi Sinha  yellow 194 177
1078 Ashley McKnight  red 127 118
1221 Jim Brown  yellow 220 .
;
Run;
Numeric Informats: -
Reads numeric data values into numeric variables.
Syntax: -informatw.d
Ex: - comma12.
dollar10.2
Examples:-
Data infmt1;
Infile datalines;
Input id name$ age sex$ sal;
Datalines;
001 David 23 m 50000
002 Amelia 32 f 25000
003 Alan 31 f 30000
004 Ravi 21 m 45000
005 Jim 35 f 28000
;
Run;
In the above example the data is clean, so it can be read without an informat. In the
program below, the sal values contain commas. sal is a numeric variable and the comma is a
special character, so standard numeric input cannot read these values. When numeric data
contains commas or dollar signs, specify a numeric informat such as comma. or dollar.
as shown below.
Data infmt2;
Infile datalines;
Input id name$ age sex$ sal comma6.;
Datalines;
001 David 23 m 50,000
002 Amelia 32 f 25,000
003 Alan 31 f 30,000
004 Ravi 21 m 45,000
005 Jim 35 f 28,000
;
Run;
Data infmt3;
Infile datalines;
Input id name$ age sex$ sal;
Informat sal comma6.;
Datalines;
001 David 23 m 50,000
002 Amelia 32 f 25,000
003 Alan 31 f 30,000
004 Ravi 21 m 45,000
005 Jim 35 f 28,000
;
Run;
Data infmt4;
Infile datalines;
Input id name$ age sex$ sal dollar6.; /*width 6 so that $28000 is read in full*/
Datalines;
001 David 23 m $5000
002 Amelia 32 f $2500
003 Alan 31 f $3000
004 Ravi 21 m $4500
005 Jim 35 f $28000
;
Run;
Data infmt5;
Infile datalines;
/*An informat can be written in the Input statement, after the variable,*/
/*or as a separate Informat statement like the commented line below*/
Input id name$ age sex$ sal dollar6.;
/*Informat sal dollar6.;*/
Datalines;
001 David 23 m $5,000
002 Amelia 32 f $2,500
003 Alan 31 f $3,000
004 Ravi 21 m $4,500
005 Jim 35 f $2,800
;
Run;
Date and time Informats:-
Reads date, time and datetime values into numeric variables.
Syntax: - informatw.
Ex: - date7.     Ex: - 23Oct09
      date9.     Ex: - 23Oct2009
      ddmmyy8.   Ex: - 23/10/09
      ddmmyy10.  Ex: - 23/10/2009
      anydtdte.  Use this when the layout of the date values is not known in advance
      time.      Ex: - 06:33:45
      datetime.  Ex: - 23Oct09:06:33:45
Examples:-
Data infmt7;
Infile datalines;
Input id name$ age sex$ sal dob;
Datalines;
001 David 23 m 50000 10Feb1983
002 Amelia 32 f 25000 15May1984
003 Alan 31 f 30000 21Jul1984
004 Ravi 21 m 45000 05Aug1984
005 Jim 35 f 28000 30Jan1985
;
Run;
In SAS, dates are stored as numeric values, but the DOB values in the above example
contain letters, so they cannot be read as plain numbers. To read dates, use a date
informat as shown below.
Data infmt7a;
Infile datalines;
Input id name$ age sex$ sal dob date9.;
/*Informat dob date9.;*/
Datalines;
001 David 23 m 50000 10Feb1983
002 Amelia 32 f 25000 15May1984
003 Alan 31 f 30000 21Jul1984
004 Ravi 21 m 45000 05Aug1984
005 Jim 35 f 28000 30Jan1985
;
Run;
Data infmt7b;
Infile datalines;
Input id name$ age sex$ sal dob date7.;
/*Informat dob date7.;*/
Datalines;
001 David 23 m 50000 10Feb83
002 Amelia 32 f 25000 15May84
003 Alan 31 f 30000 21Jul84
004 Ravi 21 m 45000 05Aug84
005 Jim 35 f 28000 30Jan85
;
Run;
Data infmt7c;
Infile datalines;
Input id name$ age sex$ sal dob anydtdte.;
/*Informat dob date9.;*/
Datalines;
001 David 23 m 50000 10Feb1983
002 Amelia 32 f 25000 15May1984
003 Alan 31 f 30000 21Jul1984
004 Ravi 21 m 45000 05Aug1984
005 Jim 35 f 28000 30Jan1985
;
Run;
Data infmt8;
Infile datalines;
Input id name$ age sex$ sal dob date9. doj:ddmmyy10.;
/*Input id name$ age sex$ sal dob anydtdte9. doj:anydtdte10.;*/
Datalines;
001 David 23 m 50000 10Feb1983 12/01/2011
002 Amelia 32 f 25000 15May1984 15/01/2011
003 Alan 31 f 30000 21Jul1984 31/01/2011
004 Ravi 21 m 45000 05Aug1984 25/02/2011
005 Jim 35 f 28000 30Jan1985 08/03/2011
;
Run;
Data infmt8a;
Infile datalines;
Input id name$ age sex$ sal dob date9. doj:ddmmyy8. ;
/*Informat dob date9. doj ddmmyy8.;*/
Datalines;
001 David 23 m 50000 10Feb1983 12/01/11
002 Amelia 32 f 25000 15May1984 15/01/11
003 Alan 31 f 30000 21Jul1984 31/01/11
004 Ravi 21 m 45000 05Aug1984 25/02/11
005 Jim 35 f 28000 30Jan1985 08/03/11
;
Run;
Data infmt8b;
Infile datalines;
Input id name$ age sex$ sal dob doj;
Informat dob date9. doj ddmmyy10.;
Datalines;
001 David 23 m 50000 10Feb1983 12/01/2011
002 Amelia 32 f 25000 15May1984 15/01/2011
003 Alan 31 f 30000 21Jul1984 31/01/2011
004 Ravi 21 m 45000 05Aug1984 25/02/2011
005 Jim 35 f 28000 30Jan1985 08/03/2011
;
Run;
Data infmt9;
Infile datalines;
Input id name$ age sex$ sal dob ;
Informat dob datetime.;
Datalines;
001 David 23 m 50000 10Feb1983:10:30:15
002 Amelia 32 f 25000 15May1984:11:23:23
003 Alan 31 f 30000 21Jul1984:08:34:45
004 Ravi 21 m 45000 05Aug1984:12:43:56
005 Jim 35 f 28000 30Jan1985:03:35:12
;
Run;
Data infmt9a;
Infile datalines;
Input id name$ age sex$ sal dob ;
Informat dob time.;
Datalines;
001 David 23 m 50000 10:30:15
002 Amelia 32 f 25000 11:23:23
003 Alan 31 f 30000 08:34:45
004 Ravi 21 m 45000 12:43:56
005 Jim 35 f 28000 03:35:12
;
Run;
Column binary Informats:-
Reads data stored in column-binary or multi-punched form into character and numeric
variables.
Ex: - row12.3, $cb4.
FORMAT STATEMENT
Categories of Formats:-
Character Formats: -
Writes character data values from character variables.
Character informats and character formats use the same names.
Syntax: - $formatw.
Ex: - $
$10.
$20.
Examples:-
Data fmt5;
Infile datalines;
Input id name$ age sex$ sal;
Datalines;
001 David 23 m 50000
002 Amelia 32 f 25000
003 Alan 31 f 30000
004 Ravi 21 m 45000
005 Jim 35 f 28000
;
Run;
Data fmt5a;
Infile datalines;
Input idno name & $18. team $ strtwght endwght;
Datalines;
1023 David Shaw  red 189 165
1049 Amelia Serrano  yellow 145 124
1219 Alan Nance  red 210 192
1246 Ravi Sinha  yellow 194 177
1078 Ashley McKnight  red 127 118
1221 Jim Brown  yellow 220 .
;
Run;
Numeric Formats: -
Writes numeric data values from numeric variables.
Syntax: -formatw.d
Ex: - dollar10.
dollar10.2
comma10.
comma10.2
percent4.2
best10.
Examples:-
Data fmt1;
Infile datalines;
Input id name$ age sex$ sal comma6.;
Format sal comma6.;
/*Format sal comma9.2;*/
Datalines;
001 David 23 m 50,000
002 Amelia 32 f 25,000
003 Alan 31 f 30,000
004 Ravi 21 m 45,000
005 Jim 35 f 28,000
;
Run;
Data fmt2;
Infile datalines;
Input id name$ age sex$ sal;
Informat sal comma9.2;
Format sal comma9.2;
Datalines;
001 David 23 m 50,000.55
002 Amelia 32 f 25,000.00
003 Alan 31 f 30,000.60
004 Ravi 21 m 45,000.77
005 Jim 35 f 28,000.50
;
Run;
Data fmt3;
Infile datalines;
Input id name$ age sex$ sal dollar6. ; /*width 6 so that $28000 is read in full*/
Format sal dollar6.;
Datalines;
001 David 23 m $5000
002 Amelia 32 f $2500
003 Alan 31 f $3000
004 Ravi 21 m $4500
005 Jim 35 f $28000
;
Run;
Data fmt3a;
Infile datalines;
Input id name$ age sex$ sal dollar6. ;
/*Format sal dollar6.;*/
Format sal dollar9.2;
/*Format sal comma6.;*/
Datalines;
001 David 23 m $5,000
002 Amelia 32 f $2,500
003 Alan 31 f $3,000
004 Ravi 21 m $4,500
005 Jim 35 f $2,000
;
Run;
Date and time Formats:-
Examples:-
Data fmt6;
Infile datalines;
Input id name$ age sex$ sal dob date9.;
Format dob date9.;
/*Format dob date7.;*/
/*Format dob date9.;*/
/*Format dob ddmmyy8.;*/
/*Format dob ddmmyy10.;*/
/*Format dob worddate20.;*/
/*Format dob weekdate30.;*/
/*Format dob yymmddN8.;*/
/*Format dob yymmddS8.;*/
/*Format dob yymmddS10.;*/
/*Format dob yymmddD8.;*/
/*Format dob yymmddD10.;*/
/*Format dob yymmddC8.;*/
/*Format dob yymmddC10.;*/
Datalines;
001 David 23 m 50000 10Feb1983
002 Amelia 32 f 25000 15May1984
003 Alan 31 f 30000 21Jul1984
004 Ravi 21 m 45000 05Aug1984
005 Jim 35 f 28000 30Jan1985
;
Run;
Data fmt7;
Infile datalines;
Input id name$ age sex$ sal dob date9. doj:ddmmyy10. ;
Format dob worddate20. doj weekdate30.;
Datalines;
001 David 23 m 50000 10Feb1983 12/01/2011
002 Amelia 32 f 25000 15May1984 15/01/2011
003 Alan 31 f 30000 21Jul1984 31/01/2011
004 Ravi 21 m 45000 05Aug1984 25/02/2011
005 Jim 35 f 28000 30Jan1985 08/03/2011
;
Run;
Data fmt8;
Infile datalines;
Input id name$ age sex$ sal dob ;
Informat dob datetime.;
Format dob datetime.;
/*Format dob datetime20.;*/
Datalines;
001 David 23 m 50000 10Feb1983:10:30:15
002 Amelia 32 f 25000 15May1984:11:23:23
003 Alan 31 f 30000 21Jul1984:08:34:45
004 Ravi 21 m 45000 05Aug1984:12:43:56
005 Jim 35 f 28000 30Jan1985:03:35:12
;
Run;
Data fmt9;
Infile datalines;
Input id name$ age sex$ sal dob ;
Informat dob time.;
Format dob time.;
/*Format dob time8.;*/
/*Format dob time5.;*/
Datalines;
001 David 23 m 50000 10:30:15
002 Amelia 32 f 25000 11:23:23
003 Alan 31 f 30000 08:34:45
004 Ravi 21 m 45000 12:43:56
005 Jim 35 f 28000 03:35:12
;
Run;
Proc format;
value $gen 'f'='Female'
'm'='Male';
Run;
Data fmt9b;
Set fmt9;
Format sex $gen.;
Run;
Proc report data=fmt9 nowd;
Column id name age sex sal dob;
Define sex / display format=$gen.;
Run;
LENGTH STATEMENT
Specifies the number of bytes for storing variable values.
We can assign length for variables.
Syntax: - LENGTH variable(s) <$> length;
Examples:-
Data DS1;
Length name $10.;
Infile datalines;
Input id name$ sex$ age sal ;
Datalines;
001 Ronald m 23 50000
002 Clark f 22 34500
003 Roopa f 26 45000
;
Run;
Data ds2;
Length name $10.;
Set ds1;
Run;
The Length statement must always be written before the Input or Set statement.
Otherwise the variable keeps whatever length it gets when Input or Set executes (the
default is 8), and that length goes into the output dataset.
LABEL STATEMENT
Assigns descriptive labels to variables.
Syntax: - LABEL variable-1='label-1' . . . <variable-n='label-n'>;
LABEL variable-1=' ' . . . <variable-n=' '>;
Examples:-
Data DS1;
Infile datalines;
Input id name$ sex$ age sal ;
Label name='Emp Name' sex='Gender' sal='Income';
Datalines;
001 Ronald m 23 50000
002 Clark f 22 34500
003 Roopa f 26 45000
;
Run;
Data DS2;
Infile datalines;
Label name='Emp Name' sex='Gender' sal='Income';
Input id name$ sex$ age sal ;
Datalines;
001 Ronald m 23 50000
002 Clark f 22 34500
003 Roopa f 26 45000
;
Run;
If the Label statement is written after the Input statement, the variables in the output
dataset are in Input order. If it is written before Input, the variables named in the
Label statement come first, in Label order, followed by the remaining variables in Input order.
If a label is specified as ' ', the variable gets no label and its name from the Input
statement is shown in the output.
In the example below, Label name=' ' means the output shows the variable name, name.
Data DS3;
Infile datalines;
Input id name$ sex$ age sal ;
Label name=' ' sex='Gender' sal=' ';
Datalines;
001 Ronald m 23 50000
002 Clark f 22 34500
003 Roopa f 26 45000
;
Run;
Note: Attributes can be specified individually with the Length, Label, Informat and
Format statements shown above, or all together in one statement, the Attrib
statement.
ATTRIB STATEMENT
Associates a format, informat, label, and/or length with one or more variables
Syntax: - ATTRIB variable-list(s) attribute-list(s);
The Attrib statement can set the length, format, informat and label of a variable.
Examples:-
Data DS1;
Attrib name length=$10.;
Input id name$ sex$ age sal ;
Datalines;
001 Ronald m 23 50000
002 Clark f 22 34500
003 Roopa f 26 45000
;
Run;
Data DS2;
Attrib name length=$10. label='Emp Name';
Input id name$ sex$ age sal;
Datalines;
001 Ronald m 23 50000
002 Clark f 22 34500
003 Roopa f 26 45000
;
Run;
Data DS3;
Attrib name length=$10. label='Emp name'
sal format=comma6. label='Income';
Input id name$ sex$ age sal;
Datalines;
001 Ronald m 23 50000
002 Clark f 22 34500
003 Roopa f 26 45000
;
Run;
Data DS4;
Attrib name length=$10.
doj format=worddate30. Label='Date of joining'
sex label='gender' sal format=dollar7. ;
Infile datalines;
Input id name$ sex$ age sal doj date9.;
Datalines;
001 Ronald m 23 50000 02Jan2009
002 Clark f 22 34500 22Feb2010
003 Roopa f 26 45000 30Apr2010
;
Run;
Data DS5;
Attrib name length=$10.
dob informat=date9. format=ddmmyy8. Label='Date of Birth'
doj informat=anydtdte9. format=ddmmyy10. Label='Date of Joining'
sex label='gender'
sal format=dollar7. ;
Infile datalines;
Input id name$ sex$ age sal dob doj ;
Datalines;
001 Ronald m 23 50000 11Mar1986 02Jan2009
002 Clark f 22 34500 30Dec1986 22Feb2010
003 Roopa f 26 45000 06Aug1987 30Apr2010
;
Run;
Data DS6;
Infile datalines;
Input id name$ sex$ age sal dob doj ;
Attrib name label='Empname'
dob informat=date9. format=ddmmyy8. Label='Date of Birth'
doj informat=anydtdte9. format=ddmmyy10. Label='Date of Joining'
sex label='gender'
sal format=dollar7. ;
Datalines;
001 Ronald m 23 50000 11Mar1986 02Jan2009
002 Clark f 22 34500 30Dec1986 22Feb2010
003 Roopa f 26 45000 06Aug1987 30Apr2010
;
Run;
Data DS7;
Infile datalines;
Input id name$ sex$ age sal dob date9. doj:date9. ;
Datalines;
001 Ronald m 23 50000 11Mar1986 02Jan2009
002 Clark f 22 34500 30Dec1986 22Feb2010
003 Roopa f 26 45000 06Aug1987 30Apr2010
;
Run;
Data DS7a;
Attrib name length=$10.
dob informat=date9. format=ddmmyy8. Label='Date of Birth'
doj informat=anydtdte9. format=ddmmyy10. Label='Date of Joining'
sex label='gender'
sal format=dollar7. ;
Set DS6;
Run;
Note: When length is specified as an attribute in the Attrib statement, the Attrib
statement must be written before the Input or Set statement. Otherwise the variable
keeps whatever length it gets when the Input or Set statement executes.
Variables named in an Attrib statement placed before Input or Set also come first in the
output dataset, in Attrib order; the order can be changed again as required in Proc print.
If the Attrib statement is written after the Input or Set statement, the dataset keeps the
Input statement order (this placement is fine when Attrib does not set a length).
SUM STATEMENT
Adds the result of an expression to an accumulator variable.
Syntax: - variable+expression;
Examples:-
Data summ1;
a=2;
b=3;
c=4;
d=5;
Run;
Data summ2;
Set summ1;
Total=a+b+c+d;
Run;
Data summ3;
a=2;
b=3;
c=4;
d=.;
e=5;
g=.;
Run;
Data summ4;
Set summ3;
Total=a+b+c+d+e+g;
Run;
Sum Function
Data summ5;
Set summ3;
Total=Sum(a,b,c,d,e,g);
Run;
Data Loan_info;
Infile datalines dsd;
Input Loan_id $ Cust_Name : $15. Age Loan_amt1 : dollar5.
Loan_amt2 : dollar5. Loan_amt3 : dollar5.;
Format Loan_amt1 dollar5. Loan_amt2 dollar5. Loan_amt3 dollar5.;
Datalines;
LP101,Ravi Sinha,23,$3000,$3500,$2000
LP102,Alan Nance,29,$2500,$1500,
LP103,Brown lee,31,$5000,$1000,$2000
LP104,Ashley McKnight,22,$1500, ,$3000
LP105,Jim Brown,25,$4500,$1000,$1200
;
Run;
Data Loan_info;
Set Loan_info;
Format Total1 dollar6. Total2 dollar6.;
Total1=Loan_amt1+Loan_amt2+Loan_amt3; /* + operator: result is missing if any value is missing */
Total2=Sum(Loan_amt1,Loan_amt2,Loan_amt3); /* Sum function: ignores missing values */
Run;
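The examples above use the + operator and the Sum function. As a sketch of the sum statement itself (variable+expression;), the following hypothetical step keeps a running total. The dataset name running and the values are made up for illustration.

```sas
/* Hypothetical example: running total with the sum statement */
Data running;
   Input mon $ sales;
   Total + sales;   /* Total starts at 0, is retained, and skips missing values */
   Datalines;
Jan 230
Feb .
Mar 210
;
Run;
```

Total is 230, 230 and 440 on the three observations: the missing Feb value is ignored instead of making the total missing.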
RETAIN STATEMENT:
Retains the values of a variable across iterations of the Data step.
The Retain statement prevents SAS from re-initializing the values of new variables
at the top of the Data step and can be used to create an accumulator variable.
Syntax: - RETAIN <element-list(s) <initial-value(s)>>;
Examples:-
Data Ret1;
Input Id Mon$ Sales;
Datalines;
101 Jan 230
102 Feb 320
103 Mar 210
104 Apr 210
105 May 180
106 Jun 310
;
Run;
Data Ret2;
Set Ret1;
Retain total 0;
Total=Total+Sales;
Run;
Data Ret3;
Retain Row;
Set Ret1;
Row+1;
Run;
STOP STATEMENT
Stops execution of the current Data step.
If Stop executes before any observation is written, the dataset is created with 0 observations.
Syntax: - STOP;
Examples:-
Data DS1;
Stop;
Infile datalines;
Input Loan_id$ Cust_Name : &$15. Loan_amt: dollar5.;
/*Stop;*/
Format Loan_amt dollar5.;
Datalines;
LP101 Ravi Sinha $4500
LP102 Alan Nance $7000
LP103 Brown lee $6000
LP104 Jim Brown $5000
LP105 McKnight $8000
;
Run;
The data set WORK.DS1 has 0 observations and 3 variables.
Data DS2;
Stop;
Set sashelp.class;
Run;
The data set WORK.DS2 has 0 observations and 5 variables.
In the above programs the Stop statement executes before the Input/Set statement, so no
record is ever read into the input buffer and no values reach the PDV. The variables are
still defined when the step is compiled, which is why the datasets have variables but 0 observations.
IF Statement, Subsetting
Continues processing only those observations that meet the condition.
Syntax:- IF expression;
Examples:-
Data ds1;
Input idno name $ team $ strtwght endwght;
Cards;
1023 David red 189 165
1049 Amelia yellow 145 124
1219 Alan red 210 192
1246 Ravi yellow 194 177
1078 Ashley red 127 118
1221 Jim yellow 220 .
;
Run;
Data ds2;
Set ds1;
If team='red';
Run;
IF-THEN Statement
Executes a SAS statement for observations that meet specific conditions
Syntax:-IF expression THEN statement;
Data ds3;
Set ds1;
/*team is a character variable, so assign a character value*/
If team='red' then team='1';
Run;
Data ds3a;
Set ds1;
If team='red' then team='1';
If team='yellow' then team='2';
If team='green' then team='3';
If team='blue' then team='4';
Run;
Data ds3b;
Set ds1;
If team='red' then team='R';
If team='yellow' then team='Y';
If team='green' then team='G';
If team='blue' then team='B';
Run;
Data ds3c;
Set ds1;
If team='red' then team1='R';
If team='yellow' then team1='Y';
If team='green' then team1='G';
If team='blue' then team1='B';
Run;
IF-THEN/ELSE Statement
Executes a SAS statement for observations that meet specific conditions
Syntax:- IF expression THEN statement; <ELSE statement;>
Data ds4;
Set ds1;
If team='red' then team='1';
Else team='2';
Run;
Data ds4a;
Set ds1;
If team='red' then team='1';
Else if team='yellow' then team='2';
Else if team='green' then team='3';
Else team='4';
Run;
IF-THEN/ELSE OUTPUT
Executes a SAS statement for observations that meet specific conditions
Using this we can create multiple datasets at a time based on conditions.
Data ds5 ds6;
Set ds1;
If team='red' then output ds5;
Else output ds6;
Run;
Data ds5 ds6 ds7 ds8;
Set ds1;
If team='red' then output ds5;
Else if team='yellow' then output ds6;
Else if team='green' then output ds7;
Else output ds8;
Run;
IF-THEN/ELSE DELETE
Executes a SAS statement for observations that meet specific conditions
Using this we can delete observations based on condition
Data ds9;
Set ds1;
If team='red' then delete;
Run;
Data ds9 ds10;
Set ds1;
If team='red'then delete;
Else output ds10;
Run;
WHERE Statement
Selects observations from SAS data sets that meet a particular condition.
Syntax:-
WHERE where-expression-1 <logical-operator where-expression-n>;
The examples below assume that ds1 contains the variables p_id, drug_name and visit_date.
Data ds2a;
Set ds1;
Where p_id>=101;
Run;
Data ds2b;
Set ds1;
Where drug_name='asp-10mg';
Run;
Data ds2c;
Set ds1;
Where visit_date='15jan2005'd;
Run;
Where with Operators
WHERE AND
Data ds3a;
Set ds1;
where visit_date >'12jan2005'd and visit_date <'20jan2005'd ;
Run;
Data ds3b;
Set ds1;
Where p_id >101 and p_id <104 ;
Run;
WHERE BETWEEN
Data ds4;
Set ds1;
Where visit_date between '15jan2005'd and '21jan2005'd ;
Run;
WHERE IN
Data ds5;
Set ds1;
where p_id in (102,103) ;
Run;
WHERE LIKE
The Like operator selects character values that match a pattern: % stands for any number
of characters and _ stands for exactly one character.
Data ds6;
Input p_id drug_name$ visit_date date9.;
Format visit_date date9.;
Cards;
101 asp-05mg 12jan2005
102 asp-10mg 14jan2005
101 bsp-05mg 18jan2005
102 aap-10mg 12jan2005
101 csp-05mg 21jan2005
103 amp-15mg 12jan2005
101 dsp-05mg 30jan2005
102 dsp-10mg 12jan2005
;
Run;
Data ds6a;
Set ds6;
Where drug_name like 'c%' ;
Run;
Data ds6b;
Set ds6;
Where drug_name like '_a%' ;
Run;
Data ds6c;
Set ds6;
Where drug_name like '_____5%' ;
Run;
Data ds6d;
Set ds6;
Where drug_name like '%g' ;
Run;
Data ds6e;
Set ds6;
Where drug_name like '%m_' ;
Run;
Data ds6f;
Set ds6;
Where drug_name like '%0__' ;
Run;
WHERE CONTAINS(?)
Selects the observations in which the given text occurs anywhere in the variable.
The comparison is case sensitive, and it works only on character variables.
Data ds1;
Infile datalines;
Length name $12.;
Input name$ sex$ sal dollar5.;
Format sal dollar6.;
Datalines;
Ramakrishna m $5000
pragna f $3500
Raju m $4500
Mohanprasad m $6000
;
Run;
Data ds2;
Set ds1;
Where name contains 'r';
Run;
Data ds3;
Set ds1;
Where name contains 'R';
Run;
Data ds4;
Set ds1;
Where name ? 'R';
Run;
WHERE NULL/MISSING
Selects only the observations in which the variable has a null/missing value.
Data ds1;
Input p_id 3. +1 drug_name$8. +1 visit_date date9.;
Format visit_date date9.;
Cards;
101 asp-05mg 12jan2005
102 asp-10mg 14jan2005
101 bsp-05mg 18jan2005
102          12jan2005
101 csp-05mg 21jan2005
103 amp-15mg 12jan2005
101          30jan2005
102 dsp-10mg 12jan2005
;
Run;
Data ds1a;
Set ds1;
where drug_name is null;
run;
Data ds1b;
Set ds1;
where drug_name is missing;
run;
WHERE SOUNDS-LIKE
Selects the observations whose value sounds like the given text.
Values with a different spelling are also selected if the pronunciation is the same.
Data ds1;
Input p_id p_name$ drug_name$ visit_date date9.;
Format visit_date date9.;
Cards;
101 john asp-05mg 12jan2005
102 smith asp-10mg 14jan2005
101 smit bsp-05mg 18jan2005
102 clark aap-10mg 12jan2005
101 manish csp-05mg 21jan2005
103 clarc amp-15mg 12jan2005
101 ronald dsp-05mg 30jan2005
102 ronold dsp-10mg 12jan2005
;
Run;
Data ds1a;
Set ds1;
where p_name='smith';
Run;
Data ds1b;
Set ds1;
where p_name=*'smith';
Run;
Concatenation:
Combining two or more SAS datasets into a single SAS dataset, one after the other,
using the SET statement.
The number of observations in the new SAS dataset is equal to the sum of the numbers of
observations in the original datasets.
Ex:
DS3(20) = DS1(10) + DS2(10)
The new data set contains all observations from DS1 followed by all observations from DS2.
Syntax:-
Set dataset(s);
Examples:-
If original datasets contain same variables, the variables in new dataset are
same as the variables in the original datasets.
Data ds1;
Infile datalines;
Input P_id Drug_name$ Visit_date;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 12Jan2011
102 asp-10mg 14Jan2011
101 asp-05mg 18Jan2011
102 asp-10mg 12Jan2011
101 asp-05mg 21Jan2011
103 asp-15mg 12Jan2011
101 asp-05mg 30Jan2011
102 asp-10mg 12Jan2011
101 asp-05mg 23Jan2011
102 asp-10mg 12Jan2011
;
Run;
Data ds2;
Infile datalines;
Input P_id Drug_name$ Visit_date;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 11Jan2011
103 asp-15mg 12Jan2011
101 asp-05mg 15Jan2011
104 asp-20mg 12Jan2011
101 asp-05mg 16Jan2011
102 asp-10mg 12Jan2011
;
Run;
Data ds3;
Set ds1 ds2;
Run;
Data ds2;
Infile Datalines;
Input P_id Drug_name$ Visit_date Age;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 11Jan2011 34
103 asp-15mg 12Jan2011 32
101 asp-05mg 15Jan2011 23
104 asp-20mg 12Jan2011 28
101 asp-05mg 16Jan2011 21
102 asp-10mg 12Jan2011 30
101 asp-05mg 17Jan2011 28
103 asp-15mg 12Jan2011 23
103 asp-15mg 12Jan2011 32
101 asp-05mg 15Jan2011 25
;
Run;
Data ds3;
Set ds1 ds2;
Run;
Data ds3;
Set ds1(firstobs=4) ds2;
Run;
Data ds3;
Set ds1(firstobs=4) ds2(obs=7);
Run;
Data ds3;
Set ds1 ds2(firstobs=4 obs=8);
Run;
Point=Slice
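The Point= option of the Set statement reads particular observations directly by
observation number. A Stop statement is needed because a step that reads with Point=
never reaches the end of the dataset on its own.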
Data ds1;
Infile Datalines;
Input P_id Drug_name$ Visit_date ;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 12Jan2011
102 asp-10mg 14Jan2011
103 asp-10mg 12Jan2011
104 asp-05mg 21Jan2011
105 asp-15mg 12Jan2011
106 asp-10mg 12Jan2011
107 asp-15mg 12Jan2011
;
Run;
Data ds2;
Slice=2;
Set ds1 point=slice;
Output;
Stop;
Run;
Data ds3;
Do slice=2,4,5;
Set ds1 point=slice;
Output;
End;
Stop;
Run;
Concatenation with multiple SET statements (one to one reading)
Combines observations from two or more SAS datasets into one observation
using two or more SET statements. The new dataset contains all the variables
from all input datasets.
Syntax:-
Set dataset1;
Set dataset2;
Set datasetN;
Data ds3b;
Set ds2;
Set ds1;
Run;
Data ds1;
Infile Datalines;
Input a b c d;
Datalines;
1 2 3 4
4 5 6 7
;
Run;
Data ds2;
Infile Datalines;
Input a b c;
Datalines;
3 4 5
6 7 8
;
Run;
Data ds3;
Set ds1;
Set ds2;
Run;
In this example the variables of the second dataset (a b c) also exist in the first
dataset, so the values of a, b and c come from the second dataset and d comes from the
first dataset. Values of common variables are overwritten by the dataset read last, and
the extra variables come from whichever dataset contains them.
Data ds1;
Infile Datalines;
Input P_id Drug_name$ Visit_date Sex$ ;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 12Jan2011 m
102 asp-10mg 14Jan2011 m
102 asp-10mg 12Jan2011 f
101 asp-05mg 21Jan2011 m
103 asp-15mg 12Jan2011 f
103 asp-15mg 12Jan2011 m
;
Run;
Data ds2;
Infile Datalines;
Input P_id Drug_name$ Visit_date ;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 11Jan2011
103 asp-15mg 12Jan2011
101 asp-05mg 15Jan2011
104 asp-20mg 12Jan2011
101 asp-05mg 16Jan2011
102 asp-10mg 12Jan2011
;
Run;
Data ds3a;
Set ds1;
Set ds2;
Run;
Data ds3b;
Set ds2;
Set ds1;
Run;
Both datasets contain the same variables and different numbers of observations
Both datasets contain the same variables, so the values from the second dataset overwrite
those from the first. The numbers of observations differ, and the Data step stops as soon
as the smaller dataset is exhausted, so the output has only as many observations as the
smaller dataset.
Data ds1;
Infile Datalines;
Input a b c ;
Datalines;
1 2 3
4 5 6
7 8 9
;
Run;
Data ds2;
Infile Datalines;
Input a b c;
Datalines;
3 4 5
6 7 8
;
Run;
Data ds3;
Set ds1;
Set ds2;
Run;
In this example the variables of the second dataset ds2 (a b c) are the same as those of
the first dataset ds1, so the values of a, b and c come from ds2. The first dataset (ds1)
has 3 observations but the second dataset (ds2) has 2, so the output has only 2
observations, with values from ds2.
Data ds3;
Set ds2;
Set ds1;
Run;
In this example the variables of the second dataset (ds1) are the same as those of ds2,
so the values of a, b and c come from the second dataset. The first dataset (ds2) has 2
observations but the second dataset (ds1) has 3, so the output has only 2 observations.
Data ds1;
Infile Datalines;
Input P_id Drug_name$ Visit_date ;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 12Jan2011
102 asp-10mg 14Jan2011
102 asp-10mg 12Jan2011
101 asp-05mg 21Jan2011
103 asp-15mg 12Jan2011
102 asp-10mg 12Jan2011
103 asp-15mg 12Jan2011
104 asp-20mg 15Jan2011
;
Run;
Data ds2;
Infile Datalines;
Input P_id Drug_name$ Visit_date ;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 11Jan2011
103 asp-15mg 12Jan2011
101 asp-05mg 15Jan2011
104 asp-20mg 12Jan2011
101 asp-05mg 16Jan2011
102 asp-10mg 12Jan2011
;
Run;
Data ds3a;
Set ds1;
Set ds2;
Run;
Data ds3b;
Set ds2;
Set ds1;
Run;
Both datasets contain different variables and different numbers of observations
When the first dataset has more observations and the datasets have different variables,
the values of the common variables come from the second dataset, and the unmatched
variables also appear in the output dataset.
When the second dataset has more observations, the values of the common variables still
come from the second dataset, and the unmatched variables also appear in the output.
In both cases the output has only as many observations as the smaller dataset.
Data ds1;
Infile Datalines;
Input a b c ;
Datalines;
1 2 3
4 5 6
7 8 9
;
Run;
Data ds2;
Infile Datalines;
Input a b c d;
Datalines;
3 4 5 6
6 7 8 9
;
Run;
Data ds3;
Set ds1;
Set ds2;
Run;
In this example all the data comes from the second dataset.
Data ds1;
Infile Datalines;
Input P_id Drug_name$ Visit_date Sex$ ;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 12Jan2011 m
102 asp-10mg 14Jan2011 m
102 asp-10mg 12Jan2011 f
101 asp-05mg 21Jan2011 m
103 asp-15mg 12Jan2011 f
102 asp-10mg 12Jan2011 f
103 asp-15mg 12Jan2011 m
104 asp-20mg 15Jan2011 m
;
Run;
Data ds2;
Infile Datalines;
Input P_id Drug_name$ Visit_date ;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 11Jan2011
103 asp-15mg 12Jan2011
101 asp-05mg 15Jan2011
104 asp-20mg 12Jan2011
101 asp-05mg 16Jan2011
102 asp-10mg 12Jan2011
;
Run;
Data ds3a;
Set ds1;
Set ds2;
Run;
Data ds3b;
Set ds2;
Set ds1;
Run;
Interleaving:
Use a SET statement and a BY statement to combine multiple datasets into a single dataset.
The number of observations in the new dataset is equal to the sum of the numbers of
observations in the original datasets.
The observations in the new dataset are arranged by the values of the BY variables.
We can interleave datasets using a BY variable or using an index.
Note: To interleave, the datasets should contain the same variables with the same data
types and lengths, and each dataset must be sorted by the BY variables.
Syntax:-
Set Dataset(s);
By variable(S);
Examples:-
Data ds1;
Infile Datalines;
Input P_id Drug_name$ Visit_date ;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 02Jan2011
102 asp-10mg 02Jan2011
101 asp-05mg 10Jan2011
102 asp-10mg 10Jan2011
101 asp-05mg 21Jan2011
103 asp-15mg 02Jan2011
;
Run;
Data ds2;
Infile Datalines;
Input P_id Drug_name$ Visit_date ;
Informat Visit_date date9.;
Format Visit_date date9.;
Datalines;
101 asp-05mg 30Jan2011
103 asp-15mg 10Jan2011
102 asp-10mg 20Jan2011
104 asp-05mg 02Jan2011
;
Run;
Data ds3;
Set ds1 ds2;
By p_id;
Run;
ERROR: BY variables are not properly sorted on Data set WORK.DS1.
As per the syntax rules, both datasets must be sorted by the BY variable:
Proc sort data=ds1;
By p_id;
Run;
Proc sort data=ds2;
By p_id;
Run;
Data ds3;
Set ds1 ds2;
By p_id;
Run;
First.Variable:-
Value is 1 for the first observation in the by group and value 0 for all other
observations in the by group
Last.Variable:-
Value is one for the last observation in the by group and value 0 for all other
observations in the by group
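A minimal sketch of how First. and Last. can be inspected, assuming a small dataset that is already sorted by p_id. The dataset names visits and flagged and the variables first_visit and last_visit are hypothetical. First.p_id and Last.p_id are temporary variables and are not written to the output dataset, so the sketch copies them into ordinary variables.

```sas
/* Hypothetical example data, already sorted by p_id */
Data visits;
   Input p_id visit;
   Datalines;
101 1
101 2
101 3
102 1
103 1
103 2
;
Run;

Data flagged;
   Set visits;
   By p_id;
   first_visit = first.p_id;   /* 1 on the first observation of each p_id group */
   last_visit  = last.p_id;    /* 1 on the last observation of each p_id group */
Run;
```

For p_id 102, which has a single observation, both first_visit and last_visit are 1.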
Banking Example
Data ds1;
Infile Datalines;
Input Loan_no$ 1-5 Customer $7-15 Loan_amt Loan_date ;
Informat Loan_date date9. Loan_amt dollar5. ;
Format Loan_date date9. Loan_amt dollar5. ;
Datalines;
LP101 RaviSinha 3000 02Jan2011
LP102 AlanNance 2500 02Jan2011
LP101 RaviSinha 5000 10Jan2011
LP102 AlanNance 1500 10Jan2011
LP101 RaviSinha 4500 20Jan2011
LP103 JimBrown 4500 02Jan2011
;
Run;
Proc sort data=ds1;
By Loan_no;
Run;
Data ds2;
Infile Datalines;
Input Loan_no$ 1-5 Customer $7-15 Loan_amt Loan_date ;
Informat Loan_date date9. Loan_amt dollar5. ;
Format Loan_date date9. Loan_amt dollar5. ;
Datalines;
LP101 RaviSinha $3000 30Jan2011
LP103 JimBrown $2500 10Jan2011
LP102 AlanNance $5000 20Jan2011
LP104 AshleyMcK $1500 01Jan2011
;
Run;
Proc sort data=ds2;
By Loan_no;
Run;
Data ds3;
Set ds1 ds2;
By Loan_no;
Run;
MERGE
1) Merge in DATASTEP
Data ds3;
Merge ds1 ds2;
Run;
Data ds1;
Infile Datalines;
Input id name$ sex$ address$;
Datalines;
001 abc m bang
002 def m hyd
003 jkl f che
004 mno f bang
005 xyz m mum
006 asd f hyd
;
Run;
Data ds2;
Infile Datalines;
Input dob date9. doj:date9. sal;
Format dob date9. doj date9.;
Datalines;
01Feb1983 12Jan2011 45000
23Mar1983 20Jan2011 50000
12Oct1983 13Feb2011 34000
02Jan1984 19May2011 28000
28Apr1985 11Jun2011 29000
;
Run;
Data ds3;
Merge ds1 ds2;
Run;
Syntax:-
Merge Dataset(s);
By variable(s);
Examples:-
Data ds1;
Infile Datalines;
Input id name$ sex$ address$;
Datalines;
001 abc m bang
002 def m hyd
003 jkl f che
004 mno f bang
005 xyz m mum
006 asd f hyd
;
Run;
Proc sort Data=ds1;
By id;
Run;
Data ds2;
Infile Datalines;
Input id dob date9. doj:date9. sal ;
Format dob date9. doj date9.;
Datalines;
001 01Feb1983 12Jan2011 45000
004 23Mar1983 20Jan2011 50000
005 12Oct1983 13Feb2011 34000
002 02Jan1984 19May2011 28000
003 28Apr1985 11Jun2011 29000
;
Run;
Proc sort Data=ds2;
By id;
Run;
Data ds3;
Merge ds1 ds2;
By id;
Run;
In the above example both ds1 and ds2 contain the common variable id. Both datasets
must be sorted by that common variable before the match merge can be performed.
Data demo;
Infile datalines;
Input name$ 1-25 age 27-28 sex$30;
Datalines;
Vincent, Martina          34 F
Phillipon, Marie-Odile    28 F
Gunter, Thomas            27 M
Harbinger, Nicholas       36 M
Benito, Gisela            32 F
Rudelich, Herbert         39 M
Sirignano, Emily          12 F
Morrison, Michael         32 M
;
Run;
Proc sort data=demo;
By name;
Run;
Data finance;
Infile datalines;
Input ssn$ 1-11 name$ 13-40 salary;
Datalines;
074-53-9892 Vincent, Martina             35000
776-84-5391 Phillipon, Marie-Odile       29750
929-75-0218 Gunter, Thomas               27500
446-93-2122 Harbinger, Nicholas          33900
228-88-9649 Benito, Gisela               28000
029-46-9261 Rudelich, Herbert            35000
442-21-8075 Sirignano, Emily             5000
;
Run;
Proc sort data=finance;
By name;
Run;
Data new;
Merge demo (drop=age) finance;
By name;
Run;
Techniques, Tricks, and Traps in Match Merge
A common mistake in a match merge is forgetting to include the BY statement.
Data ds1;
Infile datalines;
Input id$ name$;
Datalines;
A01 SUE
A02 TOM
A05 KAY
A10 JIM
;
Run;
Data ds2;
Infile datalines;
Input id$ age sex$;
Datalines;
A01 58 F
A02 20 M
A04 47 F
A10 11 M
;
Run;
Data ds3;
Merge ds1 ds2;
Run;
Even with this simple example, there is already a hint of problems. Observe that
record A05 (the 3rd record of ds1) is not present in ds2, so A04 (the 3rd record of ds2) is
merged with the A05 record. Notice that the A05 ID is lost in this merge and the name KAY is
moved from ID=A05 to ID=A04, and one does not even get a note or error in the log to say
that something is wrong. This kind of merge gives wrong output, so before
performing a match merge both datasets should be sorted, and the
BY statement must be used with the MERGE statement, as below.
Data ds1;
Infile datalines;
Input id$ name$;
Datalines;
A01 SUE
A02 TOM
A05 KAY
A10 JIM
;
Run;
Proc sort data=ds1;
By id;
Run;
Data ds2;
Infile datalines;
Input id$ age sex$;
Datalines;
A01 58 F
A02 20 M
A04 47 F
A10 11 M
;
Run;
Proc sort data=ds2;
By id;
Run;
Data ds3;
Merge ds1 ds2;
By id;
Run;
Data ds3a;
Merge ds2 ds1;
By id;
Run;
The example above shows how reversing the order of the data sets in the MERGE
statement can sometimes change the values and records in the output file. In this case
the merge works correctly in both orders: the attributes of id are taken from the dataset
listed first, and id has the same length in both datasets, so we get proper results either way.
1) zero-to-one
2) one-to-zero
3) one-to-one
4) one-to-many
5) many-to-one
6) few-to-many
7) many-to-few
8) many-to-many
The many-to-many match-merge is essentially a one-to-one merge (a merge without BY)
and has the same drawbacks and dangers. Specifically, one has very little control over the
actual order of the records within the BY group for each of the input data sets.
For example, how does one know that the first value of x=24 is supposed to be matched
with the first value of y=4? Why shouldn't x=24 be matched with y=91 (the second value
of y)? If great care is not taken, a many-to-many merge can result in random matching
of variable values.
A many-to-many merge is therefore dangerous and unreliable, so the programmer has to
take care and choose additional BY variables so that the records are merged properly.
In the above example there are two BY groups. The first output record is the same as in a
one-to-one match-merge. But for the second record in the ds2 dataset there is
no corresponding ds1 record, so SAS retains the x value from the first ds1 record and
passes it to the second output record.
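The retention behaviour described here can be sketched with a small one-to-many merge. The values x=24, y=4 and y=91 are borrowed from the text; the id variable and all other values are assumptions made only for illustration.

```sas
/* Hypothetical data: one record per id in ds1, several per id in ds2 */
Data ds1;
Input id $ x;
Datalines;
A 24
B 30
;
Run;
Data ds2;
Input id $ y;
Datalines;
A 4
A 91
B 7
;
Run;
/* Both data sets are already in id order, so no PROC SORT is needed here */
Data ds3;
Merge ds1 ds2;
By id;
Run;
Proc print Data=ds3;
Run;
```

In ds3 the value x=24 should appear on both A records (paired with y=4 and y=91), because SAS keeps the ds1 values in the program data vector until the BY group changes.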
JOINS :
1. LEFT JOIN
2. RIGHT JOIN
3. INNER JOIN
4. FULL JOIN.
In a left outer join, all observations from the left table and only the matching
observations from the right table go into the output dataset.
Data patdata;
Infile Datalines;
Input p_id trt_code$;
Datalines;
101 A
102 A
103 B
104 B
;
Run;
Proc sort Data=patdata;
By p_id;
Run;
Data adverse;
Infile Datalines;
Input p_id event$;
Datalines;
101 headaches
107 fever
103 fracture
109 nausea
;
Run;
Proc sort Data=adverse;
By p_id;
Run;
/*Thru Data step*/
Data pat_adverse;
Merge patdata(in=a) adverse(in=b);
By p_id;
/*If a;*/
If a then output;
Run;
/*Thru SQL*/
Proc sql;
Create table pat_adverse as
Select a.*, b.* from patdata a left outer join adverse b on a.p_id=b.p_id;
Quit;
In a right outer join, all observations from the right table and only the matching
observations from the left table go into the output dataset.
Data patdata;
Infile Datalines;
Input p_id trt_code$;
Datalines;
101 A
102 A
103 B
104 B
;
Run;
Proc sort Data=patdata;
By p_id;
Run;
Data adverse;
Infile Datalines;
Input p_id event$;
Datalines;
101 headaches
107 fever
103 fracture
109 nausea
;
Run;
Proc sort Data=adverse;
By p_id;
Run;
/*Thru Data step*/
Data pat_adverse;
Merge patdata(in=a) adverse(in=b);
By p_id;
/*If b;*/
If b then output;
Run;
/*Thru SQL*/
Proc sql;
Create table pat_adverse as
Select b.*, a.* from patdata a right outer join adverse b on b.p_id=a.p_id;
Quit;
In an inner join, only the observations that match in both datasets go into the output
dataset.
Data patdata;
Infile Datalines;
Input p_id trt_code$;
Datalines;
101 A
102 A
103 B
104 B
;
Run;
Proc sort Data=patdata;
By p_id;
Run;
Data adverse;
Infile Datalines;
Input p_id event$;
Datalines;
101 headaches
107 fever
103 fracture
109 nausea
;
Run;
Proc sort Data=adverse;
By p_id;
Run;
/*Thru Data step*/
Data pat_adverse;
Merge patdata(in=a) adverse(in=b);
By p_id;
/*If a and b;*/
If a and b then output;
Run;
/*Thru SQL*/
Proc sql;
Create table pat_adverse as
Select a.*, b.* from patdata a, adverse b where a.p_id=b.p_id;
Quit;
Proc sql;
Create table pat_adverse as
Select a.*, b.* from patdata a inner join adverse b on a.p_id=b.p_id;
Quit;
In a full outer join, all observations from both tables go into the output
dataset.
Data patdata;
Infile Datalines;
Input p_id trt_code$;
Datalines;
101 A
102 A
103 B
104 B
;
Run;
Proc sort Data=patdata;
By p_id;
Run;
Data adverse;
Infile Datalines;
Input p_id event$;
Datalines;
101 headaches
107 fever
103 fracture
109 nausea
;
Run;
Proc sort Data=adverse;
By p_id;
Run;
/*Thru Data step*/
Data pat_adverse;
Merge patdata(in=a) adverse(in=b);
By p_id;
/*If a or b;*/
If a or b then output;
Run;
/*Thru SQL*/
Proc sql;
Create table pat_adverse as
Select a.*, b.* from patdata a full outer join adverse b on a.p_id=b.p_id;
Quit;
UPDATE
FUNCTIONS
Character Functions
Data ds;
Infile datalines;
Input year 1-4 pres $ 6-29 vicepres $ 31-55 result $ 60-64 ;
Datalines;
1920 James M. Cox             Franklin D. Roosevelt        lost
1924 John W. Davis            Charles W. Bryan             lost
1928 Alfred E. Smith          Joseph T. Robinson           lost
1932 Franklin D. Roosevelt    John N. Garner               won
1936 Franklin D. Roosevelt    John N. Garner               won
1940 Franklin D. Roosevelt    Henry A. Wallace             won
1944 Franklin D. Roosevelt    Harry S. Truman              won
1948 Harry S. Truman          Alben W. Barkley             won
1952 Adlai E. Stevenson       John J. Sparkman             lost
1956 Adlai E. Stevenson       Estes Kefauver               lost
1960 John F. Kennedy          Lyndon B. Johnson            won
1964 Lyndon B. Johnson        Hubert H. Humphrey           won
1968 Hubert H. Humphrey       Edmund S. Muskie             lost
1972 George S. McGovern       R. Sargent Shriver Jr.       lost
1976 Jimmy Carter             Walter F. Mondale            won
1980 Jimmy Carter             Walter F. Mondale            lost
1984 Walter F. Mondale        Geraldine Ferraro            lost
;
Run;
UPCASE
Converts all letters in an argument to uppercase
Syntax:-Upcase(string)
Example:-
Data ds1;
Set ds;
President=Upcase(pres);
Run;
LOWCASE
Converts all letters in an argument to lowercase
Syntax:-Lowcase(string)
Example:-
Data ds3;
Set ds;
President=lowcase(pres);
Vicepresident=lowcase(vicepres);
Run;
PROPCASE
Converts all words in an argument to proper case (like I Am Krishna)
Syntax:-Propcase(string)
Example:-
Data ds4;
Set ds;
President=propcase(pres);
Vicepresident=propcase(vicepres);
Run;
Data allcase;
a=lowcase('THIS IS A DOG');
b=propcase(a);
c=propcase(lowcase('THIS IS A DOG'));
d=upcase('this is a dog');
Put a=;
Put b=;
Put c=;
Put d=;
Run;
SCAN
Selects a given word from a character string
Syntax:-SCAN(string ,n<, delimiter(s)>)
Examples:-
Data scn1;
Set ds;
President=scan(pres,2);
Run;
Data scn2;
Set ds;
President=scan(pres,3);
Run;
Data scn3;
Set ds;
President=scan(pres,-3);
Run;
Data scn4;
Set ds;
President=scan(pres,-1);
Run;
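The examples above rely on the default delimiters. A small sketch of how the optional third argument changes the result, using a literal name instead of the ds data set (the variable names a, b and c are illustrative):

```sas
Data _null_;
name='Franklin D. Roosevelt';
a=scan(name,2);        /* default delimiters include blank and period: D */
b=scan(name,2,' ');    /* blank is the only delimiter: D.                */
c=scan(name,-1,' ');   /* negative n counts words from the right: Roosevelt */
Put a= b= c=;
Run;
```

Specifying the delimiter explicitly is usually the safer choice when the text can contain punctuation such as periods or hyphens.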
SUBSTR
Selects a particular part of a character string, starting at a given position
Syntax:-SUBSTR(string, position<, length>)
Examples:-
Data sbstr1;
Set ds;
President=substr(pres,1,5);
Run;
Data sbstr2;
a='Radhakrishna Reddy';
b=substr(a,6,7);
Run;
Data sbstr3; /* SUBSTR on the left of = replaces characters in place */
a='Radhakrishna Reddy';
Substr(a,1,5)='Rama';
Run;
CAT
Concatenates character strings without removing leading or trailing blanks
Syntax:-CAT(string-1<, ... string-n>)
Data cat1;
a=' The Olym';
b='pic Arts Festi';
c=' val includes works by D ';
d='ale Chihuly.';
Result=cat(a,b,c,d);
Put result $char.;
Run;
CATT
Concatenates character strings and removes trailing blanks
Syntax:-CATT(string-1<, ...string-n>)
Data cat2;
a=' The Olym';
b='pic Arts Festi';
c=' val includes works by D ';
d='ale Chihuly.';
Result=catt(a,b,c,d);
Put result $char.;
Run;
CATS
Concatenates character strings and removes leading and trailing blanks
Syntax:-CATS(string-1<, ...string-n>)
Data cat3;
a=' The Olym';
b='pic Arts Festi';
c=' val includes works by D ';
d='ale Chihuly.';
Result=cats(a,b,c,d);
Put result $char.;
Run;
CATX
Concatenates character strings, removes leading and trailing blanks, and inserts
separators
Syntax:-CATX(separator, string-1<, ...string-n>)
Data cat4;
Separator='*';
a='The Olympic';
b='Arts Festival';
c='includes works by';
d='Dale Chihuly.';
Result=catx(separator,a,b,c,d);
Put result $char.;
Run;
Data cat5;
Separator='%%$%%';
a=' The Olym';
b='pic Arts Festi';
c=' val includes works by D ';
d='ale Chihuly.';
Result=catx(separator,a,b,c,d);
Put result $char.;
Run;
RIGHT
Right aligns a character expression
Syntax:- RIGHT(string);
Data remblank2;
a='My Name Is Ram ';
b=right(a);
Run;
STRIP
Returns a character string with all leading and trailing blanks removed
Syntax:- STRIP(string)
Data remblank3;
Infile datalines;
Input string $char8.;
original = '*' || string || '*';
stripped = '*' || strip(string) || '*';
Datalines;
abcd
  abcd
    abcd
abcdefgh
xyz
;
Run;
TRIM
Removes trailing blanks from character expressions and returns one blank if the
expression is missing
Syntax:- TRIM(string)
Data remblank4;
Input part1 $ 1-10 part2 $ 11-20;
hasblank=part1||part2;
noblank=trim(part1)||part2;
Put hasblank;
Put noblank;
Datalines;
apple     sauce
;
Run;
Data remblank5;
Input part1$ part2$ ;
hasblank=part1||part2;
noblank=trim(part1)||part2;
Put hasblank;
Put noblank;
Datalines;
apple sauce
;
Run;
Data remblank6;
x=" ";
y=">"||trim(x)||"<";
Put y;
Run;
TRIMN
Removes trailing blanks from character expressions and returns a null string (zero
blanks) if the expression is missing
Syntax:- TRIMN(string)
Data remblank6a;
x=" ";
z=">"||trimn(x)||"<";
put z;
Run;
COMPRESS
Removes specified characters from a character string (blanks when no characters are given)
Syntax:- COMPRESS(<source><, chars><, modifiers>)
Data remblank7;
a='AB C D';
b=compress(a);
Run;
Data remblank8;
x='1 2 3 4 5';
y=compress(x);
Put y;
Run;
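Both examples above use only the first argument. The sketch below shows the optional chars and modifiers arguments from the syntax line; the literal values are made up for illustration.

```sas
Data _null_;
a=compress('AB C D');                 /* no second argument: blanks removed -> ABCD */
b=compress('2011-01-02','-');         /* remove every hyphen -> 20110102            */
c=compress('(919) 555-1234', ,'kd');  /* k=keep, d=digits: keep digits only -> 9195551234 */
Put a= b= c=;
Run;
```

Note the empty second argument in the last call: the modifiers must be passed as the third argument even when no character list is given.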
COMPBL
Removes multiple blanks from a character string.
Syntax:-Compbl(source)
Data remblank9;
x='my   name   is   ram';
y=compbl(x);
Run;
Data remblank9a;
x='My ';
y=' Name ';
z=' is Ram';
a=x||y||z;
b=compbl(a);
Run;
Data ds1;
Infile datalines;
Input id$ fname$ lname$ sal;
Datalines;
001 mohan arisela 60000
002 padma narni 45000
003 varma maddina 50000
;
Run;
Data remblank10;
Set ds1;
Name1=fname||lname;
Name2=cat(fname,lname);
Name2a=cat(trim(fname),lname);
Name3=compbl(fname||lname);
Run;
TRANSLATE
Replaces specific characters in a character expression
Syntax:-TRANSLATE(source, to-1, from-1<, ...to-n, from-n>)
Data trns1;
x=translate('XYZW','AB','VW');
Put x;
Run;
Data trns2;
x=translate('abc','sh', 'cg');
Put x;
Run;
TRANWRD
Replaces or removes all occurrences of a word in a character string
Syntax:-TRANWRD(source,target,replacement)
Data trnw1;
name='Mrs.Radhakrishna Reddy';
name1=tranwrd(name, "Mrs.", "Mr.");
put name name1;
run;
Data trnw2;
Infile datalines;
Input salelist $;
target='FISH';
replacement='NIP';
salelist1=tranwrd(salelist,target,replacement);
Datalines;
CATFISH
;
Run;
Data trnw2a;
Infile datalines;
Input salelist $;
length target $10 replacement $3;
target='FISH';
replacement='NIP';
salelist1=tranwrd(salelist,target,replacement);
Datalines;
CATFISH
;
Run;
The LENGTH statement left-aligns TARGET and pads it with blanks to the length of 10.
This causes the TRANWRD function to search for the character string 'FISH      ' in SALELIST.
Because the search fails, SALELIST1 keeps the unchanged value CATFISH.
You can use the TRIM function to exclude trailing blanks from a target or replacement
variable. Here the TRIM function is used with TARGET:
Data trnw2b;
Infile datalines;
Input salelist $;
length target $10 replacement $3;
target='FISH';
replacement='NIP';
salelist1=tranwrd(salelist,trim(target),replacement);
Datalines;
CATFISH
;
Run;
INDEX
Searches a character expression for a string of characters
Syntax:-INDEX(source,excerpt)
Data ind1;
a='ABC.DEF (X=Y)';
b='D';
x=index(a,b);
Put x;
Run;
Data ind2;
a='ABC.DEF (X=Y)';
b='X=Y';
x=index(a,b);
Put x;
Run;
Data ind3;
Infile datalines;
input name $ 1-12 age;
Datalines;
Harvey Smith 30
John West    35
Jim Cann     41
James Harvey 32
Harvy Adams  33
;
Run;
Now, let's use the index function to find the cases with "Harvey" in the name
Data ind3a;
Set ind3;
x = index(name, "Harvey");
Run;
INDEXC
Searches a character expression for any of the specified characters, and returns the
position of the first one found
Syntax:-INDEXC(source,excerpt-1<,... excerpt-n>)
Data indc1;
a='ABC.DEP (X2=Y1)';
x=indexc(a,'.');
Run;
Data indc2;
a='ABC.DEP (X2=Y1)';
b='=';
x=indexc(a,b);
Run;
INDEXW
Searches a character expression for a specified string as a word
Syntax:-INDEXW(source, excerpt<,delimiter>)
Data indw1;
s='asdf adog dog';
p='dog ';
x=indexw(s,p);
Run;
Data indw2;
s='abcdef x=y';
p='def';
x=indexw(s,p);
Run;
LENGTH
Returns length of string
Syntax:-LENGTH(string)
Data len;
a='Mr.Krishna';
b=length(a);
Run;
REVERSE
Returns string in reverse order
Syntax:-REVERSE(string)
Data rev;
a='Mr.Krishna';
b=reverse(a);
Run;
QUOTE
Adds double quotes to character values
Syntax:-QUOTE(string)
Data quot1;
a='Mr.Krishna';
b=quote(a);
Run;
DEQUOTE
Removes double quotes from character values
Syntax:-DEQUOTE(string)
Data quot2;
Set quot1;
c=dequote(b);
Run;
Data quot3;
Infile datalines;
Input id name$ sal;
Datalines;
001 abc 5000
002 def 6000
003 xyz 7000
;
run;
Data quot3a;
Set quot3;
name1=quote(name);
name2=quote(trim(name));
name3=dequote(name2);
Run;
RANK
Returns the position of a character in the ASCII or EBCDIC collating sequence.
Syntax:-RANK(x)
The RANK function returns an integer that represents the position of the first character in
the character expression. The result depends on your operating environment.
Data rnk1;
Infile datalines;
Input id name$ sal;
Rank_var=RANK(name);
Datalines;
001 clarc 5000
002 def 4000
003 clark 7000
;
Run;
Data rnk2;
a=Rank('A');
b=Rank('krishna'); /* It gives position of first character only*/
Run;
REPEAT
Returns a character value that consists of the first argument repeated n+1 times.
Syntax:- Repeat(Argument,n)
Data rep;
Infile datalines;
Input id name$ sal;
x=repeat(name,10);
Datalines;
001 clarc 5000
002 def 4000
003 clark 7000
;
Run;
SOUNDEX
Encodes a string to facilitate searching.
Encodes a string and gives same result for same pronunciation strings in variable
Syntax:- SOUNDEX(Argument)
Data snd;
Infile datalines;
Input id name$ sal;
y=soundex(name);
Datalines;
001 clarc 5000
002 def 4000
003 clark 7000
;
Run;
COLLATE
Returns a character string in ASCII or EBCDIC collating sequence.
Syntax:- COLLATE(start-position<,end-position>) | COLLATE(start-position<,,length>)
Data col1;
x=collate(45,99);
put @1 x ;
Run;
Data col2;
x=collate(1,,49);
put @1 x ;
Run;
ASCII Result
Data col3;
x=collate(48,,10);/*start-position<,,length*/
y=collate(48,57);/*start-position<,end-position */
put @1 x @14 y;
Run;
EBCDIC Result
Data col4;
x=collate(240,,10); /*start-position<,,length*/
y=collate(240,249); /*start-position<,end-position */
put @1 x @14 y;
Run;
Numeric Functions
MEAN
Returns the arithmetic mean (average)
Argument is numeric. At least one nonmissing argument is required; otherwise, the
function returns a missing value
Syntax: -MEAN(argument<,argument,...>)
Data ds1;
x1=mean(2,.,.,6);
x2=mean(2,4,5,6);
x3=mean(x1-x2); /* one argument, the difference: mean(4-4.25) = -0.25 */
x4=mean(of x1-x2); /* variable list x1, x2: (4+4.25)/2 = 4.125 */
x5=mean(x1,x2);
Run;
MEDIAN
Computes median values
Syntax: -MEDIAN(value1<, value2, ...>)
Data ds2;
x=median(2,4,1,3);
y=median(5,8,0,3,4);
z=median(5,.,0,.,4);
Run;
Difference between MEAN & MEDIAN
MEAN gives the average of the numeric values.
Ex:- x=mean(70,60,80,75,90)
it gives
x=(70+60+80+75+90)/5
x=375/5=75
MEDIAN gives the middle value of the sorted values.
Ex:-
x=median(2,4,1,3);
Sorted, the values are 1,2,3,4. With an even number of values, the two middle values
are 2 and 3,
so the median value is (2+3)/2=2.5
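The arithmetic above can be checked in a short DATA step. This is only a sketch; the variable names x, y and z are arbitrary.

```sas
Data _null_;
x=mean(70,60,80,75,90);   /* 375/5 = 75 */
y=median(2,4,1,3);        /* sorted 1,2,3,4: (2+3)/2 = 2.5 */
z=median(5,8,0,3,4);      /* sorted 0,3,4,5,8: middle value is 4 */
Put x= y= z=;
Run;
```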
MIN
Returns the smallest value
Syntax: -MIN(argument,argument,...)
Data ds3;
x1=min(7,4);
x2=min(2,.,6);
x3=min(2,-3,1,-1);
x4=min(0,4);
x6=min(of x1-x3);
x7=min(x1,x3);
Run;
MAX
Returns the largest value
Syntax:-MAX(argument,argument,...)
Data ds4;
x=max(8,3);
x1=max(2,6,.);
x2=max(2,-3,1,-1);
x3=max(3,.,-3);
x4=max(.,.,.);
x5=max(of x1-x3);
Run;
Argument
is numeric. At least two arguments are required. The argument list may consist of a
variable list, which is preceded by OF.
The MAX function returns a missing value (.) only if all arguments are missing.
RANGE
Returns the range of values
Syntax:- RANGE(argument,argument,...)
argument
is numeric At least one nonmissing argument is required. Otherwise, the function returns
a missing value. The argument list can consist of a variable list, which is preceded by OF.
The RANGE function returns the difference between the largest and the smallest of the
nonmissing arguments.
Data ds5;
x1=range(.,.);
x2=range(-2,6,3);
x3=range(2,6,3,.);
x4=range(1,6,3,1);
x5=range(of x1-x3);
run;
SUM
/*SUM Function*/
Returns the sum of the nonmissing arguments
Syntax:-SUM(argument,argument, ...)
argument
is numeric If all the arguments have missing values, the result is a missing value
The argument list can consist of a variable list, which is preceded by OF
Data ds6a;
x1=sum(4,9,3,8);
x2=sum(4,9,3,8,.);
x3=sum(of x1-x2);
Run;
Data ds6b;
x1=5;
x2=6;
x3=4;
x4=9;
y1=34;
y2=12;
y3=74;
y4=39;
result=sum(of x1-x4, of y1-y4);
Run;
Data ds6c;
x1=55;
x2=35;
x3=6;
x4=sum(of x1-x3, 5);
Run;
Data ds6d;
x1=7;
x2=7;
x5=sum(x1-x2);
Run;
Data ds6e;
y1=20;
y2=30;
x6=sum(of y:);
Run;
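/*SUM function applied to a + expression*/
Data ds6;
x1=sum(4+9+3+8);
x2=sum(4+.+9+3+8+.);
Run;
The SUM function returns the sum of its nonmissing arguments.
ex:- x2=sum(4,.,9,3,8,.);
It gives the value 24.
The + operator does not ignore missing values: if any operand is missing, the result
is missing. In the step above the SUM function receives a single argument that has
already been computed with +.
ex:- x2=sum(4+.+9+3+8+.);
It gives the value . (missing).
The SUM statement (variable + expression;) is a different construct: it retains the
variable across observations and ignores missing values.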
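The three ways of adding values behave differently with missing data. The sketch below puts them side by side; the data set name running and the variables amt, a, b and Total are assumptions for illustration only.

```sas
Data running;
Input amt @@;
a=sum(4,.,9);     /* SUM function: missing argument ignored -> 13 */
b=4+.+9;          /* + operator: missing propagates -> . (a note is written to the log) */
Total + amt;      /* sum statement: Total starts at 0, is retained, ignores missing amt */
Datalines;
4 . 9
;
Run;
Proc print Data=running;
Run;
```

Across the three observations Total should be 4, 4 and 13, because the missing amt on the second observation leaves the running total unchanged.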
CEIL
Returns the smallest integer that is greater than or equal to the argument, fuzzed to
avoid unexpected floating-point results
Syntax :-CEIL (argument)
Data ds7;
var1=2.1;
a=ceil(var1);
Run;
Data ds7;
b=ceil(-2.4);
Run;
FLOOR
Returns the largest integer that is less than or equal to the argument, fuzzed to avoid
unexpected floating-point results
Syntax :-FLOOR (argument)
Data ds8;
var1=2.1;
a=floor(var1);
Run;
Data ds8;
b=floor(-2.4);
Run;
ABS
Returns the absolute value
Syntax :-ABS (argument)
Data ds9;
x1=abs(2.4);
x2=abs(-3);
Run;
INT
Returns the integer value, fuzzed to avoid unexpected floating-point results.
Syntax:-INT(argument)
Data ds10;
x1=INT(2.4);
x2=INT(2.5);
x3=INT(2.8);
X4=INT(-2.4);
Run;
MOD
Returns the remainder from the division of the first argument by the second argument,
fuzzed to avoid most unexpected floating-point results.
Syntax:-MOD (argument-1, argument-2)
Data ds11;
X1=MOD(10,3);
Run;
Data ds;
A=123456;
X=INT(A/1000);
Y=MOD(A,1000);
Z=MOD(INT(A/100),100);
Run;
ROUND
Rounds the first argument to the nearest multiple of the second argument, or to the
nearest integer when the second argument is omitted.
Syntax:-ROUND (argument <,rounding-unit>)
Data ds12;
x1=ROUND(2.4);
x2=ROUND(2.5);
x3=ROUND(2.8);
X4=ROUND(-2.4);
X5=ROUND(-2.5);
Run;
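The example above omits the rounding unit. A brief sketch of the second argument described in the syntax line, with made-up values:

```sas
Data _null_;
a=round(123.456,0.01);   /* nearest hundredth -> 123.46 */
b=round(123.456,10);     /* nearest multiple of 10 -> 120 */
c=round(2.5);            /* halves round away from zero -> 3 */
d=round(-2.5);           /* -> -3 */
Put a= b= c= d=;
Run;
```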
VAR
Returns the variance
Syntax:-VAR(argument,argument, ...)
argument
is numeric. At least two nonmissing arguments are required. Otherwise, the function
returns a missing value. The argument list can consist of a variable list, which is
preceded by OF.
Data ds13;
x1=Var(4,2,3.5,6);
x2=Var(4,6,.);
x3=Var(of x1-x2);
Run;
SQRT
Returns the square root of a value
Syntax :-SQRT(argument)
argument
is numeric and must be nonnegative
Data ds14;
x1=sqrt(36);
x2=sqrt(25);
x3=sqrt(4.4);
x4=sqrt(-49);
Run;
NMISS
Returns the number of missing values
Syntax :-NMISS(argument<,...argument-n>)
argument
is numeric. At least one argument is required. The argument list may consist of a
variable list, which is preceded by OF.
Data ds15;
x1=nmiss(1,0,.,2,5,.);
x2=nmiss(1,0);
x3=nmiss(of x1-x2); /*x1=2 x2=0 so 2,0 it gives 0*/
Run;
N
Returns the number of non missing values
Syntax:-N(argument<,...argument-n>)
argument
is numeric. At least one argument is required. The argument list may consist of a
variable list, which is preceded by OF.
Data ds16;
X1=n(1,0,.,2,5,.);
X2=n(1,0);
X3=n(of x1-x2);
Run;
LAG
Returns values from a queue.
Syntax:-LAG<n>(argument)
Data lg1;
input x @@;
a=lag1(x);
b=lag2(x);
c=lag3(x);
d=lag(x);
datalines;
1 2 3 4 5 6
;
Run;
Data lg2;
input x @@;
y=lag1(x+10);
z=lag2(x);
datalines;
1 2 3 4 5 6
;
Run;
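A common use of LAG is to compare each observation with the previous one. The following is a minimal sketch with made-up values; the names lg3, prev and change are illustrative.

```sas
Data lg3;
Input x @@;
prev=lag(x);      /* value of x from the previous execution of LAG; missing on row 1 */
change=x-prev;    /* missing on the first observation, then 2, 3, -4 */
Datalines;
10 12 15 11
;
Run;
```

Because LAG returns values from a queue that is updated each time the function executes, it is generally safest to call it unconditionally on every observation, as here, rather than inside an IF condition.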
ANYDIGIT
Searches a character string for a digit and returns the first position at which it is found
Syntax:-ANYDIGIT(string <,start>)
DATA SEARCH_NUM;
INPUT STRING $60.;
dg = ANYDIGIT(STRING);
DATALINES;
This line has a 56 in it
two numbers 123 and 456 in this line
No digits here
;
run;
ANYSPACE
Searches a character string for a space and returns the first position at which it is found
Syntax:-ANYSPACE(string <,start>)
DATA SEARCH_SPACE;
INPUT STRING $60.;
sp= ANYSPACE(STRING);
DATALINES;
This line has a 56 in it
two numbers 123 and 456 in this line
No digits here
;
run;
How can you separate a numeric value from an alphanumeric string?
DATA EN;
INPUT STRING $60.;
START = ANYDIGIT(STRING);
END = ANYSPACE(STRING,START);
IF START NE 0 THEN
NUM = INPUT(SUBSTR(STRING,START,END-START),9.);
DATALINES;
This line has a 56 in it
two numbers 123 and 456 in this line
No digits here
;
run;
INPUT
Converts data values from character to numeric data type with help of Informat
Syntax:-Input(variable, informat);
Example:-
Data ds1;
Infile datalines;
Input id$ name$ sal;
Datalines;
001 abc 60000
002 def 45000
003 xyz 50000
;
Run;
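The data step above only reads id as a character variable; the conversion itself is not shown. A minimal sketch of the INPUT function doing that conversion, using literal values instead of the ds1 data set (the variable names are assumptions for illustration):

```sas
Data conv;
id_char='001';
id_num=input(id_char,8.);            /* character '001' -> numeric 1 */
date_char='15Nov2010';
date_num=input(date_char,date9.);    /* character date -> SAS date value */
Format date_num date9.;
Run;
Proc print Data=conv;
Run;
```

The informat given as the second argument must describe how the character value is laid out, for example 8. for plain digits or date9. for a value such as 15Nov2010.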
PUT
Converts data values from numeric to character data type with help of Format
Syntax:-put(variable, format);
Example:-
Data ds2;
Infile datalines;
Input id name$ sal;
Datalines;
001 abc 60000
002 def 45000
003 xyz 50000
;
Run;
Data nc/*(drop=id rename=(id1=id))*/;
Set ds2;
id1=put(id, 8.); /* id is numeric, so a numeric format is used */
Run;
Date Functions
The SAS system stores Time values as the number of seconds elapsed since midnight
of that particular day.
SAS stores Datetime values as the number of seconds elapsed
since midnight (12:00 am), January 1, 1960.
SAS stores Date values as the number of days elapsed since
January 1, 1960.
Dates before January 01, 1960 are negative integers, dates after January 01, 1960 are
positive integers, and January 01, 1960 itself is 0.
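These storage rules can be seen by printing a few date, time and datetime constants without a format. This is only a sketch; the variable names are arbitrary.

```sas
Data _null_;
d0='01Jan1960'd;             /* 0  */
d1='02Jan1960'd;             /* 1  */
dneg='31Dec1959'd;           /* -1 */
t='00:01:00't;               /* 60 seconds after midnight */
dt='01Jan1960:00:01:00'dt;   /* 60 seconds after the datetime origin */
Put d0= d1= dneg= t= dt=;
Run;
```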
DATE
Returns the current date as a SAS date value
Syntax: - DATE()
Data ds1;
date1=date();
Run;
Data ds1a;
date1=date();
Format date1 date9. ;
Run;
TODAY
Returns the current date as a SAS date value
Syntax:-TODAY()
Data ds2;
Day=today();
Format day date9.;
Run;
DATETIME
Returns the current date and time of day as a SAS datetime value
Syntax:-DATETIME()
Data ds3;
a=datetime();
Format a datetime20.;
Run;
TIME
Returns the current time of day
Syntax:-TIME()
If the following statements are executed at exactly 3:32 PM, SAS returns the time value
corresponding to 15:32:00.
The TIME. format displays it on a 24-hour clock.
Data ds4;
Time=time();
Format time time. ;
Run;
DAY
Returns the day of the month from a SAS date value
Syntax:-DAY(date)
Data ds5;
a='29Jan2010'd;
Day=day(a);
Run;
Data ds5a;
a=date();
b= day(a);
Format a date9.;
Run;
WEEK
Returns the week-number value
Syntax:-WEEK (<SAS_Date>, <descriptor>)
Data ds6;
X=week('29Jan2010'd);
Y= week('10Feb2010'd);
Z= week('31Dec2010'd);
Run;
Data ds6a;
X=date();
Y=week(x);
Format x date9. ;
Run;
WEEKDAY
Returns the day of the week from a SAS date value
For example, 17Oct1991 returns 5 because 17Oct1991 was a Thursday (1=Sunday, 2=Monday, ..., 7=Saturday)
Syntax:-WEEKDAY(date)
Data ds7;
week1=weekday('16Mar1997'd);
Run;
Data ds7a;
a=date();
week1=weekday(a);
Run;
MONTH
Returns the month from a SAS date value
Syntax:-MONTH (date)
Data ds8;
a='29Jan2010'd;
Mon=month(a);
Run;
Data ds8a;
a=today();
Mon=month(a);
Run;
QTR
Returns the quarter of the year from a SAS date value
Syntax:-QTR(date)
Data ds9;
a='29Jan2010'd;
Quarter=qtr(a);
Run;
Data ds9a;
a='15Nov2010'd;
b=today();
Quarter1=qtr(a);
Quarter2=qtr(b);
Run;
YEAR
Returns the year from a SAS date value
Gives four-digit numeric value that represents the year
Syntax:-YEAR(date)
Data ds10;
Date='25dec97'd;
y=year(date);
Run;
DHMS
Returns a SAS datetime value from date, hour, minute, and second
Syntax: -DHMS (date, hour, minute, second)
Data ds11;
a=dhms('15Nov2010'd,10,02,15);
Format a datetime. ;
Run;
Data ds11a;
a=dhms('15Nov2010'd,10,02,61);
b=dhms('15Nov2010'd,10,02,61);
Format a datetime. ;
Format b datetime20. ;
Run;
Data ds11b;
a=dhms('15Nov2010'd,10,.2,11);
Format a datetime.;
Run;
HMS
Returns a SAS time value from hour, minute, and second values
Syntax: -HMS (hour, minute, second)
Data ds12;
a=HMS(10,02,15);
Format a time.;
Run;
Data ds12;
a=HMS(10,02,15);
b=HMS(10,02,15);
c=HMS(10,02,15);
Format a time.;
Format b time5.;
Format c time8.;
Run;
HOUR
Returns the hour from a SAS time or datetime value
Syntax: - HOUR (<time | datetime>)
Data ds13;
a=hour('10:30't);
Run;
Data ds13a;
a='10:30:05't;
b=hour(a);
Format a time8. ;
Run;
MINUTE
Returns the minutes from a SAS time or datetime value
Syntax: - Minute (<time | datetime>)
Data ds14;
a='10:30:05't;
b=MINUTE(a);
Format a time5.;
Run;
SECOND
Returns the seconds from a SAS time or datetime value
Syntax: -Second (<time | datetime>)
Data ds14a;
a='10:30:05't;
b=second(a);
Format a time. ;
Run;
DATEJUL
Converts a Julian date to a SAS date value
Syntax: -DATEJUL(Julian-date)
Julian-date
Specifies a SAS numeric expression that represents a Julian date
A Julian date in SAS is a date in the form yyddd or yyyyddd,
Where yy or yyyy is a two-digit or four-digit integer that represents
the year and ddd is the number of the day of the year
The value of ddd must be between 1 and 365 (or 366 for a leap year).
For example: 10365 or 2010365
Data ds15;
a=Datejul(10001);
Format a date9.;
Run;
Data ds15a;
a=Datejul(10365);
Format a date9.;
Run;
JULDATE
Returns the Julian date from a SAS date value
Syntax: -JULDATE (date)
The JULDATE function converts a SAS date value to a five- or seven-digit Julian date.
If date falls within the 100-year span defined by the system option YEARCUTOFF=, the
result has five digits:
the first two digits represent the year, and the next three digits represent the day of the
year (1 to 365, or 1 to 366 for leap years).
Otherwise, the result has seven digits: the first four digits represent the year, and the
next three digits represent the day of the year. For example, if YEARCUTOFF=1920,
JULDATE would return 97001 for January 1, 1997,
and return 1878365 for December 31, 1878.
Data ds16;
a=juldate('01Jan2010'd);
Run;
Result: 10001 (assuming 2010 falls within the YEARCUTOFF= span)
Data ds16a;
a=date();
b=juldate(a);
Format a date9.;
Run;
MDY
Returns a SAS date value from month, day, and year values
Syntax: - MDY (month,day,year)
Month
Specifies a numeric expression that represents an integer from 1 through 12.
Day
Specifies a numeric expression that represents an integer from 1 through 31.
Year
Specifies a two-digit or four-digit integer that represents the year
The YEARCUTOFF= system option defines the year value for two-digit dates
Data ds17;
x_birthday=mdy(8,27,90);
y_birthday=mdy(05,30,2009);
Format x_birthday worddate20. ;
Format y_birthday weekdate30. ;
Run;
YYQ
Returns a SAS date value from the year and quarter
Year
Specifies a two-digit or four-digit integer that represents the year
The YEARCUTOFF= system option defines the year value for two-digit dates
Quarter
Specifies the quarter of the year (1, 2, 3, or 4)
Syntax: -YYQ(year,quarter)
Data ds18;
DateValue1=yyq(2001,3);
DateValue2=yyq(09,2);
Format DateValue1 date7.;
Format DateValue2 date7.;
Run;
TIMEPART
Extracts a time value from a SAS datetime value
Syntax: - TIMEPART (datetime)
Data ds19;
x=datetime();
y=timepart(x);
Format X datetime. Y time. ;
Run;
DATEPART
Extracts the date from a SAS datetime value
Syntax: -DATEPART(datetime)
Data ds20;
X=datetime();
Y=datepart(x);
Format x datetime. y ddmmyy10.;
Run;
Data ds20a;
x=datepart ('01Jan2010:05:30:26'dt);
Format x ddmmyy8.;
Run;
Data ds1;
Infile datalines;
Input id$ fname$ lname$ sal dob datetime.;
Format dob datetime. date date9. time time8.;
Date=datepart(dob);
Time=timepart(dob);
Datalines;
001 mohan arisela 60000 10jan1983:10:30:15
002 padma narni 45000 22feb1983:20:23:52
003 varma maddina 50000 30mar1983:06:55:25
;
Run;
INTCK
Returns the integer count of the number of interval boundaries between two dates, two
times, or two datetime values
Syntax: - INTCK(interval, from, to)
Interval
Specifies a character constant, a variable, or an expression that contains a time interval
such as SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QTR, SEMIYEAR and YEAR
DATA ds21;
BDATE='10SEP2008'D;
EDATE='14SEP2010'D;
ACTDATE=INTCK('DAY', BDATE, EDATE);
RUN;
DATA ds21a;
BDATE='10SEP2008'D;
EDATE='14SEP2010'D;
ACTDATE=INTCK('MONTH', BDATE, EDATE);
RUN;
DATA ds21b;
BDATE='10SEP2008'D;
EDATE='14SEP2010'D;
ACTDATE=INTCK('Semiyear', BDATE, EDATE);
RUN;
DATA ds21c;
y=trim('year ');
date1='1sep1991'd + 300;
date2='1sep2001'd - 300;
Years=intck (y,date1,date2);
RUN;
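Because INTCK counts interval boundaries rather than elapsed time, two dates that are one day apart can be one year apart by its count. A short sketch with illustrative dates:

```sas
Data _null_;
a=intck('year','31Dec2010'd,'01Jan2011'd);    /* one 1 January boundary crossed -> 1 */
b=intck('year','01Jan2010'd,'31Dec2010'd);    /* no boundary crossed -> 0 */
c=intck('month','10Sep2008'd,'14Sep2010'd);   /* 24 first-of-month boundaries -> 24 */
Put a= b= c=;
Run;
```

When an elapsed duration such as an age is needed, a function like YRDIF is usually a better fit than INTCK with the YEAR interval.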
YRDIF
Returns the difference in years between two dates
Syntax: - YRDIF (sdate,edate,basis)
sdate
Specifies a SAS date value that identifies the starting date
edate
Specifies a SAS date value that identifies the ending date
basis
Identifies a character constant or variable that describes how SAS calculates the date
difference. The following character strings are valid: '30/360', 'ACT/ACT', 'ACT/360' and 'ACT/365'.
'30/360' specifies a 30-day month and a 360-day year in calculating the number of years.
Each month is considered to have 30 days, and each year 360 days, regardless of the
actual number of days in each month or year
DATA ds22;
BDATE='10SEP2000'D;
EDATE='14SEP2010'D;
ACTYEARS=YRDIF(BDATE, EDATE, 'ACTUAL');
Format BDATE date9. EDATE date9. ;
RUN;
DATA ds22a;
Sdate='16Oct1998'd;
Edate='16Feb2003'd;
y30360=yrdif(sdate, edate, '30/360');
Yactact=yrdif(sdate, edate, 'ACT/ACT');
yact360=yrdif(sdate, edate, 'ACT/360');
yact365=yrdif(sdate, edate, 'ACT/365');
Run;
DATA ds22b;
Sdate='16Oct1998'd;
Edate='16Feb2003'd;
YRDIFF=yrdif(sdate, edate, '30/360');
DAYDIFF=yrdif(sdate, edate, 'ACT/365');
Run;
INTNX
Increments a date, time, or datetime value by a given interval or intervals, and returns a
date, time, or datetime value
Syntax: -
INTNX (interval<multiple><.shift-index>, start-from, increment<, alignment>)
Interval
Specifies a character constant, a variable, or an expression that contains a time interval
such as SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QTR, SEMIYEAR and YEAR
Data ds23;
Yr=intnx('year','05feb94'd,3);
Format yr date7. ;
Run;
Data ds23a;
Next=intnx('semiyear','01jan97'd,1);
Format next date9.;
Run;
Data ds23b;
X1='month ';
X2=trim(x1);
Date='1jun1990'd - 100;
Next_month=intnx(x2,date,1);
Format Next_month date9.;
Run;
DATA DS23c;
FORMAT TODAY1 DATE9.;
TODAY1=TODAY();
CDATE=PUT (INTNX ('MONTH',TODAY1,0,'S'),DATE9.);
LMCDATE=PUT(INTNX('MONTH',TODAY1,-1,'S'),DATE9.);
BCDATE=PUT(INTNX('DAY',TODAY1,-1,'S'),DATE9.);
LMBCDATE=PUT(INTNX('MONTH',(TODAY1-1),-1,'S'),DATE9.);
BDATE=PUT(INTNX('MONTH',TODAY1,0,'B'),DATE9.);
EDATE=PUT(INTNX('MONTH',TODAY1,0,'E'),DATE9.);
RUN;
HOLIDAY
Returns a SAS date value for the holiday and year specified