
An Introduction to PC SAS/CONNECT

2006 WRDS Users Meeting, June 14, 2006
David Robinson

If the WRDS web interface doesn't provide the flexibility or capability you require, you may need to run a customized computer program to perform your analysis. The WRDS Unix server is available to faculty, research assistants of faculty (RAs) and Ph.D. students for this purpose. All other users are limited to using the web-based interface. WRDS supports two programming environments: Unix and PC SAS. This session will discuss the PC SAS/CONNECT programming environment.

PC SAS (The SAS System for Windows, ver. 9.1.3)

Users who don't feel comfortable in the Unix programming environment may use PC SAS instead. This will allow you to remain in the MS Windows environment while still making use of the WRDS Unix system's resources. In fact, PC SAS/CONNECT allows you to make effective use of the computing capabilities of both your PC and the WRDS Unix server simultaneously.

A typical research project begins with the extraction of a sample of data from the WRDS server. As we will see, this can be done by running a PC SAS/CONNECT program, but it can also be done by using the WRDS web interface or by directly transferring a SAS data set from the WRDS Unix server to a PC using "sftp". (sftp will be discussed in more detail in a subsequent session on Unix.)

An important thing to remember when transferring a SAS data set to a PC with the WRDS web interface or sftp, however, is that WRDS Unix SAS data sets are in 64-bit format, while native PC SAS data sets are in 32-bit format. The WRDS web interface allows you to select the output format of the SAS data set (64-bit or 32-bit), so when saving SAS data sets to your PC the 32-bit output option should be selected. But if you transfer a 64-bit SAS data set to your PC using sftp, you should convert it to 32-bit format before using it extensively. PC SAS can read 64-bit SAS data sets, but it does so very slowly.
(In the near future we will all have 64-bit PCs, both hardware and software, but until then we need to be aware of this issue.)

For example, say I downloaded the CRSP Monthly Stock file (msf.sas7bdat) and its associated index file (msf.sas7bndx) to the C:\SAS\64bit folder on my PC using sftp. Although my PC SAS programs would be able to read this "msf" file on my PC, the speed of execution might be slow since PC SAS will not make use of the 64-bit index file. A warning regarding the use of a "non-native" data set would be issued in the SAS log file. To translate this data set to 32-bit format, the following program should be run:

    libname mylib64 "c:\sas\64bit";
    libname mylib32 "c:\sas\32bit";

    proc migrate in=mylib64 out=mylib32;
    run;

This program will convert all SAS data sets in the C:\SAS\64bit folder and place the converted versions in the C:\SAS\32bit folder. The resulting 32-bit SAS data sets and index files will be native to the PC SAS environment, and programs accessing them will suffer no performance penalty.

PC SAS/CONNECT Remote Library Services (RLS)

PC SAS/CONNECT can read a SAS data set from the WRDS server dynamically, without having to download it to your PC prior to program execution. This capability is called Remote Library Services (RLS), and it allows you to run a SAS program locally (on your PC), using data that is downloaded from the WRDS server as it is needed during program execution. Note, however, that a WHERE clause or a PROC CONTENTS that is operating on a remote SAS data set will execute on the server, not locally. In these cases the remote data set is not downloaded to the local machine before execution.

RLS may be a convenient technique for the one-time analysis of a data set that you don't want to store locally, but it may be more efficient to download the data to your PC first, if that is practical. For example, if there is a bug in your program and you have to run it again, the entire data set may have to be downloaded from the server to your PC again when you run the corrected program.

An example of an RLS program follows:

    %let wrds = wrds.wharton.upenn.edu 4016;
    options comamid=TCP remote=WRDS;
    signon username=_prompt_;

    options nocenter nodate nonumber ls=max ps=max;
    libname ibeslib remote '/wrds/ibes/sasdata' server=wrds;

    proc contents data=ibeslib.statsum;
    run;

    signoff;

The first 3 lines of the above program are necessary to establish a connection between the local SAS session (on your PC) and the remote SAS session (on the WRDS Unix server). The libname statement defines a remote library in your local SAS session. (Click the Explorer tab at the bottom of the left-hand pane of PC SAS and then double-click Libraries to see your currently defined libraries.)
Since this simple program only runs a PROC CONTENTS, the remote data set is not downloaded to the PC before execution. The signoff command terminates the link between the local SAS session and the remote SAS session. (This will remove the remote library definition in the local SAS session.)

Note that if you don't execute the signoff command in your program, the link between the local and remote SAS sessions remains and you can execute additional lines of code without re-defining your remote libraries. For example, if you submit the above lines of code without the signoff command, you could then submit the following:

    proc contents data=ibeslib.det;
    run;

This can be done by putting the above lines in the Editor window of PC SAS, highlighting the lines of code, and clicking the Submit icon or pressing the F8 key. Also note that, as long as signoff has not been executed, the remote library definition ibeslib remains in the Explorer pane of PC SAS after the program has completed.

In the following program, a WHERE clause is used in the DATA step to filter the data based on the value of an indexed variable. When using RLS, this causes only the filtered observations to be downloaded to your PC prior to execution rather than the entire SAS data set.

    %let wrds = wrds.wharton.upenn.edu 4016;
    options comamid=TCP remote=WRDS;
    signon username=_prompt_;

    options nocenter nodate nonumber ls=max ps=max msglevel=i;
    libname ibeslib remote '/wrds/ibes/sasdata' server=wrds;
    libname mylib 'c:\sas';

    data mylib.ibes2003;
       set ibeslib.statsum (keep=ticker fpedats statpers fpi measure
                                 meanest actual repdats);
       where ticker in ("IBM","MSFT","DELL");
       if year(fpedats) = 2003 and year(statpers) = 2003
          and month(fpedats) = month(statpers)
          and measure = "EPS" and fpi = "1";
    run;

    proc print data=mylib.ibes2003 noobs;
       var ticker fpedats statpers fpi measure meanest actual repdats;
    run;

Notice that the "Signoff" command was not issued in the above program, so the remote SAS session (on the WRDS Unix server) is still active. It is possible to use local libraries and remote libraries at the same time. This allows you to use SAS data sets on the WRDS server and SAS data sets on your local machine within the same program. For example:

    %let wrds = wrds.wharton.upenn.edu 4016;
    options comamid=TCP remote=WRDS;
    signon username=_prompt_;

    options nocenter nodate nonumber ls=max ps=max;
    libname ibeslib remote '/wrds/ibes/sasdata' server=wrds;
    libname mylib 'c:\sas';

    data ibes2004;
       set ibeslib.statsum (keep=ticker fpedats statpers fpi measure
                                 meanest actual repdats);
       where ticker in ("IBM","MSFT","DELL");
       if year(fpedats) = 2004 and year(statpers) = 2004
          and month(fpedats) = month(statpers)
          and measure = "EPS" and fpi = "1";
    run;

    data ibesdata;
       set mylib.ibes2003 ibes2004;
    run;

    proc sort data=ibesdata;
       by ticker fpedats;
    run;

    proc print data=ibesdata noobs;
       var ticker fpedats statpers fpi measure meanest actual repdats;
    run;
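The earlier advice about downloading data once, rather than re-reading it through RLS on every run, can be sketched as follows. This is only an illustration under stated assumptions: it assumes the signon, the ibeslib remote library and the mylib local library from the programs above are still active (no signoff has been issued), and the data set name statsum_sub is hypothetical.

```sas
/* Assumption: signon has been executed and the librefs ibeslib (remote) */
/* and mylib (local, c:\sas) are defined as in the programs above.       */

/* Step 1: pull the subset across the network once and store it locally. */
/* statsum_sub is a hypothetical name chosen for this sketch.            */
data mylib.statsum_sub;
   set ibeslib.statsum (keep=ticker fpedats statpers fpi measure meanest actual);
   where ticker in ("IBM","MSFT","DELL");
run;

/* Step 2: later analysis steps read the local copy, so re-running them  */
/* after fixing a bug does not transfer the data from the server again.  */
proc means data=mylib.statsum_sub n mean;
   class ticker;
   var meanest;
run;
```

Only the first step touches the remote library; the second step can be edited and resubmitted as often as needed without further downloads.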

PC SAS/CONNECT Remote Submit

It is also possible to use a Remote Submit command within PC SAS (rsubmit) to submit SAS code for remote execution on the WRDS server. Since using Remote Library Services may mean that your program will need to download sizable files from the server to your local machine during program execution on your PC, it may be better to use Remote Submit to run your SAS program directly on the WRDS server. Only the log and listing files would be returned to your local PC SAS session in this case. For example:

    %let wrds = wrds.wharton.upenn.edu 4016;
    options comamid=TCP remote=WRDS;
    signon username=_prompt_;

    rsubmit;

    options nocenter nodate nonumber ls=max ps=max msglevel=i;

    data temp;
       set comp.compann (keep=gvkey smbl yeara fyr coname data6);
       where smbl in ("IBM", "MSFT", "DELL") and yeara >= 2003;
       format fybegdt fyenddt yymmddn8.;
       if 1 <= fyr <= 5 then
          fyenddt = intnx('month',mdy(fyr,1,yeara+1),0,'end');
       if 6 <= fyr <= 12 then
          fyenddt = intnx('month',mdy(fyr,1,yeara),0,'end');
       fybegdt = intnx('month',fyenddt,-11);
       label fybegdt = "Fiscal Year Begin Date"
             fyenddt = "Fiscal Year End Date";
    run;

    proc print data=temp noobs;
       var gvkey smbl yeara fyr fybegdt fyenddt coname data6;
    run;

    endrsubmit;

    libname rwork slibref=work server=wrds;
    libname rcomp slibref=comp server=wrds;

All lines of code between rsubmit and endrsubmit are executed in the remote SAS session on the WRDS Unix server. Control is returned to the local SAS session after the endrsubmit command is executed. The final two libname statements in the above program create local (PC) references to the remote (Unix) WORK and COMP libraries, respectively. After running this program, use the "Explorer" tab in the left-hand pane of PC SAS (select it at the bottom) to browse the various local and remote SAS libraries. Additional lines of code can be submitted to operate on any of these SAS libraries by highlighting the lines of code in the Editor window and clicking the "Submit" button (or pressing F8).

PC SAS/CONNECT - PROC UPLOAD / DOWNLOAD

When using PC SAS/CONNECT, it is possible to move files between your PC and the Unix server during program execution. The SAS procedures PROC UPLOAD and PROC DOWNLOAD will upload (from your PC to the Unix server) or download (from the Unix server to your PC) files during program execution. This technique is useful when you have a local file (on your PC) containing company identifiers and dates, and you want to use it in a SAS program that you will Remote Submit to run on the Unix server. Once the program has extracted the desired subset of data, it can automatically be downloaded to your PC at the end of the program. For example:

    %let wrds = wrds.wharton.upenn.edu 4016;
    options comamid=TCP remote=WRDS;
    signon username=_prompt_;

    /* ------------------------------------------------------------- */
    /* Note that fileref and libref statements that refer to files   */
    /* and libraries on the local machine (PC) MUST BE EXECUTED      */
    /* PRIOR TO THE REMOTE SUBMIT. In this example, the permno.txt   */
    /* file just contains a few PERMNOs:                             */
    /*    12490                                                      */
    /*    11081                                                      */
    /*    10107                                                      */
    /* ------------------------------------------------------------- */

    filename pcfile 'c:\sas\permno.txt';
    libname locallib 'c:\sas';

    rsubmit;

    options nocenter nodate nonumber ls=max ps=max msglevel=i;

    filename unixfile '~/permno.txt';

    proc upload infile=pcfile outfile=unixfile;
    run;

    data temp;
       infile unixfile;
       input permno;
    run;

    proc sql;
       create table demo as
       select msf.permno, msf.date, msf.prc, msf.ret
       from temp, crsp.msf
       where temp.permno = msf.permno
         and 2003 <= year(date) <= 2004;
    quit;

    proc print data=demo noobs;
       var permno date prc ret;
    run;

    proc download data=demo out=locallib.demo;
    run;

    endrsubmit;

SAS data sets that are downloaded with PROC DOWNLOAD are automatically converted to the native format of the destination machine, if necessary. In this case, the SAS data set "locallib.demo" is converted to 32-bit format.

Using the SASTEMP File Systems in PC SAS/CONNECT

The WRDS Unix server has 10 temporary file systems that may be used for temporary file storage. These file systems are named /sastemp0, /sastemp1, ..., /sastemp9, and they are each 200 GB in size. Files on these file systems that have not been used for 2 days are automatically deleted, and recovery is not possible. See section 9 of the http://wrds.wharton.upenn.edu/support/docs/WRDS_Unix.pdf document for a full discussion of these file systems.

It is possible to make use of the sastemp file systems within PC SAS/CONNECT programs. For example, the following program uses the x command to create and manipulate files in the sastemp file systems. The x command will execute native operating system commands, as if at the command line, within a SAS program. Before choosing which sastemp file system to use, however, it is advisable to check the utilization levels of /sastemp0 - /sastemp9 by using the web-based SASTEMP utility under Support | Remote Access to WRDS: http://wrds.wharton.upenn.edu/support/nonwebaccess.shtml .

    %let wrds = wrds.wharton.upenn.edu 4016;
    options comamid=TCP remote=WRDS;
    signon username=_prompt_;

    rsubmit;

    x 'mkdir /sastemp5/mydata';
    libname mylib '/sastemp5/mydata';

    proc sort data=crsp.dsf (keep=permno date ret) out=returns;
       where permno in (10068, 12490);
       by permno date;
    run;

    proc sql;
       create table mylib.cum_returns as
       select permno,
              exp(sum(log(1+ret))) - 1 as cum_return,
              min(ret) as minret,
              max(ret) as maxret,
              n(ret) as n_periods,
              nmiss(ret) as n_miss,
              sum(ret=.P) as n_dot_p,
              min(date) as first_date,
              max(date) as last_date
       from returns
       where ('01jan1986'd <= date <= '31dec1986'd)
       group by permno;
    quit;

    proc print data=mylib.cum_returns noobs;
       format first_date last_date yymmddn8.;
    run;

    endrsubmit;

    libname rmylib slibref=mylib server=wrds;

Notice that a remote library reference was defined after the endrsubmit and that the signoff command was not executed. This will allow you to submit additional SAS commands in the local SAS session that reference files within the /sastemp5/mydata directory on the WRDS Unix server. For example, the following lines of code can be submitted in the same SAS session (prior to signoff).

    %let wrds = wrds.wharton.upenn.edu 4016;
    options comamid=TCP remote=WRDS;
    signon username=_prompt_;

    proc print data=rmylib.cum_returns noobs;
       where permno = 10068;
    run;

    rsubmit;

    proc print data=mylib.cum_returns noobs;
       where permno = 12490;
    run;

    proc delete data=mylib.cum_returns;
    run;

    x 'rmdir /sastemp5/mydata';

    endrsubmit;

    signoff;

Notice that the first PROC PRINT is executed in the local SAS session and uses the remote library pointing to /sastemp5/mydata. (This is equivalent to using Remote Library Services.) The second PROC PRINT is executed in the remote SAS session. Also, please note that the above program is careful to delete all files in sastemp that are no longer needed. It uses PROC DELETE to delete a specific SAS data set, and then removes the entire /sastemp5/mydata directory. Remember that the sastemp file systems are a shared resource, so please be a good citizen and delete your files if you no longer need the space.

A Note on the Use of SAS Indexes

It is important to note that using a WHERE clause on an indexed SAS variable will always be more efficient than using an IF clause. This is because the WHERE clause makes use of the index, but the IF clause does not. To illustrate, compare the following two log files, both of which were executed using Remote Submit:

    data temp;
       set remlib.msf;
       where permno = 12490 and date gt '01JAN2001'd;
    run;

    NOTE: There were 12 observations read from the data set REMLIB.MSF.
          WHERE (permno=12490) and (date>'01JAN2001'D);
    NOTE: The data set WORK.TEMP has 12 observations and 19 variables.
    NOTE: DATA statement used:
          real time           0.22 seconds
          cpu time            0.07 seconds

    data temp;
       set remlib.msf;
       if permno = 12490 and date gt '01JAN2001'd;
    run;

    NOTE: There were 2981941 observations read from the data set REMLIB.MSF.
    NOTE: The data set WORK.TEMP has 12 observations and 19 variables.
    NOTE: DATA statement used:
          real time           41.79 seconds
          cpu time            37.53 seconds

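The IF clause took significantly longer than the WHERE clause: about 42 seconds vs. 0.22 seconds. The IF version read all 2,981,941 observations in order to keep 12, while the WHERE version read only the 12 matching observations. For larger and more complex queries the difference can amount to a great deal of time, so always use the WHERE clause if possible.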
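One way to check whether a WHERE clause is actually able to use an index is sketched below. It assumes the libref remlib points to the same library used in the two log excerpts above, and that it is run in the same session as those steps. PROC CONTENTS lists any indexes defined on a data set, and the msglevel=i option (already used in several programs in this handout) asks SAS to write informational messages about index usage to the log.

```sas
/* Assumption: remlib is already assigned, as in the log excerpts above. */

/* Ask SAS to write informational (INFO) messages to the log, including  */
/* messages about whether an index was used for WHERE processing.        */
options msglevel=i;

/* The PROC CONTENTS output includes a list of the indexes, if any, that */
/* are defined on the data set.                                          */
proc contents data=remlib.msf;
run;

/* Run the indexed query, then look in the log for an INFO line about    */
/* index selection.                                                      */
data temp;
   set remlib.msf;
   where permno = 12490 and date gt '01JAN2001'd;
run;
```

If no INFO line about an index appears in the log, the WHERE clause was probably evaluated by reading the data set sequentially, and the timing advantage shown above should not be expected.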
