Introduction to SAS

SAS offers comprehensive software products for data access, data management, data analysis and data presentation. In this course, we will primarily use SAS/STAT, an integral component of SAS, to perform statistical analysis. The version available on campus is SAS for Windows (PC-SAS Version 8.01). You need a Purdue University Computing Center career account to use PUCC facilities.

Get Started

1. SAS version 8 is available on the PCs in PUCC labs.
2. Purdue's SAS license allows any student and staff member to get a copy of SAS version 8.01. You can check out a SAS CD and install SAS on your machine. Contact Carol Funkhouse in the PUCC office in MATH G175.

Installation hints: The whole SAS package needs more than 500MB of disk space. The part used in STAT514 takes about 300MB. To keep your SAS installation to "only" 300MB:
1) Ignore the CDs titled "Online Doc" and "Client-Side Components".
2) Say "No" when asked if you want to install help in "Simple HTML".
3) Choose Custom Installation and select only the following components: Base SAS, Core of the SAS system, SAS/GRAPH, SAS/QC, and SAS/STAT.

Stat 514 Website

The URL of the website is http://www.stat.purdue.edu/~yuzhu/514gate.html. It contains links to all the course material as well as any announcements. The links are Syllabus, Lecture Notes, Homework and Exam Material, Data Sets, SAS files and Announcement. You can download the files when they are ready. To download a .sas program or .dat file, click the file name, then click Save this file to disk and navigate to the directory where you keep your SAS work. You need to use either your home directory (H: drive) or a floppy disk (A: drive); otherwise your work will be deleted when you log off.

Use SAS for Windows

Launch SAS:
1) Start menu -> standard software -> statistical packages -> The SAS system -> SAS V8;
2) Double-click on a SAS program file (.sas file).

Create or Open a SAS file:
After SAS starts, you will see several windows. One is the Editor, in which you can create and modify SAS programs. In today's lab, you will use sample SAS programs only. A sample SAS program is given below. You can copy and paste it from this webpage into the SAS Editor window. To open a saved SAS file instead, either double-click the .sas file, or highlight the Editor window first and then click File menu->Open->...

data Class; 
   input Name $ Height Weight Age @@; 
datalines; 
   Alfred  69.0 112.5 14  Alice  56.5  84.0 13  Barbara 65.3  98.0 13 
   Carol   62.8 102.5 14  Henry  63.5 102.5 14  James   57.3  83.0 12 
   Jane    59.8  84.5 12  Janet  62.5 112.5 15  Jeffrey 62.5  84.0 13 
   John    59.0  99.5 12  Joyce  51.3  50.5 11  Judy    64.3  90.0 14 
   Louise  56.3  77.0 12  Mary   66.5 112.0 15  Philip  72.0 150.0 16 
   Robert  64.8 128.0 12  Ronald 67.0 133.0 15  Thomas  57.5  85.0 11 
   William 66.5 112.0 15 
; 

symbol1 v=dot c=blue height=1.5pct;
proc reg data=Class; 
   model Weight = Height; 
run; 
   plot Weight*Height/cframe=ligr; 
run; 
quit;
Run a SAS file:
With the Editor window highlighted, click the running figure icon in the tool bar (or go to Run menu->Submit). This tells SAS to run the program in the Editor window.

SAS Output:
The results appear in several other windows. The Log window is a step-by-step account of what SAS did with your program; SAS reports errors in your program here. The Output window contains the text output (the analytical results) from your program. Graphics (plots) appear in a separate Graph window with one graph per page. Use the Page Up and Page Down keys to view the graphs one by one.

If you change your SAS program and re-submit it, the new results do not replace the old results; they are appended after them, which can make the new results hard to find. A simple solution is to clear the windows before you submit the modified program. In the Log window, right-click to bring up the context menu, then go to Edit->Clear All. For the Output and Graph windows, the most effective way is to go to the result summary window (the left-most window), highlight the main results directory, then click the X button or choose Edit->Clear All.
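
The Log and Output windows can usually also be cleared from within the program itself, using a dm (display manager) statement. The line below is a minimal sketch, assuming SAS is running interactively in the windowing environment; it is not part of the course's sample programs, and it does not clear the Graph window or the result summary window.

```sas
/* Clear the Log window, then the Output window, before the rest of the program runs. */
dm 'log; clear; output; clear;';
```

Placing this statement on the first line of a program means each submission starts with empty Log and Output windows.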

Save/Print SAS Results:
You can highlight a window and use File menu->Save or File menu->Print to save or print its contents. SAS tends to generate many pages of output, so it is often better to move the Output contents into a word processor such as Microsoft Word. To save the Output window as an .rtf file, highlight the Output window and select File menu->Save as, then choose RTF Files as the save-as type.
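
Output can also be sent to an RTF file from the program, using the Output Delivery System (ODS). The following is only a sketch, assuming the ODS RTF destination is available in your installation and that the Class data set from the first example has already been created in the current session; the file path is a made-up placeholder.

```sas
/* Open an RTF destination; the path below is hypothetical, so change it to your own directory. */
ods rtf file='c:\saswork\results.rtf';

proc print data=Class;
run;

/* Close the destination so the file is written and can be opened in Word. */
ods rtf close;
```

Everything produced between the two ods statements goes to the .rtf file as well as to the Output window.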

The graphics can also be copied and pasted into Word documents. Highlight the Graph window and go to the graphic of interest, then click the Edit Graph button in the tool bar (or go to Tools menu->Graphics Editor). Once in the graphics editor, you can add to or edit the graphic. To copy the graphic to Word, select Edit->Select->All and then Copy. You can also export the graphic as an image (.bmp, .gif, .jpeg, or .ps) and import it into Word. In this case, you cannot edit it once it is in Word.
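
A graph can also be written directly to an image file by changing the graphics device before the plot is drawn. This is a sketch under assumptions: the fileref name grafout and the output path are made up for illustration, the GIF device driver is assumed to be installed, and the Class data set from the first example is assumed to exist in the current session.

```sas
/* Hypothetical output file; change the path to your own directory. */
filename grafout 'c:\saswork\scatter.gif';

/* Send graphics to the GIF driver and write them to the fileref above. */
goptions device=gif gsfname=grafout gsfmode=replace;

symbol1 v=dot;
proc gplot data=Class;
   plot Weight*Height;
run;
quit;

/* Switch back to the screen device for later plots. */
goptions device=win;
```

The resulting .gif file can then be inserted into Word as a picture.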

Basics of SAS Programming

We use another example to introduce some basics of SAS programming. The data set tensile.dat is based on an experiment investigating the tensile strength of a new synthetic fiber blended with different percentages of cotton. There are three variables: percent, strength and time. We will focus on the first two variables. The SAS file tensile.sas is used to analyze the data set. Now, you can save both files to a designated directory. Since SAS is already running, you can open tensile.sas in SAS following File menu->Open->the directory..... Notice that the infile statement of the data step specifies the location of tensile.dat, so that SAS knows where to find the data. If you saved the data set in a different location, you need to change this path accordingly. You can submit the whole program to get all the results. To make the explanations clearer, we will submit it block by block. If you choose to follow this approach, please clear the SAS Editor window before submitting a new block.
options ls=75 ps=60 nocenter;
goptions colors=(none) device=win target=winprtm rotate=landscape ftext=swiss
   hsize=8.0in vsize=6.0in htext=1.5 htitle=1.5 hpos=60 vpos=60
   horigin=0.5in vorigin=0.5in;

data one;
 infile 'c:\saswork\data\tensile.dat';
 input percent strength time;

title1 'example';
proc print data=one; 
run;
Note very carefully that every SAS statement ends with a semicolon. The indentation and blank lines just make the program easier to read. run tells SAS to execute the statements that precede it. Note also that names in SAS should be no more than 8 characters long, should contain only letters and numbers, and should begin with a letter. These restrictions appear to be relaxed in more recent versions of SAS, but I will still follow this rule.

options ls=75 ps=60 restricts the output to 75 columns and 60 lines per page. The nocenter option tells SAS not to center the output. goptions specifies various graphics options. These settings are intended to create graphics that fit nicely in Word. The colors=(none) option tells SAS to use black and white only.

title1 prints a title on each page of your output to help you identify it later. You should always do this. You can print more than one title line by adding title2, title3, and so on. The title text must be enclosed in single quotes ('), one at each end. The last title will be used on all subsequent graphs. To turn the last title off, you need the statement goptions reset=title.
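
As a small illustration of multiple title lines, the sketch below prints the data set one under two titles and then clears them. The title texts are made up for illustration, and the data set one is assumed to have been created by the data step above.

```sas
title1 'Tensile strength experiment';   /* first title line (illustrative text) */
title2 'Listing of the raw data';       /* second title line (illustrative text) */

proc print data=one;
run;

/* A title statement with no text cancels that title line and all higher-numbered ones. */
title1;
```

Clearing titles with an empty title1 statement is an alternative to goptions reset=title.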

data one: SAS programs usually consist of data steps and procedures (procs). A data statement starts a data step and names a data set. The statements following the data statement create the data set. This program has one data step, which creates a SAS data set called one containing three variables.

infile and input: here we read data from a file. The infile statement tells SAS which file to read and where it is located. Be sure to put a single quote (') at each end of the file's name. The input statement describes the data; we name the three variables percent, strength, and time. In this example, tensile.dat is an existing data file, so SAS uses infile to read it into the SAS system. If you need to enter a new data set directly in the program, use the datalines statement as demonstrated in the previous example.
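
If the same file is read in several places, or you move between lab machines, it can be convenient to name the file once with a filename statement and refer to that name in infile. This is a sketch rather than the form used in tensile.sas; the fileref name tens is an arbitrary choice, and the path is the one from the program above and should be changed to wherever you saved tensile.dat.

```sas
/* Associate the short name tens (an arbitrary fileref) with the data file. */
filename tens 'c:\saswork\data\tensile.dat';

data one;
   infile tens;                    /* refer to the file by its fileref, without quotes */
   input percent strength time;
run;
```

With this layout, only the filename statement needs to be edited when the data file moves.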

proc: proc is the abbreviation of procedure. SAS/STAT consists of many procedures that provide a variety of functionalities for data management, analysis and visualization. The proc used in the above program is print, which prints the imported or created data to the Output window so that you can verify that the data are correct. The general format of a procedure command is

Proc procname options;
  statement / statement options;
  statement / statement options;
  .
  .
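
To make the general format concrete, here is one small instance of it. proc freq is not used in tensile.sas; it is chosen here only because its tables statement shows the slash that separates a statement from its options. The data set one is assumed to exist from the data step above.

```sas
proc freq data=one;           /* procname = freq, procedure option = data=one */
   tables percent / nocum;    /* statement = tables percent, statement option = nocum */
run;
```

This counts how many observations fall at each cotton percentage, and nocum suppresses the cumulative columns.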
The second block of tensile.sas is as follows:
symbol1 v=circle i=none;
title1 'Plot of Strength vs Percent Blend';
proc gplot data=one;  plot strength*percent/frame; 
run;
proc gplot makes a scatter plot. Note that the y (vertical) variable is given first. The symbol1 statement specifies the symbol to be used in the plot. The frame option puts a box around the plot.
proc boxplot;
 plot strength*percent/boxstyle=skeletal pctldef=4;
run;
proc boxplot creates boxplots of the data. Note that the y (vertical) variable is given first. The skeletal option means that the whiskers of each box extend to the minimum and maximum values. The pctldef option specifies which method is used to compute the quantiles (percentiles).
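
proc boxplot expects the observations to be sorted by the grouping variable (here percent). If you are not sure that tensile.dat is already in that order, a sketch like the following sorts the data first; it assumes the data set one from the data step above and is not part of tensile.sas.

```sas
/* Sort by the group variable so that each percent level forms one contiguous block. */
proc sort data=one;
   by percent;
run;

proc boxplot data=one;
   plot strength*percent / boxstyle=skeletal;
run;
```
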
proc glm;
 class percent; model strength=percent;
 output out=oneres p=pred r=res; 
run;
proc glm and proc mixed are the two linear model procedures you will need for many of your homework assignments. Please consult the SAS help for more details. We will discuss the procs and their output further in class later on. The model statement has the form

response variable = list of predictor variables

The equal sign can be interpreted as "is explained by". The output statement enables you to save results for further analysis. It creates a new data set named oneres, which contains all the original data plus additional variables. Here the new variables are the predicted values (p=pred) and the residuals (r=res).
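
A quick way to confirm what the output statement produced is to print the new data set. This sketch assumes the proc glm step above has already been run, so that oneres exists with the variables pred and res.

```sas
/* List the original variables next to the predicted values and residuals. */
proc print data=oneres;
   var percent strength pred res;
run;
```
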

proc sort; by pred;
symbol1 v=circle i=sm50; title1 'Residual Plot';
proc gplot;  plot res*pred/frame; 
run;
proc sort sorts the data according to one or more variables. In this case, the data are sorted from smallest to largest according to the predicted values from the linear model. The plot statement generates a residual plot.
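
Two common variations of proc sort are shown in the sketch below: writing the sorted data to a new data set instead of overwriting the original, and sorting from largest to smallest. The output data set name bypred is made up for illustration, and oneres is assumed to exist from the proc glm step.

```sas
/* Keep oneres unchanged and store the sorted copy in bypred (a hypothetical name). */
proc sort data=oneres out=bypred;
   by descending pred;    /* descending applies to the variable that follows it */
run;
```
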
proc univariate data=oneres pctldef=4;
 var res;  qqplot res / normal (L=1 mu=est sigma=est);
 histogram res / normal; 
run;
proc univariate gives basic numerical descriptions for each variable you request. If you leave out the var statement, SAS describes all the numeric variables in the data set. Including the qqplot statement adds a normal quantile plot and including the histogram statement adds a histogram and overlays, in this case, a normal distribution. We will discuss these in some detail in class.
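
If you also want formal tests of normality for the residuals, proc univariate can produce them when the normal option is added to the proc statement. This is a sketch, not part of tensile.sas, and assumes the data set oneres created by proc glm above.

```sas
/* The normal option requests tests for normality in addition to the usual summaries. */
proc univariate data=oneres normal;
   var res;
run;
```
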
symbol1 v=circle i=none;
title1 'Plot of residuals vs time';
proc gplot; plot res*time / vref=0 vaxis=-6 to 6 by 1;
run;
quit;
This generates a plot of the residuals versus time. To terminate all the commands of a SAS program, you need to add a quit statement at the end.

SAS Help

You have now gone through SAS basics using one template program. SAS itself can give you a more detailed tour: in SAS, do Help menu->Getting Started with SAS Software. SAS also has detailed help on each procedure, although you may find it too terse to be useful if you are a complete beginner. In SAS, do Help menu->SAS System Help. In the list, click Help on SAS Software Products. Most statistical procedures are in SAS/STAT, and clicking on a statistical procedure gives details of its structure and options. There is also an item called Sample SAS Programs and Applications. This contains other template files from which you might learn and borrow some commands.

Welcome Aboard!