http://franz.stat.wisc.edu/pub/MASS3/Windows.shtmlDownload VRlib62.exe to the directory c:\splus4\library;second, open MS-DOS prompt, go to the above directory, and execute VRlib62.exe by simply typing in
vrlib62So, the temporary library MASS has been set up. Now, in the splus command window, you can use the following commands to make various functions available
library(MASS, first=T) library(nnet) library(spatial) library(class)For discriminant analysis, we need the first command. Because the splus4 directory is temporary, after you log out, the library will be lost, so you need re-install it next time when you log in.
To check if you have really set up the library properly. Go to the splus command window, and try out the following commands.
help(lda)//you will get an error message library(MASS, first=T) help(lda)
We will use iris data, a famous one used by Fisher to illustrate his ideas in discriminant analysis. We used iris data in principal component analysis. (If you have iris as the name for an object, you need go to _Data to remove it.)
help(iris)
iris
irisdat_rbind(iris[,,1],iris[,,2],iris[,,3])
irisdat
spe_(c(rep("s",50),rep("c",50), rep("v",50)))
spe
We have combined the original data to be a two-dimensional matrix, the first
50 rows are from Setosa, denoted by s, the next 50 observations are from 
Veriscolor, denoted by c, and the last 50 observations are from Virginica 
denoted by v. 
x1_irisdat[,1] x2_irisdat[,2] x3_irisdat[,3] x4_irisdat[,4]We can use x2 and x4 to take a look at the data.
win.graph() plot(x2, x4, type="n", xlab="x2", ylab="x4") text(x2, x4, spe)Any comments about the plot? Do they appears to be from bivariate normal distributions? Seems that the specie s is different from the other two, c and v. Now let's focus on c and v.
meanc_apply(irisdat[51:100,1:4], 2, mean) meanv_apply(irisdat[101:150,1:4],2, mean) meanc meanv varc_var(irisdat[51:100,1:4]) varv_var(irisdat[101:150,1:4]) varc varvNow the pooled variance:
varpool_(49*varc+49*varv)/(100-2) varpoolWe assume the prior probabilities are equal, so are the costs. Let's calculate the coefficients of linear discriminant (classification) function.
a_solve(varpool)%*% (meanc-meanv) aAnd the second term in the linear discriminant function:
crit_sum(a*(meanc+meanv)/2) critCheck if the derived discriminant function classifies the observations in c and v correctly. T indicates that an observation is classified as from c and F indicates classified as from v.
clas_irisdat[51:150,] %*% a >= crit t(clas)We know that the observations from 51-100 are from c, and the observation from 101-150 are from v. Now let's calculate the number of misclassifications. nctov represent the number of misclassification from c to v, and nvtoc represents the opposite.
nctov_0
nvtoc_0
for(i in 1:50) {
if(clas[i]==F) nctov_nctov+1
}
for(i in 51:100){
if(clas[i]==T) nvtoc_nvtoc+1
}
nctov
nvtoc
Hence the apparent error rate is (nctov+nvtoc)/100=3%. The true error rate 
is likely to be higher. 
In the following, we will use another example to illustrate linear discriminant analysis. We need to use the ellipse function in the following, so you need source the ellipse.S to set up the functions once more.
Margolese (1970) studied hormone ratios in 15 male heterosexuals and 11 male homosexuals.
Read in the data and examine.
hormone_read.table("hormone.data",header=T)
hormone
Compute the means and variances.
means_apply(hormone[1:11,-3],2,mean) meang_apply(hormone[12:26,-3],2,mean) vars_var(hormone[1:11,-3]) varg_var(hormone[12:26,-3])Now plot the data with 95% confidence ellipses:
plot(hormone$a,hormone$e,xlab="Androgen",ylab="Estrogen",type="n",xlim=c(0,6),ylim=c(0,6)) text(hormone$a,hormone$e,as.character(hormone$o)) lines(ellipse(vars,cent=means),lty=2) lines(ellipse(varg,cent=meang),lty=2)Suppose we assume a common variance and redo the plot.
varpool_(10*vars+14*varg)/24 plot(hormone$a,hormone$e,xlab="Androgen",ylab="Estrogen",type="n",xlim=c(0,6),ylim=c(0,6)) text(hormone$a,hormone$e,as.character(hormone$o)) lines(ellipse(varpool,cent=means),lty=2) lines(ellipse(varpool,cent=meang),lty=2)Put in the points of the means.
points(means[1],means[2]) points(meang[1],meang[2])Now compute the LDA rule.
a_solve(varpool) %*% (means-meang) crit_sum(a*(means+meang)/2) aThe decision line - can you see where the numbers come from?
abline(-4.063956/3.588248,4.534452/3.588248)What's the apparent error rate?
(To be continued)