B. Gunter, Technometrics: ``This is a terrific book --- in my opinion, a pathbreaking book. Get it. Read it. Practice what it preaches. You will improve the quality of your data analysis.''
P. Royston, Statistics in Medicine:
A. Bowman, Journal of the Royal Statistical Society:
D. Birnbaum, Infection Control & Hospital Epidemiology:
C. J. Wild, ISI Book Reviews:
A. H. Welsch, J. of the American Statistical Association:
A. M. Ellison, BioScience: ``required reading for every scientist ... '' |
Data Tables
This is a collection of ASCII tables containing 22 data sets from the book. The tables have been bundled with a UNIX command and can be unbundled on UNIX systems; Windows users can extract the data with a text editor.
Data as S Objects
This ASCII file, created by S, contains 22 data sets from the book plus a README object, for 23 objects in all. The objects were written to the file by data.dump() and should be restored with data.restore(). For example, if the dump is in the file xxx, the S command data.restore("xxx") reads them into S. Each data set has the name used in the book. To find the description of a data set, look under the entry "data, name" in the index; for example, the description of the data set barley is under the index entry "data, barley".
S Scripts
This file contains more than 270 S scripts for producing the figures in the book. Reading it into S with source("filename") creates the corresponding 270+ functions, which can then be run to produce the figures. Each function name carries its figure number; for example, the function book.6.7 produces Figure 7 of Chapter 6.
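A small sketch of the naming convention may help when navigating the figure functions. It is written in Python rather than S, and the helpers `figure_for` and `function_for` are hypothetical; the sketch assumes every figure function follows the `book.<chapter>.<figure>` pattern with purely numeric parts, as in the book.6.7 example.

```python
import re

# Hypothetical helpers; the pattern book.<chapter>.<figure> is assumed from
# the book.6.7 example and may not cover every name in the script file.
FIGURE_NAME = re.compile(r"^book\.(\d+)\.(\d+)$")


def figure_for(function_name):
    """Map a function name such as 'book.6.7' to (chapter, figure)."""
    match = FIGURE_NAME.match(function_name)
    if match is None:
        raise ValueError(f"not a figure function name: {function_name!r}")
    chapter, figure = (int(part) for part in match.groups())
    return chapter, figure


def function_for(chapter, figure):
    """Inverse mapping: (6, 7) -> 'book.6.7'."""
    return f"book.{chapter}.{figure}"


print(figure_for("book.6.7"))
print(function_for(6, 7))
```

A regular expression with an explicit end anchor is used so that names that only partly match, such as an assumed variant with a letter suffix, are rejected rather than silently misread.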
Tools matter. There are exceptionally powerful visualization tools, and there are others, some well known, that rarely outperform the best ones. The data analyst needs to be hard-boiled in evaluating the efficacy of a visualization tool. It is easy to be dazzled by a display of data, especially if it is rendered with color or depth. We tend to be misled into thinking that we are absorbing relevant information when we see a lot. But the success of a visualization tool should be judged solely by how much we learn about the phenomenon under study. Some tools in the book are new and some are old, but all have a proven record of success in the analysis of common types of statistical data that arise in science and technology.
There are two components to visualizing the structure of statistical data --- graphing and fitting. Graphs are needed, of course, because visualization implies a process in which information is encoded on visual displays. Fitting mathematical functions to data is needed too. Just graphing raw data, without fitting them and without graphing the fits and residuals, often leaves important aspects of data undiscovered. The visualization tools in this book consist of methods for graphing and methods for fitting.
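To make the point about fits and residuals concrete, here is a minimal sketch in Python. It uses a straight-line least-squares fit, chosen for simplicity and not presented as the book's own fitting method, on hypothetical data with mild curvature; the helper names are illustrative only. The fitted line is a plausible summary of the raw values, yet the signs of the residuals form a U-shaped pattern, the kind of structure that graphing the raw data alone can leave undiscovered.

```python
# Minimal sketch: fit a straight line by least squares, then inspect residuals.
# The data are hypothetical (y = x + 0.1 * x**2), not taken from the book.


def least_squares_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(xs, ys, intercept, slope):
    """Observed minus fitted values."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


xs = [float(x) for x in range(11)]
ys = [x + 0.1 * x * x for x in xs]

a, b = least_squares_line(xs, ys)
res = residuals(xs, ys, a, b)

# Residuals are positive at both ends and negative in the middle: a U shape
# that signals curvature the straight-line fit does not capture.
signs = "".join("+" if r > 0 else "-" if r < 0 else "0" for r in res)
print(f"intercept={a:.3f} slope={b:.3f}")
print("residual signs:", signs)
```

Plotting `res` against `xs` would show the same U shape graphically; the sign string is used here only to keep the sketch free of plotting dependencies.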
The book is organized around applications of the visualization tools to data sets from scientific studies. This shows the role each tool plays in data analysis, and the class of problems it solves. It also demonstrates the power of visualization; for many of the data sets, the tools reveal that effects were missed in the original analyses or incorrect assumptions were made about the behavior of the data. And the applications convey the excitement of discovery that visualization brings to data analysis.
The visualization of statistical data has always existed in one form or another in science and technology. For example, diagrams are the first methods presented in R. A. Fisher's Statistical Methods for Research Workers, the 1925 book that brought statistics to many in the scientific and technical community. But with the appearance of John Tukey's pioneering 1977 book, Exploratory Data Analysis, visualization became far more concrete and effective. Since 1977, advances in computer systems have changed how we carry out visualization, but not its goals.
When a graph is made, quantitative and categorical information is encoded by a display method. Then the information is visually decoded. This visual perception is a vital link. No matter how clever the choice of the information, and no matter how technologically impressive the encoding, a visualization fails if the decoding fails. Some display methods lead to efficient, accurate decoding, and others lead to inefficient, inaccurate decoding. It is only through scientific study of visual perception that informed judgments can be made about display methods. Display methods are the main topic of The Elements of Graphing Data. The visualization methods described here make heavy use of the results of Elements and other work in graphical perception.
The reader should be familiar with basic statistics and the least-squares method of fitting equations to data. For example, an introductory course in statistics that included the fundamentals of regression analysis would be sufficient.
For most purposes, the chapters need to be read in order. Material in later chapters uses tools and ideas introduced in earlier chapters. There are two exceptions to this general rule. Chapter 6, which is about multiway data, does not use material beyond Section 4.6 in Chapter 4. Also, sections of the book labeled ``For the Record'' contain details that are not necessary for understanding and using the visualization tools. The details are meant for those who want to experiment with alterations of the methods, or want to implement the methods, or simply like to take in all of the detail.