Visualizing Data

The book presents visualization methods and a strategy for data analysis that stresses the use of visualization to thoroughly study the structure of data and to check the validity of mathematical and statistical models fitted to data. Prerequisites: basic statistics and least-squares fitting.

Quotes from Reviewers

B. Gunter, Technometrics:
``This is a terrific book --- in my opinion, a pathbreaking book. Get it. Read it. Practice what it preaches. You will improve the quality of your data analysis.''

P. Royston, Statistics in Medicine:
``proposes and convincingly illustrates a philosophy of data analysis that is both modern and practical. ... the writing is beautifully lucid.''

A. Bowman, Journal of the Royal Statistical Society:
``uses mostly elementary tools ... in a way which is always illuminating and often highly imaginative.''

D. Birnbaum, Infection Control & Hospital Epidemiology:
``An exciting aspect of this book is the discovery of new findings in examples that are real, sometimes classic data sets.''

C. J. Wild, ISI Book Reviews:
``This book, by a leading researcher in statistical graphics, deserves a wide readership.''

A. H. Welsch, J. of the American Statistical Association:
``a serious effort to produce a coherent data analysis.''

A. M. Ellison, BioScience:
``required reading for every scientist ...''


Data Sets and S Scripts to Produce the Figures

Data Tables
This is a collection of ASCII tables that contain 22 data sets from the book. The tables have been bundled by a UNIX command and can be unbundled in UNIX. Windows users can extract the data with a text editor.

Data as S Objects
This ASCII file, created by S, contains 22 data sets from the book. There is also a README object, so there are 23 objects in all. The objects were written into the file by data.dump() and should be restored by data.restore(). For example, if the dump is in the file xxx, the S command data.restore("xxx") reads them into S. The name of each data set is the name used in the book. To find the description of a data set, look under the entry "data, name" in the index. For example, one data set is barley. To find the description of barley, look in the index under the entry "data, barley".

S Scripts
This is a collection of 270+ S scripts for producing figures in the book. The file can be read into S using source("filename"). This creates 270+ functions that can be run to produce the figures. The function names carry the figure number. For example, the function book.6.7 produces Figure 7 of Chapter 6.


Preface

Visualization is critical to data analysis. It provides a front line of attack, revealing intricate structure in data that cannot be absorbed in any other way. We discover unimagined effects, and we challenge imagined ones.

Tools matter. There are exceptionally powerful visualization tools, and there are others, some well known, that rarely outperform the best ones. The data analyst needs to be hard-boiled in evaluating the efficacy of a visualization tool. It is easy to be dazzled by a display of data, especially if it is rendered with color or depth. Our tendency is to be misled into thinking we are absorbing relevant information when we see a lot. But the success of a visualization tool should be based solely on the amount we learn about the phenomenon under study. Some tools in the book are new and some are old, but all have a proven record of success in the analysis of common types of statistical data that arise in science and technology.

There are two components to visualizing the structure of statistical data --- graphing and fitting. Graphs are needed, of course, because visualization implies a process in which information is encoded on visual displays. Fitting mathematical functions to data is needed too. Just graphing raw data, without fitting them and without graphing the fits and residuals, often leaves important aspects of data undiscovered. The visualization tools in this book consist of methods for graphing and methods for fitting.
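As a rough illustration of why fitting belongs alongside graphing, the sketch below fits a straight line by least squares and then looks at the residuals. It is not taken from the book: the data are made up (an exact quadratic), the helper names fit_line and residuals are hypothetical, and the language is Python rather than S. In practice the residuals would be graphed against x; here they are printed so the example needs no plotting library.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def residuals(x, y, a, b):
    """Observed value minus fitted value at each x."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Made-up data with curvature (y = x^2); an assumption for illustration only.
x = [0, 1, 2, 3, 4, 5, 6]
y = [xi ** 2 for xi in x]

a, b = fit_line(x, y)
res = residuals(x, y, a, b)

# A crude text stand-in for a residual plot.
for xi, ri in zip(x, res):
    print(f"x={xi}  residual={ri:+.2f}")
```

For these data the residuals are positive at both ends of the x range and negative in the middle. A graph of the raw data with the fitted line can hide that systematic pattern, while a graph of the residuals makes the missed curvature plain.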

The book is organized around applications of the visualization tools to data sets from scientific studies. This shows the role each tool plays in data analysis, and the class of problems it solves. It also demonstrates the power of visualization; for many of the data sets, the tools reveal that effects were missed in the original analyses or incorrect assumptions were made about the behavior of the data. And the applications convey the excitement of discovery that visualization brings to data analysis.

The visualization of statistical data has always existed in one form or another in science and technology. For example, diagrams are the first methods presented in R. A. Fisher's Statistical Methods for Research Workers, the 1925 book that brought statistics to many in the scientific and technical community. But with the appearance of John Tukey's pioneering 1977 book, Exploratory Data Analysis, visualization became far more concrete and effective. Since 1977, changes in computer systems have changed how we carry out visualization, but not its goals.

When a graph is made, quantitative and categorical information is encoded by a display method. Then the information is visually decoded. This visual perception is a vital link. No matter how clever the choice of the information, and no matter how technologically impressive the encoding, a visualization fails if the decoding fails. Some display methods lead to efficient, accurate decoding, and others lead to inefficient, inaccurate decoding. It is only through scientific study of visual perception that informed judgments can be made about display methods. Display methods are the main topic of The Elements of Graphing Data. The visualization methods described here make heavy use of the results of Elements and other work in graphical perception.

The reader should be familiar with basic statistics and the least-squares method of fitting equations to data. For example, an introductory course in statistics that included the fundamentals of regression analysis would be sufficient.

For most purposes, the chapters need to be read in order. Material in later chapters uses tools and ideas introduced in earlier chapters. There are two exceptions to this general rule. Chapter 6, which is about multiway data, does not use material beyond Section 4.6 in Chapter 4. Also, sections of the book labeled ``For the Record'' contain details that are not necessary for understanding and using the visualization tools. The details are meant for those who want to experiment with alterations of the methods, or want to implement the methods, or simply like to take in all of the detail.