William S. Cleveland

Positions

William S. Cleveland has been the Shanti S. Gupta Distinguished Professor of Statistics and Courtesy Professor of Computer Science at Purdue University since January 1, 2014. Before that, he was a Distinguished Member of Technical Staff in the Statistics Research Department at Bell Labs, Murray Hill, where he served as Department Head for 12 years.

Education

Cleveland received an A.B. in Mathematics from Princeton University; his senior thesis adviser was William Feller. He received his Ph.D. in Statistics from Yale University; his Ph.D. thesis adviser was Leonard Jimmie Savage.

Awards and Honors

In 2016 Cleveland received the Lifetime Achievement Award for Graphics and Computing from the American Statistical Association, the first since 2010. In 2016 he also received the Parzen Prize from Texas A&M University, given every two years since 1994 to a "statistician whose outstanding research contributions include innovations that have had impact on practice". In 1996 he was chosen national Statistician of the Year by the Chicago Chapter of the American Statistical Association. In 2002 he was selected as a Highly Cited Researcher by the American Society for Information Science and Technology in the newly formed mathematics category. He has twice won the Wilcoxon Prize and once won the Youden Prize from Technometrics. He is a Fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the American Association for the Advancement of Science, and is an Elected Member of the International Statistical Institute.

Data Science

In a talk at the 1999 meeting of the International Statistical Institute and in a 2001 paper, number [25] in the list of publications in the above PDF, Cleveland defined data science as the term is used today. The term had been used before, but with different meanings. See the Wikipedia Web page Data Science. The paper was republished in 2014 [1] together with a discussion and with another paper about divide and recombine (D&R) with DeltaRho [2], described next, which requires work in all technical areas of data science.

The technical areas of data science are those that have an impact on how a data analyst analyzes data: (1) Statistical theory; (2) Statistical models; (3) Statistical and machine-learning methods; (4) Algorithms for statistical and machine-learning methods, and optimization; (5) Computational systems for data analysis; (6) Live analyses of data where results are judged by the findings, not the methodology and systems that were used.

The implication for an academic department is that it is not necessary for each individual to do research in all areas. Rather, the department collectively needs to have research in all areas. There must be an exchange of knowledge so that all department members have at least a basic understanding of all areas.

Areas of Research

Cleveland's areas of research have been in statistics, machine learning, data visualization, data analysis for multidisciplinary studies, and high performance computing for deep data analysis.

Data Analysis Projects

Cleveland has been involved in many projects requiring the analysis and modeling of very diverse datasets from many fields, including computer networking, healthcare engineering, telecommunications, homeland security, environmental monitoring, public opinion polling, cyber security, and visual perception. Since circa 2008, many of the analyzed datasets have been big in size and required analytic methods with high computational complexity.

Widely-Used Methods and Their Publication

In the course of this work in data analysis, Cleveland has developed many new analytic methods and new computer systems for data analysis that are used throughout the worldwide technical community. He has published 129 papers and 3 books on this work. See the PDF above for a chronological list. For citations to the publications, see the Web page Google Citations.

Data Visualization

In data visualization, Cleveland has written two books, co-authored a third book and a user's manual, and edited two books and a special issue of the Journal of the American Statistical Association. He is the founder of the Graphics Section of the American Statistical Association: he led the group that successfully petitioned the ASA board of directors for approval.

His two books on data visualization have been reviewed in many journals from a wide variety of disciplines. The Elements of Graphing Data was selected for the Library of Science Book Club. J. Lodge reviewed it in Atmospheric Environment and wrote: "certain kinds of tendency toward bad graphics could be cured if as many authors as possible would not just read, but, in the words of the Anglican Prayer Book, 'learn, mark, and inwardly digest' this volume." B. Gunter reviewed Visualizing Data in Technometrics and wrote: "This is a terrific book --- in my opinion, a path-breaking book. Get it. Read it. Practice what it preaches. You will improve the quality of your data analysis."

Cleveland and colleagues developed trellis display, a powerful framework for data visualization. It has been used by a large, worldwide community of data analysts as a result of its implementation in the two software systems based on the S language, the S-Plus commercial system and the R open source system.