Reproducible Research: What Have We Learned in 20 Years?
Rapid advances in computing technology over the past few decades have spurred two extraordinary phenomena in science: large-scale, high-throughput data collection and the creation and implementation of complex statistical algorithms for data analysis. Together, these two phenomena have brought about tremendous advances in scientific discovery but have also raised two serious concerns, one relatively new and one quite familiar. The complexity of modern data analyses raises questions about the reproducibility of the analyses, meaning the ability of independent analysts to re-create the results claimed by the original authors using the original data and analysis techniques. While seemingly a straightforward concept, reproducibility of analyses is typically thwarted by the lack of availability of the data and computer code that were used in the analyses. A much more general concern is the replicability of scientific findings, that is, how often scientific claims are confirmed by completely independent investigations. While the concepts of reproducibility and replicability are related, they are focused on quite different goals and address different aspects of scientific progress. In this review, we will discuss the origins of reproducible research, characterize the current status of reproducibility in public health research, and connect reproducibility to current concerns about replicability of scientific findings. Finally, we describe a path forward for improving both the reproducibility and replicability of public health research in the future.
Roger D. Peng is a Professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health, where his research focuses on the development of statistical methods for addressing environmental health problems. He is the author of the popular book R Programming for Data Science and 10 other books on data science and statistics. He is also the co-creator of the Johns Hopkins Data Science Specialization, the Simply Statistics blog, where he writes about statistics for the public, the Not So Standard Deviations podcast with Hilary Parker, and The Effort Report podcast with Elizabeth Matsui. Dr. Peng is a Fellow of the American Statistical Association and is the recipient of the Mortimer Spiegelman Award from the American Public Health Association, which honors a statistician who has made outstanding contributions to public health.