The Benefits of Reproducibility in Data Science
In this talk, I will first argue that there are (at least) two different notions of reproducibility in data science: what we might call “data reproducibility” and “model reproducibility.” The former focuses on whether similar (in various senses) results would be obtained if we collected a new dataset; the latter focuses on whether similar (in various senses) results would be obtained if we pushed the original data through someone else’s (or perhaps even our own) data science pipeline. The benefits of data reproducibility have been widely studied for other sciences, and those insights arguably apply in data science as well: increased confidence, generalization, and so forth. The benefits of model reproducibility, by comparison, are much less clear, partly because it appears to be a newer phenomenon that is distinctive to an era of Big Data, probabilistic models, and black-box learning algorithms. I will then argue that, unlike those of data reproducibility, the principal benefits of model reproducibility are not actually to be found in the models themselves. Rather, the main benefits are centered on the methods and methodologies of data science; that is, model reproducibility is important because it can help us do data science better.
David Danks is L.L. Thurstone Professor of Philosophy & Psychology, and Head of the Department of Philosophy, at Carnegie Mellon University. He is also the Chief Ethicist of CMU’s Block Center for Technology & Society; co-director of CMU’s Center for Informed Democracy and Social Cybersecurity (IDeaS); and an adjunct member of the Heinz College of Information Systems and Public Policy, and the Carnegie Mellon Neuroscience Institute. His research interests are at the intersection of philosophy, cognitive science, and machine learning, using ideas, methods, and frameworks from each to advance our understanding of complex, interdisciplinary problems. Danks has examined the ethical, psychological, and policy issues around AI and robotics in transportation, healthcare, privacy, and security. He has also done significant research in computational cognitive science, culminating in his book Unifying the Mind: Cognitive Representations as Graphical Models (2014, The MIT Press). Danks is the recipient of a James S. McDonnell Foundation Scholar Award, as well as an Andrew Carnegie Fellowship. He received an A.B. in Philosophy from Princeton University, and a Ph.D. in Philosophy from the University of California, San Diego.