Hadley Wickham | Institute for Research in Statistics and its Applications

Position

Chief Scientist at RStudio

Talk Title

dplyr: One Language, Many Implementations

Abstract

One of dplyr's lesser-known features is that it works with data stored in a wide range of ways, translating dplyr verbs into a variety of other computational frameworks. In this talk, I'll discuss three important backends: dtplyr, dbplyr, and multidplyr. These allow dplyr to seamlessly scale up to handle ever larger datasets: dtplyr uses the fantastic data.table package to quickly work with large in-memory datasets, dbplyr converts your R code to SQL so you can work with data of any size in a relational database, and multidplyr allows you to easily take advantage of every core on your computer.

Biography

Hadley is Chief Scientist at RStudio, winner of the 2019 COPSS award, a member of the R Foundation, and Adjunct Professor at Stanford University and the University of Auckland. He builds tools (both computational and cognitive) to make data science easier, faster, and more fun. His work includes packages for data science (the tidyverse, including ggplot2, dplyr, tidyr, purrr, and readr) and principled software development (roxygen2, testthat, devtools, pkgdown). He is also a writer, educator, and speaker promoting the use of R for data science. Learn more on his website, http://hadley.nz.