Title
Formally Private Heavy-Tailed Synthetic Data
Abstract
Employment and payroll data on all U.S. establishments and firms are invaluable for economic research, but they often contain sensitive information that requires confidentiality protection. Generating synthetic heavy-tailed data with a formal privacy guarantee while preserving high utility is a challenging yet practically relevant problem for many federal data products containing employment and payroll data. We propose using the K-Norm Gradient Mechanism (KNG) with quantile regression, while accounting for the quantile crossing problem, to generate formally private synthetic data. Through a simulation study and an application to the original synthetic version of the U.S. Census Longitudinal Business Database (SynLBD), we show that the proposed methods can achieve better data utility than the naive KNG implementation at the same privacy-loss budget. Importantly, our DP-SynLBD captures economic trends often estimated from the LBD, such as gross employment, as well as the heavy-tailed nature of these data, and the results point to directions that can lead to further improvements in estimation. (Joint work with T. Tran and M. Reimherr.)
Bio
Aleksandra (Sesa) Slavkovic is a Professor of Statistics & Public Health Sciences, Dorothy Foehr Huck and J. Lloyd Huck Chair in Data Privacy and Confidentiality, and Associate Dean for Research in the Eberly College of Science at Penn State. Her research focuses on methodological developments in data privacy and confidentiality in the context of small- and large-scale surveys and health, genomic, and network data, including work on differential privacy and on broad data access that offers the guarantees of accurate statistical inference needed to support reliable science and policy. Slavkovic is an associate editor of the Annals of Applied Statistics and the Journal of Privacy and Confidentiality, and an editor of Statistics and Public Policy. She has served as chair of the American Statistical Association (ASA) Privacy and Confidentiality Committee and of the ASA Social Statistics Section, and she serves on half a dozen advisory committees as well as National Academy of Sciences committees. She is a fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the International Statistical Institute. She received her PhD (2004) and M.S. (2001) in Statistics, and a Master of Human-Computer Interaction (1999), from Carnegie Mellon University. She received her B.A. in Psychology from Duquesne University (1996).