Brad Price

Title: A Cluster Elastic Net for Multivariate Regression

Abstract: In this talk we propose a method for simultaneously estimating regression coefficients and clustering response variables in a multivariate regression model, with the aim of increasing prediction accuracy and giving insight into the relationships between response variables. The regression coefficients and clusters are estimated with a penalized likelihood estimator that combines a cluster fusion penalty, which shrinks the differences between fitted values of responses in the same cluster, with an L1 penalty for simultaneous variable selection and estimation. We propose a two-step algorithm that iterates between k-means clustering and optimizing the penalized likelihood with the clusters treated as known; the cluster fusion penalty gives this algorithm desirable parallel computational properties. Theoretical results are presented for the penalized least squares case, including asymptotic results allowing for p ≫ n. We extend our method to the setting where the responses are binomial variables, and discuss extensions to generalized linear models. Examples in genetics and business operations will be discussed. This is joint work with Ben Sherwood.
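
For readers who want a concrete picture of the penalized least squares case, one way to write an objective of this kind is sketched below. The notation is an assumption made for illustration and is not taken from the talk: y_k is the n-vector for response k (k = 1, ..., r), X is the n × p predictor matrix, β_k is the coefficient vector for response k, D_1, ..., D_Q is a partition of the responses into Q clusters, and λ, γ ≥ 0 are tuning parameters. The scaling constants are likewise illustrative.

```latex
\min_{\beta_1,\dots,\beta_r,\; D_1,\dots,D_Q}\;
  \frac{1}{2n}\sum_{k=1}^{r}\lVert y_k - X\beta_k\rVert_2^2
  \;+\; \lambda\sum_{k=1}^{r}\lVert \beta_k\rVert_1
  \;+\; \frac{\gamma}{2n}\sum_{q=1}^{Q}\frac{1}{|D_q|}
        \sum_{l,m\in D_q}\lVert X\beta_l - X\beta_m\rVert_2^2
```

Under this assumed form, two observations connect to the algorithm described in the abstract. With the coefficients held fixed, the last term equals, up to a constant factor, the within-cluster sum of squares of the fitted values Xβ_k about their cluster means, so minimizing it over the partition is a k-means problem on the fitted values. With the partition held fixed, the fusion term only couples responses that share a cluster, so the objective separates into Q independent subproblems that could be solved in parallel.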