Out of the Shadows: Towards Community-Based, Reproducible Informatics Tools and Workflows for Integrative Multi-Omic Analysis.
Technologies for generating large-scale molecular information on collections of genes (genomes) and on the molecules they encode to carry out biochemical functions (e.g. proteins, which make up the proteome) are now available to most biological research labs. These technologies have proven crucial in driving discoveries across many fields of biological research. The complex, large-scale data they generate necessitate specialized software tools for domain-specific analyses (e.g. DNA sequencing, protein identification and quantification). Integrating these datasets is widely recognized to provide unique and more complete insights into the molecular mechanisms driving biological phenomena, such as how changes to the DNA sequence result in changes to encoded proteins that may act as biochemical drivers of disease. Gaining such insights requires multi-step analysis workflows that integrate different software tools and data types to reveal how these molecular types and datasets are connected. These workflows, however, raise new challenges in reproducibility and reuse by others. On the technical level, integrating the many disparate software tools needed for these analyses is often beyond the means of biological researchers seeking to apply similar analyses or replicate results from others. Additionally, effectively communicating the complex specifications of each software tool needed to reproduce an analysis is nearly impossible using traditional methods (e.g. inclusion in the experimental/methods sections of peer-reviewed publications). As a solution, our research group has been developing multi-omic software approaches deployed in the Galaxy bioinformatics ecosystem.
Developed with a forward-looking architecture, Galaxy provides a unified environment for integrating disparate software tools into workflows, with a focus on enabling use by biological researchers who are not computational experts. Galaxy provides provenance tracking, so that even the most complex workflows can be saved and shared transparently with other users of the platform. Galaxy is also more than a workflow engine: its worldwide community of users has developed freely accessible resources that promote training and the democratization of tools and workflows, leveraging cloud-based informatics technologies. The goal of these efforts is to shine daylight on complex bioinformatic workflows and make them accessible to a broader community of users. This presentation will describe our experiences in developing multi-omic bioinformatic resources, with the goal of providing more reproducible and accessible tools to a wider, global community of scientists whose research could benefit from these advanced technologies.
Dr. Griffin is a Professor in the Department of Biochemistry, Molecular Biology and Biophysics at the University of Minnesota. He is also the Faculty Director of the Center for Mass Spectrometry and Proteomics core facility, which serves the research needs of the University community. His research interests are in developing analytical and bioinformatics tools, focused on mass spectrometry-based methods and data, and applying these tools to questions in biology and biomedical research. He leads the Galaxy for proteomics (Galaxy-P) project (galaxyp.org), which seeks to develop bioinformatics tools for integrative analysis of mass spectrometry data with other types of 'omics data. A main objective of this ongoing work is to promote community adoption of reproducible informatics tools for multi-omic molecular characterization of biological systems.