Data Management and Analysis Core
Using a systems approach, Center investigators seek to understand and remediate the potential health risks posed by complex exposure scenarios at hazardous waste sites. Several of the proposed projects will use cutting-edge analytical chemistry, sequencing, and other approaches to produce high-dimensional “omic” data with thousands of parallel measurements on a specific endpoint. These data will be analyzed to identify biological processes that are perturbed in complex environmental exposure scenarios. The Data Management and Analysis Core (DMAC) will support all aspects of the scientific data process: statistical design, data management, and QA/QC; high-performance computing platforms with secure access to Center data; consulting on data science and biostatistical analysis; and the development and dissemination of new methodology in support of project goals. The DMAC will thus support the acquisition, storage, analysis, and sharing of large, complex datasets through the development of tools, infrastructure, and expertise. It will develop data-driven, machine-learning methods to find patterns in high-dimensional datasets in order to understand biological perturbations and potential health risks associated with exposures. These efforts include new statistical algorithms, and the resulting software, for discovering which patterns of chemical mixtures have the greatest potential impact on human health. Because these methods are computationally intensive, Core C leaders will work with the Berkeley Research Computing group to provide a computing platform with more than 100 CPUs that supports fast, scalable solutions. The platform will also provide direct access to files through integration with the Box file management system. Finally, the DMAC will manage access to released Center data, metadata, analysis plans, and other supporting material using the Open Science Framework (OSF).
In conclusion, Core C is an integral and critical component of the overall program, supporting the entire life cycle of Center data.
Dr. Alan Hubbard
Professor of Biostatistics and Head of the Division of Biostatistics,
University of California, Berkeley