The NCPI's participating platforms are: NHGRI's AnVIL, NHLBI's BioData Catalyst, NCI's Cancer Research Data Commons and the NIH Common Fund's Gabriella Miller Kids First Pediatric Research Program.
An overview of each platform is given below:
The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space, or AnVIL, is NHGRI's genomic data resource that leverages a cloud-based infrastructure to democratize genomic data access, sharing, and computing across large genomic and genomic-related data sets.
In addition to downloading copies of data to local computers and servers, users will have the option to work with data in a secure cloud environment, where they can also use common bioinformatics tools and packages and develop and share their own software tools.
NHLBI BioData Catalyst is a cloud-based platform providing tools, applications, and workflows in secure workspaces. By increasing access to NHLBI datasets and innovative data analysis capabilities, BioData Catalyst accelerates efficient biomedical research that drives discovery and scientific advancement, leading to novel diagnostic tools, therapeutics, and prevention strategies for heart, lung, blood, and sleep disorders.
Though the primary goal of the BioData Catalyst project is to build a data science ecosystem, at its core, this is a people-centric endeavor. BioData Catalyst is also building a community of practice working collaboratively to solve technical and scientific challenges.
The goal of the National Cancer Institute’s Cancer Research Data Commons (CRDC) is to empower researchers to accelerate data-driven scientific discovery by connecting diverse datasets with analytical tools in the cloud. The CRDC is built upon an expandable data science infrastructure that provides secure access to many different data types across scientific domains via the Data Commons Framework.
The CRDC enables users to search and aggregate data across repositories via the Cancer Data Aggregator using a common data model developed by the Center for Cancer Data Harmonization. Users can access CRDC data using NCI Cloud Resources (Broad FireCloud, Seven Bridges Cancer Genomics Cloud, and Institute for Systems Biology Cancer Genomics Cloud) that bring data and computational power together to enable cancer research and discovery.
NCI Cloud Resources eliminate the need for researchers to download and store extremely large data sets by allowing them to bring analysis tools to the data in the cloud. The platforms also provide access to on-demand computational capacity to analyze these data.
The ability to combine diverse data types and perform cross-domain analysis of large cancer datasets can lead to new discoveries in cancer prevention, treatment and diagnosis, further supporting the goals of precision medicine and the Cancer Moonshot℠.
The CRDC will encompass and connect multiple cloud-based data repositories and serve as a central location to support public data sharing for NCI-funded programs.
The vision of the NIH Common Fund's Gabriella Miller Kids First Pediatric Research Program (“Kids First”) is to “alleviate suffering from childhood cancer and structural birth defects by fostering collaborative research to uncover the etiology of these diseases and by supporting data sharing within the pediatric research community.”
The program continues to generate and share whole genome sequence data from thousands of children affected by these conditions, ranging from rare pediatric cancers, such as osteosarcoma, to more prevalent diagnoses, such as congenital heart defects.
In 2018, Kids First launched the Gabriella Miller Kids First Data Resource Center, charged with building a large-scale data platform supporting clinical and genetic data from these patients and their families in order to accelerate discovery and ultimately clinical impact.
Researchers can search, access, aggregate, and analyze these data through the Kids First Data Resource Portal. Additionally, by using cloud-based individual workspaces in CAVATICA, a data analysis and sharing computation platform, researchers can cross-analyze Kids First data with data from other efforts, such as NCI’s TARGET program and consortia-based datasets like the Children’s Brain Tumor Tissue Consortium (CBTTC).
CAVATICA is a cloud-based infrastructure originally developed to support pediatric disease research, but it can support the analysis of all forms of controlled-access data in a cloud environment.
CAVATICA is powered by the Seven Bridges Platform, which meets or exceeds all NIH requirements for dbGaP or similarly controlled-access data on both Amazon Web Services (AWS) and the Google Cloud Platform (GCP). Please see the Seven Bridges Compliance White Paper for a full description of CAVATICA's security and compliance features.
For NIH Kids First data, both the Kids First Data Resource Portal and CAVATICA support user authentication and authorization to controlled-access datasets via integration with the Gen3-powered Bionimbus Trusted Partnership for access and distribution (KFDRC Framework Services).
The Kids First Data Resource enables scientists to rapidly explore shared genetic pathways and associated clinical datasets underlying diverse pediatric conditions occurring throughout development, empowering cross-disease discovery with the aim of improving preventative measures, diagnostics, and therapeutic interventions on behalf of affected children and their families.