AnVIL Data Submission Guide
Our goal is to provide large, robust datasets and to make it easier for researchers to find and analyze the data they need. By contributing datasets, you help us achieve this goal.
To submit data to AnVIL, you will need to:
- obtain the required approvals and register with dbGaP.
- set up your data model.
- prepare your data for submission.
- ingest your data into AnVIL.
- perform quality control (QC) on the ingested data.
General Data Requirements
Make sure your data conforms to these overall data requirements. If it does not, contact the AnVIL data team.
All submitted genomic data should be based on human reference genome assembly GRCh37 or GRCh38.
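The assembly requirement above can be checked before submission. The following is a minimal sketch, not an AnVIL tool: it assumes a hypothetical manifest in which each file record carries a `file_name` and a `reference_assembly` field. Both field names are illustrative and are not part of any AnVIL specification.

```python
# Assemblies accepted by the general data requirements.
ALLOWED_ASSEMBLIES = {"GRCh37", "GRCh38"}


def nonconforming_files(manifest):
    """Return the names of files whose declared assembly is not allowed.

    `manifest` is a list of dicts with the hypothetical keys
    "file_name" and "reference_assembly". A missing assembly is
    treated as nonconforming.
    """
    return [
        record["file_name"]
        for record in manifest
        if record.get("reference_assembly") not in ALLOWED_ASSEMBLIES
    ]


# Illustrative records only; these are not real files.
example_manifest = [
    {"file_name": "sample_a.cram", "reference_assembly": "GRCh38"},
    {"file_name": "sample_b.cram", "reference_assembly": "GRCh37"},
    {"file_name": "sample_c.cram", "reference_assembly": "NCBI36"},
    {"file_name": "sample_d.cram"},
]

problems = nonconforming_files(example_manifest)
```

Any file reported by such a check would be a reason to contact the AnVIL data team before proceeding.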
Register with dbGaP
Studies submitted to AnVIL must still be registered with dbGaP, as you will need to populate the data elements.
To streamline the data submission process, you can register your data in dbGaP at the same time you obtain approval (Step 1).
Although you are not required to submit source files or individual samples through the dbGaP portal, the dbGaP consent codes are used to determine data access. Studies with multiple consent codes are split into individual data workspaces based on cohort and consent pairings. External researchers can apply for access through dbGaP; a completed and approved Data Access Request (DAR) permits dbGaP to link the access grant to Terra.
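To illustrate how a study with multiple consent codes maps onto separate workspaces, here is a minimal sketch that groups samples by cohort and consent pairing. It is an assumption-laden illustration, not AnVIL's actual ingestion logic: the record fields (`sample_id`, `cohort`, `consent_code`) and the cohort names are hypothetical, and the consent abbreviations are used only as example values.

```python
from collections import defaultdict


def split_by_cohort_and_consent(samples):
    """Group sample IDs by (cohort, consent_code) pairing.

    Each distinct pairing stands for one data workspace. `samples` is a
    list of dicts with the hypothetical keys "sample_id", "cohort",
    and "consent_code".
    """
    workspaces = defaultdict(list)
    for sample in samples:
        key = (sample["cohort"], sample["consent_code"])
        workspaces[key].append(sample["sample_id"])
    return dict(workspaces)


# Illustrative records only; cohort names and IDs are made up.
example_samples = [
    {"sample_id": "S1", "cohort": "cohort_1", "consent_code": "GRU"},
    {"sample_id": "S2", "cohort": "cohort_1", "consent_code": "HMB"},
    {"sample_id": "S3", "cohort": "cohort_1", "consent_code": "GRU"},
    {"sample_id": "S4", "cohort": "cohort_2", "consent_code": "GRU"},
]

grouped = split_by_cohort_and_consent(example_samples)
```

In this example, four samples across two cohorts and two consent codes yield three pairings, so the study would correspond to three separate data workspaces.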
All individual-level human genomic and phenotypic data must conform to the NIH Genomic Data Sharing Policy. This includes the expectation that participants have explicitly consented to data sharing.
Access control within AnVIL is governed by three major groups: developer access, consortium access, and external researcher access (via dbGaP). For more information, see Data Access Controls.
Please contact the AnVIL Outreach team with support and training requests at email@example.com.