Using AnVIL
What analysis tools can I use for data analysis on AnVIL?
- WDL - Batch processing of GATK and other workflows
- Jupyter - Interactive analysis with the Python or R programming languages; the R environment includes a family of Bioconductor 3.10 packages
- RStudio - Interactive analysis with your favorite R coding platform
- Galaxy - Access thousands of tools via an intuitive graphical user interface, for batch analysis with Galaxy Workflows and for interactive downstream visualization.
- AnVIL API library (coming soon) - Interact with AnVIL data, analysis solutions, and workflows via a command line interface.
- Genome Browser supported by UCSC (coming soon) - Interactive visualization of genomic data.
What data are available on AnVIL?
AnVIL provides access to a diverse array of genomic datasets, which are listed here (https://anvil.terra.bio/#library/datasets). These include both unrestricted-access and restricted-access datasets. Data access requests are submitted according to the guidelines provided by the data provider or consortium.
NHGRI consortium data will be hosted primarily on AnVIL, beginning with initial releases of data from the Centers for Common Disease Genomics (CCDG), Centers for Mendelian Genomics (CMG), Electronic Medical Records and Genomics (eMERGE) Network, and Clinical Sequencing Evidence-generating Research (CSER) consortium. Researchers can apply for access to these data on dbGaP (https://dbgap.ncbi.nlm.nih.gov). Once granted access, users can reach the data through the Terra component of AnVIL by linking their eRA Commons identities. Data will initially be made available to users in shared workspaces and will later be accessible through Gen3.
Which of the NHGRI consortium data are available on AnVIL?
Initial releases and project IDs of datasets from the CCDG, CMG, eMERGE, CSER, and GTEx can be found on the AnVIL Data Dashboard.
Where can I find documentation to “get started” on AnVIL?
The AnVIL Portal provides resources to help users register on AnVIL, along with introductory usage guides.
Can NIH Intramural Investigators use AnVIL?
Yes. AnVIL is a community resource that is available to both NIH extramural and intramural researchers. Interested intramural investigators should contact AnVIL Program Staff (anvil@mail.nih.gov) to discuss whether AnVIL can serve their lab’s data sharing needs. If so, NIH intramural investigators and their lab members may deposit data in AnVIL for sharing with the scientific community and may request access to data stored in AnVIL following the procedures described in these FAQs. Intramural investigators will have access to AnVIL workspaces in a secure cloud-based environment. They are subject to the same storage, compute, and egress charges as extramural investigators, and must set up a Google Cloud billing account to get started (see here for more information). Intramural investigators may also wish to contact the NIH STRIDES Initiative to explore discounts on Google Cloud services, as well as other training and professional services.