Using AnVIL

What tools can I use for data analysis on AnVIL?

  • WDL - Batch processing of GATK and other workflows
  • Jupyter - Interactive analysis with the Python or R programming languages; the R environment includes a family of Bioconductor 3.10 packages
  • RStudio (coming soon) - Interactive analysis with your favorite R coding platform
  • AnVIL API library (coming soon) - Interact with AnVIL data, analysis solutions, and workflows via a command line interface
  • Galaxy (coming soon) - Access thousands of tools via an intuitive graphical user interface, run batch analyses with Galaxy Workflows, and explore results with interactive downstream visualizations
  • Genome Browser supported by UCSC (coming soon) - Interactive visualization and analysis of genomic data

What data are available on AnVIL?

AnVIL provides access to a diverse array of genomic datasets, which can be browsed at https://anvil.terra.bio/#library/datasets. These include both unrestricted-access and restricted-access datasets. Requests for access to restricted data are submitted according to the guidelines of the data provider or consortium.

NHGRI consortium data will be hosted primarily on AnVIL. The initial releases will include data from the Centers for Common Disease Genomics (CCDG), the Centers for Mendelian Genomics (CMG), the Electronic Medical Records and Genomics (eMERGE) Network, and the Clinical Sequencing Evidence-Generating Research (CSER) consortium. Researchers can apply for access to these data through dbGaP (https://dbgap.ncbi.nlm.nih.gov/). Once access is granted, users can reach the data on the Terra component of AnVIL by linking their eRA Commons identities. Data will be made available to users in shared Workspaces and will later be accessible from Gen3.

Where can I find documentation to “get started” on AnVIL?

The AnVIL Portal (https://anvilproject.org/training/guides) provides resources to help users register on AnVIL, along with introductory usage guides.