Getting Started with AnVIL
Quick Start Guides
These Quick Start guides will help new AnVIL users get started as quickly as possible. We’ve made sensible recommendations that will fit many (but not all) situations, and have included notes and links for users with more specialized situations.
For more detailed documentation covering the many ways you can make AnVIL work for you and your team, please refer to the Account Setup section below.
- Set Up Lab Accounts - Follow these instructions to get your accounts, set up billing, and set up your team members to do research with AnVIL.
- Budget Templates - Templates for calculating a budget and writing a budget justification for using AnVIL in your grant applications.
- Set up a Terra account - To register for a Terra account, you will need a Gmail account or another email account (an institutional email, for example) associated with a Google identity.
- Set up Terra account with non-Google email - If your email is not associated with a Google identity, follow these steps to create a Google account that is associated with your non-Gmail, institutional email address.
- Set up a Gen3 account - Create a Gen3 account by logging in with your NIH, Google, or RAS login credentials. This allows you to use the Gen3 data explorer to create artificial cohorts over AnVIL datasets that have been indexed by Gen3.
- Link your Gen3 and Terra accounts - Follow these step-by-step instructions to link your Gen3 credentials to your Terra account. This allows you to analyze Gen3 data on Terra.
- Link your Terra account and eRA Commons ID - To use controlled-access data on Terra, you will need to link your Terra user ID to your authorization account (such as a dbGaP account). Linking to these external servers allows Terra to automatically determine, based on your approved dbGaP applications, whether you can access controlled datasets hosted in Terra (e.g., TCGA, TOPMed).
- Link your Terra identity with Google Billing - The Terra platform is free to use; you can browse showcase workspaces and the Data Library as soon as you register for an account. However, operations in Terra (such as running workflows, running Jupyter Notebooks, and accessing and storing data) may incur Google Cloud Platform (GCP) charges. These charges are billed by GCP and paid through your Terra billing account.
Using Terra Workspaces
- Working with workspaces - Terra workspaces are dedicated spaces where you and your collaborators can access and organize the same data and tools and run analyses together.
- Cloning a workspace - "Cloning" a workspace creates a completely independent copy of it under your own billing project. You are the owner and sole user of the clone until you choose to "share" it with someone else.
- Understanding workspace access levels - Terra workspaces have three access levels: READER, WRITER, and OWNER. Each access level represents an expanded set of permissions.
- Exploring curated example workspaces - One of the best ways to get started in AnVIL is to explore curated example workspaces. These workspace templates span a variety of use cases and are standardized for completeness and ease of use, making them great starting points for reproducing instructive results and learning established methodologies. Also see AnVIL's featured example workspaces.
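Because each workspace access level represents an expanded set of the permissions below it, the three levels behave like an ordered scale. The Python sketch below models only that ordering; `AccessLevel` and `has_at_least` are hypothetical names for illustration, not part of any Terra API, and the specific permissions attached to each level are described in the access-levels article itself.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Hypothetical model of Terra workspace access levels.

    Values are ordered so that a higher level includes everything
    granted by the levels below it (READER < WRITER < OWNER).
    """
    READER = 1
    WRITER = 2
    OWNER = 3


def has_at_least(granted: AccessLevel, required: AccessLevel) -> bool:
    """Return True if the granted level meets or exceeds the required one."""
    return granted >= required


if __name__ == "__main__":
    # A WRITER can do anything a READER can, but not everything an OWNER can.
    print(has_at_least(AccessLevel.WRITER, AccessLevel.READER))
    print(has_at_least(AccessLevel.WRITER, AccessLevel.OWNER))
```

Using `IntEnum` keeps the comparison a single `>=`, which mirrors the "expanded set" wording: checking a requirement means checking a position on the scale.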
Finding and Accessing Datasets
- Discovering datasets - Datasets of interest can be discovered in AnVIL’s dataset catalog and the Gen3 Data Explorer, or by reviewing the data-focused workspaces available to you once you are logged into Terra.
- Requesting dataset access - AnVIL's open access datasets are accessible to all upon logging into Terra or Gen3. To request access for datasets with access restrictions, see AnVIL's guides for requesting access to Controlled Access and Consortium Access datasets.
- Once your access is approved - the workspaces associated with your new datasets will be listed on your Terra workspaces tab. Clone the workspace to begin working with the dataset.
- Running GATK workflows - If you're new to running GATK on a cloud-based platform, or new to Terra, this information will help get you started. From pre-processing raw sequencing data through variant calling and joint calling, showcase workspaces provide fully reproducible workflows for critical use-cases and include extensive documentation and sample data to practice on.
- Interactive analysis with Jupyter notebooks - Jupyter notebooks are an open-source analysis environment where you can visualize and analyze data in real time to gain insight into study data. Import data including processed genomics, phenotype and transcriptomics data stored in the cloud and analyze with custom or pre-built libraries using R or Python.
- Visualizing genomic data with IGV - This article explains three ways you can use the Integrative Genomics Viewer (IGV) to examine tracks from BAM (.bam) files in Terra.
Controlling Cloud Costs
- Controlling cloud costs - Understand the costs of using key cloud services (Google Cloud Storage, Google Compute Engine, and Google BigQuery). Examples are provided to help you make informed decisions around controlling costs on Terra.
- Overview of Terra for new users - An overview of the Terra platform covering account and billing setup, accessing and managing data, pipelining analysis and interactive analysis.
- Terra training materials - A library of training materials for the Terra platform.
- Navigating the Terra user interface - An overview of the Terra user interface covering how to manage your profile, set up billing, manage groups, access, clone, and share workspaces, and access tools, data, and curated workspaces.
- Data privacy and access - Because research is frequently collaborative, you need to be able to keep sensitive genomic data secure, but still easy to share. Terra was designed to help you balance these competing requirements.
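The cost guide above breaks Terra spending into Google Cloud Storage, Google Compute Engine, and Google BigQuery charges. As a rough illustration of how such an estimate combines, the Python sketch below multiplies usage by per-unit rates for each service. The function name and all rates are hypothetical placeholders you must supply yourself; they are not actual GCP prices, and real bills include factors (region, storage class, egress, discounts) that this sketch ignores.

```python
def estimate_monthly_cost(
    storage_gb: float,
    storage_rate_per_gb_month: float,
    compute_hours: float,
    compute_rate_per_hour: float,
    bigquery_tb_scanned: float = 0.0,
    bigquery_rate_per_tb: float = 0.0,
) -> dict:
    """Hypothetical back-of-the-envelope monthly cost estimate.

    All rates are caller-supplied placeholders, not real GCP prices.
    - Cloud Storage: data stored (GB) x rate per GB-month
    - Compute Engine: VM hours x rate per hour
    - BigQuery (on-demand): data scanned (TB) x rate per TB
    """
    storage = storage_gb * storage_rate_per_gb_month
    compute = compute_hours * compute_rate_per_hour
    bigquery = bigquery_tb_scanned * bigquery_rate_per_tb
    return {
        "storage": storage,
        "compute": compute,
        "bigquery": bigquery,
        "total": storage + compute + bigquery,
    }


if __name__ == "__main__":
    # Placeholder rates for illustration only; look up current pricing before budgeting.
    estimate = estimate_monthly_cost(
        storage_gb=500, storage_rate_per_gb_month=0.02,
        compute_hours=40, compute_rate_per_hour=0.50,
    )
    print(estimate)
```

Keeping each service as a separate line item makes it easy to see which one dominates, which is usually the first step in deciding where to cut costs (for example, deleting intermediate files versus shutting down idle notebook runtimes).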