Galaxy Running in AnVIL / Terra

We are pleased to announce that Galaxy is now available within Terra, AnVIL's cloud compute environment.

Overview of Galaxy in the Cloud

Launching Galaxy in AnVIL

To access Galaxy, use the new "Create a Cloud Environment for Galaxy" feature under the Notebooks tab.

This will take you to the AnVIL-branded version of Galaxy.

From the AnVIL-branded version of Galaxy, users can browse files in their AnVIL/Terra Workspace and run a variety of genomics analyses.

Step-by-Step Tutorial

The step-by-step tutorial below demonstrates how to compute quality metrics of unaligned reads with FastQC, align the reads to a reference genome using Bowtie2, plot a coverage histogram with plotCoverage, call variants using FreeBayes, and then summarize the variant calls using bcftools.
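
For orientation, the same sequence of tools can be sketched as command-line calls driven from Python. This is a minimal sketch and not what Galaxy runs internally: the Galaxy tool wrappers may use different options. The output file names (sample.sam, sample.sorted.bam, and so on), the intermediate samtools steps, and the helper names build_steps and run_pipeline are assumptions made for illustration. Only the input names ref.fa, frag180.1.fq, and frag180.2.fq come from the tutorial data.

```python
import shlex
import subprocess


def build_steps(ref="ref.fa", reads1="frag180.1.fq", reads2="frag180.2.fq",
                prefix="sample"):
    """Return (argv, stdout_path) pairs, one per pipeline stage, in order.

    stdout_path is None when the tool writes its own output files, and a
    file name when the tool's standard output should be captured.
    All output names derived from `prefix` are hypothetical.
    """
    index = f"{prefix}_index"
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    vcf = f"{prefix}.vcf"
    return [
        # Quality metrics for the unaligned reads.
        (["fastqc", reads1, reads2], None),
        # Build a Bowtie2 index, then align the paired-end reads.
        (["bowtie2-build", ref, index], None),
        (["bowtie2", "-x", index, "-1", reads1, "-2", reads2, "-S", sam], None),
        # Sort and index the alignments (assumed intermediate steps).
        (["samtools", "sort", "-o", bam, sam], None),
        (["samtools", "index", bam], None),
        # Coverage histogram (plotCoverage is part of deepTools).
        (["plotCoverage", "-b", bam, "-o", f"{prefix}_coverage.png"], None),
        # Index the reference, then call variants; FreeBayes writes VCF to stdout.
        (["samtools", "faidx", ref], None),
        (["freebayes", "-f", ref, bam], vcf),
        # Summarize the variant calls; bcftools stats writes to stdout.
        (["bcftools", "stats", vcf], f"{prefix}.stats.txt"),
    ]


def run_pipeline(steps):
    """Run each stage in order, stopping at the first failure."""
    for argv, stdout_path in steps:
        print("running:", shlex.join(argv))
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)
```

Calling run_pipeline(build_steps()) in a directory that contains the three input files would run the stages in order, provided all of the tools are installed and on the PATH.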

I. Launching Galaxy

  1. To access Galaxy, visit the AnVIL portal and click "Launch Terra".
  2. This will take you to the Terra sign-in page, where you can sign in using your Google credentials.
  3. If this is the first time you are using AnVIL, you should first link your AnVIL account to the NHGRI AnVIL Data Commons Framework Services from your AnVIL profile page (https://anvil.terra.bio/#profil). To do this, use the bottom link on the right-hand side and sign in with your eRA Commons identity.
  4. After signing in, you should see that your account is now linked. You will need to renew your link every 30 days.
  5. Once your accounts are linked, return to the workspace list available at https://anvil.terra.bio/#workspaces.
  6. Galaxy must be launched from a workspace. This can be either an existing workspace that already has data loaded or a new workspace. For this example, we will create a new workspace using the "Create New Workspace" dialog. Note that you will also need to set up and select a billing project to be associated with the workspace.
  7. For this example, we will load (simulated) microbial sequencing data available here: asm.tgz. After downloading the asm.tgz file, expand the archive and upload the data to the Terra workspace by dragging and dropping the files from your local computer (or using the + button) into the Files pane in the Data tab.
  8. Next, click on the "Notebooks" tab to find the "Create a Cloud Environment for Galaxy" button.
  9. Clicking the "Create a Cloud Environment for Galaxy" button brings up the "Cloud environment" launch panel.
  10. Clicking "Next" then shows the "Create" panel.
  11. After clicking "Create", you will see a new icon at the top showing "Galaxy Provisioning". It will take approximately 10 minutes for Galaxy to be fully provisioned and initialized.
  12. After provisioning, you will be notified that you can now launch Galaxy.
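
Step 7 above asks you to expand asm.tgz before uploading its contents. Any archive tool will do; as one option, the following is a minimal Python sketch using the standard library. The helper name extract_archive and the destination directory name are hypothetical, and the sketch assumes asm.tgz is a gzip-compressed tar archive, as its extension suggests.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tgz archive and return the relative paths of its files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters that
            # reject unsafe members (absolute paths, "..", and so on).
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return sorted(
        p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file()
    )
```

For example, extract_archive("asm.tgz", "asm_data") would unpack the download into an asm_data directory and return the list of files to drag into the Files pane.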

II. Welcome to Galaxy in AnVIL

  1. Clicking "Launch Galaxy" will take you to the Galaxy welcome screen.
  2. Click on the data upload tool to load your data into Galaxy.
  3. This will display the data browser.
  4. Then click "Choose remote files" to access your AnVIL/Terra Workspace.
  5. Browse inside your workspace to "Other Data".
  6. Then open "Files/".
  7. Here you will see all of the data you loaded into your AnVIL Workspace.
  8. Select all of the files to load into Galaxy.
  9. Clicking "Ok" will finalize the selection.
  10. After you click "Start", the data will be transferred into Galaxy.
  11. You can then "Close" the data picker to see the main Galaxy interface.

III. Running Tools in Galaxy

  1. On the left-hand tool panel, expand the "FASTQ Quality Control" menu and click on "FastQC". This will automatically pick the most recent item in your history (frag180.1.fq).
  2. Clicking "Execute" at the bottom of this panel will run the tool.
  3. While the job is running, it will be added to the history panel on the right.
  4. Once the job is complete, you will see it turn green.
  5. Click the "Eye" view data tool to show the results.
  6. This shows that the reads have good quality overall, with quality decreasing along the read length (as expected).
  7. Next, align the reads with Bowtie2. Make sure to select paired-end reads: fastq #1 should be "frag180.1.fq" and fastq #2 should be "frag180.2.fq".
  8. Then pick "ref.fa" as your reference genome.
  9. Clicking "Execute" will show Bowtie2 launching.
  10. Next, use the plotCoverage tool to display the coverage histogram. Note that the results from Bowtie2 will automatically be selected, since that is the only item in a compatible file format.
  11. Click "Execute" to show plotCoverage running.
  12. Once complete, click the view data icon for the plotCoverage image (step 9) to show the coverage distribution.
  13. Next, run FreeBayes to call SNVs and indels in the sample. Make sure to select "History" as the source of the reference genome.
  14. Once FreeBayes is complete, you can click on the view data icon to display the VCF file containing the variants.
  15. The last step is to summarize the variant calls using "bcftools stats".
  16. Once this is complete, you can view the summary of the VCF file. You should see that there are 1908 SNVs in the sample.
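
If you download the "bcftools stats" output from Galaxy, the SNV count in the last step can also be read programmatically. The sketch below assumes the usual bcftools stats text layout, in which summary lines are tab-separated and begin with "SN". The function name parse_summary_numbers is hypothetical, and the example excerpt is hand-written for illustration, with only the 1908 figure taken from the tutorial.

```python
def parse_summary_numbers(stats_text):
    """Collect the 'SN' (summary numbers) lines of bcftools stats output.

    Returns a dict mapping each key (without its trailing colon) to its
    value, converted to int where possible.
    """
    summary = {}
    for line in stats_text.splitlines():
        # Summary lines look like: SN <tab> id <tab> key: <tab> value
        if not line.startswith("SN\t"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue
        key = fields[2].rstrip(":")
        value = fields[3].strip()
        try:
            summary[key] = int(value)
        except ValueError:
            summary[key] = value  # keep non-integer values as text
    return summary


# Hand-written excerpt for illustration (not real tool output).
example = (
    "# SN\t[2]id\t[3]key\t[4]value\n"
    "SN\t0\tnumber of SNPs:\t1908\n"
)
print(parse_summary_numbers(example)["number of SNPs"])
```

With a real file, read it with open(...).read() and pass the text to the same function.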

IV. Shutting Down

  1. When you are done with Galaxy, you will need to stop your running Galaxy instance so that you stop incurring charges. Return to the AnVIL workspace where you launched Galaxy.
  2. Click "Galaxy Running" to display the administrative panel.
  3. Click "Delete" to stop and delete your Galaxy instance. Note that this will delete all data and results files from your Galaxy session.
  4. After your Galaxy instance has been deleted, the "Galaxy Running" icon will disappear.
