Reproducible Analysis of Human Pangenome Data using the AnVIL
Wednesday, January 26, 2022 12:00 PM to 1:30 PM EST
This workshop will explore and demonstrate open-access data from the Human Pangenome Reference Consortium (HPRC), an NHGRI-funded effort to create a more diverse and comprehensive human pangenome reference. We will present the data and methods produced and used during the first year of this project, which aims to release high-quality diploid genome assemblies from more than 350 ethnically diverse individuals over five years. Currently, raw data and assemblies from 45 individuals, along with associated Docker-based analysis workflows written in the Workflow Description Language (WDL), are available in the AnVIL for researchers to explore and use. Data and workflows will continue to be released publicly as early as possible to promote open science. These resources are an excellent starting point for working hands-on with these data types and with new workspaces and methods.
Using data and workflows from the HPRC, participants of this workshop will perform hands-on exercises including:
- Registering for an AnVIL account and Google Cloud credits
- Setting up a collaborative cloud workspace in Terra
- Accessing and exploring hosted AnVIL data
- Searching for bioinformatics workflows in Dockstore and exporting them to a Terra workspace
- Configuring and launching a Docker-based WDL workflow to conduct a parallel analysis
After completing the workshop, attendees will be able to leverage AnVIL to analyze hosted datasets and launch analyses that are reproducible and scalable.
Cloud-based analysis of genomic datasets is increasingly vital for portability, reproducibility, and multi-institution collaboration, but transitioning to the cloud can be daunting. This workshop aims to lower the barriers to adopting these tools. Specifically, we will teach researchers how to access and use the Analysis, Visualization, and Informatics Lab-space (AnVIL), an environment that provides hosted data, reproducible tools, collaborative workspaces, and comprehensive documentation to enable users to conduct research in the cloud. The workshop will demonstrate how to access and explore data in AnVIL. Participants will also learn to search for analysis tools in Dockstore, a platform for sharing portable, container-based tools and workflows written in CWL, WDL, and Nextflow. Finally, they will analyze data in a Terra workspace, a dedicated space where researchers can access and organize the same data and tools and run analyses together.
The intended audience for this workshop is scientists who would like to find and use tools in the cloud for genomic analysis. Researchers interested in NHGRI data are especially encouraged to attend.
Basic knowledge of Python or R is recommended but not required.
Annual Meeting, general inquiries: firstname.lastname@example.org