Data Analysts - Guides and Tutorials
This section lists guides, tutorials, and other resources to help data analysts find and process data and share results in the AnVIL cloud.
- Getting Started with Gen3 - An overview of the Gen3 platform and how to use Gen3 to access AnVIL datasets, create cohorts, and export them to Terra workspaces.
- Getting Started with Bioconductor - Guides that help R / Bioconductor users launch RStudio or Jupyter for interactive analysis and run workflows for large-scale data processing.
Bioconductor Popup Workshops
- Using R / Bioconductor in AnVIL - An introduction to the AnVIL cloud computing environment. We learn how to create a Google account to use in AnVIL and explore key concepts related to workspaces and billing projects. We then create a Jupyter notebook-based cloud environment and an RStudio cloud environment.
- The R / Bioconductor AnVIL Package for Easy Access to Buckets, Data, Workflows, and Fast Package Installation - An exploration of how workspaces provide a framework for managing data and large-scale analyses, using the HCA Optimus Pipeline and 1000G-high-coverage-2019 workspaces as examples and working from R with the AnVIL package.
- Running a Workflow: Bulk RNASeq Differential Expression from FASTQ Files to Top Table - How to configure and run a workflow, based on the Bioconductor-Workflow-DESeq2 workspace. The workflow starts with FASTQ files and uses salmon to transform them into the inputs required for Bioconductor DESeq2 analysis of differential expression.
- Single-cell RNASeq with 'Orchestrating Single Cell Analysis' in R / Bioconductor - An introduction to a resource, developed primarily by Aaron Lun of Genentech, Inc., that employs Bioconductor resources for many aspects of the analysis of single-cell RNA-seq data. The resource is a "computable book" written in R Markdown, published at https://bioconductor.org/books/release/OSCA/.
- Using AnVIL for Teaching R / Bioconductor - A case study of using AnVIL to teach R in a Biostatistics course, along with the essentials for using AnVIL in other instructional efforts.
- Reproducible Research with AnVILPublish - An exploration of elements of reproducible research with the AnVILPublish package. We illustrate how to make a Docker container tailored to publishing AnVIL packages, then highlight the merits of an R package structure for organizing research activities in a way that emphasizes provenance and reproducibility.
- Getting Started with Galaxy - A step-by-step tutorial demonstrating how to compute quality metrics of unaligned reads, align the reads to a reference genome using bowtie2, plot a coverage histogram, call variants using FreeBayes, and then summarize the variant calls using bcftools.