Transferring Data Into AnVIL

This document details the data transfer protocol during the transition to Gen3 as the primary data repository for the AnVIL. This process will evolve during the transition period.

Data Transfer Workflow

  1. Program Officers meet with AnVIL Program Management to consolidate information about the dataset: current dbGaP status, number of consortium members, alignment pipeline, functional equivalence, whether reprocessing is necessary, available phenotypes, file sizes, file formats, file counts, any special access considerations, etc.
  2. Terra Program Management creates workspaces, buckets, and authorization groups for each cohort and consent combination within the dataset.
  3. Data is transferred to the bucket.

    1. For small datasets or technically savvy users, data can be directly imported using a tool like gsutil or via the internal Terra interface.
    2. For larger, more complex datasets, the Terra team will interface with a point of contact (POC) from the Pipeline Ops team, who can facilitate the transfer from the data’s current location to its eventual home in a bucket.
    3. Sequencing data transferred to Google buckets is indexed in Gen3, subject to the provided access control mechanisms and data structure (split by cohort-consent).
  4. Work with the consortium to address and interpret phenotypic questions and to help get their data into a more platform-usable format.
  5. Phenotypic data is:

    1. Uploaded to the workspace by a member of the Terra team to associate samples, files, and phenotypes together.
    2. Submitted to the graph by members of the University of Chicago team to allow for faceted search and virtual cohort creation.
  6. Access is granted to Program Officers so they can set up the workspace description, etc.
  7. Authorization groups are populated.

    1. AnVIL_Devs is added as a secondary group.
    2. The dbGaP telemetry list is linked to the Terra whitelist.
    3. The Consortium Officer is granted access to add members to the consortium whitelist.
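
Steps 2 and 3.1 of the workflow can be sketched in code. The following is a minimal illustration, not the actual AnVIL or Terra tooling: it plans one bucket per cohort-consent combination and builds the matching gsutil copy command for each. The bucket naming scheme, the local directory layout, the dataset name, and the function names are all assumptions made for this example; only the `gsutil -m cp -r` invocation itself is standard gsutil usage.

```python
import shlex
from typing import List, Tuple


def bucket_name(dataset: str, cohort: str, consent: str) -> str:
    """Build a bucket name for one cohort-consent combination.

    The "<dataset>-<cohort>-<consent>" scheme is hypothetical; the real
    names are chosen by Terra Program Management.
    """
    return f"{dataset}-{cohort}-{consent}".lower()


def gsutil_copy_command(local_dir: str, bucket: str) -> List[str]:
    """Return the argument list for a parallel, recursive gsutil copy."""
    # -m runs transfers in parallel; cp -r copies the directory tree.
    return ["gsutil", "-m", "cp", "-r", local_dir, f"gs://{bucket}/"]


def plan_transfers(
    dataset: str,
    combinations: List[Tuple[str, str]],
    local_root: str,
) -> List[List[str]]:
    """Build one copy command per (cohort, consent) pair.

    Assumes (hypothetically) that the data for each pair sits in
    "<local_root>/<cohort>_<consent>" on the local machine.
    """
    commands = []
    for cohort, consent in combinations:
        local_dir = f"{local_root}/{cohort}_{consent}"
        bucket = bucket_name(dataset, cohort, consent)
        commands.append(gsutil_copy_command(local_dir, bucket))
    return commands


if __name__ == "__main__":
    # Placeholder dataset, cohorts, and consent codes for illustration.
    pairs = [("CohortA", "GRU"), ("CohortA", "HMB"), ("CohortB", "GRU")]
    for cmd in plan_transfers("ExampleDataset", pairs, "/data/export"):
        # Print for review; pass cmd to subprocess.run to execute it.
        print(shlex.join(cmd))
```

Printing the commands before running them lets a reviewer confirm that each cohort-consent split lands in its own bucket, which matters because access control is applied per bucket.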