NHGRI Analysis Visualization and Informatics Lab-space



Consortium Guidelines for AnVIL Data Access


These guidelines represent AnVIL’s expectations for consortia access to data on the AnVIL. For the purpose of these guidelines “consortium data” refers to the data used by the consortium for the primary research of the consortium. Consortium data access is defined as data sharing between consortium members for the primary research of the consortium. Primary research includes data quality assurance/quality control, analyses, and preparations for submission of data for release to the scientific community. If members of the consortium are using data generated by themselves or other members of the consortium, access to consortium data by consortium members does not require Data Access Committee approval. Some consortia may utilize publicly available data for collaborative analyses, or use a mix of datasets generated by the consortium and publicly accessible datasets. Typical data access procedures, such as Data Access Committee approval, are required for access to data not generated by the consortium.

AnVIL will facilitate data access and sharing of consortium data in a cloud environment via Terra Authorization Domains.1 It is the responsibility of the consortium to set consortium data sharing and access policies and to maintain the list of consortium members with access privileges (the “consortium access list”), which is used to manage access to data. In most cases, the expectation is that consortium-generated data will be shared through AnVIL with the broader scientific community in accordance with the NIH GDS Policy. Consortium data access should not be used to delay or avoid broad data sharing with the scientific community.

Consortium access will be terminated six months after the consortium’s Project End Date of the related NIH grant(s), at which time the consortium members will be able to obtain access through typical data access procedures. Extension requests for consortium data access can be made by the NIH program officer.2 For Intramural NHGRI Investigators, consortium access will be verified with the Clinical and/or Scientific Director upon its establishment and each subsequent quadrennial review and will generally be terminated when consortium activities have ended or following close-out of the NIH investigator’s project. For Intramural investigators from other NIH ICs, consortium access will be considered by AnVIL program directors on a case-by-case basis.
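
As an illustration of the default timeline above, the following minimal sketch computes the date on which consortium access would end. It assumes that "six months" means six calendar months and that a day with no counterpart in the target month is clamped to that month's last day. Both are assumptions, not something these guidelines specify. The function names are hypothetical, and any approved extension would need to be handled separately.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return the date `months` calendar months after `d`.

    Assumption: if the target month is shorter than `d.day`,
    the result is clamped to the last day of the target month.
    """
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def consortium_access_end(project_end: date) -> date:
    """Default end of consortium access: six months after the Project End Date."""
    return add_months(project_end, 6)
```

For example, under these assumptions a Project End Date of 30 June 2025 gives a default access end of 30 December 2025.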

General Expectations for Consortia Access to Data

All consortia working with AnVIL should:

  • Make use of Terra Authorization Domains to control data access (e.g., to control access to pre-release data, consortium-generated controlled-access data, other controlled-access data, etc.) by authorized consortium members.3
  • Have clear documentation of data use limitations, including any additional approvals that may be needed, particularly if different subsets of the consortium data have different data use limitations.
  • Clearly define what it means to be a consortium member (e.g., investigators funded through the consortium and members of their lab, criteria for affiliate membership) and any further requirements and expectations for data access privileges.
  • Have a plan for communicating and managing changes to consortium membership, for instance when an individual leaves the consortium during the project.
  • NOT use this process to provide streamlined access to any researchers that are not part of the consortium.

Consortia submitting data to AnVIL to make data available to the broader scientific community should also:

  • Register all large-scale genomic-phenotypic datasets that will be made publicly available (whether through controlled-access or unrestricted-access) prior to working with AnVIL curators on data ingestion.
  • Have a plan with clear timelines and milestones for releasing data rapidly and completely in accordance with the NIH GDS Policy to the community through AnVIL.
  • Identify and communicate any datasets that are for consortium use only (i.e., will not be made publicly available).
    • These datasets will be owned, managed, and maintained by the consortium.
    • If the consortium later decides that the data will be registered and released, the data can then be prioritized by AnVIL staff.

Consortium Member Responsibilities

Consortium members are responsible for complying with consortium data access and sharing policies, and for ensuring data remain securely within the consortium. Consortium members must also:

  • Establish Two Factor Authentication on the account they use to access Terra (Google or Microsoft).
  • NOT provide access to users without the appropriate permissions and supervision.
  • Abide by the terms of the consortium agreement and any data use limitations.
  • Ensure all personnel under the member’s supervision are aware of and adhere to all data use limitations and all terms of consortium agreements (MOUs, DUAs, etc.).
  • Inform the consortium administrator immediately upon leaving the consortium, and no later than 24 hours after official notification.
  • Report any potential data security incidents to the consortium administrator, AnVIL staff, and NIH staff within 24 hours and follow any consortium-specific protocols as necessary.

Consortium Administrator (Contact Person) Responsibilities

Each consortium working with AnVIL should have a defined Consortium Administrator (Contact Person), who serves as the point person for discussions between the Consortium and AnVIL. The identity of this person should be communicated to the AnVIL team and the appropriate NIH Program Officers.

The AnVIL team requires the use of Terra groups for managing the data access list and permissions. The consortium administrator creates the Terra group(s) for managing access to consortium data by authorized consortium members. The administrator is responsible for maintaining the consortium access list, and for helping to ensure that the consortium members handle pre-release genomic and associated data responsibly. The administrator’s responsibilities include ensuring the following:

  • The consortium access list is current and accurate.
    • The consortium access list is updated according to the consortium’s protocol, ideally within 3 working days of being notified, when a member leaves the consortium.
  • The consortium has a policy and protocol for data security and management incidents, and works with consortium members, AnVIL staff and NIH staff to implement those protocols as necessary.
  • Consortium members confirm, before they are granted access, that they have Two Factor Authentication active on their Google Account.
  • Consortium members confirm, before they are granted access, that they will abide by the terms of the consortium agreement and any data use limitations.
  • Consortium members only have access to appropriate datasets (i.e., if there are multiple datasets and a consortium member is only approved for access to a subset of those datasets, access should be organized via distinct Terra Authorization Domains).
  • Consortium members are aware of and obtain any necessary approvals beyond consortium membership for doing secondary research on the data (e.g., IRB approval, additional collaborator approval, etc.).
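
The access-list checks above can be sketched as a simple audit over exported membership data. This is an illustrative model only: it does not call any Terra API, and the function name, the domain names, and the shape of the inputs (sets of member email addresses per Authorization Domain, plus per-member approvals) are hypothetical assumptions about how an administrator might record this information.

```python
def audit_access(
    access_list: set[str],
    domain_members: dict[str, set[str]],
    approvals: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Return, per Authorization Domain, the members whose access should be reviewed.

    A member is flagged if they are no longer on the consortium access list,
    or if they are not approved for the dataset(s) behind that domain.
    All inputs are hypothetical exports maintained by the administrator.
    """
    flagged: dict[str, set[str]] = {}
    for domain, members in domain_members.items():
        stale = {
            member
            for member in members
            if member not in access_list
            or domain not in approvals.get(member, set())
        }
        if stale:
            flagged[domain] = stale
    return flagged
```

Running such a check whenever membership changes is one way an administrator could confirm that the consortium access list and the per-dataset groups stay consistent.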

  1. Consortia Data Management, Analysis, and Billing Guidance - If consortium-generated data will be released to the broader scientific community, then AnVIL will set up the workspace with the AnVIL billing project and provide the consortium with access for data submissions. If consortium-generated data are not going to be released or data release is tentative, then the consortium would need to set up its own billing project and workspaces, and can then maintain access and manage the data itself.
  2. Non-NIH funded consortia will need to provide AnVIL with an indication of the consortium’s project end date in order to receive access through a consortium Terra group.
  3. It is essential to make use of an Authorization Domain when the consortium is using controlled-access data to create data files that also require controlled-access (e.g., joint calls, harmonized phenotypes, imputed data, GSR designated as “sensitive”).