Consortium Data Access Guidelines
Overview
These guidelines describe AnVIL's expectations for consortia accessing and sharing data within the AnVIL cloud environment. For the purpose of these guidelines, consortium data refers to data that the consortium is collectively generating or jointly using for its own collaborative research activities.
Consortium data access means data sharing among consortium members for the consortium's primary research activities conducted prior to public release of the data (e.g., quality assurance/quality control, internal analyses, manuscript preparation). When consortium members use data generated by the consortium, Data Access Committee approval is typically not required.
Some consortia may utilize publicly available data for collaborative analyses, or use a mix of datasets generated by the consortium and publicly accessible datasets. Access to non-consortium data follows standard procedures (e.g., DAC approval).
AnVIL facilitates access to and sharing of consortium data using Terra Authorization Domains.[1] Consortia are responsible for setting their own access policies, maintaining the list of consortium members with access privileges (the "consortium access list"), and ensuring that consortium data are ultimately shared with the broad scientific community in accordance with NIH data sharing policies. Consortium access may not be used to delay broad sharing.
Duration of Consortium Access
Consortium access continues through the Project End Date of the NIH award supporting the consortium's designated consortium administrator (contact person). When multiple awards support the consortium, the consortium administrator's award determines the access period. Access may extend if the award is renewed or NIH approves a no-cost extension (NCE).
Note: Non-NIH-funded consortia must provide AnVIL with an expected project end date to establish access.
Extensions of Consortium Access
Consortia may request time-limited extensions of up to one year to complete scientific activities (such as finishing analyses, revising manuscripts, or responding to reviewer comments) or to address a temporary lapse in funding. Such requests must:
- include a clear scientific justification;
- have concurrence from the NIH Program Officer; and
- identify a consortium administrator willing to serve during the extension period and ensure ongoing compliance with these guidelines.
Extensions should be limited in duration and purpose-specific, and must not substitute for timely data submission or transition to standard access pathways. Additional extensions may be requested if needed.
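The extension requirements above can be sketched as a simple pre-submission check. This is an illustrative sketch only, not an AnVIL tool: the `ExtensionRequest` fields and function names are hypothetical, and reading "up to one year" as one calendar year from the current access end date is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class ExtensionRequest:
    # Hypothetical fields mirroring the three listed requirements.
    scientific_justification: str
    program_officer_concurs: bool
    administrator: str
    requested_end: date


def one_year_after(d: date) -> date:
    """Same calendar day one year later; Feb 29 maps to Feb 28."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:  # d is Feb 29 and the next year is not a leap year
        return d.replace(year=d.year + 1, day=28)


def extension_problems(current_end: date, req: ExtensionRequest) -> List[str]:
    """Return the reasons a request looks incomplete (empty list if none)."""
    problems = []
    if not req.scientific_justification.strip():
        problems.append("missing scientific justification")
    if not req.program_officer_concurs:
        problems.append("missing NIH Program Officer concurrence")
    if not req.administrator.strip():
        problems.append("no consortium administrator identified")
    if req.requested_end <= current_end:
        problems.append("requested end date is not after current access end")
    elif req.requested_end > one_year_after(current_end):
        # Assumption: the one-year limit is measured from the current end date.
        problems.append("requested extension exceeds one year")
    return problems
```

A consortium could run such a check before contacting AnVIL; it does not replace review of the request itself.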
Intramural Investigator Considerations
For intramural NHGRI investigators, consortium access will be confirmed with the Clinical and/or Scientific Director upon its establishment and at each subsequent quadrennial review. Access generally ends when consortium activities conclude or when the investigator's project is closed out. For intramural investigators outside of NHGRI, consortium access will be considered by AnVIL program directors on a case-by-case basis.
Access After the Consortium Period
After the consortium period ends, investigators may continue accessing data through standard mechanisms such as streamlined submitter access via dbGaP, submitting a Data Access Request, or using any locally retained data copies consistent with NIH policies. See Post-funding Considerations below for additional guidance for consortium projects using AnVIL.
Users with questions about the AnVIL Consortium Data Access Guidelines should contact anvil-data@broadinstitute.org.
General Expectations for Consortia Access to Data
All consortia working with AnVIL should:
- Use Terra Authorization Domains to control access by authorized consortium members to pre-release data, consortium-generated data, other controlled-access data, etc.[2]
- Clearly document data use limitations, including any additional approvals that may be needed, particularly if different subsets of the consortium data have different data use limitations.
- Define what constitutes consortium membership (e.g., investigators funded through the consortium and members of their lab, criteria for affiliate membership) and any further requirements and expectations for data access privileges.
- Maintain a process for updating membership, for instance when an individual leaves the consortium during the project.
- NOT use consortium access to provide streamlined access to non-consortium researchers.
Consortia submitting data to AnVIL to make data available to the broader scientific community should also:
- Register all large-scale genomic-phenotypic datasets that will be made publicly available (whether through controlled or unrestricted access) prior to working with AnVIL curators on data ingestion. AnVIL can help determine the best study registration structure, offering guidance on the advantages and disadvantages of different approaches in complex or multi‑study contexts.
- Establish timelines and milestones for releasing data rapidly and completely in accordance with the NIH GDS Policy to the community through AnVIL.
- Identify and communicate any datasets that are for consortium use only (i.e., will not be made publicly available).
  - These datasets will be owned, managed, and maintained by the consortium.
  - If later designated for release, AnVIL can then prioritize ingestion.
Consortium Member Responsibilities
Consortium members must comply with consortium policies and ensure data remain secure. Consortium members must also:
- Use Two-Factor Authentication on their Terra account (Google or Microsoft).
- NOT provide access to users without the appropriate permissions and supervision.
- Abide by the terms of the consortium agreement and any data use limitations.
- Ensure all personnel under the member's supervision are aware of and adhere to all data use limitations and all terms of consortium agreements (MOUs, DUAs, etc.).
- Notify the consortium administrator immediately (within 24 hours) upon leaving the consortium.
- Report potential data security incidents to the consortium administrator, AnVIL staff, and NIH staff within 24 hours and follow any consortium-specific protocols as necessary.
Consortium Administrator (Contact Person) Responsibilities
Each consortium working with AnVIL should have a defined Consortium Administrator (Contact Person), who serves as the point person for discussions between the consortium and AnVIL. The consortium should identify this person to the AnVIL team and the appropriate NIH Program Officers.
The AnVIL team requires the use of Terra groups for managing the data access list and permissions. The consortium administrator creates the Terra group(s) for managing access to consortium data by authorized consortium members. The administrator is responsible for maintaining the consortium access list and for helping to ensure that consortium members handle pre-release genomic and associated data responsibly. The administrator's responsibilities include ensuring the following:
- The consortium access list is current.
- When a member leaves the consortium, the consortium access list is updated according to the consortium's protocol, ideally within 3 working days of notification.
- The consortium has a policy and protocol for data security and management incidents, and works with consortium members, AnVIL staff and NIH staff to implement those protocols as necessary.
- Members confirm that they have Two-Factor Authentication active on their Terra account (Google or Microsoft) before they are granted access.
- Members confirm that they will abide by the terms of the consortium agreement and any data use limitations before they are granted access.
- Members only receive access to datasets they are approved for (i.e., if there are multiple datasets and a consortium member is only approved for access to a subset of those datasets, access should be organized via distinct Terra Authorization Domains).
- Members are aware of and obtain any necessary approvals beyond consortium membership for doing secondary research on the data (e.g., IRB approval, additional collaborator approval, etc.).
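As a rough illustration of the checklist above, the sketch below shows how an administrator might track pre-grant confirmations, per-dataset group assignment, and the 3-working-day removal target. It is a minimal sketch under stated assumptions: the `Member` fields, group names, and function names are hypothetical, no Terra API is called, and the working-day count skips weekends only (holidays are ignored).

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, Set


@dataclass
class Member:
    # Hypothetical record kept by the consortium administrator.
    email: str
    two_factor_confirmed: bool
    agreement_confirmed: bool
    approved_datasets: Set[str] = field(default_factory=set)


def groups_to_grant(member: Member, dataset_to_group: Dict[str, str]) -> Set[str]:
    """Return the access groups a member may join, one group per approved dataset.

    No access is returned until both confirmations are on file.
    """
    if not (member.two_factor_confirmed and member.agreement_confirmed):
        return set()
    return {
        dataset_to_group[name]
        for name in member.approved_datasets
        if name in dataset_to_group
    }


def removal_due(notified: date, working_days: int = 3) -> date:
    """Date by which the access list should ideally be updated.

    Counts Monday through Friday only; holidays are not considered.
    """
    due = notified
    remaining = working_days
    while remaining > 0:
        due += timedelta(days=1)
        if due.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return due
```

Keeping one group per dataset mirrors the expectation that members only receive access to the datasets they are approved for.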
Post-funding Considerations
Following the conclusion of a consortium's funded activities, AnVIL supports continued responsible stewardship while enabling appropriate scientific use. Consortia should establish a clear transition plan prior to the end of the funding period to finalize publications, complete data management activities, and resolve outstanding metadata or documentation needs. Consortia are encouraged to plan for these tasks early and incorporate data release milestones into project close‑out procedures. This expectation is consistent with NIH data sharing policies and longstanding expectations for timely and comprehensive release of scientific data.
After funding ends, AnVIL will support former consortium members or submitting institutions in making necessary corrections or withdrawals to previously released datasets. These updates may include addressing data errors, consent‑related withdrawals, or other compliance‑driven adjustments.
Post‑funding updates depend on submission structure:
- Centralized datasets may require versioned release of the entire dataset.
- Decentralized submissions (e.g., site‑specific phsIDs) allow site-level updates.
Former consortium members generating new data related to an original study may register and submit these data under a new study accession (pending acceptance by AnVIL through the typical data submission application process). The new study description should clearly articulate its relationship to the original consortium dataset to support transparency and downstream usability. This approach supports continued scientific progress while maintaining clear provenance and attribution.
Former consortium members who were secondary users of controlled‑access data may continue to collaborate in AnVIL Workspaces if they maintain appropriate approvals (e.g., active Data Access Requests permitting collaborative analysis).
Version History
| Version | Effective Date | Link | Content Changes |
|---|---|---|---|
| 1 | 2020‑10‑07 | e86e65b | Initial version |
| 1.1 | 2020‑10‑09 | deb9e71 | Removed "and -supported" from GDS Policy description |
| 1.2 | 2021‑06‑10 | f7d3d02 | Added Google Doc source link; whitespace/trailing space cleanup |
| 1.3 | 2021‑06‑11 | c909671 | Removed Google Doc link; minor punctuation fixes (Oxford commas, "closing-out" to "closing out", "consortium specific" to "consortium-specific") |
| 2 | 2022‑05‑27 | 13be633 | Major rewrite. Replaced entire structure. New Overview section defining "consortium data" and access policies. Added Terra Authorization Domains requirement. Added 6-month termination policy. Restructured responsibilities into General Expectations, Consortium Member, and Consortium Administrator sections. Added footnotes for billing guidance, non-NIH consortia, and Authorization Domain usage. |
| 2.1 | 2024‑06‑13 | 040e546 | Added Intramural NHGRI/NIH IC investigator access policies. Updated 2FA requirement to include Microsoft accounts alongside Google. Minor footnote formatting fix. |
| 3 | Date pending | GitHub link pending | Substantial update. Reorganized Overview and added new "Duration of Consortium Access" section replacing prior 6-month termination rule. Added clarified extension policy including one-year limit, allowance for multiple extensions, and example justifications. Revised expectations for consortia data release, added expanded post‑funding guidance, and restructured document for clarity (Overview, Duration, Extensions, Intramural policies, Access After Period, Responsibilities). Updated language throughout for consistency and conciseness. Added changelog at the bottom to capture modifications to the guidelines over time. |
Footnotes
1. If consortium-generated data will be released to the broader scientific community, AnVIL will set up the workspace using the AnVIL billing project and provide the consortium with access for data submissions. If consortium-generated data are not going to be released or data release is tentative, the consortium will need to set up its own billing project and workspaces, and can then maintain access and manage the data itself.
2. Authorization Domains are essential when working with controlled-access data to create data files that also require controlled access (e.g., joint calls, harmonized phenotypes, imputed data, GSR designated as "sensitive").