Data Consortia
AnVIL hosts data from the following programs and also allows users to submit their own data to the platform.
Current Consortia
CARD
CARD is a collaborative initiative of the National Institute on Aging (NIA) and the National Institute of Neurological Disorders and Stroke (NINDS) that supports basic, translational, and clinical research on Alzheimer’s disease and related dementias. CARD’s central mission is to initiate, stimulate, accelerate, and support research that will lead to the development of improved treatments and preventions for these diseases. Through CARD, researchers work across scientific domains and disease boundaries to bridge basic, preclinical, and clinical research with the goal of accelerating translational research on these devastating diseases.
CCDG
The Centers for Common Disease Genomics are a collaborative large-scale genome sequencing effort to comprehensively identify rare risk and protective variants contributing to multiple common disease phenotypes.
CMG
The Centers for Mendelian Genomics is a multi-center collaboration aimed at identifying the genes responsible for Mendelian phenotypes by whole exome and whole genome sequencing.
GREGoR
The GREGoR Consortium (Genomics Research to Elucidate the Genetics of Rare diseases) seeks to discover the cause of currently unexplained Mendelian genetic disorders through the application and standardization of new omic technologies and diagnostic approaches.
- Learn more about the GREGoR Dataset and GREGoR Consortium
- Learn how to Access GREGoR Data
- View the GREGoR Consortium's studies and workspaces
GTEx
The Genotype-Tissue Expression (GTEx) project is an ongoing effort to build a comprehensive public resource to study tissue-specific gene expression and regulation. Samples were collected from 54 non-diseased tissue sites across nearly 1000 individuals, primarily for molecular assays including WGS, WES, and RNA-Seq.
1000 G
The 1000 Genomes Project, launched in January 2008, is an international research effort to establish variation profiles across the human population. This open access data set continues to be a valuable resource to geneticists.
CSER
The Clinical Sequencing Evidence-Generating Research (CSER) consortium is a national multi-site research program funded by the National Human Genome Research Institute (NHGRI), the National Cancer Institute (NCI), and the National Institute on Minority Health and Health Disparities (NIMHD).
Active 2011 to 2023
eMERGE
The Electronic Medical Records and Genomics (eMERGE) project is a national network organized and funded by the NHGRI that combines DNA biorepositories with electronic medical record (EMR) systems for large-scale, high-throughput genetic research in support of implementing genomic medicine.
PAGE
The Population Architecture Using Genomics and Epidemiology Consortium investigates ancestrally diverse populations to gain a better understanding of how genetic factors influence susceptibility to disease.
HPRC
The Human Pangenome Reference Consortium aims to modernize the human reference to include a collection of diverse and highly accurate, haplotype-phased genome assemblies. This initiative will generate new technical standards in genome sequencing, scalable and reproducible assembly methods, and pangenomic tool development to ensure comprehensive variant discovery.
T2T
The Telomere-to-Telomere (T2T) consortium is an open, community-based effort to generate accurate and gap-free assemblies of the human genome and the genomes of other species. The initial focus was the de novo assembly of the first complete human reference genome, known as CHM13.
Built from PacBio HiFi sequencing and Oxford Nanopore ultra-long reads, the CHM13v1 reference genome has an estimated sequence accuracy exceeding QV70, corrects structural errors in the GRCh38 reference genome, and adds over 100 Mbp of novel sequence relative to GRCh38.
CHM13v1 opens complex regions of the genome to clinical and functional study. Additionally, the T2T-CHRY Workspace uses T2T-CHM13v2.0, which adds the first complete sequence of a human Y chromosome, obtained from a separate donor (HG002).
T2T-CHM13v2.0 was also used as a reference genome for investigating short-read variant calling, incorporating data from the 1000 Genomes Project and the Simons Genome Diversity Project. Another T2T consortium effort, the T2T-GreatApes Project, uses PacBio HiFi and Oxford Nanopore ultra-long reads to provide reference genomes for various ape species. The project evaluates the impact of T2T-chrXY assemblies on read alignments and variant calling across 129 individuals from 11 great ape subspecies.
Planned Consortia
The following consortia are planned for data ingestion. Additional consortia are under consideration and will be listed as they are approved.
- Covid19hg - The COVID-19 Host Genetics Initiative
- GTEx v9 - Genotype-Tissue Expression Project
- NIA - National Institute on Aging
- NIMH - National Institute of Mental Health
- UDN - Undiagnosed Diseases Network