Complex Chromosomal Changes Involving Chromosome 9 Data Release on the AnVIL Platform
Sharing Our Data on AnVIL: Complex Chromosomal Changes Involving Chromosome 9
Laboratory of Tychele N. Turner, Ph.D.
Washington University in St. Louis
Department of Genetics
We are excited to announce that our latest dataset is now available through the AnVIL platform (phs004000.v1.p1). This release of the study "Assessment of Complex Chromosomal Changes in De-Identified Cell Lines" focuses on 16 individuals with complex variation involving chromosome 9.
Access to these controlled-access data can be requested through dbGaP under accession phs004000.v1.p1, and the data are available at https://explore.anvilproject.org/datasets or https://duos.org/datalibrary/anvil.
Sample Selection
The starting point for this project was the Coriell Cell Repositories “Chromosomal Abnormalities” Collection. From this resource, we found a set of de-identified cell lines with structural variants on chromosome 9. These samples provided a valuable opportunity to study complex rearrangements using modern sequencing technologies.
DNA Preparation
To ensure data quality, we asked Coriell to generate new DNA extractions directly from the original cell lines, which are preserved as either lymphoblastoid cell lines (LCLs) or fibroblasts. From each line, Coriell produced:
- High molecular weight DNA
- Standard DNA
Both were derived from the same cell passage, providing consistency and reducing the possibility of variation due to cell culture. This approach gave us a solid foundation for downstream sequencing and analysis.
Sequencing Strategy
We generated two complementary sequencing datasets:
- Illumina short-read whole-genome sequencing on all 16 individuals
- PacBio long-read whole-genome sequencing on a subset of 8 individuals
Short-read sequencing allowed us to capture high-quality data across the entire cohort, while long-read sequencing provided a deeper view into the structural complexities of chromosome 9 in selected cases. Together, the two approaches yield a dataset that can support a broad range of analyses, from fine-scale variant discovery to genome architecture reconstruction.
Why We Are Sharing This Data
Complex chromosomal rearrangements are difficult to study but are highly relevant to understanding genome biology and human disease. Chromosome 9 is particularly notable for its structural variation, and this dataset represents a step toward characterizing those complexities with greater resolution. By making these data available through AnVIL, we aim to create a resource that can be used by researchers developing new methods, validating discoveries, or investigating the biological consequences of these structural variants.
The Dataset
The dataset includes 16 individuals, each represented by an LCL or a fibroblast line. As mentioned above, all 16 samples have Illumina short-read WGS data, and 8 of them also have PacBio long-read WGS data. Each individual represents a unique case of structural complexity, and together they form a resource that we hope will be useful to the broader genomics community.
Publication and dbGaP link
- Please also read our publication: https://link.springer.com/article/10.1186/s13073-025-01563-0
- dbGaP link: https://dbgap.ncbi.nlm.nih.gov/study/phs004000.v1.p1/