Journal article Open Access

The Genomes of the Collaborative Cross

Srivastava, Anuj; Morgan, Andrew P.; Najarian, Maya L.; Sarsani, Vishal Kumar; Sigmon, John Sebastian; Shorter, John; Kashfeen, Anwica; McMullan, Rachel C.; Williams, Lucy H.; Guisti, Paola; Ferris, Martin T.; Sullivan, Patrick; Hock, Pablo; Miller, Darla; Bell, Timothy A.; McMillan, Leonard; Churchill, Gary A.; Pardo-Manuel de Villena, Fernando

The Collaborative Cross (CC) is a multiparent recombinant inbred strain mouse panel derived from eight founder inbred strains. A distinct advantage of recombinant inbred panels is that detailed characterization of their genomes does not need to be performed by each user. Until now, the CC genomes were haplotype reconstructions based on dense genotyping of the most recent common ancestors (MRCAs) of each strain followed by imputation from the genome sequence of the corresponding founder inbred strain. The MRCA resource had the advantage that it captured segregating regions in strains that were not fully inbred, but it had limited resolution in the transition regions between founder haplotypes and resulted in uncertainty about founder assignment in regions of limited diversity. Here we report the whole genome sequence of 69 CC strains generated by paired-end short reads at 30X coverage of a single male per strain. Sequencing results in a substantial improvement in the fine structure and completeness of the genomes of the CC. Both MRCAs and sequenced samples have a significant reduction in the genome-wide haplotype frequencies of two of the wild-derived strains, CAST/EiJ and PWK/PhJ. In addition, analysis of the evolution of the patterns of heterozygosity indicates that selection against three wild-derived founder strains played a significant role in shaping the genomes of the CC. The sequencing resource provides the first description of tens of thousands of new genetic variants introduced by genetic drift into the CC genomes. The CC strains represent an extreme example of the principle that genetic drift is expected to have maximum impact in populations with a small effective size and a high level of inbreeding. We estimate that new SNP mutations are accumulating in each CC strain at a rate of 2.4 per Gb per generation. The majority of these mutations are novel compared to currently sequenced laboratory stocks and wild mice, and some are predicted to alter gene function. Overall, genetic drift has increased the number of variants segregating among CC strains by more than 2%. Approximately one third of the CC inbred strains have acquired large deletions (>10 kb), many of which overlap known coding genes and functional elements. In conclusion, we provide a critical resource to users of the CC, increase threefold the number of mouse inbred strain genomes available publicly, and provide a striking example of the effect of genetic drift on common resources.
