Journal article Open Access

The Genomes of the Collaborative Cross

Srivastava, Anuj; Morgan, Andrew P.; Najarian, Maya L.; Sarsani, Vishal Kumar; Sigmon, John Sebastian; Shorter, John; Kashfeen, Anwica; McMullan, Rachel C.; Williams, Lucy H.; Guisti, Paola; Ferris, Martin T.; Sullivan, Patrick; Hock, Pablo; Miller, Darla; Bell, Timothy A.; McMillan, Leonard; Churchill, Gary A.; Pardo-Manuel de Villena, Fernando

The Collaborative Cross (CC) is a multiparent recombinant inbred strain mouse panel derived from eight founder inbred strains. A distinct advantage of recombinant inbred panels is that detailed characterization of their genomes does not need to be performed by each user. Until now, the CC genomes were haplotype reconstructions based on dense genotyping of the most recent common ancestors (MRCAs) of each strain followed by imputation from the genome sequence of the corresponding founder inbred strain. The MRCA resource had the advantage that it captured segregating regions in strains that were not fully inbred, but it had limited resolution in the transition regions between founder haplotypes and resulted in uncertainty about founder assignment in regions of limited diversity. Here we report the whole genome sequence of 69 CC strains generated by paired-end short reads at 30X coverage of a single male per strain. Sequencing results in a substantial improvement in the fine structure and completeness of the genomes of the CC. Both MRCAs and sequenced samples show a significant reduction in the genome-wide haplotype frequencies of two of the wild-derived strains, CAST/EiJ and PWK/PhJ. In addition, analysis of the evolution of the patterns of heterozygosity indicates that selection against three wild-derived founder strains played a significant role in shaping the genomes of the CC. The sequencing resource provides the first description of tens of thousands of new genetic variants introduced by genetic drift in the CC genomes. The CC strains represent an extreme example of the principle that genetic drift is expected to have maximum impact in populations with small effective size and a high level of inbreeding. We estimate that new SNP mutations are accumulating in each CC strain at a rate of 2.4 per Gb per generation. The majority of these mutations are novel compared to currently sequenced laboratory stocks and wild mice, and some are predicted to alter gene function.
Overall, genetic drift has increased the number of variants segregating among CC strains by more than 2%. Approximately one third of the CC inbred strains have acquired large deletions (>10 kb), many of which overlap known coding genes and functional elements. In conclusion, we provide a critical resource to users of the CC, increase threefold the number of mouse inbred strain genomes available publicly, and provide a striking example of the effect of genetic drift on common resources.

Files (103.0 MB)

genotypes.zip (6.0 MB) md5:5d3cbfdf441cf3ccbeba35b0eb6ee9a9
GigaHaps.zip (112.4 kB) md5:053167e8dc0b27dcf727109302e9b724
MRCAsHaps.zip (128.1 kB) md5:f06e0e932e540f5a6c8ff7448a77ce96
Prob36.zip (79.8 MB) md5:3b8eb7cad921135eb8e87cd4bffd6f19
SeqHaps.zip (116.3 kB) md5:f5b70e38a43444ae837b458956a81847
SupplementalData.zip (16.9 MB) md5:450e3475bab6f0650edab30e935bec81
SupplementalDataList.docx (78.2 kB) md5:b1d610405f502ae1745ee17b7aa8a46e
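
The MD5 checksums in the file listing can be used to confirm that a download is intact. The following is a minimal sketch, assuming the files have been saved under their listed names in a single local directory; the helper names `md5_of` and `verify` are illustrative and not part of the record.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "genotypes.zip": "5d3cbfdf441cf3ccbeba35b0eb6ee9a9",
    "GigaHaps.zip": "053167e8dc0b27dcf727109302e9b724",
    "MRCAsHaps.zip": "f06e0e932e540f5a6c8ff7448a77ce96",
    "Prob36.zip": "3b8eb7cad921135eb8e87cd4bffd6f19",
    "SeqHaps.zip": "f5b70e38a43444ae837b458956a81847",
    "SupplementalData.zip": "450e3475bab6f0650edab30e935bec81",
    "SupplementalDataList.docx": "b1d610405f502ae1745ee17b7aa8a46e",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(download_dir):
    """Map each listed file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(download_dir) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results
```

Calling `verify(".")` from the download directory returns a dictionary keyed by file name; a `False` entry means the file is missing or its checksum does not match the listed value.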