The Genomes of the Collaborative Cross
Creators
- Srivastava, Anuj (1)
- Morgan, Andrew P. (2)
- Najarian, Maya L. (2)
- Sarsani, Vishal Kumar (1)
- Sigmon, John Sebastian (2)
- Shorter, John (2)
- Kashfeen, Anwica (2)
- McMullan, Rachel C. (2)
- Williams, Lucy H. (2)
- Guisti, Paola (2)
- Ferris, Martin T. (2)
- Sullivan, Patrick (2)
- Hock, Pablo (2)
- Miller, Darla (2)
- Bell, Timothy A. (2)
- McMillan, Leonard (2)
- Churchill, Gary A. (1)
- Pardo-Manuel de Villena, Fernando (2)
- 1. The Jackson Laboratory
- 2. University of North Carolina
Description
The Collaborative Cross (CC) is a multiparent recombinant inbred strain mouse panel derived from eight founder inbred strains. A distinct advantage of recombinant inbred panels is that detailed characterization of their genomes does not need to be performed by each user. Until now, the CC genomes were haplotype reconstructions based on dense genotyping of the most recent common ancestors (MRCAs) of each strain followed by imputation from the genome sequence of the corresponding founder inbred strain. The MRCA resource had the advantage that it captured segregating regions in strains that were not fully inbred, but it had limited resolution in the transition regions between founder haplotypes and resulted in uncertainty about founder assignment in regions of limited diversity. Here we report the whole-genome sequence of 69 CC strains generated by paired-end short reads at 30X coverage of a single male per strain. Sequencing results in a substantial improvement in the fine structure and completeness of the genomes of the CC. Both MRCAs and sequenced samples show a significant reduction in the genome-wide haplotype frequencies of two of the wild-derived strains, CAST/EiJ and PWK/PhJ. In addition, analysis of the evolution of the patterns of heterozygosity indicates that selection against three wild-derived founder strains played a significant role in shaping the genomes of the CC. The sequencing resource provides the first description of tens of thousands of new genetic variants introduced by genetic drift into the CC genomes. The CC strains represent an extreme example of the principle that genetic drift is expected to have maximum impact in populations with small effective size and a high level of inbreeding. We estimate that new SNP mutations are accumulating in each CC strain at a rate of 2.4 per Gb per generation. The majority of these mutations are novel compared to currently sequenced laboratory stocks and wild mice, and some are predicted to alter gene function.
Overall, genetic drift has increased the number of variants segregating among CC strains by more than 2%. Approximately one third of the CC inbred strains have acquired large deletions (>10 kb), many of which overlap known coding genes and functional elements. In conclusion, we provide a critical resource to users of the CC, increase threefold the number of publicly available mouse inbred strain genomes, and provide a striking example of the effect of genetic drift on common resources.
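To give a sense of scale for the reported rate of 2.4 new SNPs per Gb per generation, the arithmetic can be sketched as below. This is only an illustration: the genome size of about 2.7 Gb is an approximate figure for the mouse reference genome assumed here, the function name is hypothetical, and the description does not say whether the rate refers to a haploid or diploid genome, so the result should be read as a rough order of magnitude rather than a figure from the study.

```python
def expected_new_snps(rate_per_gb_per_generation, genome_size_gb, generations=1):
    """Expected count of new SNPs, assuming a constant rate (illustrative only)."""
    return rate_per_gb_per_generation * genome_size_gb * generations


# Rate taken from the description above.
RATE_PER_GB_PER_GENERATION = 2.4
# Assumption: approximate size of the mouse reference genome, in Gb.
ASSUMED_GENOME_SIZE_GB = 2.7

per_generation = expected_new_snps(RATE_PER_GB_PER_GENERATION, ASSUMED_GENOME_SIZE_GB)
print(f"Roughly {per_generation:.1f} new SNPs per genome copy per generation")
```

Under these assumptions the rate corresponds to a handful of new SNPs per generation in each strain; multiplying by a strain's number of generations of inbreeding gives a rough expected total for that strain.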
Files
genotypes.zip
Total size: 103.0 MB
Checksum | Size |
---|---|
md5:5d3cbfdf441cf3ccbeba35b0eb6ee9a9 | 6.0 MB |
md5:053167e8dc0b27dcf727109302e9b724 | 112.4 kB |
md5:f06e0e932e540f5a6c8ff7448a77ce96 | 128.1 kB |
md5:3b8eb7cad921135eb8e87cd4bffd6f19 | 79.8 MB |
md5:f5b70e38a43444ae837b458956a81847 | 116.3 kB |
md5:450e3475bab6f0650edab30e935bec81 | 16.9 MB |
md5:b1d610405f502ae1745ee17b7aa8a46e | 78.2 kB |
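The md5: values listed above can be used to confirm that a download completed without corruption. A minimal sketch using Python's standard `hashlib` module follows; the helper names are hypothetical, and because this listing does not show which file name belongs to which checksum, the pairing of a local file with a listed value is left to the reader.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file's MD5 with a listed value such as 'md5:5d3c...'."""
    # Drop an optional 'md5:' prefix and normalise case before comparing.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Example call (the path and its pairing with a checksum are placeholders):
# matches_listed_checksum("path/to/downloaded_file", "md5:<value from the listing>")
```

A mismatch usually means the download was truncated or altered and the file should be fetched again.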