Genes and sites under adaptation at the phylogenetic scale also exhibit adaptation at the population-genetic scale

Thibault Latrille; Nicolas Rodrigue; Nicolas Lartillot

doi:10.5281/zenodo.7543458

Published January 17, 2023 | Version v3

Journal article Open

1. Université de Lausanne
2. Department of Biology, Carleton University, Ottawa, Canada
3. Université de Lyon, CNRS, LBBE UMR 5558, Villeurbanne, France

Published in:
Proceedings of the National Academy of Sciences, Volume 120, Issue 11, March 2023, Pages e2214977120,
https://doi.org/10.1073/pnas.2214977120

This Zenodo repository contains the mammalian dataset that can be used with the AdaptaPop pipeline. Scripts and instructions necessary to reproduce the empirical experiments are detailed in https://github.com/ThibaultLatrille/AdaptaPop.

I. The archive file OrthoMam.zip must be extracted inside the folder OrthoMam. It contains the input data at the mammalian scale (alignments, trees, annotations) and the output data (estimates of ω and ω₀).

II. The archive file Polymorphism.zip must be extracted inside the folder Polymorphism. It contains the output data (vcf.gz and tsv.gz) for each population. Each vcf file contains the SNPs for which it was possible to infer the ancestral and derived codons.

Once both OrthoMam.zip and Polymorphism.zip are extracted, the Snakemake workflow inside the folder Contrasts can be run to contrast the rate of adaptation at the phylogenetic and population-genetic scales.
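The extraction step described above could be scripted as follows. This is a minimal sketch using Python's standard zipfile module. The helper name extract_archive is hypothetical, and the sketch assumes that the archives have been downloaded to the current directory and that their contents should land directly inside the target folder (whether each archive already contains a top-level folder is not stated here).

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract zip_path into dest_dir, creating dest_dir if needed.

    Hypothetical helper (not part of the AdaptaPop pipeline).
    Returns the list of member names found in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

For example, extract_archive("OrthoMam.zip", "OrthoMam") followed by extract_archive("Polymorphism.zip", "Polymorphism") would prepare the two folders named above.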

III. The file GeneTable.tsv is a tab-separated table containing ω_A^phy for each gene. It contains the following columns:

ENSG is the gene ID on Ensembl shared by all species (in the file name of OrthoMam alignment).
ω_lower is the lower bound of the 95% posterior credible interval for ω.
ω is the posterior mean for ω.
ω_upper is the upper bound of the 95% posterior credible interval for ω.
ω0_lower is the lower bound of the 95% posterior credible interval for ω₀.
ω0 is the posterior mean for ω₀.
ω0_upper is the upper bound of the 95% posterior credible interval for ω₀.
ωA_phy is the posterior mean for ω_A^phy.
category is the classification of the gene (unclassified, nearly-neutral, adaptive).
TRID is the transcript ID of the gene, specific to the focal species (found in the .xml files of OrthoMam).
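
The columns listed above could be used, for instance, to group genes by their classification. The following is a minimal sketch with Python's standard csv module. It assumes that GeneTable.tsv starts with a header row whose names match the column names given above exactly (ENSG, category); the function name genes_by_category is hypothetical.

```python
import csv
from collections import defaultdict


def genes_by_category(tsv_path):
    """Map each value of the `category` column to the list of ENSG ids.

    Assumes a tab-separated file with a header row containing at least
    the columns `ENSG` and `category` (an assumption about the file layout).
    """
    groups = defaultdict(list)
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            groups[row["category"]].append(row["ENSG"])
    return dict(groups)
```

The resulting lists of gene IDs (for example those under the key "adaptive") can then serve to select the matching rows in the per-population tables.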

IV. The archive file MK_statistics.zip contains one tsv file per population, from which ω_A (McDonald & Kreitman) can be computed at the population level for each gene. Each tsv file contains the following columns:

ENSG is the gene ID on Ensembl shared by all species (in the file name of OrthoMam alignment).
NAME is the gene name shared by all species (in the file name of OrthoMam alignment).
TRID is the transcript ID of the gene, specific to the focal species (found in the .xml files of OrthoMam).
CHR is the chromosome on which the gene is located.
STRAND is the strand on which the gene is located (+ if the same as the reference genome, - otherwise).
L_non_syn is the number of non-synonymous sites on which the substitutions and polymorphisms are called.
D_non_syn is the number of non-synonymous substitutions (can be 0).
P_non_syn is the number of non-synonymous polymorphisms (can be 0).
L_syn is the number of synonymous sites on which the substitutions and polymorphisms are called.
D_syn is the number of synonymous substitutions (can be 0).
P_syn is the number of synonymous polymorphisms (can be 0).

From these columns, the following statistics can be computed for a group of genes, after summing each of D, L and P over the genes in the group:

d_N is computed as D_non_syn / L_non_syn.
d_S is computed as D_syn / L_syn.
π_N is computed as P_non_syn / L_non_syn.
π_S is computed as P_syn / L_syn.
ω_A is computed as d_N/d_S - π_N/π_S.
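
The formulas above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in the AdaptaPop pipeline: rows are assumed to be dictionaries keyed by the column names listed above (as produced by csv.DictReader on one of the tsv files), the function names sum_counts and omega_a are hypothetical, and returning None when a denominator is zero is a choice made for this sketch.

```python
COUNT_COLUMNS = ("L_non_syn", "D_non_syn", "P_non_syn", "L_syn", "D_syn", "P_syn")


def sum_counts(rows, gene_ids=None):
    """Sum the six count columns over rows.

    If gene_ids is given, only rows whose ENSG is in that set are summed.
    """
    totals = dict.fromkeys(COUNT_COLUMNS, 0.0)
    for row in rows:
        if gene_ids is not None and row["ENSG"] not in gene_ids:
            continue
        for column in COUNT_COLUMNS:
            totals[column] += float(row[column])
    return totals


def omega_a(totals):
    """Return d_N/d_S - pi_N/pi_S, or None if a denominator is zero.

    Returning None is an assumption of this sketch, made to avoid
    a division by zero when D_syn, P_syn or a site count is 0.
    """
    denominators = ("L_non_syn", "L_syn", "D_syn", "P_syn")
    if any(totals[name] <= 0 for name in denominators):
        return None
    d_n = totals["D_non_syn"] / totals["L_non_syn"]
    d_s = totals["D_syn"] / totals["L_syn"]
    pi_n = totals["P_non_syn"] / totals["L_non_syn"]
    pi_s = totals["P_syn"] / totals["L_syn"]
    return d_n / d_s - pi_n / pi_s
```

Summing the counts before taking ratios, as described above, means that a single gene with D_syn or P_syn equal to 0 does not make the group-level estimate undefined.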

Files (2.6 GB)

Name	MD5	Size
GeneTable.tsv	d1ae84b4ca818dd6f1979252269d5be0	3.4 MB
MK_statistics.zip	b9bd52777034a390944b718fb52d8a87	7.0 MB
OrthoMam.zip	65d69a868b24753976261936e01eac92	1.4 GB
Polymorphism.zip	e36df23aa69928592ae034de47f07d04	1.2 GB
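
The MD5 checksums listed in the file table above can be used to check that a download is complete. The following is a minimal sketch using Python's standard hashlib module; the function names md5_of_file and verify are hypothetical, and the files are assumed to have been saved under the names shown in the table.

```python
import hashlib

# Checksums copied from the file table of this record.
EXPECTED_MD5 = {
    "GeneTable.tsv": "d1ae84b4ca818dd6f1979252269d5be0",
    "MK_statistics.zip": "b9bd52777034a390944b718fb52d8a87",
    "OrthoMam.zip": "65d69a868b24753976261936e01eac92",
    "Polymorphism.zip": "e36df23aa69928592ae034de47f07d04",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks
    so that the multi-gigabyte archives are never loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file at path has the expected MD5 digest."""
    return md5_of_file(path) == expected_md5.lower()
```

For example, verify("OrthoMam.zip", EXPECTED_MD5["OrthoMam.zip"]) should return True for an intact download.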

Additional details

Agence Nationale de la Recherche
NeGA - Influence of effective population size on animal genome architecture ANR-20-CE02-0008
Agence Nationale de la Recherche
DaSiRe - Exploring the Dark Side of Recombination ANR-15-CE12-0010
Agence Nationale de la Recherche
HotRec - Origin of PRDM9-dependent meiotic hotspots: where, how and why recombine? ANR-19-CE12-0019

	All versions	This version
Views	393	235
Downloads	319	186
Data volume	286.2 GB	127.8 GB
