
# Evolvability and divergence in contemporary and fossil species

This data repository contains the data underlying the article: 
  
	Holstad, A., Voje, K. L., Opedal, Ø. H., Bolstad, G. H., Bourg, S., Hansen, T. F. and Pélabon, C. (2023) 
	Evolvability predicts evolutionary divergence in extant and extinct species. [INSERT JOURNAL]

#### Contact information for the corresponding author:
* Name: Agnes Holstad
* Affiliation: Department of Biology, Centre for Biodiversity Dynamics, Norwegian University of Science and Technology; Trondheim, Norway
* ORCID ID: https://orcid.org/0000-0003-3154-1857
* Email: agnes.holstad@ntnu.no
* Alternate Email: agnes.holstad@gmail.com

#### Co-author ORCID IDs:
* Kjetil L. Voje: 
* Øystein H. Opedal: https://orcid.org/0000-0002-7841-6933
* Geir H. Bolstad: https://orcid.org/0000-0003-1356-8239
* Salomé Bourg: 
* Thomas F. Hansen:
* Christophe Pélabon: https://orcid.org/0000-0002-8630-8983


## Details on this README file
* File format: .md
* Author: Agnes Holstad
* Date created: 10.08.2023

## Description of the data and file structure

The results in the article stem from two separate meta-datasets, both compiled from studies in the primary scientific literature. One meta-dataset contains contemporary populations and species, and the other comprises fossil time series.

### The contemporary data

These data comprise traits measured on a ratio scale. Each trait was required to have means for at least two populations (or species) and at least one estimate of genetic variance.

#### Details for: contemporary_data.txt
* Contributors: Øystein Opedal and Agnes Holstad
* Format: .txt, tab delimited 
* Size: 1 MB
* Dimensions: 2696 rows x 43 columns
* Missing data codes: NA
* Variables:
	* studyID: Unique identifier for all traits from the same study
	* trait: Trait name as given in the original study
	* trait_UUID: Universally unique identifier for traits measured with the same method by the same group; divergence is estimated among all populations/species that share the same trait_UUID
	* trait.type: The type of trait, e.g., morphological, physiological, life history
	* measure: The measurement as described in the original study
	* unit: Units the trait is measured in
	* dimension: Trait dimension or type of scale. E.g. linear, area, mass/volume, count, growth rate, ratio
	* transformation: Transformation applied to the trait values prior to estimation of Va (genetic variance), if any, e.g. log_base, sqrt, Z, mean_centering, mean_std
	* n.fam: Number of families in the genetic analysis
	* n.genetic: Number of individuals in the genetic analysis
	* n.pheno: Sample size for the phenotypic data
	* h2: Heritability
	* se.h2: Standard error of h2
	* trait.mean: Phenotypic trait mean
	* se: Standard error of trait mean
	* vp: Phenotypic variance
	* se.vp: Standard error of phenotypic variance
	* va: Genetic variance
	* se.va: Standard error of genetic variance 
	* estim_method: Estimation method for the genetic variance: REML/ML/LS/postmean/postmode
	* ve: Environmental variance
	* se.ve: Standard error of environmental variance
	* cva: Genetic coefficient of variation
	* se.cva: Standard error of cva
	* evol: Evolvability, i.e. the mean-standardised (proportional) genetic variance
	* se.evol: Standard error of evolvability
	* x100: Whether cva and evol are multiplied by 100 (Y/N)
	* only_sp: Whether the data are for species only (Y), populations only (N), or both (B)
	* kingdom
	* phylum
	* taxon
	* order
	* family
	* genus
	* species: Written as Genus_species
	* population: Name of the population
	* sex: Female/Male/both
	* reference: FirstAuthor_year
	* journal 
	* vol
	* year: In format YYYY
	* DOI
	* notes  
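The `evol` and `cva` columns can be checked against `va` and `trait.mean` for untransformed traits. The sketch below (Python, standard library only) assumes the usual mean-scaled definitions, evol = va / trait.mean² and cva = √va / trait.mean, each multiplied by 100 when `x100` is `Y`. It also assumes UTF-8 encoding and that untransformed traits have `NA` (or an empty cell) in `transformation`. For transformed traits `va` is not on the original scale, so the sketch skips them. The helper names (`parse_table`, `recompute`, etc.) are hypothetical and not part of the dataset.

```python
import csv
import math

def parse_table(lines):
    """Parse tab-delimited lines (header first), converting the 'NA' code to None."""
    reader = csv.DictReader(lines, delimiter="\t")
    return [{k: (None if v == "NA" else v) for k, v in row.items()} for row in reader]

def read_table(path):
    """Read one of the .txt files in this repository (UTF-8 encoding assumed)."""
    with open(path, newline="", encoding="utf-8") as fh:
        return parse_table(fh)

def evolvability(va, mean, x100=False):
    """Mean-standardised genetic variance: va / mean^2 (optionally x100)."""
    e = va / mean ** 2
    return 100 * e if x100 else e

def cv_a(va, mean, x100=False):
    """Genetic coefficient of variation: sqrt(va) / mean (optionally x100)."""
    cv = math.sqrt(va) / mean
    return 100 * cv if x100 else cv

def recompute(row):
    """Return (evol, cva) for an untransformed trait, or None if not computable."""
    if row.get("transformation") not in (None, ""):
        return None  # va is on a transformed scale; the formulas above do not apply
    if row.get("va") is None or row.get("trait.mean") is None:
        return None
    va, mean = float(row["va"]), float(row["trait.mean"])
    scale = row.get("x100") == "Y"
    cva = cv_a(va, mean, scale) if va >= 0 else None  # negative va has no square root
    return evolvability(va, mean, scale), cva

# Usage:
# rows = read_table("contemporary_data.txt")
# checks = [recompute(r) for r in rows]
```

Comparing `recompute(row)` with the stored `evol` and `cva` values is one way to flag rows whose scaling (`x100`) or transformation deserves a closer look.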


### The fossil data

The fossil data was retrieved from the database curated by Kjetil L. Voje:

	K. L. Voje, Phenotypic Evolution Time Series (PETS) Database, version 1.0 (2023). pets.uio.no [not published yet]

The fossil data comprise time series that each follow one lineage through time; the samples can be considered populations sampled from the same lineage at different points in time. We required at least one trait to be measured at a minimum of two time steps, and the trait had to be on a ratio scale.

The fossil data consists of 3 files: 
* fossil_data_consecutive.txt: The data underlying the analyses using evolvability to predict the morphological distance to the consecutive sample throughout the time series. 
* fossil_data_sum.txt: The data underlying the analyses that use the average evolvability of each time series to predict the total variance of sample means in that time series. 
* fossil_meta_data.txt: The metadata for each study and time series, linked to the other files by study ID (stID) and time series ID (tsID).
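Because the fossil files share the `stID` and `tsID` keys, metadata can be attached to the sample-level rows with a keyed lookup. A minimal Python sketch using only the standard library follows. It assumes UTF-8 encoding and that each (`stID`, `tsID`) pair occurs at most once in `fossil_meta_data.txt` (it raises an error otherwise); the function names and the `meta.` column prefix are hypothetical choices for this example.

```python
import csv

def read_table(path):
    """Read a tab-delimited file, converting the missing-data code 'NA' to None."""
    with open(path, newline="", encoding="utf-8") as fh:
        return [{k: (None if v == "NA" else v) for k, v in row.items()}
                for row in csv.DictReader(fh, delimiter="\t")]

def attach_metadata(samples, metadata, fields=("species", "microfossil", "interval_MY")):
    """Copy selected metadata columns onto each sample row, matched on (stID, tsID).

    Copied columns get a 'meta.' prefix so they do not overwrite columns such as
    'species' that exist in both files. Unmatched rows receive None.
    """
    lookup = {}
    for meta in metadata:
        key = (meta["stID"], meta["tsID"])
        if key in lookup:
            raise ValueError(f"duplicate metadata key {key}")
        lookup[key] = meta
    joined = []
    for row in samples:
        meta = lookup.get((row["stID"], row["tsID"]), {})
        merged = dict(row)
        for field in fields:
            merged["meta." + field] = meta.get(field)
        joined.append(merged)
    return joined

# Usage:
# consecutive = read_table("fossil_data_consecutive.txt")
# meta = read_table("fossil_meta_data.txt")
# rows = attach_metadata(consecutive, meta)
```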


#### Details for: fossil_data_consecutive.txt
* Contributors: Kjetil L. Voje and Agnes Holstad
* Format: .txt, tab delimited 
* Size: 3.5 MB
* Dimensions: 10594 rows x 21 columns
* Missing data codes: NA
* Variables:
	* stID: The study ID, that is linked to the "stID" column in the "fossil_meta_data.txt" file.
	* tsID: The time series ID, that is linked to the "tsID" column in the "fossil_meta_data.txt" file.
	* trait.mean: The natural log of the trait mean of the sample.
	* evol.raw: The raw sample variance on a proportional scale, i.e. var(ln(x)) or var(x)/mean(x)^2.
	* diff: The distance to the trait mean of the consecutive sample.
	* abs.diff: The absolute distance to the trait mean of the consecutive sample.
	* times: The time elapsed, in millions of years, to the consecutive sample.
	* sample.size: The number of individuals in the sample.
	* max.duration: The maximum possible duration the samples could span, estimated as the total elapsed time of the time series divided by the number of samples. 
	* distance.to.optimum: Distance of the trait mean to the estimated stationary optimum of the time series. 
	* taxa
	* species: Written as Genus_species
	* trait.type: Type of trait dimension or type of scale, e.g., linear, area, count, ratio, percent.
	* ou.var: Estimated microevolution of the trait mean during the accumulation of a fossil sample, assuming the fossil sample spans 50% of the maximum duration (max.duration). 
	* ou.var0.1: Estimated microevolution of the trait mean during the accumulation of a fossil sample, assuming the fossil sample spans 10% of the maximum duration.
	* ou.var0.01: Estimated microevolution of the trait mean during the accumulation of a fossil sample, assuming the fossil sample spans 1% of the maximum duration.
	* evol: 36% of the proportional sample variance (evol.raw) as an approximation for evolvability. 
	* ou.evol: The evolvability (evol) corrected for microevolution, assuming that the accumulation of the fossil sample spans 50% of the maximum duration.
	* ou.evol0.1: The evolvability (evol) corrected for microevolution, assuming that the accumulation of the fossil sample spans 10% of the maximum duration.
	* ou.evol0.01: The evolvability (evol) corrected for microevolution, assuming that the accumulation of the fossil sample spans 1% of the maximum duration.
	* darwins: The rate of evolution in darwins, estimated as the absolute distance divided by the time elapsed to the consecutive sample (abs.diff/times).
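The relationships between several per-sample columns can be sketched as below. The sketch assumes that the samples of one time series are ordered from oldest to youngest with increasing, distinct times in millions of years, that `diff` is the later log mean minus the earlier one, and that sample variances use an n - 1 denominator. These conventions and all function names are assumptions for illustration, not the authors' code. The microevolution-corrected columns (`ou.*`) are not reproduced because their formula is not given here.

```python
import math
import statistics

EVOL_FACTOR = 0.36  # evol = 36% of the proportional sample variance (evol.raw)

def proportional_variance(values, log_scale=True):
    """Sample variance on a proportional scale: var(ln x) or var(x) / mean(x)^2."""
    if log_scale:
        return statistics.variance([math.log(v) for v in values])
    return statistics.variance(values) / statistics.fmean(values) ** 2

def consecutive_records(ln_means, times_my, evol_raw):
    """Pair each sample with the next one in a single time series.

    ln_means: natural-log trait means, ordered from oldest to youngest sample.
    times_my: sample times in millions of years, increasing along the series.
    evol_raw: proportional sample variances, one per sample.
    The last sample has no consecutive sample, so n samples give n - 1 records.
    """
    n = len(ln_means)
    # Total elapsed time divided by the number of samples
    max_duration = (times_my[-1] - times_my[0]) / n
    records = []
    for i in range(n - 1):
        diff = ln_means[i + 1] - ln_means[i]
        dt = times_my[i + 1] - times_my[i]
        records.append({
            "diff": diff,
            "abs.diff": abs(diff),
            "times": dt,
            "darwins": abs(diff) / dt,  # change in ln units per million years
            "evol": EVOL_FACTOR * evol_raw[i],
            "max.duration": max_duration,
        })
    return records
```

Since `trait.mean` is already log-transformed and `times` is in millions of years, `abs.diff/times` is directly in darwins without further scaling.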


#### Details for: fossil_meta_data.txt
* Contributors: Kjetil L. Voje and Agnes Holstad
* Format: .txt, tab delimited 
* Size: 366 KB
* Dimensions: 589 rows x 28 columns
* Missing data codes: NA
* Variables:
	* stID: The study ID
	* tsID: The time series ID
	* popID: The population ID
	* description: Description of the trait measure as given in the original study
	* citation
	* URL: DOI of the study
	* total_N: Total sample size of all samples in the time series
	* steps: Number of steps in the time series
	* interval_MY: Time interval of the entire time series in millions of years
	* trait_type: Type of trait dimension or type of scale, e.g., linear, area, count, ratio, percent.
	* taxa
	* species: Written as Genus_species
	* microfossil: Whether the taxon is a microfossil (yes/no)
	* sampling: Sampling method used to collect the samples, e.g. geological fieldwork, sediment core
	* age_model: Model used to date the samples
	* sediment: Type of sediment
	* environment: Type of environment
	* period_start
	* period_end
	* epoch_start
	* epoch_end
	* age_start
	* age_end
	* source
	* publication_year
	* lat
	* lon



#### Details for: fossil_data_sum.txt
* Contributors: Kjetil L. Voje and Agnes Holstad
* Format: .txt, tab delimited 
* Size: 366 KB
* Dimensions: 589 rows x 8 columns
* Missing data codes: NA
* Variables:
	* stID: The study ID
	* tsID: Time series ID
	* div: Divergence among all fossil samples in the time series, estimated as the variance of the natural log trait means.
	* raw.mean.evol: The average raw sample variance within a time series, weighted by sample size and estimated on a proportional scale.
	* evol: 36% of the proportional sample variance (raw.mean.evol) as an approximation for evolvability. 
	* stationary.var: The stationary variance estimated from the Ornstein-Uhlenbeck process fitted to the time series with a stationary optimum. 
	* alpha: Rate of adaptation towards the optimum estimated from the Ornstein-Uhlenbeck process fitted to the time series with a stationary optimum. 
	* n.steps: Number of samples in the time series. 
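The summary columns can be sketched in the same way. The example below assumes that `div` uses the n - 1 sample variance and that the weights in `raw.mean.evol` are the sample sizes themselves. It also includes the standard stationary variance, sigma^2 / (2 alpha), of an Ornstein-Uhlenbeck process dX = alpha (theta - X) dt + sigma dW, which relates `stationary.var` and `alpha` to the diffusion rate. The OU model fitting itself is not shown, and the function names are hypothetical.

```python
import statistics

EVOL_FACTOR = 0.36  # evol = 36% of the weighted proportional sample variance

def summarise_series(ln_means, evol_raw, sample_sizes):
    """Summary row for one fossil time series (at least two samples required)."""
    div = statistics.variance(ln_means)  # variance of ln trait means, n - 1 denominator assumed
    total_n = sum(sample_sizes)
    # Sample-size-weighted mean of the per-sample proportional variances
    raw_mean_evol = sum(v * n for v, n in zip(evol_raw, sample_sizes)) / total_n
    return {
        "div": div,
        "raw.mean.evol": raw_mean_evol,
        "evol": EVOL_FACTOR * raw_mean_evol,
        "n.steps": len(ln_means),
    }

def ou_stationary_variance(sigma2, alpha):
    """Stationary variance sigma^2 / (2 * alpha) of an OU process with rate alpha."""
    return sigma2 / (2 * alpha)
```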


