There is a newer version of the record available.

Published December 12, 2024 | Version v1
Dataset | Open access

Identifying organismal models of human biology de novo

Creators

  • Arcadia Science

Contributors

  • Arcadia Science

Description

This record contains the data associated with the publication "Identifying organismal models of human biology de novo". It provides all data needed to replicate the analyses in the publication, including the input data and run configurations needed to perform phylogenetic inference with NovelTree, to calculate molecular conservation for all human genes, and to reproduce the exploratory analyses.

Directories and files included:

  • run_configurations/euk_preprocess_samplesheet.tsv - the samplesheet for our Snakemake preprocessing workflow, which filters and preprocesses species proteomes prior to analysis with NovelTree.
  • run_configurations/noveltree-model-euks-samplesheet.csv & run_configurations/noveltree-model-euks-parameterfile.json - the samplesheet and parameter file used to run NovelTree.
  • preprocessed_proteomes.tar.gz - a compressed tarball containing the preprocessed proteomes used by our NovelTree run.
  • results-noveltree-model-euks.tar.gz - a compressed tarball containing all outputs generated by our NovelTree run.
  • aa-summary-stats.tar.gz - a compressed tarball containing all amino acid (AA) summary statistics generated by code/genefam_aa_summaries.py.
  • gf-aa-multivar-distances.tar.gz - a compressed tarball containing all result files produced by code/calc_protein_mv_distances.R.
  • organismal_selection_tool_citations.csv - source citations describing the genetic perturbations available for organisms in our portfolio.
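A minimal Python sketch for unpacking the four tarballs listed above follows. It assumes the archives have been downloaded into a single directory under exactly the names given in this record. The function name `extract_archives` and the one-subdirectory-per-archive layout are illustrative choices, not part of this record.

```python
import tarfile
from pathlib import Path

# Tarball names as listed in this record; assumes they were downloaded
# into `src_dir` under exactly these names.
ARCHIVES = [
    "preprocessed_proteomes.tar.gz",
    "results-noveltree-model-euks.tar.gz",
    "aa-summary-stats.tar.gz",
    "gf-aa-multivar-distances.tar.gz",
]


def extract_archives(src_dir, dest_dir, names=ARCHIVES):
    """Extract each gzip-compressed tarball into its own subdirectory of dest_dir.

    Returns a dict mapping archive name -> list of member paths it contained.
    Hypothetical helper for illustration only.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    contents = {}
    for name in names:
        # Strip the ".tar.gz" suffix to name the output subdirectory.
        target = dest / name.removesuffix(".tar.gz")
        target.mkdir(parents=True, exist_ok=True)
        with tarfile.open(src / name, mode="r:gz") as tar:
            contents[name] = tar.getnames()
            # Use the safer "data" extraction filter where this Python supports it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(target, filter="data")
            else:
                tar.extractall(target)
    return contents
```

For example, `extract_archives("downloads", "unpacked")` (both directory names hypothetical) would place the NovelTree outputs under `unpacked/results-noveltree-model-euks/`. Keeping each archive in its own subdirectory avoids collisions if two tarballs happen to contain files with the same relative path.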

Files (11.4 GB)

md5:58e53f0ff7ffd3b2a162bc8155c5d976 (11.4 GB)
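Before unpacking, the download can be checked against the MD5 checksum shown in the file listing. The Python sketch below assumes that this checksum applies to the file you downloaded; the listing does not preserve the file name, so the path you pass in is your own. The helper names `md5sum` and `verify_download` are hypothetical. Hashing in 1 MiB chunks keeps memory use flat even for an 11.4 GB file.

```python
import hashlib
from pathlib import Path

# Checksum shown in this record's file listing (assumed to apply to the
# downloaded file).
EXPECTED_MD5 = "58e53f0ff7ffd3b2a162bc8155c5d976"


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5sum(path) == expected.lower()
```

A mismatch usually means the download was truncated or corrupted and should be repeated before extracting anything.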