# Notes on the data provided for the phylogenetic analysis

Workflow used to generate a phylogenetic tree for 20 genomes: Thermococcus genomes plus 3 non-Thermococcus genomes used as an outgroup.

The file workflow_UndinMarkers_v2.md describes the workflow used to generate the phylogenetic trees.

Provided files/folders:

- **0_required_files**

Contains (i) file lists (i.e. the marker genes used and the proteins removed from the alignments), (ii) custom Perl and Python scripts used to generate the phylogenetic tree and the protein files for the genomes investigated, and (iii) . Scripts taken from online or other resources are marked as such within the script itself.

- **1_hmmer_results**

Results of the hmmsearch run (all genomes of interest searched against a database of TIGRFAM and Pfam HMM profiles). These results were used to select the ~45 markers used for the phylogenetic analysis. The HMM profiles are available at: https://zenodo.org/record/3839790#.YsPgSHBBxcA
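The exact marker-selection criteria are described in workflow_UndinMarkers_v2.md. Purely as an illustration of this kind of step, the sketch below parses hmmsearch tabular output (assumed to come from `--tblout`) and keeps the highest-scoring protein per genome and profile. It assumes a hypothetical protein ID convention `genome|protein`; the real IDs, thresholds and filters used for this dataset may differ.

```python
"""Hypothetical sketch: best hit per (genome, profile) from hmmsearch --tblout output."""

from typing import Dict, Iterable, Tuple


def best_hits(tblout_lines: Iterable[str], sep: str = "|") -> Dict[Tuple[str, str], Tuple[str, float]]:
    """Return {(genome, profile): (protein_id, bit_score)}, keeping the top-scoring hit.

    Assumes protein IDs look like 'genome|protein' (hypothetical convention).
    """
    best: Dict[Tuple[str, str], Tuple[str, float]] = {}
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and tblout comment/header lines
        fields = line.split()
        target = fields[0]  # column 1: target (protein) name
        query = fields[2]  # column 3: query (HMM profile) name
        score = float(fields[5])  # column 6: full-sequence bit score
        genome = target.split(sep, 1)[0]
        key = (genome, query)
        if key not in best or score > best[key][1]:
            best[key] = (target, score)
    return best
```

Usage would look like `best_hits(open("hmmsearch.tblout"))`, where the file name is a placeholder. Keeping only the best hit per genome is one common way to obtain a single copy per marker; it is not necessarily the rule applied here.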

- **2_raw_protein_sequences**

Raw (unaligned, untrimmed) protein sequences. Note that duplicated protein sequences have already been removed; the removed proteins are listed in 0_required_files/FileLists/proteins_to_remove
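The removal has already been applied to these files, so nothing needs to be rerun. To illustrate what the step involves, here is a minimal sketch that drops every protein named in a removal list from a FASTA file. It assumes that proteins_to_remove holds one protein ID per line and that this ID matches the first word of the FASTA header; the actual format of that file is not documented here.

```python
"""Hypothetical sketch: drop proteins listed in a removal file from FASTA records."""

from typing import Iterable, Iterator, List, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line.strip():
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def load_ids(lines: Iterable[str]) -> Set[str]:
    """Read one protein ID per non-empty line (assumed file format)."""
    return {line.strip() for line in lines if line.strip()}


def remove_listed(records: Iterable[Tuple[str, str]], to_remove: Set[str]) -> List[Tuple[str, str]]:
    """Keep records whose ID (first whitespace-delimited header token) is not listed."""
    return [(h, s) for h, s in records if (h.split() or [""])[0] not in to_remove]
```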

- **3_alignments**

Protein files aligned with MAFFT (mafft-linsi).

- **4_trimmed_alns**

Protein alignments trimmed with BMGE.
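The alignment and trimming steps (folders 3 and 4) could be driven by a small wrapper like the one below. This is a sketch under assumptions: raw marker files end in `.faa`, `mafft-linsi` and `java` are on the PATH, the BMGE jar location is supplied by the user, and BMGE runs with its defaults (amino-acid input `-t AA`, FASTA output `-of`). The parameters actually used are recorded in workflow_UndinMarkers_v2.md.

```python
"""Hypothetical sketch: align each marker with mafft-linsi, then trim it with BMGE."""

import subprocess
from pathlib import Path
from typing import List


def mafft_cmd(fasta: Path) -> List[str]:
    # mafft-linsi is MAFFT's L-INS-i wrapper; it writes the alignment to stdout
    return ["mafft-linsi", str(fasta)]


def bmge_cmd(aln: Path, out: Path, bmge_jar: Path) -> List[str]:
    # -i: input alignment, -t AA: amino-acid data, -of: trimmed alignment in FASTA format
    return ["java", "-jar", str(bmge_jar), "-i", str(aln), "-t", "AA", "-of", str(out)]


def align_and_trim(raw_dir: Path, aln_dir: Path, trim_dir: Path, bmge_jar: Path) -> None:
    """Run both steps for every *.faa file in raw_dir (assumed file extension)."""
    aln_dir.mkdir(parents=True, exist_ok=True)
    trim_dir.mkdir(parents=True, exist_ok=True)
    for fasta in sorted(raw_dir.glob("*.faa")):
        aln = aln_dir / fasta.name
        with aln.open("w") as handle:
            subprocess.run(mafft_cmd(fasta), stdout=handle, check=True)
        subprocess.run(bmge_cmd(aln, trim_dir / fasta.name, bmge_jar), check=True)
```

For example, `align_and_trim(Path("2_raw_protein_sequences"), Path("3_alignments"), Path("4_trimmed_alns"), Path("BMGE.jar"))`, where `BMGE.jar` is a placeholder path. Keeping the command builders separate from the loop makes the exact tool invocations easy to inspect.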

- **5_trees**

Contains the concatenated alignment and the IQ-TREE output files, including the maximum-likelihood tree with the original tip labels and a version with taxonomically more informative tip labels (*renamed).
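Before the tree is inferred, the per-marker trimmed alignments are concatenated into a single supermatrix. A minimal sketch of that idea follows. It assumes every alignment has already been keyed by genome name (so that rows from different markers line up) and fills markers missing from a genome with gaps; the script actually used for this dataset is in 0_required_files.

```python
"""Hypothetical sketch: concatenate per-marker alignments into one supermatrix."""

from typing import Dict, List


def concatenate(alignments: List[Dict[str, str]], gap: str = "-") -> Dict[str, str]:
    """Join alignments keyed by genome name; genomes lacking a marker get gap characters."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        if not aln:
            continue  # an empty alignment contributes no columns
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("all sequences in an alignment must have the same length")
        length = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, gap * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


def to_fasta(matrix: Dict[str, str]) -> str:
    """Serialise the supermatrix as FASTA text."""
    return "".join(f">{taxon}\n{seq}\n" for taxon, seq in matrix.items())
```

The resulting FASTA file is the kind of input IQ-TREE reads via `-s`; the substitution model and other settings actually used can be looked up in the IQ-TREE log files in this folder.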

