This folder contains the supplementary data used in the metapointfinder manuscript.
As the entire analysis takes up well over 10 terabytes, we opted to script the downloads and the runs.

Steps to reproduce the data for metapointfinder.sh

Install metapointfinder and its dependencies from https://github.com/aldertzomer/metapointfinder according to the instructions there.

Dependencies required to run all code

Metapointfinder (see https://github.com/aldertzomer/metapointfinder)
Python 3.11
R
Biostrings
pwalign
parallel
wget
diamond
KMA

Benchmark:
EMBOSS backtranseq https://emboss.sourceforge.net/
wgsim https://github.com/lh3/wgsim

Heatmap (Munk et al. dataset):
pheatmap https://cran.r-project.org/web/packages/pheatmap/index.html
tidyr https://cran.r-project.org/web/packages/tidyr/index.html
RColorBrewer https://cran.r-project.org/web/packages/RColorBrewer/index.html
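
Before starting the runs, it can help to confirm that the command-line dependencies listed above are on the PATH. The following is a minimal sketch, not part of the original scripts. The executable names (python3, Rscript, kma, backtranseq and so on) are assumptions and may differ per installation. The helper check_tools is hypothetical, and the R packages are not checked here.

```shell
#!/bin/sh
# Report which of the given commands are missing from PATH.
# Returns 0 if all are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executable names below are assumptions; adjust them to your installation.
check_tools python3 Rscript parallel wget diamond kma backtranseq wgsim \
    || echo "Install the tools reported above before running the scripts."
```

The function only prints the names that could not be found, so an empty output means every listed tool was located.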

Figure 1 - Benchmark
1. Run the benchmark scripts in the benchmark folder containing a copy of the metapointfinder database
2. The final_scores.csv and final_scores2.csv files were used for Figure 1

Other figures and tables from external datasets
1. Run download.sh. This fetches all read files from ENA/SRA and may take up to a week.
2. Go into each folder and run the metapointfinder run script run_metapointfinder.sh. This will take several weeks to finish.
3. The .class.prot.summary.txt and class.dna.summary.txt files per read file were used for Table 1, Figure 2 and Figure 3
4. The gene.prot.summary.txt files per read file were used for Table 2 and Table 3
5. The additional .sh and R scripts in the Munk et al folder were used to generate the heatmaps
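
Step 2 above can be automated with a small loop. This is a minimal sketch, assuming that the dataset folders created by download.sh sit directly under one base directory and that run_metapointfinder.sh can be started with bash from inside its own folder. The helper run_in_each_folder is hypothetical and not part of the repository.

```shell
#!/bin/sh
# Run a named script inside every subfolder of a base directory that contains it.
# Folders without the script are skipped; failures are reported and the loop continues.
run_in_each_folder() {
    base=$1
    script=$2
    for dir in "$base"/*/; do
        [ -f "$dir$script" ] || continue
        echo "running $script in $dir"
        # Subshell keeps the working directory change local to this folder.
        ( cd "$dir" && bash "./$script" ) || echo "failed in $dir" >&2
    done
}

# Example call (assumes download.sh has finished and the dataset folders
# are in the current directory):
# run_in_each_folder . run_metapointfinder.sh
```

The folders are processed one after another here. Given the multi-week run time, running several folders at once may be preferable if enough disk space and memory are available.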

The included Excel file contains the read counts for the metapointfinder runs.

