This Zenodo folder contains the following data:
-----------------------------------------------
Written by Eveline Pinseel, August 2023.

This folder contains the scripts and details used to run the machine-learning analysis.
The machine-learning model was used to estimate the number of cells per chain from image data.
These estimates ensured that an approximately equal number of cells were pooled for each locality.
All scripts in this folder were written by Teofil Nakov.

Briefly, the process worked as follows:

# STEP 1:
Pass each culture through a Benchtop B3 series FlowCAM cytometer (Fluid Imaging Technologies).

# STEP 2:
Manually code the number of cells per chain for the first 1,000-2,000 imaged cells from 1-2 samples per population.

# STEP 3:
Train a random forest model using all variables in the dataset as predictors.
A subset of the manually coded data (number of cells per chain) is used to train the model, and the remainder is used to evaluate its performance.
This step saves the trained random forest model in a directory named 'models'; this model is then used to predict all samples from a population.
The output is one model per population, not per culture.
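A minimal Python sketch of steps 2 and 3 for a single population, assuming scikit-learn (and its dependency joblib) is installed. It is not the script included in this folder. Treating cells per chain as class labels (classification rather than regression), the 70/30 split, the number of trees, and the helper names `train_population_model` and `save_model` are illustrative assumptions; the feature vectors stand in for whatever FlowCAM variables the dataset contains.

```python
# Minimal sketch (assumptions: scikit-learn and joblib installed;
# helper names are hypothetical, not taken from the included scripts).
import os

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_population_model(features, cells_per_chain, test_size=0.3, seed=42):
    """Fit a random forest on manually coded particles from one population.

    features: one feature vector per imaged particle (e.g. FlowCAM measurements).
    cells_per_chain: the manually coded number of cells for each particle.
    Returns the fitted model and its accuracy on the held-out particles.
    """
    # Hold out part of the manually coded data to check model performance.
    X_train, X_test, y_train, y_test = train_test_split(
        features, cells_per_chain, test_size=test_size, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


def save_model(model, population, directory="models"):
    """Save one model per population (not per culture) under `directory`."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"{population}.joblib")
    joblib.dump(model, path)
    return path
```

For example, `model, acc = train_population_model(X, y)` followed by `save_model(model, "population_A")` would write the hypothetical file `models/population_A.joblib`.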

# STEP 4:
Use the trained model to predict the number of cells per chain (colony) in all remaining images from all samples.
The final output is a table listing the amount of culture needed to generate pools of 40 or 80 million cells each.
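The arithmetic behind such a table can be sketched as follows, assuming that the fluid volume imaged by the FlowCAM is known for each sample and that each culture contributes an equal share of the pool's cells. This is an illustrative Python sketch, not the included script; `culture_concentration` and `pool_volumes` are hypothetical names.

```python
def culture_concentration(predicted_cells_per_chain, imaged_volume_ml):
    """Estimate cells per mL: total predicted cells in all imaged particles
    divided by the fluid volume that was imaged (assumed known, in mL)."""
    if imaged_volume_ml <= 0:
        raise ValueError("imaged volume must be positive")
    return sum(predicted_cells_per_chain) / imaged_volume_ml


def pool_volumes(concentrations, pool_sizes=(40e6, 80e6)):
    """Volume (mL) of each culture needed so that every culture contributes
    an equal share of cells to a pool of each size.

    concentrations: mapping of culture name -> cells per mL.
    Returns {culture: {pool_size: volume_ml}}.
    """
    # Equal share of the pool for each culture (assumption).
    cells_per_culture = {size: size / len(concentrations) for size in pool_sizes}
    return {
        culture: {size: cells_per_culture[size] / conc for size in pool_sizes}
        for culture, conc in concentrations.items()
    }
```

For two hypothetical cultures at 2e5 and 4e5 cells/mL, a 40-million-cell pool needs 100 mL and 50 mL of culture, respectively.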

# STEP 5:
Based on the final cell concentration, derive a dilution factor, or calculate the volume needed, to create a pool with an equal number of cells per culture.
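One way to derive dilution factors is sketched below in Python, under the assumption that each culture is diluted down to the concentration of the least concentrated culture, so that equal volumes then contain equal numbers of cells. The function names are hypothetical, and this is not necessarily how the included scripts handle it.

```python
def dilution_factors(concentrations):
    """Dilution factor per culture that brings it to the lowest concentration.

    concentrations: mapping of culture name -> cells per mL.
    A factor of 1.0 means no dilution is needed.
    """
    lowest = min(concentrations.values())
    return {culture: conc / lowest for culture, conc in concentrations.items()}


def diluent_volume(culture_volume_ml, factor):
    """Volume of medium (mL) to add to `culture_volume_ml` of culture to reach
    the given dilution factor (final volume = culture volume * factor)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return culture_volume_ml * (factor - 1)
```

For example, 10 mL of a culture at 4e5 cells/mL, diluted to match a culture at 2e5 cells/mL (factor 2.0), would be topped up with 10 mL of medium.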

Note that all steps were performed individually for each locality.

This folder contains the scripts needed to run steps 3, 4, and 5, as indicated by the names of their respective folders. Only the script for pool A is included, but the scripts for all other pools are analogous.