1) Put the list of species you want to add in 'species_list.txt'.
   Add the new species to the existing list in 'species_names.${Group_name}.txt'.
   Species names must not contain underscores, only spaces.
   Make sure there is no empty name at the end of the file.

2) Run ecoPCRSequences.sh with 'species_list.txt'.
   This produces a new file, 'New_species.NCBI.COI.fasta'.
   # ensure everything in database is extracted; extra sequences

3) Extract the sequences from BOLD and GenBank with 'species_list.txt' into a file called 'Raw_Sequences'.

4) Align the sequences of each species in 'Raw_Sequences' with 'New_species.NCBI.COI.fasta' for the same species.
   Extract the alignment only (we want only sequences of the same length).
   This can be done with 'Blast two sequences'.
   Alignment threshold: 100% coverage, or at least 99%.
   Collect all the alignments in a file called 'Aligned_Sequences'.

5) Run obiuniq on each file to dereplicate the sequences of each species.
   Remove the count with obiannotate --delete-tag=count.
   Collect all the sequences in 'Aligned.uniq_Sequences'.

6) Concatenate all the files in 'Aligned.uniq_Sequences'.
   Add an outgroup using a sequence from in silico PCR (ecoPCR).
   --> The file name is 'Database.${Group_name}.fasta'.
   All sequence specifications that come from NCBI can be removed with text_removal.py.
   This program allows no underscores, only spaces.

7) Replace spaces with underscores to get full names in the phylogenetic tree.
   Align the whole file 'Database.${Group_name}.fasta' with MUSCLE to get 'Database.${Group_name}.afa'.
   Check the alignment. If it is good, run:
   iqtree -s Database.${Group_name}.afa -m MFP
   Check the tree and remove misannotated sequences.
   --> You get a file named 'Database.${Group_name}.clean.fasta'.
8) Add annotations to the database with the following commands:
   python taxid_add.py species_names_${Group_name}.txt ${scriptdir}/Database.fasta
   --> This gives you 'Database.taxid.fasta'.
   obiannotate -S forward_primer:${forward_primer} -S reverse_primer:${reverse_primer} ${scriptdir}/Database.taxid.fasta > ${scriptdir}/Database.taxid.primers.fasta
   obiannotate -S marker:${marker_name} ${scriptdir}/Database.taxid.primers.fasta > ${scriptdir}/Database.taxid.primers.marker.fasta
   --> The database is now annotated.

9) MiSeq read tests can be run with the OBITools pipeline or by clustering with clustering.sh.
   You need clusterFromPairs.py and ClusterAnalysis.py to run clustering.sh.
   Database.Reads is a concatenation of the database and the MiSeq reads.
   Before running obiconvert, ecotag, or the clustering, replace all spaces with underscores, except the space before the annotations (taxid...).
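
The naming rules in steps 1, 7 and 9 are easy to get wrong by hand. The sketch below shows one possible way to check and apply them with standard shell tools. The function names check_species_list and underscore_names are hypothetical (they are not part of the scripts listed above), and underscore_names assumes that annotations follow the species name in the header as OBITools-style 'key=value;' pairs.

```shell
# Hypothetical helper for step 1: report underscores and empty lines
# in a species list (one species name per line).
check_species_list() {
    list="$1"
    status=0
    # Names must use spaces, not underscores.
    if grep -n '_' "$list"; then
        echo "error: underscore found in $list" >&2
        status=1
    fi
    # No empty or whitespace-only lines, including at the end of the file.
    if grep -n -E '^[[:space:]]*$' "$list"; then
        echo "error: empty line found in $list" >&2
        status=1
    fi
    return "$status"
}

# Hypothetical helper for steps 7 and 9: on FASTA header lines, replace the
# spaces inside the name with underscores and leave everything from the first
# " key=" annotation onwards untouched. Sequence lines are printed unchanged.
underscore_names() {
    awk '
    /^>/ {
        # Assumption: annotations start at the first " key=" pattern.
        if (match($0, / [A-Za-z_][A-Za-z0-9_]*=/)) {
            name = substr($0, 1, RSTART - 1)
            rest = substr($0, RSTART)
        } else {
            # No annotation yet (step 7): the whole header is the name.
            name = $0
            rest = ""
        }
        gsub(/ /, "_", name)
        print name rest
        next
    }
    { print }
    ' "$1"
}
```

Typical use would be 'check_species_list species_list.txt' before step 2, and 'underscore_names input.fasta > output.fasta' before the alignment in step 7 or before obiconvert and ecotag in step 9 (input.fasta and output.fasta are placeholder names). The result is written to standard output so the input file is never modified in place.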