# ========================================= #
#          Get OTU Main Sequence            #
# ========================================= #

This program processes files generated by mothur (http://www.mothur.org/wiki/454_SOP or http://www.mothur.org/wiki/MiSeq_SOP).
Its aim is to return a fasta file listing, for each sample, the highly representative sequence of a given OTU.

---------------------------
Included Files and Folders
---------------------------
- get_otuMainSeq.pl: The program to launch
- lib: The folder containing the homemade libraries
  --> File: The folder containing the libraries to handle files
    --> Fasta.pm: The library to handle fasta files

---------------------------
Command Line Arguments
---------------------------
Example: ./get_otuMainSeq.pl input.list input.fasta input.names input.groups label otu

Input:
- input.list: List file generated by mothur containing the list of sequences for each label (file obtained after cluster.split)
- input.fasta: Fasta file generated by mothur
- input.names: Names file generated by mothur
- input.groups: Groups file generated by mothur
- label: The chosen label (distance cutoff used to form OTUs)
- otu: The OTU selected for analysis

!!! Do not include the leading zeros of the OTU number (for OTU "00014", enter "14")

---------------------------
Output
---------------------------
- A fasta file with, for each sample, the highly representative sequence for the selected OTU

Each sequence is named according to this nomenclature:
>[sample name]_[Highly representative sequence name]_MO=[XXX]_S=[YYY]_US=[ZZZ]

Here is an example (see the test file below):
- BB10189bis: The name of the sample
- M00842_94_000000000-AACJM_1_1103_9388_15633: The ID of the unique sequence found to be the highly representative one for the sample BB10189bis
- MO (17): The number of copies of the Main Sequence of the OTU in the current sample
- S (22): The number of Sequences associated with the current sample (all sequences, regardless of the OTU)
- US (5): The number of Unique Sequences associated with the OTU in the current sample

--- mainsequence_otu[XXX].fasta
>BB10189bis_M00842_94_000000000-AACJM_1_1103_9388_15633_MO=17_S=22_US=5
GAC--GG-AG-GGG--GCT-A-G--C-G--T-T--GT-T-CGG-AA--TT-A-C-T--GG
-GC---GT--A---AA-GG-GC-GC---G-TA-G-G-C-G--G--T-TT-A-A-T-----
-AA----G-T-T-A--G-G-A--G--TG--A-AA-TC--C-C-AG-G-G---CT-T-AA-
---C-C-C-T-G-G-A--A-C--T-G--C-T--T--C--T--AA-A-A--C-T--G-T--
TG--G-A-C-T-A-G-A-G-T-G---T-GG---TA-G-G-----G-G-A-T---GA-T--
>BB10026bis_M00842_94_000000000-AACJM_1_1103_9388_15633_MO=90_S=96_US=5
GAC--GG-AG-GGG--GCT-A-G--C-G--T-T--GT-T-CGG-AA--TT-A-C-T--GG
-GC---GT--A---AA-GG-GC-GC---G-TA-G-G-C-G--G--T-TT-A-A-T-----
-AA----G-T-T-A--G-G-A--G--TG--A-AA-TC--C-C-AG-G-G---CT-T-AA-
---C-C-C-T-G-G-A--A-C--T-G--C-T--T--C--T--AA-A-A--C-T--G-T--
TG--G-A-C-T-A-G-A-G-T-G---T-GG---TA-G-G-----G-G-A-T---GA-T--
>BB10189_M00842_94_000000000-AACJM_1_1103_9388_15633_MO=7_S=7_US=1
GAC--GG-AG-GGG--GCT-A-G--C-G--T-T--GT-T-CGG-AA--TT-A-C-T--GG
-GC---GT--A---AA-GG-GC-GC---G-TA-G-G-C-G--G--T-TT-A-A-T-----
-AA----G-T-T-A--G-G-A--G--TG--A-AA-TC--C-C-AG-G-G---CT-T-AA-
---C-C-C-T-G-G-A--A-C--T-G--C-T--T--C--T--AA-A-A--C-T--G-T--
TG--G-A-C-T-A-G-A-G-T-G---T-GG---TA-G-G-----G-G-A-T---GA-T--
---
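
---------------------------
Reading the Output Headers (illustrative sketch)
---------------------------
Downstream scripts may need to split the header nomenclature described above back into its fields. The following is a minimal Python sketch of one way to do that. It is not part of get_otuMainSeq.pl. The names MainSeqHeader and parse_header are hypothetical, and the sketch assumes that sample names contain no underscore (true for the example above, but not guaranteed by the nomenclature, since sequence IDs themselves contain underscores).

```python
import re
from typing import NamedTuple


class MainSeqHeader(NamedTuple):
    """Fields of one output header (hypothetical helper type)."""
    sample: str        # sample name, e.g. "BB10189bis"
    sequence_id: str   # ID of the highly representative sequence
    mo: int            # value of the MO= field
    s: int             # value of the S= field
    us: int            # value of the US= field


# Assumption: the sample name has no underscore, so the first "_" separates
# the sample name from the sequence ID (which may itself contain "_").
_HEADER_RE = re.compile(
    r"^>?(?P<sample>[^_]+)_(?P<sequence_id>.+)"
    r"_MO=(?P<mo>\d+)_S=(?P<s>\d+)_US=(?P<us>\d+)$"
)


def parse_header(line: str) -> MainSeqHeader:
    """Split a header line (with or without the leading '>') into fields."""
    match = _HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"Header does not follow the nomenclature: {line!r}")
    return MainSeqHeader(
        sample=match["sample"],
        sequence_id=match["sequence_id"],
        mo=int(match["mo"]),
        s=int(match["s"]),
        us=int(match["us"]),
    )


if __name__ == "__main__":
    header = ">BB10189bis_M00842_94_000000000-AACJM_1_1103_9388_15633_MO=17_S=22_US=5"
    print(parse_header(header))
```

If sample names in a given dataset do contain underscores, the split between sample name and sequence ID cannot be recovered from the header alone; in that case the sample names from the groups file would be needed to resolve it.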