- get run from basespace
- download to illumina runs as 08-27-2013-sbrady-A4HJW
- setup folders
- setup illumiprocessor.conf
- get fastq length stats for the cleaned reads::

    for i in *; do python ~/git/phyluce/bin/assembly/get_fastq_lengths.py $i/split-adapter-quality-trimmed/ --csv; done

    All files in dir with acordulecera-pellucida-READ-singleton.fastq.gz,408901,86125439,210.62662845,0.0839188344788,40,251,250.0
    All files in dir with andrena-asteris-READ2.fastq.gz,83975,15540798,185.064578744,0.222636809273,40,251,199.0
    All files in dir with andrena-sp-READ1.fastq.gz,410453,84592764,206.096103573,0.0842763017632,40,251,232.0
    All files in dir with aphaenogaster-sp-READ1.fastq.gz,303484,63159917,208.116134623,0.0969059135826,40,251,249.0
    All files in dir with aporus-niger-READ-singleton.fastq.gz,398607,74624652,187.213601367,0.0970336767778,40,251,195.0
    All files in dir with bombus-pensylvanicus-READ2.fastq.gz,301910,63481685,210.266917293,0.093298154155,40,251,250.0
    All files in dir with chalybion-californicus-READ2.fastq.gz,654184,117929426,180.269505216,0.076667083006,40,251,183.0
    All files in dir with chyphotes-mellipes-READ-singleton.fastq.gz,1664263,322103690,193.54133932,0.0458169238927,40,251,208.0
    All files in dir with evaniella-semaeoda-READ-singleton.fastq.gz,414086,78104105,188.618076921,0.0975166839375,40,251,202.0
    All files in dir with metapolybia-cingulata-READ-singleton.fastq.gz,719460,142797714,198.479017597,0.0685605561469,40,251,220.0
    All files in dir with mischocyttarus-flavitarsis-READ-singleton.fastq.gz,307969,61394499,199.352853696,0.105283277093,40,251,223.0
    All files in dir with nasonia-sp-READ1.fastq.gz,528367,99597773,188.501123272,0.0823414196746,40,251,199.0
    All files in dir with nematus-tibialis-READ1.fastq.gz,703569,135792550,193.005305805,0.0735502296225,40,251,213.0
    All files in dir with orthogonalys-pulchella-READ-singleton.fastq.gz,1822967,354456435,194.439304167,0.0448340474334,40,251,214.0
    All files in dir with sapyga-pumila-READ1.fastq.gz,1732085,311775579,180.000161078,0.0463617141704,40,251,180.0
    All files in dir with scolia-verticalis-READ1.fastq.gz,907356,178554253,196.78522322,0.0636452187035,40,251,221.0
    All files in dir with sericomyrmex-sp-READ2.fastq.gz,327399,64865315,198.123131103,0.098417873664,40,251,214.0
    All files in dir with stenamma-sp-READ-singleton.fastq.gz,801435,169627489,211.654705622,0.0579792590642,40,251,250.0
    All files in dir with taxonus-pallidicornis-READ2.fastq.gz,577999,119464673,206.686643056,0.0785258930299,40,251,250.0

- assemble with trinity (had to pull down metapolybia again from basespace, clean, and assemble it at a different time than the other samples)::

    python ~/git/phyluce/bin/assembly/assemblo_trinity.py \
        --conf trinity-assembly-conf/08-29-2013-sbrady2-assemblies.conf \
        --output hymenoptera-trinity-contigs2 \
        --subfolder 'split-adapter-quality-trimmed' \
        --cores 12

- after running through the entire set of steps in sbrady-hymenoptera-round2.rst, and copying over the data from the earlier ant runs, created a `contigs-proper` folder that contains all of the Trinity.fasta and trinity.log files from the earlier assemblies.
  Using those files, I ran the coverage calculator code against the assemblies to compute coverage and trim contigs to 3x ends and 5x mean coverage, placing the resulting symlinks to trimmed contigs in the `contigs-trimmed` folder within `contigs-proper`::

    python ~/git/phyluce/bin/assembly/get_trinity_coverage.py \
        --assemblies contigs-proper \
        --assemblo-config trinity-proper-contigs.conf \
        --subfolder split-adapter-quality-trimmed \
        --cores 12 \
        -bwa-mem

- the output from the above is now stored in the `log` directory within /nfs/data1/working/sbrady-hymenoptera

- get Trinity coverage for UCE loci::

    python ~/git/phyluce/bin/assembly/get_trinity_coverage_for_uce_loci.py \
        --assemblies ../../contigs-proper \
        --match-count-output trinity-kmer1-WITH-SAWFLIES-incomplete.conf \
        --type untrimmed \
        --locus-db ../../untrimmed-lastz/probe.matches.sqlite \
        --output contig-coverage \
        --log-path log

MOVE
====

Move all trinity kmer1 files into trinity-kmer1. Move "lastz" from above to trimmed-lastz.

New as of 3/26/2014
====================

- match trimmed-contigs to probes (ALL output IN LOG)::

    python ~/git/phyluce/bin/assembly/match_contigs_to_probes.py \
        --contigs contigs-proper/contigs-trimmed \
        --probes /nfs/data1/working/bfaircloth-hymenoptera-genome/hymenoptera-uce-probes.fasta \
        --output trimmed-lastz \
        --dupefile /nfs/data1/working/bfaircloth-hymenoptera-genome/hymenoptera-uce-probes.fasta.toself.lastz \
        --log-path trimmed-log

    2014-03-26 17:43:02,804 - match_contigs_to_probes - INFO - acordulecera_pellucida: 314 (10.81%) uniques of 2905 contigs, 74 dupe probe matches, 84 UCE loci removed for matching multiple contigs, 37 contigs removed for matching multiple UCE loci
    2014-03-26 17:43:05,033 - match_contigs_to_probes - INFO - andrena_asteris: 646 (48.98%) uniques of 1319 contigs, 93 dupe probe matches, 15 UCE loci removed for matching multiple contigs, 29 contigs removed for matching multiple UCE loci
    2014-03-26 17:43:08,201 - match_contigs_to_probes - INFO - andrena_sp: 723 (22.28%) uniques of 3245 contigs, 95 dupe probe matches, 22 UCE loci removed for matching multiple contigs, 51 contigs removed for matching multiple UCE loci
    2014-03-26 17:43:14,516 - match_contigs_to_probes - INFO - aphaenogaster_albisetosa: 742 (2.76%) uniques of 26836 contigs, 93 dupe probe matches, 34 UCE loci removed for matching multiple contigs, 48 contigs removed for matching multiple UCE loci
    2014-03-26 17:43:17,654 - match_contigs_to_probes - INFO - aphaenogaster_fulva: 660 (20.08%) uniques of 3287 contigs, 91 dupe probe matches, 27 UCE loci removed for matching multiple contigs, 43 contigs removed for matching multiple UCE loci
    2014-03-26 17:43:23,256 - match_contigs_to_probes - INFO - aphaenogaster_megommata: 734 (3.97%) uniques of 18480 contigs, 93 dupe probe matches, 46 UCE loci removed for matching multiple contigs, 60 contigs removed for matching multiple UCE loci
    2014-03-26 17:43:28,576 - match_contigs_to_probes - INFO - aphaenogaster_tennesseensis: 725 (4.59%) uniques of 15800 contigs, 93 dupe probe matches, 44 UCE loci removed for matching multiple contigs, 54 contigs removed for matching multiple UCE loci
    2014-03-26 17:43:33,158 - match_contigs_to_probes - INFO - aphaenogaster_texana: 729 (6.85%) uniques of 10644 contigs, 93 dupe probe matches, 41 UCE loci removed for matching multiple contigs, 44 contigs removed for matching multiple UCE loci
    2014-03-26 17:43:35,808 - match_contigs_to_probes - INFO - aporus_niger: 684 (32.28%) uniques of 2119 contigs, 95 dupe probe matches, 12 UCE loci removed for matching multiple contigs, 38 contigs removed for matching multiple UCE loci
    2014-03-26 17:43:39,024 - match_contigs_to_probes - INFO - bombus_pensylvanicus: 742 (36.09%) uniques of 2056 contigs, 93 dupe probe matches, 21 UCE loci removed for matching multiple contigs, 42 contigs removed for matching multiple UCE loci
    2014-03-26 17:43:42,433 - match_contigs_to_probes - INFO - chalybion_californicus: 755 (20.07%) uniques of 3761 contigs, 94 dupe probe matches, 24 UCE loci removed for matching multiple contigs, 40 contigs removed for matching multiple UCE loci
    2014-03-26 17:43:47,649 - match_contigs_to_probes - INFO - chyphotes_mellipes: 761 (5.51%) uniques of 13808 contigs, 95 dupe probe matches, 19 UCE loci removed for matching multiple contigs, 55 contigs removed for matching multiple UCE loci
    2014-03-26 17:43:50,582 - match_contigs_to_probes - INFO - evaniella_semaeoda: 614 (22.61%) uniques of 2716 contigs, 94 dupe probe matches, 22 UCE loci removed for matching multiple contigs, 44 contigs removed for matching multiple UCE loci
    2014-03-26 17:43:55,177 - match_contigs_to_probes - INFO - messor_piceus: 709 (5.24%) uniques of 13522 contigs, 93 dupe probe matches, 58 UCE loci removed for matching multiple contigs, 51 contigs removed for matching multiple UCE loci
    2014-03-26 17:43:58,499 - match_contigs_to_probes - INFO - metapolybia_cingulata: 660 (11.29%) uniques of 5844 contigs, 94 dupe probe matches, 15 UCE loci removed for matching multiple contigs, 54 contigs removed for matching multiple UCE loci
    2014-03-26 17:44:01,317 - match_contigs_to_probes - INFO - mischocyttarus_flavitarsis: 600 (21.46%) uniques of 2796 contigs, 93 dupe probe matches, 23 UCE loci removed for matching multiple contigs, 49 contigs removed for matching multiple UCE loci
    2014-03-26 17:44:05,939 - match_contigs_to_probes - INFO - nasonia_vitripennis: 1164 (30.15%) uniques of 3861 contigs, 96 dupe probe matches, 59 UCE loci removed for matching multiple contigs, 72 contigs removed for matching multiple UCE loci
    2014-03-26 17:44:08,774 - match_contigs_to_probes - INFO - nematus_tibialis: 428 (7.89%) uniques of 5426 contigs, 80 dupe probe matches, 15 UCE loci removed for matching multiple contigs, 36 contigs removed for matching multiple UCE loci
    2014-03-26 17:44:13,533 - match_contigs_to_probes - INFO - orthogonalys_pulchella: 695 (4.95%) uniques of 14050 contigs, 87 dupe probe matches, 11 UCE loci removed for matching multiple contigs, 57 contigs removed for matching multiple UCE loci
    2014-03-26 17:44:19,464 - match_contigs_to_probes - INFO - pogonomyrmex_occidentalis: 722 (3.06%) uniques of 23574 contigs, 92 dupe probe matches, 34 UCE loci removed for matching multiple contigs, 55 contigs removed for matching multiple UCE loci
    2014-03-26 17:44:25,154 - match_contigs_to_probes - INFO - sapyga_pumila: 715 (4.15%) uniques of 17209 contigs, 97 dupe probe matches, 59 UCE loci removed for matching multiple contigs, 65 contigs removed for matching multiple UCE loci
    2014-03-26 17:44:28,779 - match_contigs_to_probes - INFO - scolia_verticalis: 751 (15.90%) uniques of 4724 contigs, 95 dupe probe matches, 16 UCE loci removed for matching multiple contigs, 65 contigs removed for matching multiple UCE loci
    2014-03-26 17:44:31,788 - match_contigs_to_probes - INFO - sericomyrmex_sp: 696 (20.89%) uniques of 3332 contigs, 91 dupe probe matches, 11 UCE loci removed for matching multiple contigs, 45 contigs removed for matching multiple UCE loci
    2014-03-26 17:44:37,051 - match_contigs_to_probes - INFO - stenamma_diecki: 729 (5.27%) uniques of 13828 contigs, 93 dupe probe matches, 35 UCE loci removed for matching multiple contigs, 54 contigs removed for matching multiple UCE loci
    2014-03-26 17:44:42,525 - match_contigs_to_probes - INFO - stenamma_expolitum: 730 (4.39%) uniques of 16641 contigs, 91 dupe probe matches, 29 UCE loci removed for matching multiple contigs, 56 contigs removed for matching multiple UCE loci
    2014-03-26 17:44:47,828 - match_contigs_to_probes - INFO - stenamma_felixi: 748 (3.89%) uniques of 19236 contigs, 93 dupe probe matches, 23 UCE loci removed for matching multiple contigs, 49 contigs removed for matching multiple UCE loci
    2014-03-26 17:44:53,106 - match_contigs_to_probes - INFO - stenamma_impar: 715 (5.60%) uniques of 12765 contigs, 92 dupe probe matches, 61 UCE loci removed for matching multiple contigs, 51 contigs removed for matching multiple UCE loci
    2014-03-26 17:44:56,981 - match_contigs_to_probes - INFO - stenamma_megamanni: 729 (14.58%) uniques of 4999 contigs, 92 dupe probe matches, 23 UCE loci removed for matching multiple contigs, 53 contigs removed for matching multiple UCE loci
    2014-03-26 17:45:03,011 - match_contigs_to_probes - INFO - stenamma_megamanni2: 745 (3.30%) uniques of 22585 contigs, 94 dupe probe matches, 28 UCE loci removed for matching multiple contigs, 53 contigs removed for matching multiple UCE loci
    2014-03-26 17:45:07,130 - match_contigs_to_probes - INFO - stenamma_muralla: 711 (7.37%) uniques of 9649 contigs, 93 dupe probe matches, 25 UCE loci removed for matching multiple contigs, 60 contigs removed for matching multiple UCE loci
    2014-03-26 17:45:09,988 - match_contigs_to_probes - INFO - taxonus_pallidicornis: 435 (10.83%) uniques of 4018 contigs, 79 dupe probe matches, 54 UCE loci removed for matching multiple contigs, 41 contigs removed for matching multiple UCE loci

- match untrimmed-contigs to probes (ALL output IN LOG)::

    python ~/git/phyluce/bin/assembly/match_contigs_to_probes.py \
        --contigs contigs-proper/contigs \
        --probes /nfs/data1/working/bfaircloth-hymenoptera-genome/hymenoptera-uce-probes.fasta \
        --output untrimmed-lastz \
        --dupefile /nfs/data1/working/bfaircloth-hymenoptera-genome/hymenoptera-uce-probes.fasta.toself.lastz \
        --log-path untrimmed-log

    2014-03-26 17:46:27,563 - match_contigs_to_probes - INFO - acordulecera_pellucida: 341 (1.14%) uniques of 30034 contigs, 76 dupe probe matches, 94 UCE loci removed for matching multiple contigs, 37 contigs removed for matching multiple UCE loci
    2014-03-26 17:46:30,496 - match_contigs_to_probes - INFO - andrena_asteris: 740 (16.13%) uniques of 4588 contigs, 95 dupe probe matches, 16 UCE loci removed for matching multiple contigs, 31 contigs removed for matching multiple UCE loci
    2014-03-26 17:46:37,349 - match_contigs_to_probes - INFO - andrena_sp: 774 (2.29%) uniques of 33763 contigs, 95 dupe probe matches, 31 UCE loci removed for matching multiple contigs, 53 contigs removed for matching multiple UCE loci
    2014-03-26 17:47:02,828 - match_contigs_to_probes - INFO - aphaenogaster_albisetosa: 764 (0.48%) uniques of 157814 contigs, 93 dupe probe matches, 35 UCE loci removed for matching multiple contigs, 47 contigs removed for matching multiple UCE loci
    2014-03-26 17:47:09,117 - match_contigs_to_probes - INFO - aphaenogaster_fulva: 722 (3.01%) uniques of 24005 contigs, 93 dupe probe matches, 35 UCE loci removed for matching multiple contigs, 46 contigs removed for matching multiple UCE loci
    2014-03-26 17:47:29,623 - match_contigs_to_probes - INFO - aphaenogaster_megommata: 751 (0.64%) uniques of 117379 contigs, 93 dupe probe matches, 48 UCE loci removed for matching multiple contigs, 60 contigs removed for matching multiple UCE loci
    2014-03-26 17:47:44,228 - match_contigs_to_probes - INFO - aphaenogaster_tennesseensis: 751 (0.98%) uniques of 76657 contigs, 93 dupe probe matches, 45 UCE loci removed for matching multiple contigs, 55 contigs removed for matching multiple UCE loci
    2014-03-26 17:47:53,313 - match_contigs_to_probes - INFO - aphaenogaster_texana: 750 (1.52%) uniques of 49222 contigs, 93 dupe probe matches, 43 UCE loci removed for matching multiple contigs, 44 contigs removed for matching multiple UCE loci
    2014-03-26 17:47:58,112 - match_contigs_to_probes - INFO - aporus_niger: 740 (4.47%) uniques of 16553 contigs, 95 dupe probe matches, 15 UCE loci removed for matching multiple contigs, 38 contigs removed for matching multiple UCE loci
    2014-03-26 17:48:04,301 - match_contigs_to_probes - INFO - bombus_pensylvanicus: 780 (2.90%) uniques of 26878 contigs, 95 dupe probe matches, 24 UCE loci removed for matching multiple contigs, 43 contigs removed for matching multiple UCE loci
    2014-03-26 17:48:13,248 - match_contigs_to_probes - INFO - chalybion_californicus: 778 (1.74%) uniques of 44728 contigs, 96 dupe probe matches, 25 UCE loci removed for matching multiple contigs, 40 contigs removed for matching multiple UCE loci
    2014-03-26 17:48:31,699 - match_contigs_to_probes - INFO - chyphotes_mellipes: 774 (0.73%) uniques of 105478 contigs, 95 dupe probe matches, 24 UCE loci removed for matching multiple contigs, 55 contigs removed for matching multiple UCE loci
    2014-03-26 17:48:36,501 - match_contigs_to_probes - INFO - evaniella_semaeoda: 638 (3.36%) uniques of 18981 contigs, 94 dupe probe matches, 28 UCE loci removed for matching multiple contigs, 45 contigs removed for matching multiple UCE loci
    2014-03-26 17:48:51,474 - match_contigs_to_probes - INFO - messor_piceus: 730 (0.79%) uniques of 91859 contigs, 93 dupe probe matches, 63 UCE loci removed for matching multiple contigs, 51 contigs removed for matching multiple UCE loci
    2014-03-26 17:49:03,480 - match_contigs_to_probes - INFO - metapolybia_cingulata: 685 (1.08%) uniques of 63300 contigs, 94 dupe probe matches, 22 UCE loci removed for matching multiple contigs, 54 contigs removed for matching multiple UCE loci
    2014-03-26 17:49:08,429 - match_contigs_to_probes - INFO - mischocyttarus_flavitarsis: 634 (3.81%) uniques of 16625 contigs, 95 dupe probe matches, 26 UCE loci removed for matching multiple contigs, 49 contigs removed for matching multiple UCE loci
    2014-03-26 17:49:15,847 - match_contigs_to_probes - INFO - nasonia_vitripennis: 1166 (4.29%) uniques of 27196 contigs, 96 dupe probe matches, 70 UCE loci removed for matching multiple contigs, 73 contigs removed for matching multiple UCE loci
    2014-03-26 17:49:24,085 - match_contigs_to_probes - INFO - nematus_tibialis: 453 (0.93%) uniques of 48875 contigs, 82 dupe probe matches, 31 UCE loci removed for matching multiple contigs, 37 contigs removed for matching multiple UCE loci
    2014-03-26 17:49:42,653 - match_contigs_to_probes - INFO - orthogonalys_pulchella: 706 (0.66%) uniques of 106247 contigs, 87 dupe probe matches, 20 UCE loci removed for matching multiple contigs, 58 contigs removed for matching multiple UCE loci
    2014-03-26 17:50:08,078 - match_contigs_to_probes - INFO - pogonomyrmex_occidentalis: 741 (0.48%) uniques of 154515 contigs, 92 dupe probe matches, 35 UCE loci removed for matching multiple contigs, 55 contigs removed for matching multiple UCE loci
    2014-03-26 17:50:26,684 - match_contigs_to_probes - INFO - sapyga_pumila: 720 (0.66%) uniques of 108991 contigs, 97 dupe probe matches, 62 UCE loci removed for matching multiple contigs, 64 contigs removed for matching multiple UCE loci
    2014-03-26 17:50:37,100 - match_contigs_to_probes - INFO - scolia_verticalis: 760 (1.37%) uniques of 55546 contigs, 95 dupe probe matches, 19 UCE loci removed for matching multiple contigs, 66 contigs removed for matching multiple UCE loci
    2014-03-26 17:50:43,204 - match_contigs_to_probes - INFO - sericomyrmex_sp: 744 (2.90%) uniques of 25699 contigs, 93 dupe probe matches, 15 UCE loci removed for matching multiple contigs, 45 contigs removed for matching multiple UCE loci
    2014-03-26 17:51:01,350 - match_contigs_to_probes - INFO - stenamma_diecki: 751 (0.69%) uniques of 108643 contigs, 93 dupe probe matches, 35 UCE loci removed for matching multiple contigs, 54 contigs removed for matching multiple UCE loci
    2014-03-26 17:51:23,194 - match_contigs_to_probes - INFO - stenamma_expolitum: 749 (0.55%) uniques of 135132 contigs, 93 dupe probe matches, 29 UCE loci removed for matching multiple contigs, 56 contigs removed for matching multiple UCE loci
    2014-03-26 17:51:44,424 - match_contigs_to_probes - INFO - stenamma_felixi: 762 (0.55%) uniques of 138762 contigs, 93 dupe probe matches, 24 UCE loci removed for matching multiple contigs, 49 contigs removed for matching multiple UCE loci
    2014-03-26 17:51:59,456 - match_contigs_to_probes - INFO - stenamma_impar: 741 (0.83%) uniques of 89582 contigs, 94 dupe probe matches, 61 UCE loci removed for matching multiple contigs, 51 contigs removed for matching multiple UCE loci
    2014-03-26 17:52:13,420 - match_contigs_to_probes - INFO - stenamma_megamanni: 754 (0.96%) uniques of 78364 contigs, 94 dupe probe matches, 32 UCE loci removed for matching multiple contigs, 53 contigs removed for matching multiple UCE loci
    2014-03-26 17:52:38,492 - match_contigs_to_probes - INFO - stenamma_megamanni2: 756 (0.51%) uniques of 147773 contigs, 94 dupe probe matches, 28 UCE loci removed for matching multiple contigs, 53 contigs removed for matching multiple UCE loci
    2014-03-26 17:52:55,118 - match_contigs_to_probes - INFO - stenamma_muralla: 734 (0.72%) uniques of 102542 contigs, 93 dupe probe matches, 25 UCE loci removed for matching multiple contigs, 60 contigs removed for matching multiple UCE loci
    2014-03-26 17:53:03,251 - match_contigs_to_probes - INFO - taxonus_pallidicornis: 459 (1.08%) uniques of 42508 contigs, 85 dupe probe matches, 67 UCE loci removed for matching multiple contigs, 41 contigs removed for matching multiple UCE loci

## incomplete untrimmed data with sawfly

- create locus file::

    python ~/git/phyluce/bin/assembly/get_match_counts.py \
        --locus-db untrimmed-lastz/probe.matches.sqlite \
        --taxon-list-config ../hymenoptera-data-sets.conf \
        --taxon-group 'with sawflies' \
        --output taxon-sets/trinity-kmer1-WITH-SAWFLIES-incomplete/trinity-kmer1-WITH-SAWFLIES-incomplete.conf \
        --extend-locus-db ../in-silico/in-silico-lastz/probe.matches.sqlite \
        --log-path untrimmed-log \
        --incomplete-matrix

    2014-05-18 14:08:41,964 - get_match_counts - INFO - There are 44 taxa in the taxon-group '[with sawflies]' in the config file hymenoptera-data-sets.conf
    2014-05-18 14:08:41,964 - get_match_counts - INFO - Getting UCE names from database
    2014-05-18 14:08:41,980 - get_match_counts - INFO - There are 1510 total UCE loci in the database
    2014-05-18 14:08:42,587 - get_match_counts - INFO - Getting UCE matches by organism to generate a INCOMPLETE matrix
    2014-05-18 14:08:42,593 - get_match_counts - INFO - There are 1367 UCE loci in an INCOMPLETE matrix
    2014-05-18 14:08:42,594 - get_match_counts - INFO - Writing the taxa and loci in the data matrix to /nfs/data1/working/sbrady-hymenoptera/trinity-kmer1/taxon-sets/trinity-kmer1-WITH-SAWFLIES-incomplete/trinity-kmer1-WITH-SAWFLIES-incomplete.conf

- get fastas::

    python ~/git/phyluce/bin/assembly/get_fastas_from_match_counts.py \
        --contigs ../../contigs-proper/contigs \
        --locus-db ../../untrimmed-lastz/probe.matches.sqlite \
        --match-count-output trinity-kmer1-WITH-SAWFLIES-incomplete.conf \
        --output trinity-kmer1-WITH-SAWFLIES-incomplete.fasta \
        --incomplete-matrix trinity-kmer1-WITH-SAWFLIES-incomplete.incomplete \
        --extend-locus-db ../../../in-silico/in-silico-lastz/probe.matches.sqlite \
        --extend-locus-contigs ../../../in-silico/outgroup-fasta \
        --log-path log

- explode fasta::

    python ~/git/phyluce/bin/assembly/explode_get_fastas_file.py \
        --input trinity-kmer1-WITH-SAWFLIES-incomplete.fasta \
        --output-dir exploded-by-taxon \
        --by-taxon

- get stats::

    for i in *; do python ~/git/phyluce/bin/assembly/get_fasta_lengths.py $i --csv; done

    acordulecera-pellucida.unaligned.fasta,341,349519,1024.98240469,19.6805381088,206,2504,1054.0,198
    acrech3.unaligned.fasta,774,1641439,2120.72222222,1.64107495636,1602,2220,2119.0,774
    andrena-asteris.unaligned.fasta,740,425271,574.690540541,6.95180825622,202,1447,558.0,16
    andrena-sp.unaligned.fasta,774,663345,857.034883721,9.92594466126,208,2253,861.5,234
    aphaenogaster-albisetosa.unaligned.fasta,764,862282,1128.64136126,20.8175281921,223,11435,1124.5,461
    aphaenogaster-megommata.unaligned.fasta,751,889250,1184.08788282,16.1614299882,230,2776,1170.0,484
    aphaenogaster-tennesseensis.unaligned.fasta,751,793300,1056.32490013,14.8320358838,210,2493,1051.0,412
    aphaenogaster-texana.unaligned.fasta,750,693422,924.562666667,12.7708792404,207,2645,906.0,281
    apimel4.unaligned.fasta,803,1691508,2106.4856787,3.35568004183,969,2207,2117.0,802
    aporus-niger.unaligned.fasta,740,528399,714.052702703,9.09912325876,205,1981,717.0,71
    attcep1.unaligned.fasta,748,1505660,2012.9144385,9.12777727046,1116,2217,2110.5,748
    bombus-pensylvanicus.unaligned.fasta,780,670544,859.671794872,9.21964441958,206,2036,879.0,228
    camflo1.unaligned.fasta,767,1613460,2103.59843546,3.29624810123,1227,2228,2117.0,767
    cerbir1.unaligned.fasta,768,1619997,2109.37109375,3.22695304772,1285,2252,2118.0,768
    cersol1.unaligned.fasta,897,1902588,2121.05685619,2.56869025349,1401,2286,2131.0,897
    chalybion-californicus.unaligned.fasta,778,631930,812.249357326,10.4007277354,205,1968,809.0,201
    chyphotes-mellipes.unaligned.fasta,774,916382,1183.95607235,12.9950918853,294,3189,1185.0,558
    evaniella-semaeoda.unaligned.fasta,638,619918,971.65830721,11.5592801762,220,2229,978.5,301
    harsal1.unaligned.fasta,763,1610091,2110.21100917,2.8231039794,1198,2435,2116.0,763
    lasalb1.unaligned.fasta,779,1572834,2019.042362,8.96304373917,409,2702,2111.0,777
    linhum1.unaligned.fasta,762,1614172,2118.33595801,2.25182161411,1077,2214,2118.0,762
    messor-piceus.unaligned.fasta,730,811543,1111.70273973,15.6779153013,210,3730,1119.5,441
    metapolybia-cingulata.unaligned.fasta,685,563953,823.289051095,12.8191024076,207,2103,801.0,211
    mischocyttarus-flavitarsis.unaligned.fasta,634,450896,711.192429022,11.5790414575,203,2687,676.5,110
    nasgir1.unaligned.fasta,1191,2430940,2041.09151973,5.81940164114,659,2201,2116.0,1187
    naslon1.unaligned.fasta,1192,2429087,2037.82466443,5.85711597178,715,2194,2113.0,1188
    nasonia-vitripennis.unaligned.fasta,1166,899101,771.098627787,7.72356781853,202,1672,763.0,237
    nasvit2.unaligned.fasta,1214,2572965,2119.41103789,4.55110557377,747,2180,2177.0,1212
    nematus-tibialis.unaligned.fasta,453,475444,1049.54525386,17.9261704225,209,3894,1070.0,265
    orthogonalys-pulchella.unaligned.fasta,706,962959,1363.96458924,16.5447557137,205,2998,1352.0,569
    pogbar3.unaligned.fasta,666,1353040,2031.59159159,8.5822017614,598,2232,2109.0,664
    pogonomyrmex-occidentalis.unaligned.fasta,741,846554,1142.44804318,16.2465163301,231,3190,1124.0,457
    sapyga-pumila.unaligned.fasta,720,753734,1046.85277778,13.3534507479,224,2743,1078.0,428
    scolia-verticalis.unaligned.fasta,760,813497,1070.39078947,12.1807200646,286,2877,1078.0,461
    sericomyrmex-sp.unaligned.fasta,744,606204,814.790322581,9.6862084239,205,2099,830.5,177
    solinv1.unaligned.fasta,768,1580976,2058.5625,7.44936813072,734,2220,2116.0,763
    stenamma-diecki.unaligned.fasta,751,857659,1142.02263648,15.0458293596,209,3188,1167.0,488
    stenamma-expolitum.unaligned.fasta,749,907836,1212.06408545,15.6588913756,205,2690,1216.0,520
    stenamma-felixi.unaligned.fasta,762,816726,1071.81889764,14.2206588316,209,3469,1056.0,433
    stenamma-impar.unaligned.fasta,741,782478,1055.9757085,14.0790851862,229,2846,1046.0,428
    stenamma-megamanni2.unaligned.fasta,756,932105,1232.94312169,19.9503093665,204,9956,1218.0,525
    stenamma-megamanni.unaligned.fasta,754,858069,1138.02254642,14.4616670267,217,3227,1166.5,502
    stenamma-muralla.unaligned.fasta,734,830910,1132.02997275,14.7715614547,221,3299,1113.0,480
    taxonus-pallidicornis.unaligned.fasta,459,523674,1140.90196078,21.5710623627,205,3001,1173.0,282

- align fastas (many loci dropped - check logs)::

    python ~/git/phyluce/bin/align/seqcap_align_2.py \
        --fasta trinity-kmer1-WITH-SAWFLIES-incomplete.fasta \
        --output mafft-nexus-notrim \
        --taxa 44 \
        --incomplete-matrix \
        --cores 12 \
        --log-path log \
        --no-trim

- strip locus name::

    python ~/git/phyluce/bin/align/remove_locus_name_from_nexus_lines.py \
        --alignments mafft-nexus-notrim \
        --output mafft-nexus-notrim-clean \
        --cores 12

- trim with trimal::

    mkdir mafft-nexus-trimal && cd mafft-nexus-trimal
    for i in ../mafft-nexus-notrim-clean/*.nexus; do trimal -in $i -out ./$i:t:r.fasta -automated1 -fasta | tee trimal.out; done

- convert::

    python ~/git/phyluce/bin/align/convert_one_align_to_another.py \
        --alignments mafft-nexus-trimal \
        --output mafft-nexus-trimal-nexus \
        --input-format fasta \
        --output-format nexus \
        --cores 12 \
        --log-path

- get summary data::

    python ~/git/phyluce/bin/align/get_align_summary_data.py \
        --alignments mafft-nexus-trimal-nexus \
        --input-format nexus \
        --cores 12 \
        --log-path log

    2014-05-18 15:05:15,542 - get_align_summary_data - INFO - ----------------------- Alignment summary -----------------------
    2014-05-18 15:05:15,542 - get_align_summary_data - INFO - [Alignments] loci: 1,330
    2014-05-18 15:05:15,543 - get_align_summary_data - INFO - [Alignments] length: 1,338,895
    2014-05-18 15:05:15,543 - get_align_summary_data - INFO - [Alignments] mean: 1006.69
    2014-05-18 15:05:15,543 - get_align_summary_data - INFO - [Alignments] 95% CI: 37.39
    2014-05-18 15:05:15,543 - get_align_summary_data - INFO - [Alignments] min: 76
    2014-05-18 15:05:15,543 - get_align_summary_data - INFO - [Alignments] max: 3,841
    2014-05-18 15:05:15,545 - get_align_summary_data - INFO - ------------------------- Taxon summary -------------------------
    2014-05-18 15:05:15,545 - get_align_summary_data - INFO - [Taxa] mean: 25.31
    2014-05-18 15:05:15,545 - get_align_summary_data - INFO - [Taxa] 95% CI: 0.77
    2014-05-18 15:05:15,545 - get_align_summary_data - INFO - [Taxa] min: 3
    2014-05-18 15:05:15,545 - get_align_summary_data - INFO - [Taxa] max: 44
    2014-05-18 15:05:15,546 - get_align_summary_data - INFO - ----------------- Missing data from trim summary ----------------
    2014-05-18 15:05:15,546 - get_align_summary_data - INFO - [Missing] mean: 0.00
    2014-05-18 15:05:15,546 - get_align_summary_data - INFO - [Missing] 95% CI: 0.00
    2014-05-18 15:05:15,547 - get_align_summary_data - INFO - [Missing] min: 0.00
    2014-05-18 15:05:15,547 - get_align_summary_data - INFO - [Missing] max: 0.00
    2014-05-18 15:05:15,563 - get_align_summary_data - INFO - -------------------- Character count summary --------------------
    2014-05-18 15:05:15,563 - get_align_summary_data - INFO - [All characters] 27,189,036
    2014-05-18 15:05:15,563 - get_align_summary_data - INFO - [Nucleotides] 19,916,811
    2014-05-18 15:05:15,574 - get_align_summary_data - INFO - ---------------- Data matrix completeness summary ---------------
    2014-05-18 15:05:15,574 - get_align_summary_data - INFO - [Matrix 50%] 784 alignments
    2014-05-18 15:05:15,575 - get_align_summary_data - INFO - [Matrix 55%] 751 alignments
    2014-05-18 15:05:15,575 - get_align_summary_data - INFO - [Matrix 60%] 718 alignments
    2014-05-18 15:05:15,575 - get_align_summary_data - INFO - [Matrix 65%] 687 alignments
    2014-05-18 15:05:15,575 - get_align_summary_data - INFO - [Matrix 70%] 658 alignments
    2014-05-18 15:05:15,575 - get_align_summary_data - INFO - [Matrix 75%] 600 alignments
    2014-05-18 15:05:15,575 - get_align_summary_data - INFO - [Matrix 80%] 554 alignments
    2014-05-18 15:05:15,575 - get_align_summary_data - INFO - [Matrix 85%] 500 alignments
    2014-05-18 15:05:15,576 - get_align_summary_data - INFO - [Matrix 90%] 384 alignments
    2014-05-18 15:05:15,576 - get_align_summary_data - INFO - [Matrix 95%] 212 alignments
    2014-05-18 15:05:15,576 - get_align_summary_data - INFO - ------------------------ Character counts -----------------------
    2014-05-18 15:05:15,576 - get_align_summary_data - INFO - [Characters] '-' is present 7,272,225 times
    2014-05-18 15:05:15,576 - get_align_summary_data - INFO - [Characters] 'A' is present 5,389,316 times
    2014-05-18 15:05:15,576 - get_align_summary_data - INFO - [Characters] 'C' is present 4,562,270 times
    2014-05-18 15:05:15,576 - get_align_summary_data - INFO - [Characters] 'G' is present 4,547,118 times
    2014-05-18 15:05:15,576 - get_align_summary_data - INFO - [Characters] 'T' is present 5,418,107 times
    2014-05-18 15:05:15,577 - get_align_summary_data - INFO - ================ Completed get_align_summary_data ===============

- copy alignments having ≥ 33 taxa at minimum into `mafft-nexus-min-33-taxa` (75% complete)::

    python ~/git/phyluce/bin/align/get_only_loci_with_min_taxa.py \
        --alignments mafft-nexus-trimal-nexus \
        --taxa 44 \
        --output mafft-nexus-min-33-taxa \
        --percent 0.75 \
        --cores 12 \
        --log-path log

    2014-05-18 15:08:42,953 - get_only_loci_with_min_taxa - INFO - Copied 600 alignments of 1330 total containing ≥ 0.75 proportion of taxa (n = 33)

- get summary stats before adding missing data characters::

    python ~/git/phyluce/bin/align/get_align_summary_data.py \
        --alignments mafft-nexus-min-33-taxa \
        --cores 12 \
        --log-path log

    2014-05-18 15:09:02,851 - get_align_summary_data - INFO - ----------------------- Alignment summary -----------------------
    2014-05-18 15:09:02,852 - get_align_summary_data - INFO - [Alignments] loci: 600
    2014-05-18 15:09:02,852 - get_align_summary_data - INFO - [Alignments] length: 414,849
    2014-05-18 15:09:02,852 - get_align_summary_data - INFO - [Alignments] mean: 691.41
    2014-05-18 15:09:02,852 - get_align_summary_data - INFO - [Alignments] 95% CI: 44.37
    2014-05-18 15:09:02,852 - get_align_summary_data - INFO - [Alignments] min: 84
    2014-05-18 15:09:02,852 - get_align_summary_data - INFO - [Alignments] max: 3,841
    2014-05-18 15:09:02,853 - get_align_summary_data - INFO - ------------------------- Taxon summary -------------------------
    2014-05-18 15:09:02,853 - get_align_summary_data - INFO - [Taxa] mean: 39.20
    2014-05-18 15:09:02,853 - get_align_summary_data - INFO - [Taxa] 95% CI: 0.22
    2014-05-18 15:09:02,854 - get_align_summary_data - INFO - [Taxa] min: 33
    2014-05-18 15:09:02,854 - get_align_summary_data - INFO - [Taxa] max: 44
    2014-05-18 15:09:02,854 - get_align_summary_data - INFO - ----------------- Missing data from trim summary ----------------
    2014-05-18 15:09:02,854 - get_align_summary_data - INFO - [Missing] mean: 0.00
    2014-05-18 15:09:02,855 - get_align_summary_data - INFO - [Missing] 95% CI: 0.00
    2014-05-18 15:09:02,855 - get_align_summary_data - INFO - [Missing] min: 0.00
    2014-05-18 15:09:02,855 - get_align_summary_data - INFO - [Missing] max: 0.00
    2014-05-18 15:09:02,864 - get_align_summary_data - INFO - -------------------- Character count summary --------------------
    2014-05-18 15:09:02,864 - get_align_summary_data - INFO - [All characters] 16,217,055
    2014-05-18 15:09:02,864 - get_align_summary_data - INFO - [Nucleotides] 12,208,242
    2014-05-18 15:09:02,865 - get_align_summary_data - INFO - ---------------- Data matrix completeness summary ---------------
    2014-05-18 15:09:02,865 - get_align_summary_data - INFO - [Matrix 50%] 600 alignments
    2014-05-18 15:09:02,865 - get_align_summary_data - INFO - [Matrix 55%] 600 alignments
    2014-05-18 15:09:02,865 - get_align_summary_data - INFO - [Matrix 60%] 600 alignments
    2014-05-18 15:09:02,865 - get_align_summary_data - INFO - [Matrix 65%] 600 alignments
    2014-05-18 15:09:02,866 - get_align_summary_data - INFO - [Matrix 70%] 600 alignments
    2014-05-18 15:09:02,866 - get_align_summary_data - INFO - [Matrix 75%] 600 alignments
    2014-05-18 15:09:02,866 - get_align_summary_data - INFO - [Matrix 80%] 554 alignments
    2014-05-18 15:09:02,866 - get_align_summary_data - INFO - [Matrix 85%] 500 alignments
    2014-05-18 15:09:02,866 - get_align_summary_data - INFO - [Matrix 90%] 384 alignments
    2014-05-18 15:09:02,866 - get_align_summary_data - INFO - [Matrix 95%] 212 alignments
    2014-05-18 15:09:02,866 - get_align_summary_data - INFO - ------------------------ Character counts -----------------------
    2014-05-18 15:09:02,866 - get_align_summary_data - INFO - [Characters] '-' is present 4,008,813 times
    2014-05-18 15:09:02,867 - get_align_summary_data - INFO - [Characters] 'A' is present 3,318,039 times
    2014-05-18 15:09:02,867 - get_align_summary_data - INFO - [Characters] 'C' is present 2,786,411 times
    2014-05-18 15:09:02,867 - get_align_summary_data - INFO - [Characters] 'G' is present 2,764,222 times
    2014-05-18 15:09:02,867 - get_align_summary_data - INFO - [Characters] 'T' is present 3,339,570 times
    2014-05-18 15:09:02,867 - get_align_summary_data - INFO - ================ Completed get_align_summary_data ===============

- get informative sites::

    python ~/git/phyluce/bin/align/get_informative_sites.py \
        --input-format nexus \
        --output informative-sites.csv \
        --cores 12 mafft-nexus-min-33-taxa

- prep raxml file (haven't added logging yet)::

    python ~/git/phyluce/bin/align/format_nexus_files_for_raxml.py \
        --alignments mafft-nexus-min-33-taxa \
        --output mafft-raxml \
        --log-path log

    scp mafft-raxml/mafft-nexus-min-33-taxa.phylip brant@copy.rcc.uga.edu:/home/tcglab/brant/working/hymenoptera/trinity-kmer1/trimmed/

- run raxml 8.0.19 on zcluster::

    #!/bin/bash
    cd /home/tcglab/brant/working/hymenoptera/trinity-kmer1/trimmed/
    /usr/local/mpich2/3.0.4/gcc447/bin/mpirun -np $NSLOTS /home/tcglab/brant/git/brant/standard-RAxML/raxmlHPC-MPI-SSE3 -m GTRGAMMA -N 20 -p 982367231 -n best -s mafft-nexus-min-33-taxa.phylip -o nasvit2

    qsub -q rcc-30d -pe mpi 10 hymv.sh

- run bootreps::

    #!/bin/bash
    cd /home/tcglab/brant/working/hymenoptera/trinity-kmer1/trimmed/
    /usr/local/mpich2/3.0.4/gcc447/bin/mpirun -np $NSLOTS /home/tcglab/brant/git/brant/standard-RAxML/raxmlHPC-MPI-SSE3 -m GTRGAMMA -N 100 -p 982367231 -b 232858113 -n bootrep -s mafft-nexus-min-33-taxa.phylip -o nasvit2

    qsub -q rcc-30d -pe mpi 10 hymvb.sh

- run autoMRE::

    ~/git/raxml/raxmlHPC-SSE3 -I autoMRE -p 982367231 -m GTRGAMMA -n autoMRE -z RAxML_bootstrap.bootrep

    Found 100 trees in File RAxML_bootstrap.bootrep

    # Trees    Avg WRF in %    # Perms: wrf <= 3.00 %
    50         0.49            100
    Converged after 50 replicates

- make final tree::

    ~/git/raxml/raxmlHPC-SSE3 -m GTRGAMMA -p 982367231 -f b -t RAxML_bestTree.best -z RAxML_bootstrap.bootrep -n FINAL -o acordulecera_pellucida,nematus_tibialis,taxonus_pallidicornis

- rename tips::

    python ~/git/phyluce/bin/genetrees/rename_genetree_leaves.py \
        --input 05-26-2014-trinity-WITH-SAWFLIES-100-bootrep.tree \
        --output renamed-tips/05-26-2014-trinity-WITH-SAWFLIES-100-bootrep-RENAMED-TIPS.tree \
        --config tip-names.conf \
        --section "with sawflies" \
        --output-format newick

## incomplete untrimmed data without sawfly

- create locus file::

    python ~/git/phyluce/bin/assembly/get_match_counts.py \
        --locus-db untrimmed-lastz/probe.matches.sqlite \
        --taxon-list-config ../hymenoptera-data-sets.conf \
        --taxon-group 'without sawflies' \
        --output
taxon-sets/trinity-kmer1-WITHOUT-SAWFLIES-incomplete/trinity-kmer1-WITHOUT-SAWFLIES-incomplete.conf \ --extend-locus-db ../in-silico/in-silico-lastz/probe.matches.sqlite \ --log-path untrimmed-log \ --incomplete-matrix 2014-05-21 08:56:43,073 - get_match_counts - INFO - There are 41 taxa in the taxon-group '[without sawflies]' in the config file hymenoptera-data-sets.conf 2014-05-21 08:56:43,074 - get_match_counts - INFO - Getting UCE names from database 2014-05-21 08:56:43,100 - get_match_counts - INFO - There are 1510 total UCE loci in the database 2014-05-21 08:56:43,684 - get_match_counts - INFO - Getting UCE matches by organism to generate a INCOMPLETE matrix 2014-05-21 08:56:43,690 - get_match_counts - INFO - There are 1365 UCE loci in an INCOMPLETE matrix 2014-05-21 08:56:43,692 - get_match_counts - INFO - Writing the taxa and loci in the data matrix to /nfs/data1/working/sbrady-hymenoptera/trinity-kmer1/taxon-sets/trinity-kmer1-WITHOUT-SAWFLIES-incomplete/trinity-kmer1-WITHOUT-SAWFLIES-incomplete.conf - get fastas:: python ~/git/phyluce/bin/assembly/get_fastas_from_match_counts.py \ --contigs ../../contigs-proper/contigs \ --locus-db ../../untrimmed-lastz/probe.matches.sqlite \ --match-count-output trinity-kmer1-WITHOUT-SAWFLIES-incomplete.conf \ --output trinity-kmer1-WITHOUT-SAWFLIES-incomplete.fasta \ --incomplete-matrix trinity-kmer1-WITHOUT-SAWFLIES-incomplete.incomplete \ --extend-locus-db ../../../in-silico/in-silico-lastz/probe.matches.sqlite \ --extend-locus-contigs ../../../in-silico/outgroup-fasta \ --log-path log - align fastas (many loci dropped - check logs):: python ~/git/phyluce/bin/align/seqcap_align_2.py \ --fasta trinity-kmer1-WITHOUT-SAWFLIES-incomplete.fasta \ --output mafft-nexus-notrim \ --taxa 41 \ --incomplete-matrix \ --cores 12 \ --log-path log \ --no-trim - strip locus name python ~/git/phyluce/bin/align/remove_locus_name_from_nexus_lines.py --alignments mafft-nexus-notrim --output mafft-nexus-notrim-clean --cores 12 - trim with 
trimal: mkdir mafft-nexus-trimal && cd mafft-nexus-trimal for i in ../mafft-nexus-notrim-clean/*.nexus; do trimal -in $i -out ./$i:t:r.fasta -automated1 -fasta | tee trimal.out; done - convert: python ~/git/phyluce/bin/align/convert_one_align_to_another.py --alignments mafft-nexus-trimal --output mafft-nexus-trimal-nexus --input-format fasta --output-format nexus --cores 12 --log-path - get summary data:: python ~/git/phyluce/bin/align/get_align_summary_data.py \ --alignments mafft-nexus-trimal-nexus \ --input-format nexus \ --cores 12 \ --log-path log 2014-05-21 13:35:40,411 - get_align_summary_data - INFO - ----------------------- Alignment summary ----------------------- 2014-05-21 13:35:40,412 - get_align_summary_data - INFO - [Alignments] loci: 1,330 2014-05-21 13:35:40,412 - get_align_summary_data - INFO - [Alignments] length: 1,367,943 2014-05-21 13:35:40,412 - get_align_summary_data - INFO - [Alignments] mean: 1028.53 2014-05-21 13:35:40,413 - get_align_summary_data - INFO - [Alignments] 95% CI: 37.34 2014-05-21 13:35:40,413 - get_align_summary_data - INFO - [Alignments] min: 76 2014-05-21 13:35:40,413 - get_align_summary_data - INFO - [Alignments] max: 3,787 2014-05-21 13:35:40,414 - get_align_summary_data - INFO - ------------------------- Taxon summary ------------------------- 2014-05-21 13:35:40,415 - get_align_summary_data - INFO - [Taxa] mean: 24.37 2014-05-21 13:35:40,415 - get_align_summary_data - INFO - [Taxa] 95% CI: 0.74 2014-05-21 13:35:40,415 - get_align_summary_data - INFO - [Taxa] min: 3 2014-05-21 13:35:40,415 - get_align_summary_data - INFO - [Taxa] max: 41 2014-05-21 13:35:40,416 - get_align_summary_data - INFO - ----------------- Missing data from trim summary ---------------- 2014-05-21 13:35:40,416 - get_align_summary_data - INFO - [Missing] mean: 0.00 2014-05-21 13:35:40,416 - get_align_summary_data - INFO - [Missing] 95% CI: 0.00 2014-05-21 13:35:40,416 - get_align_summary_data - INFO - [Missing] min: 0.00 2014-05-21 13:35:40,416 - 
get_align_summary_data - INFO - [Missing] max: 0.00 2014-05-21 13:35:40,433 - get_align_summary_data - INFO - -------------------- Character count summary -------------------- 2014-05-21 13:35:40,433 - get_align_summary_data - INFO - [All characters] 27,272,649 2014-05-21 13:35:40,433 - get_align_summary_data - INFO - [Nucleotides] 19,980,558 2014-05-21 13:35:40,435 - get_align_summary_data - INFO - ---------------- Data matrix completeness summary --------------- 2014-05-21 13:35:40,436 - get_align_summary_data - INFO - [Matrix 50%] 809 alignments 2014-05-21 13:35:40,436 - get_align_summary_data - INFO - [Matrix 55%] 773 alignments 2014-05-21 13:35:40,436 - get_align_summary_data - INFO - [Matrix 60%] 744 alignments 2014-05-21 13:35:40,436 - get_align_summary_data - INFO - [Matrix 65%] 706 alignments 2014-05-21 13:35:40,436 - get_align_summary_data - INFO - [Matrix 70%] 680 alignments 2014-05-21 13:35:40,436 - get_align_summary_data - INFO - [Matrix 75%] 638 alignments 2014-05-21 13:35:40,436 - get_align_summary_data - INFO - [Matrix 80%] 601 alignments 2014-05-21 13:35:40,436 - get_align_summary_data - INFO - [Matrix 85%] 555 alignments 2014-05-21 13:35:40,437 - get_align_summary_data - INFO - [Matrix 90%] 487 alignments 2014-05-21 13:35:40,437 - get_align_summary_data - INFO - [Matrix 95%] 352 alignments 2014-05-21 13:35:40,437 - get_align_summary_data - INFO - ------------------------ Character counts ----------------------- 2014-05-21 13:35:40,437 - get_align_summary_data - INFO - [Characters] '-' is present 7,292,091 times 2014-05-21 13:35:40,437 - get_align_summary_data - INFO - [Characters] 'A' is present 5,420,131 times 2014-05-21 13:35:40,437 - get_align_summary_data - INFO - [Characters] 'C' is present 4,570,491 times 2014-05-21 13:35:40,437 - get_align_summary_data - INFO - [Characters] 'G' is present 4,542,865 times 2014-05-21 13:35:40,438 - get_align_summary_data - INFO - [Characters] 'T' is present 5,447,071 times 2014-05-21 13:35:40,438 - 
get_align_summary_data - INFO - ================ Completed get_align_summary_data =============== - copy alignments having ≥ 30 taxa at minimum into nexus-min-30-taxa (75% complete):: python ~/git/phyluce/bin/align/get_only_loci_with_min_taxa.py \ --alignments mafft-nexus-trimal-nexus \ --taxa 41 \ --output mafft-nexus-min-30-taxa \ --percent 0.75 \ --cores 12 \ --log-path log 2014-05-21 13:36:24,870 - get_only_loci_with_min_taxa - INFO - Copied 638 alignments of 1330 total containing ≥ 0.75 proportion of taxa (n = 30) - get summary stats before adding missing data characters:: python ~/git/phyluce/bin/align/get_align_summary_data.py \ --alignments mafft-nexus-min-30-taxa \ --cores 12 \ --log-path log 2014-05-21 13:36:45,010 - get_align_summary_data - INFO - ----------------------- Alignment summary ----------------------- 2014-05-21 13:36:45,010 - get_align_summary_data - INFO - [Alignments] loci: 638 2014-05-21 13:36:45,010 - get_align_summary_data - INFO - [Alignments] length: 470,258 2014-05-21 13:36:45,010 - get_align_summary_data - INFO - [Alignments] mean: 737.08 2014-05-21 13:36:45,011 - get_align_summary_data - INFO - [Alignments] 95% CI: 46.40 2014-05-21 13:36:45,011 - get_align_summary_data - INFO - [Alignments] min: 84 2014-05-21 13:36:45,011 - get_align_summary_data - INFO - [Alignments] max: 3,787 2014-05-21 13:36:45,012 - get_align_summary_data - INFO - ------------------------- Taxon summary ------------------------- 2014-05-21 13:36:45,012 - get_align_summary_data - INFO - [Taxa] mean: 37.21 2014-05-21 13:36:45,012 - get_align_summary_data - INFO - [Taxa] 95% CI: 0.22 2014-05-21 13:36:45,012 - get_align_summary_data - INFO - [Taxa] min: 30 2014-05-21 13:36:45,012 - get_align_summary_data - INFO - [Taxa] max: 41 2014-05-21 13:36:45,013 - get_align_summary_data - INFO - ----------------- Missing data from trim summary ---------------- 2014-05-21 13:36:45,013 - get_align_summary_data - INFO - [Missing] mean: 0.00 2014-05-21 13:36:45,013 - 
get_align_summary_data - INFO - [Missing] 95% CI: 0.00 2014-05-21 13:36:45,013 - get_align_summary_data - INFO - [Missing] min: 0.00 2014-05-21 13:36:45,013 - get_align_summary_data - INFO - [Missing] max: 0.00 2014-05-21 13:36:45,021 - get_align_summary_data - INFO - -------------------- Character count summary -------------------- 2014-05-21 13:36:45,021 - get_align_summary_data - INFO - [All characters] 17,473,546 2014-05-21 13:36:45,022 - get_align_summary_data - INFO - [Nucleotides] 12,999,368 2014-05-21 13:36:45,022 - get_align_summary_data - INFO - ---------------- Data matrix completeness summary --------------- 2014-05-21 13:36:45,023 - get_align_summary_data - INFO - [Matrix 50%] 638 alignments 2014-05-21 13:36:45,023 - get_align_summary_data - INFO - [Matrix 55%] 638 alignments 2014-05-21 13:36:45,023 - get_align_summary_data - INFO - [Matrix 60%] 638 alignments 2014-05-21 13:36:45,023 - get_align_summary_data - INFO - [Matrix 65%] 638 alignments 2014-05-21 13:36:45,023 - get_align_summary_data - INFO - [Matrix 70%] 638 alignments 2014-05-21 13:36:45,023 - get_align_summary_data - INFO - [Matrix 75%] 638 alignments 2014-05-21 13:36:45,023 - get_align_summary_data - INFO - [Matrix 80%] 601 alignments 2014-05-21 13:36:45,023 - get_align_summary_data - INFO - [Matrix 85%] 555 alignments 2014-05-21 13:36:45,024 - get_align_summary_data - INFO - [Matrix 90%] 487 alignments 2014-05-21 13:36:45,024 - get_align_summary_data - INFO - [Matrix 95%] 352 alignments 2014-05-21 13:36:45,024 - get_align_summary_data - INFO - ------------------------ Character counts ----------------------- 2014-05-21 13:36:45,024 - get_align_summary_data - INFO - [Characters] '-' is present 4,474,178 times 2014-05-21 13:36:45,024 - get_align_summary_data - INFO - [Characters] 'A' is present 3,545,792 times 2014-05-21 13:36:45,024 - get_align_summary_data - INFO - [Characters] 'C' is present 2,960,388 times 2014-05-21 13:36:45,024 - get_align_summary_data - INFO - [Characters] 'G' is 
present 2,925,657 times 2014-05-21 13:36:45,025 - get_align_summary_data - INFO - [Characters] 'T' is present 3,567,531 times 2014-05-21 13:36:45,025 - get_align_summary_data - INFO - ================ Completed get_align_summary_data =============== - get informative sites: python ~/git/phyluce/bin/align/get_informative_sites.py \ --input-format nexus \ --output informative-sites.csv \ --cores 12 mafft-nexus-min-30-taxa - prep raxml file (haven't added logging yet):: python ~/git/phyluce/bin/align/format_nexus_files_for_raxml.py \ --alignments mafft-nexus-min-30-taxa \ --output mafft-raxml \ --log-path log scp mafft-raxml/mafft-nexus-min-30-taxa.phylip brant@copy.rcc.uga.edu:/home/tcglab/brant/working/hymenoptera/trinity-kmer1/nosawflies/ - run raxml 8.0.19 on zcluster:: #!/bin/bash cd /home/tcglab/brant/working/hymenoptera/trinity-kmer1/nosawflies/ /usr/local/mpich2/3.0.4/gcc447/bin/mpirun -np $NSLOTS /home/tcglab/brant/git/brant/standard-RAxML/raxmlHPC-MPI-SSE3 -m GTRGAMMA -N 20 -p 386755737 -n best -s mafft-nexus-min-30-taxa.phylip -o cersol1 qsub -q rcc-30d -pe mpi 20 hymns.sh - run bootreps:: #!/bin/bash cd /home/tcglab/brant/working/hymenoptera/trinity-kmer1/nosawflies/ /usr/local/mpich2/3.0.4/gcc447/bin/mpirun -np $NSLOTS /home/tcglab/brant/git/brant/standard-RAxML/raxmlHPC-MPI-SSE3 -m GTRGAMMA -N 100 -p 386755737 -b 678262178 -n bootrep -s mafft-nexus-min-30-taxa.phylip -o cersol1 qsub -q rcc-30d -pe mpi 20 hymnsb.sh - run autoMRE: ~/git/raxml/raxmlHPC-SSE3 -I autoMRE -p 386755737 -m GTRGAMMA -n autoMRE -z RAxML_bootstrap.bootrep # Trees Avg WRF in % # Perms: wrf <= 3.00 % 50 0.23 100 Converged after 50 replicates - make final tree:: ~/git/raxml/raxmlHPC-SSE3 -m GTRGAMMA -p 386755737 -f b -t RAxML_bestTree.best -z RAxML_bootstrap.bootrep -n FINAL -o cersol1 - rename tips: python ~/git/phyluce/bin/genetrees/rename_genetree_leaves.py \ --input 05-26-2014-trinity-WITHOUT-SAWFLIES-100-bootrep.tree \ --output 
renamed-tips/05-26-2014-trinity-WITHOUT-SAWFLIES-100-bootrep-RENAMED-TIPS.tree \ --config tip-names.conf \ --section "no sawflies" \ --output-format newick Untrimmed INCOMPLETE data set - MAFFT -------------------------------------- - create locus file:: python ~/git/phyluce/bin/assembly/get_match_counts.py \ --locus-db untrimmed-lastz/probe.matches.sqlite \ --taxon-list-config ../hymenoptera-data-sets.conf \ --taxon-group 'no sawflies no apiFlo' \ --output taxon-sets/TRINITY-KMER1-no-sawflies-no-apiflo-INCOMPLETE/TRINITY-KMER1-no-sawflies-no-apiflo-INCOMPLETE.conf \ --extend-locus-db /nfs/data1/working/bfaircloth-hymenoptera-genome/outgroup-loci/probe.matches.sqlite \ --log-path untrimmed-log \ --incomplete-matrix - get fastas:: python ~/git/phyluce/bin/assembly/get_fastas_from_match_counts.py \ --contigs ../../contigs-proper/contigs \ --locus-db ../../untrimmed-lastz/probe.matches.sqlite \ --match-count-output TRINITY-KMER1-no-sawflies-no-apiflo-INCOMPLETE.conf \ --output TRINITY-KMER1-no-sawflies-no-apiflo-INCOMPLETE.fasta \ --incomplete-matrix TRINITY-KMER1-no-sawflies-no-apiflo-INCOMPLETE.incomplete \ --extend-locus-db /nfs/data1/working/bfaircloth-hymenoptera-genome/outgroup-loci/probe.matches.sqlite \ --extend-locus-contigs /nfs/data1/working/bfaircloth-hymenoptera-genome/outgroup-loci/ \ --log-path log - align fastas (many loci dropped - check logs):: python ~/git/phyluce/bin/align/seqcap_align_2.py \ --fasta TRINITY-KMER1-WITH-sawflies-no-apiflo-INCOMPLETE.fasta \ --output mafft-nexus \ --taxa 40 \ --incomplete-matrix \ --cores 12 \ --log-path log - get summary data:: python ~/git/phyluce/bin/align/get_align_summary_data.py \ --alignments mafft-nexus \ --input-format nexus \ --cores 12 \ --log-path log 2014-03-27 09:02:59,251 - get_align_summary_data - INFO - ----------------------- Alignment summary ----------------------- 2014-03-27 09:02:59,251 - get_align_summary_data - INFO - [Alignments] loci: 1,306 2014-03-27 09:02:59,251 - get_align_summary_data - 
INFO - [Alignments] length: 886,397 2014-03-27 09:02:59,251 - get_align_summary_data - INFO - [Alignments] mean: 678.71 2014-03-27 09:02:59,252 - get_align_summary_data - INFO - [Alignments] 95% CI: 20.79 2014-03-27 09:02:59,252 - get_align_summary_data - INFO - [Alignments] min: 100 2014-03-27 09:02:59,252 - get_align_summary_data - INFO - [Alignments] max: 2,292 2014-03-27 09:02:59,253 - get_align_summary_data - INFO - ------------------------- Taxon summary ------------------------- 2014-03-27 09:02:59,253 - get_align_summary_data - INFO - [Taxa] mean: 23.52 2014-03-27 09:02:59,254 - get_align_summary_data - INFO - [Taxa] 95% CI: 0.62 2014-03-27 09:02:59,254 - get_align_summary_data - INFO - [Taxa] min: 3 2014-03-27 09:02:59,254 - get_align_summary_data - INFO - [Taxa] max: 37 2014-03-27 09:02:59,267 - get_align_summary_data - INFO - ----------------- Missing data from trim summary ---------------- 2014-03-27 09:02:59,267 - get_align_summary_data - INFO - [Missing] mean: 6.36 2014-03-27 09:02:59,267 - get_align_summary_data - INFO - [Missing] 95% CI: 0.31 2014-03-27 09:02:59,267 - get_align_summary_data - INFO - [Missing] min: 0.00 2014-03-27 09:02:59,267 - get_align_summary_data - INFO - [Missing] max: 35.25 2014-03-27 09:02:59,282 - get_align_summary_data - INFO - -------------------- Character count summary -------------------- 2014-03-27 09:02:59,282 - get_align_summary_data - INFO - [All characters] 20,711,501 2014-03-27 09:02:59,282 - get_align_summary_data - INFO - [Nucleotides] 11,697,665 2014-03-27 09:02:59,298 - get_align_summary_data - INFO - ---------------- Data matrix completeness summary --------------- 2014-03-27 09:02:59,310 - get_align_summary_data - INFO - [Matrix 50%] 838 alignments 2014-03-27 09:02:59,311 - get_align_summary_data - INFO - [Matrix 55%] 802 alignments 2014-03-27 09:02:59,311 - get_align_summary_data - INFO - [Matrix 60%] 766 alignments 2014-03-27 09:02:59,311 - get_align_summary_data - INFO - [Matrix 65%] 731 alignments 
    2014-03-27 09:02:59,311 - get_align_summary_data - INFO - [Matrix 70%] 715 alignments
    2014-03-27 09:02:59,311 - get_align_summary_data - INFO - [Matrix 75%] 666 alignments
    2014-03-27 09:02:59,311 - get_align_summary_data - INFO - [Matrix 80%] 630 alignments
    2014-03-27 09:02:59,311 - get_align_summary_data - INFO - [Matrix 85%] 577 alignments
    2014-03-27 09:02:59,311 - get_align_summary_data - INFO - [Matrix 90%] 480 alignments
    2014-03-27 09:02:59,311 - get_align_summary_data - INFO - [Matrix 95%] 281 alignments

- some loci contain 'X' bases; remove them::

    python ~/git/phyluce/bin/align/screen_alignments_for_problems.py \
        --alignments mafft-nexus \
        --output mafft-nexus-NO-X

    2014-03-27 09:04:26,376 - screen_alignments_for_problems - WARNING - Removed locus uce-343.nexus due to presence of X bases
    2014-03-27 09:04:26,377 - screen_alignments_for_problems - WARNING - Removed locus uce-489.nexus due to presence of X bases
    2014-03-27 09:04:26,377 - screen_alignments_for_problems - WARNING - Removed locus uce-344.nexus due to presence of X bases
    2014-03-27 09:04:26,377 - screen_alignments_for_problems - WARNING - Removed locus uce-1018.nexus due to presence of X bases
    2014-03-27 09:04:26,377 - screen_alignments_for_problems - WARNING - Removed locus uce-982.nexus due to presence of X bases
    2014-03-27 09:04:26,377 - screen_alignments_for_problems - WARNING - Removed locus uce-165.nexus due to presence of X bases
    2014-03-27 09:04:26,377 - screen_alignments_for_problems - WARNING - Removed locus uce-934.nexus due to presence of X bases
    2014-03-27 09:04:26,377 - screen_alignments_for_problems - WARNING - Removed locus uce-342.nexus due to presence of X bases
    2014-03-27 09:04:26,377 - screen_alignments_for_problems - WARNING - Removed locus uce-763.nexus due to presence of X bases
    2014-03-27 09:04:26,378 - screen_alignments_for_problems - WARNING - Removed locus uce-346.nexus due to presence of X bases
    2014-03-27 09:04:26,378 - screen_alignments_for_problems - WARNING - Removed locus uce-1042.nexus due to presence of X bases
    2014-03-27 09:04:26,378 - screen_alignments_for_problems - WARNING - Removed locus uce-1225.nexus due to presence of X bases
    2014-03-27 09:04:26,378 - screen_alignments_for_problems - WARNING - Removed locus uce-117.nexus due to presence of X bases
    2014-03-27 09:04:26,378 - screen_alignments_for_problems - WARNING - Removed locus uce-942.nexus due to presence of X bases
    2014-03-27 09:04:26,378 - screen_alignments_for_problems - WARNING - Removed locus uce-1017.nexus due to presence of X bases
    2014-03-27 09:04:26,378 - screen_alignments_for_problems - WARNING - Removed locus uce-1041.nexus due to presence of X bases
    2014-03-27 09:04:26,378 - screen_alignments_for_problems - WARNING - Removed locus uce-1477.nexus due to presence of X bases
    2014-03-27 09:04:26,379 - screen_alignments_for_problems - WARNING - Removed locus uce-1298.nexus due to presence of X bases
    2014-03-27 09:04:26,379 - screen_alignments_for_problems - WARNING - Removed locus uce-1208.nexus due to presence of X bases

- get summary data::

    python ~/git/phyluce/bin/align/get_align_summary_data.py \
        --alignments mafft-nexus-NO-X \
        --input-format nexus \
        --cores 12 \
        --log-path log

    2014-03-27 09:05:09,346 - get_align_summary_data - INFO - ----------------------- Alignment summary -----------------------
    2014-03-27 09:05:09,346 - get_align_summary_data - INFO - [Alignments] loci: 1,287
    2014-03-27 09:05:09,346 - get_align_summary_data - INFO - [Alignments] length: 874,352
    2014-03-27 09:05:09,346 - get_align_summary_data - INFO - [Alignments] mean: 679.37
    2014-03-27 09:05:09,346 - get_align_summary_data - INFO - [Alignments] 95% CI: 20.90
    2014-03-27 09:05:09,346 - get_align_summary_data - INFO - [Alignments] min: 100
    2014-03-27 09:05:09,346 - get_align_summary_data - INFO - [Alignments] max: 2,292
    2014-03-27 09:05:09,348 - get_align_summary_data - INFO - ------------------------- Taxon summary -------------------------
    2014-03-27 09:05:09,348 - get_align_summary_data - INFO - [Taxa] mean: 23.50
    2014-03-27 09:05:09,348 - get_align_summary_data - INFO - [Taxa] 95% CI: 0.63
    2014-03-27 09:05:09,348 - get_align_summary_data - INFO - [Taxa] min: 3
    2014-03-27 09:05:09,348 - get_align_summary_data - INFO - [Taxa] max: 37
    2014-03-27 09:05:09,349 - get_align_summary_data - INFO - ----------------- Missing data from trim summary ----------------
    2014-03-27 09:05:09,349 - get_align_summary_data - INFO - [Missing] mean: 6.38
    2014-03-27 09:05:09,349 - get_align_summary_data - INFO - [Missing] 95% CI: 0.31
    2014-03-27 09:05:09,349 - get_align_summary_data - INFO - [Missing] min: 0.00
    2014-03-27 09:05:09,349 - get_align_summary_data - INFO - [Missing] max: 35.25
    2014-03-27 09:05:09,364 - get_align_summary_data - INFO - -------------------- Character count summary --------------------
    2014-03-27 09:05:09,364 - get_align_summary_data - INFO - [All characters] 20,387,258
    2014-03-27 09:05:09,364 - get_align_summary_data - INFO - [Nucleotides] 11,503,380
    2014-03-27 09:05:09,366 - get_align_summary_data - INFO - ---------------- Data matrix completeness summary ---------------
    2014-03-27 09:05:09,366 - get_align_summary_data - INFO - [Matrix 50%] 825 alignments
    2014-03-27 09:05:09,366 - get_align_summary_data - INFO - [Matrix 55%] 790 alignments
    2014-03-27 09:05:09,366 - get_align_summary_data - INFO - [Matrix 60%] 755 alignments
    2014-03-27 09:05:09,366 - get_align_summary_data - INFO - [Matrix 65%] 720 alignments
    2014-03-27 09:05:09,366 - get_align_summary_data - INFO - [Matrix 70%] 704 alignments
    2014-03-27 09:05:09,367 - get_align_summary_data - INFO - [Matrix 75%] 656 alignments
    2014-03-27 09:05:09,367 - get_align_summary_data - INFO - [Matrix 80%] 620 alignments
    2014-03-27 09:05:09,367 - get_align_summary_data - INFO - [Matrix 85%] 568 alignments
    2014-03-27 09:05:09,367 - get_align_summary_data - INFO - [Matrix 90%] 472 alignments
    2014-03-27 09:05:09,367 - get_align_summary_data - INFO - [Matrix 95%] 274 alignments

- copy alignments having ≥ 27 taxa (75% complete) into mafft-nexus-min-27-taxa::

    python ~/git/phyluce/bin/align/get_only_loci_with_min_taxa.py \
        --alignments mafft-nexus-NO-X \
        --taxa 37 \
        --output mafft-nexus-min-27-taxa \
        --percent 0.75 \
        --cores 12 \
        --log-path log

    2014-03-27 09:05:59,023 - get_only_loci_with_min_taxa - INFO - Copied 656 alignments of 1287 total containing ≥ 0.75 proportion of taxa (n = 27)

- get summary stats before adding missing data characters::

    python ~/git/phyluce/bin/align/get_align_summary_data.py \
        --alignments mafft-nexus-min-27-taxa \
        --cores 12 \
        --log-path log

    2014-03-27 09:06:28,583 - get_align_summary_data - INFO - ----------------------- Alignment summary -----------------------
    2014-03-27 09:06:28,584 - get_align_summary_data - INFO - [Alignments] loci: 656
    2014-03-27 09:06:28,584 - get_align_summary_data - INFO - [Alignments] length: 442,665
    2014-03-27 09:06:28,584 - get_align_summary_data - INFO - [Alignments] mean: 674.79
    2014-03-27 09:06:28,584 - get_align_summary_data - INFO - [Alignments] 95% CI: 30.96
    2014-03-27 09:06:28,584 - get_align_summary_data - INFO - [Alignments] min: 104
    2014-03-27 09:06:28,584 - get_align_summary_data - INFO - [Alignments] max: 2,292
    2014-03-27 09:06:28,585 - get_align_summary_data - INFO - ------------------------- Taxon summary -------------------------
    2014-03-27 09:06:28,585 - get_align_summary_data - INFO - [Taxa] mean: 33.57
    2014-03-27 09:06:28,585 - get_align_summary_data - INFO - [Taxa] 95% CI: 0.19
    2014-03-27 09:06:28,585 - get_align_summary_data - INFO - [Taxa] min: 27
    2014-03-27 09:06:28,585 - get_align_summary_data - INFO - [Taxa] max: 37
    2014-03-27 09:06:28,586 - get_align_summary_data - INFO - ----------------- Missing data from trim summary ----------------
    2014-03-27 09:06:28,586 - get_align_summary_data - INFO - [Missing] mean: 4.48
    2014-03-27 09:06:28,586 - get_align_summary_data - INFO - [Missing] 95% CI: 0.24
    2014-03-27 09:06:28,586 - get_align_summary_data - INFO - [Missing] min: 0.00
    2014-03-27 09:06:28,586 - get_align_summary_data - INFO - [Missing] max: 17.84
    2014-03-27 09:06:28,594 - get_align_summary_data - INFO - -------------------- Character count summary --------------------
    2014-03-27 09:06:28,594 - get_align_summary_data - INFO - [All characters] 14,881,990
    2014-03-27 09:06:28,594 - get_align_summary_data - INFO - [Nucleotides] 8,148,960
    2014-03-27 09:06:28,595 - get_align_summary_data - INFO - ---------------- Data matrix completeness summary ---------------
    2014-03-27 09:06:28,595 - get_align_summary_data - INFO - [Matrix 50%] 656 alignments
    2014-03-27 09:06:28,595 - get_align_summary_data - INFO - [Matrix 55%] 656 alignments
    2014-03-27 09:06:28,595 - get_align_summary_data - INFO - [Matrix 60%] 656 alignments
    2014-03-27 09:06:28,595 - get_align_summary_data - INFO - [Matrix 65%] 656 alignments
    2014-03-27 09:06:28,595 - get_align_summary_data - INFO - [Matrix 70%] 656 alignments
    2014-03-27 09:06:28,595 - get_align_summary_data - INFO - [Matrix 75%] 656 alignments
    2014-03-27 09:06:28,596 - get_align_summary_data - INFO - [Matrix 80%] 620 alignments
    2014-03-27 09:06:28,596 - get_align_summary_data - INFO - [Matrix 85%] 568 alignments
    2014-03-27 09:06:28,596 - get_align_summary_data - INFO - [Matrix 90%] 472 alignments
    2014-03-27 09:06:28,596 - get_align_summary_data - INFO - [Matrix 95%] 274 alignments
    2014-03-27 09:06:28,596 - get_align_summary_data - INFO - ------------------------ Character counts -----------------------
    2014-03-27 09:06:28,596 - get_align_summary_data - INFO - [Characters] '-' is present 5,984,986 times
    2014-03-27 09:06:28,596 - get_align_summary_data - INFO - [Characters] '?' is present 748,044 times
    2014-03-27 09:06:28,596 - get_align_summary_data - INFO - [Characters] 'A' is present 2,062,899 times
    2014-03-27 09:06:28,597 - get_align_summary_data - INFO - [Characters] 'C' is present 2,013,911 times
    2014-03-27 09:06:28,597 - get_align_summary_data - INFO - [Characters] 'G' is present 1,982,927 times
    2014-03-27 09:06:28,597 - get_align_summary_data - INFO - [Characters] 'T' is present 2,089,223 times
    2014-03-27 09:06:28,597 - get_align_summary_data - INFO - ================ Completed get_align_summary_data ===============

- add missing data designators::

    python ~/git/phyluce/bin/align/add_missing_data_designators.py \
        --alignments mafft-nexus-min-27-taxa \
        --output mafft-nexus-min-27-taxa-with-missing \
        --match-count-output TRINITY-KMER1-no-sawflies-no-apiflo-INCOMPLETE.conf \
        --incomplete-matrix TRINITY-KMER1-no-sawflies-no-apiflo-INCOMPLETE.incomplete \
        --cores 12

- prep raxml file (haven't added logging yet)::

    mkdir mafft-raxml
    python ~/git/phyluce/bin/align/format_nexus_files_for_raxml.py \
        --alignments mafft-nexus-min-27-taxa-with-missing \
        --output mafft-raxml/TRINITY-KMER1-no-sawflies-no-apiflo-INCOMPLETE.phylip \
        --log-path log

- make scratch dir::

    mkdir /scratch/sbrady-hymenoptera/TRINITY-no-sawflies-no-apiflo/
    cp mafft-raxml/TRINITY-KMER1-no-sawflies-no-apiflo-INCOMPLETE.phylip /scratch/sbrady-hymenoptera/TRINITY-no-sawflies-no-apiflo/

- run raxml 7.2.6::

    cd /scratch/sbrady-hymenoptera/TRINITY-no-sawflies-no-apiflo/
    raxmlHPC-PTHREADS-SSE3 -m GTRGAMMA -p 505217375 -s TRINITY-KMER1-no-sawflies-no-apiflo-INCOMPLETE.phylip -N 20 -n best -T 12

    Alignment has 212351 distinct alignment patterns

    Proportion of gaps and completely undetermined characters in this alignment: 50.25%

    RAxML rapid hill-climbing mode

    Using 1 distinct models/data partitions with joint branch length optimization

    Executing 20 inferences on the original alignment using 20 distinct randomized MP trees

    All free model parameters will be estimated by RAxML
    GAMMA model of rate heteorgeneity, ML estimate of alpha-parameter

    GAMMA Model parameters will be estimated up to an accuracy of 0.1000000000 Log Likelihood units

    Partition: 0
    Alignment Patterns: 212351
    Name: No Name Provided
    DataType: DNA
    Substitution Matrix: GTR

- run bootreps::

    raxmlHPC-PTHREADS-SSE3 -m GTRGAMMA -p 505217375 -b 454951462 -s TRINITY-KMER1-no-sawflies-no-apiflo-INCOMPLETE.phylip -N 100 -n bootrep -T 12

- make final tree::

    raxmlHPC-SSE3 -m GTRGAMMA -p 505217375 -f b -t RAxML_bestTree.best -z RAxML_bootstrap.bootrep -n FINAL -o nasVit2,nasonia_vitripennis

Genetic distance
================

- compute genetic distance from the concatenated alignment
- convert the alignment from PHYLIP to FASTA::

    python ~/git/phyluce/bin/align/convert_one_align_to_another.py \
        --alignments ../mafft-raxml/ \
        --output ./mafft-nexus-min-33-taxa.fasta \
        --input-format phylip-relaxed \
        --output-format fasta

- edit names to short names by hand
- compute genetic distance in PyCogent::

    # PyCogent 1.x (Python 2)
    from cogent import LoadSeqs
    from cogent.phylo import distance
    from cogent.evolve.models import GTR

    # load the concatenated alignment
    al = LoadSeqs("mafft-nexus-min-33-taxa.fasta")
    # estimate pairwise distances under the GTR substitution model
    d = distance.EstimateDistances(al, submodel=GTR())
    d.run()
    print(d)
    # save the distance matrix in PHYLIP format
    d.writeToFile('distance-estimates-with-GTR.phylip', format="phylip")

- copy the distances into Excel, split them with Text to Columns, and save the file
- copy the distances into the read-count-contig-count spreadsheet
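The short names used for the distance calculation were typed in by hand. A minimal scripted alternative rewrites FASTA headers through a lookup table. The entries in `SHORTNAMES` below are hypothetical placeholders, not the codes that were actually used; headers without an entry pass through unchanged.

```python
# Hypothetical long-name -> short-code table; the real short names were
# entered by hand and are not recorded in these notes.
SHORTNAMES = {
    "acordulecera_pellucida": "acopel",
    "nematus_tibialis": "nemtib",
}


def rename_fasta(lines, mapping):
    """Yield FASTA lines with header names replaced through ``mapping``.

    Headers without an entry, and all sequence lines, pass through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            name = line[1:].strip()
            yield ">" + mapping.get(name, name) + "\n"
        else:
            yield line
```

For a whole file: `with open(src) as fin, open(dst, "w") as fout: fout.writelines(rename_fasta(fin, SHORTNAMES))`.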
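The Excel text-to-columns step can also be scripted. This sketch reads a square PHYLIP distance matrix and writes it as CSV with a header row of taxon names. It assumes the taxon count is the first token, one row per line, and whitespace-separated fields with no spaces in names; the exact layout PyCogent writes to `distance-estimates-with-GTR.phylip` is an assumption here, so check the file before relying on it.

```python
import csv


def phylip_distances_to_rows(text):
    """Parse a square PHYLIP distance matrix into ``[name, d1, ..., dn]`` rows.

    Assumes (not verified against PyCogent output): taxon count first, one
    row per line, whitespace-separated fields, no spaces inside names.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    n_taxa = int(lines[0].split()[0])
    rows = []
    for line in lines[1:n_taxa + 1]:
        name, *values = line.split()
        if len(values) != n_taxa:
            raise ValueError(f"{name}: expected {n_taxa} distances, got {len(values)}")
        rows.append([name] + [float(value) for value in values])
    return rows


def write_distance_csv(rows, handle):
    """Write a header row of taxon names, then one row per taxon."""
    writer = csv.writer(handle)
    writer.writerow([""] + [row[0] for row in rows])
    writer.writerows(rows)
```

Open the CSV output with `newline=""` so the `csv` module controls line endings.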
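The trimAl loop in the without-sawflies section depends on zsh: `$i:t` keeps the file name without its directory and `:r` drops the extension, which is what `Path.stem` gives in Python. Because `tee trimal.out` runs inside the loop, it truncates `trimal.out` on every iteration and keeps at most the last locus's output. The sketch below builds the same trimAl commands and, when run, appends every locus's output to one log. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def trimal_commands(in_dir, out_dir):
    """Build one trimAl argument list per ``*.nexus`` file in ``in_dir``.

    ``Path.stem`` is the name without directory or extension, like zsh's
    ``$i:t:r``.
    """
    commands = []
    for nexus in sorted(Path(in_dir).glob("*.nexus")):
        fasta = Path(out_dir) / f"{nexus.stem}.fasta"
        commands.append(["trimal", "-in", str(nexus), "-out", str(fasta),
                         "-automated1", "-fasta"])
    return commands


def run_trimal(commands, log_path):
    """Run each command, appending stdout/stderr to one log instead of overwriting it."""
    with open(log_path, "a") as log:
        for argv in commands:
            result = subprocess.run(argv, capture_output=True, text=True, check=True)
            log.write(result.stdout)
            log.write(result.stderr)
```

For example, `run_trimal(trimal_commands("mafft-nexus-notrim-clean", "mafft-nexus-trimal"), "trimal.out")`, assuming the output directory already exists and `trimal` is on the `PATH`.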
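In the three `get_only_loci_with_min_taxa.py` runs, `--taxa 44`, `41`, and `37` with `--percent 0.75` logged `n = 33`, `30`, and `27`. Those values match truncating `taxa × percent` (30.75 becomes 30 and 27.75 becomes 27). The sketch below reproduces the logged numbers; it is not phyluce's own code.

```python
def min_taxa(total_taxa, percent):
    """Minimum taxa a locus needs; truncation reproduces the logged n values."""
    return int(total_taxa * percent)


def loci_passing(taxa_per_locus, total_taxa, percent):
    """Sorted names of loci whose taxon count meets the minimum."""
    cutoff = min_taxa(total_taxa, percent)
    return sorted(locus for locus, count in taxa_per_locus.items() if count >= cutoff)
```

`min_taxa(41, 0.75)` returns 30, matching the `(n = 30)` log line.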
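The `[Matrix N%]` lines in each summary count alignments that contain at least N% of the taxa. Assuming the cutoff is `floor(total taxa × N / 100)`, the same truncation as the min-taxa step, the 75% counts from the unfiltered summaries equal the numbers that step copied (600, 638, and 656), and every locus in a filtered set is counted at 75% and below. A sketch of that tally, where `taxa_per_locus` is a hypothetical list of per-alignment taxon counts:

```python
def completeness_summary(taxa_per_locus, total_taxa, levels=range(50, 100, 5)):
    """Count loci meeting each completeness level.

    Assumes a locus counts toward level L when it has at least
    floor(total_taxa * L / 100) taxa; integer arithmetic avoids float rounding.
    """
    summary = {}
    for level in levels:
        cutoff = total_taxa * level // 100
        summary[level] = sum(1 for count in taxa_per_locus if count >= cutoff)
    return summary
```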
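The character counts in each summary add up: A + C + G + T gives `[Nucleotides]`, and adding '-' and '?' gives `[All characters]`. The check below uses the `mafft-nexus-min-27-taxa` numbers. The gap-plus-missing share it prints is for that unpadded set; the 50.25% RAxML reported was computed on the `-with-missing` matrix, so the two are not expected to match.

```python
# Character counts logged for mafft-nexus-min-27-taxa (2014-03-27 09:06:28)
counts = {
    "-": 5_984_986,
    "?": 748_044,
    "A": 2_062_899,
    "C": 2_013_911,
    "G": 1_982_927,
    "T": 2_089_223,
}

nucleotides = sum(counts[base] for base in "ACGT")  # logged [Nucleotides]
all_characters = sum(counts.values())                # logged [All characters]
gap_or_missing = (counts["-"] + counts["?"]) / all_characters

print(f"{nucleotides:,} nucleotides of {all_characters:,} characters")
# -> 8,148,960 nucleotides of 14,881,990 characters
print(f"gaps + '?': {gap_or_missing:.2%}")
# -> gaps + '?': 45.24%
```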
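`get_informative_sites.py` reports parsimony-informative sites. Under the usual definition, a column is informative when at least two character states each occur in at least two taxa. This sketch applies that definition to one alignment held as a `{taxon: sequence}` dict. Skipping '-', '?', 'N', and 'X' is an assumption and may not match how phyluce treats gaps and ambiguity codes.

```python
from collections import Counter

# Symbols not counted as character states (assumption: gaps, missing data,
# N and X are ignored; other IUPAC ambiguity codes would count as states).
IGNORED = set("-?NX")


def informative_sites(alignment):
    """Count parsimony-informative columns in a ``{taxon: sequence}`` alignment.

    A column is informative when at least two states each occur in at
    least two taxa.
    """
    sequences = list(alignment.values())
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("sequences must all be the same length")
    informative = 0
    for column in zip(*sequences):
        states = Counter(base.upper() for base in column if base.upper() not in IGNORED)
        if sum(1 for n in states.values() if n >= 2) >= 2:
            informative += 1
    return informative
```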
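`add_missing_data_designators.py` and `format_nexus_files_for_raxml.py` pad taxa that are absent from a locus and then concatenate the loci into one PHYLIP file for RAxML. The sketch below shows the idea on in-memory alignments: absent taxa get '?' for the full locus length, loci are concatenated in name order (an assumption), and the result is written as relaxed PHYLIP. It illustrates the steps; it is not phyluce's implementation.

```python
def concatenate(loci, taxa, missing="?"):
    """Concatenate ``{locus: {taxon: aligned_seq}}`` into one matrix.

    Taxa absent from a locus are padded with ``missing`` for that locus;
    taxa not listed in ``taxa`` are ignored. Loci are joined in sorted name
    order (assumption). Returns the matrix and 1-based, inclusive
    ``(locus, start, end)`` charsets.
    """
    matrix = {taxon: [] for taxon in taxa}
    charsets = []
    start = 1
    for locus in sorted(loci):
        alignment = loci[locus]
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"{locus}: sequences are not all the same length")
        length = lengths.pop()
        for taxon in taxa:
            matrix[taxon].append(alignment.get(taxon, missing * length))
        charsets.append((locus, start, start + length - 1))
        start += length
    return {taxon: "".join(parts) for taxon, parts in matrix.items()}, charsets


def to_relaxed_phylip(matrix):
    """Render ``{taxon: seq}`` as relaxed PHYLIP (name, space, sequence)."""
    n_chars = len(next(iter(matrix.values())))
    lines = [f"{len(matrix)} {n_chars}"]
    lines += [f"{taxon} {seq}" for taxon, seq in matrix.items()]
    return "\n".join(lines) + "\n"
```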
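Each tree here came from three RAxML runs: 20 ML searches from randomized parsimony starting trees (`-N 20`, seeded with `-p`), 100 bootstrap replicates (`-N 100`, resampling seeded with `-b`), and `-f b`, which draws bipartition support from the bootstrap trees (`-z`) onto the best tree (`-t`). The sketch collects the flags used for the RAxML 7.2.6 run in one place. The function is hypothetical, and the binary names are the ones used here, which may differ on other machines.

```python
def raxml_commands(phylip, parsimony_seed, bootstrap_seed, outgroup,
                   searches=20, replicates=100, threads=12):
    """Argument lists for the best-tree, bootstrap and support-mapping runs.

    Flags mirror the RAxML 7.2.6 commands in these notes; binary names are
    assumptions about the local install.
    """
    base = ["-m", "GTRGAMMA", "-p", str(parsimony_seed)]
    best = (["raxmlHPC-PTHREADS-SSE3"] + base +
            ["-s", phylip, "-N", str(searches), "-n", "best", "-T", str(threads)])
    boot = (["raxmlHPC-PTHREADS-SSE3"] + base +
            ["-b", str(bootstrap_seed), "-s", phylip, "-N", str(replicates),
             "-n", "bootrep", "-T", str(threads)])
    final = (["raxmlHPC-SSE3"] + base +
             ["-f", "b", "-t", "RAxML_bestTree.best",
              "-z", "RAxML_bootstrap.bootrep", "-n", "FINAL", "-o", outgroup])
    return best, boot, final
```

Each list can be passed to `subprocess.run` from the directory holding the PHYLIP file.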