Published September 18, 2024 | Version v3
Dataset Open

Grass Phylogeny Working Group III: data repository

Authors/Creators

  • 1. Mahidol University
  • 2. Royal Botanic Gardens, Kew
  • 3. Aarhus University
  • 4. Australian Tropical Herbarium
  • 5. National Herbarium of New South Wales
  • 6. University of New South Wales
  • 7. University of Georgia
  • 8. Université Toulouse III - Paul Sabatier
  • 9. University of Sheffield
  • 10. University of Melbourne
  • 11. University of Zaragoza
  • 12. National Herbarium (PE), Institute of Botany, Beijing
  • 13. State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, Chinese Academy of Sciences
  • 14. Iowa State University
  • 15. California Botanic Garden
  • 16. Claremont Graduate University
  • 17. Missouri Botanical Garden
  • 18. Northern Illinois University
  • 19. California State University, Long Beach
  • 20. Norwegian University of Life Sciences
  • 21. Canadian Museum of Nature, Ottawa
  • 22. Philipps University of Marburg
  • 23. Sorbonne Université
  • 24. Trinity College Dublin
  • 25. Fudan University
  • 26. Inner Mongolia University
  • 27. Pennsylvania State University
  • 28. Stockholm University
  • 29. Tengeru Institute of Community Development
  • 30. Donald Danforth Plant Science Center
  • 31. Arnold Arboretum
  • 32. East Africa Herbarium, National Museums of Kenya
  • 33. Parc Botanique et Zoologique de Tsimbazaza, Antananarivo
  • 34. Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences
  • 35. Sociedad Colombiana del Bambú
  • 36. Western Australian Herbarium
  • 37. University of Alabama, Tuscaloosa
  • 38. National Biodiversity DNA Library, CSIRO
  • 39. Royal Botanic Gardens Victoria
  • 40. Wuhan Botanical Garden
  • 41. National Museum of Natural History, Smithsonian Institution
  • 42. Pittsburg State University
  • 43. Meise Botanic Garden
  • 44. Brisbane Botanic Gardens
  • 45. University of Gothenburg
  • 46. Gothenburg Botanic Garden
  • 47. University of Missouri
  • 48. Kanchanaburi Rajabhat University
  • 49. University of Adelaide
  • 50. Universidade Federal de Uberlândia
  • 51. South China Botanical Garden, Chinese Academy of Sciences
  • 52. Southwest University, Chongqing
  • 53. Instituto de Botanica Darwinion

Description

Phylogenetic analyses of the grass family (Poaceae) using nuclear and plastid data. The data set includes 1153 accessions corresponding to 1133 accepted species. Genomic data were obtained from several sources, including target capture, shotgun sequencing, transcriptomes and annotated genomes. Nuclear markers (the Angiosperms353 gene set) were assembled from short-read data using HybPiper or a custom assembly pipeline optimized for low-coverage shotgun data. Plastid genes were either retrieved from published plastome sequences or assembled in this study using GetOrganelle. The data set also includes the results of a gene tree–species tree reconciliation analysis with GeneRax.

 

Contact persons:

Matheus E. Bianconi (matheus-enrique.bianconi@univ-tlse3.fr), Jan Hackel (jan.hackel@uni-marburg.de), Maria S. Vorontsova (m.vorontsova@kew.org)

 

Content description

1. Metadata

  • gpwgIII_samples_metadata_taxonomy.tsv

Tab-separated file with details for all 1,702 accessions used in this study. Columns:
    • analysis_ID: ID in nuclear analyses
    • analysis_ID_plastome: ID in plastome analyses
    • acc_species: accepted species name
    • acc_species_author: taxonomic species authority
    • acc_genus: accepted genus name
    • acc_genus_author: taxonomic genus authority
    • publication: associated prior publication
    • data type: type of sequence data
    • isolate: laboratory isolate ID
    • voucher_ID: herbarium voucher ID
    • germplasm_ID: germplasm collection ID
    • repo_accession: accession number in a public repository
    • plastome_accession: accession number of the assembled plastome sequence
    • removed_nuclear: reason for removal from the nuclear tree, if applicable
    • removed_plastome: reason for removal from the plastome tree, if applicable
    • soreng2022_genus: genus name in Soreng et al. 2022, https://doi.org/10.1111/jse.12847
    • subtribe, tribe, subfamily, major.clade: classification according to Soreng et al. 2022
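
A minimal Python sketch (standard library only) for loading the metadata table and selecting the accessions kept in a given tree. It assumes the file has a single header row with the column names listed above, and that an empty removed_nuclear or removed_plastome cell means the accession was retained; both are assumptions rather than documented behaviour, and the function names are illustrative.

```python
import csv
from collections import Counter
from typing import Iterable


def read_metadata(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse the tab-separated metadata table into one dict per accession.

    `lines` can be an open file handle (opened with newline="") or any iterable of lines.
    """
    return list(csv.DictReader(lines, delimiter="\t"))


def retained(rows: Iterable[dict[str, str]], removal_column: str) -> list[dict[str, str]]:
    """Keep accessions whose removal-reason column is empty.

    Assumption: an empty cell in removed_nuclear / removed_plastome means "not removed".
    """
    return [row for row in rows if not (row.get(removal_column) or "").strip()]


def count_by(rows: Iterable[dict[str, str]], column: str) -> Counter:
    """Count accessions per value of a column, e.g. "subfamily" or "tribe"."""
    return Counter(row[column] for row in rows)
```

Passing an open handle for gpwgIII_samples_metadata_taxonomy.tsv to read_metadata, then calling retained(rows, "removed_nuclear") and count_by(..., "subfamily"), would give the per-subfamily composition of the nuclear sample set under the same assumptions.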

2. Nuclear data

- Dataset1 ("main")
Number of samples: 1153
Number of genes: 331
Alignment trimming threshold: gt = 0.1 (sites with > 90% missing data removed)
Genes per sample: > 166

- Dataset2 ("strict trimming")
Number of samples: 1153
Number of genes: 315
Alignment trimming threshold: gt = 0.5 (sites with > 50% missing data removed)
Genes per sample: > 158

- Dataset3 (Dataset1 without shotgun samples)
Number of samples: 841
Number of genes: 331
Alignment trimming threshold: gt = 0.1 (sites with > 90% missing data removed)
Genes per sample: > 166
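
The gt values appear to follow the gap-threshold convention of trimAl's -gt option (the minimum fraction of sequences that must carry data at a site); that trimAl itself was used is an assumption, as this record does not name the trimming tool. A minimal sketch of that column filter, treating only '-' and '?' as missing data (also an assumption):

```python
# Assumption: '-' and '?' are the only characters counted as missing data.
MISSING = frozenset("-?")


def trim_by_gap_threshold(alignment: dict[str, str], gt: float) -> dict[str, str]:
    """Keep columns in which at least a fraction `gt` of sequences carry data.

    gt = 0.1 drops sites with more than 90% missing data; gt = 0.5 drops sites
    with more than 50% missing data.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have the same aligned length")
    n = len(seqs)
    # Indices of columns whose fraction of non-missing characters reaches gt.
    keep = [
        col for col in range(length)
        if sum(seq[col] not in MISSING for seq in seqs) / n >= gt
    ]
    return {name: "".join(seq[col] for col in keep) for name, seq in alignment.items()}
```

Because the comparison is `>=`, a site with exactly 50% missing data survives gt = 0.5, consistent with removing only sites with more than 50% missing data.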

2.1. Raw sequences

Raw Ang353 sequence assemblies for all samples (before trimming and filtering).

  • raw_Ang353_sequences.zip

2.2. Nuclear gene alignments

Trimmed alignments from datasets 1, 2 and 3.

  • alignments_dataset1_main_final.zip
  • alignments_dataset2_strict_trimming_final.zip
  • alignments_dataset3_no_shotgun_final.zip
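
The "Genes per sample" thresholds given for each dataset can be checked against an alignment archive with a short sketch. It assumes each archive holds one FASTA alignment per gene and that the first word of each header line is the sample ID; neither layout detail is documented here, and the function names are illustrative.

```python
import zipfile
from collections import Counter
from typing import Iterable


def samples_in_fasta(text: str) -> set[str]:
    """Sample IDs in one FASTA alignment (first word of each '>' header)."""
    return {
        line[1:].split()[0]
        for line in text.splitlines()
        if line.startswith(">") and line[1:].strip()
    }


def count_genes_per_sample(fasta_texts: Iterable[str]) -> Counter:
    """Number of gene alignments in which each sample occurs."""
    counts: Counter = Counter()
    for text in fasta_texts:
        counts.update(samples_in_fasta(text))  # each sample counted once per gene
    return counts


def genes_per_sample_in_zip(zip_path: str) -> Counter:
    """Apply count_genes_per_sample to every file in a zip archive of alignments."""
    with zipfile.ZipFile(zip_path) as archive:
        texts = [
            archive.read(name).decode("utf-8")
            for name in archive.namelist()
            if not name.endswith("/")  # skip directory entries
        ]
    return count_genes_per_sample(texts)


def samples_failing_min_genes(counts: Counter, threshold: int) -> list[str]:
    """Samples that do NOT satisfy 'genes per sample > threshold'."""
    return sorted(sample for sample, n in counts.items() if n <= threshold)
```

For Dataset1, for instance, samples_failing_min_genes(genes_per_sample_in_zip("alignments_dataset1_main_final.zip"), 166) would be expected to return an empty list if the assumed archive layout holds.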

2.3. Nuclear gene trees
Gene trees inferred using RAxML (GTRCAT model, 100 bootstrap replicates) for the alignments from datasets 1, 2 and 3.

  • gene_trees_dataset1_main_final.zip
  • gene_trees_dataset2_strict_trimming_final.zip
  • gene_trees_dataset3_no_shotgun_final.zip
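
For orientation, one RAxML v8 invocation consistent with the stated settings (GTRCAT, 100 bootstrap replicates) is built below as an argument list. The choice of -f a (rapid bootstrap plus maximum likelihood search in one run), the seeds, the raxmlHPC binary name and the run name are assumptions; the exact command used for these gene trees is not given in this record.

```python
def raxml_gene_tree_command(alignment: str, run_name: str, seed: int = 12345,
                            replicates: int = 100, binary: str = "raxmlHPC") -> list[str]:
    """Argument list for a RAxML v8 rapid bootstrap + ML search under GTRCAT (assumed setup)."""
    return [
        binary,
        "-f", "a",              # rapid bootstrap analysis and best-scoring ML tree in one run
        "-m", "GTRCAT",         # GTR model with the CAT approximation of rate heterogeneity
        "-p", str(seed),        # random seed for the parsimony starting trees
        "-x", str(seed),        # random seed for rapid bootstrapping
        "-N", str(replicates),  # number of bootstrap replicates
        "-s", alignment,        # input alignment file
        "-n", run_name,         # suffix for the RAxML_* output files
    ]
```

The list can be passed to subprocess.run(cmd, check=True); RAxML then writes its RAxML_* output files (for example RAxML_bipartitions.<run name>) to the working directory.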

2.4. Multigene species trees
Multigene species trees inferred with ASTRAL-Pro3 from the gene trees of datasets 1, 2 and 3.

  • astralpro_trees.zip, which includes:
    • trees_Ang353_grasses_dataset1_main_gtrcat.astralpro
    • trees_Ang353_grasses_dataset2_strict_trimming_gtrcat.astralpro
    • trees_Ang353_grasses_dataset3_no_shotgun_gtrcat.astralpro

3. Gene tree–species tree reconciliation

  • generax.zip

Compressed zip archive with the input files and results (including log files) of the GeneRax reconciliation analyses, with one subfolder for each of the four analyses: "all_tribes", "Andropogoneae", "Bambusoideae", "Triticeae".

  • transfers_reconciliation_analyses.zip, which includes:
    • transfers_all_all_tribes.tsv: Tab-separated file with all transfers inferred in the tribe-level Poaceae reconciliation analysis. Each line represents one inferred transfer.
    • transfers_all_Andropogoneae.tsv: Tab-separated file with all transfers inferred in the Andropogoneae reconciliation analysis. Each line represents one inferred transfer.
    • transfers_all_Bambusoideae.tsv: Tab-separated file with all transfers inferred in the Bambusoideae reconciliation analysis. Each line represents one inferred transfer.
    • transfers_all_Triticeae.tsv: Tab-separated file with all transfers inferred in the Triticeae reconciliation analysis. Each line represents one inferred transfer.
    • transfers_counts_all_tribes.tsv: Tab-separated file with aggregated transfer counts, in both directions for each reticulate connection, from the tribe-level Poaceae reconciliation analysis.
    • transfers_counts_Andropogoneae.tsv: Tab-separated file with aggregated transfer counts, in both directions for each reticulate connection, from the Andropogoneae reconciliation analysis.
    • transfers_counts_Bambusoideae.tsv: Tab-separated file with aggregated transfer counts, in both directions for each reticulate connection, from the Bambusoideae reconciliation analysis.
    • transfers_counts_Triticeae.tsv: Tab-separated file with aggregated transfer counts, in both directions for each reticulate connection, from the Triticeae reconciliation analysis.
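
The transfers_counts_*.tsv files aggregate the individual transfers in transfers_all_*.tsv, in both directions, for each reticulate connection. A minimal sketch of that kind of aggregation follows; the column names donor and recipient are hypothetical placeholders (the headers of transfers_all_*.tsv are not documented here), and the output is not guaranteed to reproduce the published count files.

```python
import csv
from collections import Counter
from typing import Iterable


def read_transfers(path: str) -> list[dict[str, str]]:
    """Load one transfers_all_*.tsv file (tab-separated, header row assumed)."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))


def aggregate_transfers(rows: Iterable[dict[str, str]],
                        donor_col: str = "donor",          # hypothetical column name
                        recipient_col: str = "recipient",  # hypothetical column name
                        ) -> dict[tuple[str, str], tuple[int, int]]:
    """Count transfers per connection in both directions.

    Each connection is keyed by the alphabetically sorted pair (a, b); the value
    is (number of transfers a -> b, number of transfers b -> a).
    """
    directed = Counter((row[donor_col], row[recipient_col]) for row in rows)
    result: dict[tuple[str, str], tuple[int, int]] = {}
    for donor, recipient in directed:
        a, b = sorted((donor, recipient))
        # Counter returns 0 for a direction with no transfers (without inserting a key).
        result[(a, b)] = (directed[(a, b)], directed[(b, a)])
    return result
```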

4. Plastome data

Alignment and phylogenetic tree from plastome data.

  • plastome_files.zip, which includes:
    • reduced_plastome_concat_CDS_trnLtrnF_trimmed.fna-out.fas: FASTA file with the final, concatenated DNA alignment of 71 plastome regions for 910 accessions, after data filtering.
    • partitions.txt: Text file with positions of the 71 plastome regions in the concatenated alignment.
    • plastome_concat_CDS_trnLtrnF_trimmed_TBE.raxml.support: Plastome tree with Transfer Bootstrap Expectation values as node labels.
    • RAxML_bipartitions.plastome_concat_CDS_trnLtrnF_trimmed: Maximum likelihood plastome tree inferred with RAxML, with Felsenstein bootstrap values as node labels.
    • RAxML_bootstrap.plastome_concat_CDS_trnLtrnF_trimmed: 100 rapid bootstrap pseudoreplicate plastome trees inferred with RAxML.
    • RAxML_info.plastome_concat_CDS_trnLtrnF_trimmed: RAxML analysis log file.
    • nuc_plastome_matching_tips.tab: Tab-separated file with the accessions matched in the nuclear-plastome comparison.
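
A sketch for pulling a single plastome region out of the concatenated alignment using partitions.txt. It assumes the file uses the RAxML partition syntax (one region per line, such as DNA, name = start-end, with 1-based inclusive coordinates); that syntax is an assumption, and codon-position strides (for example \3) are not handled.

```python
import re
from typing import Iterable

# Assumed RAxML-style line: "<model>, <region name> = <start>-<end>[, <start>-<end> ...]"
PARTITION_LINE = re.compile(r"^\s*\w+\s*,\s*(?P<name>[^=]+?)\s*=\s*(?P<ranges>.+?)\s*$")


def parse_partitions(lines: Iterable[str]) -> dict[str, list[tuple[int, int]]]:
    """Map each region name to its 1-based, inclusive (start, end) spans."""
    regions: dict[str, list[tuple[int, int]]] = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        match = PARTITION_LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognised partition line: {line!r}")
        spans = []
        for chunk in match.group("ranges").split(","):
            start, _, end = chunk.strip().partition("-")
            spans.append((int(start), int(end or start)))  # single site if no "-"
        regions[match.group("name")] = spans
    return regions


def extract_region(aligned_seq: str, spans: list[tuple[int, int]]) -> str:
    """Concatenate the columns of one region from a concatenated aligned sequence."""
    return "".join(aligned_seq[start - 1:end] for start, end in spans)
```

Applied to a sequence read from the concatenated FASTA file with any FASTA parser, extract_region(sequence, regions["matK"]) would return that accession's aligned columns for a region named matK, if such a name appears in partitions.txt (the region name here is hypothetical).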

5. Poaceae-specific reference Ang353 dataset
Reference sequence dataset used for the assembly of Ang353 sequences in this study.

  • target_Ang353_sequences_grasses.zip

6. Shotgun assembly script

Custom script used for the assembly of Ang353 sequences from shotgun data.

  • shotgun_assembler_script.zip, which includes:
    • shotgun_assembler_Ang353_sequences.sh: script for assembly of short reads from shotgun data
    • template_manifest_file.tsv: Tab-separated file specifying sample names and the locations of short-read files (required by the assembly script)
    • list_Ang353_genes_orthofinder.txt: list of Ang353 gene identifiers (required by the assembly script)

7. Quartet metrics script

R script to calculate the Quartet Concordance (QC) and Quartet Differential (QD) metrics from the gene tree frequencies (or proportions) of each quartet topology at a branch, following Pease et al. 2018 (American Journal of Botany, https://doi.org/10.1002/ajb2.1016).

  • quartet_metrics.R
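
As a hedged illustration only, the Quartet Differential can be sketched from its definition in Pease et al. 2018 as understood here: QD = 1 - |d1 - d2| / (d1 + d2), where d1 and d2 are the frequencies (or proportions) of the two discordant quartet topologies at a branch. QC is not reproduced in this sketch, and quartet_metrics.R remains the reference implementation for this data set.

```python
from typing import Optional


def quartet_differential(d1: float, d2: float) -> Optional[float]:
    """Quartet Differential: 1 - |d1 - d2| / (d1 + d2).

    d1, d2: counts or proportions of gene trees supporting the two discordant
    quartet topologies at a branch. Returns 1.0 when both alternatives are
    equally frequent, 0.0 when all discordance favours one of them, and None
    when there is no discordance to compare.
    """
    total = d1 + d2
    if total == 0:
        return None
    return 1 - abs(d1 - d2) / total
```

With quartet frequencies reported per branch as a main topology plus two alternatives, the two alternative frequencies would be passed as d1 and d2.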

Files (312.2 MB)

  • 49.4 MB, md5:902a12f209a13d28d9cc3c0e04c28579
  • 42.1 MB, md5:32506088306d66eaeaadf2285dec0f4c
  • 35.1 MB, md5:bc83e988d42ad8e074f6e6a74ed0ab24
  • 195.0 kB, md5:d872fb7d86e7f7e06fb3f4c52bf27ff5
  • 10.0 MB, md5:e80b9bbf0e7a3185541d93f1f38b2461
  • 9.4 MB, md5:ffa6d2806ff8ac74aecb74cdc4f83e82
  • 7.5 MB, md5:cb11fb6a87a5a0308b9cc60c6e4697fb
  • 91.4 MB, md5:b73a449a12a7c56426c3b0d4b982281f
  • 507.8 kB, md5:1537a7a9755fcbb4616ce99288dc3132
  • 90.3 kB, md5:17e00fc22f3054a3661ad8a0eb8bd2b5
  • 15.1 MB, md5:51e4ca9bea3e2919e8d64094b33e346c
  • 993 Bytes, md5:e4306060c4a3bb3fb1cecfa3c3cd1e02
  • 47.1 MB, md5:ea8cfd0b1eb295a999810b236394a5e8
  • 19.9 kB, md5:01d64a21f9c1f0ec3800825e108fd821
  • 3.6 MB, md5:8fdb65da30a1d83ee3d0cce2348e9edd
  • 585.9 kB, md5:19e4a26f03b3f55cff7db43a486c50a4
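
A downloaded file can be checked against the MD5 checksums listed above with a short standard-library sketch (function names are illustrative). The listing pairs each checksum with a size rather than a file name, so the comparison has to be made against the entry whose size matches the downloaded file.

```python
import hashlib
from typing import Iterable


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hex MD5 digest of a byte stream supplied in chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in 1 MiB chunks so large archives need little memory."""
    with open(path, "rb") as fh:
        return md5_of_chunks(iter(lambda: fh.read(chunk_size), b""))


def matches(path: str, listed: str) -> bool:
    """Compare a downloaded file with a checksum written as "md5:<hex>"."""
    expected = listed.removeprefix("md5:").strip().lower()
    return md5_of_file(path) == expected
```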