Grass Phylogeny Working Group III: data repository
Authors/Creators
- Arthan, Watchara (1)
- Baker, William J. (2, 3)
- Barrett, Matthew D. (4)
- Barrett, Russell L. (5, 6)
- Bennetzen, Jeffrey (7)
- Besnard, Guillaume (8)
- Bianconi, Matheus (Contact person) (9, 8)
- Birch, Joanne L. (10)
- Catalán, Pilar (11)
- Chen, Wenli (12, 13)
- Christenhusz, Maarten (2)
- Christin, Pascal-Antoine (9)
- Clark, Lynn G. (14)
- Columbus, J. Travis (15, 16)
- Couch, Charlotte (2)
- Crayn, Darren M. (4)
- Davidse, Gerrit (17)
- Dransfield, Soejatmi (2)
- Dunning, Luke T. (9)
- Duvall, Melvin R. (18)
- Ficinski, Sarah Z. (2)
- Fisher, Amanda E. (19)
- Fjellheim, Siri (20)
- Forest, Felix (2)
- Gillespie, Lynn J. (21)
- Hackel, Jan (Contact person) (2, 22)
- Haevermans, Thomas (23)
- Hodkinson, Trevor R. (24)
- Huang, Chien-Hsun (25, 26)
- Huang, Weichen (27)
- Humphreys, Aelys M. (28)
- Jobson, Richard W. (5)
- Kayombo, Canisius J. (29)
- Kellogg, Elizabeth A. (30, 31)
- Kimeu, John M. (32)
- Larridon, Isabel (2)
- Letsara, Rokiman (33)
- Li, De-Zhu (34)
- Liu, Jing-Xia (34)
- Londoño, Ximena (35)
- Luke, Quentin W.R. (32)
- Ma, Hong (27)
- Macfarlane, Terry D. (36)
- Maurin, Olivier (2)
- McKain, Michael R. (37)
- McLay, Todd G.B. (38, 39, 10)
- Moreno-Aguilar, Maria Fernanda (11)
- Murphy, Daniel J. (6, 39, 10)
- Nanjarisoa, Olinirina P. (2)
- Onjalalaina, Guy E. (40)
- Peterson, Paul M. (41)
- Rakotonasolo, Rivontsoa A. (33)
- Razanatsoa, Jacqueline (33)
- Saarela, Jeffery M. (21)
- Simpson, Lalita (4)
- Snow, Neil W. (42)
- Soreng, Robert J. (41)
- Sosef, Marc (43)
- Thompson, John J.E. (44)
- Traiperm, Paweena (1)
- Verboom, G. Anthony (45, 46)
- Vorontsova, Maria S. (Contact person) (2)
- Walsh, Neville G. (39)
- Washburn, Jacob D. (47)
- Watcharamongkol, Teera (48)
- Waycott, Michelle (49)
- Welker, Cassiano A.D. (50)
- Xanthos, Martin D. (2)
- Xia, Nianhe (51)
- Zhang, Lin (52)
- Zizka, Alexander (22)
- Zuloaga, Fernando O. (53)
- Zuntini, Alexandre R. (2)
- 1. Mahidol University
- 2. Royal Botanic Gardens, Kew
- 3. Aarhus University
- 4. Australian Tropical Herbarium
- 5. National Herbarium of New South Wales
- 6. University of New South Wales
- 7. University of Georgia
- 8. Université Toulouse III - Paul Sabatier
- 9. University of Sheffield
- 10. University of Melbourne
- 11. University of Zaragoza
- 12. National Herbarium (PE), Institute of Botany, Beijing
- 13. State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, Chinese Academy of Sciences
- 14. Iowa State University
- 15. California Botanic Garden
- 16. Claremont Graduate University
- 17. Missouri Botanical Garden
- 18. Northern Illinois University
- 19. California State University, Long Beach
- 20. Norwegian University of Life Sciences
- 21. Canadian Museum of Nature, Ottawa
- 22. Philipps University of Marburg
- 23. Sorbonne Université
- 24. Trinity College Dublin
- 25. Fudan University
- 26. Inner Mongolia University
- 27. Pennsylvania State University
- 28. Stockholm University
- 29. Tengeru Institute of Community Development
- 30. Donald Danforth Plant Science Center
- 31. Arnold Arboretum
- 32. East Africa Herbarium, National Museums of Kenya
- 33. Parc Botanique et Zoologique de Tsimbazaza, Antananarivo
- 34. Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences
- 35. Sociedad Colombiana del Bambú
- 36. Western Australian Herbarium
- 37. University of Alabama, Tuscaloosa
- 38. National Biodiversity DNA Library, CSIRO
- 39. Royal Botanic Gardens Victoria
- 40. Wuhan Botanical Garden
- 41. National Museum of Natural History, Smithsonian Institution
- 42. Pittsburg State University
- 43. Meise Botanic Garden
- 44. Brisbane Botanic Gardens
- 45. University of Gothenburg
- 46. Gothenburg Botanic Garden
- 47. University of Missouri
- 48. Kanchanaburi Rajabhat University
- 49. University of Adelaide
- 50. Universidade Federal de Uberlândia
- 51. South China Botanical Garden, Chinese Academy of Sciences
- 52. Southwest University, Chongqing
- 53. Instituto de Botanica Darwinion
Description
Phylogenetic analyses of the grass family (Poaceae) using nuclear and plastid data. The data set includes 1153 accessions corresponding to 1133 accepted species. Genomic data were obtained from different sources, including target capture, shotgun sequencing, transcriptomes and annotated genomes. Nuclear markers (Angiosperms353 gene set) were assembled from short-read data using HybPiper or a custom assembly pipeline optimized for low-coverage shotgun data. Plastid genes were either retrieved from published plastome sequences or assembled here using GetOrganelle. This data set also includes the results of a gene tree–species tree reconciliation analysis using GeneRax.
Contact persons:
Matheus E. Bianconi (matheus-enrique.bianconi@univ-tlse3.fr), Jan Hackel (jan.hackel@uni-marburg.de), Maria S. Vorontsova (m.vorontsova@kew.org)
Content description
1. Metadata
gpwgIII_samples_metadata_taxonomy.tsv
Tab-separated file with details for all 1,702 accessions used in this study. Columns:
- analysis_ID: ID in nuclear analyses
- analysis_ID_plastome: ID in plastome analyses
- acc_species: accepted species name
- acc_species_author: taxonomic species authority
- acc_genus: accepted genus name
- acc_genus_author: taxonomic genus authority
- publication: associated prior publication
- data type: type of sequence data
- isolate: laboratory isolate ID
- voucher_ID: herbarium voucher ID
- germplasm_ID: germplasm collection ID
- repo_accession: accession number in public repository
- plastome_accession: accession number of assembled plastome sequence
- removed_nuclear: reason for removal from nuclear tree, if applicable
- removed_plastome: reason for removal from plastome tree, if applicable
- soreng2022_genus: genus name in Soreng et al. 2022, https://doi.org/10.1111/jse.12847
- subtribe, tribe, subfamily, major.clade: classification according to Soreng et al. 2022
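As a minimal sketch of how this table might be filtered, the snippet below reads tab-separated text with Python's standard `csv` module and keeps the rows whose `removed_nuclear` field is empty. It assumes that an empty field means the accession was retained in the nuclear tree, which the description does not state explicitly; the helper name `retained_nuclear` and the example rows are made up for illustration.

```python
import csv
import io


def retained_nuclear(handle):
    """Return rows with an empty removed_nuclear field (assumed to mean 'kept')."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if not (row.get("removed_nuclear") or "").strip()]


# Tiny in-memory stand-in for gpwgIII_samples_metadata_taxonomy.tsv (made-up rows).
example = io.StringIO(
    "analysis_ID\tacc_species\tremoved_nuclear\n"
    "S1\tSpecies one\t\n"
    "S2\tSpecies two\tlow gene recovery\n"
)
kept = retained_nuclear(example)
print([row["analysis_ID"] for row in kept])
```

With the real file, the same function could be called on `open("gpwgIII_samples_metadata_taxonomy.tsv", newline="", encoding="utf-8")`; if the file marks retained samples with a placeholder such as `NA` rather than an empty field, the filter would need adjusting.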
2. Nuclear data
- Dataset1 ("main")
Number of samples: 1153
Number of genes: 331
Alignment trimming threshold: gt = 0.1 (sites with > 90% missing data removed)
Genes per sample: > 166
- Dataset2 ("strict trimming")
Number of samples: 1153
Number of genes: 315
Alignment trimming threshold: gt = 0.5 (sites with > 50% missing data removed)
Genes per sample: > 158
- Dataset3 (dataset 1 without shotgun samples)
Number of samples: 841
Number of genes: 331
Alignment trimming threshold: gt = 0.1 (sites with > 90% missing data removed)
Genes per sample: > 166
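The trimming thresholds above can be illustrated with a short sketch. It reads `gt` as the minimum fraction of sequences that must carry data at a site for that site to be kept, which matches the parenthetical notes (gt = 0.1 drops sites with more than 90% missing data). The function `trim_columns` and the choice of `-` and `?` as missing-data symbols are assumptions for illustration; this is not the trimming tool used to build the datasets.

```python
def trim_columns(alignment, gt, missing="-?"):
    """Keep alignment columns where the fraction of sequences with data is >= gt.

    `alignment` maps sequence names to equal-length, non-empty strings.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    n_seq = len(seqs)
    keep = [
        i
        for i in range(len(seqs[0]))
        if sum(seq[i] not in missing for seq in seqs) / n_seq >= gt
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Made-up four-sequence alignment; the third column has data in only 1 of 4 rows.
example = {"a": "AC-T", "b": "A--T", "c": "A--T", "d": "-CGT"}
print(trim_columns(example, 0.5))  # third column (25% data) is dropped
print(trim_columns(example, 0.1))  # all columns pass the looser threshold
```

A stricter threshold removes more sites, which is consistent with Dataset2 retaining fewer genes than Dataset1.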
2.1. Raw sequences
Raw Ang353 sequence assemblies for all samples (pre-trimming and filtering)
raw_Ang353_sequences.zip
2.2. Nuclear gene alignments
Trimmed alignments from datasets 1, 2 and 3.
alignments_dataset1_main_final.zip
alignments_dataset2_strict_trimming_final.zip
alignments_dataset3_no_shotgun_final.zip
2.3. Nuclear gene trees
Gene trees inferred using RAxML (GTRCAT, 100 bootstraps) for the alignments from datasets 1, 2 and 3.
gene_trees_dataset1_main_final.zip
gene_trees_dataset2_strict_trimming_final.zip
gene_trees_dataset3_no_shotgun_final.zip
2.4. Multigene species trees
Multigene species trees obtained using Astral-Pro3 from gene trees for datasets 1, 2 and 3.
astralpro_trees.zip, which includes:
- trees_Ang353_grasses_dataset1_main_gtrcat.astralpro
- trees_Ang353_grasses_dataset2_strict_trimming_gtrcat.astralpro
- trees_Ang353_grasses_dataset3_no_shotgun_gtrcat.astralpro
3. Gene tree–species tree reconciliation
generax.zip
Compressed zip archive with input files and results, including log files, of the GeneRax reconciliation analysis. One subfolder for each of the four analyses run: "all_tribes", "Andropogoneae", "Bambusoideae", "Triticeae".
transfers_reconciliation_analyses.zip, which includes:
- transfers_all_all_tribes.tsv: Tab-separated file with all transfers inferred with the tribe-level Poaceae reconciliation analysis. Each line represents one transfer inferred.
- transfers_all_Andropogoneae.tsv: Tab-separated file with all transfers inferred with the Andropogoneae reconciliation analysis. Each line represents one transfer inferred.
- transfers_all_Bambusoideae.tsv: Tab-separated file with all transfers inferred with the Bambusoideae reconciliation analysis. Each line represents one transfer inferred.
- transfers_all_Triticeae.tsv: Tab-separated file with all transfers inferred with the Triticeae reconciliation analysis. Each line represents one transfer inferred.
- transfers_counts_all_tribes.tsv: Tab-separated file with aggregated transfer counts, in both directions for each reticulate connection, from the tribe-level Poaceae reconciliation analysis.
- transfers_counts_Andropogoneae.tsv: Tab-separated file with aggregated transfer counts, in both directions for each reticulate connection, from the Andropogoneae reconciliation analysis.
- transfers_counts_Bambusoideae.tsv: Tab-separated file with aggregated transfer counts, in both directions for each reticulate connection, from the Bambusoideae reconciliation analysis.
- transfers_counts_Triticeae.tsv: Tab-separated file with aggregated transfer counts, in both directions for each reticulate connection, from the Triticeae reconciliation analysis.
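The relationship between the per-transfer files and the aggregated count files can be sketched as a simple tally of directed donor-to-recipient pairs. The column names `donor` and `recipient` below are hypothetical placeholders, since the headers of the `transfers_all_*.tsv` files are not listed here; they should be checked against the actual files before use.

```python
import csv
import io
from collections import Counter


def count_transfers(handle, donor_col="donor", recipient_col="recipient"):
    """Tally transfers per directed (donor, recipient) pair.

    Column names are assumptions; pass the real header names if they differ.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter((row[donor_col], row[recipient_col]) for row in reader)


# Made-up rows: one line per inferred transfer.
example = io.StringIO(
    "donor\trecipient\n"
    "cladeA\tcladeB\n"
    "cladeA\tcladeB\n"
    "cladeB\tcladeA\n"
)
counts = count_transfers(example)
for (donor, recipient), n in sorted(counts.items()):
    print(donor, "->", recipient, n)
```

Keeping the two directions as separate keys mirrors the description of the count files, which report transfers in both directions for each reticulate connection.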
4. Plastome data
Alignment and phylogenetic tree from plastome data.
plastome_files.zip, which includes:
- reduced_plastome_concat_CDS_trnLtrnF_trimmed.fna-out.fas: FASTA file with the final, concatenated DNA alignment of 71 plastome regions for 910 accessions, after data filtering.
- partitions.txt: Text file with positions of the 71 plastome regions in the concatenated alignment.
- plastome_concat_CDS_trnLtrnF_trimmed_TBE.raxml.support: Plastome tree with Transfer Bootstrap Expectation values as node labels.
- RAxML_bipartitions.plastome_concat_CDS_trnLtrnF_trimmed: Maximum likelihood plastome tree inferred with RAxML, with Felsenstein bootstrap values as node labels.
- RAxML_bootstrap.plastome_concat_CDS_trnLtrnF_trimmed: 100 rapid bootstrap pseudoreplicate plastome trees inferred with RAxML.
- RAxML_info.plastome_concat_CDS_trnLtrnF_trimmed: RAxML analysis log file.
- nuc_plastome_matching_tips.tab: Tab-separated file with accessions matched in nuclear-plastome comparison.
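To show how region positions can be used to cut a single region out of the concatenated alignment, here is a minimal sketch. It assumes that partitions.txt follows the RAxML-style layout `DNA, name = start-end` with 1-based, inclusive coordinates; the actual layout of the file is not described here and should be verified. The region names and the sequence in the example are made up.

```python
import re

# Assumed line layout: "<model>, <name> = <start>-<end>" (1-based, inclusive).
PARTITION_RE = re.compile(r"^\s*\w+\s*,\s*(\S+)\s*=\s*(\d+)\s*-\s*(\d+)\s*$")


def parse_partitions(lines):
    """Map region name -> (start, end) for every line matching the assumed layout."""
    regions = {}
    for line in lines:
        match = PARTITION_RE.match(line)
        if match:
            regions[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return regions


def slice_region(sequence, start, end):
    """Extract a 1-based inclusive range from an aligned sequence."""
    return sequence[start - 1:end]


# Made-up partition lines and a made-up 10-column aligned sequence.
regions = parse_partitions(["DNA, regionA = 1-6", "DNA, regionB = 7-10"])
aligned = "ATGTCACCAA"
for name, (start, end) in regions.items():
    print(name, slice_region(aligned, start, end))
```

Applying `slice_region` to every sequence in the concatenated FASTA file would yield one per-region alignment for each entry in the partition file.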
5. Poaceae-specific reference Ang353 dataset
Reference sequence dataset used for the assembly of Ang353 sequences in this study.
target_Ang353_sequences_grasses.zip
6. Shotgun assembly script
Custom script used for the assembly of Ang353 sequences from shotgun data.
shotgun_assembler_script.zip, which includes:
- shotgun_assembler_Ang353_sequences.sh: script for assembly of short reads from shotgun data
- template_manifest_file.tsv: Tab-separated file to specify sample names and location of short read files (required by the assembly script)
- list_Ang353_genes_orthofinder.txt: list of Ang353 gene identifiers (required by the assembly script)
7. Quartet metrics script
R script to calculate the Quartet Concordance (QC) and Quartet Differential (QD) metrics from the gene tree frequencies/proportions for each quartet at a branch, following Pease et al. 2018 (American Journal of Botany, https://doi.org/10.1002/ajb2.1016).
quartet_metrics.R
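For orientation, the Quartet Differential can be sketched in a few lines. The version below uses the definition QD = 1 - |d1 - d2| / (d1 + d2), where d1 and d2 are the gene tree counts (or proportions) supporting the two discordant quartet topologies at a branch. It is an independent Python illustration, not a port of quartet_metrics.R, and it leaves out the Quartet Concordance calculation; the function name is an assumption.

```python
def quartet_differential(d1, d2):
    """QD = 1 - |d1 - d2| / (d1 + d2); None when there are no discordant trees."""
    total = d1 + d2
    if total == 0:
        return None  # undefined without any discordant gene trees
    return 1 - abs(d1 - d2) / total


print(quartet_differential(10, 10))  # equal discordant frequencies
print(quartet_differential(20, 0))   # one discordant topology dominates
print(quartet_differential(15, 5))   # intermediate skew
```

Values near 1 indicate that the two discordant topologies are about equally frequent, while values near 0 indicate that one of them accounts for nearly all discordance.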
Files
Total size: 312.2 MB

| MD5 checksum | Size |
|---|---|
| 902a12f209a13d28d9cc3c0e04c28579 | 49.4 MB |
| 32506088306d66eaeaadf2285dec0f4c | 42.1 MB |
| bc83e988d42ad8e074f6e6a74ed0ab24 | 35.1 MB |
| d872fb7d86e7f7e06fb3f4c52bf27ff5 | 195.0 kB |
| e80b9bbf0e7a3185541d93f1f38b2461 | 10.0 MB |
| ffa6d2806ff8ac74aecb74cdc4f83e82 | 9.4 MB |
| cb11fb6a87a5a0308b9cc60c6e4697fb | 7.5 MB |
| b73a449a12a7c56426c3b0d4b982281f | 91.4 MB |
| 1537a7a9755fcbb4616ce99288dc3132 | 507.8 kB |
| 17e00fc22f3054a3661ad8a0eb8bd2b5 | 90.3 kB |
| 51e4ca9bea3e2919e8d64094b33e346c | 15.1 MB |
| e4306060c4a3bb3fb1cecfa3c3cd1e02 | 993 Bytes |
| ea8cfd0b1eb295a999810b236394a5e8 | 47.1 MB |
| 01d64a21f9c1f0ec3800825e108fd821 | 19.9 kB |
| 8fdb65da30a1d83ee3d0cce2348e9edd | 3.6 MB |
| 19e4a26f03b3f55cff7db43a486c50a4 | 585.9 kB |
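A downloaded file can be checked against its listed MD5 value with Python's standard `hashlib` module. The sketch below is one way to do this; the helper names `md5_of_file` and `matches_listed` are illustrative, and the pairing of a given checksum with a given file name has to be taken from the repository page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed(path, listed):
    """Compare a file's MD5 with a listed value, tolerating an 'md5:' prefix."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[4:]
    return md5_of_file(path) == expected
```

For example, `matches_listed("downloads/generax.zip", "md5:...")` (with a hypothetical local path and the value copied from the listing) returns `True` only if the download is byte-identical to the deposited file.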