This README.txt file was generated on 26 November, 2020 by Joel Nitta

------------------- GENERAL INFORMATION -----------------

Title of Dataset: Data from: A taxonomic and molecular survey of the
pteridophytes of the Nectandra Cloud Forest Reserve, Costa Rica

Author Information

Principal Investigator: Joel H. Nitta

Department of Biological Sciences, Graduate School of Science, The University of
Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan joelnitta@gmail.com

Associate or Co-investigator: Atsushi Ebihara

Department of Botany, National Museum of Nature and Science, 4-1-1 Amakubo,
Tsukuba 305-0005, Japan ebihara@kahaku.go.jp

Associate or Co-investigator: Alan R. Smith

University Herbarium, University of California, Berkeley. 1001 Valley Life
Sciences Bldg. #2465. Berkeley, California, 94720, U.S.A. arsmith@berkeley.edu

Date of data collection: 2008–2018

Geographic location of data collection: Nectandra Cloud Forest Reserve, Costa
Rica

Information about funding sources or sponsorship that supported the collection
of the data: Funding provided in part by the Nectandra Institute and Japan
Society for the Promotion of Science (Kakenhi grant no. 15K07204)

--------------------------

SHARING/ACCESS INFORMATION

--------------------------

Licenses/restrictions placed on the data, or limitations of reuse: CC0 1.0
Universal (CC0 1.0)

Recommended citation for the data: Nitta JH, Ebihara A, Smith AR (2020) Data
from: A taxonomic and molecular survey of the pteridophytes of the Nectandra
Cloud Forest Reserve, Costa Rica. Dryad Digital Repository.
https://doi.org/10.5061/dryad.bnzs7h477

Citation for and links to publications that cite or use the data: Nitta JH,
Ebihara A, Smith AR (2020) A taxonomic and molecular survey of the pteridophytes
of the Nectandra Cloud Forest Reserve, Costa Rica. PLOS ONE.
https://doi.org/10.1371/journal.pone.0241231

Code for analyzing the data is available on GitHub:
https://github.com/joelnitta/nectandra_ferns

--------------------

DATA & FILE OVERVIEW

--------------------

File list (filenames, directory structure (for zipped files) and brief
description of all data files):

•	costa_rica_richness.csv: Data on species richness of pteridophytes in
protected areas in Costa Rica.

•	cyatheaceae_rbcL.fasta: Aligned rbcL sequences of family Cyatheaceae from the
Nectandra Cloud Forest Reserve, Costa Rica and all available sequences on
GenBank in FASTA format.

•	cyatheaceae_rbcL.tre: Phylogenetic tree of family Cyatheaceae from the
Nectandra Cloud Forest Reserve, Costa Rica and all available rbcL sequences on
GenBank in Newick format.

•	grammitidoideae_rbcL.fasta: Aligned rbcL sequences of subfamily
Grammitidoideae from the Nectandra Cloud Forest Reserve, Costa Rica and all
available sequences on GenBank in FASTA format.

•	grammitidoideae_rbcL.tre: Phylogenetic tree of subfamily Grammitidoideae from
the Nectandra Cloud Forest Reserve, Costa Rica and all available rbcL sequences
on GenBank in Newick format.

•	JNG4254.fasta: DNA sequence in FASTA format of rbcL gene from Amauropelta
atrovirens (C. Chr.) Salino & T.E. Almeida (Nitta 2237).

•	nectandra_gb_template.sbt: Plain text file (submit-block object) containing
metadata related to GenBank submission.

•	nectandra_DNA_accessions.csv: DNA accession numbers and specimen accession
numbers of pteridophytes from the Nectandra Cloud Forest Reserve, Costa Rica.

•	nectandra_rbcL.fasta: Newly generated rbcL sequences of pteridophytes from the
Nectandra Cloud Forest Reserve, Costa Rica in FASTA format.

•	nectandra_rbcL.phy: Aligned rbcL sequences of pteridophytes from the Nectandra
Cloud Forest Reserve, Costa Rica in PHYLIP format.

•	nectandra_rbcL.treefile: Phylogenetic tree of pteridophytes from the Nectandra
Cloud Forest Reserve, Costa Rica in Newick format.

•	nectandra_specimens.csv: Specimen data of pteridophytes from the Nectandra
Cloud Forest Reserve, Costa Rica collected by Joel Nitta.

•	ppgi_taxonomy.csv: Taxonomic system of Pteridophyte Phylogeny Group I (2016)
for pteridophytes at the genus level and above.

•	seqids.txt: Newly assigned GenBank accession numbers for sequences generated
by this project.

Additional related data collected that was not included in the current data
package:

•	rbcL_clean_sporos.fasta: rbcL sequences of pteridophytes of Moorea, French
Polynesia [Nitta et al. (2017); https://doi.org/10.5061/dryad.df59g].

•	ESM1.csv: A list of native fern and lycophyte taxa (species, subspecies and
varieties; 721 taxa total) in Japan [Ebihara and Nitta (2019);
https://doi.org/10.5061/dryad.4362p32].

•	FernGreenListV1.01E.xls: List of Japanese fern and lycophyte species
including scientific name, endemic status, conservation status, and other
taxonomic data [Ebihara and Nitta (2019);
https://doi.org/10.5061/dryad.4362p32].

•	rbcl_mrbayes.nex: NEXUS file used for phylogenetic analysis of Japanese fern
and lycophyte taxa with MrBayes [Ebihara and Nitta (2019);
https://doi.org/10.5061/dryad.4362p32].

--------------------------

METHODOLOGICAL INFORMATION

--------------------------

Description of methods used for collection/generation of data:

Surveys of pteridophytes (i.e., ferns and lycophytes) were carried out over
three field seasons (January 2008, 2011, and 2013; 37 days total) at the
Nectandra Cloud Forest Reserve, Costa Rica. Most specimens were collected along
trails through the reserve. Epiphytes were collected from fallen trees or tree
branches, or up to 2 m on tree trunks. Permits for collection were obtained from
the Costa Rican government (SINAC No. 04941 and CITES 2014-CR 1006/SJ (#S
1045)). The first set of voucher specimens was deposited at UC, with duplicates
at CR, GH, and TI. Herbarium codes follow Thiers (2020). Leaf tissue was
preserved on silica gel for DNA extraction. Spores of selected taxa were
observed with a standard compound light microscope.

DNA was extracted with the DNeasy Plant Mini Kit following the manufacturer’s
protocol (Qiagen). One specimen per taxon was sampled for morphologically
distinct taxa, and up to five specimens per taxon for taxa that are more
difficult to identify using standard keys and morphological characters. The
plastid rbcL gene was amplified using PCR primers and thermocycler settings of
Schuettpelz and Pryer (2007). PCR products were purified with Exo-STAR enzyme
(GE Healthcare) and sequenced using the BigDye Terminator v3.1 Cycle Sequencing
Kit (ThermoFisher) with two internal primers, ESRBCL654R and ESRBCL628F
(Schuettpelz and Pryer 2007) in addition to the amplification primers. The
resulting AB1 trace files were imported into Geneious (Kearse et al. 2012),
assembled into contigs, and the consensus sequences exported in FASTA format. A
multi-sequence alignment was generated using MAFFT (Katoh et al. 2002), and a
phylogenetic tree inferred using IQ-TREE with automatic model selection (Nguyen
et al. 2015). For a small number of genera that were not supported as
monophyletic in the original phylogenetic analysis (Cyathea and Lellingeria),
all available rbcL sequences for closely related taxa (at the family or
subfamily level, respectively) were downloaded from GenBank, aligned in
combination with the newly generated sequences from Nectandra with MAFFT, and a
phylogenetic tree inferred using FastTree on default settings (Price, Dehal, and
Arkin 2009, 2010).
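
As a rough illustration of the alignment and tree-inference steps described
above, the following sketch builds (but does not run) the kind of command lines
those programs accept. The file names, the executable names as installed, and
the replicate counts of 1000 are assumptions made for this example; the exact
settings used in the study are in the code repository, not in this sketch.

```python
import shlex

def build_commands(raw_fasta, aligned_fasta):
    """Return example argument lists for MAFFT, IQ-TREE, and FastTree."""
    # MAFFT writes the alignment to standard output.
    mafft = ["mafft", "--auto", raw_fasta]
    # IQ-TREE with model selection (MFP), SH-aLRT, and ultrafast bootstrap.
    # The replicate counts are assumptions, not values from this README.
    iqtree = ["iqtree", "-s", aligned_fasta, "-m", "MFP",
              "-alrt", "1000", "-bb", "1000"]
    # FastTree on a nucleotide alignment; the tree goes to standard output.
    fasttree = ["FastTree", "-nt", aligned_fasta]
    return mafft, iqtree, fasttree

# Hypothetical file names, for illustration only.
commands = build_commands("sequences.fasta", "alignment.fasta")
for command in commands:
    print(shlex.join(command))
```

Each list could be passed to subprocess.run() on a system where the programs
are installed, redirecting standard output to a file for MAFFT and FastTree.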

Molecular analysis was performed under permits R-CM-RN-001-2014-OT-CONAGEBIO and
R-CM-RN-002-2017-OT-CONAGEBIO.

For additional methodological details, see Nitta JH, Ebihara A, Smith AR (2020).

--------------------------

DATA-SPECIFIC INFORMATION

--------------------------

costa_rica_richness.csv: Data on species richness of pteridophytes in protected
areas in Costa Rica. Compiled by Joel Nitta based on references in the
“citation” column.

Number of variables: 12

Number of cases/rows: 6

Variable list:

•	name: Abbreviated name of site.

•	full_name: Full name of site.

•	min_el_m: Minimum elevation of site in meters.

•	max_el_m: Maximum elevation of site in meters.

•	area_ha: Area of site in hectares.

•	richness: Number of species occurring at the site.

•	richness_per_ha: Number of species per hectare occurring at the site.

•	holdridge_type: Holdridge (1967) life-zone type.

•	citation: Reference for data.

•	citation_number: Reference number in manuscript.

•	latitude: Latitude in decimal degrees.

•	longitude: Longitude in decimal degrees.

Missing data codes: Missing data have no values (nothing entered between commas
in the CSV file).

Specialized formats or other abbreviations used: None.
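
A minimal sketch of one way to handle the empty-field convention for missing
data when reading this file with Python's standard csv module. The helper name
read_rows and the two sample rows are invented for illustration; only the
column names come from the variable list above.

```python
import csv
import io

def read_rows(handle):
    """Read CSV rows as dicts, converting empty fields to None."""
    reader = csv.DictReader(handle)
    return [
        {key: (value if value != "" else None) for key, value in row.items()}
        for row in reader
    ]

# Hypothetical sample using a subset of the documented columns
# (site names and values are made up, not taken from the dataset).
sample = io.StringIO(
    "name,area_ha,richness,richness_per_ha\n"
    "SiteA,100,50,0.5\n"
    "SiteB,,40,\n"
)

rows = read_rows(sample)
for row in rows:
    # Convert numeric columns only when a value is present.
    area = float(row["area_ha"]) if row["area_ha"] is not None else None
    print(row["name"], area)
```

For the real file, the same function could be called on
open("costa_rica_richness.csv", newline="", encoding="utf-8").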

--------------------------

cyatheaceae_rbcL.fasta: Aligned rbcL sequences of family Cyatheaceae from the
Nectandra Cloud Forest Reserve, Costa Rica and all available sequences on
GenBank in FASTA format. Species from Nectandra in family Dicksoniaceae included
as outgroup. Sequences aligned using MAFFT (Katoh et al. 2002). Numbers after
species names are GenBank accession numbers for sequences downloaded from
GenBank or J. H. Nitta specimen collection numbers for sequences newly obtained
by this study. 323 sequences; 1309 bp; 185 parsimony-informative sites.

--------------------------

cyatheaceae_rbcL.tre: Phylogenetic tree of family Cyatheaceae from the Nectandra
Cloud Forest Reserve, Costa Rica and all available rbcL sequences on GenBank in
Newick format. Species from Nectandra in family Dicksoniaceae included as
outgroup. Tree inferred using FastTree (Price, Dehal, and Arkin 2009, 2010).
Numbers after species names are GenBank accession numbers for sequences
downloaded from GenBank or J. H. Nitta specimen collection numbers for sequences
newly obtained by this study. Numbers at nodes indicate local support values
computed with the Shimodaira–Hasegawa test. 323 tips; 281 internal nodes.

--------------------------

grammitidoideae_rbcL.fasta: Aligned rbcL sequences of subfamily Grammitidoideae
from the Nectandra Cloud Forest Reserve, Costa Rica and all available sequences
on GenBank in FASTA format. Sequences aligned using MAFFT (Katoh et al. 2002).
Species from Nectandra in subfamily Polypodioideae included as outgroup. Numbers
after species names are GenBank accession numbers for sequences downloaded from
GenBank or J. H. Nitta specimen collection numbers for sequences newly obtained
by this study. 751 sequences; 1314 bp; 441 parsimony-informative sites.
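
The naming convention described above could be parsed along these lines. This
is a sketch under assumptions: the "_Nitta_" pattern follows the sequence name
quoted in the change log of this README, while the GenBank-style label
"Genus_species_AB000000" is a made-up placeholder, and the rule that the
accession is the last underscore-delimited token is an assumption.

```python
def parse_label(label):
    """Split a sequence name into (species, source, identifier)."""
    if "_Nitta_" in label:
        # Newly obtained sequence: ends in a J. H. Nitta collection number.
        species, identifier = label.rsplit("_Nitta_", 1)
        source = "Nitta collection number"
    else:
        # Assumption: the accession is the final underscore-delimited token.
        species, identifier = label.rsplit("_", 1)
        source = "GenBank accession"
    return species.replace("_", " "), source, identifier

print(parse_label("Mycopteris_costaricensis_Nitta_707"))
print(parse_label("Genus_species_AB000000"))  # hypothetical label
```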

--------------------------

grammitidoideae_rbcL.tre: Phylogenetic tree of subfamily Grammitidoideae from
the Nectandra Cloud Forest Reserve, Costa Rica and all available rbcL sequences
on GenBank in Newick format. Species from Nectandra in subfamily Polypodioideae
included as outgroup. Tree inferred using FastTree (Price, Dehal, and Arkin
2009, 2010). Numbers after species names are GenBank accession numbers for
sequences downloaded from GenBank or J. H. Nitta specimen collection numbers for
sequences newly obtained by this study. Numbers at nodes indicate local support
values computed with the Shimodaira–Hasegawa test. 751 tips; 676 internal nodes.

--------------------------

JNG4254.fasta: DNA sequence in FASTA format of rbcL gene from Amauropelta
atrovirens (C. Chr.) Salino & T.E. Almeida (Nitta 2237).

--------------------------

nectandra_DNA_accessions.csv: DNA accession numbers and specimen accession
numbers of pteridophytes from the Nectandra Cloud Forest Reserve, Costa Rica.

Number of variables: 2

Number of cases/rows: 235

Variable list:

•	genomic_id: Genomic accession number assigned during DNA extraction, of the
form “JNG” plus a four-digit number. Unique values.

•	specimen_id: Specimen accession number assigned to each specimen in
nectandra_specimens.csv. Integer (not unique).

Missing data codes: No missing data.

Specialized formats or other abbreviations used: None.
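
Because specimen_id is not unique in this file, several DNA accessions can
point to the same specimen. The sketch below shows one way to check the
documented genomic_id format and look up specimen records by specimen_id. All
row values and the function name join_accessions are hypothetical; only the
column names and the "JNG" plus four digits pattern come from this README.

```python
import re

# "JNG" plus a four-digit number, as described in the variable list.
GENOMIC_ID = re.compile(r"^JNG\d{4}$")

def join_accessions(accessions, specimens):
    """Attach the voucher specimen number to each DNA accession."""
    by_id = {s["specimen_id"]: s for s in specimens}
    joined = []
    for acc in accessions:
        if not GENOMIC_ID.match(acc["genomic_id"]):
            raise ValueError(f"unexpected genomic_id: {acc['genomic_id']}")
        specimen = by_id.get(acc["specimen_id"])
        voucher = specimen["specimen"] if specimen else None
        joined.append({**acc, "specimen": voucher})
    return joined

# Hypothetical rows (not taken from the dataset).
accessions = [
    {"genomic_id": "JNG0001", "specimen_id": "10"},
    {"genomic_id": "JNG0002", "specimen_id": "10"},  # same specimen twice
    {"genomic_id": "JNG0003", "specimen_id": "11"},
]
specimens = [
    {"specimen_id": "10", "specimen": "Nitta 0001"},
    {"specimen_id": "11", "specimen": "Nitta 0002"},
]

joined = join_accessions(accessions, specimens)
for row in joined:
    print(row["genomic_id"], row["specimen"])
```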

--------------------------

nectandra_gb_template.sbt: Plain text file (submit-block object) containing
metadata related to GenBank submission (author names and contact information).
Generated using template at
https://submit.ncbi.nlm.nih.gov/genbank/template/submission/

Specialized formats or other abbreviations used: Submit-block object format.

--------------------------

nectandra_rbcL.fasta: Newly generated rbcL sequences of pteridophytes from the
Nectandra Cloud Forest Reserve, Costa Rica in FASTA format. All species included
occur at the Nectandra Cloud Forest Reserve, Costa Rica; a small number of
sequences are from specimens collected elsewhere. Sequence names correspond to
‘genomic_id’ in nectandra_DNA_accessions.csv. 186 sequences; shortest sequence
466 bp; longest sequence 1309 bp; mean sequence length 1292 bp. Exported from
Geneious project folder “Clean Sporos Trimmed Genbank Submission” (raw Geneious
project file not included in this dataset).
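
Summary figures like the sequence count and the shortest, longest, and mean
lengths given above can be recomputed with a small FASTA reader. The following
is a minimal standard-library sketch; the two records in the sample are
invented and much shorter than real rbcL sequences.

```python
import io
from statistics import mean

def read_fasta(handle):
    """Parse FASTA records into a dict of name -> sequence."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}

# Hypothetical two-record sample (names and bases are made up).
sample = io.StringIO(">JNG0001\nATGTCACCAC\nAAACAG\n>JNG0002\nATGTCACC\n")

seqs = read_fasta(sample)
lengths = [len(seq) for seq in seqs.values()]
print(len(seqs), min(lengths), max(lengths), mean(lengths))
```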

--------------------------

nectandra_rbcL.phy: Aligned rbcL sequences of pteridophytes from the Nectandra
Cloud Forest Reserve, Costa Rica in PHYLIP format. 191 sequences; 1309 bp; 591
parsimony-informative sites.
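
For readers unfamiliar with the term, a site is usually called
parsimony-informative when at least two character states each occur in at
least two sequences. The sketch below counts such sites in a toy alignment. It
ignores gaps and ambiguity codes, which is an assumption; other software may
treat them differently, so counts are not guaranteed to match the figure above.

```python
from collections import Counter

def count_informative_sites(sequences):
    """Count columns where >= 2 states each occur in >= 2 sequences."""
    informative = 0
    for column in zip(*sequences):
        # Count only unambiguous bases (assumption: skip gaps and IUPAC codes).
        counts = Counter(b.upper() for b in column if b.upper() in "ACGT")
        shared = [state for state, n in counts.items() if n >= 2]
        if len(shared) >= 2:
            informative += 1
    return informative

# Toy alignment, made up for illustration.
alignment = [
    "ACGTA",
    "ACGTC",
    "ATGAC",
    "ATG-A",
]

n_informative = count_informative_sites(alignment)
print(n_informative)
```

In the toy alignment only the second and fifth columns qualify: the fourth
column has T twice but A only once.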

--------------------------

nectandra_rbcL.treefile: Phylogenetic tree of pteridophytes from the Nectandra
Cloud Forest Reserve, Costa Rica in Newick format inferred with IQ-TREE (Nguyen
et al. 2015). Values at each node indicate SH-aLRT support (%) / UFboot support
(%). 191 tips; 189 internal nodes.
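
Tip and node counts, and the paired support values, can be read from a Newick
string with simple text operations. This sketch assumes that labels contain no
parentheses or commas and that support labels have the form "number/number"
directly after a closing parenthesis; the example tree is made up.

```python
import re

def summarize_newick(newick):
    """Return (tip count, internal node count, list of support pairs)."""
    # Each "(" opens one internal node (including the root).
    n_internal = newick.count("(")
    # Commas separate sibling subtrees, so tips = commas + 1.
    n_tips = newick.count(",") + 1
    supports = [
        (float(a), float(b))
        for a, b in re.findall(r"\)([\d.]+)/([\d.]+)", newick)
    ]
    return n_tips, n_internal, supports

# Hypothetical four-tip tree with one labeled internal node.
tree = "((A:0.1,B:0.2)95.3/100:0.05,C:0.3,D:0.4);"
summary = summarize_newick(tree)
print(summary)
```

A fully resolved unrooted tree with n tips has n - 2 internal nodes when
written with a basal trichotomy, which agrees with the counts reported above
(191 - 2 = 189).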

--------------------------

nectandra_specimens.csv: Specimen data of pteridophytes from the Nectandra Cloud
Forest Reserve, Costa Rica collected by Joel Nitta. Character encoding: UTF-8.

Number of variables: 23

Number of cases/rows: 320

Variable list:

•	specimen_id: Unique specimen identification number (integer).

•	specimen: Voucher specimen number.

•	genus: Genus.

•	specific_epithet: Specific epithet.

•	infraspecific_rank: Infraspecific rank.

•	infraspecific_name: Infraspecific name.

•	certainty: Degree of taxonomic certainty if not completely certain.

•	species: Species (genus plus specific epithet).

•	taxon: Species plus infraspecific name.

•	scientific_name: Taxon plus its author.

•	author: Author of the species.

•	var_author: Author of the variety.

•	country: Country of origin.

•	locality: General area of collection.

•	site: Specific site where collected.

•	observations: Observations about specimen.

•	elevation: Elevation in m.

•	latitude: Latitude in decimal degrees.

•	longitude: Longitude in decimal degrees.

•	collector: Name of collector.

•	other_collectors: Names of other collectors if present.

•	herbaria: Codes of herbaria where voucher specimens are lodged.

•	date_collected: Date collected in YYYY-MM-DD format.

Missing data codes: Missing or non-applicable data have no values (nothing
entered between commas in the CSV file).

Specialized formats or other abbreviations used: Herbaria codes follow Index
Herbariorum (Thiers 2020), except for “Nectandra”, which indicates the private
herbarium at the Nectandra Cloud Forest Reserve.

--------------------------

ppgi_taxonomy.csv: Taxonomic system of Pteridophyte Phylogeny Group I (2016) for
pteridophytes at the genus level and above. Updated with one new genus (Hiya).

Number of variables: 6

Number of cases/rows: 338

Variable list:

•	class: Class.

•	order: Order.

•	suborder: Suborder.

•	family: Family.

•	subfamily: Subfamily.

•	genus: Genus.

Missing data codes: Non-applicable data have no values (nothing entered between
commas in the CSV file).

Specialized formats or other abbreviations used: None.
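
One typical use of this table is to look up higher-level taxonomy by genus. The
sketch below builds such a lookup with the standard csv module. The two sample
rows were written by hand following the PPG I classification and are not copied
from the file, so the exact spelling and layout in the file may differ.

```python
import csv
import io

def build_lookup(handle):
    """Map each genus to its higher taxonomy; empty fields become None."""
    lookup = {}
    for row in csv.DictReader(handle):
        lookup[row["genus"]] = {
            rank: (value or None)
            for rank, value in row.items()
            if rank != "genus"
        }
    return lookup

# Hand-written sample rows (assumed layout, documented column names).
sample = io.StringIO(
    "class,order,suborder,family,subfamily,genus\n"
    "Polypodiopsida,Cyatheales,,Cyatheaceae,,Cyathea\n"
    "Polypodiopsida,Polypodiales,Polypodiineae,Polypodiaceae,"
    "Grammitidoideae,Lellingeria\n"
)

taxonomy = build_lookup(sample)
print(taxonomy["Lellingeria"]["subfamily"])
print(taxonomy["Cyathea"]["suborder"])
```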

--------------------------

seqids.txt: Newly assigned GenBank accession numbers for sequences generated by
this project. Received via email from GenBank admin (gb-admin@ncbi.nlm.nih.gov)
on 2020-10-21. Tab-separated text file without column names.

Number of variables: 2

Number of cases/rows: 186

Variable list:

•	(first column): name of sequence submission file followed by genomic ID number
separated by a space.

•	(second column): GenBank accession number.

Missing data codes: No missing data.

Specialized formats or other abbreviations used: None.
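
Since the first column combines two values, each line needs two splits: one on
the tab and one on the last space. A minimal sketch follows; the example line,
including the file name "submission.sqn" and the accession "MW000000", is a
made-up placeholder, and the function name is hypothetical.

```python
def parse_seqids_line(line):
    """Split one seqids.txt line into its three components."""
    first, accession = line.rstrip("\n").split("\t")
    # Split on the last space so the genomic ID is always the final token.
    submission_file, genomic_id = first.rsplit(" ", 1)
    return {
        "submission_file": submission_file,
        "genomic_id": genomic_id,
        "accession": accession,
    }

# Hypothetical line in the documented layout.
example = "submission.sqn JNG0001\tMW000000\n"
parsed = parse_seqids_line(example)
print(parsed)
```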

--------------------------

CHANGE LOG

---

2020-11-26

README.txt: Update with DOI for PLOS ONE paper.

---

2020-10-23

costa_rica_richness.csv: Change richness for Nectandra from 176 to 175 after
excluding a single non-native species, Macrothelypteris torresiana. Accordingly,
change richness_per_ha for Nectandra from 1.113924 to 1.107595. Change reference
numbers to reflect updated reference numbers in MS.
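
The two richness_per_ha values in this entry can be checked with a short
calculation, assuming richness_per_ha is richness divided by area_ha. The area
of about 158 ha used below is inferred from the numbers in this entry and is
not stated elsewhere in this README.

```python
old_richness, new_richness = 176, 175
old_density, new_density = 1.113924, 1.107595

# Area implied by each richness/density pair (an inference from the values).
implied_area_old = old_richness / old_density
implied_area_new = new_richness / new_density
print(round(implied_area_old, 2), round(implied_area_new, 2))

# Recompute both densities from the rounded implied area.
area_ha = round(implied_area_new)
print(round(old_richness / area_ha, 6))
print(round(new_richness / area_ha, 6))
```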

cyatheaceae_rbcL.fasta: The previous version mistakenly contained sequences of
Grammitidoideae rather than Cyatheaceae. Replace with Cyatheaceae sequences.

cyatheaceae_rbcL.tre: Update tree file after re-running phylogenetic analysis.

grammitidoideae_rbcL.fasta: Change name of sequence
“Mycopteris_taxifolia_Nitta_707” to “Mycopteris_costaricensis_Nitta_707”.

grammitidoideae_rbcL.tre: Update tree file after re-running phylogenetic
analysis.

nectandra_DNA_accessions.csv: Add GenBank accession numbers for sequences newly
generated by this study (those starting with “MW”).

nectandra_gb_template.sbt: Newly added file.

nectandra_rbcL.fasta: Remove two sequences (“JNG3448”, “JNG3479”) that were
excluded from the final analysis.

nectandra_rbcL.phy: Update alignment file after re-running phylogenetic analysis.

nectandra_rbcL.treefile: Update tree file after re-running phylogenetic
analysis.

nectandra_specimens.csv: Add columns “author” (author of the species) and
“var_author” (author of the variety). Change Mycopteris taxifolia (L.) Sundue to
Mycopteris costaricensis (Rosenst.) Sundue. Change all instances of “TNS” in
“herbaria” column to “TI”. Change value of “certainty” for “Polyphlebium sp1”
(Nitta 123) from “aff” to nothing (NA entry).

README.txt: Update README with these changes.

seqids.txt: Newly added file.

--------------------------

REFERENCES

Ebihara, Atsushi, and Joel H. Nitta. 2019. “An Update and Reassessment of Fern
and Lycophyte Diversity Data in the Japanese Archipelago.” Journal of Plant
Research 132 (6): 723–38. https://doi.org/10.1007/s10265-019-01137-3.

Holdridge, L. R. 1967. Life Zone Ecology. San José, Costa Rica: Tropical Science
Center.

Katoh, Kazutaka, Kazuharu Misawa, Keiichi Kuma, and Takashi Miyata. 2002.
“MAFFT: A Novel Method for Rapid Multiple Sequence Alignment Based on Fast
Fourier Transform.” Nucleic Acids Research 30 (14): 3059–66.
https://doi.org/10.1093/nar/gkf436.

Kearse, Matthew, Richard Moir, Amy Wilson, Steven Stones-Havas, Matthew Cheung,
Shane Sturrock, Simon Buxton, et al. 2012. “Geneious Basic: An Integrated and
Extendable Desktop Software Platform for the Organization and Analysis of
Sequence Data.” Bioinformatics 28 (12): 1647–9.
https://doi.org/10.1093/bioinformatics/bts199.

Nguyen, Lam-Tung, Heiko A. Schmidt, Arndt von Haeseler, and Bui Quang Minh.
2015. “IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating
Maximum-Likelihood Phylogenies.” Molecular Biology and Evolution 32 (1): 268–74.
https://doi.org/10.1093/molbev/msu300.

Nitta, Joel H., Jean-Yves Meyer, Ravahere Taputuarai, and Charles C. Davis. 2017.
“Life Cycle Matters: DNA Barcoding Reveals Contrasting Community Structure
Between Fern Sporophytes and Gametophytes.” Ecological Monographs 87 (2):
278–96. https://doi.org/10.1002/ecm.1246.

Price, Morgan N., Paramvir S. Dehal, and Adam P. Arkin. 2009. “FastTree:
Computing Large Minimum Evolution Trees with Profiles Instead of a Distance
Matrix.” Molecular Biology and Evolution 26 (7): 1641–50.
https://doi.org/10.1093/molbev/msp077.

———. 2010. “FastTree 2 - Approximately Maximum-Likelihood Trees for Large
Alignments.” PLoS ONE 5 (3): e9490.
https://doi.org/10.1371/journal.pone.0009490.

Pteridophyte Phylogeny Group I. 2016. “A Community-Derived Classification for
Extant Lycophytes and Ferns.” Journal of Systematics and Evolution 54 (6):
563–603. https://doi.org/10.1111/jse.12229.

Schuettpelz, Eric, and Kathleen M. Pryer. 2007. “Fern Phylogeny Inferred from 400
Leptosporangiate Species and Three Plastid Genes.” Taxon 56 (4): 1037–50.
https://doi.org/10.2307/25065903.

Thiers, Barbara. 2020. “Index Herbariorum: A Global Directory of Public Herbaria
and Associated Staff.” NYBG Steere Herbarium. 2020.
http://sweetgum.nybg.org/science/ih/.
