Mitochondrial DNA Sequence Phylogeny of Daucus

Abstract We explored the phylogenetic utility of mitochondrial DNA sequences in Daucus and compared the results with prior phylogenetic results using the same 36 accessions of Daucus (and two additional outgroups) with plastid DNA sequences and with other nuclear results. As in the plastid study we used Illumina HiSeq sequencer to obtain resequencing data of the same accessions of Daucus and outgroups, and analyzed the data with maximum parsimony and maximum likelihood. We obtained data from 47 of 71 total mitochondrial genes but only 17 of these 47 genes recovered major clades that were common in prior plastid and nuclear studies. Our phylogenetic trees of the concatenated data set of 47 genes were moderately resolved, with 100% bootstrap support for most of the external and many of the internal clades, except for the clade of D. carota and its most closely related species D. syrticus. There are areas of hard incongruence with phylogenies using plastid and nuclear data. In agreement with other studies, we conclude that mitochondrial sequences are generally poor phylogenetic markers, at least at the genus level, despite their utility in some other studies.

The genus Daucus is a member of the large and taxonomically complex family Apiaceae with a recent estimate of 466 genera and 3820 species (Plunkett et al. 2019). Identification of members of the Apiaceae to family is generally unambiguous, but generic boundaries are often vague, and recent molecular studies have driven many reassignments of species to other genera. This is most evident in the genus Daucus that has benefitted from molecular systematic studies using a variety of plastid and/or nuclear regions (e.g. Spalik and Downie 2007;Banasiak et al. 2016). Using DNA sequences from nuclear ribosomal internal transcribed spacer regions (ITS) and three plastid markers, Banasiak et al. (2016)  The most recent molecular studies in Daucus used nextgeneration sequencing (NGS) approaches to test these results and add resolution to the clades. Arbizu et al. (2014) investigated the phylogeny of Daucus with 94 nuclear orthologs, producing highly resolved phylogenetic trees, and identified ten ortholog markers that provided a phylogeny nearly identical to the entire dataset. Arbizu et al. (2016) demonstrated the utility of these ten nuclear orthologs in a more focused study of the species boundaries of the D. guttatus complex. Both of these studies resolved two main clades A and B, and within clade A, subclade A 0 composed entirely of species with 2n 5 18 chromosomes (D. carota, D. syrticus), in contrast to all other species with 2n 5 16, 20, 22, and one polyploid of 2n 5 66. In a search for increased taxonomic resolution with a different genome, Spooner et al. (2017) generated resequencing data of 36 accessions of Daucus (and two additional outgroups) representing all major clades of Daucus with entire plastid DNA sequences showing areas of congruence and incongruence with the nuclear data (Arbizu et al. 2014(Arbizu et al. , 2016. Fortuitously, these same resequencing data can distinguish not only plastid and single-to low-copy nuclear sequences, but also mitochondrial sequences (Spooner et al. 2017), leading to the present analysis of these same accessions with mitochondrial DNA.
The mitochondrion is one of three DNA-containing genomes in plants, in addition to plastids and nuclei. The mitochondrion encodes for some of the genes necessary to produce the proteins required in the oxidative phosphorylation reaction that produces ATP. Mitochondrial DNA, like plastid DNA, is predominantly maternally inherited, but with exceptions to paternal and bi-parental inheritance (Breton and Stewart 2015).
Mitochondrial DNA has been comparatively underutilized for phylogenetic analyses in plants relative to the nucleus and plastid because of a number of complicating factors. These include dramatic variation in the size of the genome, even among closely related sister taxa, variation in rates of synonymous substitutions, variation in rates of gene loss accompanied by functional transfer to the nucleus, and in rates of genome rearrangement (Cole et al. 2018). Earlier, Rubinoff and Holland (2005) claimed that mtDNA is a rich source of markers for the study of closely related taxa because of the very low rate of recombination, maternal inheritance, simple genetic structure, reduced effective population size, and relatively rapid rates of evolution. This contrasted with data from Wolfe et al. (1987) showing that mtDNA generally has low mutation rates. An alternative perspective was provided by Galtier et al. (2009) who designated mtDNA as "intrinsically the worst population genetic and phylogenetic molecular marker we can think of," pointing out its very low incidence of recombination, lack of response to selection, and erratic evolutionary rates. With these caveats in mind, the purpose of the present study is to generate a mitochondrial DNA phylogeny of 36 accessions of Daucus (and two additional outgroups) and to compare the results with the same accessions using plastid data, and with prior nuclear phylogenies of Arbizu et al. (2014Arbizu et al. ( , 2016.

Materials and Methods
DNA Extraction and Sequencing-The complete mitochondrial genome of Daucus carota NC_017855.1 can be accessed from NCBI. The remaining accessions we sequenced are listed in Appendix 1 and are the same ones examined in Spooner et al. (2017).
Gene Length and Coverage-The coverage of each assembled gene was determined by mapping the mapped, paired reads back to the *-8.fa using BWA-MEM, and then converted, sorted, indexed (see above), and duplicates marked (Picard MarkDuplicates, standard protocol). GATK (McKenna et al. 2010) and DepthOfCoverage (DePristo et al. 2011;Van der Auwera et al. 2013) determined the coverage for each contig. Gene locations were determined by comparing the list of gene sequences from NC_017855.1 to the *-8.fa with nucmer -maxmatch, followed by showcoords -rd (v. 3.1) (Delcher et al. 2002). The average coverage of each gene was calculated by averaging the coverage of each gene based on the coordinates generated (Supplemental Table 2, Spooner et al. 2020).
Alignment and Annotation of Genes-Mitochondrial gene sequences were aligned using MUSCLE v. 3.8.31b (Edgar 2004). The aligned sequences were manually corrected to minimize gaps using Mesquite v. 3.03 (Maddison and Maddison 2015). Genes with introns were annotated from the NC_017855.1 reference, and added to the final NEXUS file as a CHARSET.
Phylogenetic Analyses-We rooted our trees on Oenanthe virgata, based on Downie et al. (2000). We first performed maximum parsimony (MP) analyses of the entire data set (all taxa and all characters). All MP analyses were conducted in PAUP* v. 4.0a131 (Swofford 2002). Question marks and blank spaces were treated as missing data and gaps, respectively. All characters were treated as unordered and weighted equally (Fitch 1971). The most parsimonious trees were found using a heuristic search (Farris 1970) by generating 100,000 random-addition sequence replicates and one tree held for each replicate. Branch swapping used tree-bisection reconnection (TBR) retaining all most parsimonious trees. Then, we ran a final heuristic search of the most equally parsimonious trees from this analysis using TBR and MULPARS. Bootstrap values (Felsenstein 1985) for the clades were estimated using 1000 replicates with simple addition sequence, setting MAXTREES to 1000. Maximum likelihood (ML) phylogenetic analysis was also obtained for the entire data set with the program RAxML v. 8.0.0 (Stamatakis 2014), using GTR 1 G model and estimating individual alpha-shape parameters, GTR rates, and empirical base frequencies for each individual gene. Using the same program, 1000 nonparametric bootstrap inferences were obtained. Both analyses were conducted via the CIPRES (Miller et al. 2010) portal at the San Diego Supercomputer Center (http://www.phylo.org).

Results
Completeness of Sequencing-Forty-seven of the 71 genes we analyzed recovered complete sequences for all but a few of the 36 accessions of Daucus (and two additional outgroups) without any early stop codons or no stop codons or pseudogenes; we did not analyze the remaining 24 genes that had these problems (Table 1).
Phylogenetic Analyses-Individual bootstrap consensus trees of the 47 genes are presented in Supplemental Fig. 1 (Spooner et al. 2020). These trees vary greatly in resolution (as assessed by polytomies vs. topological structure and high bootstrap support values) from showing no to very little resolution (e.g. cox3, nad2, nad3, rps13), to greatly increased resolution (e.g. cox2, nad4, nad7, rrn18, rrn26) recovering at least clade B and the outgroup (Oenanthe virgata) relative to earlier studies (Appendix 1; Table 1). Maximum parsimony analysis of all 47 genes were performed using the concatenated dataset (Fig. 1A) from 55,730 characters, with 708 variable parsimony uninformative characters, 957 parsimony informative characters, producing 24 equally parsimonious trees of 2121 length, consistency index 0.83, consistency index excluding uninformative characters 0.74, retention index 0.92, and rescaled consistency index 0.76. This concatenated 47 gene mtDNA tree, like the nuclear ortholog data of Arbizu et al. (2014), and the plastid data of Spooner et al. (2017) recovered the same main clades A, B, and outgroup, but with many topological differences within clades A and B. For example, the mtDNA tree failed to separate the 18-chromosome species of clade A 0 (D. carota all subspecies, D. syrticus) from other species in clade A: D. pumilus and D. rouyi. In addition, there were sister group relationship differences in mitochondrial clade A to the plastid (Fig. 1B, C) (Spooner et al. 2017) and nuclear data x nad5 x nad6 nad7 x x x nad9 x orf28 orf31 x x x orf34 orf39 orf40 x orf41 orf42 orf46 x orf47 orf48 orf51 x orf56 orf57 orf58 x orf59 x orf60 x rpl5 x rpl10 rps1 rps3 rps4 rps7 x rps12 rps13 rrn18 x x x rrn26 x x x SYSTEMATIC BOTANY [Volume 45 (Arbizu et al. 2014(Arbizu et al. , 2016 results regarding D. aureus, D. crinitus, D. muricatus, and D. tenuisectus and D. carota. Within clade B, both the plastid and mtDNA trees place D. glochidiatus sister to all other species of clade B, and D. conchitae PI 652385 is in a clade with both accessions of D. bicolor, different from the nuclear data (Arbizu et al. 2014). However, there are topological differences in these two datasets to the mitochondrial data regarding D. pusillus and D. setulosus (Fig. 1B, C).
In an attempt to overcome these differences we examined only the 17 genes as a concatenated dataset that recovered at least clade B and the outgroup, and we then ran a maximum parsimony analysis of these. Maximum parsimony analysis of these 17 genes (Fig. 2) was constructed from 32,462 characters, with 409 variable parsimony uninformative characters, 638 parsimony informative characters, producing 6 equally parsimonious trees of 1345 length, consistency index 0.82, consistency index excluding uninformative characters 0.74, retention index 0.93, and rescaled consistency index 0.77. This tree had lower bootstrap support values than the 47 gene tree and it retained topological differences from the 47 gene tree making it no better in recovering "expected" topological structure relative to the prior plastid and nuclear results.
Maximum likelihood analysis of 47 genes as a concatenated dataset (Fig. 3) produced a topology essentially equal to the MP analysis of the 47 genes except for minor differences in the apex of the tree containing five accessions of D. carota with both trees containing in this area polytomies or low bootstrap support.

Discussion
In summary, our Daucus results support the conclusion of Galtier et al. (2009) that mtDNA is a poor phylogenetic marker, at least at the genus level. We found only 17 of the 47 completely sequenced genes (of 71 total) that recovered clades B and the outgroup, but only four of these (nad7, orf31, rrn18, rrn26) recovered clades A, B, and the outgroup, and none of these four provided well-resolved topological or bootstrap support (Table 1) approaching prior results using the plastid or nuclear genes. The concatenated dataset of 47 genes provided better topological structure but only with considerable input of resources and time in bioinformatic analyses, and with less support and incongruence to nuclear and plastid results. These discordant results are part of a pattern showing up in many phylogenetic studies comparing different genomes (plastid, mitochondrial, nuclear) as well as comparing different regions of the same nuclear genome (e.g. Rokas et al. 2003;Baum 2007). There are a variety of possible mechanisms producing the discordance we highlight here, including hybridization, introgression, or other causes (Wendel and Doyle 1998). Hybridization and introgression are perhaps the most frequently cited causes of this discordance, but they are hard to distinguish from each other. Edelman et al. (2019) developed new techniques to analyze assembled genomes to show that SPOONER ET AL.: MITOCHONDRIAL PHYLOGENY OF DAUCUS 405 2020] hybridization was operative in discordant phylogenies in Heliconius butterflies, and such techniques surely will be used in other groups.
Despite these problems in Daucus, mtDNA has been shown to be useful in addressing important evolutionary and phylogenetic questions at higher taxonomic levels. For example, Beckert et al. (2001) found the nad2 and nad5 genes to be useful for phylogenetic analysis at the ordinal level (but not below the family level) in mosses. Adams et al. (2002) surveyed mitochondrial gene losses in 280 genera of flowering plants and found that the oldest groups of angiosperms contain nearly the same set of genes as their algal ancestors relative to more advanced families that experienced more losses. Mitochondrial DNA has been used in systematic studies at the ordinal level in the Asteraceae (e.g. Wang et al. 2018). Park et al. (2015) constructed a phylogeny of 17 species of Geranium, along with representatives of parasitic plants in other plant families, to infer sources of intracellular gene transfer and horizontal gene transfer in Geranium. In summary, mtDNA appears to provide well-resolved phylogenetic results in some cases, but generally only at higher taxonomic levels, and not well resolved in Daucus.

Author Contributions
DSp conceived of the study and conducted the phylogenetic analyses; HR and DSe performed bioinformatic analyses; all authors wrote and approved of the manuscript.