Reducing long-branch effects in multi-protein data uncovers a close relationship between Alveolata and Rhizaria
Baldauf, Sandra L.
Rhizaria is a major eukaryotic group of tremendous diversity, including amoebae with spectacular skeletons or tests (Radiolaria and Foraminifera), plasmodial parasites (Plasmodiophorida) and secondary endosymbionts (Chlorarachniophyta). Current phylogeny places Rhizaria in an unresolved trichotomy with Stramenopila and Alveolata (supergroup "SAR"). We assembled a 147-protein data set with extensive rhizarian coverage (M147), including the first transcriptomic data for a euglyphid amoeba. Phylogenetic pre-screening of individual proteins revealed potential problems with radically misplaced sequences, caused by contamination of rhizarian sequences amplified from wild-collected material and/or by extremely long branches (xLBs). Therefore, two data subsets were extracted: one containing all proteins that consistently recover rhizarian monophyly (M34), and one excluding all proteins with ≥3 xLBs (M37), where an xLB is defined as a terminal branch ≥2× the average terminal branch length of the tree. Phylogenetic analyses of M147 give conflicting results depending on the outgroup and method of analysis, whereas both data subsets (M34 and M37) strongly support an exclusive Rhizaria + Alveolata (R + A) clade regardless of the phylogenetic method used. Support for an R + A clade is most consistent when a close outgroup is used and decreases with more distant outgroups, suggesting that support for alternative SAR topologies may reflect a long-branch attraction artifact. A survey of xLB distribution among taxa and protein functional categories indicates that small "informational" proteins in particular have highly variable evolutionary rates, with no consistent pattern among taxa.
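The xLB screening rule can be sketched as follows. This is an illustrative sketch, not the authors' pipeline. It assumes that each single-protein tree has already been reduced to a mapping from taxon to terminal branch length, that the "average" is the arithmetic mean over all terminal branches of that tree (including the long ones), and that the thresholds are those stated above (an xLB is a branch ≥2× the mean, and a protein is excluded when it has ≥3 xLBs). The function names `find_xlbs` and `screen_proteins` and the toy data are hypothetical.

```python
from statistics import mean


def find_xlbs(terminal_lengths, factor=2.0):
    """Return taxa whose terminal branch is >= factor x the tree's mean terminal branch length.

    terminal_lengths: dict mapping taxon name -> terminal branch length (assumed input format).
    """
    avg = mean(terminal_lengths.values())
    return sorted(
        taxon for taxon, length in terminal_lengths.items() if length >= factor * avg
    )


def screen_proteins(trees, factor=2.0, max_xlbs=3):
    """Keep only proteins whose tree has fewer than max_xlbs extremely long branches.

    trees: dict mapping protein name -> {taxon: terminal branch length}.
    Returns a dict mapping each retained protein -> its (possibly empty) list of xLB taxa.
    """
    kept = {}
    for protein, lengths in trees.items():
        xlbs = find_xlbs(lengths, factor)
        if len(xlbs) < max_xlbs:  # proteins with >= max_xlbs xLBs are dropped
            kept[protein] = xlbs
    return kept


if __name__ == "__main__":
    # Hypothetical toy data: protA has one long branch, protB has three.
    toy_trees = {
        "protA": {"a": 0.1, "b": 0.1, "c": 0.1, "d": 0.9},
        "protB": {**{f"t{i}": 0.1 for i in range(7)}, "x": 1.0, "y": 1.0, "z": 1.0},
    }
    print(screen_proteins(toy_trees))
```

In the toy data, protA (mean 0.3, cutoff 0.6) keeps one xLB and is retained, while protB (mean 0.37, cutoff 0.74) has three xLBs and is excluded. Because the long branches themselves inflate the mean, real screens may prefer a median or a mean computed without the candidate branch; that choice is an assumption here, not something the abstract specifies.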