Published October 20, 2021 | Version v1
Dataset Open

Genomic analysis finds no evidence of canonical eukaryotic DNA processing complexes in a free-living protist

  • 1. Dalhousie University
  • 2. University of Groningen
  • 3. Institute of Parasitology
  • 4. South China Normal University
  • 5. Centro de Investigación en Materiales Avanzados
  • 6. Oncode Institute

Description

Cells replicate and segregate their DNA with precision. Previous studies showed that these regulated cell-cycle processes were present in the last eukaryotic common ancestor and that their core molecular parts are conserved across eukaryotes. However, some metamonad parasites have secondarily lost components of the DNA processing and segregation apparatuses. To clarify the evolutionary history of these systems in these unusual eukaryotes, we generated a genome assembly for the free-living metamonad Carpediemonas membranifera and carried out a comparative genomics analysis. Here, we show that parasitic and free-living metamonads harbor an incomplete set of proteins for processing and segregating DNA. Unexpectedly, Carpediemonas species are further streamlined, lacking the origin recognition complex, Cdc6 and most structural kinetochore subunits. Carpediemonas species are thus the first known eukaryotes that appear to lack this suite of conserved complexes, suggesting that they likely rely on yet-to-be-discovered or alternative mechanisms to carry out these fundamental processes.

Notes

Supplementary sequences:

Sequences corresponding to 'Supplementary Data 4. Spindle assembly, kinetochore and APC/C orthologs in 18 diverse eukaryotic genomes'. Each multifasta sequence file is labeled according to respective aliases reported in the table.

The 'Orc1_Cdc6_Orc1-Cdc6-like.fasta' file is a multifasta sequence file corresponding to 'Supplementary Data 6. Orc1, Cdc6 and Orc1/Cdc6-like proteins. Information used in Supplementary Figure 3, panels b and d'.
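As a minimal sketch of how the multifasta files described above might be read, the following Python snippet parses FASTA-formatted lines into a dictionary and reports sequence lengths (the quantity compared in Supplementary Fig. 3b). The function names are illustrative assumptions and not part of the dataset; the only real file name used is the one given above.

```python
from typing import Dict, Iterable


def read_multifasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {header: sequence}.

    Headers are stored without the leading '>'. If a header occurs twice,
    the later record replaces the earlier one.
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may span several lines
    return records


def sequence_lengths(records: Dict[str, str]) -> Dict[str, int]:
    """Return the length of each sequence, keyed by header."""
    return {name: len(seq) for name, seq in records.items()}


if __name__ == "__main__":
    # File name taken from the notes above; adjust the path as needed.
    with open("Orc1_Cdc6_Orc1-Cdc6-like.fasta") as handle:
        for name, length in sequence_lengths(read_multifasta(handle)).items():
            print(f"{length}\t{name}")
```

The parser is deliberately dependency-free; a dedicated library parser would be a reasonable alternative for larger analyses.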

The figures presented here correspond to high resolution versions of the figures presented in the Supplementary Information of the source manuscript.

Supplementary figure legends:

Supplementary Fig. 1: Maximum-likelihood reconstruction of the phylogenetic relationships within the Metamonada clade. An initial reconstruction was carried out in IQ-TREE with the LG+C60+F+Γ model and 1000 ultrafast bootstraps; this was followed by tree inference under the LG+PMSF(C60)+F+Γ model using 100 nonparametric bootstraps. The alignment comprised 181 genes encompassing 48341 sites. The tree is rooted on the ancestral branch of Amorphea. Scale bar shows the inferred number of amino acid substitutions per site. Support values are represented as shaded dots on each branch, in the following order: SH-aLRT support percentage/aBayes/nonparametric bootstrapping.
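The two-step inference described in this legend could be set up roughly as follows. This is a hedged sketch, not the authors' actual command lines: it assumes IQ-TREE 1.x-style flags (-bb for ultrafast bootstraps, -alrt, -abayes, -ft for the PMSF guide tree, -b for nonparametric bootstraps), an executable named iqtree, hypothetical file names, and an SH-aLRT replicate count of 1000, which the legend does not state. The snippet only builds and prints the commands; it does not run them.

```python
import shlex
from typing import List


def iqtree_ufboot_command(alignment: str,
                          model: str = "LG+C60+F+G",
                          ufboot: int = 1000,
                          alrt: int = 1000,  # assumed; not stated in the legend
                          abayes: bool = True) -> List[str]:
    """Step 1: ML search with ultrafast bootstrap, SH-aLRT and aBayes support."""
    cmd = ["iqtree", "-s", alignment, "-m", model,
           "-bb", str(ufboot), "-alrt", str(alrt)]
    if abayes:
        cmd.append("-abayes")
    return cmd


def iqtree_pmsf_command(alignment: str,
                        guide_tree: str,
                        model: str = "LG+C60+F+G",
                        boot: int = 100) -> List[str]:
    """Step 2: PMSF inference using the step-1 tree as guide (-ft),
    with nonparametric bootstraps (-b)."""
    return ["iqtree", "-s", alignment, "-m", model,
            "-ft", guide_tree, "-b", str(boot)]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    step1 = iqtree_ufboot_command("metamonada_181genes.fasta")
    step2 = iqtree_pmsf_command("metamonada_181genes.fasta",
                                "metamonada_181genes.fasta.treefile")
    print(" ".join(shlex.quote(arg) for arg in step1))
    print(" ".join(shlex.quote(arg) for arg in step2))
```

Building the argument list separately from execution makes it easy to inspect the exact model string before committing to a long run; newer IQ-TREE releases use -B in place of -bb.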

Supplementary Fig. 2: Phylogenetic reconstruction of Orc5 proteins inferred with IQ-TREE [15] under the LG+C60+F+Γ model using 1000 ultrafast bootstraps (SH-aLRT support percentage/aBayes/bootstrap). Value ranges for branches are shown by dots; the red dot indicates that the values apply to each node within the clade. The alignment consists of 60 taxa with 422 sites after trimming. For simplicity, only the domain architectures of metamonads, S. cerevisiae, A. thaliana and H. sapiens are depicted on the tree.

Supplementary Fig. 3: Orc1-6 and Cdc6 proteins. (a) Left: typical domain architecture observed for Orc1-6 and Cdc6 in Saccharomyces cerevisiae. Right: representative domain architecture of metamonad proteins drawn to reflect the most common protein size. If no species name is given, then the depicted domain structure was found in all of the metamonads where present. Numbers on the right of each depiction correspond to the total protein length or its range in the case of metamonads (additional information in Supplementary Data 2). (b) Comparison of Orc1, Cdc6 and Orc1/Cdc6-like protein lengths across 81 eukaryotes encompassing metamonads and non-metamonad protists (source information in Supplementary Data 6). Metamonad proteins are highlighted with green shaded bubbles in the background. (c) Orc1/Cdc6 partial ATPase domain showing Walker A and Walker B motifs including the R-finger. Reference species are at the top. The multiple sequence alignment was visualized with Jalview [72] using the Clustal colouring scheme. (d) Phylogenetic reconstruction of Orc1, Cdc6 and Orc1/Cdc6-like proteins inferred with IQ-TREE [15] under the LG+C10+F+Γ model using 1000 ultrafast bootstraps (bootstrap value ranges for branches are shown with black and grey dots). The alignment consists of 81 taxa with 367 sites after trimming. Orc1/Cdc6-like proteins do not form a clade with bona fide Orc1 and Cdc6 proteins, making it impossible to definitively establish whether or not they are orthologs.

Supplementary Fig. 4: The distribution of core molecular systems of the replisome, double strand break repair and endonucleases in nucleomorph genomes of cryptophytes and chlorarachniophytes.

Supplementary Fig. 5: The distribution of core molecular systems of DNA repair across eukaryotic diversity. A schematic global eukaryote phylogeny is shown on the left with classification of the major metamonad lineages indicated. Double strand break repair and endonuclease sets. ***Carpediemonas-Like Organisms. '?' is used in cases where correct orthology was difficult to establish, so the protein name appears with the suffix '-like' in tables.

Supplementary Fig. 6: Presence/absence diagram of LECA kinetochore components in eukaryotes, with a greater sampling of metamonads, including C. membranifera and C. frisia. Left: matrix of presences (coloured) and absences (light grey) of kinetochore, SAC and APC/C proteins that were present in LECA. On top: names of the different subunits; single letters (A-X) indicate Centromere protein A-X (e.g., CenpA) and numbers indicate APC/C subunits 1-15 (e.g., Apc1). E2S and E2C refer to E2 ubiquitin conjugases S and C, respectively. Colour schemes correspond to the kinetochore overview figure on the right and to those used in Figure 3. Right: cartoon of the components of the kinetochore, SAC signalling, the APC/C and its substrates (Cyclin A/B) in LECA and Carpediemonas species to indicate the loss of components (light grey shading). Blue lines indicate the presence of proteins that are part of the MCC. Asterisk: Apc10 has three paralogs in C. membranifera and two in C. frisia. One is the canonical Apc10; the two others are fused to a BTB-Kelch protein whose closest homolog is a likely adapter for the E3 ubiquitin ligase Cullin 3.

Supplementary Fig. 7: Carpediemonas harbours three different types of Histone H3 proteins, including a centromere-specific variant (CenpA). Multiple sequence alignment of different Histone H3 variants in eukaryotes and metamonads, including the secondary structure of canonical H3 in humans (pdb: 6ESF_A). CenpA orthologs are characterized by extended amino and carboxy termini and a large L1 loop. Red names in the CenpA panel indicate the species for which centromere/kinetochore localization has been confirmed. In addition to CenpA and canonical Histone H3 variants, multiple eukaryotes, including C. membranifera and C. frisia, harbour other divergent H3 variants. Such divergent variants make the annotation of Histone H3 homologs ambiguous (see Asterisks; incomplete sequences). Multiple sequence alignments were visualized with Jalview [72], using the Clustal colour scheme. Asterisks indicate two potential CenpA candidates in T. vaginalis.

Supplementary Fig. 8: Likely presence of SAC signalling in Carpediemonas. (a) Short linear motifs form the basis of SAC signalling. During prometaphase, unattached kinetochores catalyse the production of an inhibitor of the cell cycle machinery, a phenomenon known as the spindle assembly checkpoint (SAC) [73]. (I) The main protein scaffold of SAC signalling is the kinase MadBub (paralogs Mad3/Bub1 exist in eukaryotes), which contains many short linear motifs (SLiMs) that mediate the interaction of SAC components and the APC/C (light blue) [74,75]. MadBub itself is recruited to the kinetochore through interaction with Bub3 (GLEBS), which in turn binds repeated phosphomotifs in Knl1 [76-78]. The CDI or CMI motif helps to recruit Mad1 [79-81], which has a Mad2-interaction motif (MIM) that mediates the kinetochore-dependent conversion of open-Mad2 to Mad2 in a closed conformation [82]. (II) Mad2, MadBub, Bub3 and 2x Cdc20 (APC/C co-activator) form the mitotic checkpoint complex (MCC) and block the APC/C [75,83,84]. MadBub contains 3 different APC/C degrons (D-box, KEN-box and ABBA motif) [74] that direct its interaction with 2x Cdc20s and effectively make the MCC a pseudo-substrate of the APC/C. (III) Increasing amounts of kinetochore-microtubule attachments silence the production of the MCC at kinetochores and the APC/C is released. Cdc20 now presents its substrates Cyclin A and Cyclin B (some eukaryotes have other substrates as well, but they are not universally conserved) for ubiquitination and subsequent degradation through recognition of a D-box motif [85]. Chromosome segregation will now be initiated (anaphase). (b) Presence/absence matrix of motifs involved in SAC signalling in a selection of eukaryotes and metamonads, including C. membranifera and C. frisia. Colours correspond to the motifs in panel a; light grey indicates motif loss. N signifies the number of MadBub homologs that are present in each species. 'Incomplete' points to sequences that were found to be incomplete due to gaps in the genome assembly. Question marks indicate uncertainty in the presence of that particular motif. Although metamonads have all four MCC components (Mad2, Bub3, MadBub and Cdc20), most homologs do not contain the motifs needed to elicit canonical SAC signalling, and it is therefore likely that they do not have a SAC response. Exceptions are C. membranifera, C. frisia and Kipferlia bialata. They retained the N-terminal KEN-boxes and one ABBA motif, which are involved in the binding of two Cdc20s, as well as a Mad2-interaction motif (MIM) in Mad1 and Cdc20. (c) Multiple sequence alignments of the motifs from panels a and b. Coloured motif boxes correspond to panels a and b. Multiple sequence alignments were visualized with Jalview [72], using the Clustal colouring scheme. Asterisks indicate ambiguous motifs in Carpediemonas membranifera.
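To illustrate what a presence/absence scan for short linear motifs can look like, here is a deliberately simplified Python sketch. It assumes two textbook minimal patterns, the literal tripeptide KEN for the KEN-box and RxxL for the D-box; these are illustrative assumptions, not the motif definitions or the procedure used for this figure, and the more complex ABBA, GLEBS and MIM motifs are not covered. Such minimal patterns match frequently by chance, so hits would still need to be checked against alignments.

```python
import re
from typing import Dict, List

# Simplified, assumed patterns; lookaheads allow overlapping matches.
MOTIF_PATTERNS = {
    "KEN-box": re.compile(r"(?=KEN)"),
    "D-box (minimal RxxL)": re.compile(r"(?=R..L)"),
}


def find_motifs(sequence: str) -> Dict[str, List[int]]:
    """Return 1-based start positions of each motif in a protein sequence."""
    seq = sequence.upper()
    return {
        name: [match.start() + 1 for match in pattern.finditer(seq)]
        for name, pattern in MOTIF_PATTERNS.items()
    }


def presence_absence(sequence: str) -> Dict[str, bool]:
    """Collapse motif hits into a presence/absence record."""
    return {name: bool(hits) for name, hits in find_motifs(sequence).items()}


if __name__ == "__main__":
    # Made-up peptide, for illustration only.
    print(find_motifs("MKENAARTALGG"))
```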

Supplementary Fig. 9: Histogram showing the frequency distribution of single nucleotide variants in the genome of C. membranifera. The diagram shows the distribution typical of a haploid genome.

Supplementary Fig. 10: Maximum likelihood reconstruction of Endonuclease IV. The unrooted tree contains eukaryotic and prokaryotic Endo IV sequences, showing Carpediemonas sequences emerging within bacterial proteins. The tree was inferred with IQ-TREE under the LG+I+C20 model with 1000 ultrafast bootstraps; alignment length was 276. Scale bar shows the inferred number of amino acid substitutions per site.

Supplementary Fig. 11: Maximum likelihood reconstruction of RarA. The unrooted tree contains eukaryotic and prokaryotic sequences, showing Carpediemonas sequences emerging within bacterial proteins. The tree was inferred with IQ-TREE under the LG+I+C20 model with 1000 ultrafast bootstraps; alignment length was 414. Scale bar shows the inferred number of amino acid substitutions per site.

Supplementary Fig. 12: Maximum likelihood reconstruction of RNAse H1. Carpediemonas RarA-like proteins emerge within bacterial proteins. Parabasalia and Diplomonada proteins are highlighted to show that the proteins have been acquired in different events. The tree was inferred with IQ-TREE under the LG+I+G+C20 model with 1000 ultrafast bootstraps; alignment length was 149. Scale bar shows the inferred number of amino acid substitutions per site.

Funding provided by: Canadian Institutes of Health Research
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100000024
Award Number: FRN-142349

Funding provided by: Natural Sciences and Engineering Research Council of Canada
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100000038
Award Number: RGPIN 05871-2014

Files

README.txt

Files (505.5 kB)

Checksum  Size  Name
md5:ce1be80b3e1bd627a34c95252e56c94f  22.5 kB
md5:06f990338f5c48f076d9aeb068933f42  5.2 kB
md5:2c727de6a3c97b36bdfdbc198a53936d  1.6 kB
md5:3fad3f18ff1013cf8e1f6013ee2cd58d  495 Bytes
md5:2e5cf164c5346617c3b13ed02f95ffb6  772 Bytes
md5:0200bd4a1d6958d13399e926c005265b  879 Bytes
md5:8a2b4cee823f68385d2a6e8bf75274f2  10.4 kB
md5:35de42d8b4baa35784e72330e8409cfc  10.4 kB
md5:833ed60a700aef106f3ef8c6abd285b5  7.5 kB
md5:30ff867cccc2a6bf43c0f31cc21d8575  5.9 kB
md5:36019afb85bf13de8629948abbee0326  8.4 kB
md5:e7d9b106016018381d89cdb29b6e2a62  5.8 kB
md5:d00659d8630b22c26edffd4a06473c84  10.7 kB
md5:97d647718478b3dba0303ef5c76de9ba  1.5 kB
md5:c70c7a026c88caf294c3281c11dc1382  13.6 kB
md5:0e988426e14021078bd03565f1a3798e  3.4 kB
md5:5f071e43b1533d6d1377b6e23ca3e029  7.8 kB
md5:b803d1a9a10ce3e806d5ad37e84b39f2  4.2 kB
md5:f25710fa8dedfa6c8d0db7a88d74f422  3.3 kB
md5:c1340dc38cdb856b17734e27f4c4b12c  18.5 kB
md5:2e826b27ffaa87abc8fedd6fa60713fc  5.8 kB
md5:83bba8240baf0a75a974bfb5410dac47  4.4 kB
md5:904f1e050f8bdb45f92f1050138ae8f5  7.5 kB
md5:bd0ff7b57f1e6b70dc8cd8acf7bb496d  25.2 kB
md5:07f95f20be7efa6c64f1f501004ca798  1.3 kB
md5:de64f84f5d2316ea43dc9629863c7b34  5.1 kB
md5:d6374316707e5ed72e2288d57fcace9b  1.6 kB
md5:75f61e87a7ebf52c26e79d9f6a0641ac  2.2 kB
md5:975e06043716add1705a09e7a9b54c0b  800 Bytes
md5:a76a747544e9401ba40fc75209ce9898  2.0 kB
md5:adaaecca0495ff99494625f1440ffb78  2.0 kB
md5:087e3a1c81b719c18dd42c8c7c89bdb9  1.7 kB
md5:d5cd42f2b71cf8c4a408412edf028edd  1.1 kB
md5:b141ccf0fd72b4f65ab9d5ac4bd7baf7  2.1 kB
md5:52ce0bf74699e9a267e4c4ec28a7e77c  2.0 kB
md5:a1688bd6a69988c2f7a7572ac16d9085  1.3 kB
md5:72fff507093804b8a893c7fc7fc41f64  306 Bytes
md5:89d5a1c1ec4592ed1c5e2489a6428529  1.2 kB
md5:812e862b33e0a7a36774b0c0a506f5d9  2.1 kB
md5:f965ce48e4a663baf8582d9752a37d1e  23.5 kB
md5:314d88c69fe3a976d03cd02f9a50ea2c  24.1 kB
md5:922445ad7b45eb9cd9283bcca2fb0bdc  367 Bytes
md5:0f83cd75cfa393b35aa8baf323b074f1  319 Bytes
md5:89b8264bca21f84dc3cbac219e5f9805  488 Bytes
md5:75e17a14af5f6d3b64f4a837c185c23f  170 Bytes
md5:558e368900f90005b4c341ee76ae537f  1.2 kB
md5:f916fd134d1eab25d30de216d21ae735  3.4 kB
md5:be7f924174f4bab421c422a9fb6bbf72  390 Bytes
md5:102766ad9f14c186f32fd0b4b12952ef  83 Bytes
md5:091f93dac6f9b848d0b6e763c42848bd  15.6 kB
md5:be7fa5fb06cd6c0d3b4dbd41ee8b89f5  11.4 kB
md5:c95d92b8bd89fac91e92755201134ccb  7.7 kB
md5:a4cae6285103e70e92308e575f65cfc3  4.7 kB
md5:989ea30941dbb869b86b7f747199d046  18.1 kB
md5:33448fb4b65d9045bd3c6cc25e114a32  2.5 kB
md5:870a3fb49737b3dba51ef043e055228f  13.9 kB
md5:55c54f48e54c53a11cdbceb0aea9a87c  7.3 kB
md5:021cc6710793b7f38b78bfd9d0b1a9c3  252 Bytes
md5:3552ea88e8f996a5f62abdf918761a2d  167 Bytes
md5:f2020758d228d4c988537484e23710d5  1.6 kB
md5:0b32bddb9040d8bcb1e27b30567cacd5  1.6 kB
md5:06a0015cbd944bb059272761f8a280e6  8.4 kB
md5:cec32a8932bbb32d27995782e573d473  49.7 kB
md5:964cc07da7f0bc8d4274a6331dfa37b3  2.4 kB
md5:ec064659282696bb47650101ea0a9e08  5.2 kB
md5:5f0dac833140185102f0bad9dc025fdd  25.6 kB  Plk
md5:69f07dc2fcc11e1c694e50789fa975a0  4.7 kB
md5:53aba69ffbeb56564c1135a66eafd083  8.7 kB  Rod
md5:00f94d5bf0390cccca9dddccba4e051a  3.5 kB  Sgo
md5:0a62cdc750efe280159699a208719ebb  1.5 kB
md5:3c0a5e87c3d924234af74ed424a7a19e  1.6 kB
md5:c246b36d9f59b3617cc02513fb165590  2.4 kB
md5:a3e909307fa09e48c5b16e9e1d5742ae  179 Bytes
md5:379c9b4b989e68945c06ad807c05e443  5.5 kB
md5:ddf1c11512b969fe7393be818b12abc9  4.9 kB
md5:44a68c2de8a904b0810b6bf1e036d717  1.1 kB
md5:752b9340eb86b4fad4752efc5c319cfc  2.2 kB
md5:66897643d63b82b132e6d0de02f739f1  9.2 kB
md5:a476779221b821cd0af82d18a9f9a171  1.6 kB
md5:64ad4ac6cdc3e08e4a93c0f8f0a69132  2.9 kB
md5:134cd2291f8f8b81a1f9422625f9385c  7.1 kB
md5:b177886bc0db6cacdf4faecc7f632d98  1.4 kB
md5:703c6b0d9761b139ca1c8c59baac5a60  4.3 kB
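The md5 checksums in the file listing can be used to confirm that a downloaded file is intact. The sketch below uses Python's standard hashlib module; the helper names and the command-line usage are illustrative assumptions, and the expected checksum must be copied from the listing entry for the file in question.

```python
import hashlib
import sys


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed_checksum: str) -> bool:
    """Compare a file against a listing entry such as 'md5:<hex digest>'."""
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


if __name__ == "__main__":
    # Usage: python verify_md5.py <downloaded file> md5:<checksum from listing>
    file_path, listed = sys.argv[1], sys.argv[2]
    print("OK" if matches_listing(file_path, listed) else "MISMATCH")
```

md5 is adequate here for detecting corrupted or truncated downloads; it is not meant as a defence against deliberate tampering.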

Additional details

Related works

Is source of
10.5281/zenodo.5165246 (DOI)