Published June 30, 2022 | Version v1
Dataset Open

The GENOMES UNCOUPLED1 protein has an ancient, highly conserved role but not in retrograde signalling

  • 1. University of Western Australia

Description

The pentatricopeptide repeat protein GENOMES UNCOUPLED1 (GUN1) is required for chloroplast-to-nucleus signalling in response to plastid stress during chloroplast development in Arabidopsis thaliana, but its exact molecular function remains unknown. Current data on GUN1 function are limited to Arabidopsis, so we set out to investigate the origin and evolution of the land plant GUN1 proteins. We retrieved GUN1 sequences from 76 phylogenetically diverse land plants and built a GUN1 sequence profile using hmmbuild (http://hmmer.org). We then used this profile to systematically analyse the presence or absence of GUN1 sequences in transcriptomes from land plants and streptophyte algae. This dataset includes the GUN1 profile we developed, the code we used to analyse the results of screening over 500,000 PPR protein sequences with the profile, and an alignment of the 893 GUN1 sequences that we obtained.
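The profile-building and screening steps described above can be sketched as follows. This is a minimal illustration, assuming HMMER 3 is installed and on the PATH; it is not the authors' pipeline. The helper names (`hmmbuild_cmd`, `hmmsearch_cmd`, `run_if_available`) and the sequence file name in the usage comment are hypothetical, and only the basic `hmmbuild <hmmfile> <msafile>` and `hmmsearch --domtblout <file> <hmmfile> <seqdb>` invocations are used.

```python
import shutil
import subprocess


def hmmbuild_cmd(alignment, profile_out):
    """Command that builds a profile HMM from a multiple sequence alignment."""
    return ["hmmbuild", profile_out, alignment]


def hmmsearch_cmd(profile, sequences, domtbl_out):
    """Command that screens protein sequences and writes a per-domain table."""
    return ["hmmsearch", "--domtblout", domtbl_out, profile, sequences]


def run_if_available(cmd):
    """Run cmd only if its executable is installed; return True if it ran."""
    if shutil.which(cmd[0]) is None:
        print("skipping (executable not found):", " ".join(cmd))
        return False
    subprocess.run(cmd, check=True)
    return True


# Usage sketch (file names other than GUN1.hmm are placeholders):
# run_if_available(hmmbuild_cmd("conserved_region.alignment.fasta", "GUN1.hmm"))
# run_if_available(hmmsearch_cmd("GUN1.hmm", "ppr_proteins.fasta", "hits.domt"))
```

Building the commands as argument lists, rather than shell strings, avoids quoting problems with file names that contain spaces.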

We used these data to show that GUN1 is highly conserved across land plants but missing from the Rafflesiaceae, which lack chloroplast genomes. Our findings suggest that GUN1 is an ancient protein that evolved within the streptophyte algal ancestors of land plants, before the first plants colonised land more than 470 million years ago.

This dataset also includes transcript count data from an RNA-seq experiment examining gene expression in wild-type and Mpgun1 mutant spores of the liverwort Marchantia polymorpha, grown in the presence or absence of spectinomycin. We used these data to show that GUN1 does not play a significant role in chloroplast retrograde signalling in M. polymorpha. Its primary role is likely to be in chloroplast gene expression, and its role in chloroplast retrograde signalling probably evolved more recently.

Notes

For dataset 1, the files included here are:

  • 76_GUN1.alignment.fasta — FASTA format file containing aligned GUN1 sequences from 76 phylogenetically diverse land plants
  • 76_GUN1.conserved_region.alignment.fasta — FASTA format file containing aligned GUN1 sequences (central conserved region only) from 76 phylogenetically diverse land plants; corresponds to the alignment shown in Figure 1.
  • GUN1.hmm — Hidden Markov model profile generated by hmmbuild using the 76 GUN1 sequences in 76_GUN1.conserved_region.alignment.fasta
  • GUN1s.domt — 'domain table' output from hmmsearch listing the GUN1 profile matches found in the 76 GUN1 sequences
  • taxonomy.csv — CSV file with taxonomical metadata on 1KP samples needed to run the analyses in GUN1 classification.ipynb
  • GUN1 classification.ipynb — Jupyter notebook (Julia code) used to identify GUN1 sequences from the 1KP dataset and to generate Fig. S2 and Table S2 
  • 893_GUN1.alignment.fasta — FASTA format file containing 893 aligned GUN1 sequences (includes both those found in the 1KP dataset, which are often partial sequences, and the 76 full-length sequences from 76_GUN1.alignment.fasta)
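For readers who want to inspect a 'domain table' file such as GUN1s.domt outside the Julia notebook, the following is a minimal Python sketch of a parser. It assumes the standard HMMER 3 `--domtblout` layout (22 whitespace-separated columns followed by a free-text description) and is not the code used in GUN1 classification.ipynb. The `DomainHit` type, the `parse_domtbl` function and the sample row are hypothetical illustrations, not data from the deposit.

```python
from typing import Iterable, List, NamedTuple


class DomainHit(NamedTuple):
    target: str        # column 1: target sequence name
    target_len: int    # column 3: target sequence length
    query: str         # column 4: query (profile) name
    full_evalue: float # column 7: full-sequence E-value
    full_score: float  # column 8: full-sequence bit score
    dom_score: float   # column 14: bit score of this domain
    hmm_from: int      # columns 16-17: match coordinates on the profile
    hmm_to: int
    ali_from: int      # columns 18-19: match coordinates on the target
    ali_to: int


def parse_domtbl(lines: Iterable[str]) -> List[DomainHit]:
    """Parse HMMER 3 domain-table lines, skipping comments and blank lines."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split(maxsplit=22)  # keep the description as one field
        if len(f) < 22:
            continue  # not a complete data row
        hits.append(DomainHit(f[0], int(f[2]), f[3], float(f[6]), float(f[7]),
                              float(f[13]), int(f[15]), int(f[16]),
                              int(f[17]), int(f[18])))
    return hits


# Made-up row for illustration only; real rows come from hmmsearch output.
sample = [
    "# target name  accession  tlen  query name ...",
    "seq1 - 918 GUN1 - 300 1.2e-150 500.1 0.3 1 1 2.0e-152 1.2e-150 "
    "499.9 0.3 1 300 210 520 205 525 0.98 hypothetical example",
]
hits = parse_domtbl(sample)
```

To read a real file, pass an open file handle, for example `parse_domtbl(open("GUN1s.domt"))`.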

For dataset 2, the files included here are:

  • Mpdata_spec.csv — table of read counts extracted from Salmon output 
  • mp_sample_table — text file containing experimental design used for identifying differentially expressed genes (supplemental table S3 from the paper)
  • mp_sample_table2 — text file containing experimental design used for making Figure 7a
  • RNAseq_WT_gun1_spores_spec.ipynb — Jupyter notebook (Python code) to reproduce supplemental table S3 from the paper: a DESeq2 analysis identifying differentially expressed genes between all genotype and treatment combinations, using the Salmon read counts (Mpdata_spec.csv). Requires Python packages pandas, numpy, matplotlib, seaborn and diffexpr (https://github.com/wckdouglas/diffexpr).
  • Figure_7a.ipynb — Jupyter notebook (Python code) to reproduce Figure 7a from the paper using the Salmon read counts (Mpdata_spec.csv). Requires Python packages pandas, numpy, matplotlib, seaborn and diffexpr (https://github.com/wckdouglas/diffexpr).
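As a small illustration of what "all genotype and treatment combinations" implies for the comparisons, the sketch below enumerates the groups and their pairwise contrasts for a two-genotype, two-treatment design. It does not run DESeq2 and is not taken from the notebooks; the group labels are hypothetical and may not match those in mp_sample_table.

```python
from itertools import combinations, product

# Assumed factor levels, based on the experiment described above.
genotypes = ["WT", "Mpgun1"]
treatments = ["control", "spectinomycin"]

# Every genotype x treatment combination is one experimental group.
groups = [f"{g}_{t}" for g, t in product(genotypes, treatments)]

# Every unordered pair of groups is one pairwise contrast.
contrasts = list(combinations(groups, 2))

for a, b in contrasts:
    print(f"{a} vs {b}")
```

With two levels per factor there are four groups and therefore six pairwise contrasts.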

Funding provided by: Australian Research Council
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100000923
Award Number: FL140100179

Funding provided by: Australian Research Council
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100000923
Award Number: CE140100008

Funding provided by: Commonwealth Scientific and Industrial Research Organisation
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100000943
Award Number: CSIRO Synthetic Biology Fellowship

Files

Files (14.8 MB)

Checksum Size
md5:178a67414d1599317569d18a964f8e8f 122.2 kB
md5:6ce8b5b93d82a4e0c49e16ae6059e2ac 68.9 kB
md5:b9a2c77a26d465d1486a7e8a2f88659c 2.3 MB
md5:d0791fdb8a4392b2ffe587312dc7c78d 167.2 kB
md5:f5a6b485ec619893a0c1971777673abd 355.3 kB
md5:3bac05a7543bf194d3c52e1653a8f9b9 1.3 MB
md5:6117da5981ab9ac1399f267a06337a4a 19.4 kB
md5:479429ab64b3f894ddac3acf006be942 339 Bytes
md5:97512af3da77e93e802bfda6ebfad339 342 Bytes
md5:7eab39323613568539632914276a9d18 898.0 kB
md5:e5d08d23dea55d639bd4f158b494852c 9.1 MB
md5:bc6bc150e1b29a44680e4fa6bde8b637 2.9 kB
md5:ee67c2581d367e96604c4c7c3eb1c2d7 202.9 kB
md5:22baef1513711d0b9d02de040d05134d 149.1 kB
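
The MD5 checksums listed above can be used to confirm that a downloaded file is intact. The following is a minimal sketch using only the Python standard library; the function names `file_md5` and `matches_record` are hypothetical, and the demonstration runs on a throwaway file rather than on a file from this record.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, recorded):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = recorded.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return file_md5(path) == expected


# Demonstration on a temporary file, so the sketch runs without downloads.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.txt"
    demo.write_bytes(b"example bytes")
    demo_digest = file_md5(demo)
    demo_ok = matches_record(demo, "md5:" + demo_digest)
```

For a real check, call `matches_record` with the path of the downloaded file and the checksum string copied from the list.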

Additional details

Related works

Is cited by
10.1111/nph.18318 (DOI)