Published March 16, 2022 | Version v1
Dataset Open

mRNA editing analysis of Doryteuthis pealeii

  • 1. University of California Berkeley
  • 2. Marine Biological Laboratory
  • 3. University of California, Berkeley
  • 4. University of Vienna
  • 5. Hiroshima University
  • 6. University of Chicago
  • 7. HudsonAlpha Institute for Biotechnology

Description

Cephalopods are known for their large nervous systems, complex behaviors and morphological innovations. To investigate the genomic underpinnings of these features, we assembled the chromosomes of the Boston market squid Doryteuthis (Loligo) pealeii and the California two-spot octopus, Octopus bimaculoides, and compared them with those of the Hawaiian bobtail squid, Euprymna scolopes. The genomes of the soft-bodied (coleoid) cephalopods are highly rearranged relative to other extant molluscs, indicating an intense, early burst of genome restructuring. The coleoid genomes feature multi-megabase, tandem arrays of genes associated with brain development and cephalopod-specific innovations. We find that another coleoid hallmark, extensive A-to-I mRNA editing, displays two fundamentally distinct patterns: one exclusive to the nervous system and concentrated in genic sequences, the other widespread and directed toward repetitive elements. We conclude that coleoid novelty is mediated in part by substantial genome reorganization, gene family expansion, and tissue-dependent mRNA editing.

Notes

Files derived from the raw transcriptome sequence data used in the current study (SRA BioProject PRJNA641326) carry the 'Albertin' tag in their file names (reference, alternate, depth, and annotation files). Files from a separate Doryteuthis pealeii specimen [1], analyzed with the same pipeline, carry the 'Alon' tag.

All files in this repository are tab-delimited.
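Assuming the tables are plain tab-delimited text (as the .tab extension and the note above suggest), they can be read with any delimited-text parser. The following is a minimal sketch using Python's standard csv module. It assumes a header row and a site identifier in the first column, neither of which is confirmed by this record; the function name, sample names (S1, S2), and values are hypothetical stand-ins for a real file.

```python
import csv
import io


def read_site_table(handle):
    """Read a tab-delimited table into {site_id: {column_name: value}}.

    Assumes the first row is a header and the first column identifies the
    edit site. Both are assumptions; check them against the real files.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    table = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        table[row[0]] = dict(zip(header[1:], row[1:]))
    return table


# Hypothetical two-sample excerpt standing in for a real .tab file.
example = "Site\tS1\tS2\nChr1:100\t12\t7\nChr1:250\t0\t3\n"
sites = read_site_table(io.StringIO(example))
print(sites["Chr1:100"]["S1"])
```

For a downloaded file, the same function can be given an open file handle, for example `open("DP_Albertin.tab", newline="")`. Values are returned as strings and need converting to numbers before any arithmetic.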

 

File Description
Source_Editing_Albertin.tab Main RNA editing annotation table used to create all the manuscript figures (edit sites that overlap genic features and do not overlap genomic variants). The columns integrate annotations from PFAM, TMHMM, and repeat overlap. Each row corresponds to a unique edit site identified by {Chr:Position}. Column descriptions are in README_Source_Editing_Albertin.txt.
ADAR_Albertin.tab, ADAR_Alon.tab Annotation of ADAR target sites on the Doryteuthis pealeii reference genome.
EF_Albertin.tab, EF_Alon.tab Edit frequencies for the A>G edit sites (rows) in individual tissue samples (columns). Refer to README_Tissue_sampleID.tab
Ref_Albertin.tab, Ref_Alon.tab Number of reads with the reference nucleotide 'A' for all edit sites (rows) in individual tissue samples (columns). Refer to README_Tissue_sampleID.tab
Alt_Albertin.tab, Alt_Alon.tab Number of reads with the alternate/edited nucleotide 'G' for all edit sites (rows) in individual tissue samples (columns). Refer to README_Tissue_sampleID.tab
DP_Albertin.tab, DP_Alon.tab Sum of reads with 'A' or 'G' for each edit site (rows) in individual tissue samples (columns). Refer to README_Tissue_sampleID.tab
Weighted_values_Albertin.tab Weighted average edit frequencies and read depth for Neural and Non-neural samples for each edit target (rows).
Counts_edits_by_Etype_Repeat_Not_Edited.tab Number of adenosines in genic regions, categorized by genomic feature (3', 5', Intron, Rec/recoding, SJ/splice junction, and Syn/synonymous) and subcategorized by repeat overlap (True: overlaps a repeat; False: does not overlap a repeat). Gene strand orientation was taken into account in these calculations. The numbers were obtained from the genomic variant calls.
Edit_prot_pos_intersect.bed bedtools intersect output for edited amino acids and PFAM domains. Columns = ['GeneID', 'Prot_pos', 'Prot_pos_v2', 'Gene_ID2', 'PFAM_start', 'PFAM_end', 'PFAM']. The first three columns give the positions of edited amino acids. The remaining columns describe the overlapping entry in the PFAM table.
Edit_prot_pos_TMHMM_intersect.bed bedtools intersect output for edited amino acids and the TMHMM v2 transmembrane annotation. Columns = ['GeneID', 'Prot_pos', 'Prot_pos_v2', 'Gene_ID2', 'TMHMM_start', 'TMHMM_end', 'Location']. Note: The duplicated Prot_pos variable gives the position of the edited amino acid on the protein. 'Location' refers to the location of the amino acid with respect to the TMHMM annotation (outside, inside, or TMhelix protein segments).
Edited_vs_Non_edited_DpalHbl.gt.tab Genotypes of D. opalescens and H. bleekeri at coding-region sites where the Doryteuthis pealeii reference is adenosine. Genotypes derive from high-confidence genomic calls obtained from genomic shotgun sequence reads mapped against the reference D. pealeii genome.
Dpealeiiv2.gene_exons.filt_Chr.gff3 GFF3 file of filtered exons
DpealeiiV2.filtered.annot.txt Gene description file of filtered genes
featureCount.tophat.M.primary.ignoreDup.filtered_tpm.txt TPM expression table obtained using tophat
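The Ref, Alt, DP, and EF tables described above are related by simple arithmetic: DP is stated to be the sum of 'A' and 'G' reads, and an edit frequency is conventionally the edited fraction of those reads. The sketch below illustrates that relationship, together with one plausible reading of a "weighted average" edit frequency in which samples are pooled by read depth. Both formulas are assumptions, since this record does not spell them out; the function names and read counts are hypothetical.

```python
def edit_frequency(ref_reads, alt_reads):
    """Fraction of reads carrying the edited 'G' at one site in one sample.

    Assumes EF = Alt / (Ref + Alt), with DP = Ref + Alt as described for the
    DP tables. Returns None when the site has no coverage.
    """
    depth = ref_reads + alt_reads
    if depth == 0:
        return None
    return alt_reads / depth


def depth_weighted_frequency(samples):
    """Pool (ref, alt) read counts across samples for one site.

    Equivalent to averaging per-sample edit frequencies weighted by depth.
    This weighting scheme is an assumption, not the authors' stated method.
    """
    total_alt = sum(alt for _, alt in samples)
    total_depth = sum(ref + alt for ref, alt in samples)
    if total_depth == 0:
        return None
    return total_alt / total_depth


# Hypothetical (ref, alt) counts for one site in two neural samples.
neural = [(30, 10), (5, 15)]
per_sample = [edit_frequency(ref, alt) for ref, alt in neural]
pooled = depth_weighted_frequency(neural)
print(per_sample, pooled)
```

Pooling by depth gives deeply sequenced samples more influence than a plain mean of per-sample frequencies would, which is one reason a weighted summary may be preferred for sites with uneven coverage.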

 

Citation:

[1] Alon, S. et al. The majority of transcripts in the squid nervous system are extensively recoded by A-to-I RNA editing. eLife https://elifesciences.org/articles/05198 (2015) doi:10.7554/eLife.05198.

Funding provided by: Austrian Science Fund
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100002428
Award Number: P30686-B29

Funding provided by: NSF
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100008982
Award Number: IOS-1354898

Funding provided by: National Institutes of Health
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000002
Award Number: 5UL1TR002389-02

Funding provided by: National Institutes of Health
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000002
Award Number: UL1 TR000430

Funding provided by: Grass Foundation
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100001654
Award Number:

Funding provided by: Marine Biological Laboratory
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100006049
Award Number: Hibbitt Early Career Fellowship

Funding provided by: Marine Biological Laboratory
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100006049
Award Number: Whitman Fellowship

Funding provided by: Chan-Zuckerberg BioHub*
Crossref Funder Registry ID:
Award Number:

Files (668.2 MB)

Checksum Size
md5:a19a49b4beec38861be4e07b68351b9a 47.0 MB
md5:508712857e5b8812273cf9941d88144b 30.1 MB
md5:7ac86bd32f7882b680e907e8d95a333a 14.7 MB
md5:9823730ca841a979454f1d34c36a997f 5.3 MB
md5:379e80aa1ae8e1dd099102f97380630a 125 Bytes
md5:4c3fb8924f6c3283c1a5441d565581f6 20.5 MB
md5:48fd0838b8035b7335523c42f52dfb14 6.5 MB
md5:43c92a5cbb5e3a64f44fdfdd0988cc2c 18.4 MB
md5:9107531a5a67002182a0b4ec0f1aac48 15.2 MB
md5:a0c8a44ae4b137b5d3517cedf7e1381d 32.1 MB
md5:ee7577e89003d439f19ceb4cdceb0c90 273.5 MB
md5:f7e0cf7c812af07a28aaa0d603088897 50.8 MB
md5:2091f02ce309abf7c294e87210454ac6 15.7 MB
md5:a783b0ae007683cbf3514c99d01f958c 17.1 MB
md5:6f2b3c8b64b1de4985b3f9e46fec7ab1 4.0 kB
md5:6e20ff030a988ba42a4a6a85d833da4a 945 Bytes
md5:e63994904d6f17000d5922d08bd6bae2 20.3 MB
md5:6e40941257818049af87a3583aea1b96 6.5 MB
md5:76251b4d57327df64ac1ad89b9554949 78.7 MB
md5:4d302a4292b88473e34685e5059878c2 15.9 MB
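The MD5 checksums listed above can be used to confirm that a download completed intact. A minimal sketch using Python's standard hashlib module follows; the function names are hypothetical, and an in-memory byte string stands in for a downloaded file.

```python
import hashlib
import io


def md5_of_stream(handle, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary file object, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_listing(handle, listed):
    """Compare a stream's digest with a listed value such as 'md5:<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_stream(handle) == expected


# In-memory stand-in for a downloaded file.
demo = io.BytesIO(b"hello")
print(matches_listing(demo, "md5:5d41402abc4b2a76b9719d911017c592"))
```

For a real download, open the file in binary mode (`open(path, "rb")`) and pass the checksum shown next to that file on the record page. Reading in chunks keeps memory use flat even for the largest file in the record.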