There is a newer version of the record available.

Published June 11, 2023 | Version 1.4
Dataset Open

A Catalog of Natural Channelrhodopsins

Authors/Creators

  • 1. Technion – Israel Institute of Technology

Description

This repository is an appendix to the publication [Rozenberg20](10.1016/j.cub.2020.09.056), "Lateral Gene Transfer of Anion-Conducting Channelrhodopsins between Green Algae and Giant Viruses". Its purpose is to maintain the list of channelrhodopsins (ChRs) presented there as a continuously updated resource that includes updates to old records, novel unpublished entries, entries from newly published experiments, and corrections to previously published sets.

Most of the credit belongs to the projects that generated and assembled the data in which the ChRs were ultimately found (especially [MMETSP_Keeling14](10.1371/journal.pbio.1001889) and [1KP_initiative19](10.1038/s41586-019-1693-2)) and to the experimentalists who took on the challenge of characterizing them. The catalog currently focuses mostly, but not exclusively, on proteins from cultured organisms.

Although the format might eventually change, the current version of the catalog is composed of two xlsx spreadsheets:

  • Channelrhodopsins_Original_List.xlsx contains the original ChR list with additional metadata, but with no sequences changed or added and with the classification of the ChRs left as it was. This file will remain unchanged throughout releases. It contains the following fields:
    • ID - integer index
    • Sequence name - unique name of the sequence
    • Is outgroup (not ChR) - flag indicating whether the sequence is not a ChR (a small number of rhodopsins inherited from [Klapoetke14](10.1038/nmeth.2836) belonged to different families)
    • Symbol - gene symbol, short alias
    • Species - source species/strain
    • Taxonomic group - taxonomic affiliation of the species
    • ChR group - clade name; an empty value means the clade was not yet known at the time
    • Bioproject - NCBI bioproject for the sequence
    • Sequence source - initiative/database/publication the gene nucleotide sequence was part of
    • Activity - activity confirmed for the exact sequence (potentially, a shorter version or in rare cases an allelic variant thereof). In the original list, selectivity (cation/anion) was indicated for all ChRs with demonstrated channel activity, even if selectivity could not be or was not assessed experimentally.
    • Reference - publication(s) where the activity was demonstrated
    • Full-length 98%-identity clustering:
      • Representative - representative sequence for the cluster with an identity of 98%
      • Cluster - cluster number
      • Identity - identity % to the reference
    • Rhodopsin-domain 100%-identity clustering:
      • Representative - representative sequence for the cluster with an identity of 100% after the 98%-identity clusters were aligned and trimmed to include only the rhodopsin domain
      • Cluster - cluster number
    • Sequence - full protein sequence for the entry (this might be the complete sequence of the gene, partial sequence of the gene that was not recovered entirely in the assembly or sequence of a specific construct used for expression)
    • Is partial sequence - flag indicating whether the rhodopsin domain is truncated
    • Has indels - flag indicating whether the rhodopsin domain contains indels (unspliced introns in transcripts, gene annotation artifacts)
    • Sequence completeness comment - comments about the nature of the indels
  • Channelrhodopsins_Updated_List.xlsx includes amended ChR sequences, manually added entries and novel expressed or otherwise published proteins. In this release all of the sequences with complete rhodopsin domains are assigned to a family (some families have only provisional names). The spreadsheet is structured differently from the first one and focuses more on the unique complete sequences by separating genes from constructs that are derived from them. The redundancy of the dataset was further reduced by combining identical sequences. Highly similar sequences are treated mostly separately. In some cases they have been downgraded to allelic or splice variants, but this is not yet consistent. There are now three sheets:
    • full_channelrhodopsins - full ChR sequences
    • fragmented_channelrhodopsins - ChR sequences with incomplete rhodopsin domains
    • not_channelrhodopsins - additional proteins that have been mentioned in ChR datasets that are not from the ChR family
  • The sheets have the following fields:
    • ID - integer index corresponding to the record (new records have IDs >875).
    • Sequence name - this is the chosen sequence name for the longest version of the sequence
    • Version - sequence version. Sequences get updated and increment their versions in the following cases:
      • the correct start codon was identified
      • a full sequence for the gene was found in an alternative database
      • a full sequence for the gene has been obtained or corrected from the raw data
      • gene annotation artifacts have been corrected manually based on the genomic sequence
      • unspliced introns were found and removed
    • Reviewed - a somewhat arbitrary flag indicating whether the complete sequence has been manually reviewed
    • Constructs - NCBI protein accessions of constructs overlapping with the gene, with the overlapping region indicated. Note that constructs are allowed to overlap multiple full-length genes, even from nominally different species.
    • Symbol - short gene alias
    • Species - species from which the longest representative sequence originates
    • Taxonomic group
    • ChR group - ChR affiliation based on phylogeny
    • ChR supergroup - ChR supergroup (A: ACRs and green algal CCRs, B: "bacteriorhodopsin-like" CCRs, D: the clade of dinoflagellate and related colpodellid ChRs)
    • Bioproject - bioproject accession(s) for the data from which the sequence is derived
    • Sequence source - one or multiple sources for the sequence, preference is given to earlier released publications
    • Source type - the kind of source(s) (e.g. genome/transcriptome) the sequence comes from
    • First mention - reference for the source where the gene was first indicated as a ChR. Note that this is sometimes hard to determine for highly similar sequences.
    • Currents - channeling activity. This differentiates between confirmed and unconfirmed selectivities: square brackets specify the likely but unconfirmed selectivity
    • Cation selectivity - further details on selectivity of cation ChRs
    • Currents reference
    • Absorption maximum, nm
    • Action maximum, nm
    • Spectra references
    • Spectra comment
    • Representative sequence from [Rozenberg20] - representative highly similar sequence (this is inherited from the original file: full-length 98% clustering > rhodopsin domain 100% clustering)
    • Sequence - the amino acid sequence
    • Superseded identical sequences - identical sequences included in the same record (with species/strain indicated if different)
    • Splice variants - putative minor splice variants from the same species/strain
    • Allelic variants - putative minor allelic variations from the same species/strain
    • Superseded included sequences - shorter sequences included in the record
    • Superseded incorrect sequences - other sequences that are different due to artifacts
    • Version comments - brief version history
    • Other comments
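
The bracket convention of the Currents field and the ID threshold for new records can be handled with a small helper. The following is a minimal sketch and not part of the catalog: the function and type names are hypothetical, and the example cell values ("anion", "[cation]") are assumptions about how the cells are written, not values copied from the spreadsheet.

```python
from typing import NamedTuple, Optional

# Per the ID field description: new records have IDs > 875.
LAST_ORIGINAL_ID = 875


class Currents(NamedTuple):
    selectivity: Optional[str]  # text of the cell without square brackets
    confirmed: bool             # False if the cell was empty or bracketed


def parse_currents(cell: Optional[str]) -> Currents:
    """Split a 'Currents' cell into its text and a confirmation flag.

    Assumption: a cell with any square bracket is treated as unconfirmed,
    following the rule that brackets mark likely but unconfirmed selectivity.
    """
    text = (cell or "").strip()
    if not text:
        return Currents(None, False)
    confirmed = "[" not in text and "]" not in text
    # Remove enclosing brackets and surrounding spaces, keep the inner text.
    selectivity = text.strip("[] ") or None
    return Currents(selectivity, confirmed and selectivity is not None)


def is_new_record(record_id: int) -> bool:
    """True for records added after the original list."""
    return record_id > LAST_ORIGINAL_ID
```

With these definitions, `parse_currents("[anion]")` yields an unconfirmed entry and `parse_currents("cation")` a confirmed one. Cells that mix bracketed and unbracketed parts are flagged as unconfirmed and should be inspected by hand.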

This is a work in progress; use with care. If a formal citation is needed, please cite [Rozenberg20](10.1016/j.cub.2020.09.056), "Lateral Gene Transfer of Anion-Conducting Channelrhodopsins between Green Algae and Giant Viruses" and [Vierock22](10.1126/sciadv.add7729), "WiChR, a highly potassium selective channelrhodopsin for low-light one- and two-photon inhibition of excitable cells".

Files

Files (1.2 MB)

Name Size
md5:ec7be398f0ca65910ccb867b7d4e8382 185.8 kB
md5:9587357729ffd5a36862a6caded8c7c6 975.1 kB
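
The MD5 checksums listed above can be used to check a downloaded copy. Below is a minimal sketch using only the Python standard library. The function names are hypothetical, and because the listing does not show which checksum belongs to which file name, the sketch only tests whether a file matches either of the two values.

```python
import hashlib

# MD5 checksums as listed in the Files section of this record.
EXPECTED_MD5 = {
    "ec7be398f0ca65910ccb867b7d4e8382",
    "9587357729ffd5a36862a6caded8c7c6",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path):
    """True if the file's MD5 equals one of the checksums listed above."""
    return md5_of_file(path) in EXPECTED_MD5
```

For example, `matches_record("Channelrhodopsins_Updated_List.xlsx")` would check a copy saved under that name in the working directory (the location is an assumption). A mismatch may simply mean the file comes from a different version of the record.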