Lemna minor annotation R package (org.Lminor.eg.db) and corresponding files behind the custom built

Reinwald, Hannes; Loll, Alexandra; Schäfers, Christoph; Eilebrecht, Sebastian

doi:10.5281/zenodo.5772211

Published February 22, 2022 | Version v1

Dataset Open


1. Fraunhofer Attract Eco'n'OMICs, Fraunhofer Institute for Molecular Biology and Applied Ecology, Schmallenberg, Germany

This public repository contains the following files:

Custom-built annotation R package for the Lemna minor reference genome [ org.Lminor.eg.db.7z ]. The package was built with AnnotationForge, using sequence homology of protein-coding genes for functional characterisation (description, PFAMs, GO terms). A combined approach using blastx and EMBL's eggNOG mapper was used for this task. The package is compatible with clusterProfiler for downstream functional enrichment analysis (ORA / GSEA) of L. minor transcriptomic / proteomic data.

To install and use this package in your R session, see the R code example below.
Reference genome, genome annotation (gtf), gene coding sequences (cds) and cds-translated peptide sequences (cds.pep) of the duckweed Lemna minor [ Lminor_refGenome_GTF_CDS.7z ].
The reference genome assembly fasta was downloaded from www.lemna.org. The matching GTF annotation file was generated with 'gffread' from the corresponding GFF annotation file.
Using the cds-translated peptide file, a blastp search was performed against a custom plant protein sequence database [ Lminor_ref.org.Db4blastp.7z ]. The custom database was built from the proteomes of well-annotated reference plant species. For details, refer to the readme file within the compressed folder.

For more details please refer to our publication in XXX
DOI: XXX

# 1. Download and unzip (7zip format) the org.Lminor.eg.db package.
# 2. Install the package from the unpacked source directory:
orgDb = "path/to/org.Lminor.eg.db/"
install.packages(orgDb, type="source", repos=NULL)

# 3. Restart the R session, then load the packages
#    (library() stops with an error if a package is missing)
library(org.Lminor.eg.db)
library(AnnotationDbi)

# List the available columns and key types:
columns(org.Lminor.eg.db)
keytypes(org.Lminor.eg.db)

# Query org.Lminor.eg.db for particular Lemna gene IDs (GID)
gid = keys(org.Lminor.eg.db, keytype="GID")
# Columns are picked by position here; compare with the output of columns() above
col = columns(org.Lminor.eg.db)[c(5,17,9,15,1,8,14)]
# AnnotationDbi::select() is called explicitly to avoid masking by other packages
df = AnnotationDbi::select(org.Lminor.eg.db, keys=gid[1000:1100], columns=col, keytype="GID")
View(df)

### Running overrepresentation analysis in clusterProfiler using the Lminor annotation package ###

# ORA for multiple gene sets via compareCluster()
library(clusterProfiler)
genLs = list(setA = gid[1:40],
             setB = gid[100:140],
             setC = gid[1000:1040])
res = compareCluster(genLs, fun = "enrichGO", OrgDb = "org.Lminor.eg.db",
               keyType = "GID", ont = "BP", universe = gid)
# Compute semantic similarities among GO terms:
d = GOSemSim::godata('org.Lminor.eg.db', ont="BP", computeIC=FALSE, keytype = "GID")
res = enrichplot::pairwise_termsim(res, method = "Wang", semData = d)
# Remove GO terms with redundant biological information (similarity cutoff 0.8)
resS = clusterProfiler::simplify(res, cutoff = 0.8)
# Re-sort the results by p-value
resS@compareClusterResult = resS@compareClusterResult[order(resS@compareClusterResult$pvalue),]
# Inspect the simplified, sorted result table
View(resS@compareClusterResult)
# Network plot
enrichplot::emapplot(resS, showCategory = 30)
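
The example above covers ORA only. Since the description also mentions GSEA, the following is a minimal sketch of how a ranked gene list might be passed to clusterProfiler's gseGO(). It assumes the package is installed as described above; `make_ranked_list` is a hypothetical helper, and the scores are simulated purely for illustration, so the output carries no biological meaning. In practice the scores would be per-gene statistics from your own analysis (e.g. log2 fold changes).

```r
# Hypothetical helper: turn per-gene scores into the decreasingly sorted,
# named numeric vector that gseGO() expects as `geneList`.
make_ranked_list <- function(ids, scores) {
  stopifnot(length(ids) == length(scores), !anyDuplicated(ids))
  sort(setNames(scores, ids), decreasing = TRUE)
}

# Run the enrichment only if both packages are installed.
if (requireNamespace("clusterProfiler", quietly = TRUE) &&
    requireNamespace("org.Lminor.eg.db", quietly = TRUE)) {
  library(org.Lminor.eg.db)
  gid <- keys(org.Lminor.eg.db, keytype = "GID")

  # Simulated scores (assumption): replace with real per-gene statistics.
  set.seed(1)
  geneList <- make_ranked_list(gid, rnorm(length(gid)))

  # pvalueCutoff = 1 keeps all tested terms so the table is not empty
  # for random input; use a stricter cutoff for real data.
  gsea <- clusterProfiler::gseGO(geneList = geneList, ont = "BP",
                                 OrgDb = org.Lminor.eg.db, keyType = "GID",
                                 pvalueCutoff = 1)
  print(head(as.data.frame(gsea)))
}
```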

Notes

Link to publication will be added after release.

Files

Files (331.3 MB)

Name	MD5	Size
Lminor_ref.org.Db4blastp.7z	b852b1f20908ca500062d94cb2bce5bf	149.9 MB
Lminor_refGenome_GTF_CDS.7z	4fec9336827ce42de04d8f265440a9d5	159.4 MB
org.Lminor.eg.db.7z	1f559287b143f5376a677ca8129c2439	21.9 MB
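
The MD5 checksums listed above can be used to confirm that the archives were downloaded intact. A minimal sketch in R, assuming the three archives were saved under their original names in the current working directory; `check_md5` is a hypothetical helper and not part of the annotation package:

```r
# Expected MD5 checksums, copied from the file listing of this record.
expected_md5 <- c(
  "Lminor_ref.org.Db4blastp.7z" = "b852b1f20908ca500062d94cb2bce5bf",
  "Lminor_refGenome_GTF_CDS.7z" = "4fec9336827ce42de04d8f265440a9d5",
  "org.Lminor.eg.db.7z"         = "1f559287b143f5376a677ca8129c2439"
)

# Hypothetical helper: compare the MD5 of each file in `dir` with `expected`.
# Returns a named logical vector; NA marks files that could not be read.
check_md5 <- function(expected, dir = ".") {
  paths  <- file.path(dir, names(expected))
  actual <- unname(tools::md5sum(paths))  # NA for missing files
  setNames(actual == unname(expected), names(expected))
}

# Assumption: the archives are in the current working directory.
print(check_md5(expected_md5))
```

A value of TRUE means the checksum matches, FALSE indicates a corrupted or different file, and NA means the file was not found in the given directory.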

