There is a newer version of the record available.

Published February 22, 2022 | Version v1
Dataset Open

Lemna minor annotation R package (org.Lminor.eg.db) and corresponding files behind the custom built

  • 1. Fraunhofer Attract Eco'n'OMICs, Fraunhofer Institute for Molecular Biology and Applied Ecology, Schmallenberg, Germany

Description

This public repository contains the following files:

  1. Custom-built annotation R package for the Lemna minor reference genome [ org.Lminor.eg.db.7z ]. The package was built with AnnotationForge, using sequence homology of protein-coding genes for functional characterisation (Description, PFAMs, GO terms). A combined approach using blastx and EMBL's eggNOG mapper was used for this task. The package is compatible with clusterProfiler for downstream functional enrichment analysis (ORA / GSEA) of L. minor transcriptomic / proteomic data.

    To install and use this package in your R session, see the R code example below.
     
  2. Reference genome, genome annotation (gtf), gene coding sequences (cds) and cds translated peptide sequences (cds.pep) of the duckweed Lemna minor [ Lminor_refGenome_GTF_CDS.7z ].
    The reference genome assembly (fasta) was downloaded from www.lemna.org. The matching GTF annotation file was generated with 'gffread' from the GFF annotation file available here.
     
  3. Results of a blastp search of the cds translated peptide file against a custom plant protein sequence database [ Lminor_ref.org.Db4blastp.7z ]. The custom database was built from the proteomes of well-annotated reference plant species. (For details, refer to the readme file within the compressed folder.)
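
The functional characterisation step in item 1 can be pictured with a small sketch. It assumes an eggNOG-mapper style table in which each gene has a comma-separated GO column and "-" marks a missing annotation; the gene IDs, values and the helper name make_go_table are invented for illustration and are not taken from the deposited files. The sketch expands that table into the long GID / GO / EVIDENCE layout that AnnotationForge accepts as a GO table.

```r
# Hypothetical excerpt of an eggNOG-mapper annotation table (invented IDs and values).
emapper <- data.frame(
  query = c("Lminor_000001", "Lminor_000002", "Lminor_000003"),
  Description = c("Protein kinase domain", "ABC transporter", "-"),
  GOs = c("GO:0006468,GO:0016301", "GO:0055085", "-"),
  PFAMs = c("Pkinase", "ABC_tran,ABC2_membrane", "-"),
  stringsAsFactors = FALSE
)

# Expand the comma-separated GO column into one row per (gene, GO term) pair.
# The evidence code "IEA" (inferred from electronic annotation) is an assumption.
make_go_table <- function(tab, evidence = "IEA") {
  has_go <- !is.na(tab$GOs) & tab$GOs != "-" & tab$GOs != ""
  tab <- tab[has_go, , drop = FALSE]
  terms <- strsplit(tab$GOs, ",", fixed = TRUE)
  n <- lengths(terms)
  out <- data.frame(
    GID = rep(tab$query, n),
    GO = as.character(unlist(terms)),
    EVIDENCE = rep(evidence, sum(n)),
    stringsAsFactors = FALSE
  )
  unique(out)
}

go_table <- make_go_table(emapper)
print(go_table)
```

Genes without GO terms are dropped from this table only; they can still carry a description or PFAM entry in other tables.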

For more details please refer to our publication in XXX
DOI: XXX

# 1. Download and unzip (7zip format) the org.Lminor.eg.db package.
# 2. Install the package via:
orgDb = "path/to/org.Lminor.eg.db/"
install.packages(orgDb, type="source", repos=NULL)

# 3. Restart the R session, then load the package
# (library() stops with an error if a package is missing, whereas require() only warns)
library(org.Lminor.eg.db)
library(AnnotationDbi)

# to check for columns and keytypes:
columns(org.Lminor.eg.db)
keytypes(org.Lminor.eg.db)

# query the org.Lminor.eg.db for particular Lemna gene IDs (GID)
gid = keys(org.Lminor.eg.db, keytype="GID")
col = columns(org.Lminor.eg.db)[c(5,17,9,15,1,8,14)]
df = select(org.Lminor.eg.db, keys=gid[1000:1100], columns=col, keytype="GID")
View(df)

### Running over-representation analysis (ORA) in clusterProfiler using the L. minor annotation package ###

# ORA for multiple gene sets via compareCluster()
library(clusterProfiler)
genLs = list(setA = gid[1:40],
             setB = gid[100:140],
             setC = gid[1000:1040])
res = compareCluster(genLs, fun = "enrichGO", OrgDb = "org.Lminor.eg.db",
               keyType = "GID", ont = "BP", universe = gid)
# Compute semantic similarities among GO terms:
d = GOSemSim::godata('org.Lminor.eg.db', ont="BP", computeIC=FALSE, keytype = "GID")
res = enrichplot::pairwise_termsim(res, method = "Wang", semData = d)
# Remove GO terms that carry redundant biological information
resS = simplify(res, cutoff = 0.8)
# Re-sort the simplified results by p-value
resS@compareClusterResult = resS@compareClusterResult[order(resS@compareClusterResult$pvalue),]
View(resS@compareClusterResult)
# Network plot
enrichplot::emapplot(resS, showCategory = 30)
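
The description mentions GSEA as well as ORA, but the example above covers only ORA. The following is a minimal sketch of how a GSEA run might look, not the authors' workflow. The helper make_ranked_list and the toy gene IDs and scores are invented for illustration. The gseGO() call runs only if clusterProfiler and org.Lminor.eg.db are installed, and it uses simulated scores in place of real statistics such as log2 fold changes.

```r
# Build a ranked gene list for GSEA: a named numeric vector sorted in decreasing order.
make_ranked_list <- function(ids, scores) {
  stopifnot(length(ids) == length(scores))
  keep <- !is.na(scores) & !duplicated(ids)  # drop missing scores and repeated IDs
  ranked <- scores[keep]
  names(ranked) <- ids[keep]
  sort(ranked, decreasing = TRUE)
}

# Hypothetical differential expression results (invented IDs and log2 fold changes).
de <- data.frame(
  GID = c("geneA", "geneB", "geneC", "geneD"),
  log2FC = c(-1.2, 2.5, NA, 0.3),
  stringsAsFactors = FALSE
)
geneList <- make_ranked_list(de$GID, de$log2FC)
print(geneList)

# Optional: run GSEA against the annotation package when it is available.
if (requireNamespace("clusterProfiler", quietly = TRUE) &&
    requireNamespace("org.Lminor.eg.db", quietly = TRUE)) {
  orgdb <- org.Lminor.eg.db::org.Lminor.eg.db
  gid <- AnnotationDbi::keys(orgdb, keytype = "GID")
  set.seed(1)
  # Simulated scores (assumption): replace with real per-gene statistics.
  simulated <- make_ranked_list(gid, rnorm(length(gid)))
  gse <- clusterProfiler::gseGO(geneList = simulated, ont = "BP",
                                OrgDb = orgdb, keyType = "GID")
  print(head(as.data.frame(gse)))
}
```

With real data, the names of the ranked vector must be valid GID keys from the package, and random scores like those simulated here are not expected to yield enriched terms.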

Notes

Link to publication will be added after release.

Files

Files (331.3 MB)

Checksum                              Size
md5:b852b1f20908ca500062d94cb2bce5bf  149.9 MB
md5:4fec9336827ce42de04d8f265440a9d5  159.4 MB
md5:1f559287b143f5376a677ca8129c2439  21.9 MB
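
The MD5 checksums listed for the files can be used to confirm that a download is intact before unpacking it. The sketch below uses tools::md5sum() from base R; the helper name verify_md5 is invented for illustration. The listing above does not show which checksum belongs to which archive, so the commented call at the end uses placeholders.

```r
# Compare the MD5 checksum of a local file with an expected value.
# An optional "md5:" prefix on the expected value is ignored.
verify_md5 <- function(path, expected) {
  stopifnot(file.exists(path))
  observed <- unname(tools::md5sum(path))
  identical(tolower(observed), tolower(sub("^md5:", "", expected)))
}

# Self-contained demonstration on a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines("duckweed", tmp)
expected <- paste0("md5:", unname(tools::md5sum(tmp)))
print(verify_md5(tmp, expected))
print(verify_md5(tmp, "md5:00000000000000000000000000000000"))

# With a real archive (placeholders; pair each checksum with its own file):
# verify_md5("path/to/org.Lminor.eg.db.7z", "md5:<checksum listed for that file>")
```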