Published February 22, 2022 | Version v1
Dataset
Open
Lemna minor annotation R package (org.Lminor.eg.db) and corresponding files behind the custom built
Authors/Creators
- 1. Fraunhofer Attract Eco'n'OMICs, Fraunhofer Institute for Molecular Biology and Applied Ecology, Schmallenberg, Germany
Description
This public repository contains the following files:
- Custom-built annotation R package for the Lemna minor reference genome [ org.Lminor.eg.db.7z ]. The package was built with AnnotationForge, using sequence homology of protein-coding genes for functional characterisation (Description, PFAMs, GO terms). Homology was assigned by combining blastx with EMBL's eggNOG-mapper. The package is compatible with clusterProfiler for downstream functional enrichment analysis (ORA / GSEA) of L. minor transcriptomic / proteomic data.
To install and use this package in your R session, see the R installation and usage example (steps 1 to 3) below.
- Reference genome, genome annotation (GTF), gene coding sequences (cds) and cds-translated peptide sequences (cds.pep) of the duckweed Lemna minor [ Lminor_refGenome_GTF_CDS.7z ].
The reference genome assembly FASTA file was downloaded from www.lemna.org. The matching GTF annotation file was generated with 'gffread' from the GFF annotation file available here.
- Custom plant protein sequence database [ Lminor_ref.org.Db4blastp.7z ], against which a blastp search was performed with the cds-translated peptide file. The database was built from the proteomes of well-annotated reference plant species (for details, refer to the readme file within the compressed folder).
For more details please refer to our publication in XXX
DOI: XXX
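The description above says the package was built with AnnotationForge. As a rough illustration of what such a build can look like, the sketch below calls `AnnotationForge::makeOrgPackage()` on two toy tables. It is not the authors' actual build script: the gene IDs, descriptions, maintainer details and output directory are hypothetical placeholders, and the `tax_id` value is an assumption that should be checked against the NCBI Taxonomy before use.

```r
# Toy annotation tables. Gene IDs and descriptions are hypothetical placeholders;
# a real build would use the blastx / eggNOG-mapper results instead.
gene_info <- data.frame(
  GID = c("Lminor_000001", "Lminor_000002"),
  DESCRIPTION = c("hypothetical ribosomal protein", "hypothetical photosystem subunit"),
  stringsAsFactors = FALSE
)
go <- data.frame(
  GID = c("Lminor_000001", "Lminor_000002"),
  GO = c("GO:0006412", "GO:0015979"),  # translation, photosynthesis
  EVIDENCE = "IEA",
  stringsAsFactors = FALSE
)

# makeOrgPackage() expects "GID" as the first column of every table
stopifnot(names(gene_info)[1] == "GID", names(go)[1] == "GID")

# Only attempt the build when the required Bioconductor packages are installed
if (requireNamespace("AnnotationForge", quietly = TRUE) &&
    requireNamespace("GO.db", quietly = TRUE)) {
  out_dir <- tempfile("orgdb_")  # hypothetical output location
  dir.create(out_dir)
  AnnotationForge::makeOrgPackage(
    gene_info = gene_info,
    go = go,
    version = "0.1",
    maintainer = "Jane Doe <jane.doe@example.org>",  # placeholder
    author = "Jane Doe <jane.doe@example.org>",      # placeholder
    outputDir = out_dir,
    tax_id = "4472",  # assumed NCBI taxonomy ID for Lemna minor; verify before use
    genus = "Lemna",
    species = "minor",
    goTable = "go"    # name of the argument that holds the GO table
  )
}
```

With genus "Lemna" and species "minor", AnnotationForge derives the package name org.Lminor.eg.db, which matches the archive provided here.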
# 1. Download and unzip (7zip format) the org.Lminor.eg.db package.
# 2. Install the package via:
orgDb = "path/to/org.Lminor.eg.db/"
install.packages(orgDb, type="source", repos=NULL)
# 3. Restart the R session, then load the package
# (library() stops with an error if a package is missing, whereas require() only warns)
library(org.Lminor.eg.db)
library(AnnotationDbi)
# to check for columns and keytypes:
columns(org.Lminor.eg.db)
keytypes(org.Lminor.eg.db)
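# Query org.Lminor.eg.db for particular Lemna gene IDs (GID)
gid = keys(org.Lminor.eg.db, keytype="GID")
# Columns are picked by position; compare with columns(org.Lminor.eg.db) to see which names these indices select
col = columns(org.Lminor.eg.db)[c(5,17,9,15,1,8,14)]
# Call AnnotationDbi::select() explicitly, since dplyr::select() masks it when dplyr is attached
df = AnnotationDbi::select(org.Lminor.eg.db, keys=gid[1000:1100], columns=col, keytype="GID")
View(df)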
### Running overrepresentation analysis in clusterProfiler using the Lminor annotation package ###
# ORA for multiple gene sets via compareCluster()
library(clusterProfiler)
genLs = list(setA = gid[1:40],
             setB = gid[100:140],
             setC = gid[1000:1040])
res = compareCluster(genLs, fun = "enrichGO", OrgDb = "org.Lminor.eg.db",
                     keyType = "GID", ont = "BP", universe = gid)
# Compute semantic similarities among GO terms:
d = GOSemSim::godata('org.Lminor.eg.db', ont="BP", computeIC=FALSE, keytype = "GID")
res = enrichplot::pairwise_termsim(res, method = "Wang", semData = d)
# Remove GO terms with redundant biological information (similarity cutoff 0.8)
resS = clusterProfiler::simplify(res, cutoff = 0.8)
# Re-sort the simplified results by p-value
resS@compareClusterResult = resS@compareClusterResult[order(resS@compareClusterResult$pvalue),]
# Inspect the simplified, sorted results
View(resS@compareClusterResult)
# Network plot
enrichplot::emapplot(resS, showCategory = 30)
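The description also mentions GSEA, which the example above does not cover. The following is a minimal sketch of how a GSEA run with `clusterProfiler::gseGO()` might look with this package. The helper `make_ranked_list()` is a hypothetical name introduced here, and the ranking scores are simulated; in a real analysis they would be replaced by a statistic such as log2 fold changes from a differential expression test.

```r
# gseGO() expects a named numeric vector sorted in decreasing order.
# make_ranked_list() is a hypothetical helper that builds such a vector.
make_ranked_list <- function(ids, scores) {
  stopifnot(length(ids) == length(scores), !anyDuplicated(ids))
  ranked <- setNames(as.numeric(scores), ids)
  sort(ranked, decreasing = TRUE)
}

# Run only when the annotation package and clusterProfiler are installed
if (requireNamespace("org.Lminor.eg.db", quietly = TRUE) &&
    requireNamespace("clusterProfiler", quietly = TRUE)) {
  library(org.Lminor.eg.db)
  gid <- AnnotationDbi::keys(org.Lminor.eg.db, keytype = "GID")

  # Simulated scores stand in for real per-gene statistics (assumption)
  set.seed(1)
  geneList <- make_ranked_list(gid, rnorm(length(gid)))

  gsea <- clusterProfiler::gseGO(geneList = geneList,
                                 OrgDb = org.Lminor.eg.db,
                                 keyType = "GID",
                                 ont = "BP",
                                 pvalueCutoff = 0.05)
  print(head(as.data.frame(gsea)))
}
```

Because the scores here are random, the result table may well be empty; the point of the sketch is the input format and the `keyType = "GID"` setting.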
Files
(331.3 MB)
| MD5 checksum | Size |
|---|---|
| md5:b852b1f20908ca500062d94cb2bce5bf | 149.9 MB |
| md5:4fec9336827ce42de04d8f265440a9d5 | 159.4 MB |
| md5:1f559287b143f5376a677ca8129c2439 | 21.9 MB |
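Since an MD5 checksum is listed for each file, a download can be checked before unpacking. The sketch below uses `tools::md5sum()` from base R; `verify_md5()` is a hypothetical helper name, and the path in the usage comment is a placeholder. The listing here does not show which archive belongs to which checksum, so each file should be paired with the checksum shown next to it on the record page.

```r
# Compare the MD5 digest of a local file with an expected checksum.
# Accepts the checksum with or without the "md5:" prefix used in the listing.
verify_md5 <- function(path, expected) {
  expected <- tolower(sub("^md5:", "", expected))
  observed <- unname(tools::md5sum(path))  # NA if the file cannot be read
  !is.na(observed) && identical(observed, expected)
}

# Self-contained check: an empty file has a well-known MD5 digest
tmp <- tempfile()
file.create(tmp)
verify_md5(tmp, "md5:d41d8cd98f00b204e9800998ecf8427e")

# Usage with a downloaded archive (placeholder path and checksum):
# verify_md5("path/to/downloaded_archive.7z", "md5:<checksum listed for that file>")
```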