TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery
Description
TrendyGenes Literature Mining
This repository contains the files and code to build the TrendyGenes pipeline described in the paper "TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery" (Serrano Nájera et al. 2021).
Contents
The folder contains the following files:
- PubMed_*.csv.gz: CSV files containing PubMed metadata (titles, abstracts, etc.), split across multiple files
- CoCitations*.csv.gz: CSV files containing co-citation networks computed from PubMed
- MeSH2PMID.csv.gz: Map of MeSH terms to PMIDs
- Authorship_Neo4J_complete.csv.gz: Authorship information for PubMed papers
- Disease2PMID_Neo4J_complete.csv.gz: Map of disease terms to PMIDs after disambiguation
- Genes_Neo4J_complete_CCPU.csv.gz: Map of genes to PMIDs after disambiguation
- genes.csv.gz: List of human genes
- diseases.csv.gz: List of MeSH disease terms
- import_command*.txt: Commands to import data into Neo4j graph database
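Because several of these archives are multiple gigabytes, it can be useful to look at the first few rows of a file without decompressing it fully. The following is a minimal sketch, not part of the published pipeline. The helper name `peek_csv_gz` is hypothetical, and it assumes the files are comma-delimited, UTF-8 encoded, and start with a header row, none of which is stated in this description.

```python
import csv
import gzip
from itertools import islice


def peek_csv_gz(path, n_rows=5):
    """Return (header, rows) for the first n_rows data rows of a gzipped CSV.

    Assumptions (not confirmed by the dataset description): comma delimiter,
    UTF-8 encoding, and a header in the first row.
    """
    # gzip.open in text mode streams the file, so only the rows that are
    # actually read get decompressed.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = list(islice(reader, n_rows))
    return header, rows


# Example usage, assuming genes.csv.gz has been downloaded to the
# working directory:
# header, rows = peek_csv_gz("genes.csv.gz", n_rows=3)
# print(header)
# for row in rows:
#     print(row)
```

If the first row turns out not to be a header, the same call still works; the first record simply comes back in `header`.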
Building the Knowledge Graph
The CSV files can be imported into a Neo4j graph database to build the knowledge graph described in the paper, which links publications, authors, genes, and diseases.
The import_command*.txt files contain the Neo4j bulk import commands needed to load the data. For background on importing CSV files into Neo4j, see:
https://neo4j.com/developer/guide-import-csv/
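Before running the import commands, it may help to confirm that every file named in the Contents list is present in the download directory. The sketch below is an illustration only. The function name `find_missing` is hypothetical, the patterns are copied from the Contents list, and it assumes all files sit in a single flat directory.

```python
from pathlib import Path

# File name patterns copied from the Contents list of this record.
EXPECTED_PATTERNS = [
    "PubMed_*.csv.gz",
    "CoCitations*.csv.gz",
    "MeSH2PMID.csv.gz",
    "Authorship_Neo4J_complete.csv.gz",
    "Disease2PMID_Neo4J_complete.csv.gz",
    "Genes_Neo4J_complete_CCPU.csv.gz",
    "genes.csv.gz",
    "diseases.csv.gz",
    "import_command*.txt",
]


def find_missing(data_dir, patterns=EXPECTED_PATTERNS):
    """Return the patterns that match no file directly inside data_dir."""
    root = Path(data_dir)
    # A pattern counts as satisfied if at least one file matches it.
    return [pattern for pattern in patterns if not any(root.glob(pattern))]


# Example usage, assuming the files were downloaded to ./trendygenes_data
# (a hypothetical directory name):
# missing = find_missing("trendygenes_data")
# if missing:
#     print("No files found for:", ", ".join(missing))
```

This check only confirms that at least one file matches each pattern. It cannot tell whether a split series such as PubMed_*.csv.gz is complete.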
Citation
Serrano Nájera G, Narganes Carlón D, Crowther DJ. TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery. Scientific Reports. 2021 Aug 3;11(1):15747.
License
MIT
Files
(23.2 GB)
MD5 checksum | Size |
---|---|
d2064a154248c9a4b4341236d1c170cf | 74.6 MB |
8ae5ad0d87a26b2a1d560914c85778cb | 632.3 MB |
da60ea517f82630ad07ef849ddda11ca | 986.9 MB |
30fd9f8c9449001decfc808617977554 | 940.4 MB |
590d65ab29f7b93608aa553f5f132f1f | 2.7 GB |
1c927ef74a768a3b1658fb1c555ee9b1 | 2.9 GB |
ece18fb08778c00607d3c0af56995493 | 100.1 MB |
72cd2a6a077b28509a58153b4317be1f | 76.2 kB |
b8292200b7b5249276f8d18f1b665ad8 | 53.1 MB |
e9093b1a12caee020fed19fe46696881 | 152.5 kB |
cfeeffa2ef5d9249b25801cca694f4a1 | 2.0 kB |
cfeeffa2ef5d9249b25801cca694f4a1 | 2.0 kB |
810ca70b29b165c58695d1c010589bfe | 617.9 kB |
9e0785c18e5edca4b2125f4e83d47066 | 928.7 MB |
180077b1a6a3900462f2c83ee4021419 | 4.1 GB |
7ea73840d1aa2517857bb9f0fe2b2dfc | 3.4 GB |
fd4e63d477f4cb71c334fede561cece0 | 2.5 GB |
5601527393a83324d2405ce9d1387897 | 2.3 GB |
49234204ae6ba94a4ae19ab8f508c523 | 1.6 GB |
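The MD5 checksums listed above can be used to check that a large download completed without corruption. A minimal sketch using Python's standard `hashlib` module follows. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and the file is read in chunks so that multi-gigabyte archives do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hexadecimal MD5 digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value, with or without an 'md5:' prefix."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Example usage: pair a downloaded file with the checksum shown for it on
# the record page (the file name here is a placeholder, not a real mapping):
# ok = matches_checksum("downloaded_file.csv.gz", "md5:<checksum from the table>")
# print("checksum OK" if ok else "checksum mismatch, download again")
```

MD5 is used here only to detect transfer errors, which is what the published checksums are for. It is not suitable as a security guarantee.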