Published August 1, 2021 | Version 1.0
Dataset Open

TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery

  • 1. former University of Dundee

Description

TrendyGenes Literature Mining

This repository contains the files and code to build the TrendyGenes pipeline described in the paper "TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery" (Serrano Nájera et al. 2021).

Contents

The folder contains the following files:

  • PubMed_*.csv.gz: CSV files containing PubMed metadata (titles, abstracts, etc.), split across multiple files
  • CoCitations*.csv.gz: CSV files containing co-citation networks computed from PubMed
  • MeSH2PMID.csv.gz: Map of MeSH terms to PMIDs
  • Authorship_Neo4J_complete.csv.gz: Authorship information for PubMed papers
  • Disease2PMID_Neo4J_complete.csv.gz: Map of disease terms to PMIDs after disambiguation
  • Genes_Neo4J_complete_CCPU.csv.gz: Map of genes to PMIDs after disambiguation
  • genes.csv.gz: List of human genes
  • diseases.csv.gz: List of MeSH disease terms
  • import_command*.txt: Commands to import data into Neo4j graph database
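Because several of these files are multiple gigabytes in size, it can be useful to inspect their column layout before importing anything. The following is a minimal sketch, assuming the files are UTF-8 encoded and comma-delimited (neither is stated in this description). The helper name `peek_csv_gz` is hypothetical and not part of the TrendyGenes code.

```python
import csv
import gzip
from itertools import islice


def peek_csv_gz(path, n_rows=5):
    """Return the header row and the first n_rows data rows of a gzipped CSV.

    Assumes UTF-8 text and comma delimiters; adjust if a file differs.
    The file is streamed, so it is never fully decompressed into memory.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # empty list if the file has no rows
        rows = list(islice(reader, n_rows))
    return header, rows
```

For example, `peek_csv_gz("genes.csv.gz")` would return the column names of the gene list together with its first five rows.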

Building the Knowledge Graph

The CSV files can be imported into a Neo4j graph database to build the knowledge graph described in the paper, which contains publications, authors, genes, diseases, etc. and the connections between them.

The import_command*.txt files contain the Neo4j bulk import commands needed to load the data. For background on the syntax, see the Neo4j CSV import guide:
https://neo4j.com/developer/guide-import-csv/
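Before running the import commands, it may help to confirm that every file named in the Contents list has been downloaded. The sketch below checks a folder against those file-name patterns. The function name `missing_patterns` and the idea of a single download folder are assumptions for illustration; the actual paths expected by the import commands are defined in the import_command*.txt files themselves.

```python
from pathlib import Path

# File-name patterns copied from the Contents list of this record.
EXPECTED_PATTERNS = [
    "PubMed_*.csv.gz",
    "CoCitations*.csv.gz",
    "MeSH2PMID.csv.gz",
    "Authorship_Neo4J_complete.csv.gz",
    "Disease2PMID_Neo4J_complete.csv.gz",
    "Genes_Neo4J_complete_CCPU.csv.gz",
    "genes.csv.gz",
    "diseases.csv.gz",
    "import_command*.txt",
]


def missing_patterns(folder, patterns=EXPECTED_PATTERNS):
    """Return the patterns that match no file directly inside `folder`."""
    folder = Path(folder)
    return [p for p in patterns if not any(folder.glob(p))]
```

An empty list from `missing_patterns("path/to/download")` means at least one file was found for every pattern. It does not check that all parts of a multi-file set such as PubMed_*.csv.gz are present.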

Citation

Serrano Nájera G, Narganes Carlón D, Crowther DJ. TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery. Scientific Reports. 2021 Aug 3;11(1):15747.

License

MIT

Files (23.2 GB)

MD5 checksum                            Size
md5:d2064a154248c9a4b4341236d1c170cf    74.6 MB
md5:8ae5ad0d87a26b2a1d560914c85778cb    632.3 MB
md5:da60ea517f82630ad07ef849ddda11ca    986.9 MB
md5:30fd9f8c9449001decfc808617977554    940.4 MB
md5:590d65ab29f7b93608aa553f5f132f1f    2.7 GB
md5:1c927ef74a768a3b1658fb1c555ee9b1    2.9 GB
md5:ece18fb08778c00607d3c0af56995493    100.1 MB
md5:72cd2a6a077b28509a58153b4317be1f    76.2 kB
md5:b8292200b7b5249276f8d18f1b665ad8    53.1 MB
md5:e9093b1a12caee020fed19fe46696881    152.5 kB
md5:cfeeffa2ef5d9249b25801cca694f4a1    2.0 kB
md5:cfeeffa2ef5d9249b25801cca694f4a1    2.0 kB
md5:810ca70b29b165c58695d1c010589bfe    617.9 kB
md5:9e0785c18e5edca4b2125f4e83d47066    928.7 MB
md5:180077b1a6a3900462f2c83ee4021419    4.1 GB
md5:7ea73840d1aa2517857bb9f0fe2b2dfc    3.4 GB
md5:fd4e63d477f4cb71c334fede561cece0    2.5 GB
md5:5601527393a83324d2405ce9d1387897    2.3 GB
md5:49234204ae6ba94a4ae19ab8f508c523    1.6 GB
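
The MD5 checksums listed above can be used to confirm that a large download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module. The function names are hypothetical, and because this listing does not show which file each checksum belongs to, the pairing of file and checksum has to be taken from the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use constant, which matters for the multi-gigabyte files in this record.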

Additional details

References

  • Serrano Nájera G, Narganes Carlón D, Crowther DJ. TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery. Scientific Reports. 2021 Aug 3;11(1):15747.