00000nmm##2200000uu#4500 5375482 doi 10.5281/zenodo.5375482 oai:zenodo.org:5375482 CIViCmine Jake Lever Stanford University info:eu-repo/semantics/openAccess Creative Commons Zero v1.0 Universal https://creativecommons.org/publicdomain/zero/1.0/legalcode cc0-1.0 spdx

This describes the output files for the <a href="https://github.com/jakelever/civicmine">CIViCmine</a> project. These files are loaded directly by the <a href="http://bionlp.bcgsc.ca/civicmine/">CIViCmine viewer</a>. The code for this viewer is available in the CIViCmine GitHub repo if you want to run it independently. Each file is tab-delimited with a header, no comments and no quoting.

You likely want civicmine_collated.tsv if you just want the list of cancer biomarkers. If you want the supporting sentences, look at civicmine_sentences.tsv. You can use the matching_id column to connect the two files. If you want to dig further and are okay with a higher false positive rate, look at civicmine_unfiltered.tsv.

civicmine_collated.tsv: This contains the cancer biomarkers with the citation counts supporting them. It contains the normalized cancer and gene names along with IDs for HUGO, Entrez Gene and the Disease Ontology.

civicmine_sentences.tsv: This contains the supporting sentences for the cancer biomarkers in the collated file. Each row is a single supporting sentence for one cancer biomarker. This file contains information on the source publication (e.g. journal, publication date, etc.), the actual sentence and the cancer biomarker extracted.

civicmine_unfiltered.tsv: This is the raw output of the applyModelsToSentences.py script across all of PubMed, PubMed Central Open Access and PubMed Central Author Manuscript Collection. It contains every predicted relation with a prediction score above 0.5, so it may contain many false positives. Each row contains information on the publication (e.g. journal, publication date, etc.) along with the sentence and the specific cancer biomarker extracted (with HUGO, Entrez Gene and Disease Ontology IDs). This file is further processed to create the other two.

Zenodo 2021-09-02 info:eu-repo/semantics/other 20230301191748.0
17158393 md5:17ebdfa26214c0b3f62ea4ced4ce5b72 https://zenodo.org/records/5375482/files/civicmine_collated.tsv
264474722 md5:7b67790e66ef164d77f5b22d1533daf6 https://zenodo.org/records/5375482/files/civicmine_sentences.tsv
669456169 md5:8e0d36ae26f77348c6595dce5c9c32a0 https://zenodo.org/records/5375482/files/civicmine_unfiltered.tsv
open 10.5281/zenodo.1472826 isVersionOf doi
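
The description says the collated and sentence files can be connected through the matching_id column. A minimal sketch of that join using only the Python standard library might look like the following. It assumes the files have been downloaded to the working directory; the function names read_tsv and attach_sentences are hypothetical, and matching_id is the only column name taken from the description. The reader is configured for tab-delimited input with a header and no quoting, as the description states.

```python
import csv
from collections import defaultdict


def read_tsv(handle):
    """Read a tab-delimited file that has a header row and no quoting."""
    # QUOTE_NONE keeps any quote characters as literal text in the fields.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def attach_sentences(collated_rows, sentence_rows, key="matching_id"):
    """Pair each collated row with the sentence rows sharing its `key` value."""
    by_id = defaultdict(list)
    for row in sentence_rows:
        by_id[row[key]].append(row)
    # Collated rows without any matching sentence get an empty list.
    return [(row, by_id.get(row[key], [])) for row in collated_rows]


if __name__ == "__main__":
    # Assumed local paths; adjust to wherever the files were downloaded.
    with open("civicmine_collated.tsv", newline="", encoding="utf-8") as f:
        collated = read_tsv(f)
    with open("civicmine_sentences.tsv", newline="", encoding="utf-8") as f:
        sentences = read_tsv(f)

    pairs = attach_sentences(collated, sentences)
    biomarker, supporting = pairs[0]
    print(biomarker["matching_id"], len(supporting))
```

This sketch loads both files fully into memory. That is likely acceptable for the collated and sentence files, but the larger unfiltered file may be better processed row by row.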