Published January 18, 2024 | Version v1
Dataset Open

Two million URL resources and contexts extracted from full text (PubMed Central) and abstracts (PubMed) of biomedical articles

  • 1. Nanjing University

Description

We created a dataset of two million biomedical online resources (URLs) and their descriptive information, extracted from two biomedical literature repositories, PubMed and PubMed Central. The dataset helps biomedical researchers find a wide range of biomedical resources in one integrated place, thereby improving the reuse of biomedical resources and the reproducibility of biomedical research.

A prominent feature of this dataset is that it contains not only each URL but also descriptive information about the resource (which we call the resource context). The context describes how previous research has typically used the resource, what the resource does, and what type of resource it is.

To facilitate access to the resources, we are also building a retrieval system with which researchers can locate the resources they need. The system is still under development. In addition, we are preparing a research paper about this work. Interested researchers can contact me at zlahu@foxmail.com.

 

Data Records:

| Name | Data Type | Description |
| --- | --- | --- |
| pmid_or_pmcid | string | identifier of the PubMed (PM) article or PubMed Central (PMC) article |
| pm_or_pmc | string | ‘PM’ or ‘PMC’ |
| url | string | URL, usually starting with ‘http’ |
| url_context | string | sentence containing the URL |
| url_start_position | int | position of the URL in the sentence |
| url_domain | string | web domain of the URL |
| pub_year | int | publication year of the article |
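As a rough illustration of the record layout, the seven fields could be modelled as a typed record in Python. This is only a sketch: the class name `UrlRecord` and the helper `guess_domain` are hypothetical, and the rule of dropping a leading `www.` is an assumption inferred from the published examples (for instance, `http://www.southbayrestoration.org` is listed with the domain `southbayrestoration.org`), not a documented extraction rule.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class UrlRecord:
    """One row of the dataset (hypothetical class; field names follow the table)."""
    pmid_or_pmcid: str       # identifier of the PM or PMC article
    pm_or_pmc: str           # 'PM' or 'PMC'
    url: str                 # usually starts with 'http'
    url_context: str         # sentence containing the URL
    url_start_position: int  # position of the URL in the sentence
    url_domain: str          # web domain of the URL
    pub_year: int            # publication year of the article


def guess_domain(url: str) -> str:
    """Derive a domain from a URL.

    Assumption: the host name with a leading 'www.' removed, which matches
    the published examples but may differ from the authors' actual rule.
    """
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


record = UrlRecord(
    pmid_or_pmcid="5854262",
    pm_or_pmc="PMC",
    url="http://www.southbayrestoration.org",
    url_context="(sentence omitted in this sketch)",
    url_start_position=107,
    url_domain="southbayrestoration.org",
    pub_year=2018,
)
print(guess_domain(record.url) == record.url_domain)
```

A frozen dataclass is used here so that records behave like immutable rows; any other container, such as a dictionary, would serve equally well.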

 

Examples:

| pmid_or_pmcid | pm_or_pmc | url | url_context | url_start_position | url_domain | pub_year |
| --- | --- | --- | --- | --- | --- | --- |
| 5177603 | PMC | http://crispr.mit.edu | EZH2 gRNA (CCGCTTCTGCTGTGCCCTTATC) was designed usinghref:http://crispr.mit.edu id:intref0010(CTMRK). | 59 | crispr.mit.edu | 2016 |
| 5438617 | PMC | http://firebrowse.org | RNA sequencing data sets and clinical information of kidney PRCC patients were downloaded from the TCGA repository website (href:http://firebrowse.org/). | 130 | firebrowse.org | 2017 |
| 5854262 | PMC | http://www.southbayrestoration.org | Changes to the available prey assemblage over time, due to large-scale regional habitat restoration (href:http://www.southbayrestoration.org) or ecological shifts in the managed pond habitats, could influence tern foraging. | 107 | southbayrestoration.org | 2018 |
| 5854262 | PMC | http://www.southbayrestoration.org | Consequently, the changes we observed in relative fish abundance returned to Forster’s tern colonies over the course of our study could be a result of changes in prey selection or may be the result of changes in fish availability because of altered habitat from management associated with the South Bay Salt Pond Restoration Project (href:http://www.southbayrestoration.org). | 342 | southbayrestoration.org | 2018 |
| 5257025 | PMC | http://www.mediterranee-infection.com/article.php?laref=256&titre=urms-database | The MALDI-TOF MS spectrum of ‘Ndongobacter massiliensis’ strain Marseille-P3170is available online (href:http://www.mediterranee-infection.com/article.php?laref=256&titre=urms-database id:intref0010). | 110 | mediterranee-infection.com | 2016 |
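The description does not say how url_start_position is counted. In the example records shown here, the values appear consistent with a 1-based offset measured in UTF-8 bytes rather than characters (the curly quotation marks in the last example take three bytes each). The sketch below checks that reading against three of the examples. The function name `url_byte_position` is hypothetical, and the convention is an inference from these few rows, not a documented rule, so it should be confirmed against the full dataset before being relied on.

```python
def url_byte_position(context: str, url: str) -> int:
    """Return the 1-based UTF-8 byte offset of `url` in `context`, or -1 if absent."""
    idx = context.encode("utf-8").find(url.encode("utf-8"))
    return idx + 1 if idx >= 0 else -1


# (url_context, url, reported url_start_position), copied from the examples.
# The contexts are kept verbatim, including the missing space in "usinghref:".
EXAMPLES = [
    (
        "EZH2 gRNA (CCGCTTCTGCTGTGCCCTTATC) was designed using"
        "href:http://crispr.mit.edu id:intref0010(CTMRK).",
        "http://crispr.mit.edu",
        59,
    ),
    (
        "RNA sequencing data sets and clinical information of kidney PRCC "
        "patients were downloaded from the TCGA repository website "
        "(href:http://firebrowse.org/).",
        "http://firebrowse.org",
        130,
    ),
    (
        # \u2018 and \u2019 are the curly quotes that appear in the record.
        "The MALDI-TOF MS spectrum of \u2018Ndongobacter massiliensis\u2019 "
        "strain Marseille-P3170is available online "
        "(href:http://www.mediterranee-infection.com/article.php"
        "?laref=256&titre=urms-database id:intref0010).",
        "http://www.mediterranee-infection.com/article.php"
        "?laref=256&titre=urms-database",
        110,
    ),
]

for context, url, reported in EXAMPLES:
    computed = url_byte_position(context, url)
    status = "match" if computed == reported else "MISMATCH"
    print(f"{url}: reported={reported} computed={computed} {status}")
```

Counting characters instead of bytes would give the same result for the first two rows, which are pure ASCII, but a smaller value for the third, so anyone slicing url_context by this field may need to work on the encoded bytes.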

 

Files

Files (283.6 MB)

Size: 283.6 MB
Checksum: md5:d0f0ad6a6892d9d8920785e60dabcfa7
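The MD5 checksum listed for the file can be used to confirm that a download completed intact. The following is a minimal sketch using Python's standard `hashlib` module. The function names are hypothetical, and because the file name is not given here, the path in the usage note is a placeholder.

```python
import hashlib
from pathlib import Path

# Checksum as listed for the 283.6 MB file.
EXPECTED_MD5 = "d0f0ad6a6892d9d8920785e60dabcfa7"


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks so the whole file never sits in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("path/to/downloaded_file")` would return `True` only if the local copy matches the listed checksum; the path shown is a placeholder for wherever the file was saved.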

Additional details

Dates

Available
2024-01-18