Two million URL resources and contexts extracted from full text (PubMed Central) and abstracts (PubMed) of biomedical articles
Description
We recently created a dataset of two million biomedical online resources (URLs) and their descriptive information, extracted from two biomedical literature repositories, PubMed and PubMed Central. The dataset helps biomedical researchers find a broad range of biomedical resources in one integrated place, which supports the reuse of biomedical resources and the reproducibility of biomedical research.
A prominent feature of this dataset is that it contains not only each URL but also descriptive information about the resource (which we call the resource context): how previous research has typically used the resource, what functions it offers, and what type of resource it is.
To facilitate access to the resources, we are also building a retrieval system with which researchers can locate the resources they need. The system is currently under development. In addition, we are preparing a research paper about this work. Interested researchers can contact me at zlahu@foxmail.com.
Data Records:
Name | Data Type | Description |
---|---|---|
pmid_or_pmcid | string | identifier of the PubMed (PM) article or PubMed Central (PMC) article |
pm_or_pmc | string | ‘PM’ or ‘PMC’ |
url | string | URL, usually starting with ‘http’ |
url_context | string | sentence containing the URL |
url_start_position | int | position of the URL in the sentence |
url_domain | string | web domain of the URL |
pub_year | int | publication year of the article |
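The field list above can be mirrored as a typed record. The following is a minimal sketch, not the authors' loader: the names `UrlRecord` and `parse_record` are hypothetical, and it assumes each row arrives as a mapping from column name to string (for instance from `csv.DictReader`), since the on-disk format of the file is not stated here.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class UrlRecord:
    """One row of the dataset, typed according to the Data Records table."""
    pmid_or_pmcid: str
    pm_or_pmc: str
    url: str
    url_context: str
    url_start_position: int
    url_domain: str
    pub_year: int


def parse_record(row: Mapping[str, str]) -> UrlRecord:
    """Convert a mapping of raw strings into a UrlRecord (hypothetical helper)."""
    source = row["pm_or_pmc"].strip()
    if source not in ("PM", "PMC"):
        raise ValueError(f"unexpected pm_or_pmc value: {source!r}")
    return UrlRecord(
        pmid_or_pmcid=row["pmid_or_pmcid"].strip(),
        pm_or_pmc=source,
        url=row["url"].strip(),
        url_context=row["url_context"],
        url_start_position=int(row["url_start_position"]),
        url_domain=row["url_domain"].strip(),
        pub_year=int(row["pub_year"]),
    )
```

Keeping the identifier as a string, as the table specifies, avoids mixing PubMed and PubMed Central numbering; `pm_or_pmc` tells the two apart.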
Examples:
pmid_or_pmcid | pm_or_pmc | url | url_context | url_start_position | url_domain | pub_year |
---|---|---|---|---|---|---|
5177603 | PMC | http://crispr.mit.edu | EZH2 gRNA (CCGCTTCTGCTGTGCCCTTATC) was designed usinghref:http://crispr.mit.edu id:intref0010(CTMRK). | 59 | crispr.mit.edu | 2016 |
5438617 | PMC | http://firebrowse.org | RNA sequencing data sets and clinical information of kidney PRCC patients were downloaded from the TCGA repository website (href:http://firebrowse.org/). | 130 | firebrowse.org | 2017 |
5854262 | PMC | http://www.southbayrestoration.org | Changes to the available prey assemblage over time, due to large-scale regional habitat restoration (href:http://www.southbayrestoration.org) or ecological shifts in the managed pond habitats, could influence tern foraging. | 107 | southbayrestoration.org | 2018 |
5854262 | PMC | http://www.southbayrestoration.org | Consequently, the changes we observed in relative fish abundance returned to Forster’s tern colonies over the course of our study could be a result of changes in prey selection or may be the result of changes in fish availability because of altered habitat from management associated with the South Bay Salt Pond Restoration Project (href:http://www.southbayrestoration.org). | 342 | southbayrestoration.org | 2018 |
5257025 | PMC | http://www.mediterranee-infection.com/article.php?laref=256&titre=urms-database | The MALDI-TOF MS spectrum of ‘Ndongobacter massiliensis’ strain Marseille-P3170is available online (href:http://www.mediterranee-infection.com/article.php?laref=256&titre=urms-database id:intref0010). | 110 | mediterranee-infection.com | 2016 |
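In the examples, `url_domain` looks like the host name of `url` with a leading `www.` removed, while other subdomains such as `crispr.mit.edu` are kept. The sketch below reproduces that pattern for the rows shown. It is an assumption inferred from these five examples, not the documented derivation, and `url_domain_of` is a hypothetical name.

```python
from urllib.parse import urlsplit


def url_domain_of(url: str) -> str:
    """Return the host of a URL without a leading 'www.' (assumed rule)."""
    host = urlsplit(url).hostname or ""
    # Only the 'www.' prefix is dropped; other subdomains are preserved.
    return host[4:] if host.startswith("www.") else host


examples = [
    "http://crispr.mit.edu",
    "http://firebrowse.org",
    "http://www.southbayrestoration.org",
    "http://www.mediterranee-infection.com/article.php?laref=256&titre=urms-database",
]
for u in examples:
    print(u, "->", url_domain_of(u))
```

A function like this can be used to check the `url_domain` column against `url`, or to group records by domain when the column is not at hand.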
Files
Name | Size |
---|---|
md5:d0f0ad6a6892d9d8920785e60dabcfa7 | 283.6 MB |
Additional details
Dates
- Available: 2024-01-18