Text Mining Scholarly Publications using APIs
Description
Researchers often build custom datasets for their work rather than using entire corpora of scholarly publications. In this extended abstract, I describe my work on a pipeline that will make creating these custom datasets easy. The pipeline is intended to be reusable: given the Digital Object Identifier (DOI) of any scholarly paper, it extracts the full text when available, so that researchers can assemble their own datasets for analysis. It uses the Crossref, Elsevier, and Wiley text and data mining (TDM) APIs to navigate the licensing and other access issues involved in full-text extraction, letting researchers focus on their analysis.
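To make the DOI-to-full-text flow concrete, here is a minimal, hypothetical sketch in Python. It is not the code from the linked repository. It assumes the Crossref REST API `/works/{doi}` endpoint (whose `link` entries carry an `intended-application` field), the Elsevier Article Retrieval endpoint with an `X-ELS-APIKey` header, and the Wiley TDM endpoint with a `Wiley-TDM-Client-Token` header, as we understand the publishers' public documentation; check them before use. The function names and the environment variables `ELSEVIER_API_KEY` and `WILEY_TDM_TOKEN` are illustrative assumptions, and real downloads require valid credentials and institutional access.

```python
"""Hypothetical sketch: DOI -> Crossref metadata -> publisher TDM request."""
from __future__ import annotations

import json
import os
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"
# Publisher TDM endpoints (assumed from public docs; verify before use).
ELSEVIER_ARTICLE = "https://api.elsevier.com/content/article/doi/"
WILEY_TDM = "https://api.wiley.com/onlinelibrary/tdm/v1/articles/"


def fetch_crossref_record(doi: str) -> dict:
    """Return the 'message' object of the Crossref /works/{doi} response."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["message"]


def text_mining_links(record: dict) -> list[dict]:
    """Keep full-text links the publisher marks for text mining (or leaves unspecified)."""
    return [
        link
        for link in record.get("link", [])
        if link.get("intended-application") in ("text-mining", "unspecified")
    ]


def build_publisher_request(doi: str, publisher: str) -> urllib.request.Request | None:
    """Route a DOI to a publisher TDM endpoint based on Crossref's 'publisher' field."""
    name = publisher.lower()
    if "elsevier" in name:
        # Hypothetical env var holding an Elsevier API key.
        key = os.environ.get("ELSEVIER_API_KEY", "")
        return urllib.request.Request(
            ELSEVIER_ARTICLE + urllib.parse.quote(doi),
            headers={"X-ELS-APIKey": key, "Accept": "text/xml"},
        )
    if "wiley" in name:
        # Hypothetical env var holding a Wiley TDM client token.
        token = os.environ.get("WILEY_TDM_TOKEN", "")
        # Wiley expects the DOI URL-encoded, including the slash.
        return urllib.request.Request(
            WILEY_TDM + urllib.parse.quote(doi, safe=""),
            headers={"Wiley-TDM-Client-Token": token},
        )
    return None  # no publisher-specific route


def download_full_text(doi: str) -> bytes | None:
    """Fetch full text via a publisher API, else via a Crossref text-mining link."""
    record = fetch_crossref_record(doi)
    req = build_publisher_request(doi, record.get("publisher", ""))
    if req is None:
        links = text_mining_links(record)
        if not links:
            return None  # no full text advertised for this DOI
        req = urllib.request.Request(links[0]["URL"])
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read()
```

Usage would be something like `download_full_text("10.1002/...")` for each DOI in a researcher's list, saving the returned bytes (XML or PDF, depending on the route) for later parsing. Keeping the routing and link filtering as pure functions means they can be checked without network access; only `fetch_crossref_record` and `download_full_text` touch the network.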
Files
Name | Size | Checksum
---|---|---
ASIST_METSTI2023_poster_Sarraf_et_al.pdf | 343.3 kB | md5:f5dfbbbb2887f22d4ff54fe516bfea18
Additional details
Related works
- Is version of: Presentation, 2142/120049 (Handle)
Software
- Repository URL: https://github.com/infoqualitylab/text-mining-scholarly-API
- Programming language: Python