Dataset of first appearances of the scholarly bibliographic references on English Wikipedia articles as of 1 March 2017 and as of 1 October 2021
Description
Abstract
We developed a methodology to detect the oldest scholarly reference added to a Wikipedia article by which a given paper is uniquely identifiable; we call this the "first appearance of the scholarly reference." We identified the first appearances of 923,894 scholarly references (611,119 unique DOIs) in 180,795 unique pages on English Wikipedia as of March 1, 2017, and stored them in the dataset. Moreover, we assessed the precision of the dataset and found it to be high regardless of the research field. This version provides not only the dataset for English Wikipedia as of March 1, 2017, but also a dataset for English Wikipedia as of October 1, 2021, generated using the same methodology.
Data Records
The dataset is in JSON Lines format, where each line is a single record. Each record holds the first appearance of a scholarly reference added to a Wikipedia article. If multiple references on the same page correspond to the same paper, only the oldest one is collected. A record has the following fields, shown with sample values.
- doi -- DOI corresponding to the paper (String), e.g., "10.1006/anbe.1996.0497"
- paper_type -- Document type of the paper (String), e.g., "journal-article"
- paper_container_title -- Journal title, book title, or proceedings title (Array of String), e.g., ["Animal Behaviour"]
- paper_publisher -- Publisher name (String), e.g., "Elsevier BV"
- paper_title -- Paper title (Array of String), e.g., ["Push or pull: an experimental study on imitation in marmosets"]
- paper_published_year -- Published year (String), e.g., "1997"
- paper_issue -- Issue number (String), e.g., "4"
- paper_volume -- Volume number (String), e.g., "54"
- paper_page -- Page numbers (String), e.g., "817-831"
- paper_author -- Author information consisting of given and family names, sequence (position in the author list), and affiliations (Array of JSON), e.g., [{"given":"THOMAS", "family":"BUGNYAR", "sequence":"first", "affiliation":[]}, {"given":"LUDWIG", "family":"HUBER", "sequence":"additional", "affiliation":[]}]
- issn -- ISSN related to the paper (Array of String), e.g., ["0003-3472"]
- research_field -- Research fields from ESI categories (Array of String), e.g., ["PLANT & ANIMAL SCIENCE"]
- page_id -- Page id (String), e.g., "577858"
- page_title -- Page title (String), e.g., "Imitation"
- revision_id -- Revision id (String), e.g., "203309031"
- revision_timestamp -- Revision timestamp (String), e.g., "2008-04-04 15:54:09 UTC"
- revision_comment -- Revision comment (edit summary) (String), e.g., "/* Animal Behaviour */"
- editor_name -- Wikipedia editor's name (String), e.g., "Nicemr"
- editor_type -- Type of the editor (String), e.g., "User"
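The field list above can be exercised with a short reader. The following is a minimal sketch in Python, assuming each line is a standalone JSON object with the fields described above and that `revision_timestamp` always follows the "YYYY-MM-DD HH:MM:SS UTC" pattern of the sample value. The function names and the `revision_datetime` key are hypothetical, and the sample record is assembled from the example values in the list rather than copied from the dataset files.

```python
import io
import json
from collections import Counter
from datetime import datetime, timezone


def parse_record(line):
    """Parse one JSON Lines record and add a timezone-aware revision datetime."""
    record = json.loads(line)
    # Assumption: timestamps look like "2008-04-04 15:54:09 UTC";
    # the trailing "UTC" is matched as a literal string.
    parsed = datetime.strptime(record["revision_timestamp"], "%Y-%m-%d %H:%M:%S UTC")
    record["revision_datetime"] = parsed.replace(tzinfo=timezone.utc)
    return record


def iter_records(lines):
    """Yield parsed records from any iterable of lines, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield parse_record(line)


# Sample record built from the example values in the field list (subset of fields).
sample_line = json.dumps({
    "doi": "10.1006/anbe.1996.0497",
    "paper_type": "journal-article",
    "paper_container_title": ["Animal Behaviour"],
    "paper_published_year": "1997",
    "research_field": ["PLANT & ANIMAL SCIENCE"],
    "page_id": "577858",
    "page_title": "Imitation",
    "revision_id": "203309031",
    "revision_timestamp": "2008-04-04 15:54:09 UTC",
    "editor_name": "Nicemr",
    "editor_type": "User",
})

records = list(iter_records(io.StringIO(sample_line + "\n")))
first = records[0]

# Count first appearances per calendar year of the revision.
appearances_per_year = Counter(r["revision_datetime"].year for r in records)

print(first["doi"], first["page_title"], first["revision_datetime"].isoformat())
```

To read a downloaded file instead of the in-memory sample, pass an open file handle to `iter_records`, for example `with open(path, encoding="utf-8") as handle:`. Reading line by line keeps memory use low for files of this size.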
References
- Kikkawa, J., Takaku, M. & Yoshikane, F. "Dataset of first appearances of the scholarly bibliographic references on Wikipedia articles", Scientific Data, Vol. 9, Article number 85, pp. 1-11, 2022. https://doi.org/10.1038/s41597-022-01190-z.
Funding
- JSPS KAKENHI Grant Number JP20K12543
- JSPS KAKENHI Grant Number JP21K21303
Files
(447.3 MB)
MD5 checksum | Size |
---|---|
2bf0954978b0a74f839e3f4b069aa31f | 144.4 MB |
db6cd9be26979bb615a4ce95830f0336 | 302.9 MB |
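The MD5 checksums listed for the files can be used to confirm that a download is intact. The sketch below uses Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and because the file names are not shown here, the demonstration runs on a small temporary file rather than on the dataset itself.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 digest with an expected hex string (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Demonstration on a temporary file standing in for a downloaded dataset file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

demo_ok = matches_checksum(demo_path, hashlib.md5(b"hello").hexdigest())
os.remove(demo_path)
print(demo_ok)
```

For a real download, call `matches_checksum` with the local path of the file and the checksum listed next to its size, for example `2bf0954978b0a74f839e3f4b069aa31f` for the 144.4 MB file.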