Published October 25, 2024 | Version 5.3.0
Dataset Open

CSET scholarly literature metadata over OpenAlex works

  • 1. Center for Security and Emerging Technology

Description

This dataset contains metadata developed at the Center for Security and Emerging Technology (CSET) that augments OpenAlex works. It currently includes the outputs of AI, Computer Vision, Robotics, Natural Language Processing, and Cybersecurity classifiers for English-language OpenAlex works published after 2009, along with language IDs at the title and abstract level. For works with a positive prediction for AI, Computer Vision, Robotics, or Natural Language Processing, predictions from an AI Safety classifier are also available.

The attached zip file contains a set of JSONL files that make up the dataset. Each row conforms to this schema, with null values omitted, so a field that is absent from a row should be read as null. The dataset is a work in progress, and full documentation will be made available at a later date.
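As a rough illustration of how the archive could be read, the sketch below streams rows out of the zip without unpacking it to disk. It assumes the member files end in `.jsonl` and are UTF-8 encoded, neither of which is confirmed here; `iter_rows` and the `suffix` parameter are hypothetical names introduced for this example, and no field names from the schema are assumed.

```python
import io
import json
import zipfile
from typing import Dict, Iterator


def iter_rows(zip_source, suffix: str = ".jsonl") -> Iterator[Dict]:
    """Yield one dict per non-empty line of every archive member ending in `suffix`.

    `zip_source` may be a path or a binary file-like object.
    The `.jsonl` suffix is an assumption about how members are named.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.endswith(suffix):
                continue  # skip directories and any non-JSONL members
            with archive.open(name) as raw:
                # Decode lazily so a large member is never fully held in memory.
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    line = line.strip()
                    if line:
                        yield json.loads(line)
```

Because null values are omitted from each row, looking fields up with `row.get(key)` rather than `row[key]` avoids a `KeyError` on rows where a value is missing. A call such as `for row in iter_rows("cset_openalex.zip"): ...` would work if the archive sits in the current directory.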

Files

cset_openalex.zip (1.9 GB)
md5:b4db5b0ac001db561bdaf204a21cb703
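
The md5 value listed for the file can be used to check that a download completed intact. The following is a minimal sketch, assuming the archive was saved locally as `cset_openalex.zip`; the helper name `md5_of_file` and the chunk size are choices made for this example.

```python
import hashlib

# Checksum as listed for cset_openalex.zip on this record.
EXPECTED_MD5 = "b4db5b0ac001db561bdaf204a21cb703"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing `md5_of_file("cset_openalex.zip")` with `EXPECTED_MD5` should give `True` for an uncorrupted copy of this version of the file. Reading in chunks keeps memory use flat for a file of this size.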

Additional details

Related works

Documents
10.51593/20220030 (DOI)
10.48550/arXiv.2002.07143 (DOI)
Is referenced by
Publication: 10.48550/arXiv.2403.09097 (DOI)

Software

Repository URL
https://github.com/georgetown-cset/cset_openalex
Programming language
Python
Development Status
Active