There is a newer version of the record available.

Published July 12, 2023 | Version 0.1.3
Dataset Open

A collection of text embeddings of the arXiv corpus by title and abstract

  • 1. Yale School of Medicine

Description

arXiv, a popular online preprint repository, hosts numerous preprints across many scientific domains. Beyond disseminating up-to-date knowledge in those domains, arXiv is an interesting complex system in its own right from a text-analytics point of view. In this repository, we provide a collection of text embeddings for (almost) all papers in the arXiv corpus, computed from their titles and abstracts, in order to capture multi-faceted characteristics of scientific knowledge.

Files (107.1 GB)

Checksum                              Size
md5:97bf660f4e5f72045c7798b21e275d8e  3.6 GB
md5:bece0f637eaece168452be4d81d013f2  3.6 GB
md5:ff651a05e4ef1a146d5b8a9126b6c866  7.2 GB
md5:be1790829c05dd618cc68567d4cebc53  7.1 GB
md5:321a5e37a328c904dabc58f8949a2cc0  7.1 GB
md5:e5aaf979013a019be3d325b4bd46a056  7.1 GB
md5:a4dfb0d29ed5ec2e4d037934969e93a7  7.1 GB
md5:7303dbc26997936c85e629ad4f76feee  3.6 GB
md5:892e37a45ca6a092a43cfdc5e7e3429b  3.6 GB
md5:46736b067fe193015adc2d510f0ccf38  7.2 GB
md5:26c8db4b80d77d2518d73ccaf7bae8b5  7.1 GB
md5:1b839d69b3048cc2e9ffc757ee9fdf32  7.1 GB
md5:23718e27c1951f9bb95a2889e62e66e7  7.1 GB
md5:0b8d84c5f63cc29803ac5ad442718952  7.1 GB
md5:e494d127ae927a182af329d28a92a49d  7.1 GB
md5:3228794033537753aff21853dee53151  7.1 GB
md5:f0dd319092ac3231fbcd54e54c26513f  7.1 GB
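
Because each file is several gigabytes, it can be worth checking a download against its listed MD5 checksum before use. The following is a minimal sketch of one way to do that in Python using only the standard library. The function names (md5_of_file, verify_download) and the file name in the usage comment are hypothetical and chosen for illustration; the file names of this record are not shown in the listing above.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks.

    Chunked reading keeps memory use flat even for multi-gigabyte files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected):
    """Compare a file's MD5 digest with a checksum from the listing.

    `expected` may be given with or without the "md5:" prefix used
    in the file listing.
    """
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Usage (hypothetical local file name, checksum taken from the listing):
# ok = verify_download("embeddings_part.bin",
#                      "md5:97bf660f4e5f72045c7798b21e275d8e")
```

A result of False suggests a truncated or corrupted download, or that the checksum was paired with the wrong file, in which case the file should be fetched again.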