Published March 8, 2026 | Version v1 | Software | Open
Extraction of Abstracts from arXiv TEI XML Files
Description
This repository contains code to extract abstracts from TEI XML files of arXiv papers.
The project includes:
- Python script for parsing XML
- Dockerfile for reproducible execution
- Sample dataset
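
As an illustration of the kind of parsing involved, here is a minimal sketch of abstract extraction using only the Python standard library. It is not the script shipped in the archive. It assumes the TEI files follow the common layout in which the abstract sits under `teiHeader/profileDesc/abstract` in the `http://www.tei-c.org/ns/1.0` namespace. The function name `extract_abstract` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; assumed to be the one used by the input files.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def extract_abstract(tei_xml: str) -> str:
    """Return the abstract text of a TEI document, or "" if none is found."""
    root = ET.fromstring(tei_xml)
    # Assumed location of the abstract inside the TEI header.
    abstract = root.find(".//tei:profileDesc/tei:abstract", TEI_NS)
    if abstract is None:
        return ""
    # Collect all nested text fragments and collapse runs of whitespace.
    return " ".join(" ".join(abstract.itertext()).split())


# Hypothetical sample document, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc>
      <abstract>
        <div>
          <p>We study   neural parsing.</p>
          <p>Results improve.</p>
        </div>
      </abstract>
    </profileDesc>
  </teiHeader>
</TEI>"""

if __name__ == "__main__":
    print(extract_abstract(SAMPLE))
```

Joining the text fragments with a space keeps paragraphs from running together, at the cost of possibly inserting a space around inline markup inside a word. Files without an abstract element yield an empty string rather than raising an error.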
Files
| Name | Size |
|---|---|
| text-extraction-analysis-grobid.zip (md5:efe1dd9f7bd880f04d9cb4e5c3748804) | 61.2 MB |