Published June 4, 2026 | Version v1.5
Software
Open
geonlp-pipeline-paper-2026: A Reproducible Pipeline for Geoscientific Text Mining
Authors/Creators
- 1. University of Saskatchewan, Department of Geological Sciences
Description
Production pipeline source code, database schema, migrations, and Kubernetes deployment manifests accompanying Heasman and Eglington (2026), a methodology paper describing a reproducible Python and PostgreSQL pipeline for assembling domain-specific text corpora from the xDD Snippet API. Includes pre-flight hit checking, in-stream Counter pruning for memory-bounded streaming, pool-segregated parallel workers, and in-database information-theoretic statistics.
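As an illustration of what in-stream Counter pruning for memory-bounded streaming can look like, here is a minimal Python sketch. It is not taken from the repository: the function name `count_with_pruning` and the parameters `max_entries` and `min_count` are hypothetical, and the pruning rule (drop rare entries once the table grows past a cap) is an assumption about the general technique rather than the authors' exact method.

```python
from collections import Counter
from typing import Iterable


def count_with_pruning(
    tokens: Iterable[str],
    max_entries: int = 100_000,  # hypothetical cap on distinct keys held in memory
    min_count: int = 2,          # hypothetical threshold: rarer entries are dropped
) -> Counter:
    """Count tokens from a stream while keeping the Counter bounded in size.

    Counts for pruned tokens are lost, so the totals returned for rare
    tokens are lower bounds rather than exact frequencies.
    """
    counts: Counter = Counter()
    for token in tokens:
        counts[token] += 1
        if len(counts) > max_entries:
            # First pass: drop every entry seen fewer than min_count times.
            for key in [k for k, c in counts.items() if c < min_count]:
                del counts[key]
            # Fallback: if the table is still too large, keep only the
            # max_entries most frequent tokens.
            if len(counts) > max_entries:
                counts = Counter(dict(counts.most_common(max_entries)))
    return counts


if __name__ == "__main__":
    stream = ["zircon", "zircon", "zircon", "basalt", "gneiss", "zircon", "schist"]
    print(count_with_pruning(stream, max_entries=2, min_count=2))
```

The trade-off in this sketch is that memory stays bounded by `max_entries` at the cost of undercounting tokens that were pruned and later reappear, which is usually acceptable when only frequent terms matter downstream.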
Files
| Name | Size | Checksum |
|---|---|---|
| DHeasmanGDS/geonlp-pipeline-paper-2026-v1.5.zip | 85.0 kB | md5:19421ed437acc6d7604226c6e4787075 |
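The MD5 digest listed for the archive can be used to confirm that a download is intact. The following is a small sketch using only the Python standard library; the helper name `md5_of_file` and the local file path in the usage note are assumptions, not part of the record.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Assuming the archive was saved locally as `geonlp-pipeline-paper-2026-v1.5.zip`, comparing `md5_of_file("geonlp-pipeline-paper-2026-v1.5.zip")` against `"19421ed437acc6d7604226c6e4787075"` should evaluate to `True` for an uncorrupted copy.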
Additional details
Related works
- Is supplement to
- Software: https://github.com/DHeasmanGDS/geonlp-pipeline-paper-2026/tree/v1.5 (URL)
Software
- Repository URL
- https://github.com/DHeasmanGDS/geonlp-pipeline-paper-2026