Published June 4, 2026 | Version v1.5

geonlp-pipeline-paper-2026: A Reproducible Pipeline for Geoscientific Text Mining

1. University of Saskatchewan, Department of Geological Sciences

Description

Production pipeline source code, database schema, migrations, and Kubernetes deployment manifests accompanying Heasman and Eglington (2026), a methodology paper describing a reproducible Python and PostgreSQL pipeline for assembling domain-specific text corpora from the xDD Snippet API. Includes pre-flight hit checking, in-stream Counter pruning for memory-bounded streaming, pool-segregated parallel workers, and in-database information-theoretic statistics.
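The description names in-stream Counter pruning for memory-bounded streaming but does not state the pruning rule. The sketch below is a hedged illustration only. It assumes a Python `collections.Counter` is pruned back to its most frequent terms whenever the number of distinct keys exceeds a cap. The names `stream_counts` and `prune` and the parameters `max_terms` and `keep` are hypothetical, and lowercase whitespace tokenization stands in for whatever tokenizer the pipeline actually uses.

```python
from collections import Counter
from typing import Iterable


def prune(counts: Counter, keep: int) -> Counter:
    """Return a new Counter holding only the `keep` most frequent terms."""
    # most_common() yields (term, count) pairs; wrap in dict so Counter
    # treats them as mappings rather than counting the tuples themselves.
    return Counter(dict(counts.most_common(keep)))


def stream_counts(
    snippets: Iterable[str],
    max_terms: int = 100_000,
    keep: int = 50_000,
) -> Counter:
    """Count tokens across a stream of snippets with bounded memory.

    Hypothetical pruning rule: whenever the Counter holds more than
    `max_terms` distinct keys, it is cut back to the `keep` most frequent.
    Counts for pruned terms are discarded, so every reported count is a
    lower bound on the true count and rare terms may be missing entirely.
    """
    if keep > max_terms:
        raise ValueError("keep must not exceed max_terms")
    counts: Counter = Counter()
    for snippet in snippets:
        # Illustrative tokenizer only: lowercase and split on whitespace.
        counts.update(snippet.lower().split())
        if len(counts) > max_terms:
            counts = prune(counts, keep)
    return counts
```

For example, `stream_counts(["granite gneiss granite", "basalt granite", "gneiss schist"], max_terms=3, keep=2)` returns `Counter({'granite': 3, 'gneiss': 2})`: the third snippet pushes the Counter to four distinct terms, which triggers a prune back to the two most frequent. Pruning to a fixed top-k keeps peak memory proportional to `max_terms` regardless of corpus size, at the cost of approximate counts for infrequent terms.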

Notes

If you use this software, please cite it using the metadata from this file.

Files

DHeasmanGDS/geonlp-pipeline-paper-2026-v1.5.zip (85.0 kB)
md5:19421ed437acc6d7604226c6e4787075

Additional details

Related works