Published February 17, 2026 | Version v1
Publication
Open
Data Mining and Text Analysis of Spanish Declassified UFO Military Records (1962–1995): A Preprint Summary
Authors/Creators
Description
This preprint describes a fully reproducible computational pipeline applied to the complete corpus of UFO sighting records declassified by the Spanish Air Force (1962–1995) and published through the Biblioteca Virtual de Defensa (BVMDefensa). The work includes automated scraping, dual OCR (Apple Vision and olmOCR), corpus fusion, relational database construction (SQLite), structured field extraction, semantic embeddings, clustering (UMAP + HDBSCAN), knowledge graph construction, and a retrieval-augmented generation (RAG) system validated on historical ground-truth cases. The resulting database (78 canonical cases, 2,135 OCR pages, 6,460 indexed text chunks) is intended as an open, auditable computational resource for research on declassified unidentified aerial phenomena (UAP) records in Spain. This work serves as a benchmark prior to scaling the methodology to larger international databases (NUFORC, UFOCAT, GEIPAN).
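The relational layer described above (cases, OCR pages, indexed text chunks) can be pictured with a minimal SQLite sketch. This is an illustration under stated assumptions, not the schema used in the preprint: the table names, column names, chunk size, and overlap are all hypothetical, and the inserted row contains placeholder text rather than corpus data.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not taken from the preprint.
SCHEMA = """
CREATE TABLE cases (
    case_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL
);
CREATE TABLE pages (
    page_id    INTEGER PRIMARY KEY,
    case_id    INTEGER NOT NULL REFERENCES cases(case_id),
    page_no    INTEGER NOT NULL,
    ocr_engine TEXT NOT NULL,
    text       TEXT NOT NULL
);
CREATE TABLE chunks (
    chunk_id INTEGER PRIMARY KEY,
    page_id  INTEGER NOT NULL REFERENCES pages(page_id),
    position INTEGER NOT NULL,
    text     TEXT NOT NULL
);
"""


def chunk_text(text, size=200, overlap=50):
    """Split text into fixed-size character windows that overlap.

    The size and overlap defaults are arbitrary assumptions.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def build_demo_db():
    """Create an in-memory database holding one placeholder case and page."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)

    cur = conn.execute("INSERT INTO cases (title) VALUES (?)", ("Placeholder case",))
    case_id = cur.lastrowid

    page_text = "placeholder OCR text " * 25  # stand-in for real OCR output
    cur = conn.execute(
        "INSERT INTO pages (case_id, page_no, ocr_engine, text) VALUES (?, ?, ?, ?)",
        (case_id, 1, "olmOCR", page_text),
    )
    page_id = cur.lastrowid

    # Store each chunk with its position so page order can be reconstructed.
    conn.executemany(
        "INSERT INTO chunks (page_id, position, text) VALUES (?, ?, ?)",
        [(page_id, pos, chunk) for pos, chunk in enumerate(chunk_text(page_text))],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    print(db.execute("SELECT COUNT(*) FROM chunks").fetchone()[0])
```

Keeping chunks linked to pages, and pages to cases, means any retrieved passage can be traced back to its source page, which is one plausible way to support the auditability the description emphasizes.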
Files
| Name | Size | MD5 |
|---|---|---|
| Data_Mining_Spanish _Declassified _UFO_Military_Records.pdf | 485.1 kB | f53ee39ca8c78fbc9824d21a26455a37 |