Published September 5, 2024 | Version 0.2.0
Software Open

OWS.eu Indexer

  • 1. Radboud University

Description

This package contains the OWS.eu indexer, which is built with Apache Spark. The indexer reads input from Parquet, JSONL, or WARC files and indexes it into the Common Index File Format (CIFF). The index can be partitioned into several chunks, depending on the metadata provided in the input data.
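
To make the idea of a metadata-partitioned index concrete, here is a minimal sketch in plain Scala. It is not the package's code or API: plain collections stand in for Spark, in-memory maps stand in for CIFF output, and every name (`Doc`, `Posting`, `partitionKey`, `PartitionedIndexSketch`) is hypothetical. It also assumes the partitioning metadata is a single string field per document, such as a language code, which the description does not specify.

```scala
// Hypothetical sketch: none of these names come from the OWS.eu indexer.
// Plain Scala collections stand in for Spark; nothing is written as CIFF.
final case class Doc(id: Int, partitionKey: String, text: String)
final case class Posting(docId: Int, termFrequency: Int)

object PartitionedIndexSketch {
  // Lowercase the text and split on runs of non-alphanumeric characters.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^a-z0-9]+").toSeq.filter(_.nonEmpty)

  // Build term -> postings (sorted by document id) for one chunk of documents.
  def buildIndex(docs: Seq[Doc]): Map[String, Seq[Posting]] =
    docs
      .flatMap { doc =>
        tokenize(doc.text)
          .groupBy(identity)
          .map { case (term, occurrences) => term -> Posting(doc.id, occurrences.size) }
      }
      .groupBy(_._1)
      .map { case (term, pairs) => term -> pairs.map(_._2).sortBy(_.docId) }

  // Build one independent index per distinct value of the metadata field.
  def buildPartitioned(docs: Seq[Doc]): Map[String, Map[String, Seq[Posting]]] =
    docs.groupBy(_.partitionKey).map { case (key, chunk) => key -> buildIndex(chunk) }

  def main(args: Array[String]): Unit = {
    val docs = Seq(
      Doc(1, "en", "open web search"),
      Doc(2, "en", "web index"),
      Doc(3, "nl", "open zoekmachine")
    )
    // Print each partition with the number of distinct terms it holds.
    buildPartitioned(docs).toSeq.sortBy(_._1).foreach { case (key, index) =>
      println(s"partition=$key terms=${index.size}")
    }
  }
}
```

Each partition is built from its own documents only, so a term that appears solely in one chunk has no postings in the others. In a Spark-based setting the same grouping would typically be expressed over a distributed dataset rather than an in-memory `Seq`.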

Files

owi-indexer-v2.zip (29.3 MB)
md5:71c427230639bd3ada7505390fed7bf8

Additional details

Funding

OpenWebSearch.EU – Piloting a Cooperative Open Web Search Infrastructure to Support Europe's Digital Sovereignty 101070014
European Commission

Software

Repository URL
https://opencode.it4i.eu/openwebsearcheu-public/spark-indexer
Programming language
Scala