Large Oil and Gas industry text dataset from Norwegian, UK and Dutch public oil and gas documents

FORCE NETWORK Group

doi:10.5281/zenodo.10775273

Published March 3, 2024 | Version v1

Dataset Open

FORCE NETWORK Group (Data collector)

This is a large dataset of text extracted from public oil and gas documents. It was prepared in the run-up to the FORCE 2023 Large Language Model Hackathon in Stavanger, Norway.

The dataset is unique in that it contains the largest public collection of text extracted from OCR'ed oil and gas documents currently available. It was created with the aim of embedding more of the knowledge held in oil and gas documents into language models.
Additionally, each extracted page has been classified as either real text or mostly gibberish.
Personally identifiable information has been removed as far as possible.
A file with 1500 hand-classified pages is included in the upload for further training of text classifiers.

Files

Files (6.1 GB)

Name	MD5	Size
annotated text categories.xls	d5a40381aaee02acde189520efe8e151	3.6 MB
Netherlands - Netherlands Oil & Gas Portal reports.csv	efc83a69340cb57c692804c5d57f8b24	166.3 MB
Norway - Diskos reports.csv	9d76003330ff13f869b5b02925d5ca2f	5.7 GB
Norway - Norwegian Petroleum Directorate relinquishment reports.csv	f6ca32637b9117b2f18b5578c02b55be	24.7 MB
UK - North Sea Transition Authority NDR reports.csv	00b02f526545961934b23a085d8a0417	198.7 MB
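
Because the largest file is several gigabytes, it can be worth checking a download against the MD5 digests listed above before using it. The following is a minimal sketch, not part of the original record: the helper names `md5_of_file` and `verify_downloads` are hypothetical, and it assumes the files were saved under their listed names in a single local directory. Files are hashed in chunks so that the 5.7 GB CSV never has to fit in memory.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing of this record.
EXPECTED_MD5 = {
    "annotated text categories.xls": "d5a40381aaee02acde189520efe8e151",
    "Netherlands - Netherlands Oil & Gas Portal reports.csv": "efc83a69340cb57c692804c5d57f8b24",
    "Norway - Diskos reports.csv": "9d76003330ff13f869b5b02925d5ca2f",
    "Norway - Norwegian Petroleum Directorate relinquishment reports.csv": "f6ca32637b9117b2f18b5578c02b55be",
    "UK - North Sea Transition Authority NDR reports.csv": "00b02f526545961934b23a085d8a0417",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Map each listed file found in `directory` to True/False (digest matches).

    Files that are not present are skipped rather than reported as failures.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = md5_of_file(path) == expected
    return results
```

Calling `verify_downloads("path/to/downloads")` (the directory is a placeholder) returns a dictionary such as `{"Norway - Diskos reports.csv": True}` for each file it finds; a `False` entry suggests a truncated or corrupted download.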

Additional details

Available: 2024-03-03

	All versions	This version
Views	1,096	1,096
Downloads	1,149	1,149
Data volume	3.5 TB	3.5 TB
