Intuitive datasets: five-level data abstraction transformations

Sarazin, Arthur; Mourey, Mathis

doi:10.5281/zenodo.18174814

Published January 7, 2026 | Version v1

Resource type: Dataset | Access: Open

Affiliation: Université Grenoble Alpes

Overview

This dataset collection demonstrates systematic transformations of open datasets across five levels of abstraction: a descent from L4 to L0 followed by an ascent back to L3 (L4→L0→L3). The goal is to let users with different levels of data literacy access and understand complex data. The transformations implement a meta-design framework for creating "intuitive datasets" that adapt their complexity to user needs.

Citation: If you use these datasets, please cite doi:10.5281/zenodo.18174814.

License: CC-BY-4.0 (Creative Commons Attribution 4.0 International)

Related code: https://github.com/ArthurSrz/intuitiveness

Datasets Included

This collection contains three complete dataset transformation cycles from the French open data platform (data.gouv.fr):

test0_schools: French middle school performance indicators and student enrollment data
test1_ademe: ADEME (French environmental agency) funding allocations
test2_energy: Energy price data for gas tariffs in France

Each dataset includes:

raw/: original L4 files (unlinkable multi-level datasets from data.gouv.fr)
descent/: files transformed at each descent level: L3 (linkable datasets), L2 (categorized table), L1 (feature vector), and L0 (atomic datum)
ascent/: reconstructed datasets from L0 back to L3 with added analytic dimensions
metadata/: transformation metadata, session exports, and join specifications
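A minimal sketch for checking an extracted dataset folder against the four subfolders listed above. It assumes the zip extracts to one folder per dataset (for example `intuitiveness_datasets/test0_schools`); that path and the helper name `inventory` are hypothetical, not part of the published package.

```python
from pathlib import Path

# Subfolders each dataset is described as containing
EXPECTED_SUBFOLDERS = ("raw", "descent", "ascent", "metadata")


def inventory(dataset_dir: Path) -> dict[str, list[str]]:
    """Return the sorted file names found in each expected subfolder.

    A missing subfolder maps to an empty list instead of raising an error.
    """
    result: dict[str, list[str]] = {}
    for name in EXPECTED_SUBFOLDERS:
        folder = dataset_dir / name
        if folder.is_dir():
            result[name] = sorted(p.name for p in folder.iterdir() if p.is_file())
        else:
            result[name] = []
    return result


if __name__ == "__main__":
    # Hypothetical extraction path: adjust to wherever the zip was unpacked
    root = Path("intuitiveness_datasets") / "test0_schools"
    for sub, files in inventory(root).items():
        print(f"{sub}/: {len(files)} file(s)")
```

Returning empty lists for missing folders makes it easy to spot an incomplete extraction without the script stopping at the first gap.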

Five-level abstraction framework

The framework defines five levels of data abstraction:

Level 4 (L4): Unlinkable multi-level datasets - Multiple disconnected CSV files with no apparent structure
Level 3 (L3): Linkable multi-level datasets - Files connected through relationships, forming knowledge graphs
Level 2 (L2): Single dataset with multiple entities and attributes - Categorized or filtered tables
Level 1 (L1): Single entity or single attribute - Feature vectors or entity profiles
Level 0 (L0): Atomic datum - Single entity-attribute-value triplet (e.g., "average school score: 12.5")
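Because the five levels form an ordered scale, one way to model them is as an integer enum where a higher value means more structure. The sketch below encodes the levels and the L4→L0→L3 path that each dataset in this collection follows; the `Level` and `full_cycle` names are hypothetical and not taken from the `intuitiveness` package.

```python
from enum import IntEnum


class Level(IntEnum):
    """Five abstraction levels; a higher value means more structural complexity."""
    L0 = 0  # atomic datum: one entity-attribute-value triplet
    L1 = 1  # single entity or single attribute: feature vector / profile
    L2 = 2  # single table with multiple entities and attributes
    L3 = 3  # linkable multi-level datasets (e.g. a knowledge graph)
    L4 = 4  # unlinkable multi-level datasets (disconnected files)


def full_cycle() -> list[Level]:
    """Levels visited by one descent (L4→L0) followed by one ascent (L0→L3)."""
    descent = [Level(n) for n in range(4, -1, -1)]  # L4, L3, L2, L1, L0
    ascent = [Level(n) for n in range(1, 4)]        # L1, L2, L3
    return descent + ascent


print(" → ".join(level.name for level in full_cycle()))
# prints: L4 → L3 → L2 → L1 → L0 → L1 → L2 → L3
```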

Descent Phase (L4→L0)

The descent progressively reduces complexity:

L4→L3: Entity discovery and relationship detection to link disconnected files
L3→L2: Domain isolation through semantic categorization
L2→L1: Feature extraction to create vectors
L1→L0: Aggregation to derive atomic metrics
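The descent steps can be sketched with pandas on toy data. Everything here is an assumption for illustration: the column names and values are invented, the L4→L3 link is found by naive shared-column-name detection, and a plain row filter stands in for the package's semantic categorization at L3→L2.

```python
import pandas as pd

# Toy stand-ins for two L4 files; column names and values are invented
enrollment = pd.DataFrame({
    "school_id": ["A", "B", "C"],
    "department": ["Isère", "Isère", "Savoie"],
    "students": [420, 610, 380],
})
performance = pd.DataFrame({
    "school_id": ["A", "B", "C"],
    "score": [12.0, 13.0, 11.5],
})

# L4 → L3: link the files through the columns they share (naive name matching)
shared = sorted(set(enrollment.columns) & set(performance.columns))
l3 = enrollment.merge(performance, on=shared, how="inner")

# L3 → L2: isolate one domain; a simple row filter stands in for semantic categorization
l2 = l3[l3["department"] == "Isère"]

# L2 → L1: extract one attribute as a feature vector
l1 = l2["score"]

# L1 → L0: aggregate the vector into a single entity-attribute-value triplet
l0 = ("Isère middle schools", "average score", float(l1.mean()))
print(l0)  # ('Isère middle schools', 'average score', 12.5)
```

Each step keeps a reference to the previous level, so the intermediate objects mirror the `descent/` files: one joined table, one filtered table, one vector, and one datum.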

Ascent Phase (L0→L3)

The ascent intentionally reconstructs complexity:

L0→L1: Expand datum to feature vector with related attributes
L1→L2: Add categorical dimensions (e.g., high/low performance)
L2→L3: Add analytic dimensions to create multi-level structures
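A toy sketch of the ascent, with invented data. Here the L1→L2 category is "high" when a school's score is at or above the L0 value and "low" otherwise, and the L2→L3 step adds a per-category summary table that links back to the school table through a `performance` column. These rules are illustrative assumptions, not the package's implementation.

```python
import pandas as pd

# Toy L0 datum and the school rows it summarizes (invented values)
datum = ("Isère middle schools", "average score", 12.5)
schools = pd.DataFrame({
    "school_id": ["A", "B", "D"],
    "students": [420, 610, 300],
    "score": [12.0, 13.0, 12.5],
})

# L0 → L1: re-expand the datum into the score vector it summarizes,
# keeping a related attribute (students) alongside each score
l1 = schools.set_index("school_id")[["score", "students"]]

# L1 → L2: add a categorical dimension relative to the L0 value
threshold = datum[2]
l2 = l1.assign(performance=l1["score"].ge(threshold).map({True: "high", False: "low"}))

# L2 → L3: add an analytic dimension as a second table keyed by "performance"
groups = (
    l2.groupby("performance")
    .agg(schools=("score", "count"), mean_students=("students", "mean"))
    .reset_index()
)
l3 = {"schools": l2.reset_index(), "performance_groups": groups}

# The two L3 tables are linkable: join them back together on their shared key
linked = l3["schools"].merge(l3["performance_groups"], on="performance")
print(linked)
```

Storing L3 as two tables plus a join key, rather than one wide table, is what distinguishes it from L2 in the framework above.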

File naming convention

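All files follow the pattern `{dataset}_{level}_{description}.{ext}`. Files produced during the ascent insert `ascent_` before the level, giving `{dataset}_ascent_{level}_{description}.{ext}`.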

Examples:

test0_schools_L4_fr-en-college-effectifs-niveau-sexe-lv.csv - Original L4 raw file
test0_schools_L3_joined_table.csv - Joined table at L3
test0_schools_L0_datum.json - Atomic datum at L0
test0_schools_ascent_L3_table.csv - Reconstructed L3 table during ascent
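File names can be split back into their parts with a regular expression. This sketch assumes that the dataset identifier ends at the first `_L<digit>_` (optionally preceded by `_ascent`), which holds for the four examples above; `parse` and `FileName` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class FileName(NamedTuple):
    dataset: str
    phase: Optional[str]  # "ascent" for ascent files, otherwise None
    level: str
    description: str
    ext: str


# The non-greedy dataset group stops at the first "_L<0-4>_",
# optionally preceded by "_ascent"
PATTERN = re.compile(
    r"^(?P<dataset>.+?)_(?:(?P<phase>ascent)_)?(?P<level>L[0-4])_"
    r"(?P<description>.+)\.(?P<ext>[^.]+)$"
)


def parse(name: str) -> FileName:
    """Split a file name following the naming convention into its components."""
    m = PATTERN.match(name)
    if m is None:
        raise ValueError(f"does not follow the naming convention: {name}")
    return FileName(**m.groupdict())


if __name__ == "__main__":
    for name in ("test0_schools_L0_datum.json", "test0_schools_ascent_L3_table.csv"):
        print(parse(name))
```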

Data Sources

All datasets originate from data.gouv.fr, France's national open data platform:

test0_schools:

- Student enrollment per middle school (collège) by grade level, gender, and most common first and second modern languages: https://www.data.gouv.fr/datasets/effectifs-deleves-par-niveau-sexe-langues-vivantes-1-et-2-les-plus-frequentes-par-college-date-dobservation-au-debut-du-mois-doctobre-chaque-annee

- Middle school value-added performance indicators: https://www.data.gouv.fr/datasets/indicateurs-de-valeur-ajoutee-des-colleges

test1_ademe:

- ADEME financial aid allocations: https://www.data.gouv.fr/datasets/les-aides-financieres-de-lademe-1

- ADEME list of funded projects: https://www.data.gouv.fr/datasets/couts-des-travaux-de-renovation-ecs

test2_energy:

- Regulated gas tariff price levels by municipality: https://www.data.gouv.fr/datasets/niveaux-de-prix-par-commune-pour-les-tarifs-reglementes-de-vente-de-gaz-naturel-dengie

- French commercial energy imports and exports, 2005 to 2021: https://www.data.gouv.fr/datasets/imports-et-exports-commerciaux-2005-a-2021

Transformation methodology

Transformations were performed using the `intuitiveness` Python package (v0.1.0) with the following dependencies:

Python 3.11
pandas 2.x
networkx 3.x
sentence-transformers (multilingual-e5-small model)

For detailed transformation logic, see the session export files in each dataset's `metadata/` folder.
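The methodology names a multilingual-e5-small sentence-transformers model for semantic matching. The sketch below illustrates the general idea of assigning a column to a domain by vector similarity, but swaps the embedding model for plain bag-of-words vectors so it runs without downloads. The domain labels, their descriptions, and the function names are hypothetical; this is not the package's actual logic.

```python
import math
import re
from collections import Counter

# Hypothetical domain labels with short natural-language descriptions
DOMAINS = {
    "enrollment": "number of students enrolled by grade level and gender",
    "performance": "exam success rate and added value score",
}


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts: a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_domain(column: str, domains: dict[str, str]) -> str:
    """Return the domain whose description is most similar to the column name."""
    # Underscores count as word characters in regex, so split on them first
    col_vec = bag_of_words(column.replace("_", " "))
    return max(domains, key=lambda d: cosine(col_vec, bag_of_words(domains[d])))


print(best_domain("students_by_gender", DOMAINS))  # enrollment
print(best_domain("success_rate", DOMAINS))        # performance
```

A real embedding model would also match synonyms and French column names (for example `effectifs` against "students"), which word overlap cannot do; that gap is the reason a multilingual model is used in practice.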

Reuse examples

For data scientists

Test data transformation algorithms across different complexity levels
Benchmark complexity reduction metrics
Validate semantic domain matching techniques
Train machine learning models on multi-level data structures

For open data platforms

Implement multi-level data access features
Design adaptive interfaces for users with varying data literacy
Test complexity-aware search and navigation

For educators

Teach data literacy concepts through concrete examples
Demonstrate descent-ascent transformation cycles
Illustrate complexity management principles

For researchers

Study how data structure affects user comprehension
Analyze relationship discovery patterns in open datasets
Investigate semantic categorization effectiveness across domains

Contact

For questions, issues, or suggestions: arthur.sarazin@etu-iepg.fr

Files

Name	Size	MD5 checksum
intuitiveness_datasets.zip	5.6 MB	5713c7048ce763c496a3fc5d58ffc2ca
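To check a download against the listed checksum, the archive's MD5 digest can be computed with Python's standard `hashlib` module. The local file name `intuitiveness_datasets.zip` in the current directory is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path

# Checksum listed for intuitiveness_datasets.zip
EXPECTED_MD5 = "5713c7048ce763c496a3fc5d58ffc2ca"


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("intuitiveness_datasets.zip")  # assumed download location
    if archive.exists():
        ok = md5_of(archive) == EXPECTED_MD5
        print("checksum OK" if ok else "checksum MISMATCH")
```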

Additional details

Repository URL: https://github.com/ArthurSrz/intuitiveness

Statistic	All versions	This version
Views	145	57
Downloads	6	2
Data volume	43.0 MB	11.1 MB
