There is a newer version of the record available.

Published January 7, 2026 | Version v1

Intuitive datasets: five-level data abstraction transformations

  • 1. Université Grenoble Alpes

Description

 



Overview

This dataset collection demonstrates systematic transformations of open datasets across five levels of abstraction, descending from L4 to L0 and then ascending back to L3 (L4→L0→L3), so that users with different levels of data literacy can access and understand complex data. The transformations implement a meta-design framework for creating "intuitive datasets" that adapt their complexity to user needs.

Citation: if you use these datasets, please cite DOI 10.5281/zenodo.18174814.

License: CC-BY-4.0 (Creative Commons Attribution 4.0 International)

Related code: https://github.com/ArthurSrz/intuitiveness

Datasets included


This collection contains three complete dataset transformation cycles from the French open data platform (data.gouv.fr):

  1. test0_schools: French middle school performance indicators and student enrollment data
  2. test1_ademe: ADEME (French environmental agency) funding allocations
  3. test2_energy: Energy price data for gas tariffs in France

Each dataset includes:
  • raw/: original L4 files (unlinkable multi-level datasets from data.gouv.fr)
  • descent/: transformed files through L3 (linkable datasets), L2 (categorized table), L1 (feature vector), and L0 (atomic datum)
  • ascent/: reconstructed datasets from L0 back to L3 with added analytic dimensions
  • metadata/: transformation metadata, session exports, and join specifications

Five-level abstraction framework


The framework defines five levels of data abstraction:

  • Level 4 (L4): Unlinkable multi-level datasets - Multiple disconnected CSV files with no apparent structure
  • Level 3 (L3): Linkable multi-level datasets - Files connected through relationships, forming knowledge graphs
  • Level 2 (L2): Single dataset with multiple entities and attributes - Categorized or filtered tables
  • Level 1 (L1): Single entity or single attribute - Feature vectors or entity profiles
  • Level 0 (L0): Atomic datum - Single entity-attribute-value triplet (e.g., "average school score: 12.5")
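To make the hierarchy concrete, here is a minimal Python sketch of the five levels and of an L0 entity-attribute-value triplet. It assumes nothing about how the `intuitiveness` package represents levels internally; the `Level` and `Datum` names are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical encoding of the five levels; higher means more structure."""
    L0 = 0  # atomic datum: one entity-attribute-value triplet
    L1 = 1  # single entity or single attribute: feature vector or profile
    L2 = 2  # single table with multiple entities and attributes
    L3 = 3  # linkable multi-level datasets (knowledge graph)
    L4 = 4  # unlinkable multi-level datasets (disconnected files)


@dataclass(frozen=True)
class Datum:
    """Hypothetical container for an L0 entity-attribute-value triplet."""
    entity: str
    attribute: str
    value: float

    def __str__(self) -> str:
        return f"{self.entity} {self.attribute}: {self.value}"


example = Datum(entity="school", attribute="average score", value=12.5)
print(example)
```

Using an `IntEnum` lets levels be compared directly (for example, checking that a descent step always moves to a lower level).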

Descent phase (L4→L0)


The descent progressively reduces complexity:
  1. L4→L3: Entity discovery and relationship detection to link disconnected files
  2. L3→L2: Domain isolation through semantic categorization
  3. L2→L1: Feature extraction to create vectors
  4. L1→L0: Aggregation to derive atomic metrics
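The four descent steps can be sketched with pandas on toy data. This is an illustration under stated assumptions, not the `intuitiveness` implementation: the two tables and their values are invented, the school-identifier column `uai` is hypothetical, a simple shared-column-name check stands in for entity discovery and relationship detection, and a plain category filter stands in for semantic categorization.

```python
import pandas as pd

# Hypothetical toy stand-ins for two disconnected L4 files (invented values).
enrollment = pd.DataFrame({
    "uai": ["A1", "A2", "A3"],
    "students": [420, 510, 380],
})
performance = pd.DataFrame({
    "uai": ["A1", "A2", "A3"],
    "sector": ["public", "private", "public"],
    "success_rate": [88.0, 94.0, 81.0],
})

# L4 -> L3: detect a shared column and use it as the join key.
# Assumption: a column present in both files identifies the same entity.
shared = sorted(set(enrollment.columns) & set(performance.columns))
key = shared[0]
l3 = enrollment.merge(performance, on=key, how="inner", validate="one_to_one")

# L3 -> L2: isolate one domain (here, a simple filter on a category column).
l2 = l3[l3["sector"] == "public"]

# L2 -> L1: extract a single attribute as a feature vector.
l1 = l2["success_rate"]

# L1 -> L0: aggregate the vector into one entity-attribute-value datum.
l0 = {
    "entity": "public middle schools",
    "attribute": "mean success rate",
    "value": float(l1.mean()),
}
print(l0)
```

Each step returns a smaller object than the previous one (two tables, one joined table, a filtered table, a Series, a single value), which mirrors the progressive complexity reduction described above.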

Ascent phase (L0→L3)


The ascent intentionally reconstructs complexity:
  1. L0→L1: Expand datum to feature vector with related attributes
  2. L1→L2: Add categorical dimensions (e.g., high/low performance)
  3. L2→L3: Add analytic dimensions to create multi-level structures
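A minimal pandas sketch of the three ascent steps follows, again on invented toy data. The column names (`uai`, `department`, `students`, `success_rate`), the "high/low relative to the L0 value" rule, and the use of `department` as the added analytic dimension are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy rows from which the L0 datum was aggregated (invented values).
schools = pd.DataFrame({
    "uai": ["A1", "A3", "A4", "A5"],
    "department": ["38", "38", "73", "73"],
    "students": [420, 380, 300, 450],
    "success_rate": [88.0, 81.0, 79.0, 90.0],
})
l0 = schools["success_rate"].mean()  # the atomic datum used as a reference value

# L0 -> L1: expand the datum back into a per-school feature vector,
# keeping a related attribute alongside it.
l1 = schools.set_index("uai")[["success_rate", "students"]]

# L1 -> L2: add a categorical dimension relative to the L0 reference value.
is_high = l1["success_rate"] >= l0
l2 = l1.assign(performance=is_high.map({True: "high", False: "low"}))

# L2 -> L3: attach an analytic dimension (department) and build a
# two-level structure indexed by (department, performance).
l3 = (
    l2.join(schools.set_index("uai")["department"])
      .groupby(["department", "performance"])
      .agg(n_schools=("students", "count"), mean_rate=("success_rate", "mean"))
)
print(l3)
```

The resulting `MultiIndex` table is one simple way to represent a "multi-level structure"; the published ascent files may use a different layout.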

File naming convention


Files follow the pattern `{dataset}_{level}_{description}.{ext}`; files produced during the ascent insert `ascent` between the dataset name and the level.

Examples:
  • test0_schools_L4_fr-en-college-effectifs-niveau-sexe-lv.csv - Original L4 raw file
  • test0_schools_L3_joined_table.csv - Joined table at L3
  • test0_schools_L0_datum.json - Atomic datum at L0
  • test0_schools_ascent_L3_table.csv - Reconstructed L3 table during ascent
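Since dataset names themselves contain underscores (e.g. `test0_schools`), splitting on `_` is not enough to recover the fields. The sketch below parses names with a regular expression, assuming the level token is always `L0` to `L4` and that ascent files carry an `ascent_` prefix before the level, as in the examples above. The `parse_filename` helper is hypothetical and not part of the `intuitiveness` package.

```python
import re
from typing import Optional

# Assumption: level tokens are L0-L4; ascent files insert "ascent_" before the level.
FILENAME_RE = re.compile(
    r"^(?P<dataset>.+?)_(?:(?P<phase>ascent)_)?(?P<level>L[0-4])_"
    r"(?P<description>.+)\.(?P<ext>[^.]+)$"
)


def parse_filename(name: str) -> Optional[dict]:
    """Split a file name into its naming-convention fields, or return None."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    # Replace the optional phase group with a boolean flag.
    fields["ascent"] = fields.pop("phase") is not None
    return fields


for name in ["test0_schools_L3_joined_table.csv", "test0_schools_ascent_L3_table.csv"]:
    print(parse_filename(name))
```

The non-greedy `dataset` group stops at the first `_L<digit>_` token, so descriptions that themselves contain underscores (such as `joined_table`) are kept intact.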

Data sources


All datasets originate from data.gouv.fr, France's national open data platform:

test0_schools:
- Middle school (collège) enrollment by grade level, gender, and most common first and second foreign languages: https://www.data.gouv.fr/datasets/effectifs-deleves-par-niveau-sexe-langues-vivantes-1-et-2-les-plus-frequentes-par-college-date-dobservation-au-debut-du-mois-doctobre-chaque-annee
- Middle school value-added performance indicators: https://www.data.gouv.fr/datasets/indicateurs-de-valeur-ajoutee-des-colleges

test1_ademe:
- ADEME financial aid allocations: https://www.data.gouv.fr/datasets/les-aides-financieres-de-lademe-1
- ADEME list of funded projects: https://www.data.gouv.fr/datasets/couts-des-travaux-de-renovation-ecs

test2_energy:
- Regulated natural gas tariff price levels by municipality (Engie): https://www.data.gouv.fr/datasets/niveaux-de-prix-par-commune-pour-les-tarifs-reglementes-de-vente-de-gaz-naturel-dengie
- French commercial energy imports and exports, 2005-2021: https://www.data.gouv.fr/datasets/imports-et-exports-commerciaux-2005-a-2021

Transformation methodology


Transformations were performed using the `intuitiveness` Python package (v0.1.0) with the following dependencies:
  • Python 3.11
  • pandas 2.x
  • networkx 3.x
  • sentence-transformers (multilingual-e5-small model)
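As an illustration of how sentence embeddings can drive the semantic categorization step, the sketch below assigns each item (for example, a column name) to the domain label whose embedding has the highest cosine similarity. This nearest-label matching is an assumed technique, not confirmed to be what `intuitiveness` does. The Hugging Face model id `intfloat/multilingual-e5-small` and the `"query: "` prefix (the input convention documented for E5 models) are assumptions; `embed` and `assign_domains` are hypothetical helpers. `embed` imports sentence-transformers lazily, so the matching logic can be exercised with plain vectors.

```python
import numpy as np


def embed(texts, model_name="intfloat/multilingual-e5-small"):
    """Encode texts with sentence-transformers (needs the package and a model download)."""
    from sentence_transformers import SentenceTransformer  # lazy import
    model = SentenceTransformer(model_name)
    # E5 models expect a "query: " or "passage: " prefix on each input.
    return model.encode([f"query: {t}" for t in texts], normalize_embeddings=True)


def assign_domains(item_vecs, domain_vecs, domain_names):
    """Return, for each item vector, the domain with the highest cosine similarity."""
    items = np.asarray(item_vecs, dtype=float)
    domains = np.asarray(domain_vecs, dtype=float)
    items = items / np.linalg.norm(items, axis=1, keepdims=True)
    domains = domains / np.linalg.norm(domains, axis=1, keepdims=True)
    similarity = items @ domains.T  # shape: (n_items, n_domains)
    return [domain_names[i] for i in similarity.argmax(axis=1)]


# Toy 2-D vectors stand in for real embeddings here.
print(assign_domains([[1.0, 0.0], [0.2, 0.9]], [[1.0, 0.0], [0.0, 1.0]],
                     ["education", "energy"]))
```

With the model installed, a call such as `assign_domains(embed(columns), embed(domains), domains)` would apply the same matching to real column names and domain labels.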

For detailed transformation logic, see the session export files in each dataset's `metadata/` folder.

Reuse examples


For data scientists
  • Test data transformation algorithms across different complexity levels
  • Benchmark complexity reduction metrics
  • Validate semantic domain matching techniques
  • Train machine learning models on multi-level data structures

For open data platforms
  • Implement multi-level data access features
  • Design adaptive interfaces for users with varying data literacy
  • Test complexity-aware search and navigation

For educators
  • Teach data literacy concepts through concrete examples
  • Demonstrate descent-ascent transformation cycles
  • Illustrate complexity management principles

For researchers
  • Study how data structure affects user comprehension
  • Analyze relationship discovery patterns in open datasets
  • Investigate semantic categorization effectiveness across domains

 

Contact


For questions, issues, or suggestions: arthur.sarazin@etu-iepg.fr

 

Files

intuitiveness_datasets.zip (5.6 MB)
md5:5713c7048ce763c496a3fc5d58ffc2ca
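The archive's MD5 checksum is listed above, so a download can be checked before use. The following is a small sketch using Python's standard `hashlib`. It assumes the archive was saved as `intuitiveness_datasets.zip` in the current directory; `md5_of` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path

EXPECTED_MD5 = "5713c7048ce763c496a3fc5d58ffc2ca"  # checksum listed on the record


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    # usedforsecurity=False: MD5 is used only as an integrity check here.
    digest = hashlib.md5(usedforsecurity=False)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("intuitiveness_datasets.zip")  # assumed download location
    if archive.exists():
        print("OK" if md5_of(archive) == EXPECTED_MD5 else "checksum mismatch")
    else:
        print(f"{archive} not found")
```

Reading in chunks keeps memory use constant regardless of file size, although for a 5.6 MB archive reading the whole file at once would also work.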

Additional details