Published January 7, 2026 | Version v1 | Dataset | Open
Intuitive datasets: five-level data abstraction transformations
Description
Overview
This dataset collection demonstrates systematic transformations of open datasets across five levels of abstraction (L4→L0→L3), enabling users with different data literacy levels to access and understand complex data. The transformations implement a meta-design framework for creating "intuitive datasets" that adapt their complexity to user needs.
Citation: If you use these datasets, please cite: 10.5281/zenodo.18174814
License: CC-BY-4.0 (Creative Commons Attribution 4.0 International)
Related code: https://github.com/ArthurSrz/intuitiveness
Datasets Included
This collection contains three complete dataset transformation cycles from the French open data platform (data.gouv.fr):
- test0_schools: French middle school performance indicators and student enrollment data
- test1_ademe: ADEME (French environmental agency) funding allocations
- test2_energy: Energy price data for gas tariffs in France
Each dataset includes:
- raw/: original L4 files (unlinkable multi-level datasets from data.gouv.fr)
- descent/: transformed files through L3 (linkable datasets), L2 (categorized table), L1 (feature vector), and L0 (atomic datum)
- ascent/: reconstructed datasets from L0 back to L3 with added analytic dimensions
- metadata/: transformation metadata, session exports, and join specifications
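The folder layout above can be explored programmatically. The following is a minimal sketch that groups archive member paths by dataset and stage folder. It assumes each stage folder (`raw`, `descent`, `ascent`, `metadata`) sits directly inside a folder named after its dataset; the record does not state the exact nesting inside the archive, and `group_by_stage` is a hypothetical helper name.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Stage folder names taken from the list above.
STAGES = ("raw", "descent", "ascent", "metadata")


def group_by_stage(paths):
    """Group archive member paths by (dataset, stage folder).

    Assumption: the path component just before a stage folder is the
    dataset name. Directory entries and unrelated files are skipped.
    """
    groups = defaultdict(list)
    for path in paths:
        parts = PurePosixPath(path).parts
        for i, part in enumerate(parts):
            # Require a parent folder before and a file name after the stage.
            if part in STAGES and 1 <= i < len(parts) - 1:
                groups[(parts[i - 1], part)].append(parts[-1])
                break
    return dict(groups)
```

With the archive downloaded locally, the member paths could be obtained with `zipfile.ZipFile("intuitiveness_datasets.zip").namelist()` and passed to `group_by_stage`.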
Five-level abstraction framework
The framework defines five levels of data abstraction:
- Level 4 (L4): Unlinkable multi-level datasets - Multiple disconnected CSV files with no apparent structure
- Level 3 (L3): Linkable multi-level datasets - Files connected through relationships, forming knowledge graphs
- Level 2 (L2): Single dataset with multiple entities and attributes - Categorized or filtered tables
- Level 1 (L1): Single entity or single attribute - Feature vectors or entity profiles
- Level 0 (L0): Atomic datum - Single entity-attribute-value triplet (e.g., "average school score: 12.5")
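One way to represent these levels in code is an ordered enumeration plus a small record type for the L0 triplet. This is an illustrative sketch only: the names `Level`, `Datum`, and `is_descent` are hypothetical and are not taken from the `intuitiveness` package.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Abstraction levels, ordered from most atomic (0) to most complex (4)."""
    L0 = 0  # atomic datum
    L1 = 1  # single entity or single attribute
    L2 = 2  # single dataset with multiple entities and attributes
    L3 = 3  # linkable multi-level datasets
    L4 = 4  # unlinkable multi-level datasets


@dataclass(frozen=True)
class Datum:
    """An L0 entity-attribute-value triplet."""
    entity: str
    attribute: str
    value: float


def is_descent(source: Level, target: Level) -> bool:
    """Return True when a transformation moves toward a lower level."""
    return target < source


# The example triplet from the L0 definition above.
example = Datum(entity="school", attribute="average score", value=12.5)
```

Using an `IntEnum` keeps the levels comparable, so a step such as L4 to L3 can be recognized as part of a descent without extra lookup tables.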
Descent Phase (L4→L0)
The descent progressively reduces complexity:
- L4→L3: Entity discovery and relationship detection to link disconnected files
- L3→L2: Domain isolation through semantic categorization
- L2→L1: Feature extraction to create vectors
- L1→L0: Aggregation to derive atomic metrics
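The four descent steps can be illustrated with pandas on toy data. This is a simplified sketch, not the package's implementation: the tables and values are invented, the link is a plain key join rather than automated relationship detection, and the domain isolation is an exact-match filter standing in for semantic categorization.

```python
import pandas as pd

# L4: two disconnected tables (toy values invented for illustration).
enrollment = pd.DataFrame({
    "school_id": ["A", "B", "C"],
    "students": [400, 250, 610],
})
performance = pd.DataFrame({
    "school_id": ["A", "B", "C"],
    "sector": ["public", "private", "public"],
    "score": [12.0, 14.0, 13.0],
})

# L4 -> L3: link the files through a shared key.
l3 = enrollment.merge(performance, on="school_id", how="inner")

# L3 -> L2: isolate one domain (here, a simple filter on a category column).
l2 = l3[l3["sector"] == "public"]

# L2 -> L1: extract a single attribute as a feature vector.
l1 = l2["score"]

# L1 -> L0: aggregate the vector into one entity-attribute-value triplet.
l0 = {
    "entity": "public schools",
    "attribute": "average score",
    "value": float(l1.mean()),
}
```

Each step discards structure: the join keeps only matched rows, the filter keeps one domain, the column selection keeps one attribute, and the mean keeps one number.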
Ascent Phase (L0→L3)
The ascent intentionally reconstructs complexity:
- L0→L1: Expand datum to feature vector with related attributes
- L1→L2: Add categorical dimensions (e.g., high/low performance)
- L2→L3: Add analytic dimensions to create multi-level structures
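The ascent can be sketched in the same way, starting from a single datum. The values, column names, and the choice of analytic dimension below are assumptions made for illustration; the session exports in `metadata/` describe the transformations actually applied.

```python
import pandas as pd

# L0: the atomic datum (toy value).
l0 = 12.5

# L0 -> L1: expand the datum back into the per-entity values behind it.
l1 = pd.Series({"A": 12.0, "B": 14.0, "C": 13.0}, name="score")

# L1 -> L2: add a categorical dimension relative to the datum.
l2 = l1.rename_axis("school_id").reset_index()
l2["performance"] = l2["score"].map(lambda s: "high" if s >= l0 else "low")

# L2 -> L3: add an analytic dimension and a linked summary table,
# giving two related tables joined by the "performance" key.
l3_detail = l2.assign(gap_to_average=l2["score"] - l0)
l3_summary = l3_detail.groupby("performance", as_index=False)["score"].mean()
```

Unlike the descent, each step here adds information that was not in the datum itself, which is why the reconstructed L3 can differ from the L3 produced during the descent.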
File naming convention
Files follow the pattern `{dataset}_{level}_{description}.{ext}`; as the last example below shows, files produced during the ascent carry `ascent` before the level.
Examples:
- test0_schools_L4_fr-en-college-effectifs-niveau-sexe-lv.csv - Original L4 raw file
- test0_schools_L3_joined_table.csv - Joined table at L3
- test0_schools_L0_datum.json - Atomic datum at L0
- test0_schools_ascent_L3_table.csv - Reconstructed L3 table during ascent
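File names following this convention can be split into their parts with a regular expression. The sketch below is inferred from the four examples above only; the optional `ascent` marker and the restriction of levels to `L0` through `L4` are assumptions, and `parse_filename` is a hypothetical helper.

```python
import re

# dataset, optional "ascent" marker, level, description, extension.
FILENAME_RE = re.compile(
    r"^(?P<dataset>.+?)_"
    r"(?:(?P<phase>ascent)_)?"
    r"(?P<level>L[0-4])_"
    r"(?P<description>.+)\."
    r"(?P<ext>[^.]+)$"
)


def parse_filename(name):
    """Return the name's components as a dict, or None if it does not match.

    The "phase" entry is "ascent" for ascent files and None otherwise.
    """
    match = FILENAME_RE.match(name)
    return match.groupdict() if match else None
```

For instance, `parse_filename("test0_schools_ascent_L3_table.csv")` yields the dataset `test0_schools`, the phase `ascent`, the level `L3`, the description `table`, and the extension `csv`.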
Data Sources
All datasets originate from data.gouv.fr, France's national open data platform:
test0_schools:
- Middle school (collège) enrollment by level, gender, and language: https://www.data.gouv.fr/datasets/effectifs-deleves-par-niveau-sexe-langues-vivantes-1-et-2-les-plus-frequentes-par-college-date-dobservation-au-debut-du-mois-doctobre-chaque-annee
- Middle school performance indicators: https://www.data.gouv.fr/datasets/indicateurs-de-valeur-ajoutee-des-colleges
test1_ademe:
- ADEME financial aid allocations: https://www.data.gouv.fr/datasets/les-aides-financieres-de-lademe-1
- ADEME list of funded projects: https://www.data.gouv.fr/datasets/couts-des-travaux-de-renovation-ecs
test2_energy:
- Regulated gas tariff price levels: https://www.data.gouv.fr/datasets/niveaux-de-prix-par-commune-pour-les-tarifs-reglementes-de-vente-de-gaz-naturel-dengie
- French energy import/export: https://www.data.gouv.fr/datasets/imports-et-exports-commerciaux-2005-a-2021
Transformation methodology
Transformations were performed using the `intuitiveness` Python package (v0.1.0) with the following dependencies:
- Python 3.11
- pandas 2.x
- networkx 3.x
- sentence-transformers (multilingual-e5-small model)
For detailed transformation logic, see the session export files in each dataset's `metadata/` folder.
Reuse examples
For data scientists
- Test data transformation algorithms across different complexity levels
- Benchmark complexity reduction metrics
- Validate semantic domain matching techniques
- Train machine learning models on multi-level data structures
For open data platforms
- Implement multi-level data access features
- Design adaptive interfaces for users with varying data literacy
- Test complexity-aware search and navigation
For educators
- Teach data literacy concepts through concrete examples
- Demonstrate descent-ascent transformation cycles
- Illustrate complexity management principles
For researchers
- Study how data structure affects user comprehension
- Analyze relationship discovery patterns in open datasets
- Investigate semantic categorization effectiveness across domains
Contact
For questions, issues, or suggestions: arthur.sarazin@etu-iepg.fr
Files
| Name | Size | Checksum |
|---|---|---|
| intuitiveness_datasets.zip | 5.6 MB | md5:5713c7048ce763c496a3fc5d58ffc2ca |
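The MD5 checksum listed for the archive can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib`; it assumes the archive has been saved locally, and the helper names `file_md5` and `matches_expected` are illustrative.

```python
import hashlib

# Checksum as published in the file listing for intuitiveness_datasets.zip.
EXPECTED_MD5 = "5713c7048ce763c496a3fc5d58ffc2ca"


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True when the file's digest equals the expected checksum."""
    return file_md5(path) == expected.lower()
```

Calling `matches_expected("intuitiveness_datasets.zip")` should return `True` for an uncorrupted copy; a `False` result suggests re-downloading the archive.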
Additional details
Software
- Repository URL: https://github.com/ArthurSrz/intuitiveness