Published March 30, 2026 | Version 1.0.0
Preprint Open

Etymological Origins of First Names in France (1900–2024)

Authors/Creators

Description

We present a comprehensive analysis of the etymological origins of all first names given at birth in France between 1900 and 2024, using the complete INSEE civil registry dataset (87 million births, 48,516 unique names). Each name was classified into one of 20 etymological origin categories using a large language model (Claude Haiku 4.5) operating as an automated onomastic classifier.

Our analysis reveals four major structural shifts: (1) a sustained decline in names with Germanic etymological roots, from 28% of births in 1920 to 8% in 2024; (2) a collapse and partial recovery of Hebrew/Biblical names, peaking at 40% in 1946 before stabilizing at 23%; (3) a steady, quasi-linear rise in names with Arabic etymological origins, from near-zero in 1950 to 16% in 2024; and (4) a monotonic increase in the Shannon diversity index of name origins across the full period. Monte Carlo projections (10,000 trajectories calibrated on 1990–2024 volatility) yield 90% projection intervals for 2050.
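For readers unfamiliar with the diversity measure mentioned above, the Shannon index over origin shares is H = -Σ p_i ln p_i, where p_i is the share of births in origin category i for a given year. The following is a minimal sketch only, not the authors' analysis code: it assumes the natural logarithm (the record does not state the base used), and the function name and the counts are hypothetical, chosen for illustration rather than taken from the dataset.

```python
import math


def shannon_index(counts):
    """Return H = -sum(p_i * ln p_i) for a mapping of category -> birth count.

    Categories with a zero count contribute nothing to the sum.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p)  # natural log: an assumption, not from the record
    return h


# Hypothetical counts for a single year (illustrative only, not real data).
example_year = {"Germanic": 50, "Hebrew/Biblical": 30, "Arabic": 20}
# Four equally common hypothetical categories: H should equal ln(4).
uniform = {"A": 25, "B": 25, "C": 25, "D": 25}

print(shannon_index(example_year))
print(shannon_index(uniform))
```

Under this definition the index is 0 when every birth falls in a single category and reaches ln k when k categories are equally common, so with 20 origin categories the upper bound would be ln 20, about 3.0.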

The full classification dataset (48,516 name–origin mappings) and analysis code are included as supplementary materials. A shorter French-language version is included as an additional file.

An interactive visualization of these results is available at https://yukicapital.com/french-first-names-origins

Files

births_by_origin_year.csv

Files (1.9 MB)

Name Size
md5:d148e3a67ffa4fe00acfffb95a8710ff 17.9 kB
md5:33bcf6ca796a2b0dc0be46689ec41569 48.1 kB
md5:eef955df9867cc11d1ba8d1b44967bfb 969.7 kB
md5:a092c3982ff4d01718eb069fc88e2ba2 818.6 kB

Additional details

Additional titles

Subtitle (English)
A Century of Cultural Transformation Measured Through Large-Scale AI Classification

Related works