Dataset Open Access
New version 2.0.0 with major changes
For free and complete information on the CASSMIR datasets, please visit our website (in French).
The CASSMIR database (Contribution to the Spatial and Sociological Analysis of Residential Real Estate Markets) is a spatial and population database on the residential property market of the Paris metropolitan area from 1996 to 2018. Its indicators cover four "thematic areas of investigation": prices, the socio-demographic profile of buyers and sellers, purchasing regimes and types of property transfer, and types of real estate. These indicators characterize spatial units at three scales (communal level, 1 km grid and 200 m grid) and population groups of buyers and sellers broken down by social, generational and gender criteria. The database is produced by matching and aggregating individual data from two original databases: a database on real estate transactions (BIEN database) and a database on first-time buyer investments (PTZ database). CASSMIR delivers aggregated data (nearly 350 variables) in open access for non-commercial use.
This repository consists of seven files.
"CASSMIR_SpatialDataBase" is a GeoPackage file that lists all the data aggregated to the spatial units of reference. It is composed of three layers, one per geographical scale of aggregation: the communal level, a grid with cells of one kilometer on each side, and a grid with cells of two hundred meters on each side.
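A GeoPackage is an SQLite container, so its layers can be listed without any GIS library by reading the standard `gpkg_contents` table. The following is a minimal sketch under that assumption; the function name `list_gpkg_layers` and the file name used in the usage comment are illustrative, and the actual layer names in CASSMIR are not stated here.

```python
import sqlite3


def list_gpkg_layers(path):
    """Return (table_name, data_type) pairs registered in a GeoPackage.

    Relies on the `gpkg_contents` table defined by the GeoPackage
    standard; `path` is whatever local file name the download has.
    """
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT table_name, data_type FROM gpkg_contents "
            "ORDER BY table_name"
        ).fetchall()
    finally:
        conn.close()
    return rows


# Hypothetical usage (file name and extension are assumptions):
# for name, kind in list_gpkg_layers("CASSMIR_SpatialDataBase.gpkg"):
#     print(name, kind)
```

For the CASSMIR file, one would expect three feature layers, matching the three scales described above.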
"CASSMIR_GroupesPopDataBase" is a .csv file that lists all the data aggregated to the population groups of reference. There are three types of population groups: groups defined by the social position of the buyers/sellers (social group), groups defined by the age group of the buyers/sellers (generational group), and groups defined by the sex of the buyers/sellers (gender group).
Two metadata files (.csv) list the metadata of the available indicators. They are systematically structured as follows:
"BIENSampleForTest" and "PTZSampleForTest" are two .txt files that each provide a sample of individual data from one of the original databases. All data were anonymized and the values randomized. These two files are intended solely for reproducing the processing stages that lead to the CASSMIR files ("CASSMIR_SpatialDataBase" or "CASSMIR_GroupesPopDataBase") and cannot be used in any other way.
"LEXIQUE" is a .csv glossary of the terms used to name the variables.
The creation of the database was funded by the French National Research Agency (ANR WIsDHoM https://anr.fr/Projet-ANR-18-CE41-0004).
All CASSMIR documentation (in French) and R code are accessible via the GitLab repository at the following address: https://gitlab.huma-num.fr/tlecorre/cassmir.git
This dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. You are free to copy, distribute, transmit, and adapt the data, provided that you give credit to the CASSMIR database and specify the original source of the data. If you modify the data or use them in derivative works, you may distribute those works only under the same license. You may not make commercial use of this database, nor may you use it for any purpose other than scientific research.
- Figures: (CC - CASSMIR database, indicator(s) constructed from XXX data)
- Bibliography: Publications that use the CASSMIR database must cite both the dataset and the data paper.
Dataset: Le Corre T., 2020, CASSMIR (Version 2.0.0) [Data set], Zenodo. http://doi.org/10.5281/zenodo.4497219
Data paper: Le Corre T., 2021, "Une base de données pour étudier vingt années de dynamiques du marché immobilier en Île-de-France", Cybergeo.
"Une base de données pour étudier vingt années de dynamiques du marché immobilier en Île-de-France"
Thibault Le Corre
Housing market, data base, Île-de-France, spatio-temporal dynamics
The time period covered by the indicators in the database depends on the data sources used, respectively:
For data from BIEN: 1996, 1999, 2003-2012, 2015, 2018
For data from PTZ: 1996-2016
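Because the two sources do not cover the same years, it can help to expand the ranges above and check which years are available in both. A minimal sketch; the helper name `expand_years` is illustrative and not part of the CASSMIR code.

```python
def expand_years(spec):
    """Expand a string such as '1996, 1999, 2003-2012' into a list of years."""
    years = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # An inclusive range such as '2003-2012'.
            start, end = (int(p) for p in part.split("-"))
            years.extend(range(start, end + 1))
        else:
            years.append(int(part))
    return years


# Year coverage as stated in the dataset description.
BIEN_YEARS = expand_years("1996, 1999, 2003-2012, 2015, 2018")
PTZ_YEARS = expand_years("1996-2016")

# Years for which indicators from both sources exist.
COMMON_YEARS = sorted(set(BIEN_YEARS) & set(PTZ_YEARS))
```

With these definitions, `COMMON_YEARS` contains every BIEN year except 2018, which falls after the end of the PTZ series.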
Kind of data
Nature of data submitted
vector: Vector data
grid: Data mesh
code: programming code (see the website or GitLab of the project)
Coordinate Reference System (CRS): EPSG:2154, RGF93 / Lambert-93.
Municipalities and grid cells (1 km and 200 m grids) in which real estate transactions took place
Geographic Bounding Box
- Xmin : 586421.7
- Xmax : 741205.6
- Ymin : 6780020
- Ymax : 6905324
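The bounding box can serve as a quick sanity check that a point expressed in Lambert-93 coordinates (EPSG:2154, in meters) falls within the area covered by the database. A minimal sketch; the function name `in_cassmir_bbox` is illustrative, and the sample coordinates in the usage comment are only an approximate position for central Paris, given as an assumption.

```python
# Bounding box from the dataset description (EPSG:2154, meters).
XMIN, XMAX = 586421.7, 741205.6
YMIN, YMAX = 6780020.0, 6905324.0


def in_cassmir_bbox(x, y):
    """Return True if the Lambert-93 point (x, y) lies inside the bounding box."""
    return XMIN <= x <= XMAX and YMIN <= y <= YMAX


# Hypothetical usage with an approximate Lambert-93 position for central Paris:
# in_cassmir_bbox(652000.0, 6862000.0)
```

A point inside the box is not guaranteed to have data: only municipalities and grid cells with recorded transactions are included.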
Type of article