Published October 24, 2025 | Version 2.0.0
Dataset
Open
Profiles of rankings by the Mallows model
Description
Dataset description
This dataset contains profiles of rankings sampled from the Mallows model (Mallows, 1957) using the Repeated Insertion Model (Doignon et al., 2004). It includes profiles generated for the following parameter values:
- Number of alternatives: {3, 4, …, 16, 20, 25, 30}
- Number of voters: {10, 25, 75, 100, 125, …, 1000}
- Mallows model dispersion: {0.1, 0.4, 0.5, 0.6, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 0.99, 0.999, 1.0}
For every combination of the above parameters, there are 1000 profiles sampled without normalising the dispersion and 1000 profiles sampled with the dispersion normalised with respect to the number of alternatives (Boehmer et al., 2023). The central ranking can be chosen when loading the data from disk.
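The Repeated Insertion Model mentioned above can be sketched as follows. This is a minimal illustration of the textbook procedure, not the code used to generate the dataset. The function names are hypothetical, and it assumes the dispersion is the usual Mallows parameter phi, where values near 0 concentrate on the central ranking and 1.0 gives the uniform distribution.

```python
import random


def sample_mallows_rim(central, phi, rng=random):
    """Draw one ranking from a Mallows model via repeated insertion.

    `central` is the central ranking (most preferred first) and `phi`
    is the dispersion (assumed convention: 1.0 means uniform).
    """
    ranking = []
    for i, item in enumerate(central):
        # Inserting the i-th item at position j (0 <= j <= i) creates
        # i - j inversions with respect to `central`, so that position
        # gets a weight proportional to phi ** (i - j).
        weights = [phi ** (i - j) for j in range(i + 1)]
        position = rng.choices(range(i + 1), weights=weights)[0]
        ranking.insert(position, item)
    return ranking


def sample_profile(central, phi, n_voters, seed=None):
    """Sample `n_voters` independent rankings (a profile)."""
    rng = random.Random(seed)
    return [sample_mallows_rim(central, phi, rng) for _ in range(n_voters)]
```

For instance, `sample_profile([0, 1, 2, 3], 0.5, 10, seed=1)` returns a list of ten rankings of four alternatives.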
If you use this dataset, please cite the Zenodo entry.
Central ranking considerations
We designed this dataset to be as flexible as possible. Instead of storing rankings sampled around one fixed central ranking, we store sampled permutations of the central ranking. When a profile is loaded, these permutations are applied to the central ranking to obtain the final rankings. The user can therefore choose which central ranking to consider without having to sample the data again.
For example, consider the following stored profile for 4 alternatives:
| Permutation | Votes |
|---|---|
| [1, 2, 0, 3] | 3 |
| [0, 3, 2, 1] | 2 |
It is transformed into the final profile by applying each permutation to the central ranking: each entry of a permutation is replaced by the alternative at that (0-based) position of the central ranking. For instance, for the central ranking [3, 2, 1, 0], the final profile would be:
| Ranking | Votes |
|---|---|
| [2, 1, 3, 0] | 3 |
| [3, 0, 1, 2] | 2 |
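The mapping in this example can be reproduced with a few lines of Python. This is only a sketch of one reading that matches the tables above (each stored entry is treated as a 0-based index into the central ranking). The helper name `apply_central_ranking` is hypothetical and is not the loading function from the linked repository.

```python
def apply_central_ranking(permutation, central):
    """Relabel a stored permutation through the chosen central ranking.

    Entry k of the stored permutation is used as an index into `central`.
    """
    return [central[k] for k in permutation]


# Stored profile from the example: (permutation, number of votes).
stored_profile = [([1, 2, 0, 3], 3), ([0, 3, 2, 1], 2)]
central = [3, 2, 1, 0]

final_profile = [(apply_central_ranking(p, central), votes)
                 for p, votes in stored_profile]
print(final_profile)
# [([2, 1, 3, 0], 3), ([3, 0, 1, 2], 2)]
```

With the identity central ranking [0, 1, 2, 3], the stored permutations are returned unchanged.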
File distribution
The dataset is provided as two zip files because the number of files exceeded the limit. One zip file gathers the profiles sampled with the normalised dispersion and the other gathers the profiles sampled without normalising it. Please decompress them in the directory used to store the dataset locally.
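Decompressing the archives into a single local directory could look like the sketch below, which uses only the Python standard library. The function name and the target directory are illustrative assumptions, and the command-line `unzip` tool works equally well.

```python
import zipfile
from pathlib import Path


def extract_archives(archive_paths, dataset_dir):
    """Extract each zip archive into `dataset_dir`, creating it if needed."""
    dataset_dir = Path(dataset_dir)
    dataset_dir.mkdir(parents=True, exist_ok=True)
    for archive in archive_paths:
        with zipfile.ZipFile(archive) as zf:
            # Members are written one at a time, so the whole archive
            # is not loaded into memory.
            zf.extractall(dataset_dir)
    return dataset_dir
```

A call such as `extract_archives(["norm_mallows=false.zip"], "mallows_dataset")` would place the contents of that archive under the hypothetical `mallows_dataset` directory.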
The linked GitHub repository gathers the code used for the generation of the data, and includes the functions to load the profiles and calculate the outranking matrix of a profile.
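To illustrate what an outranking matrix contains, here is a minimal sketch. It assumes the common definition in which entry (a, b) counts the votes that rank alternative a above alternative b. The function in the repository may use a different convention or signature, so treat the name and layout below as hypothetical.

```python
def outranking_matrix(profile, n_alternatives):
    """Count, for each ordered pair (a, b), the votes ranking a above b.

    `profile` is a list of (ranking, votes) pairs, where each ranking
    lists the alternatives 0..n_alternatives-1 from most to least preferred.
    """
    matrix = [[0] * n_alternatives for _ in range(n_alternatives)]
    for ranking, votes in profile:
        for pos, a in enumerate(ranking):
            # Every alternative placed after `a` is ranked below it.
            for b in ranking[pos + 1:]:
                matrix[a][b] += votes
    return matrix


profile = [([2, 1, 3, 0], 3), ([3, 0, 1, 2], 2)]
matrix = outranking_matrix(profile, 4)
```

Under this definition, `matrix[a][b] + matrix[b][a]` equals the total number of votes (5 here) for every pair of distinct alternatives.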
Summary of changes to previous version
- The dataset has been updated to be suitable for use regardless of the central rankings considered by the authors. This has been achieved by sampling permutations of the central ranking.
- The structure of the Parquet dataset has been modified for greater efficiency, enabling users to download data with or without dispersion normalisation independently.
- Data for dispersion levels of 0.99 and 0.999 has also been added.
Files
norm_mallows=false.zip
Total size: 49.9 GB
| MD5 checksum | Size |
|---|---|
| a4f81627013595b367d324751814a155 | 24.2 GB |
| 0daaa89a55dd06655378fcb474902762 | 25.7 GB |
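After downloading, each archive can be checked against the MD5 checksums listed above. The following is a small sketch using the standard `hashlib` module. The function name is an assumption, and the file is read in chunks because the archives are tens of gigabytes in size.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string should equal the checksum published for the corresponding file. A mismatch usually indicates an incomplete or corrupted download.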
Additional details
Funding
- Universidad de Oviedo
- PhD project grant PAPI-24-TESIS-14
- Government of Spain
- National project MCINN-23-PID2022-139886NB-I00
Dates
- Updated
- 2025-10-24: The dataset has been updated to be suitable for use regardless of the central rankings considered by the authors. This is achieved by sampling permutations of the central ranking. The structure of the Parquet dataset has been changed for efficiency, allowing users to download data with or without dispersion normalisation independently. Data for dispersion levels of 0.99 and 0.999 has been added.
Software
- Repository URL
- https://github.com/MarioVillar/profiles-rankings-mallows.git
- Programming language
- Python
References
- C. L. Mallows. Non-null ranking models. Biometrika, 44:114–130, 1957. ISSN 00063444. doi: 10.2307/2333244.
- J.-P. Doignon, A. Pekeč, and M. Regenwetter. The repeated insertion model for rankings: Missing link between two subset choice models. Psychometrika, 69:33–54, 2004. ISSN 1860-0980. doi: 10.1007/BF02295838.
- N. Boehmer, P. Faliszewski, and S. Kraiczy. Properties of the Mallows model depending on the number of alternatives: A warning for an experimentalist. Proceedings of Machine Learning Research, 202:2689–2711, 2023.