Published October 24, 2025 | Version 2.0.0
Dataset Open

Profiles of rankings by the Mallows model

Description

Dataset description

This dataset contains profiles of rankings sampled from the Mallows model (Mallows, 1957) using the Repeated Insertion Model (Doignon et al., 2004). Profiles with different characteristics are included; the following parameter values were considered:
  • Number of alternatives: {3, 4, …, 16, 20, 25, 30}
  • Number of voters: {10, 25, 75, 100, 125, …, 1000}
  • Mallows model dispersion: {0.1, 0.4, 0.5, 0.6, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 0.99, 0.999, 1.0}
For every combination of the above parameters, there are 1000 profiles sampled without normalising the dispersion and 1000 profiles sampled with the dispersion normalised with respect to the number of alternatives (Boehmer et al., 2023). The central ranking can be set when loading the data from disk.
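As a rough illustration of how a single permutation can be drawn with the Repeated Insertion Model, the sketch below uses only the Python standard library. It is not the code from the linked repository: the function name `sample_mallows_permutation` is hypothetical, and it assumes a dispersion `phi` in [0, 1], where 1.0 gives the uniform distribution and values close to 0 concentrate on the identity permutation. Dispersion normalisation is not covered.

```python
import random


def sample_mallows_permutation(num_alternatives, phi, rng):
    """Draw one permutation of 0..num_alternatives-1 around the identity.

    Hypothetical helper, assuming 0 <= phi <= 1.
    """
    permutation = []
    for i in range(num_alternatives):
        # Inserting item i at position j (0 <= j <= i) places it ahead of
        # i - j previously inserted items, so it gets weight phi ** (i - j).
        weights = [phi ** (i - j) for j in range(i + 1)]
        position = rng.choices(range(i + 1), weights=weights, k=1)[0]
        permutation.insert(position, i)
    return permutation


rng = random.Random(0)  # fixed seed, only to make the sketch reproducible
sample = sample_mallows_permutation(5, 0.5, rng)
```

The result is a permutation of positions in the central ranking, which matches the form in which the profiles are stored.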
 
If you use this dataset, please cite the Zenodo entry.

Central ranking considerations

We designed this dataset to maximise its flexibility. To that end, we sampled permutations of the central ranking rather than rankings tied to one fixed central ranking. When a profile is loaded, these permutations are applied to the chosen central ranking to obtain the final rankings. The user can thus choose which central ranking to consider without having to sample the data again.

For example, the following profile for 4 alternatives, given by:

Permutation Votes
[1, 2, 0, 3] 3
[0, 3, 2, 1] 2

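is transformed into the final profile by applying each permutation to the central ranking: position i of the final ranking holds the alternative found at position p[i] of the central ranking, where p is the permutation. For instance, for the central ranking [3, 2, 1, 0], the final profile would be: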

Ranking Votes
[2, 1, 3, 0] 3
[3, 0, 1, 2] 2
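The mapping in this example can be sketched in a few lines of Python. The helper name `apply_permutation` and the representation of a profile as a list of (ranking, votes) pairs are assumptions made for illustration; the loading functions in the linked repository may be organised differently.

```python
def apply_permutation(permutation, central_ranking):
    """Return the ranking obtained by indexing the central ranking with the permutation."""
    return [central_ranking[index] for index in permutation]


central_ranking = [3, 2, 1, 0]
sampled_profile = [([1, 2, 0, 3], 3), ([0, 3, 2, 1], 2)]  # (permutation, votes)

final_profile = [
    (apply_permutation(permutation, central_ranking), votes)
    for permutation, votes in sampled_profile
]
```

Changing `central_ranking` yields the profile for a different central ranking from the same stored permutations.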

File distribution

The dataset is provided as two zip files because it exceeded the limit on the number of files. Each zip file contains the profiles sampled either with or without dispersion normalisation. Please decompress them in the directory used to store the dataset locally.
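One possible way to decompress the archives is with the Python standard library. The helper `extract_archives` below is hypothetical: it extracts every `.zip` file found in a directory into a target directory and makes no assumption about the archive names or their internal layout.

```python
import zipfile
from pathlib import Path


def extract_archives(archive_dir, target_dir):
    """Extract every .zip file in archive_dir into target_dir; return the archive names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(archive_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(archive.name)
    return extracted
```

Given the size of the archives, make sure enough free disk space is available before extracting.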
 
The linked GitHub repository contains the code used to generate the data, and includes functions to load the profiles and to calculate the outranking matrix of a profile.
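For readers unfamiliar with the term, the sketch below computes an outranking matrix under the usual definition, which is assumed here: entry (a, b) counts the voters who rank alternative a above alternative b. It further assumes that each ranking lists alternatives from most to least preferred. The function `outranking_matrix` is a hypothetical stand-in, not the repository's implementation.

```python
def outranking_matrix(profile, num_alternatives):
    """Count, for each ordered pair (a, b), the votes that rank a above b."""
    matrix = [[0] * num_alternatives for _ in range(num_alternatives)]
    for ranking, votes in profile:
        for i, preferred in enumerate(ranking):
            # Every alternative listed after `preferred` is ranked below it.
            for other in ranking[i + 1:]:
                matrix[preferred][other] += votes
    return matrix


# Final profile from the central ranking example: (ranking, votes)
profile = [([2, 1, 3, 0], 3), ([3, 0, 1, 2], 2)]
matrix = outranking_matrix(profile, 4)
```

For any two distinct alternatives a and b, `matrix[a][b] + matrix[b][a]` equals the total number of voters, here 5.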

Summary of changes to previous version

  • The dataset has been updated to be suitable for use regardless of the central rankings considered by the authors. This has been achieved by sampling permutations of the central ranking.
  • The structure of the Parquet dataset has been modified for greater efficiency, enabling users to download data with or without dispersion normalisation independently.
  • Data for dispersion levels of 0.99 and 0.999 has also been added.

Files

norm_mallows=false.zip

Files (49.9 GB)

Checksum Size
md5:a4f81627013595b367d324751814a155 24.2 GB
md5:0daaa89a55dd06655378fcb474902762 25.7 GB
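A downloaded archive can be checked against its listed MD5 checksum before decompressing. The following is a minimal sketch with hypothetical helper names; it reads the file in chunks so that archives of this size do not have to fit in memory.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].lower()
    return file_md5(path) == expected
```

A call such as `matches_checksum(path_to_archive, "md5:...")` returns `True` when the download is intact.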

Additional details

Funding

Universidad de Oviedo
PhD project grant PAPI-24-TESIS-14
Government of Spain
National project MCINN-23-PID2022-139886NB-I00

Dates

Updated
2025-10-24
The dataset has been updated to be suitable for use regardless of the central rankings considered by the authors. This is achieved by sampling permutations of the central ranking. The structure of the Parquet dataset has been changed for efficiency, allowing users to download data with or without dispersion normalisation independently. Data for dispersion levels of 0.99 and 0.999 has been added.

Software

Repository URL
https://github.com/MarioVillar/profiles-rankings-mallows.git
Programming language
Python

References

  • C. L. Mallows. Non-null ranking models. Biometrika, 44:114–130, 1957. ISSN 00063444. doi: 10.2307/2333244.
  • J.-P. Doignon, A. Pekeč, and M. Regenwetter. The repeated insertion model for rankings: Missing link between two subset choice models. Psychometrika, 69:33–54, 2004. ISSN 1860-0980. doi: 10.1007/BF02295838.
  • N. Boehmer, P. Faliszewski, and S. Kraiczy. Properties of the Mallows model depending on the number of alternatives: A warning for an experimentalist. Proceedings of Machine Learning Research, 202:2689–2711, 2023.