Published March 3, 2020
Version v1 | Dataset | Open access
Optimization of the Mainzelliste Software for Fast Privacy-preserving Record Linkage
- 1. Database Group, University of Leipzig
- 2. German Cancer Research Center Heidelberg, Germany
Description
Synthetically generated person-related datasets used to evaluate the linkage quality and runtime of the Mainzelliste. The person records were generated with the established GeCo data generator, modified with small extensions such as look-up files for German names in addition to English names. Each generated dataset consists of two subsets, org and dup, which are compared with each other. The duplicate records can contain data errors (e.g., different but similarly sounding letters, OCR errors or typos) to simulate reduced data quality, which makes matching more challenging.
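The error types above can be illustrated with a small sketch that derives a corrupted dup record from an org record. This is not GeCo's code or API. The confusion tables, the helper names (`typo`, `ocr_error`, `phonetic_error`, `make_duplicate`) and the per-field error probability are hypothetical choices made only to show the idea.

```python
import random

# Hypothetical OCR confusion pairs (illustrative only, not GeCo's look-up tables).
OCR_CONFUSIONS = {"0": "O", "O": "0", "1": "l", "l": "1", "m": "rn", "rn": "m"}
# Hypothetical pairs of similarly sounding spellings (case-sensitive).
PHONETIC_VARIANTS = [("ph", "f"), ("f", "ph"), ("ei", "ai"), ("ai", "ei"), ("ck", "k"), ("tz", "z")]


def typo(value: str, rng: random.Random) -> str:
    """Insert, delete, substitute or transpose a single character."""
    if len(value) < 2:
        return value
    i = rng.randrange(len(value) - 1)
    op = rng.choice(["insert", "delete", "substitute", "transpose"])
    letter = rng.choice("abcdefghijklmnopqrstuvwxyz")
    if op == "insert":
        return value[:i] + letter + value[i:]
    if op == "delete":
        return value[:i] + value[i + 1:]
    if op == "substitute":
        return value[:i] + letter + value[i + 1:]
    # transpose two neighbouring characters
    return value[:i] + value[i + 1] + value[i] + value[i + 2:]


def ocr_error(value: str, rng: random.Random) -> str:
    """Replace one visually confusable character sequence, if any occurs."""
    candidates = [src for src in OCR_CONFUSIONS if src in value]
    if not candidates:
        return value
    src = rng.choice(candidates)
    return value.replace(src, OCR_CONFUSIONS[src], 1)


def phonetic_error(value: str, rng: random.Random) -> str:
    """Swap one spelling for a similarly sounding one, if any occurs."""
    candidates = [(a, b) for a, b in PHONETIC_VARIANTS if a in value]
    if not candidates:
        return value
    a, b = rng.choice(candidates)
    return value.replace(a, b, 1)


def make_duplicate(record: dict, rng: random.Random, error_prob: float = 0.5) -> dict:
    """Copy a record, corrupting each field with probability error_prob."""
    corruptors = [typo, ocr_error, phonetic_error]
    dup = {}
    for field, value in record.items():
        if rng.random() < error_prob:
            value = rng.choice(corruptors)(value, rng)
        dup[field] = value
    return dup


if __name__ == "__main__":
    rng = random.Random(42)
    org = {"first_name": "Stephan", "last_name": "Meier", "zip": "04109"}
    print(org, "->", make_duplicate(org, rng))
```

Seeding the random generator keeps the generated duplicates reproducible, which matters when the same org/dup pair is reused across runtime and quality measurements.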
Files
| Name | Size | MD5 checksum |
|---|---|---|
| mainzelliste_datasets.zip | 12.2 MB | 327c7bf2e4a54f4b59da25855adf5c3b |
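After downloading, the archive can be checked against its published MD5 checksum (327c7bf2e4a54f4b59da25855adf5c3b). A minimal sketch using Python's standard `hashlib` module, assuming the archive was saved as mainzelliste_datasets.zip in the working directory; the helper names `md5_of_file` and `verify_download` are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksum published for mainzelliste_datasets.zip.
EXPECTED_MD5 = "327c7bf2e4a54f4b59da25855adf5c3b"


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("mainzelliste_datasets.zip")  # assumed download location
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

Reading in chunks keeps memory use constant regardless of file size; for a 12.2 MB archive reading it at once would also work.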