Published October 21, 2022 | Version v2 | Dataset | Open access
Datasets for Supervised Matching in Clean-Clean Entity Resolution
Description
The repository includes 13 established datasets for evaluating ML- and DL-based matching algorithms:
- Structured DBLP-ACM
- Structured DBLP-Scholar
- Structured iTunes-Amazon
- Structured Walmart-Amazon
- Structured BeerAdvo-RateBeer
- Structured Amazon-Google Products
- Structured Fodors-Zagats
- Dirty DBLP-ACM
- Dirty DBLP-Scholar
- Dirty iTunes-Amazon
- Dirty Walmart-Amazon
- Textual Abt-Buy
- Textual CompanyA-CompanyB
Additionally, the repository includes five new benchmark datasets that are drawn from the following databases using a principled approach based on DeepBlocker:
- Abt-Buy
- Amazon-Google Products
- IMDB-TVDB
- TMDB-TVDB
- Walmart-Amazon
The datasets are available in six different formats so that they can be processed by the following matching algorithms:
- EMTransformer
- GNEM
- HierMatcher
- Magellan
- ZeroER
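The record does not say which effectiveness measure to use. Supervised matching benchmarks are commonly scored by comparing an algorithm's predicted matching pairs against the ground-truth matches with precision, recall and F1. The sketch below shows that computation under two assumptions: each pair is a hypothetical `(left_id, right_id)` tuple of record identifiers, and the labeled pairs have already been loaded from whichever of the six formats you use. It makes no assumption about the actual file layout.

```python
from typing import Iterable, Set, Tuple

# Hypothetical pair representation: (id in left source, id in right source).
Pair = Tuple[str, str]


def evaluate_matches(predicted: Iterable[Pair], gold: Iterable[Pair]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted matches against ground-truth matches."""
    pred: Set[Pair] = set(predicted)
    truth: Set[Pair] = set(gold)
    true_positives = len(pred & truth)  # predicted pairs that are real matches
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    # Harmonic mean of precision and recall; 0 when both are 0.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
    predicted = [("a1", "b1"), ("a2", "b2"), ("a4", "b9")]
    p, r, f = evaluate_matches(predicted, gold)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")  # precision=0.67 recall=0.67 f1=0.67
```

Using sets means duplicate predictions are counted once. This suits Clean-Clean ER, where both sources are duplicate-free and each pair is either a match or not.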
Files
Total size: 730.9 MB
| File (MD5 checksum) | Size |
|---|---|
| md5:5fb0bbec3869a9d9ce12e9ba6f3fe461 | 614.8 MB |
| md5:8b2562e00e4146cf5a70e8afeb301d5a | 25.8 MB |
| md5:b4b2cf33229bb3acf2791015864348c4 | 90.3 MB |
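The file names were not captured on this page, but the MD5 checksums were. The following minimal sketch checks whether a downloaded archive matches one of the listed checksums. It uses only Python's standard `hashlib`. The helper names `md5_of_file` and `is_listed_download` are hypothetical, and the files are passed in as command-line paths.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the Files table; file names are unknown, so any match counts.
EXPECTED_MD5 = {
    "5fb0bbec3869a9d9ce12e9ba6f3fe461",
    "8b2562e00e4146cf5a70e8afeb301d5a",
    "b4b2cf33229bb3acf2791015864348c4",
}


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large archives need not fit in memory."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed_download(path) -> bool:
    """True if the file's MD5 equals one of the checksums published for this record."""
    return md5_of_file(path) in EXPECTED_MD5


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        status = "OK" if is_listed_download(name) else "MISMATCH"
        print(f"{name}: {status}")
```

Reading in 1 MiB chunks keeps memory use flat even for the 614.8 MB archive. Pass each downloaded file as an argument, e.g. `python verify_md5.py <file> ...` (the script name is illustrative).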