There is a newer version of the record available.

Published October 21, 2022 | Version v2
Dataset Open

Datasets for Supervised Matching in Clean-Clean Entity Resolution

Authors/Creators

  • 1. University of Athens

Description

The repository includes 13 established datasets for evaluating ML- and DL-based matching algorithms:

  1. Structured DBLP-ACM
  2. Structured DBLP-Scholar
  3. Structured iTunes-Amazon
  4. Structured Walmart-Amazon
  5. Structured BeerAdvo-RateBeer
  6. Structured Amazon-Google Products
  7. Structured Fodors-Zagats
  8. Dirty DBLP-ACM
  9. Dirty DBLP-Scholar
  10. Dirty iTunes-Amazon
  11. Dirty Walmart-Amazon
  12. Textual Abt-Buy
  13. Textual CompanyA-CompanyB

Additionally, the repository includes five new benchmark datasets that are drawn from the following databases using a principled approach based on DeepBlocker:

  1. Abt-Buy
  2. Amazon-Google Products
  3. IMDB-TVDB
  4. TMDB-TVDB
  5. Walmart-Amazon

The datasets are available in six different formats so that they can be processed by the following matching algorithms:

  1. EMTransformer
  2. GNEM
  3. HierMatcher
  4. Magellan
  5. ZeroER

Files (730.9 MB)

Checksum                               Size
md5:5fb0bbec3869a9d9ce12e9ba6f3fe461   614.8 MB
md5:8b2562e00e4146cf5a70e8afeb301d5a   25.8 MB
md5:b4b2cf33229bb3acf2791015864348c4   90.3 MB
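
The md5 values listed for the files can be used to confirm that a download arrived intact. The following is a minimal sketch in Python, not part of the original record: the local file name `downloaded_archive.zip` is a hypothetical placeholder (the record's actual file names are not shown here), and the helper names `md5_of_file` and `matches_checksum` are introduced only for illustration. Each downloaded file must be compared against its own checksum from the listing.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so large archives never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or as bare hex."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex


if __name__ == "__main__":
    # Hypothetical local file name; replace it with the file actually downloaded.
    archive = Path("downloaded_archive.zip")
    # Assumption: this is the checksum of the file being checked.
    expected = "md5:5fb0bbec3869a9d9ce12e9ba6f3fe461"
    if archive.exists():
        print("checksum OK" if matches_checksum(archive, expected) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in fixed-size chunks keeps memory use flat even for the largest file (614.8 MB). A mismatch usually indicates a truncated or corrupted download, or a checksum paired with the wrong file.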