Published June 17, 2023 | Version 1.3
Dataset Open

The Yelp Collaborative Knowledge Graph

  • 1. Aalborg University

Description

This is the Yelp Collaborative Knowledge Graph (YCKG), a transformation of the Yelp Open Dataset into RDF format using Y2KG.

Paper Abstract

The Yelp Open Dataset (YOD) contains data about businesses, reviews, and users from the Yelp website and is available for research purposes. This dataset has been widely used to develop and test Recommender Systems (RS), especially those using Knowledge Graphs (KGs), e.g., integrating taxonomies, product categories, business locations, and social network information. Unfortunately, researchers applied naive or wrong mappings while converting YOD into KGs, consequently obtaining unrealistic results. Among the various issues, the conversion processes usually do not follow state-of-the-art methodologies, fail to link properly to other KGs, and do not reuse existing vocabularies. In this work, we overcome these issues by introducing Y2KG, a utility to convert the Yelp dataset into a KG. Y2KG consists of two components. The first is a dataset including (1) a vocabulary that extends Schema.org with properties to describe the concepts in YOD and (2) mappings between the Yelp entities and Wikidata. The second component is a set of scripts to transform YOD into RDF and obtain the Yelp Collaborative Knowledge Graph (YCKG). The design of Y2KG was driven by 16 core competency questions. YCKG includes 150k businesses and 16.9M reviews from 1.9M distinct real users, resulting in over 244 million triples (with 144 distinct predicates) for about 72 million resources, with an average in-degree and out-degree of 3.3 and 12.2, respectively.
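The graph statistics quoted in the abstract (triples, distinct predicates, resources, average degrees) can be recomputed on any of the N-Triples files with a short script. The following is a minimal sketch, not the authors' tooling: the function names are hypothetical, the line splitting is a simplification of the N-Triples grammar, and the degree definition (average taken over nodes that have at least one outgoing or incoming edge, with literals not counted as resources) is an assumption, since this description does not state how the published figures were derived.

```python
import gzip
from collections import Counter


def parse_triple(line):
    """Split one N-Triples line into (subject, predicate, object), or None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # skip blank lines and comments
    # Subjects and predicates never contain whitespace in N-Triples,
    # so the first two whitespace-separated tokens are safe to split off.
    subject, predicate, rest = line.split(None, 2)
    obj = rest.rstrip()
    if obj.endswith("."):
        obj = obj[:-1].rstrip()  # drop the statement terminator
    return subject, predicate, obj


def graph_stats(lines):
    """Count triples, predicates, resources, and average degrees."""
    out_degree = Counter()
    in_degree = Counter()
    predicates = set()
    triples = 0
    for line in lines:
        parsed = parse_triple(line)
        if parsed is None:
            continue
        subject, predicate, obj = parsed
        triples += 1
        predicates.add(predicate)
        out_degree[subject] += 1
        # Assumption: literals (which start with a quote) are not resources.
        if not obj.startswith('"'):
            in_degree[obj] += 1
    resources = set(out_degree) | set(in_degree)
    return {
        "triples": triples,
        "predicates": len(predicates),
        "resources": len(resources),
        "avg_out_degree": triples / len(out_degree) if out_degree else 0.0,
        "avg_in_degree": (
            sum(in_degree.values()) / len(in_degree) if in_degree else 0.0
        ),
    }


def stats_for_file(path):
    """Stream a gzip-compressed N-Triples file without loading it fully."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return graph_stats(handle)
```

Calling `stats_for_file` on one of the `.nt.gz` files from this record returns a dictionary of counts. Because the file is streamed line by line, memory use is driven by the number of distinct nodes rather than by the file size.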

Links

Latest GitHub release: https://github.com/MadsCorfixen/The-Yelp-Collaborative-Knowledge-Graph/releases/latest

PURL domain: https://purl.archive.org/domain/yckg

Files

  • Graph Data Triple Files
    • One sample file for each of the Yelp domains (Businesses, Users, Reviews, Tips and Checkins), each containing 20 entities.
    • yelp_schema_mappings.nt.gz containing the mappings from Yelp categories to Schema things.
    • schema_hierarchy.nt.gz containing the full hierarchy of the mapped Schema things.
    • yelp_wiki_mappings.nt.gz containing the mappings from Yelp categories to Wikidata entities.
    • wikidata_location_mappings.nt.gz containing the mappings from Yelp locations to Wikidata entities.
  • Graph Metadata Triple Files
    • yelp_categories.ttl contains metadata for all Yelp categories.
    • yelp_entities.ttl contains metadata about the dataset.
    • yelp_vocabulary.ttl contains metadata on the created Yelp vocabulary and properties.
  • Utility Files
    • yelp_category_schema_mappings.csv. This file contains the 310 mappings from Yelp categories to Schema types. These mappings have been manually verified to be correct.
    • yelp_predicate_schema_mappings.csv. This file contains the 14 mappings from Yelp attributes to Schema properties. These mappings were identified manually.
    • ground_truth_yelp_category_schema_mappings.csv. This file contains the ground truth, based on 200 manually verified mappings from Yelp categories to Schema things. The ground truth mappings were used to calculate precision and recall for the semantic mappings.
    • manually_split_categories.csv. This file contains all Yelp categories containing either a & or /, and their manually split versions. The split versions have been used in the semantic mappings to Schema things.
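The ground-truth file above is described as the basis for the precision and recall of the semantic mappings. The sketch below shows one standard way such an evaluation could be set up, treating each mapping as a (category, Schema type) pair. It is an illustration under assumptions: the column names `category` and `schema_type` are hypothetical placeholders (the actual CSV headers are not given here), and the authors' exact evaluation procedure may differ.

```python
import csv


def load_mappings(path, category_col="category", schema_col="schema_type"):
    """Read a mapping CSV into a set of (category, schema type) pairs.

    The default column names are hypothetical; pass the real header names
    of the file being loaded.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return {
            (row[category_col], row[schema_col])
            for row in csv.DictReader(handle)
        }


def precision_recall(predicted, ground_truth):
    """Set-based precision and recall of predicted pairs against a ground truth."""
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

With both files loaded through `load_mappings`, `precision_recall(predicted, ground_truth)` returns the two scores as a tuple. Restricting the predicted set to categories that appear in the ground truth before scoring is a common refinement when only a sample has been verified.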

Files (5.2 MB)

  • md5:9962e4bec740a7c9536683db566f132a (30.2 kB)
  • md5:6f4434bee190ebc0ac88e735db0d7380 (5.9 kB)
  • md5:00a3d1c41f3a876ccdb2f6c8a645779a (3.0 kB)
  • md5:614691b2065eca0f386da5f7171280e0 (3.2 MB)
  • md5:5849f22f7ec218704c9c806017257078 (28.3 kB)
  • md5:1d2c99488d9adec1ac5d3ce3fb4d216f (175.0 kB)
  • md5:d0b01bec689845e05d869dad3d497338 (10.0 kB)
  • md5:b4ec3ebae034fd01c967970839739f1a (80.2 kB)
  • md5:1f971bf07e4f2ece697b5043ec60decf (883 Bytes)
  • md5:2c6a065f9c1242b9884ed4df1101cfc8 (1.0 kB)
  • md5:62bad3d5390846425f17830409078a3a (7.7 kB)
  • md5:8dfb6dcbed8b4a294a9dbde90c880ebc (5.9 kB)
  • md5:09422a60365462d5fe394a89519b93aa (3.1 kB)
  • md5:a6e0a609e26064209d767bc3f33856bf (1.6 MB)
  • md5:9f0b9d05e016435e757f411bbb77296f (5.7 kB)
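The md5 values listed above can be used to check that a download arrived intact. A minimal sketch using Python's standard `hashlib` module follows; the function names are illustrative, and since this listing does not show which checksum belongs to which file, the pairing has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

`matches_checksum(path, listed)` accepts the value either with or without the `md5:` prefix and returns `True` when the digest of the local file equals the listed one.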

Additional details