Freebase Datasets for Robust Evaluation of Knowledge Graph Link Prediction Models
Creators
- 1. University of Texas at Arlington
Description
Freebase is amongst the largest public cross-domain knowledge graphs. It has three main data modeling idiosyncrasies: it has a strong type system; its properties are purposefully represented in reverse pairs; and it uses mediator objects to represent multiary relationships. These design choices are important for modeling the real world, but they also pose nontrivial challenges for research on embedding models for knowledge graph completion, especially when models are developed and evaluated without regard to these idiosyncrasies. We make available four variants of the Freebase dataset, obtained by including or excluding these data modeling idiosyncrasies. This is the first-ever publicly available full-scale Freebase dataset that has gone through proper preparation.
Dataset Details
The dataset consists of four variants of the Freebase dataset, together with related mapping and support files. For each variant, we make three kinds of files available:
- Subject matter triples file
- fb+/-CVT+/-REV: one folder for each variant, where + or - indicates whether mediator (CVT) objects and reverse relations are included. Each folder contains 5 files: train.txt, valid.txt, test.txt, entity2id.txt, and relation2id.txt. Subject matter triples are the triples that belong to subject matter domains, i.e., domains describing real-world facts.
- Example of a row in train.txt, valid.txt, and test.txt:
- 2, 192, 0
- Example of a row in entity2id.txt:
- /g/112yfy2xr, 2
- Example of a row in relation2id.txt:
- /music/album/release_type, 192
- Explanation
- "/g/112yfy2xr" and "/m/02lx2r" are the MIDs of the subject entity and the object entity, respectively, and "/music/album/release_type" is the relationship between them. 2, 192, and 0 are the IDs the authors assigned to these three objects, so each row of train.txt, valid.txt, and test.txt lists the subject ID, relation ID, and object ID, in that order.
- Type system file
- freebase_endtypes: Each row maps an edge type to its required subject type and object type.
- Example
- 92, 47178872, 90
- Explanation
- "92" and "90" are the type IDs required of the subject and the object, respectively, of the relationship with ID "47178872".
- Metadata files
- object_types: Each row maps the MID of a Freebase object to a type it belongs to.
- Example
- /g/11b41c22g, /type/object/type, /people/person
- Explanation
- The entity with MID "/g/11b41c22g" has a type "/people/person"
- object_names: Each row maps the MID of a Freebase object to its textual label.
- Example
- /g/11b78qtr5m, /type/object/name, "Viroliano Tries Jazz"@en
- Explanation
- The entity with MID "/g/11b78qtr5m" has name "Viroliano Tries Jazz" in English.
- object_ids: Each row maps the MID of a Freebase object to its user-friendly identifier.
- Example
- /m/05v3y9r, /type/object/id, "/music/live_album/concert"
- Explanation
- The object with MID "/m/05v3y9r" has the human-readable identifier "/music/live_album/concert", which is easier for people to interpret than the opaque MID.
- domains_id_label: Each row maps the MID of a Freebase domain to its label.
- Example
- /m/05v4pmy, geology, 77
- Explanation
- The object with MID "/m/05v4pmy" in Freebase is the domain "geology", and has id "77" in our dataset.
- types_id_label: Each row maps the MID of a Freebase type to its label.
- Example
- /m/01xljxh, /government/political_party, 147
- Explanation
- The object with MID "/m/01xljxh" in Freebase is the type "/government/political_party", and has id "147" in our dataset.
- entities_id_label: Each row maps the MID of a Freebase entity to its label.
- Example
- /g/11b78qtr5m, Viroliano Tries Jazz, 2234
- Explanation
- The entity with MID "/g/11b78qtr5m" in Freebase is "Viroliano Tries Jazz", and has id "2234" in our dataset.
- properties_id_label: Each row maps the MID of a Freebase property to its label.
- Example
- /m/010h8tp2, /comedy/comedy_group/members, 47178867
- Explanation
- The object with MID "/m/010h8tp2" in Freebase is a property (i.e., a relation or edge type); it has the label "/comedy/comedy_group/members" and the ID "47178867" in our dataset.
- uri_original2simplified and uri_simplified2original: the mapping from original URIs to simplified URIs and the mapping from simplified URIs back to original URIs, respectively.
- Example
- uri_original2simplified
- "http://rdf.freebase.com/ns/type.property.unique": "/type/property/unique"
- uri_simplified2original
- "/type/property/unique": "http://rdf.freebase.com/ns/type.property.unique"
- Explanation
- The URI "http://rdf.freebase.com/ns/type.property.unique" in the original Freebase RDF dataset is simplified to "/type/property/unique" in our dataset.
- Conversely, the identifier "/type/property/unique" in our dataset corresponds to the URI "http://rdf.freebase.com/ns/type.property.unique" in the original Freebase RDF dataset.
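The triple files store only integer IDs. Below is a minimal Python sketch for turning them back into Freebase MIDs and relation labels. It assumes that fields are separated by ", " as in the examples above (tab separation is also accepted) and that each train/valid/test row lists the subject ID, relation ID, and object ID, in that order. The function names are hypothetical, and the entity2id row for "/m/02lx2r" used below is inferred from the explanation of the triple example rather than copied from the files.

```python
from typing import Dict, Iterable, List, Tuple


def split_row(line: str) -> List[str]:
    # Assumption: fields are tab-separated or comma-separated (", " as in the examples).
    sep = "\t" if "\t" in line else ","
    return [field.strip() for field in line.rstrip("\n").split(sep)]


def load_id_map(lines: Iterable[str]) -> Dict[int, str]:
    """Read entity2id.txt / relation2id.txt rows ("label, id") into {id: label}."""
    id_to_label: Dict[int, str] = {}
    for line in lines:
        fields = split_row(line)
        if len(fields) != 2:
            continue  # skip blank lines or a possible count header (assumption)
        label, idx = fields
        id_to_label[int(idx)] = label
    return id_to_label


def decode_triples(
    lines: Iterable[str], entities: Dict[int, str], relations: Dict[int, str]
) -> List[Tuple[str, str, str]]:
    """Turn "subject ID, relation ID, object ID" rows into (MID, relation, MID)."""
    decoded: List[Tuple[str, str, str]] = []
    for line in lines:
        fields = split_row(line)
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        s, r, o = (int(x) for x in fields)
        decoded.append((entities[s], relations[r], entities[o]))
    return decoded
```

With `entities = load_id_map(["/g/112yfy2xr, 2", "/m/02lx2r, 0"])` and `relations = load_id_map(["/music/album/release_type, 192"])`, calling `decode_triples(["2, 192, 0"], entities, relations)` returns `[("/g/112yfy2xr", "/music/album/release_type", "/m/02lx2r")]`. For real data, an open file object (e.g. `open(path, encoding="utf-8")` for entity2id.txt inside a variant folder) can be passed instead of a list. For a full-scale variant, the in-memory entity dictionary may be large.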
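The type system file can be used to check whether a triple respects Freebase's type constraints, for example to discard type-violating corrupted triples during evaluation. Below is a minimal sketch that assumes freebase_endtypes rows are laid out as subject type ID, relation ID, object type ID, as in the example "92, 47178872, 90". Judging from the magnitude of the example IDs, it also assumes that these IDs come from the same ID space as types_id_label and properties_id_label rather than from a variant's relation2id.txt. Collecting each entity's set of type IDs (for instance from object_types joined with types_id_label) is left out. The function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def load_endtypes(lines: Iterable[str]) -> Dict[int, Tuple[int, int]]:
    """Map relation ID -> (required subject type ID, required object type ID)."""
    constraints: Dict[int, Tuple[int, int]] = {}
    for line in lines:
        # Accept tab- or comma-separated rows; int() ignores surrounding whitespace.
        fields = line.replace("\t", ",").split(",")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        subj_type, rel, obj_type = (int(f) for f in fields)
        constraints[rel] = (subj_type, obj_type)
    return constraints


def satisfies_endtypes(
    rel: int,
    subj_types: Set[int],
    obj_types: Set[int],
    constraints: Dict[int, Tuple[int, int]],
) -> bool:
    """True if the subject and object carry the types required for `rel`."""
    if rel not in constraints:
        return True  # design choice of this sketch: unconstrained relations pass
    want_subj, want_obj = constraints[rel]
    return want_subj in subj_types and want_obj in obj_types
```

For example, after `c = load_endtypes(["92, 47178872, 90"])`, `satisfies_endtypes(47178872, {92}, {90}, c)` is True, while swapping the subject's type set to `{90}` makes it False. Treating relations without a recorded constraint as valid is a choice made here; raising an error instead would be equally reasonable.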
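Names in object_names are RDF-style literals carrying a language tag, such as "Viroliano Tries Jazz"@en. Below is a minimal sketch for extracting English labels. It assumes comma- or tab-separated rows like the example. Splitting each row at most twice keeps commas inside a name intact. Escape sequences inside a literal (such as \") are left as-is. The function names are hypothetical.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# A quoted text followed by a language tag, e.g. "Viroliano Tries Jazz"@en.
LITERAL = re.compile(r'^"(?P<text>.*)"@(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*)$')


def parse_name_row(line: str) -> Optional[Tuple[str, str, str]]:
    """Split an object_names row into (MID, text, language tag), or None."""
    sep = "\t" if "\t" in line else ","
    # maxsplit=2 keeps commas inside the name literal intact.
    parts = [p.strip() for p in line.rstrip("\n").split(sep, 2)]
    if len(parts) != 3:
        return None
    mid, _predicate, literal = parts
    match = LITERAL.match(literal)
    if match is None:
        return None
    return mid, match.group("text"), match.group("lang")


def english_names(lines: Iterable[str]) -> Dict[str, str]:
    """Collect one English label per MID (the first one seen wins)."""
    names: Dict[str, str] = {}
    for line in lines:
        parsed = parse_name_row(line)
        if parsed is not None and parsed[2] == "en":
            names.setdefault(parsed[0], parsed[1])
    return names
```

Applied to the example row, `parse_name_row` returns `("/g/11b78qtr5m", "Viroliano Tries Jazz", "en")`.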
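Judging only from the single example above, the URI simplification appears to strip the prefix http://rdf.freebase.com/ns/ and replace dots with slashes. Below is a minimal sketch of that inferred rule. The shipped uri_original2simplified and uri_simplified2original files remain authoritative. The helper rule_mismatches assumes, hypothetically, that those files are JSON objects, which the example rows resemble. All function names are hypothetical.

```python
import json
from typing import Dict, List

FREEBASE_NS = "http://rdf.freebase.com/ns/"


def simplify_uri(uri: str) -> str:
    """e.g. http://rdf.freebase.com/ns/type.property.unique -> /type/property/unique"""
    if not uri.startswith(FREEBASE_NS):
        return uri  # leave non-Freebase URIs unchanged
    return "/" + uri[len(FREEBASE_NS):].replace(".", "/")


def original_uri(simplified: str) -> str:
    """Inverse of simplify_uri; only valid when no path segment contains a dot."""
    return FREEBASE_NS + simplified.lstrip("/").replace("/", ".")


def rule_mismatches(mapping_json: str) -> List[str]:
    """List original URIs whose shipped simplification differs from the inferred rule."""
    mapping: Dict[str, str] = json.loads(mapping_json)
    return [uri for uri, simple in mapping.items() if simplify_uri(uri) != simple]
```

An empty result from `rule_mismatches` on the shipped file would confirm that the inferred rule matches; otherwise the returned URIs show where it breaks down.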
Files
Name | Size | MD5 checksum |
---|---|---|
idirlab-freebases.zip | 14.1 GB | md5:170689b7aad9f029566a4deb36605b01 |
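Because the archive is 14.1 GB, a corrupted or truncated download is easy to miss. Below is a minimal Python sketch for checking it against the listed MD5 checksum. It assumes the archive has been saved under its original name, idirlab-freebases.zip, in the working directory. The helper names are hypothetical.

```python
import hashlib

# Checksum listed for idirlab-freebases.zip in the Files table.
EXPECTED_MD5 = "170689b7aad9f029566a4deb36605b01"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so the 14.1 GB archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path: str = "idirlab-freebases.zip") -> bool:
    """True if the downloaded archive matches the published MD5 checksum."""
    return md5_of_file(path) == EXPECTED_MD5
```

Reading in fixed-size chunks keeps memory use constant regardless of file size. Hashing the full archive still requires reading all 14.1 GB from disk.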