DISSICON: DISSINET lexico-semantic network of concepts and actions
Authors/Creators
Description
This is a machine-operable, open-licence lexicographic resource in JSON format, which forms a subset of the research database of the “Dissident Networks Project” (DISSINET, https://dissinet.cz). It focuses on the conceptual and verbal entities that hold the database together semantically, that is, its lexico-semantic network. DISSICON stands for DISSI(NET) and CON(CEPTUAL NETWORK). The broader project focuses on the semantic annotation and computational analysis of medieval inquisition records in a social science framework, which largely, though not exclusively, defines the focus of this lexico-semantic network. The version published here contains the full set of:
- Action entities (n=945), that is, verbs and multiword verbal expressions (entities.json);
- Concept entities (n=6,131), that is, non-verbal parts of speech (nouns, adjectives, adverbs, occasionally others), including multiword expressions (entities.json);
- semantic relations (n=10,171) between those entities (relations.json), including hypernymy (n=5,584), actant role semantics / functors (n=2,045), synonymy (n=518), action/event equivalence (n=795), and others;
- those Reference entities that Concepts and Actions use for pointing to external resources, typically to WordNet and the Lila Lemma Collection (entities.json);
- Value entities, mostly denoting textual values of IDs within the resources denoted in those References (entities.json).
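For a first overview of the two files, one can tally records by a categorical field. The following is a minimal sketch, assuming that each file holds a top-level JSON array of objects. The field names `class` and `type` used in the usage comment are assumptions that this description does not confirm, so they should be checked against the actual files.

```python
import json
from collections import Counter
from pathlib import Path


def load_records(path):
    """Load a JSON file that is assumed to hold a top-level array of objects."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list):
        raise ValueError(f"{path}: expected a JSON array at the top level")
    return data


def count_by(records, field):
    """Tally records by the value of `field`; records lacking it count under None."""
    return Counter(record.get(field) for record in records)


# Hypothetical usage (the field names are assumptions, not documented here):
#   count_by(load_records("entities.json"), "class")
#   count_by(load_records("relations.json"), "type")
```

If the field names turn out to be right, the resulting tallies can be compared with the counts given in the list above as a quick sanity check on the download.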
The dataset’s value for social science history and data-oriented history is that it provides an in-progress but already powerful machine-operable ontology born from research into medieval inquisition records, including sometimes quite elaborate hypernym paths as well as other semantic relations (synonymy, the meaning of all actant slots under specific verbs, etc.).
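To illustrate what a hypernym path is, the sketch below walks from an entity up through its hypernyms. It assumes that the hypernymy relations have already been reduced to (hyponym ID, hypernym ID) pairs. How such pairs are encoded in `relations.json` is not specified in this description, so the extraction step is left out, and both function names are hypothetical.

```python
from collections import defaultdict


def build_hypernym_index(pairs):
    """Map each entity ID to the list of its direct hypernym IDs."""
    index = defaultdict(list)
    for hyponym_id, hypernym_id in pairs:
        index[hyponym_id].append(hypernym_id)
    return index


def hypernym_paths(entity_id, index, _seen=()):
    """Return every path from entity_id up to an entity with no further hypernym.

    An entity may have several hypernyms, so several paths are possible.
    Cycles are cut: an ID already on the current path is not revisited.
    """
    seen = _seen + (entity_id,)
    parents = [p for p in index.get(entity_id, []) if p not in seen]
    if not parents:
        return [list(seen)]
    paths = []
    for parent in parents:
        paths.extend(hypernym_paths(parent, index, seen))
    return paths
```

The cycle guard is a precaution: since the description notes that the relations can contain inconsistencies, a traversal should not assume a clean tree.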
The value of DISSICON for lexicography is that it maps various uses of verbs and other parts of speech, with special attention paid to the meanings of Latin verbs and concepts encountered in medieval inquisition records and to their semantic anchoring in a network of English-language analytical concepts. In addition, verbs have an extensively described valency in terms of (1) the entity type a given actant slot can accept (Person, Group, Physical Object, Concept...), (2) morphosyntactic connectivity (case and preposition), and (3) a highly granular rendering of functors, i.e. roles held by an entity when it occupies a given actant slot (e.g., actant slots for giver, beneficiary, and gift under the Latin verb "dedit" / "do", in the meaning of "to give as a gift"). A significant part of the entities refers to WordNet for meaning identifiers, but we create our own meanings whenever WordNet does not yield a meaning with a satisfactory definition. DISSICON thus provides a referenceable resource for meanings not covered by Latin WordNet (or even WordNet at large). The dataset identifiers are UUID-based, and our internal processes make the meanings highly persistent.
Typical users of this dataset will be those setting up a new instance of the InkVisitor software who decide to pre-populate it with this existing data. (V1 was used to populate the MedHate instance of InkVisitor.)
Basic, gradually expanding documentation of the broader data model is available from the website “From texts to structured data: Building knowledge graphs through Computer-Assisted Semantic Text Modelling (CASTEMO)”. A brief introduction to the data collection approach is available at “InkVisitor | CASTEMO”.
DISSICON contains the full set of Actions and Concepts and, as such, is not as reliable or thorough as the more curated subset called DISSILEX, which focuses on the description of Latin verbs. The work is ongoing: the semantic relations may be of uneven depth and may contain inconsistencies, and the mapping of this ontology onto relevant standards, such as Ontolex Lemon, is still pending. We hope to make some progress on this soon.
The contributions to the current version of the DISSICON dataset (as per the Audits feature of the InkVisitor software, in one of whose instances the database is curated) are as follows, with contributions of 5% or more amounting to authorship, and those of 1% or more but less than 5% to contributorship:
- David Zbíral 39.64 %;
- Robert Shaw 19.01 %;
- Katia Riccardo 18.08 %;
- Katalin Suba 14.21 %;
- Davor Salihović 5.34 %;
- Stanisław Banach 1.91 %.
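The authorship rule stated above can be expressed as a simple threshold check. The sketch below applies it to the listed shares. The function name and the role labels are illustrative choices, not part of the dataset or of InkVisitor.

```python
def classify_contribution(share_percent):
    """Apply the stated thresholds: 5% or more is authorship,
    1% or more but less than 5% is contributorship."""
    if share_percent >= 5.0:
        return "author"
    if share_percent >= 1.0:
        return "contributor"
    return None  # below both thresholds


# Shares as listed above, in percent.
shares = {
    "David Zbíral": 39.64,
    "Robert Shaw": 19.01,
    "Katia Riccardo": 18.08,
    "Katalin Suba": 14.21,
    "Davor Salihović": 5.34,
    "Stanisław Banach": 1.91,
}

roles = {name: classify_contribution(share) for name, share in shares.items()}
```

The listed shares sum to 98.19 %, so the remainder presumably belongs to contributions below the 1 % threshold.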
The creation of the dataset has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 101000442, project “Networks of Dissent: Computational Modelling of Dissident and Inquisitorial Cultures in Medieval Europe”).
Files
Total size: 55.1 MB
| Checksum | Size |
|---|---|
| md5:652ab1b32588094b8f3e65054008c72f | 50.7 MB |
| md5:c6b7f1afdba1c499a022761eb72cda81 | 4.4 MB |
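The MD5 checksums listed above can be used to verify a download. The sketch below is one minimal way to do so with Python's standard `hashlib` module. The helper names are hypothetical, and since this listing does not state which checksum belongs to which file, the file name in the usage comment is a placeholder.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or as bare hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Hypothetical usage (the file name is a placeholder):
#   matches_checksum("downloaded_file.json", "md5:652ab1b32588094b8f3e65054008c72f")
```

Reading in chunks keeps memory use flat, which matters for a file of about 50 MB only marginally but costs nothing.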