RIRE corpus
Description
JIGS (Joke-like Incongruity Gathering System) was developed as part of the SNSF-funded project “Le Rire des vers / Mining the comic verse”. It is used to tag JOLIS (joke-like incongruity segments).
A JOLIS (joke-like incongruity segment) is a text segment where:
- Two possible meanings (scripts S1 and S2) are present, and
- The two scripts overlap in the same segment, and
- The two scripts are incompatible.
If all three criteria are met, the segment is a JOLIS.
The corpus of the “Le Rire des vers / Mining the comic verse” project contains 8255 French poems that have been annotated for versification features with Richard Renault's Malherbe. Some of these poems were obtained from the anamètre database; others were gathered and prepared by us. Some of the poems have also been annotated for joke-like incongruities and for the tunes they are associated with (tune annotation is still preliminary).
The corpus is stored both as a single JSON file and as XML files (one XML file per poem).
XML structure
The structure of each XML file is as follows:
<RIRE>
  <header>
    <title>[title of the poem]</title>
    <author>[author of the poem]</author>
    <book_title>[title of the book the poem comes from]</book_title>
    <project url="https://data.snf.ch/grants/grant/185674">Le Rire des vers</project>
  </header>
  <text>
    <body>
      <!-- poem-level annotation : tune_ attributes are optional -->
      <div type="poem">
        <!-- stanza-level annotation -->
        <lg>
          <!-- line-level annotation -->
          <l>
            <!-- word-level annotation -->
            <w>
              <!-- segment-level annotation -->
              <seg>[text of the segment]</seg>...
            </w>...
          </l>...
        </lg>...
        <!-- JOLI annotation -->
        <jolis>
          <joli>
            <jabline/>
            <punchline/>
          </joli>...
        </jolis>
      </div>
    </body>
  </text>
</RIRE>
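As a minimal sketch, a file with this layout could be read with Python's standard `xml.etree.ElementTree`. The sample document below is invented for illustration: its ids, texts, and attribute values are assumptions, and real files carry many more attributes. The helper name `read_poem` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document following the layout above;
# all ids, texts and attribute values are invented for illustration.
SAMPLE = """<RIRE>
  <header>
    <title>Exemple</title>
    <author>Anonyme</author>
  </header>
  <text><body>
    <div type="poem" id="p1">
      <lg id="s1" type="distique" rhyme="aa">
        <l id="l1" text="Bonjour monde">
          <w id="w1" text="Bonjour"><seg id="g1">Bon</seg><seg id="g2">jour</seg></w>
          <w id="w2" text="monde"><seg id="g3">monde</seg></w>
        </l>
      </lg>
      <jolis/>
    </div>
  </body></text>
</RIRE>"""


def read_poem(root):
    """Return the poem title and, per stanza, the word texts of each line."""
    title = root.findtext("header/title")
    poem = root.find("text/body/div[@type='poem']")
    stanzas = []
    for lg in poem.findall("lg"):
        # One list of word texts per <l>, read from the `text` attribute of <w>.
        lines = [[w.get("text") for w in l.findall("w")] for l in lg.findall("l")]
        stanzas.append(lines)
    return title, stanzas


root = ET.fromstring(SAMPLE)
title, stanzas = read_poem(root)
print(title, stanzas)
```

For a file on disk, `ET.parse(path).getroot()` yields the same kind of root element.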
The attributes of the elements are:
<div type="poem">
- id: id of the poem
- tune_name: name of the associated tune as stated in the text (optional)
- tune_name_standardized: standardized name of the tune (optional)
- tune_genre: genre of the tune (optional)
- tune_composer: composer of the tune (optional)
- for other attributes see https://crisco4.unicaen.fr/verlaine/index.php?navigation=description
<lg>
- id: id of the stanza
- type: stanza type, e.g. quatrain
- rhyme: rhyme scheme, e.g. abba
<l>
- id: id of the line
- text: text of the line
- for other attributes see https://crisco4.unicaen.fr/verlaine/index.php?navigation=description
<w>
- id: id of the word
- text: text of the word
- pos_stanford: part-of-speech tag assigned by the Stanford Tagger
- pos_treetagger: part-of-speech tag assigned by TreeTagger
- lemma_treetagger: lemma assigned by TreeTagger
<seg>
- id: id of the segment
- for other attributes see https://crisco4.unicaen.fr/verlaine/index.php?navigation=description
<joli>
- id: unique id of the joli
- from: id of the word element where the joli starts
- to: id of the word element where the joli ends
- so_actual_non: [Only one script is real/actualised], or [Only one script is literal (vs figurative)]
- so_normal_abnormal: [Only one script is normal]
- so_possible_impossible: [Only one script is possible]
- so_good_bad: [Only one script is positive or negative]
- so_life_death: [Script pitting life against death], or [Script pitting animate against inanimate]
- so_obscenity: [Sexual/scatological reference]
- so_money: [Mention of money]
- so_high_low_stature: [Mention of high/low stature]
- so_human_non: [Only one script is human]
- s1_s2: open field: [Short description of script 1 and script 2]
- si: open field: [Brief description of the situation]
- ta: open field: short mention of the target (if any)
- la: open field: remarks about the language
- la_argot: [use of slang]
- la_cacography: [use of misspellings]
- comment: optional comment
- annotator: id of the annotator
<jabline>/<punchline>
- from: id of the word element where the jabline/punchline starts
- to: id of the word element where the jabline/punchline ends
- comment: optional comment
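Since `from` and `to` point at word elements, the text covered by a joli, jabline, or punchline can be recovered by slicing the poem's words. The sketch below assumes that `from`/`to` hold the same values as the `id` attribute of `<w>`, that word ids are unique within a poem, and that words appear in reading order. The sample fragment and the helper `span_words` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; ids and texts are invented for illustration.
SAMPLE = """<div type="poem" id="p1">
  <lg id="s1"><l id="l1" text="un deux trois quatre">
    <w id="w1" text="un"/><w id="w2" text="deux"/>
    <w id="w3" text="trois"/><w id="w4" text="quatre"/>
  </l></lg>
  <jolis>
    <joli id="j1" from="w2" to="w4" annotator="a1">
      <jabline from="w2" to="w2"/>
      <punchline from="w3" to="w4"/>
    </joli>
  </jolis>
</div>"""


def span_words(poem, element):
    """Texts of the words between element's `from` and `to` ids, inclusive."""
    words = list(poem.iter("w"))  # all <w> elements in document order
    ids = [w.get("id") for w in words]
    start = ids.index(element.get("from"))
    end = ids.index(element.get("to"))
    return [w.get("text") for w in words[start:end + 1]]


poem = ET.fromstring(SAMPLE)
joli = poem.find("jolis/joli")
joli_text = span_words(poem, joli)
punch_text = span_words(poem, joli.find("punchline"))
print(joli_text, punch_text)
```

`ids.index` raises `ValueError` when an id is missing, which is a convenient way to spot spans that do not resolve.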
JSON structure
The JSON file follows the same logic as the XML files. It is (in Python terms) a list of dicts representing individual poems. Besides the keys listed above under the <div type="poem"> attributes, each poem holds an “lg” key, which contains a list of dicts representing individual stanzas. Besides the keys listed above under the <lg> attributes, each stanza holds an “l” key, which contains a list of dicts representing individual lines…
JOLIs are stored under the “jolis” key of individual poems.
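The nesting described above might be traversed as in the following sketch. It covers only the levels named in the description (poem, “lg”, “l”, “jolis”). It assumes that each line dict carries the `text` key listed under `<l>` and that “jolis” holds a list of dicts; the sample data and the helper names are invented for illustration.

```python
import json

# Hypothetical miniature of the JSON layout; all values are invented.
SAMPLE = json.loads("""[
  {"id": "p1",
   "lg": [{"id": "s1", "type": "distique", "rhyme": "aa",
           "l": [{"id": "l1", "text": "Premier vers"},
                 {"id": "l2", "text": "Second vers"}]}],
   "jolis": [{"id": "j1", "from": "w1", "to": "w2"}]},
  {"id": "p2", "lg": [], "jolis": []}
]""")


def poem_lines(poem):
    """Flatten a poem into the texts of its lines, stanza by stanza."""
    return [line["text"] for stanza in poem["lg"] for line in stanza["l"]]


def poems_with_jolis(poems):
    """Ids of poems with at least one joli (a missing key counts as none)."""
    return [poem["id"] for poem in poems if poem.get("jolis")]


print(poem_lines(SAMPLE[0]))
print(poems_with_jolis(SAMPLE))
```

With the real file, `json.load` on an open file handle returns the same kind of list, but it reads the whole document into memory, which may be demanding for a large file.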
Files (2.0 GB)
poems.json
| MD5 checksum | Size |
|---|---|
| md5:e73029f24a9e417a5e6ac55a2f39189f | 2.0 GB |
| md5:30b85e850c51f535f56deec1f0bb3c18 | 58.7 MB |
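A downloaded file can be checked against the listed MD5 checksum with Python's standard `hashlib`. This is a minimal sketch: the helper names `md5_of_file` and `matches_listing` are hypothetical, and the path is whatever location the file was saved to locally.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing(local_path, "md5:e73029f24a9e417a5e6ac55a2f39189f")` should return `True` for an intact copy of the file listed with that checksum.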