Published January 29, 2025 | Version v1
Dataset Open

RIRE corpus

  • 1. University of Basel
  • 2. Institute of Czech Literature, Czech Academy of Sciences

Description

The JIGS (Joke-like Incongruity Gathering System) was developed as part of the SNSF-funded project “Le Rire des vers / Mining the comic verse”. It is used to tag JOLIS (Joke-Like Incongruity Segments).

A JOLIS (joke-like incongruity segment) is a text segment where:

  • Two possible meanings are present (S1/S2), and

  • The two meanings (scripts) overlap in the same segment, and

  • The two scripts are incompatible.

If all three criteria are met, the segment is a JOLIS.

The corpus featured in the “Le Rire des vers / Mining the comic verse” project contains 8255 French poems that have been annotated for versification features by means of Richard Renault's Malherbe. Some of these poems were obtained from the anamètre database; others were gathered and prepared by us. Some of the poems have also been annotated for joke-like incongruities and for the tunes they are associated with (tune annotation is still preliminary).

The corpus is stored in a single JSON file and, in parallel, as XML files (one XML file per poem).

XML structure

The structure of each XML file is as follows:

<RIRE>
  <header>
    <title>[title of the poem]</title>
    <author>[author of the poem]</author>
    <book_title>[title of the book the poem comes from]</book_title>
    <project url="https://data.snf.ch/grants/grant/185674">Le Rire des vers</project>
  </header>
  <text>
    <body>
      <!-- poem-level annotation : tune_ attributes are optional -->
      <div type="poem">
        <!-- stanza-level annotation -->
        <lg>
          <!-- line-level annotation -->
          <l>
            <!-- word-level annotation -->
            <w>
              <!-- segment-level annotation -->
              <seg>[text of the segment]</seg>...
            </w>...
          </l>...
        </lg>...

        <!-- JOLI annotation -->
        <jolis>
          <joli>
            <jabline/>
            <punchline/>
          </joli>...
        </jolis>
      </div>
    </body>
  </text>
</RIRE>
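As a rough illustration of the layout above, the following sketch reads one poem with Python's standard `xml.etree.ElementTree`. The sample document is invented (title, author, stanza attributes, word ids and segment texts are all hypothetical), and joining the `<seg>` texts to rebuild a word is an assumption about how segments relate to words, not something the description states.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature poem following the layout above; all values are invented.
SAMPLE = """<RIRE>
  <header>
    <title>Exemple</title>
    <author>Anonyme</author>
    <book_title>Recueil</book_title>
  </header>
  <text>
    <body>
      <div type="poem">
        <lg id="1" type="distique" rhyme="aa">
          <l><w id="w1" text="Bonjour"><seg>Bon</seg><seg>jour</seg></w></l>
          <l><w id="w2" text="toujours"><seg>tou</seg><seg>jours</seg></w></l>
        </lg>
      </div>
    </body>
  </text>
</RIRE>"""


def read_poem(xml_text):
    """Return the header fields and the poem as stanzas > lines > words."""
    root = ET.fromstring(xml_text)
    header = {child.tag: (child.text or "") for child in root.find("header")}
    stanzas = []
    for lg in root.iter("lg"):
        lines = []
        for line in lg.findall("l"):
            # Assumption: concatenating the <seg> texts yields the word form.
            words = [
                "".join(seg.text or "" for seg in w.findall("seg"))
                for w in line.findall("w")
            ]
            lines.append(words)
        stanzas.append(lines)
    return header, stanzas


header, stanzas = read_poem(SAMPLE)
print(header["title"], header["author"])
print(stanzas)
```

For a real file, `ET.parse(path).getroot()` would replace `ET.fromstring(...)`; the rest of the traversal stays the same under the assumptions above.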

The attributes of the elements are:

<div type="poem">

<lg>

  • id: id of the stanza

  • type: stanza type, e.g. quatrain

  • rhyme: rhyme scheme, e.g. abba

<l>

<w>

  • id: id of the word

  • text: text of the word

  • pos_stanford: part-of-speech tag assigned by the Stanford Tagger

  • pos_treetagger: part-of-speech tag assigned by TreeTagger

  • lemma_treetagger: lemma assigned by TreeTagger
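The word-level attributes can be tallied directly once a line is parsed. The sketch below is only an illustration: the ids, tag values and lemmas in the sample are invented and need not match the tagsets actually used in the corpus, and it assumes that `lemma_treetagger` holds the TreeTagger lemma.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical line; ids, tags and lemmas are invented for illustration.
SAMPLE_LINE = """<l>
  <w id="w1" text="Le" pos_stanford="DET" pos_treetagger="DET:ART" lemma_treetagger="le"/>
  <w id="w2" text="chat" pos_stanford="NC" pos_treetagger="NOM" lemma_treetagger="chat"/>
  <w id="w3" text="dort" pos_stanford="V" pos_treetagger="VER:pres" lemma_treetagger="dormir"/>
</l>"""

line = ET.fromstring(SAMPLE_LINE)

# Count the TreeTagger part-of-speech tags found on the <w> elements.
pos_counts = Counter(w.get("pos_treetagger") for w in line.iter("w"))

# Pair each surface form with its lemma.
lemmas = [(w.get("text"), w.get("lemma_treetagger")) for w in line.iter("w")]

print(pos_counts)
print(lemmas)
```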

<seg>

<joli>

  • id: unique id of the joli

  • from: id of word element where joli starts

  • to: id of word element where joli ends

  • so_actual_non: [Only one script is real/actualised], or [Only one script is literal (vs figurative)] 

  • so_normal_abnormal: [Only one script is normal]

  • so_possible_impossible: [Only one script is possible]

  • so_good_bad: [Only one script is positive or negative]

  • so_life_death: [Script pitting life against death], or [Script pitting animated against inanimate]

  • so_obscenity: [Sexual/scatological reference]

  • so_money: [Mention of money]

  • so_high_low_stature: [Mention of high/low stature]

  • so_human_non: [Only one script is human]

  • s1_s2: open field: [Short description of script 1 and script 2]

  • si: open field: [Brief description of the situation]

  • ta: open field: short mention of the target (if any)

  • la: open field: remarks about the language

  • la_argot: [use of slang]

  • la_cacography: [use of misspellings]

  • comment: optional comment

  • annotator: id of the annotator

<jabline>/<punchline>

  • from: id of word element where jabline/punchline starts

  • to: id of word element where jabline/punchline ends

  • comment: optional comment
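Since `<joli>`, `<jabline>` and `<punchline>` point to words through `from` and `to`, the annotated text has to be recovered by looking up word ids. The following sketch shows one way this might be done. It assumes that word ids are unique within a poem, that words appear in reading order in the file, and that `from` and `to` are both inclusive; the sample poem and the ids `w1`…`w5` and `j1` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical poem fragment; ids and texts are invented for illustration.
SAMPLE_DIV = """<div type="poem">
  <lg>
    <l>
      <w id="w1" text="Il"/>
      <w id="w2" text="a"/>
      <w id="w3" text="perdu"/>
      <w id="w4" text="la"/>
      <w id="w5" text="boule"/>
    </l>
  </lg>
  <jolis>
    <joli id="j1" from="w3" to="w5">
      <punchline from="w5" to="w5"/>
    </joli>
  </jolis>
</div>"""


def span_text(words, start_id, end_id):
    """Return the texts of the words from start_id to end_id (inclusive)."""
    ids = [w.get("id") for w in words]
    start, end = ids.index(start_id), ids.index(end_id)
    return " ".join(w.get("text") for w in words[start:end + 1])


div = ET.fromstring(SAMPLE_DIV)
words = list(div.iter("w"))  # iter() yields elements in document order

resolved = {}
for joli in div.iter("joli"):
    resolved[joli.get("id")] = span_text(words, joli.get("from"), joli.get("to"))
    for part in joli:
        if part.tag in ("jabline", "punchline"):
            key = joli.get("id") + "/" + part.tag
            resolved[key] = span_text(words, part.get("from"), part.get("to"))

print(resolved)
```

`list.index` raises `ValueError` if an id is missing, which is a convenient way to surface dangling references while exploring the data.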

JSON structure

The JSON file follows a similar logic to the XML files. It is (in Python terms) a list of dicts representing individual poems. Besides the keys listed above under the <div type="poem"> attributes, each poem holds an “lg” key, which contains a list of dicts representing individual stanzas. Besides the keys listed above under the <lg> attributes, each stanza holds an “l” key, which contains a list of dicts representing individual lines…

JOLIs are stored under the “jolis” key of individual poems.
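The nesting described here can be walked with plain loops. The sketch below uses an invented miniature corpus; only the keys “lg”, “l” and “jolis” are taken from the description, the line dicts are left empty because their contents are not specified here, and treating “jolis” as a list that may be absent is an assumption.

```python
import json

# Hypothetical miniature corpus; only "lg", "l" and "jolis" come from the description.
SAMPLE_JSON = """
[
  {"lg": [{"l": [{}, {}, {}, {}]}, {"l": [{}, {}]}], "jolis": [{"id": "j1"}]},
  {"lg": [{"l": [{}, {}]}], "jolis": []}
]
"""

poems = json.loads(SAMPLE_JSON)

summary = []
for poem in poems:
    stanzas = poem.get("lg", [])
    n_lines = sum(len(stanza.get("l", [])) for stanza in stanzas)
    n_jolis = len(poem.get("jolis", []))  # assumption: a list, possibly missing
    summary.append((len(stanzas), n_lines, n_jolis))

# One (stanzas, lines, jolis) tuple per poem.
print(summary)
```

With the real file, `json.load` on an open file handle would take the place of `json.loads`; note that this reads the whole file into memory at once.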

 

Files

poems.json

Files (2.0 GB)

md5:e73029f24a9e417a5e6ac55a2f39189f (2.0 GB)
md5:30b85e850c51f535f56deec1f0bb3c18 (58.7 MB)
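The MD5 checksums listed above can be used to check a download. The sketch below is a generic helper built on Python's `hashlib`; the function names are hypothetical, and the self-check at the end runs on a throwaway file rather than on the corpus files themselves.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file with a checksum given as 'md5:<hex>' or as plain hex."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Self-check on a temporary file containing b"hello".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
ok = matches_checksum(tmp.name, "md5:5d41402abc4b2a76b9719d911017c592")
os.remove(tmp.name)
print(ok)
```

Reading in chunks keeps memory use flat, which matters for a file of around 2 GB. To check a downloaded file, call `matches_checksum` with its local path and the corresponding `md5:` string from the listing.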