Published September 19, 2023
| Version 1.0
Dataset
Restricted
FRACAS: FRench Annotated Corpus of Attribution relations in newS
Authors/Creators
- 1. PACTE, LIG, Université Grenoble-Alpes
- 2. LIG, Université Grenoble-Alpes
- 3. LIG, INP, CNRS, Université Grenoble-Alpes
Description
A human-annotated corpus for French quotation extraction, containing 1,676 newswire texts with 10,965 annotated attribution relations (each quote attributed to its speaker).
Data: 1,676 French-language newswire texts from Reuters, annotated with 10,965 attribution relations
Date: April 1995 to April 1996
Data structure:
{
"text": text of the newswire,
"entities": a list of entities, each in the following format ["id": unique_id, "text": text of entity, "label": entity label, "gender": gender (if labelled), "char_span": character index span, given as a list],
"relations": a list of relations, each in the following format [id of relation, label of relation, id of first entity, id of second entity]
}
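The structure above can be read with a few lines of Python. The following is a minimal sketch, not the authors' loader: it assumes that each entity is stored as an object with the listed keys, that each relation is a four-item list in the stated order, and that "char_span" is a [start, end) pair of character offsets. The sample record, the helper names (make_entity, resolve_relations) and the exact label strings are hypothetical and do not come from the corpus.

```python
def make_entity(text, ent_id, surface, label, gender=None):
    """Build a hypothetical entity; the span is computed from the text itself."""
    start = text.index(surface)
    return {
        "id": ent_id,
        "text": surface,
        "label": label,            # assumed label string
        "gender": gender,          # None when no gender is labelled (assumption)
        "char_span": [start, start + len(surface)],  # assumed [start, end) offsets
    }


def resolve_relations(record):
    """Replace entity ids in each relation with the corresponding entity text."""
    by_id = {entity["id"]: entity for entity in record["entities"]}
    resolved = []
    for _rel_id, label, first_id, second_id in record["relations"]:
        resolved.append((label, by_id[first_id]["text"], by_id[second_id]["text"]))
    return resolved


# Invented example record that follows the documented layout.
text = 'Le ministre a dit : "Nous agirons."'
record = {
    "text": text,
    "entities": [
        make_entity(text, 1, "Le ministre", "Agent", gender="Male"),
        make_entity(text, 2, "a dit", "Cue"),
        make_entity(text, 3, "Nous agirons.", "Direct"),
    ],
    "relations": [
        # [id of relation, label of relation, id of first entity, id of second entity]
        [1, "Speaker Quoted in Quotation", 1, 3],
        [2, "Cue Indicates Quotation", 2, 3],
    ],
}

resolved = resolve_relations(record)
```

Indexing entities by id first keeps the relation lookup simple. If the real files store spans or relation arguments differently, only make_entity and the unpacking line in resolve_relations would need to change.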
Labels:
- Entities:
  - Quotation (Direct, Indirect or Mixed)
  - Speaker (Agent, Organization, Group of People, Source Pronoun)
  - Cue
- Attributes:
  - Speaker Gender (Male, Female, Mixed, Unknown, Other)
- Relations:
  - Speaker Quoted in Quotation
  - Cue Indicates Quotation
  - Source Pronoun Refers to Speaker
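The three relation types can be combined to attribute a quotation to a named speaker even when the text uses a pronoun. The sketch below is illustrative only. It assumes the relation label strings match the names listed above, that the first entity of "Speaker Quoted in Quotation" is the speaker and the second is the quotation, and that the first entity of "Source Pronoun Refers to Speaker" is the pronoun. The function name resolve_speakers and the sample record are hypothetical, and the entities are reduced to the keys the function needs.

```python
QUOTED = "Speaker Quoted in Quotation"        # assumed label string
REFERS = "Source Pronoun Refers to Speaker"   # assumed label string


def resolve_speakers(record):
    """Map each quotation text to its speaker text, following pronoun links."""
    by_id = {entity["id"]: entity for entity in record["entities"]}
    # pronoun entity id -> id of the speaker it refers to
    refers_to = {
        first: second
        for _rel_id, label, first, second in record["relations"]
        if label == REFERS
    }
    speakers = {}
    for _rel_id, label, first, second in record["relations"]:
        if label == QUOTED:
            # Use the referenced speaker when the quoted source is a pronoun.
            speaker_id = refers_to.get(first, first)
            speakers[by_id[second]["text"]] = by_id[speaker_id]["text"]
    return speakers


# Invented example: the quotation is attributed to the pronoun "Il".
sample = {
    "text": 'Le ministre est arrivé. Il a dit : "Nous agirons."',
    "entities": [
        {"id": 1, "text": "Le ministre", "label": "Agent"},
        {"id": 2, "text": "Il", "label": "Source Pronoun"},
        {"id": 3, "text": "Nous agirons.", "label": "Direct"},
    ],
    "relations": [
        [1, QUOTED, 2, 3],
        [2, REFERS, 2, 1],
    ],
}

speakers = resolve_speakers(sample)
```

With this sample, the quotation is mapped to "Le ministre" rather than to the pronoun "Il". The sketch follows a single pronoun link and does not handle chains of pronouns.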
Additional details
Related works
- Is described by
- Preprint: arXiv:2309.10604 (arXiv)
- Conference paper: https://aclanthology.org/2024.lrec-main.654 (URL)