Published December 29, 2025 | Version 1.0
Dataset Open

G&T_COMMENTARY_TD: Portuguese Commentary Corpus – Discourse Type Annotation (v1.0)

  • 1. Centro de Linguística
  • 2. Universidade Nova de Lisboa

Description

G&T_COMMENTARY_TD is a FAIR-compliant, manually annotated corpus of Portuguese commentary texts, developed by the G&T – Gramática & Texto research group (CLUNL, NOVA FCSH).

The corpus comprises 82 commentary texts published in Portuguese newspapers and magazines between 2005 and 2016, segmented into 373 Discourse Type (TD) units, following the theoretical framework of Sociodiscursive Interactionism (SDI).

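This public release provides only the structural segmentation and the discourse-type annotation, including citation contexts. The four SDI discourse types are labelled DI (interactive discourse), DT (theoretical discourse), RI (interactive narrative) and N (narration). All original textual content has been removed due to copyright restrictions.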

The dataset is distributed as a single XML file and is intended for discourse analysis, sequential modelling, graph-based approaches (including directed and multiplex networks), and quantitative–qualitative studies of discourse organisation and genre tendencies.
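As an illustration of the graph-based use mentioned above, the following minimal Python sketch builds a directed transition network from sequences of discourse-type labels. The XML schema is not described in this record, so the sketch assumes the per-text label sequences have already been extracted from the file. The function names and the example sequences are hypothetical and are not taken from the corpus.

```python
from collections import Counter

# The four discourse-type labels named in the description.
LABELS = {"DI", "DT", "RI", "N"}


def transition_counts(sequences):
    """Count directed transitions (src -> dst) between consecutive TD units.

    `sequences` is a list of label lists, one list per commentary text.
    Transitions are counted within a text only, never across texts.
    """
    counts = Counter()
    for seq in sequences:
        unknown = set(seq) - LABELS
        if unknown:
            raise ValueError(f"unexpected labels: {sorted(unknown)}")
        for src, dst in zip(seq, seq[1:]):
            counts[(src, dst)] += 1
    return counts


def transition_probabilities(counts):
    """Normalise edge counts by the total outgoing count of each source label."""
    totals = Counter()
    for (src, _dst), n in counts.items():
        totals[src] += n
    return {(src, dst): n / totals[src] for (src, dst), n in counts.items()}


# Hypothetical sequences for two texts (NOT real corpus data).
example = [["DT", "DI", "DT"], ["DI", "N", "DI", "DT"]]
edges = transition_counts(example)
probs = transition_probabilities(edges)
```

The resulting edge weights can be passed to any network library as a weighted directed graph. Keeping one such graph per text, or per annotation layer, would be one possible way to approach the multiplex setting.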

Files

Commenta_Fase2_V3_TDonly.xml

Files (21.0 kB)

Checksum Size
md5:305194b148aac29eda2bdfd01275a577 12.2 kB
md5:b16618f53c3dfb6fb319ebca81366618 1.9 kB
md5:e73db07bd23961371bafe9cae7b217f1 6.8 kB