Processing written labels in AI2D-RST

doi:10.5281/zenodo.5834586

Published January 10, 2022 | Version 1.0.0

These scripts are intended for parsing the AI2D and AI2D-RST diagram corpora for certain linguistic features.

construct_dataframes.py

Run this script first. It iterates over the corpora to find labels that participate in certain rhetorical relations, extracts their content, and writes a pickled DataFrame to a subdirectory named processed_data.
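To illustrate the kind of filtering step described here, the following is a minimal sketch using only the Python standard library. It is not the code in construct_dataframes.py: the in-memory corpus layout, the relation names used as examples, the helper names (collect_labels, save_rows) and the file name labels.pkl are all assumptions made for illustration, and plain pickled rows stand in for the pickled DataFrame.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for parsed annotations: diagram id -> list of
# (relation name, role of the label, label text). The real corpus layout differs.
SAMPLE_CORPUS = {
    "diagram_1": [
        ("identification", "satellite", "Mantle"),
        ("elaboration", "satellite", "The outer layer of the Earth"),
        ("identification", "satellite", "Inner core"),
    ],
    "diagram_2": [
        ("property-ascription", "satellite", "hot"),
        ("sequence", "nucleus", "Egg"),
    ],
}


def collect_labels(corpus, relations):
    """Return one row per label that takes part in one of the given relations."""
    rows = []
    for diagram_id, labels in corpus.items():
        for relation, role, text in labels:
            if relation in relations:
                rows.append(
                    {
                        "diagram": diagram_id,
                        "relation": relation,
                        "role": role,
                        "content": text,
                    }
                )
    return rows


def save_rows(rows, base_dir):
    """Pickle the rows into <base_dir>/processed_data and return the file path."""
    out_dir = Path(base_dir) / "processed_data"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "labels.pkl"  # hypothetical file name
    with out_path.open("wb") as handle:
        pickle.dump(rows, handle)
    return out_path


# Keep only labels in two example relations, then round-trip them through pickle.
rows = collect_labels(SAMPLE_CORPUS, {"identification", "elaboration"})
with tempfile.TemporaryDirectory() as tmp:
    path = save_rows(rows, tmp)
    with path.open("rb") as handle:
        restored = pickle.load(handle)
```

Rows shaped like this could be passed to a DataFrame constructor before pickling; the sketch avoids that dependency so it runs anywhere.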

form_figures.py

Run this script second. It processes labels by rhetorical relation and macro-group using spaCy, extracting part-of-speech (POS) patterns, phrase classes, and average word counts. It also produces CSV files in the processed_data subdirectory as well as heatmaps of the most commonly occurring POS patterns by relation and macro-group.
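The counting behind POS patterns and average word counts can be sketched as follows. This is an illustrative example, not the logic of form_figures.py: the labels are tagged by hand here, whereas in the actual workflow the tags would come from spaCy (for example from each token's pos_ attribute). The sample data and the helper names pos_pattern and summarise are assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical pre-tagged labels: (relation, [(token, POS tag), ...]).
# In practice the tags would be produced by a spaCy pipeline.
TAGGED_LABELS = [
    ("identification", [("Inner", "ADJ"), ("core", "NOUN")]),
    ("identification", [("Mantle", "NOUN")]),
    ("identification", [("Outer", "ADJ"), ("core", "NOUN")]),
    ("elaboration", [("Stores", "VERB"), ("water", "NOUN")]),
]


def pos_pattern(tagged_tokens):
    """Join the POS tags of one label into a pattern string such as 'ADJ NOUN'."""
    return " ".join(tag for _, tag in tagged_tokens)


def summarise(tagged_labels):
    """Count POS patterns and compute the mean word count for each relation."""
    patterns = defaultdict(Counter)
    lengths = defaultdict(list)
    for relation, tokens in tagged_labels:
        patterns[relation][pos_pattern(tokens)] += 1
        lengths[relation].append(len(tokens))
    averages = {rel: sum(counts) / len(counts) for rel, counts in lengths.items()}
    return patterns, averages


patterns, averages = summarise(TAGGED_LABELS)

# Most frequent pattern among the example 'identification' labels.
top_pattern, top_count = patterns["identification"].most_common(1)[0]
```

Per-relation counters like these are the kind of table from which a heatmap of the most common patterns could be drawn; grouping by macro-group would add one more key.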

Files (33.5 kB)

Name	MD5 checksum	Size
construct_dataframes.py	f23a93271378b7971297ef8b0dd9ce74	15.5 kB
form_figures.py	84c8fb1f9ebf8e018fd6df1b2a06070a	15.8 kB
zscore.py	126eb503436a305b757a0be12a301824	2.2 kB
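The MD5 checksums listed for the files can be used to confirm that a download is intact. A minimal sketch, assuming the three scripts have been saved under their original names in a single directory; the helper names md5_of_bytes and verify_downloads are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksums as listed in the record's file table.
EXPECTED_MD5 = {
    "construct_dataframes.py": "f23a93271378b7971297ef8b0dd9ce74",
    "form_figures.py": "84c8fb1f9ebf8e018fd6df1b2a06070a",
    "zscore.py": "126eb503436a305b757a0be12a301824",
}


def md5_of_bytes(data):
    """Return the hexadecimal MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def verify_downloads(directory):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_bytes(path.read_bytes()) == expected
    return results
```

Calling verify_downloads(".") in the download directory returns a dictionary of file names and booleans; a False entry means the file is missing or differs from the published version.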


Resource type: Software
Publisher: Zenodo

Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: January 10, 2022
Modified: January 11, 2022
