Published January 10, 2022 | Version 1.0.0
Resource type: Software | Access: Open

Processing written labels in AI2D-RST

Description

These scripts parse the AI2D and AI2D-RST diagram corpora to extract linguistic features of the written labels in the diagrams.


construct_dataframes.py

This script is run first. It iterates over the corpora to find labels that participate in certain rhetorical relations, extracts their content, and saves the result as a pickled DataFrame in a subdirectory named processed_data.
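The annotation format is not described here, so the following is only a minimal sketch of this first step. It assumes a hypothetical, simplified structure in which each diagram provides a mapping from label identifiers to label text and a list of rhetorical relations whose participants are element identifiers. The sketch collects one row per label that takes part in a selected relation and pickles the rows into processed_data. The function names, record layout, and relation names are illustrative assumptions, not the corpus schema or the script's actual code.

```python
import pickle
from pathlib import Path

# Hypothetical, simplified diagram records (an assumption for illustration);
# the real AI2D / AI2D-RST annotation files are structured differently.
DIAGRAMS = {
    "diagram_1": {
        "labels": {"T0": "Sun", "T1": "energy flows to plants"},
        "relations": [
            {"name": "identification", "participants": ["B0", "T0"]},
            {"name": "elaboration", "participants": ["A1", "T1"]},
        ],
    },
}


def collect_label_rows(diagrams, wanted_relations):
    """Return one row per written label that participates in a wanted relation."""
    rows = []
    for diagram_id, diagram in diagrams.items():
        labels = diagram["labels"]
        for relation in diagram["relations"]:
            if relation["name"] not in wanted_relations:
                continue
            for element_id in relation["participants"]:
                # Keep only participants that are written labels.
                if element_id in labels:
                    rows.append({
                        "diagram": diagram_id,
                        "relation": relation["name"],
                        "label_id": element_id,
                        "text": labels[element_id],
                    })
    return rows


def save_rows(rows, out_dir="processed_data", filename="labels.pkl"):
    """Pickle the rows into the output subdirectory, creating it if needed."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    target = out_path / filename
    with target.open("wb") as fh:
        pickle.dump(rows, fh)
    return target


if __name__ == "__main__":
    rows = collect_label_rows(DIAGRAMS, {"identification", "elaboration"})
    print(save_rows(rows))
```

If pandas is installed, `pandas.DataFrame(rows).to_pickle(path)` would store the same rows as a pickled DataFrame, which is closer to the output the description attributes to construct_dataframes.py.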


form_figures.py

This script is run second. Using spaCy, it processes the labels by rhetorical relation and by macro-group, extracting part-of-speech (POS) patterns, phrase classes, and average word counts. It writes the results as CSV files to the processed_data subdirectory and also produces heatmaps of the most frequently occurring POS patterns for each relation and macro-group.
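The aggregation part of this step can be illustrated with the standard library alone. The sketch below assumes the POS tags have already been produced; with spaCy, `[token.pos_ for token in nlp(text)]` yields the coarse Universal POS tags of a text. It treats a label's tag sequence as its POS pattern, counts patterns per rhetorical relation, computes the average word count per relation, and writes the most common patterns as CSV. Grouping by macro-group would work the same way with a different key. The sample data, function names, CSV layout, and the choice to exclude punctuation from word counts are hypothetical; phrase classes and heatmap plotting are not reproduced.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical pre-tagged labels: (rhetorical relation, list of POS tags).
# In practice the tags could come from spaCy, e.g. [t.pos_ for t in nlp(text)].
TAGGED_LABELS = [
    ("identification", ["PROPN"]),
    ("identification", ["NOUN"]),
    ("identification", ["NOUN"]),
    ("elaboration", ["NOUN", "VERB", "ADP", "NOUN"]),
    ("elaboration", ["ADJ", "NOUN", "PUNCT"]),
]


def pos_pattern(tags):
    """Join a label's POS tags into one pattern string, e.g. 'ADJ NOUN'."""
    return " ".join(tags)


def summarise(tagged_labels):
    """Count POS patterns and compute the average word count per relation."""
    patterns = defaultdict(Counter)
    word_totals = Counter()
    label_totals = Counter()
    for relation, tags in tagged_labels:
        patterns[relation][pos_pattern(tags)] += 1
        # Assumption: punctuation tokens are not counted as words.
        word_totals[relation] += sum(1 for tag in tags if tag != "PUNCT")
        label_totals[relation] += 1
    averages = {rel: word_totals[rel] / label_totals[rel] for rel in label_totals}
    return patterns, averages


def patterns_to_csv(patterns, top_n=5):
    """Return CSV text with the top-n patterns per relation (relation, pattern, count)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["relation", "pattern", "count"])
    for relation in sorted(patterns):
        for pattern, count in patterns[relation].most_common(top_n):
            writer.writerow([relation, pattern, count])
    return buffer.getvalue()


if __name__ == "__main__":
    patterns, averages = summarise(TAGGED_LABELS)
    print(averages)
    print(patterns_to_csv(patterns))
```

The per-relation counts form a relation-by-pattern table of the kind a heatmap can be drawn from, for instance with matplotlib's `imshow`.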


Files (33.5 kB)

Size      MD5 checksum
15.5 kB   f23a93271378b7971297ef8b0dd9ce74
15.8 kB   84c8fb1f9ebf8e018fd6df1b2a06070a
2.2 kB    126eb503436a305b757a0be12a301824
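The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. A minimal sketch with Python's standard `hashlib` module follows. The file name in the usage comment is a placeholder, because the listing does not say which checksum belongs to which file.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 digest with an expected hex string, ignoring case."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Usage (placeholder file name):
# matches_checksum("downloaded_file.py", "f23a93271378b7971297ef8b0dd9ce74")
```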