Processing written labels in AI2D-RST
Description
These scripts parse the AI2D and AI2D-RST diagram corpora for certain linguistic features of their written labels.
construct_dataframes.py
This script is executed first. It iterates over the corpora to find labels that function in certain rhetorical relations, collects their content, and writes a pickled DataFrame to a subdirectory named processed_data.
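The output step described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name save_label_frame, the file name labels.pkl, the column names, and the example records are all assumptions made for the sketch. The real script derives its records from the AI2D and AI2D-RST annotations.

```python
from pathlib import Path

import pandas as pd

# Hypothetical records for illustration only; the column names and values
# are assumptions, not the schema used by construct_dataframes.py.
EXAMPLE_LABELS = [
    {"diagram": "0.png", "relation": "identification",
     "macro_group": "illustration", "text": "leaf"},
    {"diagram": "0.png", "relation": "identification",
     "macro_group": "illustration", "text": "green stem"},
    {"diagram": "1.png", "relation": "elaboration",
     "macro_group": "cycle", "text": "absorbs water"},
]


def save_label_frame(records, out_dir="processed_data", filename="labels.pkl"):
    """Collect label records into a DataFrame and pickle it under out_dir."""
    out_path = Path(out_dir)
    # Create the output subdirectory if it does not exist yet.
    out_path.mkdir(parents=True, exist_ok=True)
    frame = pd.DataFrame.from_records(records)
    target = out_path / filename
    frame.to_pickle(target)
    return target


if __name__ == "__main__":
    print(save_label_frame(EXAMPLE_LABELS))
```

A later script could then load the result with pd.read_pickle and group it by relation or macro-group.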
form_figures.py
This script is executed afterwards. It processes labels by rhetorical relation and macro-group using spaCy, extracting part-of-speech (POS) patterns, phrase classes, and average word counts. It also produces CSV files in the processed_data subdirectory as well as heatmaps of the most commonly occurring POS patterns by relation and macro-group.
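The counting of POS patterns and average word counts per relation can be sketched as below. To keep the example self-contained, it starts from labels that are already tagged; in the actual workflow the tags would come from spaCy (for example each token's pos_ attribute). The function name summarise_labels and the example data are hypothetical, and the sketch does not cover phrase classes, CSV export, or heatmaps.

```python
from collections import Counter, defaultdict

# Hypothetical pre-tagged labels: (relation, [(token, POS tag), ...]).
# The tags follow the Universal POS tag set that spaCy exposes.
TAGGED_LABELS = [
    ("identification", [("leaf", "NOUN")]),
    ("identification", [("green", "ADJ"), ("leaf", "NOUN")]),
    ("identification", [("stem", "NOUN")]),
    ("elaboration", [("absorbs", "VERB"), ("water", "NOUN")]),
]


def summarise_labels(tagged_labels):
    """Count POS patterns and compute the mean word count per relation."""
    patterns = defaultdict(Counter)
    lengths = defaultdict(list)
    for relation, tokens in tagged_labels:
        # A pattern is the sequence of POS tags in the label, e.g. "ADJ NOUN".
        pattern = " ".join(pos for _, pos in tokens)
        patterns[relation][pattern] += 1
        lengths[relation].append(len(tokens))
    averages = {rel: sum(counts) / len(counts) for rel, counts in lengths.items()}
    return patterns, averages


if __name__ == "__main__":
    patterns, averages = summarise_labels(TAGGED_LABELS)
    for relation, counter in patterns.items():
        print(relation, counter.most_common(3), averages[relation])
```

The same grouping could be repeated with macro-group as the key, and the resulting counts arranged as a relation-by-pattern matrix for a heatmap.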