There is a newer version of the record available.

Published September 6, 2021 | Version 1.0.0
Dataset Open

Classification of hierarchical text using geometric deep learning: the case of clinical trials corpus

Description

We consider the hierarchical representation of documents as graphs and use geometric deep learning to classify them into different categories. While graph neural networks can efficiently handle the variable structure of hierarchical documents using permutation-invariant message-passing operations, we show that extra performance can be gained with our proposed selective graph pooling operation, which exploits the fact that some parts of the hierarchy are invariable across different documents. We apply our model to classify clinical trial (CT) protocols into completed and terminated categories. We use bag-of-words-based as well as pre-trained transformer-based embeddings to featurize the graph nodes, achieving F1-scores $\simeq 0.85$ on a publicly available large-scale CT registry of around 360K protocols. We further demonstrate how selective pooling can add insights into the prediction of CT termination status.
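As a rough illustration of the two ideas named in the description, the sketch below runs one order-independent message-passing step on a toy document tree and then reads out only the nodes assumed to exist in every document. It is not the authors' model: the node names (`protocol`, `design`, `eligibility`, `criterion_1`, `criterion_2`), the averaging update rule, and the concatenation readout are all assumptions chosen for clarity, and the feature vectors are made-up numbers.

```python
from typing import Dict, List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean; the result does not depend on the order of `vectors`."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def message_pass(features: Dict[str, Vector],
                 children: Dict[str, List[str]]) -> Dict[str, Vector]:
    """One message-passing round (assumed rule): a node with children becomes
    the average of its own vector and the mean of its children's vectors."""
    updated = {}
    for node, vector in features.items():
        kids = children.get(node, [])
        if kids:
            aggregated = mean([features[kid] for kid in kids])
            updated[node] = [(a + b) / 2 for a, b in zip(vector, aggregated)]
        else:
            updated[node] = list(vector)
    return updated


def selective_pool(features: Dict[str, Vector], fixed_nodes: List[str]) -> Vector:
    """Assumed readout: concatenate the vectors of nodes that every document
    has, in a fixed order, instead of averaging over all nodes."""
    pooled: Vector = []
    for name in fixed_nodes:
        pooled.extend(features[name])
    return pooled


# Toy document: two sections are always present, the criteria vary per document.
features = {
    "protocol": [0.0, 0.0],
    "design": [1.0, 0.0],
    "eligibility": [0.0, 1.0],
    "criterion_1": [2.0, 2.0],
    "criterion_2": [4.0, 0.0],
}
children = {
    "protocol": ["design", "eligibility"],
    "eligibility": ["criterion_1", "criterion_2"],
}

hidden = message_pass(features, children)
document_vector = selective_pool(hidden, ["design", "eligibility"])
print(document_vector)  # [1.0, 0.0, 1.5, 1.0]
```

Because the aggregation is a mean, listing the criteria in a different order gives the same result, which is the sense in which the step is permutation invariant. The readout keeps one slot per fixed section, so a downstream classifier can tell which section a signal came from.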

Files

AllAPIJSON.zip

Files (1.9 GB)

Checksum  Size
md5:d22ebbb3d9742d38fc7aa00fb60ceb1f  1.9 GB
md5:363840bf4499cfc5f530ccd09c04ed85  122 Bytes
md5:a4ba4e2ed469f159a751e22fe5b2310e  398.6 kB
md5:c3c8b94fe2c5ec37e1e19212dc54271f  1.9 MB
md5:8e3d507883c1f5a1e71880d1e24479ad  395.7 kB
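The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and pairing the 1.9 GB checksum with a local copy of `AllAPIJSON.zip` is an assumption, since the listing does not show which name belongs to which checksum.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks so that
    multi-gigabyte archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, expected: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


if __name__ == "__main__":
    # Assumed local path and assumed pairing of file name with checksum.
    archive = Path("AllAPIJSON.zip")
    if archive.exists():
        ok = matches_checksum(archive, "md5:d22ebbb3d9742d38fc7aa00fb60ceb1f")
        print("checksum OK" if ok else "checksum mismatch")
```

A mismatch usually means the download was truncated or the checksum belongs to a different file in the record.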