Webis Health CauseNet 2022
Authors/Creators
- 1. Martin-Luther-Universität Halle-Wittenberg
- 2. Westfälische Wilhelms-Universität Münster
- 3. Friedrich-Schiller-Universität Jena
- 4. Bauhaus-Universität Weimar
- 5. Universität Leipzig
Description
Efficiently assessing the health relatedness of text passages is important for mining the web at scale, whether to conduct health-sociological analyses or to develop a health search engine. We propose a new efficient and effective termhood score for predicting the health relatedness of phrases and sentences; it achieves 69% recall at over 90% precision on a web dataset of cause–effect statements. It is more effective than state-of-the-art medical entity linkers, and as effective as, but much faster than, BERT-based approaches. Using our method, we compile the Webis Health CauseNet 2022, a new resource of 7.8 million health-related cause–effect statements, such as “Studies show that stress induces insomnia”, in which the cause (‘stress’) and the effect (‘insomnia’) are labeled.
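The general idea of a termhood score (how strongly a term belongs to a domain) can be illustrated with a minimal sketch of a generic corpus-comparison approach. This is not the score proposed in the paper. The smoothed log ratio of relative frequencies, the max aggregation over a phrase's tokens, the whitespace tokenization, and the names `term_weights` and `phrase_score` are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_weights(domain_tokens, general_tokens, smoothing=1.0):
    """Hypothetical termhood weight per term: log ratio of the term's
    add-`smoothing` relative frequency in a domain (e.g. health) corpus
    versus a general corpus. Positive means domain-typical."""
    dom = Counter(domain_tokens)
    gen = Counter(general_tokens)
    vocab = set(dom) | set(gen)
    dom_total = sum(dom.values()) + smoothing * len(vocab)
    gen_total = sum(gen.values()) + smoothing * len(vocab)
    return {
        t: math.log(((dom[t] + smoothing) / dom_total)
                    / ((gen[t] + smoothing) / gen_total))
        for t in vocab
    }


def phrase_score(phrase, weights):
    """Hypothetical phrase score: the highest weight among its tokens.
    Unknown tokens count as neutral (0.0); an empty phrase scores 0.0."""
    tokens = phrase.lower().split()
    if not tokens:
        return 0.0
    return max(weights.get(t, 0.0) for t in tokens)


# Toy corpora, purely illustrative.
w = term_weights(["insomnia", "stress", "insomnia", "the"],
                 ["the", "the", "stress", "car"])
print(phrase_score("insomnia", w) > 0)  # domain-typical term
```

A phrase could then be labeled health-related when its score exceeds a threshold tuned on labeled data, where raising the threshold trades recall for precision.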
@InProceedings{schlatt2022health-causenet,
author = {Ferdinand Schlatt and
Dieter Bettin and
Matthias Hagen and
Benno Stein and
Martin Potthast},
booktitle = {29th International Conference on Computational Linguistics (COLING 2022)},
publisher = {Association for Computational Linguistics},
site = {Gyeongju, Republic of Korea},
title = {{Mining Health-related Cause-Effect Statements with High Precision at Large Scale}},
year = 2022
}
Files
Total size: 1.1 GB

| MD5 checksum | Size |
|---|---|
| 6a002156b662443151933208cf5f2d1d | 539.1 MB |
| 711fd807d6388be3c09fa271787be3f3 | 380.6 MB |
| ba8f34d5f08b33d8ebc38e5fdcf8af10 | 91.2 MB |
| 3c499286e3dbf46c92f82b70577d020f | 85.0 MB |
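Because each file is listed with an MD5 checksum, a download can be checked for corruption before use. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5sum` and `verify` are hypothetical, and since the file names are not given here, the path in the usage line is a placeholder.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks so that
    multi-hundred-MB files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 matches the expected checksum
    (compared case-insensitively)."""
    return md5sum(path) == expected_md5.strip().lower()


# Placeholder path; substitute the actual downloaded file.
# verify(Path("<downloaded file>"), "6a002156b662443151933208cf5f2d1d")
```

A mismatch indicates an incomplete or corrupted download, in which case the file should be fetched again.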