Published September 29, 2022 | Version v1
Dataset Open

Webis Health CauseNet 2022

  • 1. Martin-Luther-Universität Halle-Wittenberg
  • 2. Westfälische Wilhelms-Universität Münster
  • 3. Friedrich-Schiller-Universität Jena
  • 4. Bauhaus-Universität Weimar
  • 5. Universität Leipzig

Description

Efficiently assessing the health relatedness of text passages is important for mining the web at scale, whether to conduct health sociological analyses or to develop a health search engine. We propose a new, efficient, and effective termhood score for predicting the health relatedness of phrases and sentences. It achieves 69% recall at over 90% precision on a web dataset of cause–effect statements. It is more effective than state-of-the-art medical entity linkers, and as effective as BERT-based approaches while being much faster. Using our method, we compile the Webis Health CauseNet 2022, a new resource of 7.8 million health-related cause–effect statements such as “Studies show that stress induces insomnia”, in which the cause (‘stress’) and the effect (‘insomnia’) are labeled.

@InProceedings{schlatt2022health-causenet,
  author    = {Ferdinand Schlatt and 
               Dieter Bettin and 
               Matthias Hagen and 
               Benno Stein and 
               Martin Potthast},
  booktitle = {29th International Conference on Computational Linguistics (COLING 2022)},
  publisher = {Association for Computational Linguistics},
  address   = {Gyeongju, Republic of Korea},
  title     = {{Mining Health-related Cause-Effect Statements with High Precision at Large Scale}},
  year      = 2022
}

 

Files (1.1 GB)

Checksum                              Size
md5:6a002156b662443151933208cf5f2d1d  539.1 MB
md5:711fd807d6388be3c09fa271787be3f3  380.6 MB
md5:ba8f34d5f08b33d8ebc38e5fdcf8af10  91.2 MB
md5:3c499286e3dbf46c92f82b70577d020f  85.0 MB
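
The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch, not part of the dataset's own tooling: the helper names `md5_of_file` and `is_listed` are hypothetical, and because this listing does not show file names, the check only tests whether a local file's digest matches any of the four listed values.

```python
import hashlib

# MD5 digests copied from the file listing on this page.
LISTED_MD5 = {
    "6a002156b662443151933208cf5f2d1d",  # 539.1 MB
    "711fd807d6388be3c09fa271787be3f3",  # 380.6 MB
    "ba8f34d5f08b33d8ebc38e5fdcf8af10",  # 91.2 MB
    "3c499286e3dbf46c92f82b70577d020f",  # 85.0 MB
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path):
    """Return True if the file's MD5 digest matches one of the listed checksums."""
    return md5_of_file(path) in LISTED_MD5
```

Calling `is_listed` with the path of a downloaded file returns `True` only if the file's digest matches one of the listed checksums. Reading in chunks keeps memory use low for the larger files. MD5 is adequate for detecting accidental corruption, but it is not a safeguard against deliberate tampering.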