Dataset Open Access

Adapting Phrase-based Machine Translation to Normalise Medical Terms in Social Media Messages

Limsopatham, Nut; Collier, Nigel


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">natural language processing</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">lexical semantics</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">machine translation</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">biomedical</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">named entity</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">normalization</subfield>
  </datafield>
  <controlfield tag="005">20190410034156.0</controlfield>
  <controlfield tag="001">27354</controlfield>
  <datafield tag="711" ind1=" " ind2=" ">
    <subfield code="d">17-21 September 2015</subfield>
    <subfield code="g">EMNLP 2015</subfield>
    <subfield code="a">Conference on Empirical Methods in Natural Language Processing</subfield>
    <subfield code="c">Lisboa, Portugal</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Cambridge</subfield>
    <subfield code="a">Collier, Nigel</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">255421</subfield>
    <subfield code="z">md5:de242f0764e5c875bc671300d1da1abf</subfield>
    <subfield code="u">https://zenodo.org/record/27354/files/EMNLP_supplement.zip</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">1743293836</subfield>
    <subfield code="z">md5:e5e320a3e3e112473243dfed1351717a</subfield>
    <subfield code="u">https://zenodo.org/record/27354/files/word-vector-examples.zip</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">11307</subfield>
    <subfield code="z">md5:673803dbdef98794112104861e099d94</subfield>
    <subfield code="u">https://zenodo.org/record/27354/files/EMNLP_gold_standard.txt</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="y">Conference website</subfield>
    <subfield code="u">http://www.emnlp2015.org/accepted-papers.html</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2015-08-10</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire_data</subfield>
    <subfield code="o">oai:zenodo.org:27354</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">University of Cambridge</subfield>
    <subfield code="a">Limsopatham, Nut</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Adapting Phrase-based Machine Translation to Normalise Medical Terms in Social Media Messages</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">http://creativecommons.org/publicdomain/zero/1.0/legalcode</subfield>
    <subfield code="a">Creative Commons Zero v1.0 Universal</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;Data and supplementary information for the paper entitled &amp;quot;Adapting Phrase-based Machine Translation to Normalise Medical Terms in Social Media Messages&amp;quot; to be published at EMNLP 2015: Conference on Empirical Methods in Natural Language Processing &amp;mdash; September 17&amp;ndash;21, 2015 &amp;mdash; Lisboa, Portugal.&lt;/p&gt;

&lt;p&gt;ABSTRACT: Previous studies have shown that health reports in social media, such as DailyStrength and Twitter, have potential for monitoring health conditions (e.g. adverse drug reactions, infectious diseases) in particular communities. However, in order for a machine to understand and make inferences on these health conditions, the ability to recognise when laymen&amp;#39;s terms refer to a particular medical concept (i.e. text normalisation) is required. To achieve this, we propose to adapt an existing phrase-based machine translation (MT) technique and a vector representation of words to map between a social media phrase and a medical concept. We evaluate our proposed approach using a collection of phrases from tweets related to adverse drug reactions. Our experimental results show that the combination of a phrase-based MT technique and the similarity between word vector representations outperforms the baselines that apply only either of them by up to 55%.&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.27354</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  </datafield>
</record>
94
36
views
downloads
All versions This version
Views 9494
Downloads 3636
Data volume 20.9 GB20.9 GB
Unique views 8888
Unique downloads 2323
