Generalizable and Scalable Multistage Biomedical Concept Normalization Leveraging Large Language Models
Description
Background: Biomedical entity normalization is critical to biomedical research because the richness of
free-text clinical data, such as progress notes, can often be fully leveraged only after translating words and
phrases into structured and coded representations suitable for analysis. Large Language Models (LLMs),
in turn, have shown great potential and high performance in a variety of natural language processing (NLP)
tasks, but their application for normalization remains understudied.
Methods: We applied both proprietary and open-source LLMs in combination with several rule-based
normalization systems commonly used in biomedical research. We used a two-step LLM integration approach:
(1) using an LLM to generate alternative phrasings of a source utterance, and (2) using an LLM to prune
candidate UMLS concepts, with a variety of prompting methods. We measured results by Fβ, where we favor
recall over precision, and by F1.
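
The two-step approach and the Fβ metric described in the Methods can be sketched as follows. This is a minimal illustration, not the code in this record: the `paraphrase`, `lookup`, and `prune` callables are hypothetical stand-ins for the LLM prompts and the underlying normalization system, and the default `beta=2.0` is an assumption, since the abstract says only that recall is favored over precision.

```python
from typing import Callable, Iterable, List


def normalize(
    term: str,
    paraphrase: Callable[[str], Iterable[str]],       # hypothetical: LLM proposes alternative phrasings
    lookup: Callable[[str], Iterable[str]],           # hypothetical: existing normalizer, phrase -> concept IDs
    prune: Callable[[str, List[str]], Iterable[str]], # hypothetical: LLM selects plausible concept IDs
) -> List[str]:
    """Step 1: widen the candidate set via rephrasing. Step 2: prune it."""
    phrasings = [term, *paraphrase(term)]

    # Collect candidates from every phrasing, keeping first-seen order.
    candidates: List[str] = []
    for phrase in phrasings:
        for concept_id in lookup(phrase):
            if concept_id not in candidates:
                candidates.append(concept_id)

    # Keep only the candidates the pruning step accepts.
    kept = set(prune(term, candidates))
    return [concept_id for concept_id in candidates if concept_id in kept]


def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Weighted harmonic mean of precision and recall; beta > 1 favors recall."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator
```

With `beta=1.0` the function reduces to F1. Passing the three steps as callables keeps the sketch independent of any particular LLM client or normalizer, which mirrors the claim that the approach works on top of existing tools.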
Results: We evaluated a total of 5,523 concept terms and text contexts from a publicly available dataset
of human-annotated biomedical abstracts. Incorporating GPT-3.5-turbo increased overall Fβ and F1 across
normalization systems by +16.5 and +16.2 (OpenAI embeddings), +9.5 and +7.3 (MetaMapLite), +13.9 and +10.9
(QuickUMLS), and +10.5 and +10.3 (BM25), while the open-source Vicuna model achieved +20.2 and
+21.7 (OpenAI embeddings), +10.8 and +12.2 (MetaMapLite), +14.7 and +15 (QuickUMLS), and +15.6
and +18.7 (BM25).
Conclusions: Existing general-purpose LLMs, both proprietary and open-source, can be leveraged to greatly
improve normalization performance using existing tools, with no fine-tuning.
Files
requirements.txt
Total size: 122.6 kB
MD5 checksum | Size |
---|---|
md5:1cb1b14c3d2910d15179b0079dd6a500 | 10.1 kB |
md5:4a6d60515c816d34d870e4dd93eb5b98 | 9.4 kB |
md5:d78d83b16c09e415dcf0f00b6c7acc45 | 9.6 kB |
md5:0175c83f9f3e52e67c7659af7c7874ef | 9.3 kB |
md5:51364f39976e6461d542f0ff6d5b0679 | 1.1 kB |
md5:2a46ae81a2c8b717b607a6475893a749 | 9.8 kB |
md5:3dad2b4aee96333b74e3a87afcd0127a | 2.3 kB |
md5:eaed865be94c26c6abcb8ba9777cce6a | 3.9 kB |
md5:47a2de9d81b3f79d8e41750d87dfe5ed | 2.6 kB |
md5:4a6c49231154861fd2695494e76d0073 | 1.3 kB |
md5:baaa83df39e53bf6556b05b471a6bce0 | 3.4 kB |
md5:70aacf66769d971f6f265d13fc825dc9 | 4.6 kB |
md5:4850fe7f944064ac247f41550b6c8d24 | 2.9 kB |
md5:00a21d1bc27fc88bb55b6a9920b77b44 | 6.9 kB |
md5:ac3e09d60184eea6dc9a65acf846ade7 | 5.0 kB |
md5:b054a6530cd7de7f35d4e584874ef182 | 12.1 kB |
md5:6d66ccd21dec531c9d08a7400361e0f9 | 3.3 kB |
md5:f99e00d53e46a8ec76461812b909b184 | 3.9 kB |
md5:fa479d9d2d18ebd571a2a6c1d380c0dc | 1.2 kB |
md5:422871b6cef94d237912376bcfa0a4ef | 1.2 kB |
md5:32651fb907e636eb7fb25cf3aa8d3adb | 7.5 kB |
md5:8bb520bca82efc1240a1cc9482c9d8aa | 7.0 kB |
md5:85bc60cc1181fcc88ef0ca0256794b73 | 3.9 kB |
md5:68804a51a7e9ff02ab0acb9dfa54dd0b | 95 Bytes |
md5:5045724a33e838fe646f58e5f7bef0fe | 357 Bytes |
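
A downloaded file can be checked against its listed MD5 checksum with a few lines of Python. This is a minimal sketch using only the standard library; the helper names `md5_of` and `matches_listing` are illustrative and not part of this record, and the listed value is assumed to use the `md5:<hex digest>` form shown in the listing.

```python
import hashlib


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed: str) -> bool:
    """Compare a file's digest with a listed value such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected
```

For example, `matches_listing("requirements.txt", "md5:<value from the listing>")` returns `True` only if the local copy is byte-identical to the published file. MD5 is adequate for detecting download corruption, but it is not a safeguard against deliberate tampering.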