Published May 22, 2019 | Version v1
Journal article Open

Bi-directional Relevance Matching Between Medical Corpora

  • 1. University of Virginia
  • 2. Cochrane

Description

Readily available, trustworthy, and usable medical information is vital to promoting global health. Cochrane is a non-profit medical organization that conducts and publishes systematic reviews of medical research findings. Over 3000 Cochrane Reviews are presently used as evidence in Wikipedia articles. Currently, Cochrane's researchers manually search medicine-related Wikipedia pages to identify articles that can be improved with Cochrane evidence. Our aim is to streamline this process by applying existing document similarity and information retrieval methods to automatically link Wikipedia articles and Cochrane Reviews. Potential challenges include document length and the specificity of the corpora, which distinguish this problem from ordinary document representation and retrieval problems. We worked with data from 7400 Cochrane Reviews, ranging from one to several pages in length, and 33,000 Wikipedia articles categorized as medical. We explored several methods of document vectorization, including TF-IDF, LDA, LSA, word2vec, and doc2vec. For every document in each corpus, we calculated its similarity to each document in the other corpus using established measures such as cosine similarity and KL divergence. Labeled data for this unsupervised task was not available, so models were evaluated by comparing their results to two standards: (1) Cochrane Reviews currently cited in Wikipedia articles and (2) a data set provided by a medical expert that indicates which Cochrane Reviews could be considered for specific Wikipedia articles. Our system performs best using TF-IDF document representation and cosine similarity.
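The best-performing combination named above, TF-IDF vectors compared by cosine similarity, can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' implementation: the toy documents, the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_matches`), the whitespace tokenizer, the smoothed IDF formula, and the choice to fit one shared vocabulary over both corpora are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, alphabetic tokens only.
    return [t for t in text.lower().split() if t.isalpha()]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Assumption: raw term count times a smoothed IDF.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_matches(queries, candidates, top_k=1):
    """For each query vector, return indices of the top_k most similar candidates."""
    ranked = []
    for q in queries:
        scores = [(cosine(q, c), j) for j, c in enumerate(candidates)]
        # Sort by descending similarity; break ties by candidate index.
        scores.sort(key=lambda s: (-s[0], s[1]))
        ranked.append([j for _, j in scores[:top_k]])
    return ranked


# Hypothetical toy corpora standing in for Cochrane Reviews and Wikipedia articles.
reviews = [
    "antibiotics for acute otitis media in children",
    "exercise therapy for chronic low back pain",
]
articles = [
    "otitis media is an inflammation of the middle ear often treated with antibiotics",
    "low back pain is a common disorder and exercise may help chronic cases",
    "asthma is a chronic inflammatory disease of the airways",
]

# Assumption: both corpora share one vocabulary and one set of IDF weights.
vectors = tfidf_vectors(reviews + articles)
review_vecs = vectors[:len(reviews)]
article_vecs = vectors[len(reviews):]

# Match in both directions: reviews to articles and articles to reviews.
review_to_article = rank_matches(review_vecs, article_vecs)
article_to_review = rank_matches(article_vecs, review_vecs)
```

Ranking in both directions matters because the two result lists need not mirror each other: in this toy data every article receives a best-matching review, but one article is never the best match for any review. At the corpus sizes described above, a sparse-matrix library would be a more practical choice than nested Python loops.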
