On the contribution of specific entity detection and comparative construction to automatic spin detection in biomedical scientific publications
In this article we address the problem of providing automated support for detecting misrepresentation (spin) of research results in scientific publications from the biomedical domain. To automatically identify inadequate claims in medical articles, i.e. claims that present the beneficial effect of the experimental treatment as greater than the research results actually demonstrate, we propose a Natural Language Processing (NLP) approach. We first review related work and analyze the problem from an NLP perspective; we then present our first results on the type of publication most likely to be amenable to automatic processing: articles reporting the results of Randomized Controlled Trials (RCTs), i.e. comparisons made by applying the experimental or the standard treatment to different registered patient groups. These results concern the identification of specific entities necessarily present in an RCT description (here, outcomes and patient groups), obtained with basic methods (local grammars) on a corpus extracted from the PubMed open archive. Finally, we describe what support can be gained from identifying comparative constructions and their relationship to the identified entities, as a preliminary step towards deploying sentiment analysis as one of the constituent functionalities of our automatic spin detection algorithm.
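To make the idea of pattern-based entity and comparative detection concrete, the following is a minimal sketch in Python. It is not the local grammars used in this work: the two regular expressions, the word lists inside them, and the function name `annotate` are all hypothetical and chosen only for illustration. It shows how mentions of patient groups and simple comparative cues could be located in the same sentence, which is the kind of co-occurrence the abstract refers to.

```python
import re

# Hypothetical pattern for patient-group mentions such as "the control group"
# or "placebo arm". The label list is an assumption, not taken from the paper.
GROUP = re.compile(
    r"\b(?P<label>experimental|intervention|treatment|control|placebo)"
    r"\s+(?:group|arm)\b",
    re.IGNORECASE,
)

# Hypothetical pattern for a few explicit comparative cues. Real comparative
# constructions are far more varied; this covers only adjacent "X than" and
# "compared with/to".
COMPARATIVE = re.compile(
    r"\b(?:(?:higher|lower|greater|better|worse|more|less|fewer)\s+than"
    r"|compared\s+(?:with|to))\b",
    re.IGNORECASE,
)


def annotate(sentence: str) -> dict:
    """Return the group labels and comparative cues found in one sentence."""
    groups = [m.group("label").lower() for m in GROUP.finditer(sentence)]
    # Normalise internal whitespace so "compared  with" becomes "compared with".
    cues = [" ".join(m.group(0).lower().split())
            for m in COMPARATIVE.finditer(sentence)]
    return {"groups": groups, "comparatives": cues}


if __name__ == "__main__":
    example = ("Survival was significantly higher in the treatment group "
               "compared with the control group.")
    print(annotate(example))
```

A sentence in which both lists are non-empty is a candidate for closer inspection, since it relates a comparison to identified trial groups. Such surface patterns trade recall for simplicity and would miss comparatives whose cue and "than" are separated by other words.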