Poster Open Access

Leveraging Open Access publishing to fight fake news

Sylvain Massip; Charles Letaillieur

Citation Style Language JSON Export

  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.3776797", 
  "language": "eng", 
  "title": "Leveraging Open Access publishing to fight fake news", 
  "issued": {
    "date-parts": [
  "abstract": "<p>Since the very first experiences in Open Access publishing at the end of 20 th century,<br>\n(arXiv and PLOS, two pioneers of open access distribution of academic articles were<br>\ncreated in 1991 and 2001, respectively), Open Access has developed tremendously.</p>\n\n<p>Today, a significant fraction of research is published open access. Evaluation estimates<br>\nit to be as high as 28% [Piwowar, 2018] and it occupies an ever-growing position in the<br>\nscientific debate with the adoption, in 2018 of the plan S which creates an European<br>\nlevel mandate for Open Access.</p>\n\n<p>In addition to being ethically desirable per se, there are many academic, economic and<br>\nsocietal arguments in favor of open access. These arguments, based on an improvement<br>\nof the exploitation and reuse of research results, are well described theoretically in the<br>\nlitterature [Tennant, 2017]. Nevertheless, the practical demonstration of the use of Open<br>\nAccess outside research communities are not common, and we have not many reports of<br>\nthese. The objective of our project is to illustrate the possible uses of Open Access<br>\noutside of academia.</p>\n\n<p>In this study, we will examine how open access combined with the right machine<br>\nlearning tools can help fight fake news.</p>\n\n<p>Natural Language processing has been revolutionized these last years, by the use of<br>\nneural networks based language models such as word2Vec [Mikolov, 2013] and Bert<br>\n[Devlin, 2018].</p>\n\n<p>By building space representation of the words and concepts used in texts, these models<br>\nare able to take into account the meanings of studied texts. These methods have been<br>\nshown to be of use to create knowledge bases from corpus of texts [Petroni, 2019] in a<br>\nunsupervised manner. 
More specifically, [Tshitoyan, 2019] has shown that these<br>\nmethods, applied to a scientific corpus in an unsupervised manner, were able to retrieve<br>\nthe links between concepts that exists in the texts.</p>\n\n<p>This study will investigate how these principles will be used to build a text-mining<br>\npipeline that indicates whether a scientific claim is backed by the scientific literature or<br>\nnot.</p>\n\n<p>In this exploratory phase, the following methods will be applied:</p>\n\n<ul>\n\t<li>data from Euro Pubmed Central database will be used to train a Word2Vec model.</li>\n\t<li>claims will be restricted to health-related questions of the pattern &ldquo;Does X cure/cause/prevent Y?&rdquo;.</li>\n\t<li>Claims will then be classified by exploring the links between X, Y and the concept of cure / cause / prevent as learned in the language model.</li>\n</ul>\n\n<p>The pipeline will be evaluated with claims taken from expert-based scientific<br>\nfact-checking network such as or</p>\n\n<p>By validating the principle of fact-checking scientific claims with Open Access<br>\nliterature, we hope to pave the way to improved automatic fact-checking tools, which<br>\nwill allow an increased understanding of research results by the broad public and to<br>\nshow a strong impact of open science in society.</p>", 
  "author": [
      "family": "Sylvain Massip"
      "family": "Charles Letaillieur"
  "id": "3776797", 
  "event-place": "Berlin, Germany", 
  "type": "graphic", 
  "event": "Open Science Conference 2020"