Poster Open Access

Leveraging Open Access publishing to fight fake news

Sylvain Massip; Charles Letaillieur

Since the very first experiments in Open Access publishing at the end of the 20th century
(arXiv and PLOS, two pioneers of open access distribution of academic articles, were
created in 1991 and 2001, respectively), Open Access has developed tremendously.

Today, a significant fraction of research is published open access. Estimates put
it as high as 28% [Piwowar, 2018], and it occupies an ever-growing position in the
scientific debate with the adoption, in 2018, of Plan S, which creates a European-level
mandate for Open Access.

In addition to being ethically desirable per se, open access is supported by many
academic, economic and societal arguments. These arguments, based on improved
exploitation and reuse of research results, are well described theoretically in the
literature [Tennant, 2017]. Nevertheless, practical demonstrations of the use of Open
Access outside research communities are uncommon and rarely reported. The objective of
our project is to illustrate the possible uses of Open Access outside of academia.

In this study, we will examine how open access combined with the right machine
learning tools can help fight fake news.

Natural Language Processing has been revolutionized in recent years by the use of
neural-network-based language models such as Word2Vec [Mikolov, 2013] and BERT
[Devlin, 2018].

By building vector-space representations of the words and concepts used in texts, these
models are able to take the meaning of the studied texts into account. These methods have
been shown to be useful for creating knowledge bases from text corpora [Petroni, 2019] in
an unsupervised manner. More specifically, [Tshitoyan, 2019] has shown that these
methods, applied to a scientific corpus in an unsupervised manner, were able to retrieve
the links between concepts that exist in the texts.
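As a minimal sketch of what retrieving links between concepts can mean in such a vector space, the example below ranks words by cosine similarity to a query word. The vectors, the vocabulary and the helper names `cosine` and `nearest` are all invented for illustration; they do not come from a trained model or from the cited works.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only. A trained model
# would learn vectors with hundreds of dimensions from the corpus.
TOY_VECTORS = {
    "aspirin": [0.9, 0.1, 0.0],
    "ibuprofen": [0.8, 0.2, 0.1],
    "headache": [0.6, 0.6, 0.1],
    "galaxy": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, vectors, k=2):
    """Return the k words whose vectors are most similar to `word`."""
    scores = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:k]]


print(nearest("aspirin", TOY_VECTORS))
```

With these toy values, the two drug names end up closest to each other and the unrelated term is ranked last, which is the kind of structure one would hope a model trained on real literature captures.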

This study will investigate how these principles can be used to build a text-mining
pipeline that indicates whether or not a scientific claim is backed by the scientific
literature.

In this exploratory phase, the following methods will be applied:

  • Data from the Europe PubMed Central database will be used to train a Word2Vec model.
  • Claims will be restricted to health-related questions of the pattern “Does X cure/cause/prevent Y?”.
  • Claims will then be classified by exploring the links between X, Y and the concepts of cure / cause / prevent as learned by the language model.
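The last two steps could be sketched roughly as follows. The regular expression encodes the claim pattern given above; everything else is an assumption made for illustration: the toy vectors, the helper names `parse_claim`, `cosine` and `best_relation`, and in particular the scoring heuristic, which compares the vector of X plus the relation vector against the vector of Y. The poster does not specify how the links are actually explored.

```python
import math
import re

# Claim pattern from the methods list: "Does X cure/cause/prevent Y?"
CLAIM_PATTERN = re.compile(
    r"^Does\s+(?P<x>.+?)\s+(?P<relation>cure|cause|prevent)\s+(?P<y>.+?)\?$",
    re.IGNORECASE,
)
RELATIONS = ("cure", "cause", "prevent")

# Toy vectors, invented for illustration; not taken from a trained model.
TOY_VECTORS = {
    "vitamin c": [1.0, 0.0, 0.0],
    "scurvy": [1.0, 1.0, 0.0],
    "prevent": [0.0, 1.0, 0.0],
    "cure": [0.0, 0.5, 0.5],
    "cause": [0.0, 0.0, 1.0],
}


def parse_claim(claim):
    """Split a claim into (X, relation, Y); return None if it does not match."""
    match = CLAIM_PATTERN.match(claim.strip())
    if match is None:
        return None
    return (
        match.group("x").lower(),
        match.group("relation").lower(),
        match.group("y").lower(),
    )


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def best_relation(x, y, vectors):
    """Hypothetical heuristic: the relation r maximising cos(X + r, Y)."""
    def score(relation):
        shifted = [a + b for a, b in zip(vectors[x], vectors[relation])]
        return cosine(shifted, vectors[y])
    return max(RELATIONS, key=score)


x, relation, y = parse_claim("Does vitamin C prevent scurvy?")
# Under this heuristic, a claim counts as supported when the relation it
# states is also the best-scoring one.
print(relation == best_relation(x, y, TOY_VECTORS))
```

In a real pipeline the toy dictionary would be replaced by vectors learned from the literature, and multi-word terms such as "vitamin c" would need their own handling (for example phrase detection before training), which this sketch sidesteps by using the phrase as a dictionary key.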

The pipeline will be evaluated with claims taken from expert-based scientific
fact-checking networks.

By validating the principle of fact-checking scientific claims with Open Access
literature, we hope to pave the way to improved automatic fact-checking tools, which
will increase the general public's understanding of research results, and to
demonstrate the strong impact of open science on society.
