Poster Open Access

Leveraging Open Access publishing to fight fake news

Sylvain Massip; Charles Letaillieur


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="DOI">10.5281/zenodo.3776797</identifier>
  <creators>
    <creator>
      <creatorName>Sylvain Massip</creatorName>
      <affiliation>Opscidia</affiliation>
    </creator>
    <creator>
      <creatorName>Charles Letaillieur</creatorName>
      <affiliation>Opscidia</affiliation>
    </creator>
  </creators>
  <titles>
    <title>Leveraging Open Access publishing to fight fake news</title>
  </titles>
  <publisher>Zenodo</publisher>
  <publicationYear>2020</publicationYear>
  <subjects>
    <subject>Open Access</subject>
    <subject>Text-mining</subject>
    <subject>Fake News</subject>
    <subject>Fact-checking</subject>
    <subject>Word2Vec</subject>
  </subjects>
  <dates>
    <date dateType="Issued">2020-04-30</date>
  </dates>
  <language>en</language>
  <resourceType resourceTypeGeneral="Text">Poster</resourceType>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/3776797</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.3776796</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/osc2020</relatedIdentifier>
  </relatedIdentifiers>
  <rightsList>
    <rights rightsURI="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;Since the very first experiments in Open Access publishing at the end of the 20th century&lt;br&gt;
(arXiv and PLOS, two pioneers of open access distribution of academic articles, were&lt;br&gt;
created in 1991 and 2001, respectively), Open Access has developed tremendously.&lt;/p&gt;

&lt;p&gt;Today, a significant fraction of research is published open access. Estimates put&lt;br&gt;
it as high as 28% [Piwowar, 2018], and it occupies an ever-growing position in the&lt;br&gt;
scientific debate with the adoption, in 2018, of Plan S, which creates a European-level&lt;br&gt;
mandate for Open Access.&lt;/p&gt;

&lt;p&gt;Open access is ethically desirable per se, and there are also many academic, economic and&lt;br&gt;
societal arguments in its favor. These arguments, based on an improvement&lt;br&gt;
of the exploitation and reuse of research results, are well described theoretically in the&lt;br&gt;
literature [Tennant, 2017]. Nevertheless, practical demonstrations of the use of Open&lt;br&gt;
Access outside research communities are not common, and few reports of them exist.&lt;br&gt;
The objective of our project is to illustrate the possible uses of Open Access&lt;br&gt;
outside of academia.&lt;/p&gt;

&lt;p&gt;In this study, we will examine how open access combined with the right machine&lt;br&gt;
learning tools can help fight fake news.&lt;/p&gt;

&lt;p&gt;Natural language processing has been revolutionized in recent years by the use of&lt;br&gt;
neural-network-based language models such as Word2Vec [Mikolov, 2013] and BERT&lt;br&gt;
[Devlin, 2018].&lt;/p&gt;

&lt;p&gt;By building vector-space representations of the words and concepts used in texts, these models&lt;br&gt;
are able to take into account the meaning of the texts studied. These methods have been&lt;br&gt;
shown to be useful for creating knowledge bases from corpora of texts [Petroni, 2019] in an&lt;br&gt;
unsupervised manner. More specifically, [Tshitoyan, 2019] has shown that these&lt;br&gt;
methods, applied to a scientific corpus in an unsupervised manner, were able to retrieve&lt;br&gt;
the links between concepts that exist in the texts.&lt;/p&gt;

&lt;p&gt;This study will investigate how these principles can be used to build a text-mining&lt;br&gt;
pipeline that indicates whether or not a scientific claim is backed by the scientific&lt;br&gt;
literature.&lt;/p&gt;

&lt;p&gt;In this exploratory phase, the following methods will be applied:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Data from the Europe PubMed Central database will be used to train a Word2Vec model.&lt;/li&gt;
	&lt;li&gt;Claims will be restricted to health-related questions of the pattern &amp;ldquo;Does X cure/cause/prevent Y?&amp;rdquo;.&lt;/li&gt;
	&lt;li&gt;Claims will then be classified by exploring the links between X, Y and the concept of cure / cause / prevent as learned by the language model.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pipeline will be evaluated with claims taken from expert-based scientific&lt;br&gt;
fact-checking networks such as metafact.io or sciencefeedback.co.&lt;/p&gt;

&lt;p&gt;By validating the principle of fact-checking scientific claims with Open Access&lt;br&gt;
literature, we hope to pave the way to improved automatic fact-checking tools, which&lt;br&gt;
will allow the general public to better understand research results, and to&lt;br&gt;
demonstrate the strong impact of open science on society.&lt;/p&gt;</description>
  </descriptions>
</resource>
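
The abstract in the record above outlines a pipeline: restrict claims to the pattern "Does X cure/cause/prevent Y?", then classify them from the links between X, Y and the relation in a Word2Vec model. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation. The regular expression, the function names (`parse_claim`, `cosine`, `relation_score`), and the scoring rule (cosine similarity between the vector sum of X and the relation, and the vector of Y) are all hypothetical choices made for illustration. It expects a plain dictionary of word vectors, which in practice could be exported from a model trained on Europe PubMed Central.

```python
import math
import re

# Hypothetical pattern for claims of the form "Does X cure/cause/prevent Y?".
CLAIM_RE = re.compile(
    r"^\s*does\s+(?P<x>.+?)\s+(?P<relation>cure|cause|prevent)\s+(?P<y>.+?)\s*\?\s*$",
    re.IGNORECASE,
)


def parse_claim(text):
    """Return (x, relation, y) in lower case, or None if the pattern does not match."""
    match = CLAIM_RE.match(text)
    if match is None:
        return None
    return (
        match.group("x").lower(),
        match.group("relation").lower(),
        match.group("y").lower(),
    )


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relation_score(vectors, x, relation, y):
    """Assumed scoring rule: cosine similarity between (x + relation) and y.

    `vectors` maps a term to its embedding. Returns None when any term
    is missing from the vocabulary.
    """
    if any(term not in vectors for term in (x, relation, y)):
        return None
    shifted = [a + b for a, b in zip(vectors[x], vectors[relation])]
    return cosine(shifted, vectors[y])
```

A claim string would first go through `parse_claim`, and the resulting triple would be passed to `relation_score`. Any threshold used to turn the score into a "backed" or "not backed" label would have to be calibrated against expert-checked claims.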