There is a newer version of this record available.

Software Open Access

Replication package for identify bot comments

Mehdi Golzadeh; Alexandre Decan; Eleni Constantinou; Tom Mens

MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="">
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">GitHub, automated comments, distributed software development, classification model, empirical analysis</subfield>
  <controlfield tag="005">20210305002722.0</controlfield>
  <controlfield tag="001">4580394</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Mons</subfield>
    <subfield code="0">(orcid)0000-0002-5824-5823</subfield>
    <subfield code="a">Alexandre Decan</subfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Eindhoven University of Technology</subfield>
    <subfield code="0">(orcid)0000-0002-4242-2581</subfield>
    <subfield code="a">Eleni Constantinou</subfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Mons</subfield>
    <subfield code="0">(orcid)0000-0003-3636-5020</subfield>
    <subfield code="a">Tom Mens</subfield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">18979308</subfield>
    <subfield code="z">md5:4a02e7ae7f5e68b2f9d9abcd96765913</subfield>
    <subfield code="u"></subfield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2021-05-22</subfield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">software</subfield>
    <subfield code="o"></subfield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">University of Mons</subfield>
    <subfield code="0">(orcid)0000-0003-1041-439X</subfield>
    <subfield code="a">Mehdi Golzadeh</subfield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Replication package for identify bot comments</subfield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u"></subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2"></subfield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;This repository contains the replication package for our study on identifying bots at the level of their activity in GitHub, submitted to the BotSE&amp;#39;21 conference (*&amp;quot;Identifying bot activity in GitHub pull request and issue comments&amp;quot;*).&lt;br&gt;
A link to the paper will be added to this README as soon as the paper is accepted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ground-truth dataset&lt;/strong&gt;&lt;br&gt;
The dataset is extracted from the ground-truth dataset of our study about [identifying bots]( published in the JSS journal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replication package&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A- Dataset preparation.ipynb: This notebook splits the dataset into two disjoint sets for training and testing. To avoid any conflict with GDPR regulations, we&amp;#39;ve anonymised the account name columns.&lt;/p&gt;

&lt;p&gt;B- Model construction.ipynb: This notebook uses grid-search cross-validation to find the best classifier and construct the final model. The replication package was originally created with Python 3.8; the dependencies required to run these notebooks are listed in requirements.txt and can be installed automatically using pip install -r requirements.txt.&lt;/p&gt;

&lt;p&gt;C- Model evaluation.ipynb: This notebook contains scripts to evaluate the classifier.&lt;/p&gt;</subfield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.4580393</subfield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.4580394</subfield>
    <subfield code="2">doi</subfield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">software</subfield>