Dataset Open Access

The Online Conversation Threads Repository (Slashdot, Barrapunto, Wikipedia talk)

Gómez, Vicenç; Kaltenbrunner, Andreas; Laniado, David


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">V. Gómez, A. Kaltenbrunner, V. López (2008). Statistical analysis of the social network and discussion threads in Slashdot. Proceedings of the 17th International World Wide Web Conference</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">V. Gómez, H. J. Kappen, N. Litvak, A. Kaltenbrunner (2013). A likelihood-based framework for the analysis of discussion threads. World Wide Web 16(5-6):645-675</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">D. Laniado, R. Tasso, Y. Volkovich, A. Kaltenbrunner (2011). When the Wikipedians Talk: Network and Tree Structure of Wikipedia Discussion Pages. Fifth International AAAI Conference on Weblogs and Social Media</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Slashdot</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Barrapunto</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Wikipedia</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Conversation threads</subfield>
  </datafield>
  <controlfield tag="005">20170908075829.0</controlfield>
  <datafield tag="500" ind1=" " ind2=" ">
    <subfield code="a">Contains:

/slashdot
  slashdot_with_comments.tgz (518M): compressed file with raw xml
  tree-YY-MM.mat (128M): conversation threads in Matlab structures format
  Fields:
    data : struct with fields ; post data
      id : string ; identifier of the post
      user : string ; writer of the news post
      date : double ; seconds
      topics : string ; main topics
    tree : struct with fields ; hierarchical structure of the thread
      data : comment data: id, parentid, score, user and date
      parent : index in tree(index) of the parent (-42 for the root node)
      depth : depth in the thread
      child : vector of children

/barrapunto
  raw_xml.tgz (85M): compressed file with raw xml
  tree-YY.mat (22M): conversation threads in Matlab structures

/wikipedia
  AllArticleTitles.csv.tar.gz (85M): compressed file with article titles
  all_comments.csv.tar.gz (1.6G): compressed file with comments
  WP_tree_raw_data_X.mat (87M): conversation threads in Matlab structures

Software: Matlab

This work is supported by the Spanish Ministry of Economy and Competitiveness
under the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502) and 
co-financed by the Marie Curie FP7-PEOPLE-2012-COFUND Action. Grant agreement
no: 600387</subfield>
  </datafield>
  <controlfield tag="001">165404</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain</subfield>
    <subfield code="a">Kaltenbrunner, Andreas</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Barcelona Media – Innovation Centre Information, Technology and Society Group</subfield>
    <subfield code="a">Laniado, David</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">110980658</subfield>
    <subfield code="z">md5:1a198c466a34e3ba38b106a0e693908f</subfield>
    <subfield code="u">https://zenodo.org/record/165404/files/barrapunto.rar</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">3291</subfield>
    <subfield code="z">md5:411ec60c7ede8c42c27ec76a26221dd8</subfield>
    <subfield code="u">https://zenodo.org/record/165404/files/README.txt</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">673642981</subfield>
    <subfield code="z">md5:3f75cc5bddfbd6ca7e0346e7214fb088</subfield>
    <subfield code="u">https://zenodo.org/record/165404/files/slashdot.rar</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">1763476626</subfield>
    <subfield code="z">md5:6da9215622016dc57df9766e6f3740f8</subfield>
    <subfield code="u">https://zenodo.org/record/165404/files/wikipedia.rar</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2008-04-25</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire_data</subfield>
    <subfield code="p">user-ecfunded</subfield>
    <subfield code="p">user-mdm-dtic-upf</subfield>
    <subfield code="o">oai:zenodo.org:165404</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain</subfield>
    <subfield code="a">Gómez, Vicenç</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">The Online Conversation Threads Repository (Slashdot, Barrapunto, Wikipedia talk)</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-ecfunded</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-mdm-dtic-upf</subfield>
  </datafield>
  <datafield tag="536" ind1=" " ind2=" ">
    <subfield code="c">600387</subfield>
    <subfield code="a">UPF Fellows</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/4.0/</subfield>
    <subfield code="a">Creative Commons Attribution 4.0</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;This repository contains datasets with online conversation threads collected and analyzed by different researchers. Currently, you can find datsets from different news aggregators (Slashdot, Barrapunto) and the English Wikipedia talk pages.&lt;/p&gt;

&lt;p&gt;- Slashdot conversations (Aug 2005 - Aug 2006) Online conversations generated at Slashdot during a year. Posts and comments published between August 26th, 2005 and August 31th, 2006. For each discussion thread: sub-domains, title, topics and hierarchical relations between comments. For each comment: user, date, score and textual content. This dataset is different from the Slashdot Zoo social network (it is not a signed network of users) contained in the SNAP repository and represents the full version of the dataset used in the CAW 2.0 - Content Analysis for the WEB 2.0 workshop for the WWW 2009 conference that can be found in several repositories such as Konect Barrapunto conversations (Jan 2005 - Dec 2008)&lt;/p&gt;

&lt;p&gt;- Online conversations generated at Barrapunto (Spanish clone of Slashdot) during three years. For each discussion thread: sub-domains, title, topics and hierarchical relations between comments. For each comment: user, date, score and textual content Wikipedia (2001 - Mar 2010)&lt;/p&gt;

&lt;p&gt;- Data from articles discussions (talk) pages of the English Wikipedia as of March 2010. It contains comments on about 870,000 articles (i.e. all articles which had a corresponding talk page with at least one comment), in total about 9.4 million comments. The oldest comments date back to as early as 2001.&lt;/p&gt;

</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">handle</subfield>
    <subfield code="i">isIdenticalTo</subfield>
    <subfield code="a">10230/26270</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">handle</subfield>
    <subfield code="i">isCompiledBy</subfield>
    <subfield code="a">10230/26817</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">handle</subfield>
    <subfield code="i">isCompiledBy</subfield>
    <subfield code="a">10230/26746</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">handle</subfield>
    <subfield code="i">isCompiledBy</subfield>
    <subfield code="a">10230/26816</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.165404</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  </datafield>
</record>
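
The 856 fields in the record above carry, for each file, a size in bytes (subfield s), an MD5 checksum (subfield z) and a download URL (subfield u). As a minimal sketch of how these could be read programmatically, the Python snippet below parses a shortened copy of the record with the standard library. The function name `list_files` and the `SAMPLE` excerpt are illustrative assumptions, not part of the repository.

```python
import xml.etree.ElementTree as ET

# Namespace used by the MARC21 "slim" export shown above.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Shortened excerpt of the record (two of its four 856 fields), for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">110980658</subfield>
    <subfield code="z">md5:1a198c466a34e3ba38b106a0e693908f</subfield>
    <subfield code="u">https://zenodo.org/record/165404/files/barrapunto.rar</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">3291</subfield>
    <subfield code="z">md5:411ec60c7ede8c42c27ec76a26221dd8</subfield>
    <subfield code="u">https://zenodo.org/record/165404/files/README.txt</subfield>
  </datafield>
</record>"""


def list_files(xml_text):
    """Return one dict per 856 field: url, size_bytes and md5 (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    files = []
    for field in root.findall("marc:datafield[@tag='856']", MARC_NS):
        # Map subfield code -> text for this datafield.
        subs = {sf.get("code"): (sf.text or "")
                for sf in field.findall("marc:subfield", MARC_NS)}
        checksum = subs.get("z", "")
        files.append({
            "url": subs.get("u", ""),
            "size_bytes": int(subs["s"]) if subs.get("s") else None,
            # Strip the "md5:" prefix used in this export; None if absent.
            "md5": checksum.split(":", 1)[1] if checksum.startswith("md5:") else None,
        })
    return files


if __name__ == "__main__":
    for entry in list_files(SAMPLE):
        print(entry["url"], entry["size_bytes"], entry["md5"])
```

The extracted checksum can then be compared against `hashlib.md5` computed over a downloaded file to check its integrity.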

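
The notes in field 500 describe each thread as a tree whose nodes store a parent index, a depth and a vector of children. The sketch below shows, on a toy thread, how depth and children could be derived from the parent indices alone. It is written in Python with plain lists rather than Matlab structures, and it assumes 0-based indices, a root depth of 0 and a caller-supplied root marker. These conventions, the function names and the toy data are assumptions for illustration; the README.txt shipped with the data gives the actual ones.

```python
def depths_from_parents(parents, root_marker):
    """Depth of every node, given each node's parent index (root depth = 0)."""
    depths = [None] * len(parents)
    for start in range(len(parents)):
        path = []
        node = start
        # Walk up until reaching a node whose depth is known, or the root.
        while depths[node] is None and parents[node] != root_marker:
            path.append(node)
            node = parents[node]
        if depths[node] is None:
            depths[node] = 0  # reached the root
        base = depths[node]
        # Assign depths back down the path that was just walked.
        for offset, n in enumerate(reversed(path), start=1):
            depths[n] = base + offset
    return depths


def children_from_parents(parents, root_marker):
    """List of child indices for every node."""
    children = [[] for _ in parents]
    for index, parent in enumerate(parents):
        if parent != root_marker:
            children[parent].append(index)
    return children


if __name__ == "__main__":
    ROOT = -1  # toy marker for this example only; pass the value the data files use
    # Toy thread: node 0 is the post, 1 and 2 reply to it, 3 replies to 1, 4 replies to 3.
    toy_parents = [ROOT, 0, 0, 1, 3]
    print(depths_from_parents(toy_parents, ROOT))    # [0, 1, 1, 2, 3]
    print(children_from_parents(toy_parents, ROOT))  # [[1, 2], [3], [], [4], []]
```

Both functions assume the parent indices form a valid tree with no cycles.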