Dataset Restricted Access
Stamatatos, Efstathios;
Daelemans, Walter;
Verhoeven, Ben;
Juola, Patrick;
López-López, Aurelio;
Potthast, Martin;
Stein, Benno
<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">Efstathios Stamatatos, Walter Daelemans, Ben Verhoeven, Patrick Juola, Aurelio López-López, Martin Potthast, and Benno Stein. Overview of the Author Identification Task at PAN 2015. In Linda Cappellato, Nicola Ferro, Gareth Jones, and Eric San Juan, editors, CLEF 2015 Evaluation Labs and Workshop – Working Notes Papers, 8-11 September, Toulouse, France, September 2015. CEUR-WS.org. ISSN 1613-0073.</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">Efstathios Stamatatos, Martin Potthast, Francisco Rangel, Paolo Rosso, and Benno Stein. Overview of the PAN/CLEF 2015 Evaluation Lab. In Josiane Mothe et al., editors, Experimental IR Meets Multilinguality, Multimodality, and Interaction. 6th International Conference of the CLEF Initiative (CLEF 2015), pages 518-538, Berlin Heidelberg New York, September 2015. Springer. ISBN 978-3-319-24026-8.</subfield>
  </datafield>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">authorship</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">verification</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">pan</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">2015</subfield>
  </datafield>
  <controlfield tag="005">20200402082013.0</controlfield>
  <controlfield tag="001">3737563</controlfield>
  <datafield tag="711" ind1=" " ind2=" ">
    <subfield code="g">PAN at CLEF 2015</subfield>
    <subfield code="a">Conference title: PAN at Conference and Labs of the Evaluation Forum 2015</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Daelemans, Walter</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Verhoeven, Ben</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Juola, Patrick</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">López-López, Aurelio</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Universität Leipzig</subfield>
    <subfield code="0">(orcid)0000-0003-2451-0665</subfield>
    <subfield code="a">Potthast, Martin</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Bauhaus-Universität Weimar</subfield>
    <subfield code="0">(orcid)0000-0001-9033-2217</subfield>
    <subfield code="a">Stein, Benno</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">restricted</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2015-09-08</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire_data</subfield>
    <subfield code="p">user-pan</subfield>
    <subfield code="o">oai:zenodo.org:3737563</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Stamatatos, Efstathios</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">PAN15 Author Identification: Verification</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-pan</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">
      <p>We provide you with a training corpus that comprises a set of author verification problems in several languages/genres. Each problem consists of some (up to five) known documents by a single person and exactly one questioned document. All documents within a single problem instance will be in the same language. However, their genre and/or topic may differ significantly. The document lengths vary from a few hundred to a few thousand words.</p>
      <p>The documents of each problem are located in a separate folder, the name of which (problem ID) encodes the language of the documents. The following list shows the available sub-corpora, including their language, type (cross-genre or cross-topic), code, and examples of problem IDs:</p>
      <p>Language; Type; Code; Problem IDs<br/>
      Dutch; Cross-genre; DU; DU001, DU002, DU003, etc.<br/>
      English; Cross-topic; EN; EN001, EN002, EN003, etc.<br/>
      Greek; Cross-topic; GR; GR001, GR002, GR003, etc.<br/>
      Spanish; Cross-genre; SP; SP001, SP002, SP003, etc.</p>
      <p>The ground truth of the training corpus is found in the file <code>truth.txt</code>, which contains one line per problem with the problem ID and the correct binary answer (Y means the known and the questioned documents are by the same author; N means they are not). For example:</p>
      <pre>EN001 N
EN002 Y
EN003 N
...</pre>
    </subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.3737562</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.3737563</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  </datafield>
</record>
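The `truth.txt` layout and the problem ID codes described in the record above could be read along the following lines. This is only a minimal sketch: the helper names `parse_truth` and `describe_problem` are hypothetical, and it assumes that each line holds a problem ID and a `Y`/`N` answer separated by whitespace, as in the example, and that the first two characters of a problem ID are the sub-corpus code.

```python
from typing import Dict, Iterable, Tuple

# Sub-corpus codes as listed in the record description: code -> (language, type).
SUBCORPORA: Dict[str, Tuple[str, str]] = {
    "DU": ("Dutch", "cross-genre"),
    "EN": ("English", "cross-topic"),
    "GR": ("Greek", "cross-topic"),
    "SP": ("Spanish", "cross-genre"),
}


def parse_truth(lines: Iterable[str]) -> Dict[str, bool]:
    """Map each problem ID to True ('Y', same author) or False ('N')."""
    truth: Dict[str, bool] = {}
    for raw in lines:
        parts = raw.split()
        if len(parts) != 2:
            continue  # skip blank lines and anything that is not "ID answer"
        problem_id, answer = parts
        if answer not in ("Y", "N"):
            raise ValueError(f"unexpected answer {answer!r} for {problem_id}")
        truth[problem_id] = answer == "Y"
    return truth


def describe_problem(problem_id: str) -> Tuple[str, str]:
    """Return (language, type), assuming the ID starts with a two-letter code."""
    return SUBCORPORA[problem_id[:2]]


# The three sample lines from the description, used in place of a real file.
sample = ["EN001 N", "EN002 Y", "EN003 N"]
truth = parse_truth(sample)
print(truth)
print(describe_problem("EN002"))
```

With the actual corpus, the same function could be fed an open file handle, for instance `parse_truth(open("truth.txt", encoding="utf-8"))`, where the path is an assumption about where the archive was unpacked.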
Metric | All versions | This version |
---|---|---|
Views | 508 | 508 |
Downloads | 39 | 39 |
Data volume | 236.0 MB | 236.0 MB |
Unique views | 384 | 384 |
Unique downloads | 37 | 37 |