Dataset Open Access

Game Walkthrough Corpus (GWTC)

Tiepmar, Jochen; Burghardt, Manuel


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="DOI">10.5281/zenodo.4562336</identifier>
  <creators>
    <creator>
      <creatorName>Tiepmar, Jochen</creatorName>
      <givenName>Jochen</givenName>
      <familyName>Tiepmar</familyName>
      <affiliation>Leipzig University</affiliation>
    </creator>
    <creator>
      <creatorName>Burghardt, Manuel</creatorName>
      <givenName>Manuel</givenName>
      <familyName>Burghardt</familyName>
      <affiliation>Leipzig University</affiliation>
    </creator>
  </creators>
  <titles>
    <title>Game Walkthrough Corpus (GWTC)</title>
  </titles>
  <publisher>Zenodo</publisher>
  <publicationYear>2021</publicationYear>
  <subjects>
    <subject>Game Studies, Walkthrough, Video Games, Text Corpus</subject>
  </subjects>
  <contributors>
    <contributor contributorType="DataCollector">
      <contributorName>Starke, Paul</contributorName>
      <givenName>Paul</givenName>
      <familyName>Starke</familyName>
      <affiliation>Leipzig University (Student)</affiliation>
    </contributor>
    <contributor contributorType="DataCollector">
      <contributorName>Karwasz, Tim</contributorName>
      <givenName>Tim</givenName>
      <familyName>Karwasz</familyName>
      <affiliation>Leipzig University (Student)</affiliation>
    </contributor>
  </contributors>
  <dates>
    <date dateType="Issued">2021-02-12</date>
  </dates>
  <language>en</language>
  <resourceType resourceTypeGeneral="Dataset"/>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/4562336</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.4559182</relatedIdentifier>
  </relatedIdentifiers>
  <version>1.0</version>
  <rightsList>
    <rights rightsURI="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;&lt;strong&gt;Motivation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Game Walkthrough Corpus (GWTC) contains 12,295 unique&lt;br&gt;
walkthrough documents that cover a total of 6,117 games. For each game walkthrough,&lt;br&gt;
it provides frequencies of unigrams and bigrams, treating it as a bag of words. In&lt;br&gt;
addition, it provides word frequencies on the sentence level. Furthermore, the GWTC&lt;br&gt;
contains a range of game-related metadata, including title, publisher, developer, year,&lt;br&gt;
genre, etc. All the language statistics and metadata are stored in separate plain text files&lt;br&gt;
and can be referenced by means of uniform resource names (URNs). These URNs can&lt;br&gt;
also be used to derive any combination of statistics and metadata. Researchers, for&lt;br&gt;
instance, can investigate the most frequent unigrams for games in the &amp;ldquo;Adventure&amp;rdquo;&lt;br&gt;
genre. This way, the GWTC can be reused in various ways, for different kinds of&lt;br&gt;
research questions on the topic of gaming language, which may be summarized as&lt;br&gt;
&amp;ldquo;distant playing&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Copyright Information&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Game walkthroughs are protected by individual copyright notices that are often very strict. For this reason, the data set does not include the documents themselves. Instead, it provides various data formats that are useful for text mining and distant reading but do not allow the documents to be recreated. It is highly unlikely that even a single sentence can be reconstructed from the published data.&lt;br&gt;
Because only text mining statistics about the documents are published, and the documents themselves are not published even in part, this project does not infringe copyright.&lt;br&gt;
Links to the original documents are available in the sourceUrls file in the data folder.&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;File Information&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;data folder: document data&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;bagofwords: Word frequencies per document&lt;/li&gt;
	&lt;li&gt;bigrams: Bigram frequencies per document&lt;/li&gt;
	&lt;li&gt;corpusstats: Min, avg and max token count, type count, type/token ratio, documents per game, plus the corresponding standard deviations&lt;/li&gt;
	&lt;li&gt;game_walkthrough_mapping: Documents per game&lt;/li&gt;
	&lt;li&gt;game_walkthrough_mapping: Number of documents per game&lt;/li&gt;
	&lt;li&gt;sentencecollocations: Word frequencies per sentence per document&lt;/li&gt;
	&lt;li&gt;sourceUrls: Links to original text&lt;/li&gt;
	&lt;li&gt;textlength: Number of characters per document&lt;/li&gt;
	&lt;li&gt;tfidf_deu: Word significance per document (German)&lt;/li&gt;
	&lt;li&gt;tfidf_eng: Word significance per document (English)&lt;/li&gt;
	&lt;li&gt;tokencount: Number of words per document&lt;/li&gt;
	&lt;li&gt;typecount: Number of unique words per document&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;metadata: game metadata&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;file names that do not start with &amp;quot;_&amp;quot;: metadata [filename] per game&lt;/li&gt;
	&lt;li&gt;_all: All metadata in one file&lt;/li&gt;
	&lt;li&gt;_mapping_release_date*: Metadata combined with release dates for time series&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;doc folder: documentation&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;createdata: Python script to create content of data folder&lt;/li&gt;
	&lt;li&gt;extractMetainformation: Python script to create content of metadata folder&lt;/li&gt;
	&lt;li&gt;metadata_rawg: Game metadata collected from RAWG&lt;/li&gt;
	&lt;li&gt;metadata_steam: Game metadata collected from Steam&lt;/li&gt;
	&lt;li&gt;metadata_symbol: Quality control: relation between the text in the source HTML and the extracted text&lt;/li&gt;
	&lt;li&gt;titlesandurns: Game titles mapped to project identifiers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Walkthrough Sources&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;https://portforward.com/games/walkthroughs/&lt;/li&gt;
	&lt;li&gt;https://www.neoseeker.com&lt;/li&gt;
	&lt;li&gt;https://www.spieletipps.de&lt;/li&gt;
	&lt;li&gt;https://jayisgames.com/&lt;/li&gt;
	&lt;li&gt;http://gamesetter.com/&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Corpus Statistics&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Number of unique games: 6,013&lt;/li&gt;
	&lt;li&gt;Number of documents: 12,295&lt;/li&gt;
	&lt;li&gt;Genre associations: 3,806&lt;/li&gt;
	&lt;li&gt;Gameplay tags: 10,246&lt;/li&gt;
	&lt;li&gt;Release dates: 2,443&lt;/li&gt;
	&lt;li&gt;Developers: 3,152&lt;/li&gt;
	&lt;li&gt;Publishers: 2,782&lt;/li&gt;
	&lt;li&gt;Steam IDs: 1,086&lt;/li&gt;
	&lt;li&gt;Platform associations: 5,293 (PC, Game Boy, iOS, Linux, ...)&lt;/li&gt;
	&lt;li&gt;Game language associations: 4,631&lt;/li&gt;
	&lt;li&gt;Languages: English, German, and a small amount of French&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;External Resources&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Project Website: https://www.informatik.uni-leipzig.de/~jtiepmar/forschung/gwtc/&lt;/li&gt;
	&lt;li&gt;Bitbucket: https://bitbucket.org/jtiepmar/the-game-walkthrough-corpus/src/master/&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;There are two versions of the GWTC available for download. Ver. 0.99 contains all of the above corpus files plus the Git files. Note that after downloading ver. 0.99, the Git folders may be hidden by default, depending on your operating system. Ver. 1.0 is a cleaned-up version that comes without the Git files.&lt;/p&gt;</description>
  </descriptions>
</resource>
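
The record above can be read programmatically. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module, of how a few core fields might be pulled out of a DataCite kernel-4 export. The function name `summarize_datacite` and the shortened `SAMPLE` string are illustrative assumptions; the namespace URI, element names, and values are copied from the record above.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <resource> element of the export.
NS = {"dc": "http://datacite.org/schema/kernel-4"}

# Shortened copy of the record, kept only for illustration (hypothetical excerpt).
SAMPLE = """
<resource xmlns="http://datacite.org/schema/kernel-4">
  <identifier identifierType="DOI">10.5281/zenodo.4562336</identifier>
  <creators>
    <creator><creatorName>Tiepmar, Jochen</creatorName></creator>
    <creator><creatorName>Burghardt, Manuel</creatorName></creator>
  </creators>
  <titles><title>Game Walkthrough Corpus (GWTC)</title></titles>
  <publicationYear>2021</publicationYear>
  <version>1.0</version>
  <rightsList>
    <rights rightsURI="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
</resource>
"""


def summarize_datacite(xml_text: str) -> dict:
    """Return a small dictionary of core fields from a DataCite kernel-4 record."""
    # Parse from bytes so an XML declaration with an encoding is handled cleanly.
    root = ET.fromstring(xml_text.strip().encode("utf-8"))

    def text(path: str):
        # Return the stripped text of the first match, or None if it is absent.
        node = root.find(path, NS)
        return node.text.strip() if node is not None and node.text else None

    return {
        "doi": text("dc:identifier[@identifierType='DOI']"),
        "title": text("dc:titles/dc:title"),
        "creators": [
            c.text for c in root.findall("dc:creators/dc:creator/dc:creatorName", NS)
        ],
        "year": text("dc:publicationYear"),
        "version": text("dc:version"),
        "rights_uris": [
            r.get("rightsURI") for r in root.findall("dc:rightsList/dc:rights", NS)
        ],
    }


if __name__ == "__main__":
    for key, value in summarize_datacite(SAMPLE).items():
        print(f"{key}: {value}")
```

Because every element in the export lives in the kernel-4 namespace, each path segment carries the `dc:` prefix; an unprefixed path such as `titles/title` would not match anything.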