Published December 7, 2015 | Version v1
Dataset Open

What Makes Sports Fans Interactive? Identifying Factors Affecting Chat Interactions in Online Sports Viewing

Description

 

 

1. Chat message data

- kbo2011_naverbasevall_comment.sql: chat messages in Naver Sports that users posted in 2011
- kbo2012_naverbasevall_comment.sql: chat messages in Naver Sports that users posted in 2012

Each MySQL table consists of the following six columns.
1. comment_id: Integer variable to identify each comment
2. gid: Integer variable to identify each game
3. uid: Integer variable to identify each user
4. category_id: A team name that a user selected when a chat message is posted
5. content: Chat message text
6. post_time_relative: Elapsed seconds since the game starts
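
As one way to work with these columns, the sketch below counts chat messages per minute of game time by bucketing post_time_relative. It assumes the rows have already been loaded as tuples in the column order listed above; the function name and the sample rows are made up for illustration and are not part of the dataset.

```python
from collections import Counter


def messages_per_minute(rows):
    """Count chat messages per (gid, minute of game time).

    `rows` is an iterable of tuples in the column order
    (comment_id, gid, uid, category_id, content, post_time_relative).
    """
    counts = Counter()
    for _comment_id, gid, _uid, _category_id, _content, seconds in rows:
        # Floor division turns elapsed seconds into a minute bucket.
        minute = int(seconds) // 60
        counts[(gid, minute)] += 1
    return counts


# Made-up rows for illustration only; they are not taken from the dataset.
sample_rows = [
    (1, 100, 7, "LG", "first pitch!", 12),
    (2, 100, 8, "KIA", "nice catch", 45),
    (3, 100, 7, "LG", "home run!", 75),
    (4, 101, 9, "SK", "let's go", 130),
]
per_minute = messages_per_minute(sample_rows)
```

Keying the counter on (gid, minute) keeps games separate, so the same function can be run over a whole season at once.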

 

2. KBO game information

- game_basic.sql: KBO game information in 2011/2012

(This data was collected from http://sports.news.naver.com/schedule/index.nhn?uCategory=&category=kbo&year=2010&month=03)

This MySQL table consists of the following 12 columns.
1. gid: Integer variable to identify each game
2. game: Game ID used in Naver Sports
3. year: year in which the game was played
4. month: month in which the game was played
5. date: date on which the game was played
6. time: time at which the game was played
7. day: day on which the game was played
8. team_away: name of the away team
9. team_home: name of the home team
10. score_away: score by the away team
11. score_home: score by the home team
12. location: location where the game was played
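
A minimal sketch of how the two score columns might be turned into a game outcome. The function name and the "home"/"away"/"draw" labels are illustrative choices, not values stored in the table.

```python
def game_outcome(score_away, score_home):
    """Label a game from its final scores (score_away, score_home)."""
    if score_home > score_away:
        return "home"
    if score_away > score_home:
        return "away"
    # Equal final scores are treated as a tie.
    return "draw"
```

The result can be joined to the chat and play-by-play tables through gid, which all of the tables share.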

 

3. KBO game play-by-play data 

- game_playbyplay.sql: play-by-play data for 2011/2012

(This data was collected from Naver Sports; e.g., http://sports.news.naver.com/gameCenter/gameRecord.nhn?category=kbo&gameId=20100327HHSK0)

This MySQL table consists of the following 21 columns.

1. play_no: Integer variable to identify each play
2. gid: Integer variable to identify each game
3. inning: inning of the play
4. pitcher: current pitcher name
5. batter: current batter name
6. batter_no: current batter order
7. pitch: the pitch result
8. strike: current number of strikes
9. ball: current number of balls
10. out_cnt: current number of outs
11. score_home: current score of the home team
12. score_away: current score of the away team
13. 1st_base: name of the player on 1st base
14. 2nd_base: name of the player on 2nd base
15. 3rd_base: name of the player on 3rd base
16. runner_state: overall state of the three bases (e.g., 1 means a player is on 1st base while 2nd and 3rd bases are empty; 13 means players are on 1st and 3rd bases while 2nd base is empty)
17. runner_cnt: the number of players on bases
18. result_r: runs (R) scored on this play
19. result_rbi: runs batted in (RBI) on this play
20. result_score_home: total home score after this play
21. result_score_away: total away score after this play
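
The runner_state encoding described above can be decoded roughly as follows. This sketch assumes each digit 1, 2, or 3 in the value names an occupied base and that anything else (for example 0, an empty string, or NULL) means no runner; how empty bases are actually stored is not stated here, so check the data before relying on it. The function name is hypothetical.

```python
def occupied_bases(runner_state):
    """Return the sorted list of occupied bases encoded in runner_state.

    Each digit 1, 2, or 3 is read as an occupied base; any other
    character is ignored (an assumption about how empty bases appear).
    """
    if runner_state is None:
        return []
    digits = str(runner_state).strip()
    return sorted({int(ch) for ch in digits if ch in "123"})


# Examples from the column description: 1 and 13.
first_only = occupied_bases(1)
first_and_third = occupied_bases(13)
```

Under the same assumption, the length of the returned list should agree with runner_cnt, which gives a quick consistency check on loaded rows.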

 

4. Winning rate data 

- game_winrate.sql: winning rates of the home/away teams for an inning

(This data is calculated from the play-by-play data. For details, please refer to our paper, currently under review.)

This MySQL table consists of the following six columns.

1. play_no: Integer variable to identify each play
2. gid: Integer variable to identify each game
3. inning: inning of the play
4. away_win: probability that the away team will win this game
5. draw: probability that the game will end in a tie
6. home_win: probability that the home team will win this game
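
One possible use of these columns is to measure how much each play shifts the home team's win probability. The sketch below assumes the rows belong to a single game and that play_no orders plays chronologically, which is not stated in this description; the function name and the sample values are made up for illustration.

```python
def home_win_swings(rows):
    """Return (play_no, change in home_win) for consecutive plays.

    `rows` is an iterable of tuples in the column order
    (play_no, gid, inning, away_win, draw, home_win) for one game.
    """
    # Assumption: sorting by play_no gives chronological order.
    ordered = sorted(rows, key=lambda row: row[0])
    swings = []
    for previous, current in zip(ordered, ordered[1:]):
        swings.append((current[0], current[5] - previous[5]))
    return swings


# Made-up probabilities for illustration only.
sample_rows = [
    (1, 100, 1, 0.5, 0.0, 0.5),
    (2, 100, 1, 0.25, 0.0, 0.75),
    (3, 100, 2, 0.75, 0.0, 0.25),
]
swings = home_win_swings(sample_rows)
```

Because play_no and gid also appear in the play-by-play table, large swings can be traced back to the plays that caused them.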

 

If you have any questions about the dataset, please contact Minsam Ko (msko@kaist.ac.kr).

Files

Files (2.6 GB)

md5:f433e2e626a40d5f210860c3ec768021 (97.9 kB)
md5:9fcfed96448ef53d54949094389679ac (35.4 MB)
md5:bd7cc938f844bc79be7a408b89064d4b (3.6 MB)
md5:49cdb144c52246a2f8eb57ea15ed0752 (1.5 GB)
md5:16f8af23ab24460d87d06dd8a2948326 (1.1 GB)
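
Since the larger files exceed 1 GB, it may be worth checking a download against its listed MD5 checksum. The following is a minimal sketch using Python's standard hashlib module; the function names are hypothetical, and the file path passed to file_matches is whatever local path the file was saved under.

```python
import hashlib
import io


def md5_hex(stream, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    # Reading in chunks avoids loading a multi-gigabyte file into memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def normalize_checksum(text):
    """Strip an optional 'md5:' prefix and lowercase the digest."""
    text = text.strip().lower()
    return text[4:] if text.startswith("md5:") else text


def file_matches(path, expected):
    """Compare a file's MD5 digest with a listed checksum string."""
    with open(path, "rb") as handle:
        return md5_hex(handle) == normalize_checksum(expected)


# Self-check on an in-memory stream, so no download is needed to try it.
demo_digest = md5_hex(io.BytesIO(b"abc"))
```

The expected value can be pasted with or without its md5: prefix, since normalize_checksum removes it before comparing.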