Published February 13, 2016 | Version v1

Q&A dataset of Naver KiN Here

  • KAIST

Description

We crawled the publicly accessible local questions and answers posted on Naver KiN Here (NKH) from December 17, 2012 to December 31, 2013, obtaining a total of 508,334 questions and 567,156 answers. NKH questions are accessible on the web because web users can also answer them. However, the site lists only questions that are less than one month old, so we scraped the question listing pages every other week. The collected URLs were then used to download the question pages, each of which contains all of its answers and their answerers.

From these pages, we extracted a set of fields for analysis: user information (e.g., ID, question closing rate, and answer acceptance rate), title, content, posting time, categorized region, and posting coordinates (i.e., latitude and longitude). Similarly, we extracted all fields of each answer (e.g., answerer ID, posting time, and answerer status information). To extract the fields, we manually inspected the HTML page format and wrote a parser based on regular expressions.
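The every-other-week schedule works because the scrape interval is shorter than the period a question stays on the listing pages. The Python sketch below checks that reasoning under stated assumptions: "less than one month old" is modeled as a 30-day visibility window, time is tracked at day granularity, and scraping continues until the first run on or after December 31, 2013. These details are assumptions for illustration, and the function names are hypothetical, not part of the original crawler.

```python
from datetime import date, timedelta

# Collection window stated in the dataset description.
START = date(2012, 12, 17)
END = date(2013, 12, 31)

# Assumptions (not stated in the description): a question is listed for
# 30 days after posting, and scrapes run every 14 days starting on START.
VISIBLE_DAYS = 30
INTERVAL_DAYS = 14


def scrape_dates(start: date, end: date, interval_days: int) -> list[date]:
    """Fixed-interval schedule, extended until one scrape falls on or after `end`."""
    dates = [start]
    while dates[-1] < end:
        dates.append(dates[-1] + timedelta(days=interval_days))
    return dates


def uncovered_days(schedule: list[date], start: date, end: date,
                   visible_days: int) -> list[date]:
    """Posting days whose questions would never appear on any scraped listing.

    A question posted on day p is seen by a scrape on day s if
    p <= s < p + visible_days (i.e., it is still less than `visible_days` old).
    """
    missed = []
    p = start
    while p <= end:
        window_end = p + timedelta(days=visible_days)
        if not any(p <= s < window_end for s in schedule):
            missed.append(p)
        p += timedelta(days=1)
    return missed


schedule = scrape_dates(START, END, INTERVAL_DAYS)
print(len(schedule), uncovered_days(schedule, START, END, VISIBLE_DAYS))
```

With a 14-day interval the check reports no missed posting days; raising the interval above the 30-day visibility window starts to leave gaps.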
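The description says the fields were pulled out with hand-written regular expressions but does not give the page markup. The Python sketch below illustrates that style of parser on an invented HTML fragment: the tag names, class names, and `data-lat`/`data-lng` attributes are hypothetical placeholders, not the actual Naver KiN Here format, and the sample values are made up. Real patterns would have to be written after inspecting the actual pages.

```python
import re

# Hypothetical markup for illustration only; the real page format is not
# documented in this record.
SAMPLE_PAGE = """
<div class="question">
  <span class="user_id">example_user</span>
  <h3 class="title">Example question title</h3>
  <span class="posted">2013-05-02 14:31</span>
  <span class="region">Example region</span>
  <span class="coord" data-lat="36.3725" data-lng="127.3605"></span>
</div>
"""

# One precompiled pattern per field; group 1 captures the field value.
FIELD_PATTERNS = {
    "user_id": re.compile(r'<span class="user_id">(.*?)</span>'),
    "title": re.compile(r'<h3 class="title">(.*?)</h3>', re.S),
    "posted": re.compile(r'<span class="posted">(.*?)</span>'),
    "region": re.compile(r'<span class="region">(.*?)</span>'),
    "latitude": re.compile(r'data-lat="([-0-9.]+)"'),
    "longitude": re.compile(r'data-lng="([-0-9.]+)"'),
}


def parse_question(html: str) -> dict:
    """Extract question fields; a field absent from the page maps to None."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        m = pattern.search(html)
        record[name] = m.group(1).strip() if m else None
    # Convert coordinates to floats when they were found.
    for key in ("latitude", "longitude"):
        if record[key] is not None:
            record[key] = float(record[key])
    return record


print(parse_question(SAMPLE_PAGE))
```

Returning None for missing fields, rather than raising, keeps a crawl over hundreds of thousands of pages running when a page deviates from the expected layout; such records can be counted and inspected afterwards.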

Files

Files (780.7 MB)

Checksum: md5:17a60032d3e7799a65d95666eacb6c64
Size: 780.7 MB
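Given the size of the archive, it is worth checking a download against the published MD5 checksum. A minimal Python sketch using the standard `hashlib` module follows. The function names are hypothetical, and the path passed to `verify_download` is a placeholder, since this record does not show the file name.

```python
import hashlib

# Checksum published in the Files section of this record.
EXPECTED_MD5 = "md5:17a60032d3e7799a65d95666eacb6c64"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so the whole archive is never held in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare the file's digest with `expected`, accepting an optional "md5:" prefix."""
    return md5_of_file(path) == expected.removeprefix("md5:").lower()
```

For example, `verify_download("path/to/downloaded_file")` returns True only if the local copy matches the published digest. The `removeprefix` call requires Python 3.9 or later.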