Mining Social Science Publications for Survey Variables
Description
Research in the social sciences is usually based on survey data, where individual research questions relate to observable concepts (variables). However, because standards for data citation are lacking, reliably identifying the variables used in a publication is often difficult. In this paper, we present a work-in-progress study that addresses the variable detection task with supervised machine learning algorithms, using a linguistic analysis pipeline to extract a rich feature set that includes terminological concepts and similarity metric scores. We also present preliminary results on a small dataset designed specifically for this task, which show a significant increase in performance over the random baseline.
Files
Name | Size |
---|---|
document(1).pdf (md5:4e128a4fd9a433597db7ea9f9f36cd7d) | 140.5 kB |