Published September 30, 2021 | Version v1
Presentation Open

Keywords for data discovery

  • 1. GESIS

Description

Finding research data is often described as difficult or challenging (Brickley, Burgess, & Noy, 2019; Chapman et al., 2020), especially in comparison to literature search (Kern & Mathiak, 2015). From observation (Krämer, Papenmeier, Carevic, Kern, & Mathiak, 2021) and surveys (Gregory, Groth, Scharnhorst, & Wyatt, 2020; Friedrich, 2020) we know that data discovery is a complex process that involves reviewing literature, using data portals, reading documentation, and leveraging personal networks. However, the glue that holds all these steps together is the common web search, e.g., via Google. Because there is no central, fully indexed repository, individual data repositories are responsible for making their data visible to web search. In this paper we explore how research data is found via general web search by applying clustering techniques to the queries made to Google, which we retrieved via the Google Search Console. The clustering is based on two keyword features: their probabilities in the queries and their Comparable Click Through Rate (CCTR). The latter is a normalized version of the click-through rate (CTR) that allows keywords to be compared. We use the query logs of three data portals from the social sciences domain, run by two different institutions, together with a JSON file of dataset mentions in research papers taken from the Social Science Open Access Repository (SSOAR). The use case we are most interested in is known-item search, in which a dataset is retrieved by a name that has been communicated through the literature or personal communication. These names are often ambiguous, such as acronyms or common nouns, so researchers add further keywords to find the dataset’s website. The results of our analysis provide a set of keywords that, when systematically added in appropriate locations on research data landing pages, can help make them more discoverable.
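As a rough illustration of the two keyword features named above, the following Python sketch derives a per-keyword query probability and a click-through rate from query-log rows. Everything in it is an assumption made for illustration: the row layout `(query, clicks, impressions)`, the function name `keyword_features`, and the sample numbers are hypothetical. In particular, the description does not define CCTR, so the normalization used here (keyword CTR divided by the portal's overall CTR) is only a stand-in and should not be read as the authors' formula.

```python
from collections import defaultdict

# Hypothetical query-log rows: (query, clicks, impressions).
# The numbers are made up for illustration only.
rows = [
    ("allbus", 40, 100),
    ("allbus data", 30, 50),
    ("allbus download", 20, 50),
]


def keyword_features(rows):
    """Return {keyword: (probability, ctr, normalized_ctr)}."""
    total_queries = len(rows)
    total_clicks = sum(c for _, c, _ in rows)
    total_impressions = sum(i for _, _, i in rows)
    overall_ctr = total_clicks / total_impressions if total_impressions else 0.0

    query_count = defaultdict(int)
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for query, c, i in rows:
        # Count each keyword once per query, split on whitespace.
        for kw in set(query.lower().split()):
            query_count[kw] += 1
            clicks[kw] += c
            impressions[kw] += i

    features = {}
    for kw in query_count:
        # Share of queries that contain the keyword.
        probability = query_count[kw] / total_queries
        # Plain click-through rate over all queries containing the keyword.
        ctr = clicks[kw] / impressions[kw] if impressions[kw] else 0.0
        # ASSUMPTION: stand-in normalization, not the CCTR definition.
        normalized = ctr / overall_ctr if overall_ctr else 0.0
        features[kw] = (probability, ctr, normalized)
    return features


features = keyword_features(rows)
```

The resulting `(probability, normalized_ctr)` pairs could then be passed as two-dimensional points to a clustering routine of one's choice; which algorithm the authors used is not stated in the description.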

Files

KeywordsForDataDiscovery-MathiakLimaniYounes.pdf (144.8 kB)
md5:a01468dd8b11c31faa6e14fc22cbba28