Published January 25, 2021 | Version 1
Dataset | Open access

Dataset for the article: Language Bias in the Google Scholar Ranking Algorithm

  • Department of Communication, Universitat Pompeu Fabra, 08002 Barcelona, Spain

Description

Research data for the article by Cristòfol Rovira, Lluís Codina and Carlos Lopezosa, "Language Bias in the Google Scholar Ranking Algorithm", Future Internet, 2021, 13.

Abstract: The visibility of academic articles or conference papers depends on their being easily found in academic search engines, above all in Google Scholar. To enhance this visibility, search engine optimization (SEO) has been applied in recent years to academic search engines in order to optimize documents and, thereby, ensure they are better ranked in search results pages (i.e., academic search engine optimization or ASEO). To achieve this degree of optimization, we first need to further our understanding of Google Scholar’s relevance ranking algorithm, so that, based on this knowledge, we can highlight or improve those characteristics that academic documents already present and which are taken into account by the algorithm. This study seeks to advance our knowledge in this line of research by determining whether the language in which a document is published is a positioning factor in the Google Scholar relevance ranking algorithm. Here, we employ a reverse engineering research methodology based on a statistical analysis that uses Spearman’s correlation coefficient. The results obtained point to a bias in multilingual searches conducted in Google Scholar, with documents published in languages other than English being systematically relegated to positions that make them virtually invisible. This finding has important repercussions, both for conducting searches and for optimizing positioning in Google Scholar, and is especially critical for articles on subjects that are expressed in the same way in English and other languages, as is the case, for example, with trademarks, chemical compounds, industrial products, acronyms, drugs and diseases.
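The abstract names Spearman's correlation coefficient as the core statistic. As a rough illustration only, the sketch below computes Spearman's rho between a result's position in a ranking and a binary language indicator (1 = English, 0 = another language). The variable coding, the sample data and the function names `average_ranks` and `spearman_rho` are hypothetical assumptions, not the authors' actual procedure or data. Because a binary indicator produces many ties, rho is computed as the Pearson correlation of average ranks, which stays exact under ties (the shortcut formula 1 − 6Σd²/(n(n²−1)) does not).

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (tie-safe)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical search: positions 1..10, and whether each result is in English.
    positions = list(range(1, 11))
    is_english = [1, 1, 1, 1, 1, 1, 0, 1, 0, 0]
    rho = spearman_rho(positions, is_english)
    # A negative rho means English results cluster at the top (low positions).
    print(round(rho, 3))
```

With this coding, a strongly negative rho across many queries would be consistent with the kind of relegation of non-English documents the abstract describes; a value near zero would suggest language is not associated with position.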

Files

Files (499.8 kB)

  • 167.2 kB, md5:2d47e1ef46bf8012f18b3c7429801e4f
  • 173.6 kB, md5:70473144fd0b2a4bb2123219f21803dc
  • 159.0 kB, md5:85e1f87b67feb5f828efa9a2c8f19cb6
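Each file is listed with an MD5 checksum, which can be used to confirm that a download is intact. The sketch below is a minimal way to do this in Python, assuming the files have been saved locally; the file names were not captured on this record, so the function names and the `downloads` folder used in the example are hypothetical.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed on this record (file names not available here).
EXPECTED_MD5 = {
    "2d47e1ef46bf8012f18b3c7429801e4f",
    "70473144fd0b2a4bb2123219f21803dc",
    "85e1f87b67feb5f828efa9a2c8f19cb6",
}


def md5_of(path, chunk_size=1 << 16):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path):
    """True if the file's MD5 equals one of the checksums listed above."""
    return md5_of(path) in EXPECTED_MD5


if __name__ == "__main__":
    folder = Path("downloads")  # hypothetical local folder with the files
    if folder.is_dir():
        for p in sorted(folder.iterdir()):
            if p.is_file():
                print(p.name, matches_listed_checksum(p))
```

Matching against the set of listed checksums, rather than a name-to-checksum map, avoids guessing the missing file names.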