Book section Embargoed Access

Linguistic Bias in Crowdsourced Biographies: A Cross-lingual Examination

Jahna Otterbacher; Ioannis Katakis; Pantelis Agathangelou


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam##2200000uu#4500</leader>
  <datafield tag="942" ind1=" " ind2=" ">
    <subfield code="a">2020-03-31</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">S. Downes, New technology supporting informal learning, Journal of Emerging Technologies in Web Intelligence. 2(1), 27–33 (2010).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">A. Forte and A. Bruckman. From wikipedia to the classroom: Exploring online publication and learning. In Proceedings of the 7th international conference on Learning sciences, pp. 182–188 (2006).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">M. Strube and S. P. Ponzetto. Wikirelate! computing semantic relatedness using wikipedia. In AAAI, vol. 6, pp. 1419–1424 (2006).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">G. Giannakopoulos, M. El-Haj, B. Favre, M. Litvak, J. Steinberger, and V. Varma, Tac 2011 multiling pilot overview (2011).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">H.-F. Yu, P. Jain, P. Kar, and I. Dhillon. Large-scale multi-label learning with missing labels. In International Conference on Machine Learning, pp. 593–601 (2014).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">M. Kimura, K. Saito, and R. Nakano. Extracting influential nodes for information diffusion on a social network. In AAAI, vol. 7, pp. 1371–1376 (2007).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">A. Capocci, V. D. Servedio, F. Colaiori, L. S. Buriol, D. Donato, S. Leonardi, and G. Caldarelli, Preferential attachment in the growth of social networks: The internet encyclopedia wikipedia, Physical review E. 74(3), 036116 (2006).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">M. Hu, E.-P. Lim, A. Sun, H. W. Lauw, and B.-Q. Vuong. Measuring article quality in wikipedia: models and evaluation. In Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, pp. 243–252 (2007).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">T. Wöhner and R. Peters. Assessing the quality of wikipedia articles with lifecycle based metrics. In Proceedings of the 5th International Symposium on Wikis and Open Collaboration, p. 16 (2009).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">D. Hasan Dalip, M. André Gonçalves, M. Cristo, and P. Calado. Automatic quality assessment of content created collaboratively by web communities: a case study of wikipedia. In Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries, pp. 295–304 (2009).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">J. E. Blumenstock. Size matters: word count as a measure of quality on wikipedia. In Proceedings of the 17th international conference on World Wide Web, pp. 1095–1096 (2008).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">A. Kittur and R. E. Kraut. Harnessing the wisdom of crowds in wikipedia: Quality through coordination. In Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work, CSCW '08, pp. 37–46, ACM, New York, NY, USA (2008). ISBN 978-1-60558-007-4. doi: 10.1145/1460563.1460572. URL http://doi.acm.org/10.1145/1460563.1460572.</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">C. Pentzold, Fixing the Floating Gap: The Online Encyclopedia Wikipedia as a Global Memory Place, Memory Studies. 2(2), 255–272 (2009).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">L. Flekova, O. Ferschke, and I. Gurevych. What makes a good biography?: Multidimensional quality analysis based on wikipedia article feedback data. In Proceedings of the 23rd International Conference on World Wide Web, WWW '14, pp. 855–866, ACM, New York, NY, USA (2014). ISBN 978-1-4503-2744-2. doi: 10.1145/2566486.2567972. URL http://doi.acm.org/10.1145/2566486.2567972.</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">A. Maass. Linguistic Intergroup Bias: Stereotype Perpetuation through Language. In ed. M. Zanna, Advances in Experimental Social Psychology, pp. 79–121. Academic Press, San Diego, CA (1999).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">W. von Hippel, D. Sekaquaptewa, and P. Vargas, The Linguistic Intergroup Bias as an Implicit Indicator of Prejudice, Journal of Experimental Social Psychology. 33, 490–509 (1997).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">C. Beukeboom. Mechanisms of Linguistic Bias: How Words Reflect and Maintain Stereotypic Expectations. In eds. J. Laszlo, J. Forgas, and O. Vincze, Social Cognition and Communication, pp. 313–330. Psychology Press, New York, NY (2013).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">J. Otterbacher. Linguistic bias in collaboratively produced biographies: crowdsourcing social stereotypes? In ICWSM, pp. 298–307 (2015).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">A. Maass, D. Salvi, L. Arcuri, and G. Semin, Language Use in Intergroup Context: The Linguistic Intergroup Bias, Journal of Personality and Social Psychology. 57(6), 981–993 (1989).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">P. Agathangelou, I. Katakis, I. Koutoulakis, F. Kokkoras, and D. Gunopulos, Learning patterns for discovering domain-oriented opinion words, Knowledge and Information Systems. pp. 1–33 (2017).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">G. Semin and K. Fiedler, The Cognitive Functions of Linguistic Categories in Describing Persons: Social Cognition and Language, Journal of Personality and Social Psychology. 54, 558–568 (1988).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">L. Coenen, L. Hedebouw, and G. Semin. Measuring Language Abstraction: The Linguistic Category Model Manual. Technical report, Free University Amsterdam, Amsterdam, The Netherlands (June, 2006). URL http://www.cratylus.org/resources/uploadedFiles/1151434261594-8567.pdf.</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">P. Winkielman, J. Halberstadt, T. Fazendeiro, and S. Catty, Prototypes are Attractive because they are Easy on the Mind, Psychological Science. 17(9), 799–806 (2006).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">D. Wigboldus, R. Spears, and G. Semin, When do We Communicate Stereotypes? Influence of the Social Context on the Linguistic Expectancy Bias, Group Processes &amp; Intergroup Relations. 8(3), 215–230 (2005).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">A. Hunt. The Linguistic Expectancy Bias and the American Mass Media. PhD thesis, Temple University, Philadelphia, PA (2011).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">J. Otterbacher. Crowdsourcing stereotypes: Linguistic bias in metadata generated via gwap. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pp. 1955–1964 (2015).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">P. Devine and A. Elliot, Are Racial Stereotypes Really Fading? The Princeton Trilogy Revisited, Personality and Social Psychology Bulletin. 21(11), 1139–1150 (1995).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">P. Agathangelou, I. Katakis, I. Koutoulakis, F. Kokkoras, and D. Gunopulos, Learning patterns for discovering domain oriented opinion words, Knowledge and Information Systems (2017).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">P. Agathangelou, I. Katakis, F. Kokkoras, and K. Ntonas, Mining Domain-Specific Dictionaries of Opinion Words, In eds. B. Benatallah, A. Bestavros, Y. Manolopoulos, A. Vakali, and Y. Zhang, Web Information Systems Engineering – WISE 2014: 15th International Conference, Thessaloniki, Greece, October 12-14, 2014, Proceedings, Part I, pp. 47–62. Springer International Publishing (2014).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '04, pp. 168–177, ACM, New York, NY, USA (2004). ISBN 1-58113-888-1. doi: 10.1145/1014052.1014073. URL http://doi.acm.org/10.1145/1014052.1014073.</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">J. D. Gibbons and S. Chakraborti. Nonparametric statistical inference. In International encyclopedia of statistical science, pp. 977–979. Springer (2011).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">P. Willett, The porter stemming algorithm: then and now, Program. 40(3), 219–223 (2006).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">S. T. Fiske, A. J. Cuddy, and P. Glick, Universal dimensions of social cognition: Warmth and competence, Trends in cognitive sciences. 11(2), 77–83 (2007).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">S. T. Fiske, A. J. Cuddy, P. Glick, and J. Xu, A model of (often mixed) stereotype content: competence and warmth respectively follow from perceived status and competition, Journal of personality and social psychology. 82(6), 878 (2002).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">C. Wagner, D. Garcia, M. Jadidi, and M. Strohmaier. It's a man's wikipedia? assessing gender inequality in an online encyclopedia. In Proceedings of the AAAI International Conference on Web and Social Media (ICWSM), pp. 454–463 (2015).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">E. Graells-Garrido, M. Lalmas, and F. Menczer. First women, second sex: gender bias in wikipedia. In Proceedings of the 26th ACM Conference on Hypertext &amp; Social Media, pp. 165–174 (2015).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">J. Cohen, P. Cohen, S. G. West, and L. S. Aiken, Applied multiple regression/correlation analysis for the behavioral sciences. Routledge (2013).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">H. Abdi and L. J. Williams, Tukey's honestly significant difference (hsd) test, Encyclopedia of Research Design. Thousand Oaks, CA: Sage. pp. 1–5 (2010).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">J. M. Hilbe. Logistic regression. In International Encyclopedia of Statistical Science, pp. 755–758. Springer (2011).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">D. Radev, J. Otterbacher, A. Winkel, and S. Blair-Goldensohn, Newsinessence: summarizing online news topics, Communications of the ACM. 48(10), 95–98 (2005).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">J. Antin, R. Yee, C. Cheshire, and O. Nov. Gender differences in wikipedia editing. In Proceedings of the 7th international symposium on Wikis and open collaboration, pp. 11–14 (2011).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">B. Collier and J. Bear. Conflict, criticism, or confidence: an empirical examination of the gender gap in wikipedia contributions. In Proceedings of the ACM 2012 conference on computer supported cooperative work, pp. 383–392 (2012).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">C. Wagner, E. Graells-Garrido, D. Garcia, and F. Menczer, Women through the glass ceiling: gender asymmetries in wikipedia, EPJ Data Science. 5(1), 5 (2016).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">E. S. Callahan and S. C. Herring, Cultural bias in wikipedia content on famous persons, Journal of the Association for Information Science and Technology. 62(10), 1899–1915 (2011).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">I. Protonotarios, V. Sarimpei, and J. Otterbacher. Similar gaps, different origins? women readers and editors at greek wikipedia. In Wiki@ICWSM (2016).</subfield>
  </datafield>
  <datafield tag="999" ind1="C" ind2="5">
    <subfield code="x">T. Wilson, J. Wiebe, and P. Hoffmann. Recognizing Contextual Polarity in Phrase-level Sentiment Analysis. In Proceedings of the ACL HLT / EMNLP (2005).</subfield>
  </datafield>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Linguistic Bias</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Crowdsourced Biographies</subfield>
  </datafield>
  <controlfield tag="005">20190512090957.0</controlfield>
  <datafield tag="500" ind1=" " ind2=" ">
    <subfield code="a">This work has been partly supported by the project that has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 739578 (RISE – Call: H2020-WIDESPREAD-01-2016-2017-TeamingPhase2) and the Government of the Republic of Cyprus through the Directorate General for European Programmes, Coordination and Development.

Electronic version of a book chapter article published as Multilingual Text Analysis Challenges, Models, and Approaches, 2019, 411–440, https://doi.org/10.1142/9789813274884_0012 © 2019 World Scientific Publishing Company, https://www.worldscientific.com/worldscibooks/10.1142/11116 .</subfield>
  </datafield>
  <controlfield tag="001">2671672</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Computer Science Department University of Nicosia Nicosia, Cyprus</subfield>
    <subfield code="a">Ioannis Katakis</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Computer Science Department University of Nicosia Nicosia, Cyprus</subfield>
    <subfield code="0">(orcid)0000-0002-9897-4879</subfield>
    <subfield code="a">Pantelis Agathangelou</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">embargoed</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2019-03-01</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">user-rise-teaming-cyprus</subfield>
    <subfield code="o">oai:zenodo.org:2671672</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">Research Centre on Interactive Media, Smart Systems and Emerging Technologies &amp; Cyprus Center for Algorithmic Transparency Open University of Cyprus Nicosia, Cyprus</subfield>
    <subfield code="0">(orcid)0000-0002-7655-7118</subfield>
    <subfield code="a">Jahna Otterbacher</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Linguistic Bias in Crowdsourced Biographies: A Cross-lingual Examination</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-rise-teaming-cyprus</subfield>
  </datafield>
  <datafield tag="536" ind1=" " ind2=" ">
    <subfield code="c">739578</subfield>
    <subfield code="a">Research Center on Interactive Media, Smart System and Emerging Technologies</subfield>
  </datafield>
  <datafield tag="536" ind1=" " ind2=" ">
    <subfield code="c">810105</subfield>
    <subfield code="a">Cyprus Center for Algorithmic Transparency</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">http://creativecommons.org/licenses/by-nc-nd/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution Non Commercial No Derivatives 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">Biographies make up a significant portion of Wikipedia entries and are a source of information and inspiration for the public. We examine a threat to their objectivity, linguistic biases, which are pervasive in human communication. Linguistic bias, the systematic asymmetry in the language used to describe people as a function of their social groups, plays a role in the perpetuation of stereotypes. Theory predicts that we describe people who are expected – because they are members of our own in-groups or are stereotype-congruent – with more abstract, subjective language, as compared to others. Abstract language has the power to sway our impressions of others as it implies stability over time. Extending our monolingual work, we consider biographies of intellectuals at the English- and Greek-language Wikipedias. We use our recently introduced sentiment analysis tool, DidaxTo, which extracts domain-specific opinion words to build lexicons of subjective words in each language and for each gender, and compare the extent to which abstract language is used. Contrary to expectation, we find evidence of gender-based linguistic bias, with women being described more abstractly as compared to men. However, this is limited to English-language biographies. We discuss the implications of using DidaxTo to monitor linguistic bias in texts produced via crowdsourcing.</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="g">411–440</subfield>
    <subfield code="b">World Scientific</subfield>
    <subfield code="z">978-981-3274-87-7</subfield>
    <subfield code="t">Multilingual Text Analysis Challenges, Models, and Approaches</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.1142/9789813274884_0012</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">publication</subfield>
    <subfield code="b">section</subfield>
  </datafield>
</record>