Published November 20, 2019 | Version ainl2019
Presentation | Restricted

Crowdsourcing for Language Resources and Evaluation

  • Yandex

Description

Crowdsourcing is an efficient approach to knowledge acquisition and data annotation that enables building impressive human-computer systems. In this tutorial we will discuss the relationship between Crowdsourcing and Natural Language Processing, focusing on its practical use for constructing and evaluating Language Resources. We will describe the established genres of crowdsourcing, illustrate their strengths and weaknesses with real-world examples and case studies, and provide recommendations for ensuring the high quality of crowdsourced annotation.
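To give a flavour of the quality-assurance techniques the tutorial covers (redundant labelling, golden control tasks, and answer aggregation; cf. Sheng et al. 2008; Oleson et al. 2011), here is a minimal Python sketch that is not part of the published slides: it aggregates redundant crowd labels by majority vote and screens workers by their accuracy on control tasks. All task and worker identifiers are invented for illustration.

```python
# Illustrative sketch (not from the slides): majority-vote aggregation of
# crowdsourced labels with golden-task screening of workers.
from collections import Counter, defaultdict

# (task_id, worker_id, label) triples as they might arrive from a platform;
# the data below is made up for the example.
answers = [
    ("t1", "w1", "positive"), ("t1", "w2", "positive"), ("t1", "w3", "negative"),
    ("t2", "w1", "negative"), ("t2", "w2", "negative"), ("t2", "w3", "negative"),
]

# Known answers for control ("golden") tasks used to estimate worker accuracy.
golden = {"t2": "negative"}

def worker_accuracy(answers, golden):
    """Fraction of correct answers each worker gave on the golden tasks."""
    correct, total = defaultdict(int), defaultdict(int)
    for task, worker, label in answers:
        if task in golden:
            total[worker] += 1
            correct[worker] += int(label == golden[task])
    return {worker: correct[worker] / total[worker] for worker in total}

def majority_vote(answers, accuracy=None, min_accuracy=0.5):
    """Aggregate labels per task, ignoring workers below the accuracy threshold."""
    accuracy = accuracy or {}
    votes = defaultdict(list)
    for task, worker, label in answers:
        if accuracy.get(worker, 1.0) >= min_accuracy:
            votes[task].append(label)
    return {task: Counter(labels).most_common(1)[0][0] for task, labels in votes.items()}

acc = worker_accuracy(answers, golden)
print(acc)                                   # {'w1': 1.0, 'w2': 1.0, 'w3': 1.0}
print(majority_vote(answers, accuracy=acc))  # {'t1': 'positive', 't2': 'negative'}
```

More principled aggregation models listed in the references, such as Dawid and Skene (1979) or Whitehill et al. (2009), weight workers by their estimated expertise instead of filtering them with a hard threshold.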

Notes

These materials are published under a CC BY-NC-SA license. Please feel welcome to share them! For viewer convenience, the slides published on Zenodo do not include interactive step-by-step examples.

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Additional details

References

  • von Ahn L. and Dabbish L. (2004). Labeling Images with a Computer Game. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. CHI '04. Vienna, Austria: ACM, pp. 319–326. DOI: 10.1145/985692.985733.
  • von Ahn L. et al. (2008). reCAPTCHA: Human-Based Character Recognition via Web Security Measures. Science, vol. 321, no. 5895, pp. 1465–1468. DOI: 10.1126/science.1160379.
  • Alonso O. (2019). The Practice of Crowdsourcing. Ed. by G. Marchionini. Synthesis Lectures on Information Concepts, Retrieval, and Services. Morgan & Claypool Publishers. DOI: 10.2200/S00904ED1V01Y201903ICR066.
  • Alonso O., Rose D. E., and Stewart B. (2008). Crowdsourcing for Relevance Evaluation. SIGIR Forum, vol. 42, no. 2, pp. 9–15. DOI: 10.1145/1480506.1480508.
  • Artstein R. and Poesio M. (2008). Inter-Coder Agreement for Computational Linguistics. Computational Linguistics, vol. 34, no. 4, pp. 555–596. DOI: 10.1162/coli.07-034-R2.
  • Auer S. et al. (2007). DBpedia: A Nucleus for a Web of Open Data. The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11–15, 2007. Proceedings. Vol. 4825. Lecture Notes in Computer Science. Berlin and Heidelberg, Germany: Springer Berlin Heidelberg, pp. 722–735. DOI: 10.1007/978-3-540-76298-0_52.
  • Bernstein M. S. et al. (2010). Soylent: A Word Processor with a Crowd Inside. Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology. UIST '10. New York, NY, USA: ACM, pp. 313–322. DOI: 10.1145/1866029.1866078.
  • Biemann C. (2013). Creating a system for lexical substitutions from scratch using crowdsourcing. Language Resources and Evaluation, vol. 47, no. 1, pp. 97–122. DOI: 10.1007/s10579-012-9180-5.
  • Bird S., Klein E., and Loper E. (2017). Natural Language Processing with Python. 2nd Edition. O'Reilly Media.
  • Blumenstock J. E. (2008). Size Matters: Word Count As a Measure of Quality on Wikipedia. Proceedings of the 17th International Conference on World Wide Web. WWW '08. Beijing, China: ACM, pp. 1095–1096. DOI: 10.1145/1367497.1367673.
  • Bocharov V. et al. (2013). Crowdsourcing morphological annotation. Computational Linguistics and Intellectual Technologies: Papers from the Annual Conference "Dialogue". RGGU, pp. 109–124. URL: http://www.dialog-21.ru/media/1227/bocharovvv.pdf.
  • Bragg J., Mausam, and Weld D. S. (2018). Sprout: Crowd-Powered Task Design for Crowdsourcing. Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology. UIST '18. Berlin, Germany: ACM, pp. 165–176. DOI: 10.1145/3242587.3242598.
  • Braslavski P., Ustalov D., and Mukhin M. (2014). A Spinning Wheel for YARN: User Interface for a Crowdsourced Thesaurus. Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics. Gothenburg, Sweden: Association for Computational Linguistics, pp. 101–104. DOI: 10.3115/v1/E14-2026.
  • Callison-Burch C. (2009). Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon's Mechanical Turk. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. EMNLP 2009. Singapore: Association for Computational Linguistics and Asian Federation of Natural Language Processing, pp. 286–295. DOI: 10.3115/1699510.1699548.
  • Chang J. et al. (2009). Reading Tea Leaves: How Humans Interpret Topic Models. Advances in Neural Information Processing Systems 22. NIPS 2009. Vancouver, BC, Canada: Curran Associates, Inc., pp. 288–296. URL: https://papers.nips.cc/paper/3700-reading-tea-leaves-how-humans-interpret-topic-models.pdf.
  • Daniel F. et al. (2018). Quality Control in Crowdsourcing: A Survey of Quality Attributes, Assessment Techniques, and Assurance Actions. ACM Computing Surveys, vol. 51, no. 1, 7:1–7:40. DOI: 10.1145/3148148.
  • Dawid A. P. and Skene A. M. (1979). Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), vol. 28, no. 1, pp. 20–28. DOI: 10.2307/2346806.
  • Difallah D. E., Demartini G., and Cudré-Mauroux P. (2013). Pick-A-Crowd: Tell Me What You Like, and I'll Tell You What to Do. Proceedings of the 22nd International Conference on World Wide Web. WWW '13. Rio de Janeiro, Brazil: ACM, pp. 367–374. DOI: 10.1145/2488388.2488421.
  • Estellés-Arolas E. and González-Ladrón-de-Guevara F. (2012). Towards an integrated crowdsourcing definition. Journal of Information Science, vol. 38, no. 2, pp. 189–200. DOI: 10.1177/0165551512437638.
  • Esteves D. et al. (2018). Toward Veracity Assessment in RDF Knowledge Bases: An Exploratory Analysis. Journal of Data and Information Quality, vol. 9, no. 3: Special Issue on Improving the Veracity and Value of Big Data, 16:1–16:26. DOI: 10.1145/3177873.
  • Finin T. et al. (2010). Annotating Named Entities in Twitter Data with Crowdsourcing. Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk. CSLDAMT '10. Los Angeles, CA, USA: Association for Computational Linguistics, pp. 80–88. URL: https://aclweb.org/anthology/W10-0713.
  • Gadiraju U. et al. (2019). Crowd Anatomy Beyond the Good and Bad: Behavioral Traces for Crowd Worker Modeling and Pre-selection. Computer Supported Cooperative Work (CSCW), vol. 28, no. 5, pp. 815–841. DOI: 10.1007/s10606-018-9336-y.
  • Geiger R. S. and Halfaker A. (2013). When the Levee Breaks: Without Bots, What Happens to Wikipedia's Quality Control Processes? Proceedings of the 9th International Symposium on Open Collaboration. WikiSym '13. Hong Kong: ACM, 6:1–6:6. DOI: 10.1145/2491055.2491061.
  • Gurevych I. and J. Kim, eds. (2013). The People's Web Meets NLP: Collaboratively Constructed Language Resources. Berlin and Heidelberg, Germany: Springer-Verlag Berlin Heidelberg. DOI: 10.1007/978-3-642-35085-6.
  • Halfaker A. and Geiger R. S. (2019). ORES: Lowering Barriers with Participatory Machine Learning in Wikipedia. In review. arXiv: 1909.05189 [cs.HC].
  • Halfaker A. et al. (2013). The Rise and Decline of an Open Collaboration System: How Wikipedia's Reaction to Popularity Is Causing Its Decline. American Behavioral Scientist, vol. 57, no. 5, pp. 664–688. DOI: 10.1177/0002764212469365.
  • Hosseini M. et al. (2014). The Four Pillars of Crowdsourcing: A Reference Model. 2014 IEEE Eighth International Conference on Research Challenges in Information Science (RCIS). Marrakech, Morocco: IEEE, pp. 1–12. DOI: 10.1109/RCIS.2014.6861072.
  • Howe J. (2009). Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business. New York, NY, USA: Crown Publishing Group.
  • Jurgens D. and Navigli R. (2014). It's All Fun and Games until Someone Annotates: Video Games with a Purpose for Linguistic Annotation. Transactions of the Association for Computational Linguistics, vol. 2, pp. 449–464. DOI: 10.1162/tacl_a_00195.
  • Karger D. R., Oh S., and Shah D. (2014). Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems. Operations Research, vol. 62, no. 1, pp. 1–24. DOI: 10.1287/opre.2013.1235.
  • Kittur A. and Kraut R. E. (2008). Harnessing the Wisdom of Crowds in Wikipedia: Quality Through Coordination. Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work. CSCW '08. San Diego, CA, USA: ACM, pp. 37–46. DOI: 10.1145/1460563.1460572.
  • Krippendorff K. (2018). Content Analysis: An Introduction to Its Methodology. Fourth Edition. Thousand Oaks, CA, USA: SAGE Publications, Inc.
  • Kumar S., Spezzano F., and Subrahmanian V. (2015). VEWS: A Wikipedia Vandal Early Warning System. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '15. Sydney, NSW, Australia: ACM, pp. 607–616. DOI: 10.1145/2783258.2783367.
  • Meyer C. M. et al. (2014). DKPro Agreement: An Open-Source Java Library for Measuring Inter-Rater Agreement. Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations. COLING 2014. Dublin, Ireland: Dublin City University and Association for Computational Linguistics, pp. 105–109. URL: https://aclweb.org/anthology/C14-2023.
  • Navigli R. and Ponzetto S. P. (2012). BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, vol. 193, pp. 217–250. DOI: 10.1016/j.artint.2012.07.001.
  • Oleson D. et al. (2011). Programmatic Gold: Targeted and Scalable Quality Assurance in Crowdsourcing. Human Computation: Papers from the 2011 AAAI Workshop (WS-11-11). San Francisco, CA, USA: Association for the Advancement of Artificial Intelligence, pp. 43–48. URL: https://www.aaai.org/ocs/index.php/WS/AAAIW11/paper/view/3995.
  • Panchenko A. et al. (2018a). Improving Hypernymy Extraction with Distributional Semantic Classes. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japan: European Language Resources Association (ELRA), pp. 1541–1551. URL: http://www.lrec-conf.org/proceedings/lrec2018/summaries/234.html.
  • Panchenko A. et al. (2018b). RUSSE'2018: A Shared Task on Word Sense Induction for the Russian Language. Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue". Moscow, Russia: RSUH, pp. 547–564. URL: http://www.dialog-21.ru/media/4539/panchenkoaplusetal.pdf.
  • Poesio M. et al. (2013). Phrase Detectives: Utilizing Collective Intelligence for Internet-scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems, vol. 3, no. 1: Special section on internet-scale human problem solving and regular papers, 3:1–3:44. DOI: 10.1145/2448116.2448119.
  • Rodrigo E. G., Aledo J. A., and Gámez J. A. (2019). spark-crowd: A Spark Package for Learning from Crowdsourced Big Data. Journal of Machine Learning Research, vol. 20, pp. 1–5. URL: http://jmlr.org/papers/v20/17-743.html.
  • Sheng V. S., Provost F., and Ipeirotis P. G. (2008). Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '08. Las Vegas, NV, USA: ACM, pp. 614–622. DOI: 10.1145/1401890.1401965.
  • Sheshadri A. and Lease M. (2013). SQUARE: A Benchmark for Research on Computing Crowd Consensus. First AAAI Conference on Human Computation and Crowdsourcing. HCOMP 2013. Association for the Advancement of Artificial Intelligence, pp. 156–164. URL: https://www.aaai.org/ocs/index.php/HCOMP/HCOMP13/paper/view/7550.
  • Shishkin A. et al. (2020). Text Recognition Using Anonymous CAPTCHA Answers. Proceedings of the Thirteenth ACM International Conference on Web Search and Data Mining. WSDM '20. Forthcoming. Houston, TX, USA: ACM.
  • Snow R. et al. (2008). Cheap and Fast—but is It Good?: Evaluating Non-expert Annotations for Natural Language Tasks. Proceedings of the Conference on Empirical Methods in Natural Language Processing. EMNLP 2008. Honolulu, HI, USA: Association for Computational Linguistics, pp. 254–263. DOI: 10.3115/1613715.1613751.
  • Stvilia B. et al. (2008). Information Quality Work Organization in Wikipedia. Journal of the American Society for Information Science and Technology, vol. 59, no. 6, pp. 983–1001. DOI: 10.1002/asi.20813.
  • Ustalov D. (2014). Words Worth Attention: Predicting Words of the Week on the Russian Wiktionary. Knowledge Engineering and the Semantic Web, 5th International Conference, KESW 2014, Kazan, Russia, September 29–October 1, 2014. Proceedings. Vol. 468. Communications in Computer and Information Science. Cham, Switzerland: Springer International Publishing, pp. 196–207. DOI: 10.1007/978-3-319-11716-4_17.
  • Ustalov D. (2015a). A Crowdsourcing Engine for Mechanized Labor. Proceedings of the Institute for System Programming, vol. 27, no. 3, pp. 351–364. DOI: 10.15514/ISPRAS-2015-27(3)-25.
  • Ustalov D. (2015b). Teleboyarin—Mechanized Labor for Telegram. Proceedings of the AINL-ISMW FRUCT 2015, pp. 195–197. URL: https://www.fruct.org/publications/ainl-abstract/files/Ust.pdf.
  • Ustalov D. (2015c). Towards Crowdsourcing and Cooperation in Linguistic Resources. Information Retrieval: 8th Russian Summer School, RuSSIR 2014, Nizhniy Novgorod, Russia, August 18-22, 2014, Revised Selected Papers. Vol. 505. Communications in Computer and Information Science. Cham, Switzerland: Springer International Publishing, pp. 348–358. DOI: 10.1007/978-3-319-25485-2_14.
  • Ustalov D. et al. (2019). Watset: Local-Global Graph Clustering with Applications in Sense and Frame Induction. Computational Linguistics, vol. 45, no. 3, pp. 423–479. DOI: 10.1162/COLI_a_00354.
  • Vannella D. et al. (2014). Validating and Extending Semantic Knowledge Bases using Video Games with a Purpose. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Baltimore, MD, USA: Association for Computational Linguistics, pp. 1294–1304. DOI: 10.3115/v1/P14-1122.
  • Wang A., Hoang C. D. V., and Kan M.-Y. (2013a). Perspectives on crowdsourcing annotations for natural language processing. Language Resources and Evaluation, vol. 47, no. 1, pp. 9–31. DOI: 10.1007/s10579-012-9176-1.
  • Wang J., Ipeirotis P. G., and Provost F. (2013b). Quality-Based Pricing for Crowdsourced Workers. NYU Working Paper No. 2451/31833. URL: https://ssrn.com/abstract=2283000.
  • Wang P., Li X., and Wu R. (2019). A deep learning-based quality assessment model of collaboratively edited documents: A case study of Wikipedia. Journal of Information Science, pp. 1–16. DOI: 10.1177/0165551519877646.
  • Whitehill J. et al. (2009). Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise. Advances in Neural Information Processing Systems 22. NIPS 2009. Vancouver, BC, Canada: Curran Associates, Inc., pp. 2035–2043. URL: https://papers.nips.cc/paper/3644-whose-vote-should-count-more-optimal-integration-of-labels-from-labelers-of-unknown-expertise.pdf.
  • Wilkinson D. M. and Huberman B. A. (2007). Cooperation and Quality in Wikipedia. Proceedings of the 2007 International Symposium on Wikis. WikiSym '07. Montréal, QC, Canada: ACM, pp. 157–164. DOI: 10.1145/1296951.1296968.
  • Yang J. et al. (2018). Leveraging Crowdsourcing Data for Deep Active Learning An Application: Learning Intents in Alexa. Proceedings of the 2018 World Wide Web Conference. WWW '18. Lyon, France: International World Wide Web Conferences Steering Committee, pp. 23–32. DOI: 10.1145/3178876.3186033.
  • Zheng L. et al. (2019). The Roles Bots Play in Wikipedia. Proceedings of the ACM on Human-Computer Interaction, vol. 3, no. CSCW, 215:1–215:20. DOI: 10.1145/3359317.