Published July 28, 2020 | Version urfu2020
Presentation | Open Access

Crowdsourcing for Language Resources and Evaluation

  • Yandex

Description

Crowdsourcing is an efficient approach to knowledge acquisition and data annotation that enables building impressive human-computer systems. In this tutorial, we discuss the relationship between crowdsourcing and Natural Language Processing, focusing on its practical use for language resource construction and evaluation. We describe the established genres of crowdsourcing, illustrate their strengths and weaknesses with real-world examples and case studies, and offer recommendations for ensuring high-quality crowdsourced annotation.
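Since the quality-assurance recommendations in the tutorial center on aggregating redundant worker answers, a small worked example may help. The sketch below implements the classic expectation-maximization aggregation of Dawid and Skene (1979), listed in the references: it estimates a per-worker confusion matrix and a posterior over the true label of each task. The toy votes, the function name dawid_skene, and the parameters are illustrative assumptions, not part of the published slides.

```python
# A minimal sketch (not the tutorial's reference implementation) of the
# Dawid-Skene (1979) EM aggregation model for crowdsourced labels.
# The toy data, names, and parameters below are illustrative assumptions.
import numpy as np

def dawid_skene(labels, n_iter=50):
    """labels: integer array of (worker, task, answer) triples."""
    workers, tasks, answers = labels[:, 0], labels[:, 1], labels[:, 2]
    W, T, K = workers.max() + 1, tasks.max() + 1, answers.max() + 1

    # Initialize the posterior over true labels with majority voting.
    q = np.zeros((T, K))
    for t, a in zip(tasks, answers):
        q[t, a] += 1
    q /= q.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class priors and per-worker confusion matrices,
        # smoothed so that the logarithms below stay finite.
        prior = q.mean(axis=0)
        conf = np.full((W, K, K), 1e-6)
        for w, t, a in labels:
            conf[w, :, a] += q[t]
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: posterior over each task's true label given all answers.
        log_q = np.tile(np.log(prior), (T, 1))
        for w, t, a in labels:
            log_q[t] += np.log(conf[w, :, a])
        log_q -= log_q.max(axis=1, keepdims=True)  # numerical stability
        q = np.exp(log_q)
        q /= q.sum(axis=1, keepdims=True)

    return q.argmax(axis=1)

# Three workers label two binary tasks; worker 2 disagrees on both.
votes = np.array([(0, 0, 1), (1, 0, 1), (2, 0, 0),
                  (0, 1, 0), (1, 1, 0), (2, 1, 1)])
print(dawid_skene(votes))  # -> [1 0]
```

On the toy data the noisy third worker is simply outvoted; on real crowdsourced data the learned confusion matrices additionally down-weight systematically unreliable workers, which plain majority voting cannot do.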

Notes

These materials are published under a CC BY-NC-SA license. Please feel welcome to share them! For viewer convenience, the slides published on Zenodo do not include interactive step-by-step examples.

Files

Crowdsourcing.pdf (9.5 MB)
md5:f669754e54c9b498a71e7cbf04691305

Additional details

References

  • von Ahn, L., Dabbish, L.: Labeling Images with a Computer Game. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. pp. 319–326. ACM, Vienna, Austria (2004). https://doi.org/10.1145/985692.985733.
  • von Ahn, L., Maurer, B., McMillen, C., Abraham, D., Blum, M.: reCAPTCHA: Human-Based Character Recognition via Web Security Measures. Science. 321, 1465–1468 (2008). https://doi.org/10.1126/science.1160379.
  • Alonso, O.: The Practice of Crowdsourcing. Morgan & Claypool Publishers (2019). https://doi.org/10.2200/S00904ED1V01Y201903ICR066.
  • Alonso, O., Rose, D.E., Stewart, B.: Crowdsourcing for Relevance Evaluation. SIGIR Forum. 42, 9–15 (2008). https://doi.org/10.1145/1480506.1480508.
  • Ardila, R., Branson, M., Davis, K., Henretty, M., Kohler, M., Meyer, J., Morais, R., Saunders, L., Tyers, F.M., Weber, G.: Common Voice: A Massively-Multilingual Speech Corpus. In: Proceedings of The 12th Language Resources and Evaluation Conference. pp. 4218–4222. European Language Resources Association (ELRA), Marseille, France (2020).
  • Artstein, R., Poesio, M.: Inter-Coder Agreement for Computational Linguistics. Computational Linguistics. 34, 555–596 (2008). https://doi.org/10.1162/coli.07-034-R2.
  • Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: A Nucleus for a Web of Open Data. In: The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11–15, 2007. Proceedings. pp. 722–735. Springer Berlin Heidelberg, Berlin; Heidelberg, Germany (2007). https://doi.org/10.1007/978-3-540-76298-0_52.
  • Bernstein, M.S., Little, G., Miller, R.C., Hartmann, B., Ackerman, M.S., Karger, D.R., Crowell, D., Panovich, K.: Soylent: A Word Processor with a Crowd Inside. In: Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology. pp. 313–322. ACM, New York, NY, USA (2010). https://doi.org/10.1145/1866029.1866078.
  • Biemann, C.: Creating a system for lexical substitutions from scratch using crowdsourcing. Language Resources and Evaluation. 47, 97–122 (2013). https://doi.org/10.1007/s10579-012-9180-5.
  • Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python. O'Reilly Media (2009).
  • Blumenstock, J.E.: Size Matters: Word Count As a Measure of Quality on Wikipedia. In: Proceedings of the 17th International Conference on World Wide Web. pp. 1095–1096. ACM, Beijing, China (2008). https://doi.org/10.1145/1367497.1367673.
  • Bocharov, V., Alexeeva, S., Granovsky, D., Protopopova, E., Stepanova, M., Surikov, A.: Crowdsourcing morphological annotation. In: Computational Linguistics and Intellectual Technologies: papers from the Annual conference "Dialogue". pp. 109–124. RGGU (2013).
  • Bragg, J., Mausam, Weld, D.S.: Sprout: Crowd-Powered Task Design for Crowdsourcing. In: Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology. pp. 165–176. ACM, Berlin, Germany (2018). https://doi.org/10.1145/3242587.3242598.
  • Braslavski, P., Ustalov, D., Mukhin, M.: A Spinning Wheel for YARN: User Interface for a Crowdsourced Thesaurus. In: Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics. pp. 101–104. Association for Computational Linguistics, Gothenburg, Sweden (2014). https://doi.org/10.3115/v1/E14-2026.
  • Callison-Burch, C.: Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon's Mechanical Turk. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. pp. 286–295. Association for Computational Linguistics; Asian Federation of Natural Language Processing, Singapore (2009). https://doi.org/10.3115/1699510.1699548.
  • Chang, J., Boyd-Graber, J., Gerrish, S., Wang, C., Blei, D.M.: Reading Tea Leaves: How Humans Interpret Topic Models. In: Advances in Neural Information Processing Systems 22. pp. 288–296. Curran Associates, Inc., Vancouver, BC, Canada (2009).
  • Daniel, F., Kucherbaev, P., Cappiello, C., Benatallah, B., Allahbakhsh, M.: Quality Control in Crowdsourcing: A Survey of Quality Attributes, Assessment Techniques, and Assurance Actions. ACM Computing Surveys. 51, 7:1–7:40 (2018). https://doi.org/10.1145/3148148.
  • Dawid, A.P., Skene, A.M.: Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm. Journal of the Royal Statistical Society: Series C (Applied Statistics). 28, 20–28 (1979). https://doi.org/10.2307/2346806.
  • Difallah, D.E., Demartini, G., Cudré-Mauroux, P.: Pick-A-Crowd: Tell Me What You Like, and I'll Tell You What to Do. In: Proceedings of the 22nd International Conference on World Wide Web. pp. 367–374. ACM, Rio de Janeiro, Brazil (2013). https://doi.org/10.1145/2488388.2488421.
  • Estellés-Arolas, E., González-Ladrón-de-Guevara, F.: Towards an integrated crowdsourcing definition. Journal of Information Science. 38, 189–200 (2012). https://doi.org/10.1177/0165551512437638.
  • Esteves, D., Rula, A., Reddy, A.J., Lehmann, J.: Toward Veracity Assessment in RDF Knowledge Bases: An Exploratory Analysis. Journal of Data and Information Quality. 9, 16:1–16:26 (2018). https://doi.org/10.1145/3177873.
  • Finin, T., Murnane, W., Karandikar, A., Keller, N., Martineau, J., Dredze, M.: Annotating Named Entities in Twitter Data with Crowdsourcing. In: Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk. pp. 80–88. Association for Computational Linguistics, Los Angeles, CA, USA (2010).
  • Finnerty, A., Kucherbaev, P., Tranquillini, S., Convertino, G.: Keep It Simple: Reward and Task Design in Crowdsourcing. In: Proceedings of the Biannual Conference of the Italian Chapter of SIGCHI. ACM, Trento, Italy (2013). https://doi.org/10.1145/2499149.2499168.
  • Gadiraju, U., Demartini, G., Kawase, R., Dietze, S.: Crowd Anatomy Beyond the Good and Bad: Behavioral Traces for Crowd Worker Modeling and Pre-selection. Computer Supported Cooperative Work (CSCW). 28, 815–841 (2019). https://doi.org/10.1007/s10606-018-9336-y.
  • Geiger, R.S., Halfaker, A.: When the Levee Breaks: Without Bots, What Happens to Wikipedia's Quality Control Processes? In: Proceedings of the 9th International Symposium on Open Collaboration. pp. 6:1–6:6. ACM, Hong Kong (2013). https://doi.org/10.1145/2491055.2491061.
  • Gurevych, I., Kim, J. (eds.): The People's Web Meets NLP: Collaboratively Constructed Language Resources. Springer-Verlag Berlin Heidelberg, Berlin; Heidelberg, Germany (2013). https://doi.org/10.1007/978-3-642-35085-6.
  • Halfaker, A., Geiger, R.S.: ORES: Lowering Barriers with Participatory Machine Learning in Wikipedia. arXiv:1909.05189 (2019). http://arxiv.org/abs/1909.05189.
  • Halfaker, A., Geiger, R.S., Morgan, J.T., Riedl, J.: The Rise and Decline of an Open Collaboration System: How Wikipedia's Reaction to Popularity Is Causing Its Decline. American Behavioral Scientist. 57, 664–688 (2013). https://doi.org/10.1177/0002764212469365.
  • Hosseini, M., Phalp, K., Taylor, J., Ali, R.: The Four Pillars of Crowdsourcing: A Reference Model. In: 2014 IEEE Eighth International Conference on Research Challenges in Information Science (RCIS). pp. 1–12. IEEE, Marrakech, Morocco (2014). https://doi.org/10.1109/RCIS.2014.6861072.
  • Howe, J.: Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business. Crown Publishing Group, New York, NY, USA (2009).
  • Jurgens, D., Navigli, R.: It's All Fun and Games until Someone Annotates: Video Games with a Purpose for Linguistic Annotation. Transactions of the Association for Computational Linguistics. 2, 449–464 (2014). https://doi.org/10.1162/tacl_a_00195.
  • Karger, D.R., Oh, S., Shah, D.: Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems. Operations Research. 62, 1–24 (2014). https://doi.org/10.1287/opre.2013.1235.
  • Kittur, A., Kraut, R.E.: Harnessing the Wisdom of Crowds in Wikipedia: Quality Through Coordination. In: Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work. pp. 37–46. ACM, San Diego, CA, USA (2008). https://doi.org/10.1145/1460563.1460572.
  • Krippendorff, K.: Content Analysis: An Introduction to Its Methodology. SAGE Publications, Inc, Thousand Oaks, CA, USA (2018).
  • Kumar, S., Spezzano, F., Subrahmanian, V.S.: VEWS: A Wikipedia Vandal Early Warning System. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 607–616. ACM, Sydney, NSW, Australia (2015). https://doi.org/10.1145/2783258.2783367.
  • Meyer, C.M., Mieskes, M., Stab, C., Gurevych, I.: DKPro Agreement: An Open-Source Java Library for Measuring Inter-Rater Agreement. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations. pp. 105–109. Dublin City University; Association for Computational Linguistics, Dublin, Ireland (2014).
  • Navigli, R., Ponzetto, S.P.: BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence. 193, 217–250 (2012). https://doi.org/10.1016/j.artint.2012.07.001.
  • Oleson, D., Sorokin, A., Laughlin, G.P., Hester, V., Le, J., Biewald, L.: Programmatic Gold: Targeted and Scalable Quality Assurance in Crowdsourcing. In: Human Computation: Papers from the 2011 AAAI Workshop (WS-11-11). pp. 43–48. Association for the Advancement of Artificial Intelligence, San Francisco, CA, USA (2011).
  • Panchenko, A., Ustalov, D., Faralli, S., Ponzetto, S.P., Biemann, C.: Improving Hypernymy Extraction with Distributional Semantic Classes. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation. pp. 1541–1551. European Language Resources Association (ELRA), Miyazaki, Japan (2018).
  • Panchenko, A., Lopukhina, A., Ustalov, D., Lopukhin, K., Arefyev, N., Leontyev, A., Loukachevitch, N.: RUSSE'2018: A Shared Task on Word Sense Induction for the Russian Language. In: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue". pp. 547–564. RSUH, Moscow, Russia (2018).
  • Poesio, M., Chamberlain, J., Kruschwitz, U., Robaldo, L., Ducceschi, L.: Phrase Detectives: Utilizing Collective Intelligence for Internet-scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems. 3, 3:1–3:44 (2013). https://doi.org/10.1145/2448116.2448119.
  • Rajpurkar, P., Jia, R., Liang, P.: Know What You Don't Know: Unanswerable Questions for SQuAD. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). pp. 784–789. Association for Computational Linguistics, Melbourne, VIC, Australia (2018). https://doi.org/10.18653/v1/P18-2124.
  • Rodrigo, E.G., Aledo, J.A., Gámez, J.A.: spark-crowd: A Spark Package for Learning from Crowdsourced Big Data. Journal of Machine Learning Research. 20, 1–5 (2019).
  • Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 614–622. ACM, Las Vegas, NV, USA (2008). https://doi.org/10.1145/1401890.1401965.
  • Sheshadri, A., Lease, M.: SQUARE: A Benchmark for Research on Computing Crowd Consensus. In: First AAAI Conference on Human Computation and Crowdsourcing. pp. 156–164. Association for the Advancement of Artificial Intelligence (2013).
  • Shishkin, A., Bezzubtseva, A., Fedorova, V., Drutsa, A., Gusev, G.: Text Recognition Using Anonymous CAPTCHA Answers. In: Proceedings of the 13th International Conference on Web Search and Data Mining. pp. 537–545. ACM, Houston, TX, USA (2020). https://doi.org/10.1145/3336191.3371795.
  • Snow, R., O'Connor, B., Jurafsky, D., Ng, A.Y.: Cheap and Fast—but is It Good?: Evaluating Non-expert Annotations for Natural Language Tasks. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. pp. 254–263. Association for Computational Linguistics, Honolulu, HI, USA (2008). https://doi.org/10.3115/1613715.1613751.
  • Stvilia, B., Twidale, M.B., Smith, L.C., Gasser, L.: Information Quality Work Organization in Wikipedia. Journal of the American Society for Information Science and Technology. 59, 983–1001 (2008). https://doi.org/10.1002/asi.20813.
  • Ustalov, D.: Words Worth Attention: Predicting Words of the Week on the Russian Wiktionary. In: Knowledge Engineering and the Semantic Web, 5th International Conference, KESW 2014, Kazan, Russia, September 29–October 1, 2014. Proceedings. pp. 196–207. Springer International Publishing, Cham, Switzerland (2014). https://doi.org/10.1007/978-3-319-11716-4_17.
  • Ustalov, D.: A Crowdsourcing Engine for Mechanized Labor. Proceedings of the Institute for System Programming. 27, 351–364 (2015). https://doi.org/10.15514/ISPRAS-2015-27(3)-25.
  • Ustalov, D.: Teleboyarin—Mechanized Labor for Telegram. In: Proceedings of the AINL-ISMW FRUCT 2015. pp. 195–197 (2015).
  • Ustalov, D.: Towards Crowdsourcing and Cooperation in Linguistic Resources. In: Information Retrieval: 8th Russian Summer School, RuSSIR 2014, Nizhniy Novgorod, Russia, August 18–22, 2014, Revised Selected Papers. pp. 348–358. Springer International Publishing, Cham, Switzerland (2015). https://doi.org/10.1007/978-3-319-25485-2_14.
  • Ustalov, D., Panchenko, A., Biemann, C., Ponzetto, S.P.: Watset: Local-Global Graph Clustering with Applications in Sense and Frame Induction. Computational Linguistics. 45, 423–479 (2019). https://doi.org/10.1162/COLI_a_00354.
  • Vannella, D., Jurgens, D., Scarfini, D., Toscani, D., Navigli, R.: Validating and Extending Semantic Knowledge Bases using Video Games with a Purpose. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1294–1304. Association for Computational Linguistics, Baltimore, MD, USA (2014). https://doi.org/10.3115/v1/P14-1122.
  • Wang, A., Hoang, C.D.V., Kan, M.-Y.: Perspectives on crowdsourcing annotations for natural language processing. Language Resources and Evaluation. 47, 9–31 (2013). https://doi.org/10.1007/s10579-012-9176-1.
  • Wang, J., Ipeirotis, P.G., Provost, F.: Quality-Based Pricing for Crowdsourced Workers. New York University (2013).
  • Wang, P., Li, X., Wu, R.: A deep learning-based quality assessment model of collaboratively edited documents: A case study of Wikipedia. Journal of Information Science. 1–16 (2019). https://doi.org/10.1177/0165551519877646.
  • Whitehill, J., Wu, T.-f., Bergsma, J., Movellan, J.R., Ruvolo, P.L.: Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise. In: Advances in Neural Information Processing Systems 22. pp. 2035–2043. Curran Associates, Inc., Vancouver, BC, Canada (2009).
  • Wilkinson, D.M., Huberman, B.A.: Cooperation and Quality in Wikipedia. In: Proceedings of the 2007 International Symposium on Wikis. pp. 157–164. ACM, Montréal, QC, Canada (2007). https://doi.org/10.1145/1296951.1296968.
  • Yang, J., Drake, T., Damianou, A., Maarek, Y.: Leveraging Crowdsourcing Data for Deep Active Learning An Application: Learning Intents in Alexa. In: Proceedings of the 2018 World Wide Web Conference. pp. 23–32. International World Wide Web Conferences Steering Committee, Lyon, France (2018). https://doi.org/10.1145/3178876.3186033.
  • Zheng, L., Albano, C.M., Vora, N.M., Mai, F., Nickerson, J.V.: The Roles Bots Play in Wikipedia. Proceedings of the ACM on Human-Computer Interaction. 3, 215:1–215:20 (2019). https://doi.org/10.1145/3359317.