Graphs, Computation, and Language

doi:10.5281/zenodo.4698904

Published May 22, 2021 | Version compsciclub2021

Presentation Open

Dmitry Ustalov¹

1. Yandex

Graphs and networks offer a convenient way to study the systems around us, including ones as complex as human language. Graph-based representations have proven to be a practical approach for a wide variety of Natural Language Processing (NLP) tasks.

This course has five lectures: Language Graphs, Graph Clustering, Graph Embeddings, Evaluation, and Crowdsourcing. Each lecture walks through the corresponding algorithms step by step and points to important linguistic datasets. The course is aimed at advanced graduate students, data analysts, and researchers in NLP and information retrieval (IR), but is not limited to them.

The course is held online in Spring 2021 at the Computer Science Club in Saint Petersburg, Russia: https://compsciclub.ru/en/courses/graphscomplang/2021-spring/.

Lectures are in Russian, but the slides are in English.

Files (28.0 MB)

Name	MD5	Size
Clustering.pdf	4fce3198a8e5eebed5e6945613c55946	5.0 MB
Crowdsourcing.pdf	b10d02602dfc28cac19547b6d5a73035	10.1 MB
Embeddings.pdf	63c1a15a9f6e0e0361b3fd6cee6e038e	4.1 MB
Evaluation.pdf	1882d3f446cf86cb0079df863a5adbd1	4.8 MB
Language.pdf	84a59fbd09fad67235ec183d0074b4ba	3.9 MB

Additional details

Is new version of: Presentation: 10.5281/zenodo.3960805 (DOI); Presentation: 10.5281/zenodo.1161505 (DOI); Presentation: 10.5281/zenodo.4291121 (DOI)
Is supplement to: Lesson: https://compsciclub.ru/en/courses/graphscomplang/2021-spring/ (URL)

Agirre, E., López de Lacalle, O., Soroa, A.: Random Walks for Knowledge-Based Word Sense Disambiguation. Computational Linguistics. 40, 57–84 (2014). https://doi.org/10.1162/COLI_a_00164.
von Ahn, L., Dabbish, L.: Labeling Images with a Computer Game. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. pp. 319–326. ACM, Vienna, Austria (2004). https://doi.org/10.1145/985692.985733.
von Ahn, L., Dabbish, L.: Designing Games with a Purpose. Communications of the ACM. 51, 58–67 (2008). https://doi.org/10.1145/1378704.1378719.
von Ahn, L., Maurer, B., McMillen, C., Abraham, D., Blum, M.: reCAPTCHA: Human-Based Character Recognition via Web Security Measures. Science. 321, 1465–1468 (2008). https://doi.org/10.1126/science.1160379.
Alonso, O.: The Practice of Crowdsourcing. Morgan & Claypool Publishers (2019). https://doi.org/10.2200/S00904ED1V01Y201903ICR066.
Alonso, O., Rose, D.E., Stewart, B.: Crowdsourcing for Relevance Evaluation. SIGIR Forum. 42, 9–15 (2008). https://doi.org/10.1145/1480506.1480508.
Ardila, R., Branson, M., Davis, K., Henretty, M., Kohler, M., Meyer, J., Morais, R., Saunders, L., Tyers, F.M., Weber, G.: Common Voice: A Massively-Multilingual Speech Corpus. In: Proceedings of The 12th Language Resources and Evaluation Conference. pp. 4218–4222. European Language Resources Association (ELRA), Marseille, France (2020).
Artstein, R., Poesio, M.: Inter-Coder Agreement for Computational Linguistics. Computational Linguistics. 34, 555–596 (2008). https://doi.org/10.1162/coli.07-034-R2.
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: A Nucleus for a Web of Open Data. In: The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11–15, 2007. Proceedings. pp. 722–735. Springer Berlin Heidelberg, Berlin; Heidelberg, Germany (2007). https://doi.org/10.1007/978-3-540-76298-0_52.
Azadani, M.N., Ghadiri, N., Davoodijam, E.: Graph-based biomedical text summarization: An itemset mining and sentence clustering approach. Journal of Biomedical Informatics. 84, 42–58 (2018). https://doi.org/10.1016/j.jbi.2018.06.005.
Baker, C.F., Fillmore, C.J., Lowe, J.B.: The Berkeley FrameNet Project. In: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 1. pp. 86–90. Association for Computational Linguistics, Montréal, QC, Canada (1998). https://doi.org/10.3115/980845.980860.
Barabási, A.-L., Albert, R.: Emergence of Scaling in Random Networks. Science. 286, 509–512 (1999). https://doi.org/10.1126/science.286.5439.509.
Bavelas, A.: Communication Patterns in Task-Oriented Groups. The Journal of the Acoustical Society of America. 22, 725–730 (1950). https://doi.org/10.1121/1.1906679.
Belkin, M., Niyogi, P.: Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computation. 15, 1373–1396 (2003). https://doi.org/10.1162/089976603321780317.
Bernstein, M.S., Little, G., Miller, R.C., Hartmann, B., Ackerman, M.S., Karger, D.R., Crowell, D., Panovich, K.: Soylent: A Word Processor with a Crowd Inside. In: Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology. pp. 313–322. ACM, New York, NY, USA (2010). https://doi.org/10.1145/1866029.1866078.
Biemann, C.: Chinese Whispers: An Efficient Graph Clustering Algorithm and Its Application to Natural Language Processing Problems. In: Proceedings of the First Workshop on Graph Based Methods for Natural Language Processing. pp. 73–80. Association for Computational Linguistics, New York, NY, USA (2006). https://doi.org/10.3115/1654758.1654774.
Biemann, C.: Structure Discovery in Natural Language. Springer Berlin Heidelberg (2012). https://doi.org/10.1007/978-3-642-25923-4.
Biemann, C.: Creating a system for lexical substitutions from scratch using crowdsourcing. Language Resources and Evaluation. 47, 97–122 (2013). https://doi.org/10.1007/s10579-012-9180-5.
Biemann, C., Riedl, M.: Text: now in 2D! A framework for lexical expansion with contextual similarity. Journal of Language Modelling. 1, 55–95 (2013). https://doi.org/10.15398/jlm.v1i1.60.
Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python. O'Reilly Media (2017).
Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment. 2008, P10008 (2008). https://doi.org/10.1088/1742-5468/2008/10/P10008.
Blumenstock, J.E.: Size Matters: Word Count As a Measure of Quality on Wikipedia. In: Proceedings of the 17th International Conference on World Wide Web. pp. 1095–1096. ACM, Beijing, China (2008). https://doi.org/10.1145/1367497.1367673.
Bocharov, V., Alexeeva, S., Granovsky, D., Protopopova, E., Stepanova, M., Surikov, A.: Crowdsourcing morphological annotation. In: Computational Linguistics and Intellectual Technologies: papers from the Annual conference "Dialogue". pp. 109–124. RGGU (2013).
Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics. 5, 135–146 (2017). https://doi.org/10.1162/tacl_a_00051.
Bonacich, P.: Power and Centrality: A Family of Measures. American Journal of Sociology. 92, 1170–1182 (1987). https://doi.org/10.1086/228631.
Bordea, G., Lefever, E., Buitelaar, P.: SemEval-2016 Task 13: Taxonomy Extraction Evaluation (TExEval-2). In: Proceedings of the 10th International Workshop on Semantic Evaluation. pp. 1081–1091. Association for Computational Linguistics, San Diego, CA, USA (2016). https://doi.org/10.18653/v1/S16-1168.
Bordes, A., Chopra, S., Weston, J.: Question Answering with Subgraph Embeddings. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. pp. 615–620. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/D14-1067.
Boudin, F.: A Comparison of Centrality Measures for Graph-Based Keyphrase Extraction. In: Proceedings of the Sixth International Joint Conference on Natural Language Processing. pp. 834–838. Asian Federation of Natural Language Processing, Nagoya, Japan (2013).
Bradley, R.A., Terry, M.E.: Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons. Biometrika. 39, 324–345 (1952). https://doi.org/10.2307/2334029.
Bragg, J., Mausam, Weld, D.S.: Sprout: Crowd-Powered Task Design for Crowdsourcing. In: Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology. pp. 165–176. ACM, Berlin, Germany (2018). https://doi.org/10.1145/3242587.3242598.
Brandes, U.: On variants of shortest-path betweenness centrality and their generic computation. Social Networks. 30, 136–145 (2008). https://doi.org/10.1016/j.socnet.2007.11.001.
Braslavski, P., Ustalov, D., Mukhin, M.: A Spinning Wheel for YARN: User Interface for a Crowdsourced Thesaurus. In: Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics. pp. 101–104. Association for Computational Linguistics, Gothenburg, Sweden (2014). https://doi.org/10.3115/v1/E14-2026.
Brin, S., Page, L.: The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems. 30, 107–117 (1998). https://doi.org/10.1016/S0169-7552(98)00110-X.
Buckley, C., Voorhees, E.M.: Evaluating Evaluation Measure Stability. In: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 33–40. Association for Computing Machinery, Athens, Greece (2000). https://doi.org/10.1145/345508.345543.
Cai, H., Zheng, V.W., Chang, K.C.-C.: A Comprehensive Survey of Graph Embedding: Problems, Techniques, and Applications. IEEE Transactions on Knowledge and Data Engineering. 30, 1616–1637 (2018). https://doi.org/10.1109/TKDE.2018.2807452.
Callison-Burch, C.: Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon's Mechanical Turk. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. pp. 286–295. Association for Computational Linguistics; Asian Federation of Natural Language Processing, Singapore (2009). https://doi.org/10.3115/1699510.1699548.
Chang, J., Boyd-Graber, J., Gerrish, S., Wang, C., Blei, D.M.: Reading Tea Leaves: How Humans Interpret Topic Models. In: Advances in Neural Information Processing Systems 22. pp. 288–296. Curran Associates, Inc., Vancouver, BC, Canada (2009).
Chapelle, O., Metzler, D., Zhang, Y., Grinspan, P.: Expected Reciprocal Rank for Graded Relevance. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. pp. 621–630. Association for Computing Machinery, Hong Kong, China (2009). https://doi.org/10.1145/1645953.1646033.
Chen, D., Lin, Y., Li, W., Li, P., Zhou, J., Sun, X.: Measuring and Relieving the Over-Smoothing Problem for Graph Neural Networks from the Topological View. Proceedings of the AAAI Conference on Artificial Intelligence. 34, 3438–3445 (2020). https://doi.org/10.1609/aaai.v34i04.5747.
Chicco, D., Jurman, G.: The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics. 21, 6 (2020). https://doi.org/10.1186/s12864-019-6413-7.
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press (2009).
Csárdi, G., Nepusz, T.: The igraph software package for complex network research. InterJournal Complex Systems. 1695, 1–9 (2006).
Dacrema, M.F., Cremonesi, P., Jannach, D.: Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches. In: Proceedings of the 13th ACM Conference on Recommender Systems. pp. 101–109. Association for Computing Machinery, Copenhagen, Denmark (2019). https://doi.org/10.1145/3298689.3347058.
Daniel, F., Kucherbaev, P., Cappiello, C., Benatallah, B., Allahbakhsh, M.: Quality Control in Crowdsourcing: A Survey of Quality Attributes, Assessment Techniques, and Assurance Actions. ACM Computing Surveys. 51, 7:1–7:40 (2018). https://doi.org/10.1145/3148148.
Davis, J., Goadrich, M.: The Relationship between Precision-Recall and ROC Curves. In: Proceedings of the 23rd International Conference on Machine Learning. pp. 233–240. Association for Computing Machinery, Pittsburgh, PA, USA (2006). https://doi.org/10.1145/1143844.1143874.
Dawid, A.P., Skene, A.M.: Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm. Journal of the Royal Statistical Society: Series C (Applied Statistics). 28, 20–28 (1979). https://doi.org/10.2307/2346806.
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). pp. 4171–4186. Association for Computational Linguistics, Minneapolis, MN, USA (2019). https://doi.org/10.18653/v1/N19-1423.
Difallah, D.E., Demartini, G., Cudré-Mauroux, P.: Pick-A-Crowd: Tell Me What You Like, and I'll Tell You What to Do. In: Proceedings of the 22nd International Conference on World Wide Web. pp. 367–374. ACM, Rio de Janeiro, Brazil (2013). https://doi.org/10.1145/2488388.2488421.
Dijkstra, E.W.: A note on two problems in connexion with graphs. Numerische Mathematik. 1, 269–271 (1959). https://doi.org/10.1007/BF01386390.
van Dongen, S.: Graph Clustering by Flow Simulation (2000).
Dorogovtsev, S.N., Mendes, J.F.F.: Language as an evolving word web. Proceedings of the Royal Society of London B: Biological Sciences. 268, 2603–2606 (2001). https://doi.org/10.1098/rspb.2001.1824.
Dorow, B., Widdows, D.: Discovering Corpus-Specific Word Senses. In: Proceedings of the Tenth Conference on European Chapter of the Association for Computational Linguistics - Volume 2. pp. 79–82. Association for Computational Linguistics, Budapest, Hungary (2003). https://doi.org/10.3115/1067737.1067753.
Dror, R., Baumer, G., Shlomov, S., Reichart, R.: The Hitchhiker's Guide to Testing Statistical Significance in Natural Language Processing. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1383–1392. Association for Computational Linguistics, Melbourne, VIC, Australia (2018). https://doi.org/10.18653/v1/P18-1128.
Dwivedi, V.P., Joshi, C.K., Laurent, T., Bengio, Y., Bresson, X.: Benchmarking Graph Neural Networks (2020). https://arxiv.org/abs/2003.00982.
Estellés-Arolas, E., González-Ladrón-de-Guevara, F.: Towards an integrated crowdsourcing definition. Journal of Information Science. 38, 189–200 (2012). https://doi.org/10.1177/0165551512437638.
Esteves, D., Rula, A., Reddy, A.J., Lehmann, J.: Toward Veracity Assessment in RDF Knowledge Bases: An Exploratory Analysis. Journal of Data and Information Quality. 9, 16:1–16:26 (2018). https://doi.org/10.1145/3177873.
Faralli, S., Panchenko, A., Biemann, C., Ponzetto, S.P.: Linked Disambiguated Distributional Semantic Networks. In: The Semantic Web – ISWC 2016, 15th International Semantic Web Conference, Kobe, Japan, October 17–21, 2016, Proceedings, Part II. pp. 56–64. Springer International Publishing, Cham, Switzerland (2016). https://doi.org/10.1007/978-3-319-46547-0_7.
Fellbaum, C.: WordNet: An Electronic Lexical Database. MIT Press (1998).
Fey, M., Lenssen, J.E.: Fast Graph Representation Learning with PyTorch Geometric. In: ICLR 2019 Workshop on Representation Learning on Graphs and Manifolds (2019).
Fillmore, C.J.: Frame Semantics. In: Linguistics in the Morning Calm. pp. 111–137. Hanshin Publishing Co., Seoul, South Korea (1982).
Finin, T., Murnane, W., Karandikar, A., Keller, N., Martineau, J., Dredze, M.: Annotating Named Entities in Twitter Data with Crowdsourcing. In: Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk. pp. 80–88. Association for Computational Linguistics, Los Angeles, CA, USA (2010).
Finnerty, A., Kucherbaev, P., Tranquillini, S., Convertino, G.: Keep It Simple: Reward and Task Design in Crowdsourcing. In: Proceedings of the Biannual Conference of the Italian Chapter of SIGCHI. pp. 14:1–14:4. ACM, Trento, Italy (2013). https://doi.org/10.1145/2499149.2499168.
Florescu, C., Caragea, C.: PositionRank: An Unsupervised Approach to Keyphrase Extraction from Scholarly Documents. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1105–1115. Association for Computational Linguistics, Vancouver, BC, Canada (2017). https://doi.org/10.18653/v1/P17-1102.
Fortunato, S.: Community detection in graphs. Physics Reports. 486, 75–174 (2010). https://doi.org/10.1016/j.physrep.2009.11.002.
Fowlkes, E.B., Mallows, C.L.: A Method for Comparing Two Hierarchical Clusterings. Journal of the American Statistical Association. 78, 553–569 (1983). https://doi.org/10.1080/01621459.1983.10478008.
Freeman, L.C.: A Set of Measures of Centrality Based on Betweenness. Sociometry. 40, 35–41 (1977). https://doi.org/10.2307/3033543.
Frey, B.J., Dueck, D.: Clustering by Passing Messages Between Data Points. Science. 315, 972–976 (2007). https://doi.org/10.1126/science.1136800.
Gadiraju, U., Demartini, G., Kawase, R., Dietze, S.: Crowd Anatomy Beyond the Good and Bad: Behavioral Traces for Crowd Worker Modeling and Pre-selection. Computer Supported Cooperative Work (CSCW). 28, 815–841 (2019). https://doi.org/10.1007/s10606-018-9336-y.
Gallardo, P.F.: Google's secret and Linear Algebra. EMS Newsletter. 63, 10–15 (2007).
Geiger, R.S., Halfaker, A.: When the Levee Breaks: Without Bots, What Happens to Wikipedia's Quality Control Processes? In: Proceedings of the 9th International Symposium on Open Collaboration. pp. 6:1–6:6. ACM, Hong Kong, China (2013). https://doi.org/10.1145/2491055.2491061.
Goldhahn, D., Eckart, T., Quasthoff, U.: Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation. pp. 759–765. European Language Resources Association (ELRA), Istanbul, Turkey (2012).
Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: GraphX: Graph Processing in a Distributed Dataflow Framework. In: 11th USENIX Symposium on Operating Systems Design and Implementation. pp. 599–613. USENIX Association, Broomfield, CO, USA (2014).
Good, B.H., de Montjoye, Y.-A., Clauset, A.: Performance of modularity maximization in practical contexts. Physical Review E. 81, 046106 (2010). https://doi.org/10.1103/PhysRevE.81.046106.
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge, MA, USA (2016).
Gorodkin, J.: Comparing two K-category assignments by a K-category correlation coefficient. Computational Biology and Chemistry. 28, 367–374 (2004). https://doi.org/10.1016/j.compbiolchem.2004.09.006.
Goyal, P., Ferrara, E.: Graph embedding techniques, applications, and performance: A survey. Knowledge-Based Systems. 151, 78–94 (2018). https://doi.org/10.1016/j.knosys.2018.03.022.
Grover, A., Leskovec, J.: node2vec: Scalable Feature Learning for Networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 855–864. ACM, San Francisco, CA, USA (2016). https://doi.org/10.1145/2939672.2939754.
Gurevych, I., Kim, J. eds: The People's Web Meets NLP: Collaboratively Constructed Language Resources. Springer-Verlag Berlin Heidelberg, Berlin; Heidelberg, Germany (2013). https://doi.org/10.1007/978-3-642-35085-6.
Hagberg, A.A., Schult, D.A., Swart, P.J.: Exploring Network Structure, Dynamics, and Function using NetworkX. In: Proceedings of the 7th Python in Science Conference. pp. 11–15. Pasadena, CA, USA (2008).
Halfaker, A., Geiger, R.S.: ORES: Lowering Barriers with Participatory Machine Learning in Wikipedia. Proceedings of the ACM on Human-Computer Interaction. 4, 148:1–148:37 (2020). https://doi.org/10.1145/3415219.
Halfaker, A., Geiger, R.S., Morgan, J.T., Riedl, J.: The Rise and Decline of an Open Collaboration System: How Wikipedia's Reaction to Popularity Is Causing Its Decline. American Behavioral Scientist. 57, 664–688 (2013). https://doi.org/10.1177/0002764212469365.
Hamilton, W.L., Ying, R., Leskovec, J.: Inductive Representation Learning on Large Graphs. In: Advances in Neural Information Processing Systems 30. pp. 1024–1034. Curran Associates, Inc., Vancouver, BC, Canada (2017).
Han, L., Checco, A., Difallah, D., Demartini, G., Sadiq, S.: Modelling User Behavior Dynamics with Embeddings. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management. pp. 445–454. Association for Computing Machinery, Virtual Event, Ireland (2020). https://doi.org/10.1145/3340531.3411985.
Hansen, P.C.: The truncated SVD as a method for regularization. BIT Numerical Mathematics. 27, 534–553 (1987). https://doi.org/10.1007/BF01937276.
Hartigan, J.A., Wong, M.A.: Algorithm AS 136: A K-Means Clustering Algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics). 28, 100–108 (1979). https://doi.org/10.2307/2346830.
Hochreiter, S., Schmidhuber, J.: Long Short-Term Memory. Neural Computation. 9, 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735.
Hope, D., Keller, B.: MaxMax: A Graph-Based Soft Clustering Algorithm Applied to Word Sense Induction. In: Computational Linguistics and Intelligent Text Processing, 14th International Conference, CICLing 2013, Samos, Greece, March 24-30, 2013, Proceedings, Part I. pp. 368–381. Springer Berlin Heidelberg, Berlin; Heidelberg, Germany (2013). https://doi.org/10.1007/978-3-642-37247-6_30.
Hope, D., Keller, B.: UoS: A Graph-Based System for Graded Word Sense Induction. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013). pp. 689–694. Association for Computational Linguistics, Atlanta, GA, USA (2013).
Hosseini, M., Phalp, K., Taylor, J., Ali, R.: The Four Pillars of Crowdsourcing: A Reference Model. In: 2014 IEEE Eighth International Conference on Research Challenges in Information Science (RCIS). pp. 1–12. IEEE, Marrakech, Morocco (2014). https://doi.org/10.1109/RCIS.2014.6861072.
Howe, J.: Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business. Crown Publishing Group, New York, NY, USA (2009).
Hubert, L., Arabie, P.: Comparing partitions. Journal of Classification. 2, 193–218 (1985). https://doi.org/10.1007/BF01908075.
Hunter, D.R.: MM algorithms for generalized Bradley-Terry models. Annals of Statistics. 32, 384–406 (2004). https://doi.org/10.1214/aos/1079120141.
Jana, A., Goyal, P.: Can Network Embedding of Distributional Thesaurus Be Combined with Word Vectors for Better Representation? In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). pp. 463–473. Association for Computational Linguistics, New Orleans, LA, USA (2018). https://doi.org/10.18653/v1/N18-1043.
Jansen, P., Ustalov, D.: TextGraphs 2020 Shared Task on Multi-Hop Inference for Explanation Regeneration. In: Proceedings of the Graph-based Methods for Natural Language Processing (TextGraphs). pp. 85–97. Association for Computational Linguistics, Barcelona, Spain (Online) (2020).
Järvelin, K., Kekäläinen, J.: Cumulated Gain-Based Evaluation of IR Techniques. ACM Transactions on Information Systems. 20, 422–446 (2002). https://doi.org/10.1145/582415.582418.
Johnson, D.B.: Efficient Algorithms for Shortest Paths in Sparse Networks. Journal of the ACM. 24, 1–13 (1977). https://doi.org/10.1145/321992.321993.
Jurgens, D., Klapaftis, I.: SemEval-2013 Task 13: Word Sense Induction for Graded and Non-Graded Senses. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013). pp. 290–299. Association for Computational Linguistics, Atlanta, GA, USA (2013).
Jurgens, D., Navigli, R.: It's All Fun and Games until Someone Annotates: Video Games with a Purpose for Linguistic Annotation. Transactions of the Association for Computational Linguistics. 2, 449–464 (2014). https://doi.org/10.1162/tacl_a_00195.
Kapustin, V., Jamsen, A.: Vertex Degree Distribution for the Graph of Word Co-Occurrences in Russian. In: Proceedings of the Second Workshop on TextGraphs: Graph-Based Algorithms for Natural Language Processing. pp. 89–92. Association for Computational Linguistics, Rochester, NY, USA (2007).
Karger, D.R., Oh, S., Shah, D.: Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems. Operations Research. 62, 1–24 (2014). https://doi.org/10.1287/opre.2013.1235.
Kartsaklis, D., Pilehvar, M.T., Collier, N.: Mapping Text to Knowledge Graph Entities using Multi-Sense LSTMs. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. pp. 1959–1970. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/D18-1221.
Kawahara, D., Peterson, D.W., Palmer, M.: A Step-wise Usage-based Method for Inducing Polysemy-aware Verb Classes. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1030–1040. Association for Computational Linguistics, Baltimore, MD, USA (2014). https://doi.org/10.3115/v1/P14-1097.
Kazemi, A., Pérez-Rosas, V., Mihalcea, R.: Biased TextRank: Unsupervised Graph-Based Content Extraction. In: Proceedings of the 28th International Conference on Computational Linguistics. pp. 1642–1652. International Committee on Computational Linguistics, Barcelona, Spain (Online) (2020). https://doi.org/10.18653/v1/2020.coling-main.144.
Kent, A., Berry, M.M., Luehrs Jr., F.U., Perry, J.W.: Machine literature searching VIII. Operational criteria for designing information retrieval systems. American Documentation. 6, 93–101 (1955). https://doi.org/10.1002/asi.5090060209.
Kipf, T.N., Welling, M.: Semi-Supervised Classification with Graph Convolutional Networks. In: 5th International Conference on Learning Representations, Conference Track Proceedings. OpenReview.net, Toulon, France (2017).
Kittur, A., Kraut, R.E.: Harnessing the Wisdom of Crowds in Wikipedia: Quality Through Coordination. In: Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work. pp. 37–46. ACM, San Diego, CA, USA (2008). https://doi.org/10.1145/1460563.1460572.
Kohavi, R., Tang, D., Xu, Y.: Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press (2020).
Krippendorff, K.: Content Analysis: An Introduction to Its Methodology. SAGE Publications, Inc, Thousand Oaks, CA, USA (2018).
Krizhanovsky, A.A., Smirnov, A.V.: An approach to automated construction of a general-purpose lexical ontology based on Wiktionary. Journal of Computer and Systems Sciences International. 52, 215–225 (2013). https://doi.org/10.1134/S1064230713020068.
Kumar, S., Spezzano, F., Subrahmanian, V.S.: VEWS: A Wikipedia Vandal Early Warning System. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 607–616. ACM, Sydney, NSW, Australia (2015). https://doi.org/10.1145/2783258.2783367.
Leskovec, J., Sosič, R.: SNAP: A General-Purpose Network Analysis and Graph-Mining Library. ACM Transactions on Intelligent Systems and Technology. 8, 1:1–1:20 (2016). https://doi.org/10.1145/2898361.
Levy, O., Goldberg, Y.: Neural Word Embedding as Implicit Matrix Factorization. In: Advances in Neural Information Processing Systems 27. pp. 2177–2185. Curran Associates, Inc., Montréal, QC, Canada (2014).
Lewis, M., Steedman, M.: Unsupervised Induction of Cross-Lingual Semantic Relations. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. pp. 681–692. Association for Computational Linguistics, Seattle, WA, USA (2013).
Li, J.: Crowdsourced Text Sequence Aggregation Based on Hybrid Reliability and Representation. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 1761–1764. Association for Computing Machinery, Virtual Event, China (2020). https://doi.org/10.1145/3397271.3401239.
Li, W., Lu, Y., Huang, Z., Su, W., Liu, J., Feng, S., Sun, Y.: PGL at TextGraphs 2020 Shared Task: Explanation Regeneration using Language and Graph Learning Methods. In: Proceedings of the Graph-based Methods for Natural Language Processing (TextGraphs). pp. 98–102. Association for Computational Linguistics, Barcelona, Spain (Online) (2020).
Litvak, M., Last, M., Kandel, A.: DegExt: a language-independent keyphrase extractor. Journal of Ambient Intelligence and Humanized Computing. 4, 377–387 (2013). https://doi.org/10.1007/s12652-012-0109-z.
Lucchese, C., Muntean, C.I., Nardini, F.M., Perego, R., Trani, S.: RankEval: An Evaluation and Analysis Framework for Learning-to-Rank Solutions. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 1281–1284. Association for Computing Machinery, Shinjuku, Tokyo, Japan (2017). https://doi.org/10.1145/3077136.3084140.
Lutov, A., Khayati, M., Cudré-Mauroux, P.: Accuracy Evaluation of Overlapping and Multi-Resolution Clustering Algorithms on Large Datasets. In: 2019 IEEE International Conference on Big Data and Smart Computing (BigComp). pp. 1–8. IEEE, Kyoto, Japan (2019). https://doi.org/10.1109/BIGCOMP.2019.8679398.
von Luxburg, U.: A tutorial on spectral clustering. Statistics and Computing. 17, 395–416 (2007). https://doi.org/10.1007/s11222-007-9033-z.
Lyzinski, V., Sell, G., Jansen, A.: An Evaluation of Graph Clustering Methods for Unsupervised Term Discovery. In: INTERSPEECH-2015. pp. 3209–3213. International Speech Communication Association, Dresden, Germany (2015).
Ma, Y., Yu, D., Wu, T., Wang, H.: PaddlePaddle: An Open-Source Deep Learning Platform from Industrial Practice. Frontiers of Data and Computing. 1, 105–115 (2019). https://doi.org/10.11871/jfdc.issn.2096.742X.2019.01.011.
Manandhar, S., Klapaftis, I., Dligach, D., Pradhan, S.: SemEval-2010 Task 14: Word Sense Induction & Disambiguation. In: Proceedings of the 5th International Workshop on Semantic Evaluation. pp. 63–68. Association for Computational Linguistics, Uppsala, Sweden (2010).
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press (2008).
Marcheggiani, D., Titov, I.: Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. pp. 1506–1515. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/D17-1159.
Marcus, M.P., Santorini, B., Marcinkiewicz, M.A.: Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics. 19, 313–330 (1993).
Matthews, B.W.: Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA) - Protein Structure. 405, 442–451 (1975). https://doi.org/10.1016/0005-2795(75)90109-9.
Meyer, C.M., Mieskes, M., Stab, C., Gurevych, I.: DKPro Agreement: An Open-Source Java Library for Measuring Inter-Rater Agreement. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations. pp. 105–109. Dublin City University; Association for Computational Linguistics, Dublin, Ireland (2014).
Michail, D., Kinable, J., Naveh, B., Sichi, J.V.: JGraphT—A Java Library for Graph Data Structures and Algorithms. ACM Transactions on Mathematical Software. 46, 16:1–16:29 (2020). https://doi.org/10.1145/3381449.
Mihalcea, R., Radev, D.: Graph-Based Natural Language Processing and Information Retrieval. Cambridge University Press (2011). https://doi.org/10.1017/CBO9780511976247.
Mihalcea, R., Tarau, P.: TextRank: Bringing Order into Text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. pp. 404–411. Association for Computational Linguistics, Barcelona, Spain (2004).
Mihalcea, R., Tarau, P., Figa, E.: PageRank on Semantic Networks, with Application to Word Sense Disambiguation. In: Proceedings of the 20th International Conference on Computational Linguistics. pp. 1126–1132. COLING, Geneva, Switzerland (2004). https://doi.org/10.3115/1220355.1220517.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed Representations of Words and Phrases and their Compositionality. In: Advances in Neural Information Processing Systems 26. pp. 3111–3119. Curran Associates, Inc., Lake Tahoe, NV, USA (2013).
Moro, A., Raganato, A., Navigli, R.: Entity Linking meets Word Sense Disambiguation: a Unified Approach. Transactions of the Association for Computational Linguistics. 2, 231–244 (2014). https://doi.org/10.1162/tacl_a_00179.
Navigli, R., Ponzetto, S.P.: BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence. 193, 217–250 (2012). https://doi.org/10.1016/j.artint.2012.07.001.
Newman, M.E.J.: Analysis of weighted networks. Physical Review E. 70, 056131 (2004). https://doi.org/10.1103/PhysRevE.70.056131.
Ng, A., Jordan, M., Weiss, Y.: On Spectral Clustering: Analysis and an algorithm. In: Advances in Neural Information Processing Systems 14. pp. 849–856. MIT Press (2002).
Oleson, D., Sorokin, A., Laughlin, G.P., Hester, V., Le, J., Biewald, L.: Programmatic Gold: Targeted and Scalable Quality Assurance in Crowdsourcing. In: Human Computation: Papers from the 2011 AAAI Workshop (WS-11-11). pp. 43–48. Association for the Advancement of Artificial Intelligence, San Francisco, CA, USA (2011).
Padó, S.: User's guide to sigf: Significance testing by approximate randomisation. (2006).
Panchenko, A., Ruppert, E., Faralli, S., Ponzetto, S.P., Biemann, C.: Building a Web-Scale Dependency-Parsed Corpus from Common Crawl. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation. pp. 1816–1823. European Language Resources Association (ELRA), Miyazaki, Japan (2018).
Panchenko, A., Ustalov, D., Faralli, S., Ponzetto, S.P., Biemann, C.: Improving Hypernymy Extraction with Distributional Semantic Classes. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation. pp. 1541–1551. European Language Resources Association (ELRA), Miyazaki, Japan (2018).
Panchenko, A., Lopukhina, A., Ustalov, D., Lopukhin, K., Arefyev, N., Leontyev, A., Loukachevitch, N.: RUSSE'2018: A Shared Task on Word Sense Induction for the Russian Language. In: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue". pp. 547–564. RSUH, Moscow, Russia (2018).
Paun, S., Carpenter, B., Chamberlain, J., Hovy, D., Kruschwitz, U., Poesio, M.: Comparing Bayesian Models of Annotation. Transactions of the Association for Computational Linguistics. 6, 571–585 (2018). https://doi.org/10.1162/tacl_a_00040.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, É.: Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 12, 2825–2830 (2011).
Pelevina, M., Arefiev, N., Biemann, C., Panchenko, A.: Making Sense of Word Embeddings. In: Proceedings of the 1st Workshop on Representation Learning for NLP. pp. 174–183. Association for Computational Linguistics, Berlin, Germany (2016). https://doi.org/10.18653/v1/W16-1620.
Peng, Y., Choi, B., Xu, J.: Graph Learning for Combinatorial Optimization: A Survey of State-of-the-Art. Data Science and Engineering. 6, 119–141 (2021). https://doi.org/10.1007/s41019-021-00155-3.
Pennington, J., Socher, R., Manning, C.: GloVe: Global Vectors for Word Representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. pp. 1532–1543. Association for Computational Linguistics, Doha, Qatar (2014). https://doi.org/10.3115/v1/D14-1162.
Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: Online Learning of Social Representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 701–710. ACM, New York, NY, USA (2014). https://doi.org/10.1145/2623330.2623732.
Poesio, M., Chamberlain, J., Kruschwitz, U., Robaldo, L., Ducceschi, L.: Phrase Detectives: Utilizing Collective Intelligence for Internet-scale Language Resource Creation. ACM Transactions on Interactive Intelligent Systems. 3, 3:1–3:44 (2013). https://doi.org/10.1145/2448116.2448119.
Powers, D.M.W.: Evaluation Evaluation. In: 18th European Conference on Artificial Intelligence, Proceedings. pp. 843–844. IOS Press, Patras, Greece (2008). https://doi.org/10.3233/978-1-58603-891-5-843.
Rajpurkar, P., Jia, R., Liang, P.: Know What You Don't Know: Unanswerable Questions for SQuAD. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). pp. 784–789. Association for Computational Linguistics, Melbourne, VIC, Australia (2018). https://doi.org/10.18653/v1/P18-2124.
Rand, W.M.: Objective Criteria for the Evaluation of Clustering Methods. Journal of the American Statistical Association. 66, 846–850 (1971). https://doi.org/10.1080/01621459.1971.10482356.
Reimers, N., Gurevych, I.: Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. pp. 338–348. Association for Computational Linguistics, Copenhagen, Denmark (2017). https://doi.org/10.18653/v1/D17-1035.
Ribeiro, M.T., Wu, T., Guestrin, C., Singh, S.: Beyond Accuracy: Behavioral Testing of NLP Models with CheckList. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. pp. 4902–4912. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.acl-main.442.
van Rijsbergen, C.J.: Information Retrieval. Butterworth-Heinemann, London, UK (1979).
Ristoski, P., Rosati, J., Di Noia, T., De Leone, R., Paulheim, H.: RDF2Vec: RDF graph embeddings and their applications. Semantic Web. 1–32 (2018). https://doi.org/10.3233/SW-180317.
Rodrigo, E.G., Aledo, J.A., Gámez, J.A.: spark-crowd: A Spark Package for Learning from Crowdsourced Big Data. Journal of Machine Learning Research. 20, 1–5 (2019).
Rozemberczki, B., Kiss, O., Sarkar, R.: Karate Club: An API Oriented Open-Source Python Framework for Unsupervised Learning on Graphs. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management. pp. 3125–3132. Association for Computing Machinery, Virtual Event, Ireland (2020). https://doi.org/10.1145/3340531.3412757.
Saito, T., Rehmsmeier, M.: The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets. PLOS ONE. 10, 1–21 (2015). https://doi.org/10.1371/journal.pone.0118432.
Sakai, T.: Evaluating Evaluation Metrics Based on the Bootstrap. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 525–532. Association for Computing Machinery, Seattle, WA, USA (2006). https://doi.org/10.1145/1148170.1148261.
Sambasivan, N., Kapania, S., Highfill, H., Akrong, D., Paritosh, P., Aroyo, L.: "Everyone wants to do the model work, not the data work": Data Cascades in High-Stakes AI. In: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. pp. 1–15. ACM, Yokohama, Japan (Online) (2021). https://doi.org/10.1145/3411764.3445518.
Scozzafava, F., Maru, M., Brignone, F., Torrisi, G., Navigli, R.: Personalized PageRank with Syntagmatic Information for Multilingual Word Sense Disambiguation. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. pp. 37–46. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.acl-demos.6.
Segalovich, I.: Machine Learning in Search Quality at Yandex. https://www.eurospider.com/images/SIGIR_2010/04_SIGIR-2010-SEGALOVICH.pdf (2010).
Şenel, L.K., Utlu, İ., Yücesoy, V., Koç, A., Çukur, T.: Semantic Structure and Interpretability of Word Embeddings. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 26, 1769–1779 (2018). https://doi.org/10.1109/TASLP.2018.2837384.
Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 614–622. ACM, Las Vegas, NV, USA (2008). https://doi.org/10.1145/1401890.1401965.
Sheshadri, A., Lease, M.: SQUARE: A Benchmark for Research on Computing Crowd Consensus. In: First AAAI Conference on Human Computation and Crowdsourcing. pp. 156–164. Association for the Advancement of Artificial Intelligence (2013).
Shi, J., Malik, J.: Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence. 22, 888–905 (2000). https://doi.org/10.1109/34.868688.
Snow, R., O'Connor, B., Jurafsky, D., Ng, A.Y.: Cheap and Fast—but is It Good?: Evaluating Non-expert Annotations for Natural Language Tasks. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. pp. 254–263. Association for Computational Linguistics, Honolulu, HI, USA (2008). https://doi.org/10.3115/1613715.1613751.
Steyvers, M., Tenenbaum, J.B.: The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth. Cognitive Science. 29, 41–78 (2005). https://doi.org/10.1207/s15516709cog2901_3.
Stvilia, B., Twidale, M.B., Smith, L.C., Gasser, L.: Information Quality Work Organization in Wikipedia. Journal of the American Society for Information Science and Technology. 59, 983–1001 (2008). https://doi.org/10.1002/asi.20813.
Sun, Y., Wang, S., Li, Y., Feng, S., Tian, H., Wu, H., Wang, H.: ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding. Proceedings of the AAAI Conference on Artificial Intelligence. 34, 8968–8975 (2020). https://doi.org/10.1609/aaai.v34i05.6428.
Tauer, G., Date, K., Nagi, R., Sudit, M.: An incremental graph-partitioning algorithm for entity resolution. Information Fusion. 46, 171–183 (2019). https://doi.org/10.1016/j.inffus.2018.06.001.
Turner, H., Firth, D.: Bradley-Terry Models in R: The BradleyTerry2 Package. Journal of Statistical Software, Articles. 48, 1–21 (2012). https://doi.org/10.18637/jss.v048.i09.
Ustalov, D.: Words Worth Attention: Predicting Words of the Week on the Russian Wiktionary. In: Knowledge Engineering and the Semantic Web, 5th International Conference, KESW 2014, Kazan, Russia, September 29–October 1, 2014. Proceedings. pp. 196–207. Springer International Publishing, Cham, Switzerland (2014). https://doi.org/10.1007/978-3-319-11716-4_17.
Ustalov, D.: A Crowdsourcing Engine for Mechanized Labor. Proceedings of the Institute for System Programming. 27, 351–364 (2015). https://doi.org/10.15514/ISPRAS-2015-27(3)-25.
Ustalov, D.: Teleboyarin—Mechanized Labor for Telegram. In: Proceedings of the AINL-ISMW FRUCT 2015. pp. 195–197 (2015).
Ustalov, D.: Towards Crowdsourcing and Cooperation in Linguistic Resources. In: Information Retrieval: 8th Russian Summer School, RuSSIR 2014, Nizhniy Novgorod, Russia, August 18-22, 2014, Revised Selected Papers. pp. 348–358. Springer International Publishing, Cham, Switzerland (2015). https://doi.org/10.1007/978-3-319-25485-2_14.
Ustalov, D., Panchenko, A., Biemann, C., Ponzetto, S.P.: Watset: Local-Global Graph Clustering with Applications in Sense and Frame Induction. Computational Linguistics. 45, 423–479 (2019). https://doi.org/10.1162/COLI_a_00354.
Vannella, D., Jurgens, D., Scarfini, D., Toscani, D., Navigli, R.: Validating and Extending Semantic Knowledge Bases using Video Games with a Purpose. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1294–1304. Association for Computational Linguistics, Baltimore, MD, USA (2014). https://doi.org/10.3115/v1/P14-1122.
Velardi, P., Faralli, S., Navigli, R.: OntoLearn Reloaded: A Graph-Based Algorithm for Taxonomy Induction. Computational Linguistics. 39, 665–707 (2013). https://doi.org/10.1162/COLI_a_00146.
Vitter, J.S.: Random Sampling with a Reservoir. ACM Transactions on Mathematical Software. 11, 37–57 (1985). https://doi.org/10.1145/3147.3165.
Vlasblom, J., Wodak, S.J.: Markov clustering versus affinity propagation for the partitioning of protein interaction graphs. BMC Bioinformatics. 10, 99 (2009). https://doi.org/10.1186/1471-2105-10-99.
Voorhees, E.M.: The TREC-8 Question Answering Track Report. In: Proceedings of the 8th Text REtrieval Conference. pp. 77–82. NIST, Gaithersburg, MD, USA (1999).
Wang, A., Hoang, C.D.V., Kan, M.-Y.: Perspectives on crowdsourcing annotations for natural language processing. Language Resources and Evaluation. 47, 9–31 (2013). https://doi.org/10.1007/s10579-012-9176-1.
Wang, J., Ipeirotis, P.G., Provost, F.: Quality-Based Pricing for Crowdsourced Workers. New York University (2013).
Wang, M., Yu, L., Zheng, D., Gan, Q., Gai, Y., Ye, Z., Li, M., Zhou, J., Huang, Q., Ma, C., Huang, Z., Guo, Q., Zhang, H., Lin, H., Zhao, J., Li, J., Smola, A., Zhang, Z.: Deep Graph Library: Towards Efficient And Scalable Deep Learning on Graphs. In: ICLR 2019 Workshop on Representation Learning on Graphs and Manifolds (2019).
Wang, P., Li, X., Wu, R.: A deep learning-based quality assessment model of collaboratively edited documents: A case study of Wikipedia. Journal of Information Science. 1–16 (2019). https://doi.org/10.1177/0165551519877646.
Wang, Q., Mao, Z., Wang, B., Guo, L.: Knowledge Graph Embedding: A Survey of Approaches and Applications. IEEE Transactions on Knowledge and Data Engineering. 29, 2724–2743 (2017). https://doi.org/10.1109/TKDE.2017.2754499.
Wang, Y., Wang, L., Li, Y., He, D., Liu, T.-Y.: A Theoretical Analysis of NDCG Type Ranking Measures. In: Proceedings of the 26th Annual Conference on Learning Theory. pp. 25–54. PMLR, Princeton, NJ, USA (2013).
Whitehill, J., Wu, T., Bergsma, J., Movellan, J.R., Ruvolo, P.L.: Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise. In: Advances in Neural Information Processing Systems 22. pp. 2035–2043. Curran Associates, Inc., Vancouver, BC, Canada (2009).
Wilkinson, D.M., Huberman, B.A.: Cooperation and Quality in Wikipedia. In: Proceedings of the 2007 International Symposium on Wikis. pp. 157–164. ACM, Montréal, QC, Canada (2007). https://doi.org/10.1145/1296951.1296968.
Wu, L., Fisch, A., Chopra, S., Adams, K., Bordes, A., Weston, J.: StarSpace: Embed All The Things! In: The Thirty-Second AAAI Conference on Artificial Intelligence. pp. 5569–5577. Association for the Advancement of Artificial Intelligence (2018).
Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., Yu, P.S.: A Comprehensive Survey on Graph Neural Networks. IEEE Transactions on Neural Networks and Learning Systems. 32, 4–24 (2021). https://doi.org/10.1109/TNNLS.2020.2978386.
Xiong, C., Power, R., Callan, J.: Explicit Semantic Ranking for Academic Search via Knowledge Graph Embedding. In: Proceedings of the 26th International Conference on World Wide Web. pp. 1271–1279. International World Wide Web Conferences Steering Committee, Perth, WA, Australia (2017). https://doi.org/10.1145/3038912.3052558.
Xu, K., Hu, W., Leskovec, J., Jegelka, S.: How Powerful are Graph Neural Networks? In: 7th International Conference on Learning Representations, Conference Track Proceedings. OpenReview.net, New Orleans, LA, USA (2019).
Yang, J., Leskovec, J.: Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach. In: Proceedings of the Sixth ACM International Conference on Web Search and Data Mining. pp. 587–596. Association for Computing Machinery, Rome, Italy (2013). https://doi.org/10.1145/2433396.2433471.
Yang, J., Drake, T., Damianou, A., Maarek, Y.: Leveraging Crowdsourcing Data for Deep Active Learning An Application: Learning Intents in Alexa. In: Proceedings of the 2018 World Wide Web Conference. pp. 23–32. International World Wide Web Conferences Steering Committee, Lyon, France (2018). https://doi.org/10.1145/3178876.3186033.
Yao, L., Mao, C., Luo, Y.: Graph Convolutional Networks for Text Classification. Proceedings of the AAAI Conference on Artificial Intelligence. 33, 7370–7377 (2019). https://doi.org/10.1609/aaai.v33i01.33017370.
Yeh, A.: More accurate tests for the statistical significance of result differences. In: Proceedings of the 18th Conference on Computational Linguistics - Volume 2. pp. 947–953. Association for Computational Linguistics, Saarbrücken, Germany (2000). https://doi.org/10.3115/992730.992783.
You, J., Ying, Z., Leskovec, J.: Design Space for Graph Neural Networks. In: Advances in Neural Information Processing Systems 33. pp. 17009–17021. Curran Associates, Inc., Montréal, QC, Canada (2020).
Zesch, T., Müller, C., Gurevych, I.: Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary. In: Proceedings of the 6th International Conference on Language Resources and Evaluation. pp. 1646–1652. European Language Resources Association (ELRA), Marrakech, Morocco (2008).
Zheng, L., Albano, C.M., Vora, N.M., Mai, F., Nickerson, J.V.: The Roles Bots Play in Wikipedia. Proceedings of the ACM on Human-Computer Interaction. 3, 215:1–215:20 (2019). https://doi.org/10.1145/3359317.
Zheng, Y., Li, G., Li, Y., Shan, C., Cheng, R.: Truth Inference in Crowdsourcing: Is the Problem Solved? Proceedings of the VLDB Endowment. 10, 541–552 (2017). https://doi.org/10.14778/3055540.3055547.
Zhong, W., Xu, J., Tang, D., Xu, Z., Duan, N., Zhou, M., Wang, J., Yin, J.: Reasoning Over Semantic-Level Graph for Fact Checking. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. pp. 6170–6180. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.acl-main.549.

