Published September 17, 2018 | Version Accepted pre-print
Conference paper Open

A Data-Driven Metric of Hardness for WSC Sentences

  • 1. Open University of Cyprus, Nicosia, Cyprus
  • 2. Open University of Cyprus, Nicosia, Cyprus & Research Center on Interactive Media, Smart Systems, and Emerging Technologies


The Winograd Schema Challenge (WSC), the task of resolving pronouns in certain sentences where shallow parsing techniques seem not to be directly applicable, has been proposed as an alternative to the Turing Test. According to Levesque, having access to a large corpus of text would likely not help much in the WSC. Among a number of attempts to tackle this challenge, one particular approach has demonstrated the plausibility of using commonsense knowledge automatically acquired from raw text in English Wikipedia. Here, we present the results of a large-scale experiment that shows how the performance of that particular automated approach varies with the availability of training material. We compare the results of this experiment with two studies: one from the literature that investigates how adult native speakers tackle the WSC, and one that we design and undertake to investigate how teenage non-native speakers tackle the WSC. We find that the performance of the automated approach correlates positively with the performance of humans, suggesting that the performance of the particular automated approach could be used as a metric of hardness for WSC instances.


This work has been partly supported by the project that has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 739578 (RISE – Call: H2020-WIDESPREAD-01-2016-2017-TeamingPhase2) and the Government of the Republic of Cyprus through the Directorate General for European Programmes, Coordination and Development. © The authors



Additional details


RISE – Research Center on Interactive Media, Smart Systems, and Emerging Technologies 739578
European Commission


  • Evan Ackerman. Winograd Schema Challenge Results: AI Common Sense Still a Problem, for Now. Spectrum, 2016.
  • Dan Bailey, Amelia Harrison, Yuliya Lierler, Vladimir Lifschitz, and Julian Michael. The Winograd Schema Challenge and Reasoning about Correlation. In Working Notes of the Symposium on Logical Formalizations of Commonsense Reasoning, 2015.
  • David Bender. Establishing a Human Baseline for the Winograd Schema Challenge. In MAICS, pages 39–45, 2015.
  • Eric Bengtson and Dan Roth. Understanding the Value of Features for Coreference Resolution. In EMNLP, October 2008.
  • Tejas Ulhas Budukh. An intelligent co-reference resolver for Winograd schema sentences containing resolved semantic entities, 2013.
  • Nicos Isaak and Loizos Michael. Tackling the Winograd Schema Challenge Through Machine Logical Inferences. In David Pearce and Helena Sofia Pinto, editors, STAIRS, volume 284 of Frontiers in Artificial Intelligence and Applications, pages 75–86. IOS Press, 2016.
  • Nicos Isaak and Loizos Michael. Using the Winograd Schema Challenge as a CAPTCHA. In Proceedings of the 4th Global Conference on Artificial Intelligence (GCAI 2018). EasyChair, 2018.
  • Hector J. Levesque. The Winograd Schema Challenge. In AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning, number SS-11-06. American Association for Artificial Intelligence, 2011.
  • Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. The Stanford CoreNLP natural language processing toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55–60, 2014.
  • Loizos Michael. Reading Between the Lines. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI 2009), pages 1525–1530, July 2009.
  • Loizos Michael. Partial observability and learnability. Artif. Intell., 174(11):639–669, 2010.
  • Loizos Michael. Machines with Websense. In Proc. of 11th International Symposium on Logical Formalizations of Commonsense Reasoning (Commonsense 13), 2013.
  • Loizos Michael and Leslie G. Valiant. A First Experimental Demonstration of Massive Knowledge Infusion. In Proceedings of the 11th International Conference on Principles of Knowledge Representation and Reasoning (KR 2008), pages 378–388. AAAI Press, September 2008.
  • Haoruo Peng, Daniel Khashabi, and Dan Roth. Solving Hard Coreference Problems. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 809–819, 2015.
  • Altaf Rahman and Vincent Ng. Resolving Complex Cases of Definite Pronouns: The Winograd Schema Challenge. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL '12, pages 777–789, Stroudsburg, PA, USA, 2012. Association for Computational Linguistics.
  • Adam Richard-Bollans, L Gomez Alvarez, and Anthony G Cohn. The Role of Pragmatics in Solving the Winograd Schema Challenge. In Proceedings of 13th International Symposium on Commonsense Reasoning (Commonsense-2017). CEUR Workshop Proceedings, 2017.
  • Arpit Sharma, Nguyen H Vo, Somak Aditya, and Chitta Baral. Towards Addressing the Winograd Schema Challenge - Building and Using a Semantic Parser and a Knowledge Hunting Module. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI, pages 25–31, 2015.
  • Leslie G. Valiant. Knowledge Infusion. In Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2, AAAI'06, pages 1546–1551. AAAI Press, 2006.