Conference paper Open Access

Location Extraction from Social Media: Geoparsing, Location Disambiguation and Geotagging

Stuart E. Middleton; Giorgos Kordopatis-Zilos; Symeon Papadopoulos; Yiannis Kompatsiaris

Location extraction, also called toponym extraction, is a field covering geoparsing (extracting spatial representations from location mentions in text) and geotagging (assigning spatial coordinates to content items). This paper evaluates five ‘best of class’ location extraction algorithms. We develop a geoparsing algorithm using an OpenStreetMap database, and a geotagging algorithm using a language model constructed from social media tags and multiple gazetteers. Third-party work evaluated includes a DBpedia-based entity recognition and disambiguation approach, a named entity recognition and Geonames gazetteer approach, and a Google Geocoder API approach. We perform two quantitative benchmark evaluations, one geoparsing tweets and one geotagging Flickr posts, to compare all approaches. We also perform a qualitative evaluation recalling the top N location mentions from tweets during major news events. The OpenStreetMap approach was best (F1 0.90+) for geoparsing English, and the language model approach was best (F1 0.66) for Turkish. The language model was best (F1@1km 0.49) for the geotagging evaluation. The map-database approach was best (R@20 0.60+) in the qualitative evaluation. We report on strengths, weaknesses and a detailed failure analysis for the approaches and suggest concrete areas for further research.
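
As an illustration of how a distance-thresholded score such as F1@1km could be computed, the following is a minimal sketch and not the authors' evaluation code. It assumes that a prediction counts as correct when it falls within the radius of the ground-truth coordinate, that precision is taken over items for which an estimate was produced, and that recall is taken over all ground-truth items. The function names `haversine_km` and `f1_at_km` and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def f1_at_km(predicted, truth, radius_km=1.0):
    """Return (precision, recall, f1) at the given radius.

    predicted: dict item_id -> (lat, lon), or None when no estimate was made.
    truth:     dict item_id -> (lat, lon).
    Assumption: precision is over attempted items, recall over all truth items.
    """
    attempted = {k: v for k, v in predicted.items() if v is not None and k in truth}
    correct = sum(
        1 for k, v in attempted.items() if haversine_km(v, truth[k]) <= radius_km
    )
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical sample: one close estimate, one distant estimate, one item skipped.
truth = {"a": (48.8566, 2.3522), "b": (51.5074, -0.1278), "c": (40.7128, -74.0060)}
predicted = {"a": (48.8570, 2.3525), "b": (52.0, 0.0), "c": None}
precision, recall, f1 = f1_at_km(predicted, truth)
```

Under these assumptions, an approach that declines to estimate uncertain items keeps its precision but loses recall, which is why a combined F1 figure is a natural way to compare approaches that differ in coverage.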

Stuart E. Middleton, Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Yiannis Kompatsiaris, "Location Extraction from Social Media: Geoparsing, Location Disambiguation and Geotagging", in Proc. 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, July 2019.