Conference paper Open Access

Location Extraction from Social Media: Geoparsing, Location Disambiguation and Geotagging

Stuart E. Middleton; Giorgos Kordopatis-Zilos; Symeon Papadopoulos; Yiannis Kompatsiaris

Location extraction, also called toponym extraction, is a field covering geoparsing (extracting spatial representations from location mentions in text) and geotagging (assigning spatial coordinates to content items). This paper evaluates five ‘best of class’ location extraction algorithms. We develop a geoparsing algorithm using an OpenStreetMap database, and a geotagging algorithm using a language model constructed from social media tags and multiple gazetteers. The third-party work evaluated includes a DBpedia-based entity recognition and disambiguation approach, a named entity recognition and Geonames gazetteer approach, and a Google Geocoder API approach. We perform two quantitative benchmark evaluations, one geoparsing tweets and one geotagging Flickr posts, to compare all approaches. We also perform a qualitative evaluation recalling the top N location mentions from tweets during major news events. The OpenStreetMap approach was best (F1 0.90+) for geoparsing English, and the language model approach was best (F1 0.66) for Turkish. The language model approach was best (F1@1km 0.49) in the geotagging evaluation. The map-database approach was best (R@20 0.60+) in the qualitative evaluation. We report on strengths and weaknesses, give a detailed failure analysis for each approach, and suggest concrete areas for further research.
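The F1@1km figure above is a distance-thresholded score. As a rough illustration only, the sketch below shows one plausible way such a score could be computed: a predicted coordinate counts as correct when it lies within 1 km of the ground-truth coordinate. The function names (`haversine_km`, `f1_at_km`), the dictionary-based input format, and the exact precision and recall definitions are assumptions made for this example and are not taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def f1_at_km(predictions, truths, radius_km=1.0):
    """Return (precision, recall, F1) at a distance threshold.

    Assumed definitions (hypothetical, for illustration):
      - predictions: item id -> (lat, lon), or None when no location was predicted
      - truths:      item id -> (lat, lon) ground truth
      - precision = correct / items with a prediction
      - recall    = correct / items with ground truth
    """
    predicted = 0
    correct = 0
    for item_id, truth in truths.items():
        guess = predictions.get(item_id)
        if guess is None:
            continue  # no prediction: hurts recall, not precision
        predicted += 1
        if haversine_km(guess[0], guess[1], truth[0], truth[1]) <= radius_km:
            correct += 1
    precision = correct / predicted if predicted else 0.0
    recall = correct / len(truths) if truths else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data, invented for this example.
truths = {"a": (48.8566, 2.3522), "b": (51.5074, -0.1278), "c": (41.0082, 28.9784)}
predictions = {"a": (48.8570, 2.3530), "b": (40.0, 0.0), "c": None}
precision, recall, f1 = f1_at_km(predictions, truths)
print(f"P@1km={precision:.2f} R@1km={recall:.2f} F1@1km={f1:.2f}")
```

Separating "no prediction" from "wrong prediction" lets an approach that abstains on hard items trade recall for precision, which is why the example reports both alongside F1.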

Stuart E. Middleton, Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Yiannis Kompatsiaris, "Location Extraction from Social Media: Geoparsing, Location Disambiguation and Geotagging", in Proc. 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, July 2019.
Files (434.4 kB)
Location Extraction from Social Media.pdf (434.4 kB)
md5:74f9c999a8f03646cf75219ea05575c4