One of the advantages of Mordecai’s HTTP-based interface is that any language that can make HTTP POST requests can interact with it without needing special Mordecai packages or code. This example demonstrates how to read in a text file, have Mordecai geolocate it to the country level, and then do a full geoparse with Mordecai. It then shows how to format the returned data and plot it on a map.
For this demonstration, we need httr for sending requests to Mordecai, dplyr for formatting the result, and leaflet for making a quick interactive map of the results.
library(httr)
library(dplyr)
library(leaflet)
Set the URLs for the country and places endpoints. Here, Mordecai is running locally.
country_url <- "http://localhost:5000/country"
places_url <- "http://localhost:5000/places"
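If Mordecai is running somewhere other than localhost, it can be convenient to build both URLs from a single base address. The following is a minimal sketch; `mordecai_endpoint` is a hypothetical helper name introduced here for illustration and is not part of Mordecai or httr.

```r
# Hypothetical helper: join a base URL and an endpoint name with exactly one slash.
mordecai_endpoint <- function(name, base_url = "http://localhost:5000") {
  paste0(sub("/+$", "", base_url), "/", name)
}

# These reproduce the two URLs set by hand above.
country_url <- mordecai_endpoint("country")
places_url  <- mordecai_endpoint("places")
```

Pointing the example at another server then only requires changing `base_url`.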
We can then make a GET request to Mordecai to make sure it’s up and running and that we can talk to it.
# `as` is an argument of content(), not GET(); `resp` avoids masking base::t()
resp <- GET(url = country_url)
content(resp, as = "parsed")
## [1] " This service expects a POST in the form '{\"text\":\"On 12\n August, the BBC reported that...\"}' It will return a list of ISO 3 character\n country codes for the country or countries it thinks the text is about. It\n determines the country focus by comparing the word2vec vectors for the\n places mentioned in the text with the vector representation of each country\n in the world, picking the closest."
This response confirms that the service is up and gives us some guidance on the data format it expects.
First, let’s test Mordecai’s country coding capability. We can read in one of the human rights texts prepared by Fariss et al.…
bol <- paste(readLines("BOL_2009_Amnesty_International.txt"), collapse = " ")
…and then POST it to the country endpoint.
bol_country <- POST(url = country_url,
                    body = list("text" = bol),
                    encode = "json")
# parse the JSON response into an R object
content(bol_country, as = "parsed")
## [1] "BOL"
Since this is indeed a text about Bolivia, Mordecai has correctly coded it as BOL.
Now let’s do a full geoparse, extracting all the place names in the text and finding their correct entries in the gazetteer. The bind_rows() call formats the response as a data frame.
bol_places <- POST(url = places_url,
                   body = list("text" = bol),
                   encode = "json")
# each parsed place is a named list; bind_rows() stacks them into one data frame
bol_places_df <- bind_rows(content(bol_places, as = "parsed"))
bol_places_df
## Source: local data frame [12 x 6]
##
##      placename countrycode       lon  admin1 searchterm       lat
##          (chr)       (chr)     (dbl)   (chr)      (chr)     (dbl)
## 1  Santa Cruz         BOL -62.51667 El Beni Santa Cruz -13.61667
## 2       Pando         BOL -67.31667   Pando      Pando -11.80000
## 3       Sucre         BOL -65.28875      NA      Sucre -19.00708
## 4       Pando         BOL -67.31667   Pando      Pando -11.80000
## 5       Sucre         BOL -65.28875      NA      Sucre -19.00708
## 6       Pando         BOL -67.31667   Pando      Pando -11.80000
## 7       Pando         BOL -67.31667   Pando      Pando -11.80000
## 8    Teoponte         BOL -67.81667  La Paz   Teoponte -15.46667
## 9      La Paz         BOL -67.01667  La Paz     La Paz -12.96667
## 10 Cochabamba         BOL -64.96667 El Beni Cochabamba -15.35000
## 11     La Paz         BOL -67.01667  La Paz     La Paz -12.96667
## 12      Pando         BOL -67.31667   Pando      Pando -11.80000
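To see what the data frame conversion involves, here is a dependency-free sketch of the same step in base R. It assumes the parsed response is a list of named lists, one per place, and that a missing admin1 arrives as NULL; neither detail is confirmed by Mordecai's documentation here. The two entries reuse values from rows 2 and 3 of the output above, and `place_to_row` is a hypothetical helper.

```r
# Two entries shaped like the assumed parsed response (values from rows 2 and 3).
parsed <- list(
  list(placename = "Pando", countrycode = "BOL", lon = -67.31667,
       admin1 = "Pando", searchterm = "Pando", lat = -11.80000),
  list(placename = "Sucre", countrycode = "BOL", lon = -65.28875,
       admin1 = NULL, searchterm = "Sucre", lat = -19.00708)
)

# Hypothetical helper: turn one place into a one-row data frame,
# replacing NULL or absent fields with NA so every row has the same columns.
place_to_row <- function(place, fields) {
  values <- lapply(fields, function(f) {
    v <- place[[f]]
    if (is.null(v)) NA else v
  })
  names(values) <- fields
  as.data.frame(values, stringsAsFactors = FALSE)
}

fields <- c("placename", "countrycode", "lon", "admin1", "searchterm", "lat")
places_df <- do.call(rbind, lapply(parsed, place_to_row, fields = fields))
places_df
```

bind_rows() does this filling and stacking in one call, which is why the example above uses it.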
These locations pass an eyeball test: no place name was resolved to an obviously wrong location. Now, for fun, we can plot them on an interactive leaflet map, with each marker sized by the number of times the place is mentioned in the text.
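bol_places_df %>%
  group_by(placename) %>%
  mutate(count = n()) %>%   # number of mentions of each place
  distinct() %>%            # keep one row per place
  ungroup() %>%
  leaflet() %>%
  addTiles() %>%
  # name the coordinate columns explicitly rather than relying on leaflet's guess
  addCircleMarkers(lng = ~lon, lat = ~lat,
                   popup = ~placename, radius = ~3*(count + 2))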
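To make the marker sizing concrete, the sketch below reproduces the mention counts and radii in base R, using the twelve place names from the output above. It is only an illustration of the arithmetic, not a replacement for the dplyr pipeline.

```r
# Place names in the order Mordecai returned them (copied from the output above).
placenames <- c("Santa Cruz", "Pando", "Sucre", "Pando", "Sucre", "Pando",
                "Pando", "Teoponte", "La Paz", "Cochabamba", "La Paz", "Pando")

# Count mentions per place.
mentions <- as.data.frame(table(placename = placenames),
                          responseName = "count",
                          stringsAsFactors = FALSE)

# Same radius rule as the leaflet call: 3 * (count + 2).
mentions$radius <- 3 * (mentions$count + 2)

# Most frequently mentioned places first.
mentions[order(-mentions$count), ]
```

Pando, with five mentions, gets a radius of 21, while places mentioned once get a radius of 9.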
A more serious example would use many more texts than this one and would probably wrap the raw POST requests in a function. But hopefully this example will get R users started with Mordecai.
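One possible shape for such a wrapper is sketched below. The names `read_text`, `geoparse_files`, and `post_country` are hypothetical, and the sender is passed in as an argument so the file-handling part can be tried without a running Mordecai instance. `post_country` assumes the local country endpoint used earlier and requires httr.

```r
# Read a file into a single string, as in the readLines()/paste() step above.
read_text <- function(path) paste(readLines(path, warn = FALSE), collapse = " ")

# Apply `send` (any function that takes one text and returns a result)
# to every file; results are named by file name.
geoparse_files <- function(paths, send) {
  texts <- vapply(paths, read_text, character(1))
  results <- lapply(texts, send)
  names(results) <- basename(paths)
  results
}

# Hypothetical sender wrapping the POST call from this example.
post_country <- function(text, url = "http://localhost:5000/country") {
  resp <- httr::POST(url = url, body = list(text = text), encode = "json")
  httr::stop_for_status(resp)  # fail loudly on HTTP errors
  httr::content(resp, as = "parsed")
}
```

With Mordecai running, `geoparse_files("BOL_2009_Amnesty_International.txt", send = post_country)` would return a named list with one entry per file.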
Fariss, Christopher J., Fridolin J. Linder, Charles D. Crabtree, Megan A. Biek, Ana-Sophia M. Ross, Taranamol Kaur, and Michael Tsai. “Human Rights Texts: Converting Human Rights Primary Source Documents into Data.” Harvard Dataverse. doi:10.7910/DVN/IAH8OY.