One advantage of Mordecai’s HTTP-based interface is that any language that can make HTTP POST requests can interact with it without needing special Mordecai packages or code. This example demonstrates how to read in a text file, have Mordecai geolocate it to the country level, and then run a full geoparse. It then shows how to format the returned data and plot it on a map.

For this demonstration, we need httr for sending requests to Mordecai, dplyr for formatting the results, and leaflet for making a quick interactive map.

library(httr)
library(dplyr)
library(leaflet)

Set the URLs for the country and places endpoints. Here, Mordecai is running locally.

country_url <- "http://localhost:5000/country"
places_url <- "http://localhost:5000/places"

We can then make a GET request to Mordecai to make sure it’s up and running and that we can talk to it.

resp <- GET(url = country_url)
content(resp, as = "parsed")
## [1] " This service expects a POST in the form '{\"text\":\"On 12\n    August, the BBC reported that...\"}' It will return a list of ISO 3 character\n    country codes for the country or countries it thinks the text is about. It\n    determines the country focus by comparing the word2vec vectors for the\n    places mentioned in the text with the vector representation of each country\n    in the world, picking the closest."

This response confirms that the service is running and gives us some guidance on the data format it expects.
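For scripted use, the same check can be reduced to a single TRUE/FALSE value. The following is a minimal sketch, not part of Mordecai itself: the helper name `check_service` and the five-second timeout are assumptions made for illustration.

```r
library(httr)

# Hypothetical helper: returns TRUE if the endpoint answers a GET request
# without an HTTP error, and FALSE if the request fails or times out.
check_service <- function(url) {
  resp <- tryCatch(
    GET(url, timeout(5)),          # give up after 5 seconds (assumed value)
    error = function(e) NULL       # connection refused, timeout, etc.
  )
  !is.null(resp) && !http_error(resp)
}
```

Calling `check_service(country_url)` before sending any text makes it easier to tell a connection problem apart from a parsing problem.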

First, let’s test Mordecai’s country coding capability. We can read in one of the human rights texts prepared by Fariss et al.

bol <- paste(readLines("BOL_2009_Amnesty_International.txt"), collapse = " ")

…and then POST it to the country endpoint.

bol_country <- POST(url = country_url,
                    body = list("text" = bol),
                    encode = "json")

content(bol_country, as = "parsed")
## [1] "BOL"

Thankfully, since this is indeed a text about Bolivia, Mordecai codes it as BOL.
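It can help to see what the request body looks like, since this is the `{"text": ...}` format the service asked for. The sketch below builds an equivalent payload by hand with jsonlite; it assumes that httr's `encode = "json"` serializes the list in much the same way, and the sample sentence is taken from the service's own help message rather than from the Bolivia text.

```r
library(jsonlite)

# Stand-in text; in the example above this would be the `bol` string.
sample_text <- "On 12 August, the BBC reported that..."

# auto_unbox = TRUE writes a length-one vector as a JSON string
# instead of a one-element array.
payload <- toJSON(list(text = sample_text), auto_unbox = TRUE)
cat(payload, "\n")
```

Any client that can send a JSON object of this shape in a POST request can use the endpoint, whatever language it is written in.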

Now let’s do a full geoparsing, extracting all the place names in the text and finding their correct entries in the gazetteer. The final line formats the response as a dataframe.

bol_places <- POST(url = places_url,
                   body = list("text" = bol),
                   encode = "json")

bol_places_df <- bind_rows(content(bol_places, as = "parsed"))
bol_places_df
## Source: local data frame [12 x 6]
## 
##     placename countrycode       lon  admin1 searchterm       lat
##         (chr)       (chr)     (dbl)   (chr)      (chr)     (dbl)
## 1  Santa Cruz         BOL -62.51667 El Beni Santa Cruz -13.61667
## 2       Pando         BOL -67.31667   Pando      Pando -11.80000
## 3       Sucre         BOL -65.28875      NA      Sucre -19.00708
## 4       Pando         BOL -67.31667   Pando      Pando -11.80000
## 5       Sucre         BOL -65.28875      NA      Sucre -19.00708
## 6       Pando         BOL -67.31667   Pando      Pando -11.80000
## 7       Pando         BOL -67.31667   Pando      Pando -11.80000
## 8    Teoponte         BOL -67.81667  La Paz   Teoponte -15.46667
## 9      La Paz         BOL -67.01667  La Paz     La Paz -12.96667
## 10 Cochabamba         BOL -64.96667 El Beni Cochabamba -15.35000
## 11     La Paz         BOL -67.01667  La Paz     La Paz -12.96667
## 12      Pando         BOL -67.31667   Pando      Pando -11.80000

These locations pass an eyeball test: no place name was resolved to an obviously wrong location. Now, for fun, we can plot them on an interactive leaflet map, with markers sized by the number of times each place is mentioned in the text.

bol_places_df %>% 
  group_by(placename) %>% 
  mutate(count = n()) %>% 
  distinct() %>% 
  leaflet(.) %>% 
    addTiles() %>%
    addCircleMarkers(popup = ~placename, radius = ~3*(count + 2))
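To check the marker sizes without drawing the map, the mention counts and radii can be tabulated directly. This sketch rebuilds only the placename column from the output shown earlier and applies the same `3 * (count + 2)` radius formula; the names `mentions` and `marker_sizes` are introduced here for illustration.

```r
library(dplyr)

# The twelve place names returned by Mordecai, in the order shown above.
mentions <- data.frame(
  placename = c("Santa Cruz", "Pando", "Sucre", "Pando", "Sucre", "Pando",
                "Pando", "Teoponte", "La Paz", "Cochabamba", "La Paz", "Pando"),
  stringsAsFactors = FALSE
)

# One row per place, with its mention count and the resulting marker radius.
marker_sizes <- mentions %>%
  count(placename, name = "count") %>%
  mutate(radius = 3 * (count + 2)) %>%
  arrange(desc(count))

marker_sizes
```

Pando, mentioned five times, gets the largest marker (radius 21), while places mentioned once get a radius of 9.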

A more serious example would use many more texts than this one and would probably wrap the raw POST requests in a function. But hopefully this example will get R users started with Mordecai.
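One possible shape for such a wrapper is sketched here. The function name `geoparse_text` and its default URL are assumptions for illustration, and the error handling is deliberately minimal.

```r
library(httr)
library(dplyr)

# Hypothetical wrapper: send one text to the places endpoint and
# return the matched locations as a data frame.
geoparse_text <- function(text, url = "http://localhost:5000/places") {
  resp <- POST(url = url,
               body = list(text = text),
               encode = "json")
  stop_for_status(resp)                    # fail loudly on an HTTP error
  bind_rows(content(resp, as = "parsed"))  # one row per place mention
}
```

With a running Mordecai instance, this could be applied over many documents, for example by reading each file with `readLines()` and passing the collapsed text to `geoparse_text()` inside `lapply()`.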

References

Fariss, Christopher J., Fridolin J. Linder, Charles D. Crabtree, Megan A. Biek, Ana-Sophia M. Ross, Taranamol Kaur, and Michael Tsai. “Human Rights Texts: Converting Human Rights Primary Source Documents into Data.” Harvard Dataverse. doi:10.7910/DVN/IAH8OY.