OSMtidy - Vignette 2, Walkthrough

Dr Annie Visser-Quinn, a.visser-quinn@hw.ac.uk

2020-07-10

1. Prerequisites

2. Set-up

To use OSMtidy, save your R script in the OSMtidy directory. The set-up below sets the working directory to the script's folder, so relative paths such as functions/functions.R resolve correctly. Begin your script as follows:

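# Prepare the environment: clear the workspace, clear the RStudio console
# ("\014" is the form-feed character) and free unused memory
rm(list = ls()); cat("\014"); gc()

# Set the working directory to the script's folder
# (requires RStudio and the rstudioapi package)
setwd(dirname(rstudioapi::getSourceEditorContext()$path)); getwd()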

# Load the required packages
library(tidyverse)
library(lubridate)
library(osmdata)
library(sf)
library(pbapply)
library(progress)
library(data.table)
library(readxl)
library(openxlsx)
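If any of these packages is not installed, the corresponding library() call stops with an error. The sketch below checks all nine in one go using only base R. The helper name missingPackages() is hypothetical and is not part of OSMtidy.

```r
# Packages loaded in the set-up script
osmtidyPackages <- c("tidyverse", "lubridate", "osmdata", "sf", "pbapply",
                     "progress", "data.table", "readxl", "openxlsx")

# Hypothetical helper: return the packages whose namespace cannot be loaded
missingPackages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

toInstall <- missingPackages(osmtidyPackages)
if (length(toInstall) > 0) {
  message("Not installed: ", paste(toInstall, collapse = ", "))
  # install.packages(toInstall)  # uncomment to install the missing packages
}
```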

You are now ready to import the ten functions that make up OSMtidy V0.0.4. This walkthrough uses nine of them.

source("functions/functions.R")
ls()
#>  [1] "cookieCutter"   "dataCut"        "dataExport"     "dataExtract"   
#>  [5] "dataFilter"     "dataShapefile"  "dataSummary"    "dataTidy"      
#>  [9] "dataWrangle"    "filterOverview"

3. Using OSMtidy

3.1. Data

Using the function dataShapefile(), we can import the shapefile for which data is to be extracted. There are two input arguments:

At each step, you can print a summary of the OSMtidy outputs using the function dataSummary().

You can also export any OSMtidy output using the function dataExport(). All outputs are timestamped and saved in the outputs folder within the OSMtidy directory.

3.2. Extract OSM data via the R package osmdata

The OpenStreetMap data is extracted, via the R package osmdata and the overpass server, using the function dataExtract(). Timestamps and progress are printed when the function is running.

OSMtidy V0.0.4 extracts 47 of the 209 features available in osmdata. To change the features extracted, add or delete lines in functions/features.txt. The list of features available in osmdata can be accessed via the function osmdata::available_features().
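The following is a minimal sketch for editing the feature list from R rather than by hand. It assumes functions/features.txt holds one osmdata feature key per line. The text above implies this layout but does not show the file. The helper editFeatures() is hypothetical, and the demonstration runs on a temporary file so the real list is untouched.

```r
# Hypothetical helper: add and/or remove feature keys in a file holding
# one osmdata feature key per line (assumed layout of functions/features.txt)
editFeatures <- function(path, add = character(0), remove = character(0)) {
  features <- trimws(readLines(path, warn = FALSE))
  features <- features[nzchar(features)]              # drop blank lines
  features <- setdiff(union(features, add), remove)   # append new keys, drop unwanted ones
  writeLines(features, path)
  invisible(features)
}

# Demonstrate on a temporary file rather than the real features list
demo <- tempfile(fileext = ".txt")
writeLines(c("amenity", "building", "highway"), demo)
editFeatures(demo, add = "tourism", remove = "highway")
readLines(demo)  # now amenity, building and tourism
```

To edit the real list, keep a backup first. Then run editFeatures("functions/features.txt", add = ..., remove = ...) from the OSMtidy directory. Note that union() and setdiff() also drop any duplicate lines.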

The function dataExtract has three input arguments:

3.2.1. Details from the R package osmdata

timeout: It may be necessary to increase this value for large queries, because the server may time out before all data are delivered.

memsize: The default memory size for the ‘overpass’ server in bytes; may need to be increased in order to handle large queries.

See https://wiki.openstreetmap.org/wiki/Overpass_API#Resource_management_options_.28osm-script.29 for explanation of timeout and memsize (or maxsize in overpass terms). Note in particular the comment that queries with arbitrarily large memsize are likely to be rejected.
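To see how these two settings reach the Overpass server, the sketch below uses osmdata directly to build a query without sending it. The bounding box values are illustrative. It is an assumption that dataExtract() forwards its own arguments to osmdata::opq() in exactly this way, because its input arguments are not listed here.

```r
library(osmdata)

# Bounding box as c(xmin, ymin, xmax, ymax) in longitude/latitude
# (illustrative values, not the vignette's shapefile)
bbox <- c(-3.21, 55.95, -3.16, 55.97)

# Build, but do not send, a query with a longer timeout (seconds)
# and a larger memory allowance (bytes) than the defaults
query <- opq(bbox = bbox, timeout = 300, memsize = 1073741824)
query <- add_osm_feature(query, key = "amenity")

# Inspect the Overpass QL that would be sent; no server request is made here
qlString <- opq_string(query)
cat(qlString)
```

opq_string() only formats the query, so this block does not contact the server. Passing the query to osmdata_sf() would actually send it.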

The code chunk below is not run for two reasons. First, the data extraction may take some time, depending on the size of the area. Second, running it would risk flooding the overpass server.

The output can be interrogated and exported as before, using dataSummary() and dataExport(), respectively.

3.3. Cut out data

In step 2, the data was extracted for a “bounding box”, the rectangle enclosing the shapefile. In step 3, the data is cut to the shapefile itself using the function dataCut(). Timestamps and progress are printed when the function is running. The function dataCut() has two input arguments:

Outputs can be interrogated and exported as before.
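We can illustrate the difference between the bounding box and the cut with sf. This toy example assumes that cutting keeps the features that intersect the shapefile. It is not the code inside dataCut(), which may also clip features that cross the boundary.

```r
library(sf)

# A triangular study area (planar toy coordinates, no CRS)
area <- st_sfc(st_polygon(list(rbind(c(0, 0), c(1, 0), c(0, 1), c(0, 0)))))

# Its bounding box is the unit square: the rectangle a bounding-box query covers
st_bbox(area)

# Three toy features; all lie inside the bounding box
features <- st_sf(
  id = c("a", "b", "c"),
  geometry = st_sfc(st_point(c(0.2, 0.2)), st_point(c(0.8, 0.8)), st_point(c(0.1, 0.5)))
)

# Keep only the features that intersect the area itself
inside <- lengths(st_intersects(features, area)) > 0
cutFeatures <- features[inside, ]
cutFeatures$id  # "a" and "c"; "b" is inside the box but outside the triangle
```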

3.4. Data wrangling

Using the function dataWrangle(), we can tidy up (or wrangle) the data before filtering. Timestamps and progress are printed when the function is running. There is one input argument:

dlWrangle <- dataWrangle(dataCut = dlCut)
#> 13:12:53 Step one of three
#> 13:12:53 Step two of three
#> 13:12:55 Step three of three
#> 13:12:59 Complete, preparing output

dlWrangle %>% dataSummary
#> $class
#> [1] "list"                "OSMtidy_dataWrangle"
#> 
#> $byGeometry
#>           data            type total percent
#> 1 dataWrangled      linestring  4132    4.87
#> 2 dataWrangled multilinestring    12    0.01
#> 3 dataWrangled    multipolygon    28    0.03
#> 4 dataWrangled           point  5881    6.93
#> 5 dataWrangled         polygon  7017    8.26
#> 6     noDetail      linestring    13    0.02
#> 7     noDetail           point 67505   79.50
#> 8     noDetail         polygon   328    0.39
#> 
#> $byFeature
#>            data          feature total percent
#> 1  dataWrangled          amenity   640    0.75
#> 2  dataWrangled          barrier  2873    3.38
#> 3  dataWrangled         building  4240    4.99
#> 4  dataWrangled            craft    21    0.02
#> 5  dataWrangled         cycleway     1    0.00
#> 6  dataWrangled        emergency    65    0.08
#> 7  dataWrangled generator:source    15    0.02
#> 8  dataWrangled       healthcare     2    0.00
#> 9  dataWrangled          highway  1161    1.37
#> 10 dataWrangled         historic     6    0.01
#> 11 dataWrangled          landuse   383    0.45
#> 12 dataWrangled          leisure  2348    2.77
#> 13 dataWrangled         military     1    0.00
#> 14 dataWrangled          natural  4163    4.90
#> 15 dataWrangled           office    81    0.10
#> 16 dataWrangled            power    11    0.01
#> 17 dataWrangled          railway    12    0.01
#> 18 dataWrangled          service     2    0.00
#> 19 dataWrangled             shop   365    0.43
#> 20 dataWrangled  social_facility     1    0.00
#> 21 dataWrangled            usage     7    0.01
#> 22 dataWrangled            water     1    0.00
#> 23 dataWrangled             <NA>   671    0.79
#> 24     noDetail          amenity  3053    3.60
#> 25     noDetail          barrier  9030   10.63
#> 26     noDetail           bridge    40    0.05
#> 27     noDetail         building 28484   33.54
#> 28     noDetail         cycleway    71    0.08
#> 29     noDetail generator:method    52    0.06
#> 30     noDetail generator:source   188    0.22
#> 31     noDetail          highway  5251    6.18
#> 32     noDetail         historic    31    0.04
#> 33     noDetail          landuse  6182    7.28
#> 34     noDetail          leisure 12464   14.68
#> 35     noDetail          natural  1121    1.32
#> 36     noDetail           office    24    0.03
#> 37     noDetail            power   176    0.21
#> 38     noDetail          railway   168    0.20
#> 39     noDetail      residential    51    0.06
#> 40     noDetail          service   686    0.81
#> 41     noDetail             shop    48    0.06
#> 42     noDetail  social_facility   172    0.20
#> 43     noDetail       substation    59    0.07
#> 44     noDetail  traffic_calming    23    0.03
#> 45     noDetail            usage    31    0.04
#> 46     noDetail          voltage    48    0.06
#> 47     noDetail            water   393    0.46
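The percent column appears to be each row's share of all records, rounded to two decimal places. The sketch below recomputes it from the $byGeometry totals above under that assumption, and every value matches the printed table. Note also that the 17,070 dataWrangled records equal the total in the dataFilter() summary in section 3.5 (12,194 + 562 + 4,314). This suggests that only the wrangled records are passed on to filtering.

```r
# Totals copied from the $byGeometry table above
byGeometry <- data.frame(
  data  = c(rep("dataWrangled", 5), rep("noDetail", 3)),
  type  = c("linestring", "multilinestring", "multipolygon", "point", "polygon",
            "linestring", "point", "polygon"),
  total = c(4132, 12, 28, 5881, 7017, 13, 67505, 328),
  stringsAsFactors = FALSE
)

# Percent of all records, rounded to two decimal places
byGeometry$percent <- round(100 * byGeometry$total / sum(byGeometry$total), 2)

# Records carried forward with detail (dataWrangled) versus the total
wrangled <- sum(byGeometry$total[byGeometry$data == "dataWrangled"])
wrangled  # 17070 of 84916 records
```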

dataExport(data = dlWrangle, name = locationName)
#> Files saved as:
#> 
#>  outputs/exampleEdinburgh_4_dataWrangle_20200710-131300.RDS
#>  outputs/exampleEdinburgh_4_dataWrangle-noDetail_20200710-131300.xlsx
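The printed paths follow the pattern name_step_function[-part]_timestamp.ext, with the timestamp written as YYYYmmdd-HHMMSS. This pattern is inferred from the names above, not from the dataExport() code. Because that format sorts chronologically as text, a sketch like the one below can find the most recent export in a later session. latestOutput() is a hypothetical helper, demonstrated here on fake files in a temporary folder.

```r
# Hypothetical helper: path of the most recent export whose file name matches
# a pattern; YYYYmmdd-HHMMSS timestamps sort chronologically as text
latestOutput <- function(pattern, dir = "outputs") {
  files <- sort(list.files(dir, pattern = pattern, full.names = TRUE))
  if (length(files) == 0) stop("No files in '", dir, "' match: ", pattern)
  files[length(files)]
}

# Demonstration with two fake exports in a temporary folder
demoDir <- file.path(tempdir(), "outputs-demo")
dir.create(demoDir, showWarnings = FALSE)
saveRDS("older", file.path(demoDir, "exampleEdinburgh_4_dataWrangle_20200710-131300.RDS"))
saveRDS("newer", file.path(demoDir, "exampleEdinburgh_4_dataWrangle_20200711-090000.RDS"))

# The trailing "_" after dataWrangle excludes the -noDetail spreadsheet
path <- latestOutput("^exampleEdinburgh_4_dataWrangle_.*\\.RDS$", dir = demoDir)
reloaded <- readRDS(path)  # the object saved in the newer file
```

From the OSMtidy directory, the same call without the dir argument searches the real outputs folder. This assumes the .RDS file holds the complete dataWrangle() output.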

3.5. Data filtering

The main function of OSMtidy is dataFilter(). Here, the data is filtered based on rules set out in the Excel file filters.xlsx, which can be found in the main OSMtidy directory. You may adjust these rules by editing the spreadsheet; see Vignette 3 for further details. Timestamps and progress are printed when the function is running. There are three input arguments to dataFilter():

Depending on the size of the location, the number of filters and computer performance, filtering can take anything from a couple of minutes (the example ward) to several hours (City of London and Boroughs). The code chunk below is not run.

The output may be interrogated and exported as before.

dlFilter %>% dataSummary
#> $class
#> [1] "list"               "OSMtidy_dataFilter"
#> 
#> $summary
#>         data total percent
#> 1   filtered 12194   71.44
#> 2 unfiltered   562    3.29
#> 3   validate  4314   25.27
#> 
#> $summaryFiltered
#> # A tibble: 176 x 2
#>    desc                           total
#>    <chr>                          <int>
#>  1 Amenity; ATM                      14
#>  2 Amenity; Bicycle parking          49
#>  3 Amenity; Bike rental point         6
#>  4 Amenity; Car wash                  2
#>  5 Amenity; Fire hydrant             64
#>  6 Amenity; Flood defence             2
#>  7 Amenity; Fountain (decorative)     1
#>  8 Amenity; Fuel station              1
#>  9 Amenity; Garages and sheds       325
#> 10 Amenity; Information board         1
#> # ... with 166 more rows
#> 
#> $byFeature
#>          data         feature total
#> 1  unfiltered        building   531
#> 2  unfiltered         landuse    12
#> 3  unfiltered            <NA>    19
#> 4    validate         amenity     4
#> 5    validate        building    28
#> 6    validate           craft     3
#> 7    validate         highway     6
#> 8    validate        historic     1
#> 9    validate         leisure     6
#> 10   validate        military     1
#> 11   validate          office     7
#> 12   validate         railway    12
#> 13   validate            shop     3
#> 14   validate social_facility     1
#> 15   validate            <NA>  4242

dataExport(data = dlFilter, name = locationName)
#> Loading required package: xlsx
#> 
#> Attaching package: 'xlsx'
#> The following objects are masked from 'package:openxlsx':
#> 
#>     createWorkbook, loadWorkbook, read.xlsx, saveWorkbook, write.xlsx
#> Files saved as:
#> 
#>  outputs/exampleEdinburgh_5_dataFilter-unfiltered_20200710-131316.xlsx
#>  outputs/exampleEdinburgh_5_dataFilter-filtered_20200710-131317.csv
#>  outputs/exampleEdinburgh_5_dataFilter-filtered_20200710-131317.RDS
#>  outputs/exampleEdinburgh_5_dataFilter-validate_20200710-131318.xlsx
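The idea behind rule-based filtering can be sketched in a few lines of base R. This is a toy example, not OSMtidy's implementation. Vignette 3 covers the layout of filters.xlsx. The column names used here (feature, value, keyword, desc) and the first-matching-rule policy are assumptions, and the sketch does not model the validate category.

```r
# Toy OSM records: a feature key plus its tag value (hypothetical columns)
records <- data.frame(
  osm_id  = c("1", "2", "3", "4"),
  feature = c("amenity", "amenity", "barrier", "building"),
  value   = c("atm", "bicycle_parking", "wall", "yes"),
  stringsAsFactors = FALSE
)

# Toy rules: feature key, keyword (regular expression) and the desc to assign
rules <- data.frame(
  feature = c("amenity", "amenity", "barrier"),
  keyword = c("^atm$", "bicycle_parking", "wall"),
  desc    = c("Amenity; ATM", "Amenity; Bicycle parking", "Barrier; Wall"),
  stringsAsFactors = FALSE
)

# Assign the desc of the first matching rule; unmatched records keep NA
applyRules <- function(records, rules) {
  records$desc <- NA_character_
  for (i in seq_len(nrow(rules))) {
    hit <- is.na(records$desc) &
      records$feature == rules$feature[i] &
      grepl(rules$keyword[i], records$value)
    records$desc[hit] <- rules$desc[i]
  }
  records
}

out <- applyRules(records, rules)
filtered   <- out[!is.na(out$desc), ]  # records matched by a rule
unfiltered <- out[is.na(out$desc), ]   # records no rule matched
```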

3.6. Data tidy

In the final step, the function dataTidy() generates a single tidied output from any combination of the filtered, validated, unfiltered and no-detail data.

Note that several of the outputs from dataWrangle() and dataFilter() were saved as spreadsheets (.xlsx extension). You may manually adjust the desc column in these and reimport them at this step.
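Before reimporting an edited spreadsheet, it can help to check that no desc cells were left blank. The sketch below simulates the round trip with a temporary file, using stand-in file and column contents rather than a real export. It calls openxlsx and readxl with ::, which sidesteps the masking of openxlsx functions by the xlsx package reported when dataExport() runs in section 3.5.

```r
# Stand-in for an exported spreadsheet (real exports live in outputs/)
sheet <- tempfile(fileext = ".xlsx")
openxlsx::write.xlsx(
  data.frame(osm_id = c("101", "102"), desc = c("Barrier; Wall", NA),
             stringsAsFactors = FALSE),
  sheet
)

# After editing in a spreadsheet program, read the file back in
edited <- readxl::read_excel(sheet)

# Flag rows whose desc is still empty (NA cells are written as blanks)
stillBlank <- edited$osm_id[is.na(edited$desc) | edited$desc == ""]
stillBlank  # "102" needs a desc before reimporting
```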

The input argument is a list of the objects to be imported. These can be supplied either as objects in the R environment or as the manually adjusted spreadsheets. The code chunk below focusses on the outputs of dataFilter() only. Vignettes 3 and 4 introduce a number of alternative inputs.

The tidied geotagged dataset is saved in .RDS and .csv formats for use in a range of applications. To export it as a shapefile, the geotagged dataset must first be split by geometry type.

dlTidy <- dataTidy(dlFilter)

dlTidy %>% dataSummary
#> $class
#> [1] "list"             "OSMtidy_dataTidy"
#> 
#> $summary
#> # A tibble: 4 x 3
#>   data                 total percent
#>   <chr>                <int>   <dbl>
#> 1 unfiltered             569    3.33
#> 2 removeKeywordFilters    37    0.22
#> 3 remove                 300    1.76
#> 4 filtered             16164   94.7 
#> 
#> $summaryFiltered
#> # A tibble: 190 x 2
#>    desc                           total
#>    <chr>                          <int>
#>  1 Amenity; ATM                      14
#>  2 Amenity; Bicycle parking          49
#>  3 Amenity; Bike rental point         6
#>  4 Amenity; Car wash                  2
#>  5 Amenity; Fire hydrant             64
#>  6 Amenity; Flood defence             2
#>  7 Amenity; Fountain (decorative)     1
#>  8 Amenity; Fuel station              1
#>  9 Amenity; Garages and sheds       325
#> 10 Amenity; Information board         1
#> # ... with 180 more rows
#> 
#> $unfiltered
#> # A tibble: 5 x 2
#>   feature  total
#>   <chr>    <int>
#> 1 amenity      1
#> 2 building   531
#> 3 highway      6
#> 4 landuse     12
#> 5 <NA>        19

dlTidy$filtered
#> # A tibble: 16,164 x 3
#>    osm_id   desc                                                        geometry
#>    <chr>    <chr>                                                 <GEOMETRY [°]>
#>  1 3195818~ Amenity; Floo~ MULTILINESTRING ((-3.199711 55.9629, -3.199508 55.96~
#>  2 3343407~ Barrier; Wall  LINESTRING (-3.167914 55.96796, -3.16797 55.96798, -~
#>  3 3418179~ Barrier; Wall  LINESTRING (-3.199838 55.96309, -3.199801 55.96312, ~
#>  4 3418179~ Barrier; Wall     LINESTRING (-3.199572 55.9636, -3.199394 55.96358)
#>  5 3418179~ Barrier; Wall      LINESTRING (-3.199402 55.96393, -3.19923 55.9639)
#>  6 3418179~ Barrier; Wall    LINESTRING (-3.199641 55.96347, -3.199465 55.96344)
#>  7 3418179~ Barrier; Wall    LINESTRING (-3.199203 55.96431, -3.199179 55.96431)
#>  8 3418180~ Barrier; Wall    LINESTRING (-3.199269 55.96418, -3.199168 55.96417)
#>  9 3418180~ Barrier; Wall    LINESTRING (-3.199607 55.96354, -3.199422 55.96351)
#> 10 3418180~ Barrier; Wall    LINESTRING (-3.199435 55.96386, -3.199261 55.96384)
#> # ... with 16,154 more rows
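Each desc value appears to follow a "Category; Detail" convention. The sketch below shows a quick way to summarise the tidied output by top-level category. So that the block runs on its own, the vector is copied from the printed output above.

```r
# desc values copied from the $summaryFiltered and dlTidy$filtered output above
desc <- c("Amenity; ATM", "Amenity; Bicycle parking", "Amenity; Fire hydrant",
          "Barrier; Wall", "Barrier; Wall", "Barrier; Wall")

# Top-level category = the text before the first ";"
category <- sub(";.*$", "", desc)
table(category)

# With the real data: table(sub(";.*$", "", dlTidy$filtered$desc))
```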

dataExport(data = dlTidy, name = locationName)
#> Files saved as:
#> 
#>  outputs/exampleEdinburgh_6_dataTidy-unfiltered_20200710-131322.RDS
#>  outputs/exampleEdinburgh_6_dataTidy-removeKeywordFilters_20200710-131322.RDS
#>  outputs/exampleEdinburgh_6_dataTidy-remove_20200710-131322.RDS
#>  outputs/exampleEdinburgh_6_dataTidy-filtered_20200710-131322.RDS
#>  outputs/exampleEdinburgh_6_dataTidy-filtered_20200710-131322.csv
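As noted above, the tidied data must be split by geometry type before it can be exported as a shapefile, because a shapefile layer holds a single geometry type. The sketch below does this with sf on a toy object shaped like dlTidy$filtered (osm_id, desc, geometry). writeShapefilesByType() is a hypothetical helper. If dlTidy$filtered is a plain tibble rather than an sf object, it may first need converting with sf::st_as_sf().

```r
library(sf)

# Hypothetical helper: write one shapefile per geometry type
writeShapefilesByType <- function(x, outdir, prefix) {
  types <- as.character(st_geometry_type(x))
  parts <- split(x, types)
  paths <- character(0)
  for (type in names(parts)) {
    part <- st_cast(parts[[type]], type)  # make the geometry column homogeneous
    path <- file.path(outdir, paste0(prefix, "_", tolower(type), ".shp"))
    st_write(part, path, quiet = TRUE)
    paths[type] <- path
  }
  paths
}

# Toy mixed-geometry data in the same shape as dlTidy$filtered
toy <- st_sf(
  osm_id = c("1", "2", "3"),
  desc = c("Amenity; ATM", "Barrier; Wall", "Barrier; Wall"),
  geometry = st_sfc(
    st_point(c(-3.200, 55.960)),
    st_linestring(rbind(c(-3.199, 55.963), c(-3.198, 55.964))),
    st_linestring(rbind(c(-3.197, 55.962), c(-3.196, 55.963))),
    crs = 4326
  )
)

# Write into a fresh temporary folder so the example can be re-run safely
outdir <- tempfile("osmtidy-shp-")
dir.create(outdir)
written <- writeShapefilesByType(toy, outdir, prefix = "exampleEdinburgh")
```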