tidyverse, lubridate, osmdata, sf, mapedit, pbapply, progress, data.table, readxl and openxlsx.
To use OSMtidy, you should save your R script within the OSMtidy directory. To begin the set-up, set out your script as follows:
# Prepare the environment
rm(list = ls()); cat("\014"); gc()
# Set working directory to the script's folder
setwd(dirname(rstudioapi::getSourceEditorContext()$path)); getwd()
# Load the required packages
library(tidyverse)
library(lubridate)
library(osmdata)
library(sf)
library(pbapply)
library(progress)
library(data.table)
library(readxl)
library(openxlsx)
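Before loading the packages, it can help to confirm that they are all installed. The following is a minimal sketch using base R only; the variable names `required`, `installed` and `notInstalled` are illustrative and are not part of OSMtidy.

```r
# Packages named in the list above
required <- c("tidyverse", "lubridate", "osmdata", "sf", "mapedit",
              "pbapply", "progress", "data.table", "readxl", "openxlsx")

# requireNamespace() returns FALSE rather than erroring when a package is absent
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
notInstalled <- required[!installed]

if (length(notInstalled) > 0) {
  message("Install before continuing: ", paste(notInstalled, collapse = ", "))
} else {
  message("All required packages are installed")
}
```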
Now you are ready to import the functions that make up OSMtidy V0.0.4. This walkthrough will use nine of the ten functions that make up OSMtidy.
Using the function dataShapefile() we can import the shapefile for which data is to be extracted. There are two input arguments:
locationName <- "exampleEdinburgh"
locationName
#> [1] "exampleEdinburgh"
shp <- dataShapefile(name = locationName)
shp
#> Simple feature collection with 1 feature and 0 fields
#> geometry type: POLYGON
#> dimension: XY
#> bbox: xmin: -3.199973 ymin: 55.95726 xmax: -3.162921 ymax: 55.97507
#> CRS: 4326
#> geometry
#> 1 POLYGON ((-3.162921 55.9633...
At each step, you can print a summary of the OSMtidy outputs using the function dataSummary().
shp %>% dataSummary
#> $class
#> [1] "sf" "data.frame" "OSMtidy_dataInput"
#>
#> $shapeProjection
#> Coordinate Reference System:
#> User input: 4326
#> wkt:
#> GEOGCS["GCS_WGS_1984",
#> DATUM["WGS_1984",
#> SPHEROID["WGS_84",6378137,298.257223563]],
#> PRIMEM["Greenwich",0],
#> UNIT["Degree",0.017453292519943295],
#> AUTHORITY["EPSG","4326"]]
#>
#> $shapeArea
#> 2678291 [m^2]
#>
#> $shapePerimeter
#> 9150.657 [m]
#>
#> $shapePlot
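The projection, area and perimeter in the summary can also be computed with sf directly. The sketch below is not how dataSummary() is implemented (that is not shown here); it uses a hypothetical rectangle built from the bounding box printed earlier as a stand-in for shp, so the values will be larger than those reported for the example shapefile.

```r
library(sf)

# Hypothetical stand-in for shp: a rectangle built from the printed bounding box
bb <- st_bbox(c(xmin = -3.199973, ymin = 55.95726,
                xmax = -3.162921, ymax = 55.97507),
              crs = st_crs(4326))
rect <- st_as_sfc(bb)

st_crs(rect)$epsg                # the projection, compare $shapeProjection
area <- st_area(rect)            # area with units, compare $shapeArea
# Perimeter: cast the polygon boundary to lines, then measure their length
perimeter <- st_length(st_cast(rect, "MULTILINESTRING"))

area
perimeter
```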
You can also export any OSMtidy output using the function dataExport(). All outputs are timestamped and saved in the outputs folder within the OSMtidy directory.
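The timestamped file names follow a location_step_date-time pattern. As a rough illustration of that naming idea, the sketch below uses a hypothetical helper, `exportSketch()`, which is not OSMtidy's dataExport() and only handles the .RDS case.

```r
# Hypothetical helper: not OSMtidy's dataExport(), only the naming idea
exportSketch <- function(data, name, step, dir = "outputs") {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  stamp <- format(Sys.time(), "%Y%m%d-%H%M%S")   # e.g. 20200710-131239
  path <- file.path(dir, paste0(name, "_", step, "_", stamp, ".RDS"))
  saveRDS(data, path)
  message("File saved as: ", path)
  invisible(path)
}

# Write to a temporary folder so the sketch leaves the project untouched
path <- exportSketch(list(a = 1), name = "exampleEdinburgh",
                     step = "2_dataExtract", dir = tempdir())
readRDS(path)
```

Putting the timestamp in the file name means repeated exports never overwrite each other.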
The OpenStreetMap data is extracted, via the R package osmdata and the overpass server, using the function dataExtract(). Timestamps and progress are printed when the function is running.
OSMtidy V0.0.4 extracts 47 of 209 available features in osmdata. To change the features extracted, add or delete lines in functions/features.txt. A list of the available features in osmdata can be accessed via the function osmdata::available_features().
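Editing the feature list can also be done from R. The sketch below assumes that features.txt holds one feature key per line, which is suggested by the instruction to add or delete lines but is not confirmed here. It works on a temporary copy with three illustrative keys instead of the real file.

```r
# Assumption: one feature key per line; the three keys are illustrative
featuresFile <- file.path(tempdir(), "features.txt")
writeLines(c("amenity", "building", "highway"), featuresFile)

features <- readLines(featuresFile)

# Add one key and drop another, then write the file back
features <- union(features, "leisure")
features <- setdiff(features, "highway")
writeLines(features, featuresFile)

readLines(featuresFile)

# The full list of keys needs an internet connection, so it is left commented
# osmdata::available_features()
```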
The function dataExtract has three input arguments:
timeout: It may be necessary to increase this value for large queries, because the server may time out before all data are delivered.
memsize: The default memory size for the ‘overpass’ server in bytes; may need to be increased in order to handle large queries.
See https://wiki.openstreetmap.org/wiki/Overpass_API#Resource_management_options_.28osm-script.29 for an explanation of timeout and memsize (or maxsize in overpass terms). Note in particular the comment that queries with arbitrarily large memsize are likely to be rejected.
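To see where timeout and memsize end up, the sketch below builds an overpass query with osmdata directly. It is an illustration only and does not show how dataExtract() is written; the values 300 (seconds) and 1073741824 (bytes) are arbitrary examples, and the bounding box is the one printed for the example shapefile.

```r
library(osmdata)

# Bounding box printed for the example shapefile (xmin, ymin, xmax, ymax)
bb <- c(-3.199973, 55.95726, -3.162921, 55.97507)

# Building the query does not contact the server
q <- opq(bbox = bb, timeout = 300, memsize = 1073741824)
q <- add_osm_feature(q, key = "amenity")

# Sending it does, so the download is left commented out
# amenity <- osmdata_sf(q)
```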
The code chunk below is not run, because (1) the data extraction may take some time depending on the size of the area, and (2) it avoids flooding the overpass server.
The output can be interrogated and exported, using the functions dataSummary and dataExport respectively, as before.
dlExtract %>% dataSummary
#> $class
#> [1] "list" "OSMtidy_dataExtract"
#>
#> $byGeometry
#> type total
#> 1 lines 7779
#> 2 multilines 1
#> 3 multipolygons 27
#> 4 points 130481
#> 5 polygons 12320
#>
#> $byFeature
#> feature total
#> 1 amenity 6843
#> 2 barrier 21130
#> 3 bridge 99
#> 4 building 56467
#> 5 craft 45
#> 6 cycleway 164
#> 7 dispensing 4
#> 8 emergency 137
#> 9 generator:method 64
#> 10 generator:source 245
#> 11 healthcare 65
#> 12 healthcare:speciality 1
#> 13 highway 14028
#> 14 historic 75
#> 15 landuse 10819
#> 16 leisure 26030
#> 17 military 1
#> 18 natural 9605
#> 19 office 195
#> 20 power 342
#> 21 public_transport 78
#> 22 railway 459
#> 23 recycling_type 36
#> 24 residential 65
#> 25 service 1670
#> 26 shop 729
#> 27 social_facility 202
#> 28 substation 121
#> 29 traffic_calming 58
#> 30 usage 54
#> 31 vending 2
#> 32 voltage 78
#> 33 water 696
#> 34 wholesale 1
dataExport(data = dlExtract, name = locationName)
#> File saved as: outputs/exampleEdinburgh_2_dataExtract_20200710-131239.RDS
In step 2, the data was extracted as a “bounding box” (a rectangle). In step 3, the data is cut to the shapefile using the function dataCut(). Timestamps and progress are printed when the function is running. The function dataCut has two input arguments, dataExtracted and dataShapefile.
Outputs can be interrogated and exported as before.
dlCut <- dataCut(dataExtracted = dlExtract, dataShapefile = shp)
#> 13:12:41 Step one of four
#> 13:12:41 Step two of four
#> 13:12:47 Step three of four
#> 13:12:51 Step four of four
#> 13:12:51 Complete, preparing output
dlCut %>% dataSummary
#> $class
#> [1] "list" "OSMtidy_dataCut"
#>
#> $byGeometry
#> type total
#> 1 linestring 4457
#> 2 multilinestring 5
#> 3 multipolygon 32
#> 4 point 73727
#> 5 polygon 7460
#>
#> $byFeature
#> feature total
#> 1 amenity 3789
#> 2 barrier 12000
#> 3 bridge 67
#> 4 building 32814
#> 5 craft 29
#> 6 cycleway 110
#> 7 dispensing 2
#> 8 emergency 66
#> 9 generator:method 64
#> 10 generator:source 215
#> 11 healthcare 17
#> 12 healthcare:speciality 1
#> 13 highway 6878
#> 14 historic 44
#> 15 landuse 6591
#> 16 leisure 14851
#> 17 military 1
#> 18 natural 5301
#> 19 office 109
#> 20 power 222
#> 21 public_transport 37
#> 22 railway 200
#> 23 recycling_type 23
#> 24 residential 55
#> 25 service 960
#> 26 shop 428
#> 27 social_facility 184
#> 28 substation 80
#> 29 traffic_calming 47
#> 30 usage 38
#> 31 vending 1
#> 32 voltage 51
#> 33 water 405
#> 34 wholesale 1
dataExport(data = dlCut, name = locationName)
#> File saved as: outputs/exampleEdinburgh_3_dataCut_20200710-131252.RDS
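The idea behind the cut in step 3 can be pictured with sf alone. The sketch below is a simplified illustration, not the dataCut() implementation: the boundary is a hypothetical rectangle, the three points are made up, and only points are handled.

```r
library(sf)

# Hypothetical boundary: the bounding box printed for the example shapefile
boundary <- st_sf(geometry = st_as_sfc(st_bbox(
  c(xmin = -3.199973, ymin = 55.95726, xmax = -3.162921, ymax = 55.97507),
  crs = st_crs(4326))))

# Hypothetical extracted points: "a" and "c" inside the boundary, "b" outside
pts <- st_sf(
  osm_id = c("a", "b", "c"),
  geometry = st_sfc(st_point(c(-3.18, 55.965)),
                    st_point(c(-3.10, 55.90)),
                    st_point(c(-3.17, 55.97)),
                    crs = 4326)
)

# Keep only the features that intersect the boundary
cutPts <- st_filter(pts, boundary)
cutPts$osm_id
```

For lines and polygons that cross the boundary, `st_intersection()` would clip the geometry itself instead of keeping or dropping whole features.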
Using the function dataWrangle we can tidy up (or wrangle) the data before filtering. Timestamps and progress are printed when the function is running. There is one input argument, dataCut.
dlWrangle <- dataWrangle(dataCut = dlCut)
#> 13:12:53 Step one of three
#> 13:12:53 Step two of three
#> 13:12:55 Step three of three
#> 13:12:59 Complete, preparing output
dlWrangle %>% dataSummary
#> $class
#> [1] "list" "OSMtidy_dataWrangle"
#>
#> $byGeometry
#> data type total percent
#> 1 dataWrangled linestring 4132 4.87
#> 2 dataWrangled multilinestring 12 0.01
#> 3 dataWrangled multipolygon 28 0.03
#> 4 dataWrangled point 5881 6.93
#> 5 dataWrangled polygon 7017 8.26
#> 6 noDetail linestring 13 0.02
#> 7 noDetail point 67505 79.50
#> 8 noDetail polygon 328 0.39
#>
#> $byFeature
#> data feature total percent
#> 1 dataWrangled amenity 640 0.75
#> 2 dataWrangled barrier 2873 3.38
#> 3 dataWrangled building 4240 4.99
#> 4 dataWrangled craft 21 0.02
#> 5 dataWrangled cycleway 1 0.00
#> 6 dataWrangled emergency 65 0.08
#> 7 dataWrangled generator:source 15 0.02
#> 8 dataWrangled healthcare 2 0.00
#> 9 dataWrangled highway 1161 1.37
#> 10 dataWrangled historic 6 0.01
#> 11 dataWrangled landuse 383 0.45
#> 12 dataWrangled leisure 2348 2.77
#> 13 dataWrangled military 1 0.00
#> 14 dataWrangled natural 4163 4.90
#> 15 dataWrangled office 81 0.10
#> 16 dataWrangled power 11 0.01
#> 17 dataWrangled railway 12 0.01
#> 18 dataWrangled service 2 0.00
#> 19 dataWrangled shop 365 0.43
#> 20 dataWrangled social_facility 1 0.00
#> 21 dataWrangled usage 7 0.01
#> 22 dataWrangled water 1 0.00
#> 23 dataWrangled <NA> 671 0.79
#> 24 noDetail amenity 3053 3.60
#> 25 noDetail barrier 9030 10.63
#> 26 noDetail bridge 40 0.05
#> 27 noDetail building 28484 33.54
#> 28 noDetail cycleway 71 0.08
#> 29 noDetail generator:method 52 0.06
#> 30 noDetail generator:source 188 0.22
#> 31 noDetail highway 5251 6.18
#> 32 noDetail historic 31 0.04
#> 33 noDetail landuse 6182 7.28
#> 34 noDetail leisure 12464 14.68
#> 35 noDetail natural 1121 1.32
#> 36 noDetail office 24 0.03
#> 37 noDetail power 176 0.21
#> 38 noDetail railway 168 0.20
#> 39 noDetail residential 51 0.06
#> 40 noDetail service 686 0.81
#> 41 noDetail shop 48 0.06
#> 42 noDetail social_facility 172 0.20
#> 43 noDetail substation 59 0.07
#> 44 noDetail traffic_calming 23 0.03
#> 45 noDetail usage 31 0.04
#> 46 noDetail voltage 48 0.06
#> 47 noDetail water 393 0.46
dataExport(data = dlWrangle, name = locationName)
#> Files saved as:
#>
#> outputs/exampleEdinburgh_4_dataWrangle_20200710-131300.RDS
#> outputs/exampleEdinburgh_4_dataWrangle-noDetail_20200710-131300.xlsx
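The percent column in the summary is each total as a share of all objects. The sketch below recomputes it from the $byGeometry totals printed above; it assumes the share is rounded to two decimal places, which is consistent with the printed values but is not taken from the dataSummary() source.

```r
# Totals copied from the $byGeometry table above
byGeometry <- data.frame(
  data  = c(rep("dataWrangled", 5), rep("noDetail", 3)),
  type  = c("linestring", "multilinestring", "multipolygon", "point",
            "polygon", "linestring", "point", "polygon"),
  total = c(4132, 12, 28, 5881, 7017, 13, 67505, 328)
)

# Share of all objects, rounded to two decimal places
byGeometry$percent <- round(100 * byGeometry$total / sum(byGeometry$total), 2)
byGeometry

# Combined share of objects that carry no detail
sum(byGeometry$percent[byGeometry$data == "noDetail"])
```

Most objects in this example are points with no detail, which is why they are set aside before filtering.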
The main function of OSMtidy is dataFilter(). Here, the data is filtered based on rules set out in the Excel file filters.xlsx, which can be found in the main OSMtidy directory. You may adjust these rules by editing the spreadsheet. See Vignette 3 for further details. Timestamps and progress are printed when the function is running. There are three input arguments to dataFilter():
See filterOverview() for an overview of the filtered objects.
Depending on the location size, number of filters and computer performance, filters can take anything from a couple of minutes (the example ward) to multiple hours (City of London and Boroughs) to run. The code chunk below is not run.
The output may be interrogated and exported as before.
dlFilter %>% dataSummary
#> $class
#> [1] "list" "OSMtidy_dataFilter"
#>
#> $summary
#> data total percent
#> 1 filtered 12194 71.44
#> 2 unfiltered 562 3.29
#> 3 validate 4314 25.27
#>
#> $summaryFiltered
#> # A tibble: 176 x 2
#> desc total
#> <chr> <int>
#> 1 Amenity; ATM 14
#> 2 Amenity; Bicycle parking 49
#> 3 Amenity; Bike rental point 6
#> 4 Amenity; Car wash 2
#> 5 Amenity; Fire hydrant 64
#> 6 Amenity; Flood defence 2
#> 7 Amenity; Fountain (decorative) 1
#> 8 Amenity; Fuel station 1
#> 9 Amenity; Garages and sheds 325
#> 10 Amenity; Information board 1
#> # ... with 166 more rows
#>
#> $byFeature
#> data feature total
#> 1 unfiltered building 531
#> 2 unfiltered landuse 12
#> 3 unfiltered <NA> 19
#> 4 validate amenity 4
#> 5 validate building 28
#> 6 validate craft 3
#> 7 validate highway 6
#> 8 validate historic 1
#> 9 validate leisure 6
#> 10 validate military 1
#> 11 validate office 7
#> 12 validate railway 12
#> 13 validate shop 3
#> 14 validate social_facility 1
#> 15 validate <NA> 4242
dataExport(data = dlFilter, name = locationName)
#> Loading required package: xlsx
#>
#> Attaching package: 'xlsx'
#> The following objects are masked from 'package:openxlsx':
#>
#> createWorkbook, loadWorkbook, read.xlsx, saveWorkbook, write.xlsx
#> Files saved as:
#>
#> outputs/exampleEdinburgh_5_dataFilter-unfiltered_20200710-131316.xlsx
#> outputs/exampleEdinburgh_5_dataFilter-filtered_20200710-131317.csv
#> outputs/exampleEdinburgh_5_dataFilter-filtered_20200710-131317.RDS
#> outputs/exampleEdinburgh_5_dataFilter-validate_20200710-131318.xlsx
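Rule-based filtering of this kind can be pictured as matching each object against a table of rules and assigning a description. The sketch below is a toy illustration only: the layout of filters.xlsx is not shown in this walkthrough, so the `pattern` and `desc` columns of `rules`, the `tags` column of `objects`, and the matching logic are all assumptions.

```r
# Hypothetical objects: an id and a free-text tag string
objects <- data.frame(
  osm_id = c("1", "2", "3", "4"),
  tags   = c("amenity=atm", "amenity=bicycle_parking",
             "building=yes", "barrier=wall"),
  stringsAsFactors = FALSE
)

# Hypothetical rules: a pattern to look for and the description to assign
rules <- data.frame(
  pattern = c("amenity=atm", "amenity=bicycle_parking", "barrier=wall"),
  desc    = c("Amenity; ATM", "Amenity; Bicycle parking", "Barrier; Wall"),
  stringsAsFactors = FALSE
)

# Apply each rule in turn; the first matching rule sets desc
objects$desc <- NA_character_
for (i in seq_len(nrow(rules))) {
  hit <- is.na(objects$desc) & grepl(rules$pattern[i], objects$tags, fixed = TRUE)
  objects$desc[hit] <- rules$desc[i]
}

# Objects matched by a rule, and those left for manual inspection
filtered   <- objects[!is.na(objects$desc), ]
unfiltered <- objects[is.na(objects$desc), ]
```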
The final step uses the function dataTidy(), which generates a single tidied output based on any combination of the filtered, validated, unfiltered and no detail data.
Note that multiple outputs from dataWrangle() and dataFilter() were spreadsheets (.xlsx extension). You may manually adjust the desc column in these and reimport them in this step.
The input argument is a list of the objects to be imported. They can either be imported locally, as objects from the R environment, or from the manually adjusted spreadsheets. The code chunk below focusses on the outputs of dataFilter() only. Vignettes 3 and 4 introduce a number of alternative inputs.
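The spreadsheet round trip itself can be tried out independently of OSMtidy. The sketch below writes a small hypothetical table to a temporary .xlsx file, simulates the manual edit of the desc column, and reads the result back. The table contents and file name are made up, and how dataTidy() reads such files is not shown here. The calls are written with an explicit `openxlsx::` prefix because the xlsx package masks several openxlsx functions once it is loaded.

```r
library(openxlsx)
library(readxl)

# Hypothetical stand-in for an exported spreadsheet
toValidate <- data.frame(osm_id = c("1", "2"),
                         desc   = c(NA, "Barrier; Wall"),
                         stringsAsFactors = FALSE)
path <- file.path(tempdir(), "validate-sketch.xlsx")
openxlsx::write.xlsx(toValidate, path, overwrite = TRUE)

# The manual edit would happen in a spreadsheet program; here it is simulated
edited <- readxl::read_excel(path)
edited$desc[is.na(edited$desc)] <- "Amenity; ATM"
openxlsx::write.xlsx(edited, path, overwrite = TRUE)

# Read the adjusted spreadsheet back into R
reimported <- readxl::read_excel(path)
reimported
```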
The tidied geotagged dataset is saved as .RDS and .csv files for use in a range of applications. To export it as a shapefile, it is necessary to split the geotagged dataset by geometry type first.
dlTidy <- dataTidy(dlFilter)
dlTidy %>% dataSummary
#> $class
#> [1] "list" "OSMtidy_dataTidy"
#>
#> $summary
#> # A tibble: 4 x 3
#> data total percent
#> <chr> <int> <dbl>
#> 1 unfiltered 569 3.33
#> 2 removeKeywordFilters 37 0.22
#> 3 remove 300 1.76
#> 4 filtered 16164 94.7
#>
#> $summaryFiltered
#> # A tibble: 190 x 2
#> desc total
#> <chr> <int>
#> 1 Amenity; ATM 14
#> 2 Amenity; Bicycle parking 49
#> 3 Amenity; Bike rental point 6
#> 4 Amenity; Car wash 2
#> 5 Amenity; Fire hydrant 64
#> 6 Amenity; Flood defence 2
#> 7 Amenity; Fountain (decorative) 1
#> 8 Amenity; Fuel station 1
#> 9 Amenity; Garages and sheds 325
#> 10 Amenity; Information board 1
#> # ... with 180 more rows
#>
#> $unfiltered
#> # A tibble: 5 x 2
#> feature total
#> <chr> <int>
#> 1 amenity 1
#> 2 building 531
#> 3 highway 6
#> 4 landuse 12
#> 5 <NA> 19
dlTidy$filtered
#> # A tibble: 16,164 x 3
#> osm_id desc geometry
#> <chr> <chr> <GEOMETRY [°]>
#> 1 3195818~ Amenity; Floo~ MULTILINESTRING ((-3.199711 55.9629, -3.199508 55.96~
#> 2 3343407~ Barrier; Wall LINESTRING (-3.167914 55.96796, -3.16797 55.96798, -~
#> 3 3418179~ Barrier; Wall LINESTRING (-3.199838 55.96309, -3.199801 55.96312, ~
#> 4 3418179~ Barrier; Wall LINESTRING (-3.199572 55.9636, -3.199394 55.96358)
#> 5 3418179~ Barrier; Wall LINESTRING (-3.199402 55.96393, -3.19923 55.9639)
#> 6 3418179~ Barrier; Wall LINESTRING (-3.199641 55.96347, -3.199465 55.96344)
#> 7 3418179~ Barrier; Wall LINESTRING (-3.199203 55.96431, -3.199179 55.96431)
#> 8 3418180~ Barrier; Wall LINESTRING (-3.199269 55.96418, -3.199168 55.96417)
#> 9 3418180~ Barrier; Wall LINESTRING (-3.199607 55.96354, -3.199422 55.96351)
#> 10 3418180~ Barrier; Wall LINESTRING (-3.199435 55.96386, -3.199261 55.96384)
#> # ... with 16,154 more rows
dataExport(data = dlTidy, name = locationName)
#> Files saved as:
#>
#> outputs/exampleEdinburgh_6_dataTidy-unfiltered_20200710-131322.RDS
#> outputs/exampleEdinburgh_6_dataTidy-removeKeywordFilters_20200710-131322.RDS
#> outputs/exampleEdinburgh_6_dataTidy-remove_20200710-131322.RDS
#> outputs/exampleEdinburgh_6_dataTidy-filtered_20200710-131322.RDS
#> outputs/exampleEdinburgh_6_dataTidy-filtered_20200710-131322.csv
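A shapefile holds a single geometry type, which is why the tidied data has to be split before it can be exported in that format. The sketch below shows one way to do this with sf; `tidied` is a small hypothetical stand-in for dlTidy$filtered with the same osm_id, desc and geometry columns, and the output file names are made up.

```r
library(sf)

# Hypothetical stand-in for dlTidy$filtered, with mixed geometry types
tidied <- st_sf(
  osm_id = c("1", "2", "3"),
  desc   = c("Amenity; ATM", "Barrier; Wall", "Amenity; Fire hydrant"),
  geometry = st_sfc(
    st_point(c(-3.18, 55.965)),
    st_linestring(rbind(c(-3.1679, 55.9680), c(-3.1680, 55.9681))),
    st_point(c(-3.17, 55.97)),
    crs = 4326
  )
)

# One group per geometry type present in the data
byType <- split(tidied, as.character(st_geometry_type(tidied)))

# Write one shapefile per geometry type to a temporary folder
paths <- vapply(names(byType), function(type) {
  path <- file.path(tempdir(), paste0("tidy-sketch_", type, ".shp"))
  st_write(byType[[type]], path, delete_dsn = TRUE, quiet = TRUE)
  path
}, character(1))

paths
```

Shapefile field names are limited to ten characters, so short column names such as osm_id and desc are written unchanged.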