OSMtidy aims to tidy up physical object data extracted from OpenStreetMaps. Steps 1-4 cover the extraction of this data and wrangling into a suitable format. Step 5, data filtering, is central to OSMtidy. Through filters, a multi-column database of physical objects can be converted into a simple two-column geotagged database - describing the name and geometry/ location of the object.
There are approximately 1250 predefined filters in OSMtidy. These can be explored and modified via the spreadsheet filters.xlsx in the OSMtidy directory. The filters are provided in spreadsheet form for ease of use - OSMtidy is primarily intended to be used by non-specialists and non-coders. The following figures provide a brief overview of the spreadsheet layout and navigation.
1.1 filters.xlsx overview and column descriptions.
1.2. Navigating filters.xlsx.
This vignette uses a subset of data from the Edinburgh examplar.
dlWrangle <- readRDS("example/dataWrangled.RDS")
dlWrangle$dataWrangled
#> [[1]]
#> Simple feature collection with 380 features and 15 fields
#> geometry type: GEOMETRY
#> dimension: XY
#> bbox: xmin: -3.199224 ymin: 55.95748 xmax: -3.163603 ymax: 55.97478
#> CRS: EPSG:4326
#> First 10 features:
#> osm_id barrier fence_type feature name
#> 1 22415647 <NA> <NA> leisure London Road Gardens
#> 2 31455917 <NA> <NA> leisure St Mark's Park
#> 3 23165357 <NA> <NA> leisure Pilrig Park
#> 4 32781795 <NA> <NA> leisure Keddie Park
#> 5 39960556 <NA> <NA> leisure Redbraes Park
#> 6 73014417 <NA> <NA> leisure Dalmeny Street Park
#> 7 100007410 <NA> <NA> leisure Broughton Road Park
#> 8 623814626 fence metal <NA> Montgomery Street Park
#> 9 32781878 <NA> <NA> leisure Keddie Gardens Play Area
#> 10 85004696 <NA> <NA> leisure <NA>
#> alt_name leisure amenity operator parking building
#> 1 Royal Terrace Gardens park <NA> <NA> <NA> <NA>
#> 2 <NA> park <NA> <NA> <NA> <NA>
#> 3 <NA> park <NA> <NA> <NA> <NA>
#> 4 Keddie Gardens park <NA> <NA> <NA> <NA>
#> 5 <NA> park <NA> <NA> <NA> <NA>
#> 6 <NA> park <NA> <NA> <NA> <NA>
#> 7 <NA> park <NA> <NA> <NA> <NA>
#> 8 <NA> park <NA> <NA> <NA> <NA>
#> 9 <NA> playground <NA> <NA> <NA> <NA>
#> 10 <NA> playground <NA> <NA> <NA> <NA>
#> bicycle_parking brand social_facility social_facility.for
#> 1 <NA> <NA> <NA> <NA>
#> 2 <NA> <NA> <NA> <NA>
#> 3 <NA> <NA> <NA> <NA>
#> 4 <NA> <NA> <NA> <NA>
#> 5 <NA> <NA> <NA> <NA>
#> 6 <NA> <NA> <NA> <NA>
#> 7 <NA> <NA> <NA> <NA>
#> 8 <NA> <NA> <NA> <NA>
#> 9 <NA> <NA> <NA> <NA>
#> 10 <NA> <NA> <NA> <NA>
#> geometry
#> 1 POLYGON ((-3.173486 55.9575...
#> 2 MULTIPOLYGON (((-3.192518 5...
#> 3 POLYGON ((-3.179487 55.9669...
#> 4 POLYGON ((-3.182531 55.9743...
#> 5 POLYGON ((-3.189146 55.9681...
#> 6 POLYGON ((-3.173181 55.9645...
#> 7 POLYGON ((-3.19336 55.96417...
#> 8 POLYGON ((-3.173887 55.9589...
#> 9 POLYGON ((-3.181027 55.9745...
#> 10 POLYGON ((-3.175766 55.9595...
There are 380 objects in this dataset. In the output above, a header of the dataset is printed. A number of parks are listed, so let’s start by creating a filter for parks.
2.1. First filter, set up to find parks.
Now that the first filter is ready, you can run it in R. Because we set validate to TRUE, we want to export the output to explore the validate spreadsheet.
dlFilter <- dlWrangle %>% dataFilter("example/filters (1).xlsx")
#> 13:04:38 Step one of one
#> 13:04:38 Complete, preparing output
dlFilter %>% dataExport(., "filterExample_round1")
#> Files saved as:
#>
#> outputs/filterExample_round1_5_dataFilter-unfiltered_20200710-130438.xlsx
#> outputs/filterExample_round1_5_dataFilter-filtered_20200710-130439.csv
#> outputs/filterExample_round1_5_dataFilter-filtered_20200710-130439.RDS
#> outputs/filterExample_round1_5_dataFilter-validate_20200710-130439.xlsx
The filterExample_round1_5_dataFilter-validate spreadsheet can be found in the outputs folder. It should look something like the figure below.
2.2. Validate spreadsheet for the first filter.
Entries 2-9 show that the filter did capture the parks. Parking has also been picked up because the filter was not specific enough. Section 3 explores the iterative process to fix this.
Open filters (1).xlsx and add a more specific filter for parking (see figure below). Add this filter before the park filter. OSMtidy is iterative and applies filters in the same order as they are in the spreadsheet. Save this new filter as filters (2).xlsx and run the R code again.
3.1. Adding a second more specific filter.
dlFilter <- dlWrangle %>% dataFilter("example/filters (2).xlsx")
#> 13:04:39 Step one of one
#> 13:04:39 Complete, preparing output
dlFilter %>% dataExport(., "filterExample_round2")
#> Files saved as:
#>
#> outputs/filterExample_round2_5_dataFilter-unfiltered_20200710-130439.xlsx
#> outputs/filterExample_round2_5_dataFilter-filtered_20200710-130439.csv
#> outputs/filterExample_round2_5_dataFilter-filtered_20200710-130439.RDS
#> outputs/filterExample_round2_5_dataFilter-validate_20200710-130439.xlsx
The new validate spreadsheet has two tabs. Clicking on the drop-down arrow shows that the parking filter picked up three different objects.
3.2. Validate spreadsheet after adding a second filter.
Update the filters spreadsheet again, and rerun the R code.
3.3. Third iteration of the filters spreadsheet. Two new filters are added.
dlFilter <- dlWrangle %>% dataFilter("example/filters (3).xlsx")
#> 13:04:40 Step one of one
#> 13:04:40 Complete, preparing output
dlFilter %>% dataExport(., "filterExample_round3")
#> Files saved as:
#>
#> outputs/filterExample_round3_5_dataFilter-unfiltered_20200710-130440.xlsx
#> outputs/filterExample_round3_5_dataFilter-filtered_20200710-130441.csv
#> outputs/filterExample_round3_5_dataFilter-filtered_20200710-130441.RDS
#> outputs/filterExample_round3_5_dataFilter-validate_20200710-130441.xlsx
3.4. Validate spreadsheet after the third iteration. There are now three tabs. Use the drop-down menu to determine if the filter identified unique objects.
3.4. The bicycle_parking column reveals additional information that could be filtered for.
3.5. In the second tab, the parking column reveals additional information that could be filtered for.
These outputs show that the filters have improved, but they aren’t quite specific enough. More detailed bicycle parking filters might be useful, and filters for underground parking are needed. This requires a fourth iteration of filters.
3.6. Fourth iteration of the filters spreadsheet. Three new filters are added and validate is changed for park.
dlFilter <- dlWrangle %>% dataFilter("example/filters (4).xlsx")
#> 13:04:41 Step one of one
#> 13:04:42 Complete, preparing output
dlFilter %>% dataExport(., "filterExample_round4")
#> Files saved as:
#>
#> outputs/filterExample_round4_5_dataFilter-unfiltered_20200710-130442.xlsx
#> outputs/filterExample_round4_5_dataFilter-filtered_20200710-130442.csv
#> outputs/filterExample_round4_5_dataFilter-filtered_20200710-130442.RDS
#> outputs/filterExample_round4_5_dataFilter-validate_20200710-130442.xlsx
3.7. Validate spreadsheet after the fourth iteration, tab 1. The object descriptions match two of the new filters.
3.8. Tab 2. With the new bicycle parking filter the object descriptions agree with the filters.
3.9. Tab 3. The same is true for parking.
Through these four iterations, the park filter was refined to become specific enough to capture the range of similar objects.
Whether starting a filter spreadsheet from scratch, or using the pre-determined filters in filters.xlsx, some objects will be missed. To identify new filters, open the unfiltered spreadsheet in the outputs folder.
4.1. Two tabs of unfiltered objects.
These objects can be manually updated, by changing the desc column (see Vignette 4), or by identifying new filters. In the figure below, a number of new filters are added in an attempt to capture some of these objects. The filters which have been validation (in section 3) are changed to validate FALSE.
4.2. Fifth iteration of the filters spreadsheet. Four new filters are added and validate is updated.
dlFilter <- dlWrangle %>% dataFilter("example/filters (5).xlsx")
#> 13:04:43 Step one of one
#> 13:04:43 Complete, preparing output
dlFilter %>% dataExport(., "filterExample_round5")
#> Files saved as:
#>
#> outputs/filterExample_round5_5_dataFilter-unfiltered_20200710-130443.xlsx
#> outputs/filterExample_round5_5_dataFilter-filtered_20200710-130443.csv
#> outputs/filterExample_round5_5_dataFilter-filtered_20200710-130443.RDS
#> outputs/filterExample_round5_5_dataFilter-validate_20200710-130443.xlsx
4.3. Validate spreadsheet after the fifth iteration. Filters in the first tab appear successful.
4.4. Filters in the second tab also appear valid.
All four of the new filters are validated. To look at what new filters are required, check the unfiltered spreadsheet again. A final iteration of the filters spreadsheet is explored below.
4.5. Final iteration of the filters spreadsheet. Three new filters are added and validate is updated.
dlFilter <- dlWrangle %>% dataFilter("example/filters (6).xlsx")
#> 13:04:44 Step one of one
#> 13:04:44 Complete, preparing output
dlFilter %>% dataExport(., "filterExample_round6")
#> Files saved as:
#>
#> outputs/filterExample_round6_5_dataFilter-unfiltered_20200710-130444.xlsx
#> outputs/filterExample_round6_5_dataFilter-filtered_20200710-130444.csv
#> outputs/filterExample_round6_5_dataFilter-filtered_20200710-130444.RDS
#> outputs/filterExample_round6_5_dataFilter-validate_20200710-130444.xlsx
In some instances, it may not be possible to identify a specific enough filter. In these instances, you could specify the osm_id as a filter, or, you can manually edit the validate output.
5.1. Validate spreadsheet after the final iteration. The descriptor childcare does not necessarily reflect the object information.
5.2. The validate spreadsheet desc column is updated to reflect the additional information available.
The validate spreadsheet can be editted for a number of reasons:
These changes can be imported as part of the final step in OSMtidy, using the function dataTidy()
.
inputList <- list(dlFilter$filtered,
"outputs/filterExample_round6_5_dataFilter-validate_20200710-130213.xlsx")
dlTidy <- dataTidy(inputList)
dlTidy %>% dataSummary
#> $class
#> [1] "list" "OSMtidy_dataTidy"
#>
#> $summary
#> # A tibble: 2 x 3
#> data total percent
#> <chr> <int> <dbl>
#> 1 remove 3 0.8
#> 2 filtered 374 99.2
#>
#> $summaryFiltered
#> # A tibble: 9 x 2
#> desc total
#> <chr> <int>
#> 1 Amenity; Bicycle parking 49
#> 2 Amenity; Parking 282
#> 3 Amenity; Underground parking 2
#> 4 Childcare 6
#> 5 School 1
#> 6 Sports and games; Outdoor playgrounds and parks 20
#> 7 Vehicle; Rental 3
#> 8 Vehicle; Sharing point 9
#> 9 Vehicle; Wash 2
The output can be saved using dataExport()
. The fourth vignette explores how to edit other spreadsheet outputs from OSMtidy.