OSMtidy - Vignette 3, Filters

Dr Annie Visser-Quinn, a.visser-quinn@hw.ac.uk

2020-07-10

1. Introduction

OSMtidy aims to tidy up physical object data extracted from OpenStreetMaps. Steps 1-4 cover the extraction of this data and wrangling into a suitable format. Step 5, data filtering, is central to OSMtidy. Through filters, a multi-column database of physical objects can be converted into a simple two-column geotagged database - describing the name and geometry/ location of the object.

There are approximately 1250 predefined filters in OSMtidy. These can be explored and modified via the spreadsheet filters.xlsx in the OSMtidy directory. The filters are provided in spreadsheet form for ease of use - OSMtidy is primarily intended to be used by non-specialists and non-coders. The following figures provide a brief overview of the spreadsheet layout and navigation.

1.1 filters.xlsx overview and column descriptions.

1.1 filters.xlsx overview and column descriptions.

 

1.2. Navigating filters.xlsx.

1.2. Navigating filters.xlsx.

 

2. Creating your first filter

This vignette uses a subset of data from the Edinburgh examplar.

dlWrangle <- readRDS("example/dataWrangled.RDS")

dlWrangle$dataWrangled
#> [[1]]
#> Simple feature collection with 380 features and 15 fields
#> geometry type:  GEOMETRY
#> dimension:      XY
#> bbox:           xmin: -3.199224 ymin: 55.95748 xmax: -3.163603 ymax: 55.97478
#> CRS:            EPSG:4326
#> First 10 features:
#>       osm_id barrier fence_type feature                     name
#> 1   22415647    <NA>       <NA> leisure      London Road Gardens
#> 2   31455917    <NA>       <NA> leisure           St Mark's Park
#> 3   23165357    <NA>       <NA> leisure              Pilrig Park
#> 4   32781795    <NA>       <NA> leisure              Keddie Park
#> 5   39960556    <NA>       <NA> leisure            Redbraes Park
#> 6   73014417    <NA>       <NA> leisure      Dalmeny Street Park
#> 7  100007410    <NA>       <NA> leisure      Broughton Road Park
#> 8  623814626   fence      metal    <NA>   Montgomery Street Park
#> 9   32781878    <NA>       <NA> leisure Keddie Gardens Play Area
#> 10  85004696    <NA>       <NA> leisure                     <NA>
#>                 alt_name    leisure amenity operator parking building
#> 1  Royal Terrace Gardens       park    <NA>     <NA>    <NA>     <NA>
#> 2                   <NA>       park    <NA>     <NA>    <NA>     <NA>
#> 3                   <NA>       park    <NA>     <NA>    <NA>     <NA>
#> 4         Keddie Gardens       park    <NA>     <NA>    <NA>     <NA>
#> 5                   <NA>       park    <NA>     <NA>    <NA>     <NA>
#> 6                   <NA>       park    <NA>     <NA>    <NA>     <NA>
#> 7                   <NA>       park    <NA>     <NA>    <NA>     <NA>
#> 8                   <NA>       park    <NA>     <NA>    <NA>     <NA>
#> 9                   <NA> playground    <NA>     <NA>    <NA>     <NA>
#> 10                  <NA> playground    <NA>     <NA>    <NA>     <NA>
#>    bicycle_parking brand social_facility social_facility.for
#> 1             <NA>  <NA>            <NA>                <NA>
#> 2             <NA>  <NA>            <NA>                <NA>
#> 3             <NA>  <NA>            <NA>                <NA>
#> 4             <NA>  <NA>            <NA>                <NA>
#> 5             <NA>  <NA>            <NA>                <NA>
#> 6             <NA>  <NA>            <NA>                <NA>
#> 7             <NA>  <NA>            <NA>                <NA>
#> 8             <NA>  <NA>            <NA>                <NA>
#> 9             <NA>  <NA>            <NA>                <NA>
#> 10            <NA>  <NA>            <NA>                <NA>
#>                          geometry
#> 1  POLYGON ((-3.173486 55.9575...
#> 2  MULTIPOLYGON (((-3.192518 5...
#> 3  POLYGON ((-3.179487 55.9669...
#> 4  POLYGON ((-3.182531 55.9743...
#> 5  POLYGON ((-3.189146 55.9681...
#> 6  POLYGON ((-3.173181 55.9645...
#> 7  POLYGON ((-3.19336 55.96417...
#> 8  POLYGON ((-3.173887 55.9589...
#> 9  POLYGON ((-3.181027 55.9745...
#> 10 POLYGON ((-3.175766 55.9595...

There are 380 objects in this dataset. In the output above, a header of the dataset is printed. A number of parks are listed, so let’s start by creating a filter for parks.

2.1. First filter, set up to find parks.

2.1. First filter, set up to find parks.

 

Now that the first filter is ready, you can run it in R. Because we set validate to TRUE, we want to export the output to explore the validate spreadsheet.

dlFilter <- dlWrangle %>% dataFilter("example/filters (1).xlsx")
#> 13:04:38 Step one of one
#> 13:04:38 Complete, preparing output

dlFilter %>% dataExport(., "filterExample_round1")
#> Files saved as:
#> 
#>  outputs/filterExample_round1_5_dataFilter-unfiltered_20200710-130438.xlsx
#>  outputs/filterExample_round1_5_dataFilter-filtered_20200710-130439.csv
#>  outputs/filterExample_round1_5_dataFilter-filtered_20200710-130439.RDS
#>  outputs/filterExample_round1_5_dataFilter-validate_20200710-130439.xlsx

 

The filterExample_round1_5_dataFilter-validate spreadsheet can be found in the outputs folder. It should look something like the figure below.

2.2. Validate spreadsheet for the first filter.

2.2. Validate spreadsheet for the first filter.

 

Entries 2-9 show that the filter did capture the parks. Parking has also been picked up because the filter was not specific enough. Section 3 explores the iterative process to fix this.

3. Adjusting filters - An iterative process

Open filters (1).xlsx and add a more specific filter for parking (see figure below). Add this filter before the park filter. OSMtidy is iterative and applies filters in the same order as they are in the spreadsheet. Save this new filter as filters (2).xlsx and run the R code again.

 

3.1. Adding a second more specific filter.

3.1. Adding a second more specific filter.

dlFilter <- dlWrangle %>% dataFilter("example/filters (2).xlsx")
#> 13:04:39 Step one of one
#> 13:04:39 Complete, preparing output

dlFilter %>% dataExport(., "filterExample_round2")
#> Files saved as:
#> 
#>  outputs/filterExample_round2_5_dataFilter-unfiltered_20200710-130439.xlsx
#>  outputs/filterExample_round2_5_dataFilter-filtered_20200710-130439.csv
#>  outputs/filterExample_round2_5_dataFilter-filtered_20200710-130439.RDS
#>  outputs/filterExample_round2_5_dataFilter-validate_20200710-130439.xlsx

 

The new validate spreadsheet has two tabs. Clicking on the drop-down arrow shows that the parking filter picked up three different objects.

3.2. Validate spreadsheet after adding a second filter.

3.2. Validate spreadsheet after adding a second filter.

 

Update the filters spreadsheet again, and rerun the R code.

3.3. Third iteration of the filters spreadsheet. Two new filters are added.

3.3. Third iteration of the filters spreadsheet. Two new filters are added.

dlFilter <- dlWrangle %>% dataFilter("example/filters (3).xlsx")
#> 13:04:40 Step one of one
#> 13:04:40 Complete, preparing output

dlFilter %>% dataExport(., "filterExample_round3")
#> Files saved as:
#> 
#>  outputs/filterExample_round3_5_dataFilter-unfiltered_20200710-130440.xlsx
#>  outputs/filterExample_round3_5_dataFilter-filtered_20200710-130441.csv
#>  outputs/filterExample_round3_5_dataFilter-filtered_20200710-130441.RDS
#>  outputs/filterExample_round3_5_dataFilter-validate_20200710-130441.xlsx
3.4. Validate spreadsheet after the third iteration. There are now three tabs. Use the drop-down menu to determine if the filter identified unique objects.

3.4. Validate spreadsheet after the third iteration. There are now three tabs. Use the drop-down menu to determine if the filter identified unique objects.

 

3.4. The bicycle_parking column reveals additional information that could be filtered for.

3.4. The bicycle_parking column reveals additional information that could be filtered for.

 

3.5. In the second tab, the parking column reveals additional information that could be filtered for.

3.5. In the second tab, the parking column reveals additional information that could be filtered for.

 

These outputs show that the filters have improved, but they aren’t quite specific enough. More detailed bicycle parking filters might be useful, and filters for underground parking are needed. This requires a fourth iteration of filters.

3.6. Fourth iteration of the filters spreadsheet. Three new filters are added and validate is changed for park.

3.6. Fourth iteration of the filters spreadsheet. Three new filters are added and validate is changed for park.

dlFilter <- dlWrangle %>% dataFilter("example/filters (4).xlsx")
#> 13:04:41 Step one of one
#> 13:04:42 Complete, preparing output

dlFilter %>% dataExport(., "filterExample_round4")
#> Files saved as:
#> 
#>  outputs/filterExample_round4_5_dataFilter-unfiltered_20200710-130442.xlsx
#>  outputs/filterExample_round4_5_dataFilter-filtered_20200710-130442.csv
#>  outputs/filterExample_round4_5_dataFilter-filtered_20200710-130442.RDS
#>  outputs/filterExample_round4_5_dataFilter-validate_20200710-130442.xlsx

 

3.7. Validate spreadsheet after the fourth iteration, tab 1. The object descriptions match two of the new filters.

3.7. Validate spreadsheet after the fourth iteration, tab 1. The object descriptions match two of the new filters.

 

3.8. Tab 2. With the new bicycle parking filter the object descriptions agree with the filters.

3.8. Tab 2. With the new bicycle parking filter the object descriptions agree with the filters.

 

3.9. Tab 3. The same is true for parking.

3.9. Tab 3. The same is true for parking.

 

Through these four iterations, the park filter was refined to become specific enough to capture the range of similar objects.

4. Identifying new filters

Whether starting a filter spreadsheet from scratch, or using the pre-determined filters in filters.xlsx, some objects will be missed. To identify new filters, open the unfiltered spreadsheet in the outputs folder.

4.1. Two tabs of unfiltered objects.

4.1. Two tabs of unfiltered objects.

 

These objects can be manually updated, by changing the desc column (see Vignette 4), or by identifying new filters. In the figure below, a number of new filters are added in an attempt to capture some of these objects. The filters which have been validation (in section 3) are changed to validate FALSE.

4.2. Fifth iteration of the filters spreadsheet. Four new filters are added and validate is updated.

4.2. Fifth iteration of the filters spreadsheet. Four new filters are added and validate is updated.

dlFilter <- dlWrangle %>% dataFilter("example/filters (5).xlsx")
#> 13:04:43 Step one of one
#> 13:04:43 Complete, preparing output

dlFilter %>% dataExport(., "filterExample_round5")
#> Files saved as:
#> 
#>  outputs/filterExample_round5_5_dataFilter-unfiltered_20200710-130443.xlsx
#>  outputs/filterExample_round5_5_dataFilter-filtered_20200710-130443.csv
#>  outputs/filterExample_round5_5_dataFilter-filtered_20200710-130443.RDS
#>  outputs/filterExample_round5_5_dataFilter-validate_20200710-130443.xlsx

 

4.3. Validate spreadsheet after the fifth iteration. Filters in the first tab appear successful.

4.3. Validate spreadsheet after the fifth iteration. Filters in the first tab appear successful.

 

4.4. Filters in the second tab also appear valid.

4.4. Filters in the second tab also appear valid.

 

All four of the new filters are validated. To look at what new filters are required, check the unfiltered spreadsheet again. A final iteration of the filters spreadsheet is explored below.

4.5. Final iteration of the filters spreadsheet. Three new filters are added and validate is updated.

4.5. Final iteration of the filters spreadsheet. Three new filters are added and validate is updated.

dlFilter <- dlWrangle %>% dataFilter("example/filters (6).xlsx")
#> 13:04:44 Step one of one
#> 13:04:44 Complete, preparing output

dlFilter %>% dataExport(., "filterExample_round6")
#> Files saved as:
#> 
#>  outputs/filterExample_round6_5_dataFilter-unfiltered_20200710-130444.xlsx
#>  outputs/filterExample_round6_5_dataFilter-filtered_20200710-130444.csv
#>  outputs/filterExample_round6_5_dataFilter-filtered_20200710-130444.RDS
#>  outputs/filterExample_round6_5_dataFilter-validate_20200710-130444.xlsx

5. Manually updating the validate spreadsheet

In some instances, it may not be possible to identify a specific enough filter. In these instances, you could specify the osm_id as a filter, or, you can manually edit the validate output.

5.1. Validate spreadsheet after the final iteration. The descriptor childcare does not necessarily reflect the object information.

5.1. Validate spreadsheet after the final iteration. The descriptor childcare does not necessarily reflect the object information.

 

5.2. The validate spreadsheet desc column is updated to reflect the additional information available.

5.2. The validate spreadsheet desc column is updated to reflect the additional information available.

 

The validate spreadsheet can be editted for a number of reasons:

These changes can be imported as part of the final step in OSMtidy, using the function dataTidy().

inputList <- list(dlFilter$filtered,
                  "outputs/filterExample_round6_5_dataFilter-validate_20200710-130213.xlsx")

dlTidy <- dataTidy(inputList)
dlTidy %>% dataSummary
#> $class
#> [1] "list"             "OSMtidy_dataTidy"
#> 
#> $summary
#> # A tibble: 2 x 3
#>   data     total percent
#>   <chr>    <int>   <dbl>
#> 1 remove       3     0.8
#> 2 filtered   374    99.2
#> 
#> $summaryFiltered
#> # A tibble: 9 x 2
#>   desc                                            total
#>   <chr>                                           <int>
#> 1 Amenity; Bicycle parking                           49
#> 2 Amenity; Parking                                  282
#> 3 Amenity; Underground parking                        2
#> 4 Childcare                                           6
#> 5 School                                              1
#> 6 Sports and games; Outdoor playgrounds and parks    20
#> 7 Vehicle; Rental                                     3
#> 8 Vehicle; Sharing point                              9
#> 9 Vehicle; Wash                                       2

The output can be saved using dataExport(). The fourth vignette explores how to edit other spreadsheet outputs from OSMtidy.