OSMtidy - Vignette 4, Editting outputs

Dr Annie Visser-Quinn, a.visser-quinn@hw.ac.uk

2020-07-10

Data wrangling (step 4) and filtering (step 5) both provide a number of outputs which can be editted. This is entirely optional. For ease of use, these outputs are .xlsx spreadsheets. This vignette provides a step-by-step guide on how to go about editting these spreadsheet outputs. The final section shows how these editted spreadsheets may be combined into a final output using the function dataTidy().

Everything in this vignette is completely optional - you may edit all of the spreadsheets, none of them, or some and not others.

1. Data wrangle output - no detail

A number of OpenStreetMap physical objects have no detail associated with them. In step 4, data wrangling, these objects are gathered together and form a separate output.

dlWrangle %>% names
#> [1] "dataWrangled" "noDetail"

dlWrangle$noDetail
#> Simple feature collection with 67846 features and 2 fields
#> geometry type:  GEOMETRY
#> dimension:      XY
#> bbox:           xmin: -3.199864 ymin: 55.95727 xmax: -3.163322 ymax: 55.97505
#> CRS:            EPSG:4326
#> First 10 features:
#>       osm_id  feature                       geometry
#> 1  660544809  landuse LINESTRING (-3.192594 55.96...
#> 2   73012903 building POLYGON ((-3.171117 55.9574...
#> 3   90126476 building POLYGON ((-3.164129 55.9634...
#> 4   95632391 building POLYGON ((-3.171437 55.9574...
#> 5   95632392 building POLYGON ((-3.171962 55.9574...
#> 6  102306810 building POLYGON ((-3.186394 55.9604...
#> 7  122225343 building POLYGON ((-3.170505 55.9573...
#> 8  128181094 building POLYGON ((-3.17094 55.95741...
#> 9  155809285 building POLYGON ((-3.195319 55.9617...
#> 10 168769267 building POLYGON ((-3.173747 55.9691...

Using the function dataExport(), noDetail is saved as an .xlsx spreadsheet in the outputs folder with the suffix 4_dataWrangle-noDetail.xlsx. In the spreadsheet, each tab contains all the objects associated with a specific OpenStreetMaps feature. There are 5 columns in each tab:

For some features, it is clear what type of physical object they are and a descriptor can be added. Update the desc column for these objects. For other features, there is a range of viable options and no such distinction may be made. For these, either leave the desc column blank or enter remove into the cells. The following figures go through, step-by-step, editting the exampleEdinburgh noDetail spreadsheet.

1.1 Setup of the noDetail spreadsheet.

1.1 Setup of the noDetail spreadsheet.

 

1.2. First feature, amenity.

1.2. First feature, amenity.

 

1.3. Using descTerms.xlsx to help in renaming the second feature, barrier.

1.3. Using descTerms.xlsx to help in renaming the second feature, barrier.

 

1.4. Updating desc for the second feature, barrier.

1.4. Updating desc for the second feature, barrier.

 

5. Repeat this process for all features (tabs) and save.

5. Repeat this process for all features (tabs) and save.

2. Data filter outputs

Step 5 of OSMtidy, data filtering, can output up to three different spreadsheets:

2.1 Unfiltered

The unfiltered spreadsheet is a collection of all the physical objects which were not filtered in step 5, data filtering, using the filters (filters.xlsx). The spreadsheet is laid out in the same way as the data wrangling noDetail output. The only difference is that there are additional columns providing detail on the physical object. The editting process is the same, change the desc column. The following figures go through, step-by-step, editting the exampleEdinburgh unfiltered spreadsheet.

2.1.1. The unfiltered spreadsheet for exampleEdinburgh.

2.1.1. The unfiltered spreadsheet for exampleEdinburgh.

 

2.1.2. Interpreting the spreadsheet information to update desc.

2.1.2. Interpreting the spreadsheet information to update desc.

 

2.1.3. Using filters.xlsx to inform the desc column.

2.1.3. Using filters.xlsx to inform the desc column.

 

2.1.4. Using an internet search to inform the desc column.

2.1.4. Using an internet search to inform the desc column.

 

2.1.5. Updating desc based on filters.xlsx and an internet search.

2.1.5. Updating desc based on filters.xlsx and an internet search.

2.2 Validate

The validate spreadsheet is contains all the filtered outputs which were required to be validated - specified in the filters.xlsx column validate. The spreadsheet is laid out in the same way as the unfiltered output. Here, the only difference is that the desc column is filled in. The objective here is to confirm (validate) the desc input. In some instances, the entry may say Keyword filter - this means the filter used a keyword to identify objects. The desc must be change if these outputs are to be included in the final output from OSMtidy. As before, objects to be removed can be specified with remove. The following figures go through, step-by-step, editting the exampleEdinburgh validate spreadsheet.

2.2.1. The validate spreadsheet for exampleEdinburgh.

2.2.1. The validate spreadsheet for exampleEdinburgh.

 

2.2.2. Updating desc for a Keyword filter.

2.2.2. Updating desc for a Keyword filter.

 

2.2.3. Validating by eye and checking Keyword filter.

2.2.3. Validating by eye and checking Keyword filter.

 

2.2.4. Keyword filter updated, informed by filters.xlsx.

2.2.4. Keyword filter updated, informed by filters.xlsx.

2.3 Filtered

The filtered spreadsheet functions differently to the previous three. These are the objects which have been successfully filtered and did not require validation. The spreadsheet contains only one tab and three columns:

Once an object has been succesfully filtered, the additional OpenStreetMap identifying information is removed as it is no longer needed. However, the desc column in this spreadsheet may still be updated if desired.

3. Read data in using dataTidy()

Vignette 2 introduced the function dataTidy() - it gathers all the data outputs together to form a single summary output. It can be used without changing any of the spreadsheets, changing them all, or any combination.

inputList <- list(dlFilter$filtered,
                  "example/exampleEdinburgh_4_dataWrangle-noDetail-editted.xlsx",
                  "example/exampleEdinburgh_5_dataFilter-unfiltered-editted.xlsx",
                  "example/exampleEdinburgh_5_dataFilter-validate-editted.xlsx")

dlTidy <- dataTidy(inputList)
dlTidy %>% dataSummary
#> $class
#> [1] "list"             "OSMtidy_dataTidy"
#> 
#> $summary
#> # A tibble: 2 x 3
#>   data     total percent
#>   <chr>    <int>   <dbl>
#> 1 remove   59207    69.8
#> 2 filtered 25683    30.2
#> 
#> $summaryFiltered
#> # A tibble: 200 x 2
#>    desc                           total
#>    <chr>                          <int>
#>  1 Amenity; ATM                      14
#>  2 Amenity; Bicycle parking          49
#>  3 Amenity; Bike rental point         6
#>  4 Amenity; Car wash                  2
#>  5 Amenity; Fire hydrant             64
#>  6 Amenity; Flood defence             2
#>  7 Amenity; Fountain (decorative)     1
#>  8 Amenity; Fuel station              1
#>  9 Amenity; Garages and sheds       325
#> 10 Amenity; Information board         1
#> # ... with 190 more rows

As with each output from OSMtidy, this final output can be exported using the function dataExport().

dlTidy %>% names
#> [1] "remove"   "filtered"

dlTidy %>% dataExport
#> Files saved as:
#> 
#>  outputs/_6_dataTidy-remove_20200710-100534.RDS
#>  outputs/_6_dataTidy-filtered_20200710-100534.RDS
#>  outputs/_6_dataTidy-filtered_20200710-100534.csv