Hi @cderv and @pegeler, thank you both of you very much for your helpful comments and suggestions, sorry for not responding more quickly.

Response to @cderv

List-columns

I find it kind of necessary because you also used list-columns in tibbles. This is not something easy for some newcomers to handles. Even for some dplyr users who do not used that a lot, it is not something easy. Adding some example in the vignettes to show how to retrieve embedded data.frame inside some tibble’s list column could be a good idea. To illustrate:

library(nomisr) library(dplyr, warn.conflicts = F) y <- nomis_data_info(“NM_893_1”) y$annotations.annotation %>% class() #> [1] “list” y$annotations.annotation[[1]] %>% class() #> [1] “data.frame” y %>% pull(annotations.annotation) %>% class() #> [1] “list” y %>% pull(annotations.annotation)%>% .[[1]] %>% class() #> [1] “data.frame” y %>% pull(annotations.annotation) %>% purrr::pluck(1) %>% class() #> [1] “data.frame” y %>% pull(annotations.annotation) %>% purrr::map(1) %>% class() #> [1] “list” y %>% tidyr::unnest(annotations.annotation) %>% glimpse() #> Observations: 14 #> Variables: 11 #> $ agencyid “NOMIS”, “NOMIS”, “NOMIS”… #> $ id “NM_893_1”, “NM_893_1”, “… #> $ uri “Nm-893d1”, “Nm-893d1”, “… #> $ version 1, 1, 1, 1, 1, 1, 1, 1, 1… #> $ components.primarymeasure.conceptref “OBS_VALUE”, “OBS_VALUE”,… #> $ components.timedimension.codelist “CL_893_1_TIME”, “CL_893_… #> $ components.timedimension.conceptref “TIME”, “TIME”, “TIME”, “… #> $ name.value “LC4408EW - Tenure by num… #> $ name.lang “en”, “en”, “en”, “en”, “… #> $ annotationtext “Current (being actively …

#> $ annotationtitle “Status”, “Keywords”, “Un…

Response: I’ve added a similar example to the vignette, which is similar enough to your example that I feel I should cite you as a contributor if that is okay.

Following are my review comments, for improvement:

Improvement for the packaging guidelines

see below parts for details

  • Adding a code of conduct Response: I a code of conduct in https://github.com/evanodell/nomisr/commit/ede35580b8369aba016b8375f0d6569b58063e9c and
  • Improve README considering it is the first entry point for discovering NOMIS API
  • Improve DESCRIPTION for authorship
  • [minor] follow versionning advice with ..*.9000 for in development version Response: Now following versioning best practice in development version
  • [question] I did not find why you need curl package in suggest… Response: For reasons I’m not entirely clear on, the package was failing on Travis without the curl suggestion, I think due to a bug in jsonlite.

Response: For reasons I’m not entirely clear on, the package was failing on Travis without the curl suggestion, I think due to a bug in jsonlite.

README improvement

  • I would be more precise on what the NOMIS database is and for who it is. For example, I found that this submission issue mentioned UK database but not the README. You help file for the package as pretty clear description of what it is for. While reviewing, I had a doubt and went to check on NOMIS. This is why I think the statement of need is not complete.
  • It feels like the README could get more examples on what your package can do. Only one of your functions is shown in the readme.
  • Put package documentation link in the title of the repo like in the reprex repo. By the way, you can add a more descriptive description on your github repo and some tags.
  • could precise that it uses the tibble format (from the tidyverse)

Response: I’ve added more details to the README, including more details on Nomis itself, a simple example and citation information.

Vignettes improvement

  • Vignette was not built when I installed the package with devtools::install_github. Did not know if this intended behavior or not.
  • I tried to build the vignette locally then: it worked.
    • However, there is a problem on result formatting with the tibble. (I think because you use result = ‘asis’ in code chunks) and because every results are not tables (y$annotations.annotation is a list for example.) so it does not print nicely.
      • The vignette do not show any code to get the results - I would show them for someone to learn how to use your function, as examples. (with echo = TRUE)
      • The formatting issue is visible on your website too
  • I think the vignette contains necessary information but does not show enough examples on how to retrieve data in the objects you return. You mentioned the three list columns with nested data.frame but not show how correctly extracts the info.
  • It is pretty clear that you package allows to retrieve a large amount of data from the NOMIS API. it seems necessary to always use nomis_data_info, then nomis_codes then nomis_get_data to be sure to not retrieve to much. The vignette helps you understand that. It could appears to in your readme, and it could be more clear how to construct a pipeline for good practice with this API.

These improvement are the reason why I did not check the vignettes box, even if there is one in the github repo and doc. If you need help on all this, do not hesitate to ask.

Response: As mentioned above, I’ve added your suggestions on exploring list-columns to the vignette, and fixed the online formatting issues. I’ve changed echo to TRUE to show code

About documentation of functions and examples

  • All functions are well documented.
  • I encountered some issues running the examples because of a timeout. I believe this is what you meant in your by “there are some limits to the amount of data that can be retrieved within a certain period of time, although those are not published.”. The maybe some improvement to do on this. (see under)
  • Is this the reason why you enclosed each example in \dontrun{} statement ? am I not sure on what is the recommendation for API package like this one for example ? \dontrun{} does allow anyone to run examples with something like example(“nomis_codes”, package = “nomisr”). Idea : Maybe \donttest{} is better here: does show in help and runs with example, but not run by R CMD check (so not run by CRAN ?) - see rd vignette from roxygen2

Response: I’ve changed \dontrun{} to \donttest{}

I ran all tests with devtools::run_examples(run = FALSE) (in several time, due to timeout constraint). Everything is working. It works well for almost all, but your example assign systematically to a variable (named x or y - room for improvement) but never print to show head of results. I feel that is missing.

Response: I have changed the names in examples to be more descriptive of the data they are actually retrieving, and added a print call so the head is actually dixplayed.

Response: I’ve removed the very long examples from nomis_get_data - I considered wrapping them in \dontrun{} but I think dropping them is the better option.

Response: I keep all the source code for my documentation website in my docs repo. I did this so I could host on netlify with a custom domain and https encryption which - at least when I set it up - was not available through github pages. I’m not sure if that is the best option or not.

Various comments on the code

Response: I’ve turned the API URL into its own variable.

Response: I’ve dropped suppressMessage and used col_types = readr::cols(.default = readr::col_character()) to make columns characters by default.

query <- dplyr::if_else(keywords==FALSE, paste0(“/def.sdmx.json?search=“, search,””), paste0(“/def.sdmx.json?search=keywords-”, search, “*“)

could be rewritten without dplyr::if_else

base_query <- “/def.sdmx.json?search=” if (keyword) { keyword_term <- “keywords-”) } else { keyword_term <- “*" }

About the timeout and size of data

Retrieving additional pages 1 of 208888 Retrieving additional pages 2 of 208888

I don’t know if some other API packages are doing something to help user or if it is all the user responsibility to know what is he doing. (users will surely be already NOMIS data users).

a <- nomis_codes(“NM_1_1”)

Error in open.connection(con, “rb”) :

Timeout was reached: Resolving timed out after 10000 milliseconds

You could use some tryCatch mechanism (some useful info in Advanced R).

Response: I’ve used tryCatch to hopefully provide more useful information in the case of time outs. I’ve also written, but not implemented, a warning and yes/no option in interactive sesssions with requests over 350,000 cells which are likely to cause rate limiting issues.

About SMDX format on NOMIS

Response: I was not aware of it, but it is something I should explore for the future, thank you for pointing me towards it.

Response to pegeler

Package Review for nomisr

Onboarding submission: ropensci/onboarding#190

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 10 Review Comments

Excellent work! This is a fine example of the level of quality and professionalism that I have come to expect from rOpenSci contributions. This package is a great addition to the complement of R functionality and provides a facile portal to a useful data source.

Many of the comments that I had have been addressed as a result of @cderv’s prompt and thorough review. In addition, development has continued to be very active over the past few weeks! Issues are fixed as quickly as I find them! Below are a few more comments that I would like to add. Basic Package Structure Overview README.md

The README.md does not contain citation information, per rOpenSci packaging guide. I am not sure how necessary that is since there is a CITATION file in the inst/ directory, however, it should be an easy add.

I would concur with @cderv in suggesting additional usage examples. DESCRIPTION

Response: I’ve added a minimum version of R (>= 3.4.0) to the DESCRIPTION.

Response: Not including curl was causing errors with jsonlite on travis - I’m not sure why, as jsonlite is supposed to import curl itself.

CITATION

The CITATION file uses meta$Date to dynamically fill in date for the citation. However, the Date entry was removed from the DESCRIPTION file per gp suggestion so the date is NULL.

Since it is considered good practice to omit the Date entry because it may be prone to falling out of date, it seems that hard-coding the date in the citation would be silly. I briefly considered the use of a different field to dynamically update the citation date, such as Date/Publication or Packaged. However, those will only work for users who are downloading from a CRAN repos. Therefore I’m a little bit at a loss as to what the most elegant solution might be. I might suggest breaking with good practice here and adding the Date field back in since there is a reason to do so. Suggestions are welcome.

The CITATION file should also end in a newline character :)

Response: I’ve hardcoded 2018 into the CITATION file, which I think is the easiest solution, as its the year the package was first released, although I’m open to other suggestions as well.

Documentation

Response: I’ve expanded the documentation and used roxygen tags to improve organisation. I’ve also added an expanded explanation to the package vignette.

Examples

Response: I’ve added glimpsing to the examples in different functions.

Vignettes

Response: I’ve now pre-built the vignette.

Functionality

Response: I have re-written nomis_search in keeping with your suggestions. It now accepts five parameters - name, description, keywords, content_type, units - and can search in any or all of them at the same time, with or without wildcards. I’ve also added a nomis_content_type() function to assist in looking up specific content types, which may be used with nomis_search().

nomis_codes()
    Generally, it is helpful to mirror the language of the source material when possible.
    The nomis_codes() function accepts a dimension (concept) and the function returns codes for that dimension.
    Therefore, the argument code might be renamed to dimension or concept, since the
    function searches for codes availalbe within a dimension.
    Related to above, I think the @title could be more descriptive.
    
    [suggestion] The query is currently built using a hierarchal structure. E.g., /api/v01/dataset/<ID>/<DIMENSION>/<CODE>/def.sdmx.json. However,
    I think there is an opportunity to make this query a little more flexible and extensible. Instead of enforcing the hierarchy,
    the funciton could use the GET verb to provide a little more control over how the data is filtered. A user could
    query on criteria from different dimensions than the one which they are interested in results from. For example,
    even if the user is interested in the 'item' dimension, they may want to filter on 'geography' and 'sex'. Using GET would allow
    for this. As an example: /api/v01/dataset/NM_1_1/item.def.sdmx.json?geography=2038432081&sex=7. This would also expose the user
    to options for specifying times and dates, which cannot otherwise be done. I fully acknowledge that this might be over-building,
    since this function only provides a support role and will probably not be used that way.
    This is a minor concern, but I feel it's cleaner to specify value if a test function returns TRUE first when using ifelse().
    The code ifelse(is.null(type), "", paste0("/", type)) reads cleaner than ifelse(is.null(type) == FALSE, paste0("/", type), "") IMO.
nomis_data_info()
    The documentation could be more explicit in telling the user that the data returned by this function is metadata.
    

Response: I’ve updated the documentation to be more explicit about what the function actually does, and improved the code readability. I changed the code parameter to concept, as you suggested, to be more in keeping with the syntax used in the original API. I’ve also added a specific search parameter, and a generic additional_queries parameter, as used in nomis_get_data()

Response: I’ve renamed the nomis_codes() function to nomis_get_metadata(), which I agree is more descriptive of what the function actually does. I’ve also added the nomis_overview() function, which returns a more general overview of all available metadata, without the flexibility and specificity offered by nomis_get_metadata().

Response: The 25k limit refers to the number of observation values, the documentation is rather confusing on that point.

Response: I’ve added a select parameter as an enhancement, as it is straightforward to do and reduces the amount of information that could possibly need to be included under additional_queries.

Conclusion

This is a great package that will be valuable to anyone interested in analysis of this data. Nice work!

I appreciate the opportunity to review this package and hope that my comments are constructive and useful. This package needs just a few tweaks (namely getting the examples up and running) and then I will gladly approve!