R packages that serve datasets come in several flavors. This post explores these packages and the datasets they include in order to:
Survey existing R datasets
Summarize best practices
Our primary objective is to make datasets easier to discover, and in particular to broaden the range of datasets used in R tutorials. Specifically, we looked at:
How many packages on CRAN include data?
Does the package's DESCRIPTION file set LazyData: true?
How many authors do packages that include data have?
How big is each dataset?
What topics do the datasets cover?
How are datasets licensed?
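The LazyData question comes down to reading one field from a package's DESCRIPTION file. That file uses the Debian control file (DCF) layout of "Field: value" lines, where indented lines continue the previous field. Below is a minimal Python sketch, assuming the DESCRIPTION text is already in hand (fetching it from CRAN is out of scope). The function names parse_dcf and has_lazydata are hypothetical. Treating "true", "yes" and "1" (in any case) as true is our assumption about how R reads logical fields.

```python
def parse_dcf(text: str) -> dict[str, str]:
    """Parse a single DCF record (e.g. a DESCRIPTION file) into a dict.

    Lines of the form 'Field: value' start a new field; lines that begin
    with whitespace continue the previous field's value.
    """
    fields: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines (single-record input assumed)
        if line[0] in " \t" and current is not None:
            # Continuation line: append to the previous field's value.
            fields[current] += " " + line.strip()
        else:
            name, sep, value = line.partition(":")
            if not sep:
                continue  # ignore malformed lines without a colon
            current = name.strip()
            fields[current] = value.strip()
    return fields


def has_lazydata(description: str) -> bool:
    """Return True if the DESCRIPTION text sets LazyData to a true value.

    Assumption: 'true', 'yes' and '1' (any case) count as true.
    """
    value = parse_dcf(description).get("LazyData", "")
    return value.strip().lower() in {"true", "yes", "1"}


# Hypothetical DESCRIPTION for illustration only.
example = """Package: mydata
Title: Example Data
Version: 0.1.0
Description: A hypothetical package that ships
    a couple of example datasets.
LazyData: true
"""
print(has_lazydata(example))  # True
print(parse_dcf(example)["Description"])
```

Parsing the whole record rather than searching for the literal string "LazyData: true" keeps the check from being thrown off by extra whitespace or alternative spellings of the value.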
We did not look at datasets distributed as raw files or through Bioconductor.