pkg_utils.Rd

Putting CWB indexed corpora into R data packages is a convenient way to ship and share corpora, and to keep documentation and supplementary functionality with the data.

pkg_create_cwb_dirs(pkg = ".", verbose = TRUE)

pkg_add_corpus(pkg = ".", corpus, registry = Sys.getenv("CORPUS_REGISTRY"), verbose = TRUE)

pkg_add_configure_scripts(pkg = ".")

pkg_add_description(pkg = ".", package = NULL, version = "0.0.1",
  date = Sys.Date(), author, maintainer = NULL, description = "",
  license = "", verbose = TRUE)

pkg_add_creativecommons_license(pkg = ".", license = "CC-BY-NC-SA",
  file = system.file(package = "cwbtools", "txt", "licenses",
    "CC_BY-NC-SA_3.0.txt"))

pkg_add_gitattributes_file(pkg = ".")

| Argument | Description |
|---|---|
| pkg | Path to directory of data package. |
| verbose | Logical, whether to be verbose. |
| corpus | Name of the CWB corpus to insert into the package. |
| registry | Registry directory. |
| package | The package name ( |
| version | The version number of the corpus (defaults to "0.0.1"). |
| date | The date of creation, defaults to Sys.Date(). |
| author | The author of the package, either character vector or object of class |
| maintainer | Maintainer, R package style, either |
| description | Description of the data package. |
| license | The license. |
| file | Path to file with fulltext of Creative Commons license. |

pkg_create_cwb_dirs will create the standard directory
structure for storing registry files and indexed corpora within a package
(./inst/extdata/cwb/registry and
./inst/extdata/cwb/indexed_corpora, respectively).
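
The layout named above can be reproduced with base R alone. The following is a minimal sketch of that directory structure, not the cwbtools implementation; the helper name `sketch_cwb_dirs` and the package directory name are hypothetical.

```r
# Hypothetical helper (not part of cwbtools): create the directory layout
# described above below a package root directory.
sketch_cwb_dirs <- function(pkg = ".") {
  cwb_dir <- file.path(pkg, "inst", "extdata", "cwb")
  dirs <- file.path(cwb_dir, c("registry", "indexed_corpora"))
  for (d in dirs) {
    # recursive = TRUE also creates inst/ and inst/extdata/ if they are missing
    if (!dir.exists(d)) dir.create(d, recursive = TRUE)
  }
  invisible(dirs)
}

# Try it on a throwaway directory (the name "sketchpkg" is arbitrary)
pkgdir <- file.path(tempdir(), "sketchpkg")
dir.create(pkgdir, showWarnings = FALSE)
created <- sketch_cwb_dirs(pkgdir)
```

Because the helper checks `dir.exists()` first, calling it a second time on the same directory leaves the existing structure untouched.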

pkg_add_corpus will add the corpus described in the registry directory to
the package defined by pkg.

pkg_add_configure_scripts will add standardized and tested
configure scripts (configure for Linux and macOS, and
configure.win for Windows) to the top-level directory of the data
package, and the file setpaths.R to the tools subdirectory. The
configuration mechanism ensures that the data directory is specified
correctly in the registry files during the installation of the data
package.
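
To illustrate what "specifying the data directory correctly" amounts to: a CWB registry file records the location of the indexed corpus in its HOME line (and the info file in its INFO line), and these paths need to match the place where the package ends up being installed. The sketch below shows one way such a rewrite could look in base R. It is an assumption-laden illustration, not the content of the setpaths.R script shipped by cwbtools; the helper `sketch_set_home`, the simplified registry content, and all paths are hypothetical.

```r
# Hypothetical helper: point the HOME and INFO entries of a CWB registry file
# to a new data directory. The real setpaths.R may work differently.
sketch_set_home <- function(registry_file, data_dir) {
  lines <- readLines(registry_file)
  is_home <- grepl("^HOME[[:space:]]", lines)
  is_info <- grepl("^INFO[[:space:]]", lines)
  # Replace whole lines rather than substituting inside them, so that
  # special characters in the path never need escaping.
  lines[is_home] <- paste("HOME", data_dir)
  lines[is_info] <- paste("INFO", file.path(data_dir, ".info"))
  writeLines(lines, registry_file)
  invisible(lines)
}

# Simplified, made-up registry file with outdated paths
reg <- tempfile()
writeLines(
  c('NAME "Reuters"',
    "ID reuters",
    "HOME /old/location/reuters",
    "INFO /old/location/reuters/.info"),
  reg
)

# Made-up installation path, for illustration only
new_home <- "/new/library/reuters/extdata/cwb/indexed_corpora/reuters"
new_lines <- sketch_set_home(reg, new_home)
```

Lines other than HOME and INFO are written back unchanged, so the corpus name and ID are preserved.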

pkg_add_description will add a DESCRIPTION file to the package.
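
A DESCRIPTION file is a plain-text file in Debian control format, which base R can write with `write.dcf()`. The sketch below builds a file from the kind of values the arguments above supply. Which fields pkg_add_description actually writes is an assumption here; the values are taken from the example on this page and the default version.

```r
# Assumed field set, mirroring the arguments of pkg_add_description;
# the function itself may write more or different fields.
fields <- cbind(
  Package = "reuters",
  Version = "0.0.1",
  Date = as.character(Sys.Date()),
  Author = "cwbtools",
  Description = "Reuters data package"
)

# Write the one-row matrix as a DESCRIPTION file in a throwaway directory
pkgdir <- tempfile()
dir.create(pkgdir)
desc_file <- file.path(pkgdir, "DESCRIPTION")
write.dcf(fields, file = desc_file)

# Read it back to check the round trip
desc <- read.dcf(desc_file)
```

`read.dcf()` returns a character matrix with one column per field, so `desc[1, "Package"]` retrieves the package name written above.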

pkg_add_creativecommons_license will add license information to
the DESCRIPTION file, and move the file LICENSE to the top-level directory of the
package.

pkg_add_gitattributes_file will add a file '.gitattributes'
to the package. The file defines types of files that will be tracked by Git
LFS, i.e. they will not be under conventional version control. This is
suitable for large binary files, which is the case for
indexed corpus data.
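
For orientation, a '.gitattributes' entry that hands a file type over to Git LFS has the form `<pattern> filter=lfs diff=lfs merge=lfs -text`, which is what `git lfs track` writes. The sketch below generates such entries in base R. The file patterns are illustrative assumptions; the patterns that pkg_add_gitattributes_file actually writes may differ.

```r
# Illustrative patterns for binary corpus files (assumed, not taken from cwbtools)
patterns <- c("*.corpus", "*.lexicon", "*.rng")

# One Git LFS attribute line per pattern
lfs_lines <- paste(patterns, "filter=lfs diff=lfs merge=lfs -text")

# Write the file to the top level of a throwaway package directory
pkgdir <- tempfile()
dir.create(pkgdir)
gitattributes <- file.path(pkgdir, ".gitattributes")
writeLines(lfs_lines, gitattributes)
```

Files matching these patterns are then stored as small pointer files in the Git repository, while their content is kept in LFS storage.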

The use_description function in the usethis package will also create a DESCRIPTION file.
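
library(cwbtools)

# Assumed setup (the original setup lines did not survive extraction):
# a temporary directory serves as the package root.
pkgdir <- tempfile()
dir.create(pkgdir)
pkg_create_cwb_dirs(pkg = pkgdir)

# Add a DESCRIPTION file, then the REUTERS corpus shipped with RcppCWB
pkg_add_description(
  pkg = pkgdir, package = "reuters",
  author = "cwbtools", description = "Reuters data package"
)
pkg_add_corpus(
  pkg = pkgdir, corpus = "REUTERS",
  registry = system.file(package = "RcppCWB", "extdata", "cwb", "registry")
)

# Add Git LFS settings, configure scripts and license information
pkg_add_gitattributes_file(pkg = pkgdir)
pkg_add_configure_scripts(pkg = pkgdir)
pkg_add_creativecommons_license(pkg = pkgdir)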