The goal of arkdb is to provide a convenient way to move data from large compressed text files (tsv, csv, etc.) into any DBI-compliant database connection (e.g. MySQL, Postgres, SQLite; see DBI), and to move tables out of such databases into text files. The key feature of arkdb is that data are moved between databases and text files in chunks of a fixed size, allowing the package functions to work with tables that would be much too large to read into memory all at once.
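To make the chunking idea concrete, here is a minimal sketch of how fixed-size chunks can be streamed in both directions using only DBI, RSQLite, and base R connections. It is not arkdb's actual implementation: the helpers `export_in_chunks()` and `import_in_chunks()` are hypothetical names introduced for illustration, and the sketch assumes tab-separated fields that contain no embedded tabs or newlines.

```r
# Illustrative sketch only; NOT how arkdb is implemented internally.
# Assumes the DBI and RSQLite packages are installed.

# Hypothetical helper: stream a table out to a bz2-compressed TSV file,
# fetching at most `lines` rows from the database at a time.
export_in_chunks <- function(con, table, path, lines = 50000L) {
  out <- bzfile(path, open = "w")
  on.exit(close(out), add = TRUE)
  query <- paste("SELECT * FROM", DBI::dbQuoteIdentifier(con, table))
  res <- DBI::dbSendQuery(con, query)
  on.exit(DBI::dbClearResult(res), add = TRUE)
  first <- TRUE
  while (!DBI::dbHasCompleted(res)) {
    chunk <- DBI::dbFetch(res, n = lines)
    if (nrow(chunk) == 0L) break
    # Write the header only with the first chunk
    utils::write.table(chunk, out, sep = "\t", quote = FALSE,
                       row.names = FALSE, col.names = first)
    first <- FALSE
  }
  invisible(path)
}

# Hypothetical helper: stream a bz2-compressed TSV file into a table,
# reading at most `lines` rows into memory at a time.
import_in_chunks <- function(path, con, table, lines = 50000L) {
  input <- bzfile(path, open = "r")
  on.exit(close(input), add = TRUE)
  header <- readLines(input, n = 1L)
  total <- 0L
  repeat {
    body <- readLines(input, n = lines)
    if (length(body) == 0L) break
    # Column types are inferred per chunk, which a real tool must handle with care
    chunk <- utils::read.delim(text = c(header, body), stringsAsFactors = FALSE)
    DBI::dbWriteTable(con, table, chunk, append = TRUE)
    total <- total + nrow(chunk)
  }
  invisible(total)
}

# Small round trip on an in-memory SQLite database
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)
path <- tempfile(fileext = ".tsv.bz2")
export_in_chunks(con, "mtcars", path, lines = 10L)
n_rows <- import_in_chunks(path, con, "mtcars_copy", lines = 10L)
n_in_db <- DBI::dbGetQuery(con, "SELECT COUNT(*) AS n FROM mtcars_copy")$n
DBI::dbDisconnect(con)
```

In both helpers only one chunk is held in memory at any time, so peak memory use depends on `lines` rather than on the size of the table.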
Consider the nycflights database in SQLite:
tmp <- tempdir() # Or can be your working directory, "."
db <- dbplyr::nycflights13_sqlite(tmp)
#> Caching nycflights db at /var/folders/y8/0wn724zs10jd79_srhxvy49r0000gn/T//RtmpJKwt0k/nycflights13.sqlite
#> Creating table: airlines
#> Creating table: airports
#> Creating table: flights
#> Creating table: planes
#> Creating table: weather
Create an archive of the database:
dir <- fs::dir_create(fs::path(tmp, "nycflights"))
ark(db, dir, lines = 50000)
#> Exporting airlines in 50000 line chunks:
#> ...Done! (in 0.01664805 secs)
#> Exporting airports in 50000 line chunks:
#> ...Done! (in 0.02630186 secs)
#> Exporting flights in 50000 line chunks:
#> ...Done! (in 16.01034 secs)
#> Exporting planes in 50000 line chunks:
#> ...Done! (in 0.05012608 secs)
#> Exporting weather in 50000 line chunks:
#> ...Done! (in 1.065444 secs)
Import a list of compressed tabular files (e.g. *.tsv.bz2) into a local SQLite database:
files <- fs::dir_ls(dir)
new_db <- dplyr::src_sqlite(fs::path(tmp, "local.sqlite"), create = TRUE)
unark(files, new_db, lines = 50000)
#> Importing /var/folders/y8/0wn724zs10jd79_srhxvy49r0000gn/T/RtmpJKwt0k/nycflights/airlines.tsv.bz2 in 50000 line chunks:
#> ...Done! (in 0.01576591 secs)
#> Importing /var/folders/y8/0wn724zs10jd79_srhxvy49r0000gn/T/RtmpJKwt0k/nycflights/airports.tsv.bz2 in 50000 line chunks:
#> ...Done! (in 0.04689217 secs)
#> Importing /var/folders/y8/0wn724zs10jd79_srhxvy49r0000gn/T/RtmpJKwt0k/nycflights/flights.tsv.bz2 in 50000 line chunks:
#> ...Done! (in 10.02195 secs)
#> Importing /var/folders/y8/0wn724zs10jd79_srhxvy49r0000gn/T/RtmpJKwt0k/nycflights/planes.tsv.bz2 in 50000 line chunks:
#> ...Done! (in 0.04633403 secs)
#> Importing /var/folders/y8/0wn724zs10jd79_srhxvy49r0000gn/T/RtmpJKwt0k/nycflights/weather.tsv.bz2 in 50000 line chunks:
#> ...Done! (in 0.5779071 secs)
new_db
#> src: sqlite 3.22.0 [/var/folders/y8/0wn724zs10jd79_srhxvy49r0000gn/T/RtmpJKwt0k/local.sqlite]
#> tbls: airlines, airports, flights, planes, weather
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.