Unarchive a single tsv file into an existing database

unark_file(filename, db_con, lines = 10000L)

Arguments

filename

a *.tsv.bz2 file to uncompress

db_con

a database src (src_dbi object from dplyr)

lines

number of lines to read in a chunk.

Value

the database connection (src_dbi, invisibly)

Details

unark_file reads a file in chunks and writes each chunk into the database as it goes. This makes it possible to import large compressed tables that would be too large to read into memory in their entirety. In general, increasing the lines parameter speeds up the total transfer but requires more free memory to hold each (larger) chunk.
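To make the chunking idea concrete, below is a minimal sketch of the same pattern written against a plain DBI connection. This is an illustration only, not arkdb's actual implementation: unark_sketch and tbl_name are hypothetical names, and unark_file itself works with a src_dbi and handles details this sketch glosses over (such as the progress reporting shown in the examples).

library(DBI)

unark_sketch <- function(filename, conn, tbl_name, lines = 10000L) {
  ## open the compressed file as a text connection
  fcon <- bzfile(filename, "r")
  on.exit(close(fcon))
  ## read the header once to recover column names
  header <- readLines(fcon, n = 1L)
  col_names <- strsplit(header, "\t", fixed = TRUE)[[1]]
  repeat {
    ## pull at most `lines` rows; only this chunk is held in memory
    chunk <- readLines(fcon, n = lines)
    if (length(chunk) == 0L) break
    df <- read.delim(text = chunk, header = FALSE, col.names = col_names)
    if (!dbExistsTable(conn, tbl_name)) {
      dbWriteTable(conn, tbl_name, df)
    } else {
      dbWriteTable(conn, tbl_name, df, append = TRUE)
    }
  }
  invisible(conn)
}

The key property the sketch shows is that memory use is bounded by the size of one chunk (the lines argument), not by the size of the file.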

Examples

## set up example files and database
tsv <- tempfile("flights", fileext = ".tsv.bz2")
sqlite <- tempfile("nycflights", fileext = ".sql")
readr::write_tsv(nycflights13::flights, tsv)
db <- src_sqlite(sqlite, create = TRUE)

## and here we go:
db_con <- unark_file(tsv, db)
#> Importing in 10000 line chunks:
#> /var/folders/y8/0wn724zs10jd79_srhxvy49r0000gn/T//RtmpPQJ76l/flightsfcd158cf3482.tsv.bz2
#> ...Done! (in 7.794183 secs)
## display tables in database:
db_con
#> src: sqlite 3.22.0 [/var/folders/y8/0wn724zs10jd79_srhxvy49r0000gn/T//RtmpPQJ76l/nycflightsfcd1b80705b.sql]
#> tbls: flightsfcd158cf3482
unlink(tsv)
unlink(sqlite)
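Since unark_file returns the src_dbi invisibly, the imported table can then be queried lazily with dplyr. A brief illustration (the table name carries the tempfile's random suffix shown in the output above, so it will differ between runs):

flights_tbl <- dplyr::tbl(db_con, "flightsfcd158cf3482")
dplyr::count(flights_tbl, origin)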