This post has referred to a vignette from dplyr, you can find it in https://dplyr.tidyverse.org/articles/two-table.html. We’ll try to display how to join data tables in this vignette. First, load the packages we need and get some data.

library(tidyfst)
#> 
#> Life's short, use R.
library(nycflights13)

flights2 <- flights %>% 
  select_dt(year,month,day, hour, origin, dest, tailnum, carrier)

Do a left join with a simple:

Controlling how the tables are matched

Join works the same as dplyr:

flights2 %>% left_join_dt(weather)
#> Joining by: year,month,day,hour,origin
#>          year month   day  hour origin   dest tailnum carrier  temp  dewp humid
#>         <int> <int> <int> <int> <char> <char>  <char>  <char> <num> <num> <num>
#>      1:  2013     1     1     5    EWR    IAH  N14228      UA 39.02 28.04 64.43
#>      2:  2013     1     1     5    EWR    ORD  N39463      UA 39.02 28.04 64.43
#>      3:  2013     1     1     5    JFK    MIA  N619AA      AA 39.02 26.96 61.63
#>      4:  2013     1     1     5    JFK    BQN  N804JB      B6 39.02 26.96 61.63
#>      5:  2013     1     1     5    JFK    BOS  N708JB      B6 39.02 26.96 61.63
#>     ---                                                                        
#> 336772:  2013    12    31    23    EWR    SJU  N651JB      B6    NA    NA    NA
#> 336773:  2013    12    31    23    JFK    BQN  N566JB      B6    NA    NA    NA
#> 336774:  2013    12    31    23    JFK    SJU  N713TW      DL    NA    NA    NA
#> 336775:  2013    12    31    23    JFK    SJU  N509JB      B6    NA    NA    NA
#> 336776:  2013    12    31    23    JFK    PSE  N665JB      B6    NA    NA    NA
#>         wind_dir wind_speed wind_gust precip pressure visib           time_hour
#>            <num>      <num>     <num>  <num>    <num> <num>              <POSc>
#>      1:      260   12.65858        NA      0   1011.9    10 2013-01-01 05:00:00
#>      2:      260   12.65858        NA      0   1011.9    10 2013-01-01 05:00:00
#>      3:      260   14.96014        NA      0   1012.1    10 2013-01-01 05:00:00
#>      4:      260   14.96014        NA      0   1012.1    10 2013-01-01 05:00:00
#>      5:      260   14.96014        NA      0   1012.1    10 2013-01-01 05:00:00
#>     ---                                                                        
#> 336772:       NA         NA        NA     NA       NA    NA                <NA>
#> 336773:       NA         NA        NA     NA       NA    NA                <NA>
#> 336774:       NA         NA        NA     NA       NA    NA                <NA>
#> 336775:       NA         NA        NA     NA       NA    NA                <NA>
#> 336776:       NA         NA        NA     NA       NA    NA                <NA>
flights2 %>% left_join_dt(planes, by = "tailnum")
#>         tailnum year.x month   day  hour origin   dest carrier year.y   type
#>          <char>  <int> <int> <int> <num> <char> <char>  <char>  <int> <char>
#>      1:    <NA>   2013     1     2    15    JFK    LAX      AA     NA   <NA>
#>      2:    <NA>   2013     1     2    16    EWR    ORD      UA     NA   <NA>
#>      3:    <NA>   2013     1     3     8    EWR    MIA      UA     NA   <NA>
#>      4:    <NA>   2013     1     3     6    EWR    DFW      UA     NA   <NA>
#>      5:    <NA>   2013     1     4     8    JFK    DCA      9E     NA   <NA>
#>     ---                                                                     
#> 336772:  N9EAMQ   2013     9    27    16    LGA    ATL      MQ     NA   <NA>
#> 336773:  N9EAMQ   2013     9    29    12    LGA    BNA      MQ     NA   <NA>
#> 336774:  N9EAMQ   2013     9    29    18    LGA    CMH      MQ     NA   <NA>
#> 336775:  N9EAMQ   2013     9    30    11    JFK    DCA      MQ     NA   <NA>
#> 336776:  N9EAMQ   2013     9    30    14    JFK    TPA      MQ     NA   <NA>
#>         manufacturer  model engines seats speed engine
#>               <char> <char>   <int> <int> <int> <char>
#>      1:         <NA>   <NA>      NA    NA    NA   <NA>
#>      2:         <NA>   <NA>      NA    NA    NA   <NA>
#>      3:         <NA>   <NA>      NA    NA    NA   <NA>
#>      4:         <NA>   <NA>      NA    NA    NA   <NA>
#>      5:         <NA>   <NA>      NA    NA    NA   <NA>
#>     ---                                               
#> 336772:         <NA>   <NA>      NA    NA    NA   <NA>
#> 336773:         <NA>   <NA>      NA    NA    NA   <NA>
#> 336774:         <NA>   <NA>      NA    NA    NA   <NA>
#> 336775:         <NA>   <NA>      NA    NA    NA   <NA>
#> 336776:         <NA>   <NA>      NA    NA    NA   <NA>
flights2 %>% left_join_dt(airports, c("dest" = "faa"))
#>           dest  year month   day  hour origin tailnum carrier
#>         <char> <int> <int> <int> <num> <char>  <char>  <char>
#>      1:    ABQ  2013    10     1    20    JFK  N554JB      B6
#>      2:    ABQ  2013    10     2    20    JFK  N607JB      B6
#>      3:    ABQ  2013    10     3    20    JFK  N591JB      B6
#>      4:    ABQ  2013    10     4    20    JFK  N662JB      B6
#>      5:    ABQ  2013    10     5    19    JFK  N580JB      B6
#>     ---                                                      
#> 336772:    XNA  2013     9    29    17    LGA  N725MQ      MQ
#> 336773:    XNA  2013     9    30     7    LGA  N735MQ      MQ
#> 336774:    XNA  2013     9    30     8    EWR  N14117      EV
#> 336775:    XNA  2013     9    30    15    LGA  N725MQ      MQ
#> 336776:    XNA  2013     9    30    17    LGA  N720MQ      MQ
#>                                      name      lat        lon   alt    tz
#>                                    <char>    <num>      <num> <num> <num>
#>      1: Albuquerque International Sunport 35.04022 -106.60919  5355    -7
#>      2: Albuquerque International Sunport 35.04022 -106.60919  5355    -7
#>      3: Albuquerque International Sunport 35.04022 -106.60919  5355    -7
#>      4: Albuquerque International Sunport 35.04022 -106.60919  5355    -7
#>      5: Albuquerque International Sunport 35.04022 -106.60919  5355    -7
#>     ---                                                                  
#> 336772:              NW Arkansas Regional 36.28187  -94.30681  1287    -6
#> 336773:              NW Arkansas Regional 36.28187  -94.30681  1287    -6
#> 336774:              NW Arkansas Regional 36.28187  -94.30681  1287    -6
#> 336775:              NW Arkansas Regional 36.28187  -94.30681  1287    -6
#> 336776:              NW Arkansas Regional 36.28187  -94.30681  1287    -6
#>            dst           tzone
#>         <char>          <char>
#>      1:      A  America/Denver
#>      2:      A  America/Denver
#>      3:      A  America/Denver
#>      4:      A  America/Denver
#>      5:      A  America/Denver
#>     ---                       
#> 336772:      A America/Chicago
#> 336773:      A America/Chicago
#> 336774:      A America/Chicago
#> 336775:      A America/Chicago
#> 336776:      A America/Chicago
flights2 %>% left_join_dt(airports, c("origin" = "faa"))
#>         origin  year month   day  hour   dest tailnum carrier
#>         <char> <int> <int> <int> <num> <char>  <char>  <char>
#>      1:    EWR  2013     1     1     5    IAH  N14228      UA
#>      2:    EWR  2013     1     1     5    ORD  N39463      UA
#>      3:    EWR  2013     1     1     6    FLL  N516JB      B6
#>      4:    EWR  2013     1     1     6    SFO  N53441      UA
#>      5:    EWR  2013     1     1     6    LAS  N76515      UA
#>     ---                                                      
#> 336772:    LGA  2013     9    30    18    BNA  N740EV      EV
#> 336773:    LGA  2013     9    30    22    SYR    <NA>      9E
#> 336774:    LGA  2013     9    30    12    BNA  N535MQ      MQ
#> 336775:    LGA  2013     9    30    11    CLE  N511MQ      MQ
#> 336776:    LGA  2013     9    30     8    RDU  N839MQ      MQ
#>                        name      lat       lon   alt    tz    dst
#>                      <char>    <num>     <num> <num> <num> <char>
#>      1: Newark Liberty Intl 40.69250 -74.16867    18    -5      A
#>      2: Newark Liberty Intl 40.69250 -74.16867    18    -5      A
#>      3: Newark Liberty Intl 40.69250 -74.16867    18    -5      A
#>      4: Newark Liberty Intl 40.69250 -74.16867    18    -5      A
#>      5: Newark Liberty Intl 40.69250 -74.16867    18    -5      A
#>     ---                                                          
#> 336772:          La Guardia 40.77725 -73.87261    22    -5      A
#> 336773:          La Guardia 40.77725 -73.87261    22    -5      A
#> 336774:          La Guardia 40.77725 -73.87261    22    -5      A
#> 336775:          La Guardia 40.77725 -73.87261    22    -5      A
#> 336776:          La Guardia 40.77725 -73.87261    22    -5      A
#>                    tzone
#>                   <char>
#>      1: America/New_York
#>      2: America/New_York
#>      3: America/New_York
#>      4: America/New_York
#>      5: America/New_York
#>     ---                 
#> 336772: America/New_York
#> 336773: America/New_York
#> 336774: America/New_York
#> 336775: America/New_York
#> 336776: America/New_York

Types of join

If all you have is a data.frame or tibble, you have no need to change the format. Feed the data directly:

The "_dt" suffix should remind you that this is backed up by data.table and will always return a data.table in the end.

Filtering joins

Filtering joins have also been supported in tidyfst.

Other examples (semi_join_dt() and anti_join_dt() never duplicate; they only ever remove observations.):

Set operations

For set operations, use data.table’s function directly. tidyfst have exported the following functions:

  • fintersect(x, y, all = FALSE)

  • fsetdiff(x, y, all = FALSE)

  • funion(x, y, all = FALSE)

  • fsetequal(x, y, all = TRUE)

    For details, just find the help from data.table using ?setops.