Repertoire overlap

Repertoire overlap is the most common approach to measure repertoire similarity. It is achieved by computation of specific statistics on clonotypes shared between given repertoires, also called “public” clonotypes. immunarch provides several indices: - number of public clonotypes (.method = "public") - a classic measure of overlap similarity.

  • overlap coefficient (.method = "overlap") - a normalised measure of overlap similarity. It is defined as the size of the intersection divided by the smaller of the size of the two sets.

  • Jaccard index (.method = "jaccard") - it measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets.

  • Tversky index (.method = "tversky") - an asymmetric similarity measure on sets that compares a variant to a prototype. If using default arguments, it’s similar to Dice’s coefficient.

  • cosine similarity (.method = "cosine") - a measure of similarity between two non-zero vectors

  • Morisita’s overlap index (.method = "morisita") - a statistical measure of dispersion of individuals in a population. It is used to compare overlap among samples.

  • incremental overlap - overlaps of the N most abundant clonotypes with incrementally growing N (.method = "inc+METHOD", e.g., "inc+public" or "inc+morisita").

The function that includes described methods is repOverlap. Again the output is easily visualised when passed to vis() function that does all the work:

imm_ov1 = repOverlap(immdata$data, .method = "public", .verbose = F)
imm_ov2 = repOverlap(immdata$data, .method = "morisita", .verbose = F)

grid.arrange(vis(imm_ov1), vis(imm_ov2, .text.size=2), ncol = 2)

vis(imm_ov1, "heatmap2")

You can easily change the number of significant digits:

grid.arrange(vis(imm_ov2, .text.size=2.5, .signif.digits=1), vis(imm_ov2, .text.size=2, .signif.digits=2), ncol = 2)

To analyse the computed overlap measures function apply repOverlapAnalysis.

## Standard deviations (1, .., p=4):
## [1] 0 0 0 0
## 
## Rotation (n x k) = (12 x 2):
##                [,1]       [,2]
## A2-i129 -20.2308709  22.431389
## A2-i131   8.3055445 -26.779321
## A2-i133  45.9341813  -6.893304
## A2-i132 -55.0903957 -18.572513
## A4-i191  23.7461189  -1.118162
## A4-i192  -4.4041243  38.028858
## MS1     -19.5494165 -12.836320
## MS2      -1.9063188  -6.075283
## MS3      -9.8321059  11.217724
## MS4       0.9127103   1.154627
## MS5       5.6552254 -27.415676
## MS6      26.4594518  26.857981
##              DimI       DimII
## A2-i129 -86.97551  130.954046
## A2-i131  17.70935 -214.155255
## A2-i133  51.07917   -9.035797
## A2-i132  64.96704  -21.724328
## A4-i191  63.08461   -9.840291
## A4-i192 -97.31186  125.869656
## MS1     -95.10775  139.112902
## MS2      53.54584  -20.279439
## MS3     -91.78597  131.754829
## MS4      57.08393  -16.406322
## MS5      15.45343 -210.180619
## MS6      48.25773  -26.069381
## attr(,"class")
## [1] "matrix"      "immunr_tsne"

## Standard deviations (1, .., p=4):
## [1] 0 0 0 0
## 
## Rotation (n x k) = (12 x 2):
##                [,1]       [,2]
## A2-i129 -20.2308709  22.431389
## A2-i131   8.3055445 -26.779321
## A2-i133  45.9341813  -6.893304
## A2-i132 -55.0903957 -18.572513
## A4-i191  23.7461189  -1.118162
## A4-i192  -4.4041243  38.028858
## MS1     -19.5494165 -12.836320
## MS2      -1.9063188  -6.075283
## MS3      -9.8321059  11.217724
## MS4       0.9127103   1.154627
## MS5       5.6552254 -27.415676
## MS6      26.4594518  26.857981
##                DimI     DimII
## A2-i129    4.177431  186.7817
## A2-i131 -290.436136  176.5083
## A2-i133   76.478274 -234.9906
## A2-i132   22.482771 -205.3148
## A4-i191   39.140504 -145.5324
## A4-i192   53.155531  153.4945
## MS1       58.759379  209.0864
## MS2       90.451827 -175.5345
## MS3       29.394423  184.1395
## MS4       65.054325 -185.9405
## MS5     -281.100540  198.4464
## MS6      132.442210 -161.1439
## attr(,"class")
## [1] "matrix"      "immunr_tsne"

Public repertoire

In order to build a massive table with all clonotypes from the list of repertoires use the pubRep function.

## Warning in max(res[["Samples"]]): no non-missing arguments to max;
## returning -Inf
## Source: local data table [0 x 14]
## 
## # A tibble: 0 x 14
## # … with 14 variables: CDR3.nt <chr>, Samples <dbl>, `A2-i129` <dbl>,
## #   `A2-i131` <dbl>, `A2-i133` <dbl>, `A2-i132` <dbl>, `A4-i191` <dbl>,
## #   `A4-i192` <dbl>, MS1 <dbl>, MS2 <dbl>, MS3 <dbl>, MS4 <dbl>,
## #   MS5 <dbl>, MS6 <dbl>
## Source: local data table [86,185 x 15]
## 
## # A tibble: 86,185 x 15
##    CDR3.aa V.name Samples `A2-i129` `A2-i131` `A2-i133` `A2-i132` `A4-i191`
##    <chr>   <chr>    <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
##  1 CASSLE… TRBV5…       8         2        NA         3         1        NA
##  2 CASSDS… TRBV6…       6         1         1         2        NA         5
##  3 CASSDS… TRBV6…       6        NA        NA        NA         4         1
##  4 CASSFQ… TRBV5…       6         3        NA         1         1        NA
##  5 CASSLG… TRBV1…       6         2        NA        NA         4         3
##  6 CASSLQ… TRBV1…       6        NA         1         1         4        NA
##  7 CSARLA… TRBV2…       6         1         3        NA         2         1
##  8 CASSDS… TRBV6…       5        NA        NA        NA         3        NA
##  9 CASSFG… TRBV1…       5        NA         1        NA         5        NA
## 10 CASSLD… TRBV5…       5         1        NA        NA        NA        NA
## # … with 86,175 more rows, and 7 more variables: `A4-i192` <dbl>,
## #   MS1 <dbl>, MS2 <dbl>, MS3 <dbl>, MS4 <dbl>, MS5 <dbl>, MS6 <dbl>