Repertoire overlap

Repertoire overlap is the most common approach to measure repertoire similarity. It is achieved by computation of specific statistics on clonotypes shared between given repertoires, also called “public” clonotypes. immunarch provides several indices: - number of public clonotypes (.method = "public") - a classic measure of overlap similarity.

The function that includes described methods is repOverlap. Again the output is easily visualised when passed to vis() function that does all the work:

imm_ov1 = repOverlap(immdata$data, .method = "public", .verbose = F)
imm_ov2 = repOverlap(immdata$data, .method = "morisita", .verbose = F)

grid.arrange(vis(imm_ov1), vis(imm_ov2, .text.size=2), ncol = 2)

vis(imm_ov1, "heatmap2")

You can easily change the number of significant digits:

grid.arrange(vis(imm_ov2, .text.size=2.5, .signif.digits=1), vis(imm_ov2, .text.size=2, .signif.digits=2), ncol = 2)

To analyse the computed overlap measures function apply repOverlapAnalysis.

# Apply different analysis algorithms to the matrix of public clonotypes:
# "mds" - Multi-dimensional Scaling
repOverlapAnalysis(imm_ov1, "mds")
## Standard deviations (1, .., p=4):
## [1] 0 0 0 0
## 
## Rotation (n x k) = (12 x 2):
##                [,1]       [,2]
## A2-i129 -20.2308709  22.431389
## A2-i131   8.3055445 -26.779321
## A2-i133  45.9341813  -6.893304
## A2-i132 -55.0903957 -18.572513
## A4-i191  23.7461189  -1.118162
## A4-i192  -4.4041243  38.028858
## MS1     -19.5494165 -12.836320
## MS2      -1.9063188  -6.075283
## MS3      -9.8321059  11.217724
## MS4       0.9127103   1.154627
## MS5       5.6552254 -27.415676
## MS6      26.4594518  26.857981
# "tsne" - t-Stochastic Neighbor Embedding
repOverlapAnalysis(imm_ov1, "tsne")
##                DimI      DimII
## A2-i129 -309.308638  251.22417
## A2-i131  551.123363 -101.15299
## A2-i133   62.186836 -100.85813
## A2-i132  -59.468871 -175.06698
## A4-i191  -26.853840  -92.15216
## A4-i192 -330.025043  335.14260
## MS1     -235.802997  299.91630
## MS2       27.022606 -182.46162
## MS3     -295.471354  285.13559
## MS4        7.473276 -148.13049
## MS5      555.850331 -135.31223
## MS6       53.274330 -236.28406
## attr(,"class")
## [1] "matrix"      "immunr_tsne"
# Visualise the results
vis(repOverlapAnalysis(imm_ov1, "mds"))

# Apply different analysis algorithms to the matrix of public clonotypes:
# "mds" - Multi-dimensional Scaling
repOverlapAnalysis(imm_ov1, "mds")
## Standard deviations (1, .., p=4):
## [1] 0 0 0 0
## 
## Rotation (n x k) = (12 x 2):
##                [,1]       [,2]
## A2-i129 -20.2308709  22.431389
## A2-i131   8.3055445 -26.779321
## A2-i133  45.9341813  -6.893304
## A2-i132 -55.0903957 -18.572513
## A4-i191  23.7461189  -1.118162
## A4-i192  -4.4041243  38.028858
## MS1     -19.5494165 -12.836320
## MS2      -1.9063188  -6.075283
## MS3      -9.8321059  11.217724
## MS4       0.9127103   1.154627
## MS5       5.6552254 -27.415676
## MS6      26.4594518  26.857981
# "tsne" - t-Stochastic Neighbor Embedding
repOverlapAnalysis(imm_ov1, "tsne")
##               DimI       DimII
## A2-i129 -146.02757 -114.517600
## A2-i131   50.42299  264.605027
## A2-i133   90.16784  -12.674060
## A2-i132   86.90945  -26.117080
## A4-i191   73.07017  -25.917212
## A4-i192 -135.88332 -105.975243
## MS1     -150.84908 -102.155023
## MS2       76.93567  -13.249442
## MS3     -144.63938 -109.063342
## MS4       80.64141  -18.029681
## MS5       47.83455  269.211192
## MS6       71.41727   -6.117536
## attr(,"class")
## [1] "matrix"      "immunr_tsne"
# Visualise the results
vis(repOverlapAnalysis(imm_ov1, "mds"))

# Clusterise the MDS resulting components using K-means
vis(repOverlapAnalysis(imm_ov1, "mds+kmeans"))

Public repertoire

In order to build a massive table with all clonotypes from the list of repertoires use the pubRep function.

# Pass "nt" as the second parameter to build the public repertoire table using CDR3 nucleotide sequences
pr.nt = pubRep(immdata$data, "nt", .verbose = F)
## Warning in max(res[["Samples"]]): no non-missing arguments to max;
## returning -Inf
pr.nt
## Source: local data table [0 x 14]
## 
## # A tibble: 0 x 14
## # … with 14 variables: CDR3.nt <chr>, Samples <dbl>, `A2-i129` <dbl>,
## #   `A2-i131` <dbl>, `A2-i133` <dbl>, `A2-i132` <dbl>, `A4-i191` <dbl>,
## #   `A4-i192` <dbl>, MS1 <dbl>, MS2 <dbl>, MS3 <dbl>, MS4 <dbl>,
## #   MS5 <dbl>, MS6 <dbl>
# Pass "aa+v" as the second parameter to build the public repertoire table using CDR3 aminoacid sequences and V alleles
# In order to use only CDR3 aminoacid sequences, just pass "aa"
pr.aav = pubRep(immdata$data, "aa+v", .verbose = F)
pr.aav
## Source: local data table [86,185 x 15]
## 
## # A tibble: 86,185 x 15
##    CDR3.aa V.name Samples `A2-i129` `A2-i131` `A2-i133` `A2-i132` `A4-i191`
##    <chr>   <chr>    <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
##  1 CASSLE… TRBV5…       8         2        NA         3         1        NA
##  2 CASSDS… TRBV6…       6         1         1         2        NA         5
##  3 CASSDS… TRBV6…       6        NA        NA        NA         4         1
##  4 CASSFQ… TRBV5…       6         3        NA         1         1        NA
##  5 CASSLG… TRBV1…       6         2        NA        NA         4         3
##  6 CASSLQ… TRBV1…       6        NA         1         1         4        NA
##  7 CSARLA… TRBV2…       6         1         3        NA         2         1
##  8 CASSDS… TRBV6…       5        NA        NA        NA         3        NA
##  9 CASSFG… TRBV1…       5        NA         1        NA         5        NA
## 10 CASSLD… TRBV5…       5         1        NA        NA        NA        NA
## # … with 86,175 more rows, and 7 more variables: `A4-i192` <dbl>,
## #   MS1 <dbl>, MS2 <dbl>, MS3 <dbl>, MS4 <dbl>, MS5 <dbl>, MS6 <dbl>
# You can also pass the ".coding" parameter to filter out all noncoding sequences first:
pr.aav.cod = pubRep(immdata$data, "aa+v", .coding=T)
# Create a public repertoire with coding-only sequences using both CDR3 amino acid sequences and V genes
pr = pubRep(immdata$data, "aa+v", .coding = T, .verbose = F)

# Apply the filter subroutine to leave clonotypes presented only in healthy individuals
pr1 = pubRepFilter(pr, immdata$meta, c(Status = "C"))

# Apply the filter subroutine to leave clonotypes presented only in diseased individuals
pr2 = pubRepFilter(pr, immdata$meta, c(Status = "MS"))

# Divide one by another
pr3 = pubRepApply(pr1, pr2)

# Plot it
p = ggplot() + geom_jitter(aes(x = "Treatment", y = Result), data=pr3)
p