Quantifies the extent of overlap between two sets of intervals in terms of base-pairs. Groups that are shared between inputs are used to calculate the statistic for subsets of data.
bed_jaccard(x, y)
x | |
---|---|
y | |
tibble with the following columns:
len_i
length of the intersection in base-pairs
len_u
length of the union in base-pairs
jaccard
value of jaccard statistic
n_int
number of intersecting intervals between x and y
If inputs are grouped, the return value will contain one set of values per group.
The Jaccard statistic takes values in the range [0,1] and is calculated as:
$$ J(x,y) = \frac{\mid x \cap y \mid}{\mid x \cup y \mid} = \frac{\mid x \cap y \mid}{\mid x \mid + \mid y \mid - \mid x \cap y \mid} $$
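As a worked illustration of the formula, consider a hypothetical single-chromosome case (these intervals are assumptions for illustration, not taken from the package examples) in which x contains the half-open intervals [0,100) and [200,300), and y contains [50,250). Then x and y each cover 200 base-pairs, and they overlap on [50,100) and [200,250), for 100 base-pairs in total:

$$ J(x,y) = \frac{\mid x \cap y \mid}{\mid x \mid + \mid y \mid - \mid x \cap y \mid} = \frac{100}{200 + 200 - 100} = \frac{100}{300} \approx 0.333 $$

Identical interval sets give a value of 1, and sets with no shared base-pairs give 0.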
Interval statistics can be used in combination with dplyr::group_by() and dplyr::do() to calculate statistics for subsets of data. See vignette('interval-stats') for examples.
http://bedtools.readthedocs.org/en/latest/content/tools/jaccard.html
Other interval statistics: bed_absdist, bed_fisher, bed_projection, bed_reldist
genome <- read_genome(valr_example('hg19.chrom.sizes.gz'))

x <- bed_random(genome, seed = 1010486)
y <- bed_random(genome, seed = 9203911)

bed_jaccard(x, y)
#> # A tibble: 1 x 4
#>       len_i      len_u jaccard      n
#>       <dbl>      <dbl>   <dbl>  <dbl>
#> 1 236059495 1709233073   0.160 399478

# calculate jaccard per chromosome
bed_jaccard(dplyr::group_by(x, chrom), dplyr::group_by(y, chrom))
#> # A tibble: 25 x 5
#>    chrom    len_i     len_u jaccard     n
#>    <chr>    <dbl>     <dbl>   <dbl> <dbl>
#>  1 chr1  19013821 137669498   0.160 32087
#>  2 chr10 10471352  75102550   0.162 17752
#>  3 chr11 10221596  74407682   0.159 17362
#>  4 chr12 10132944  73478730   0.160 17078
#>  5 chr13  8912455  64140662   0.161 15149
#>  6 chr14  8232867  59341464   0.161 13902
#>  7 chr15  7765611  56428414   0.160 13143
#>  8 chr16  6833852  49562929   0.160 11634
#>  9 chr17  6210996  44901313   0.161 10542
#> 10 chr18  5979095  43289534   0.160 10113
#> # ... with 15 more rows