Computes the absolute distance between the midpoint of each `x` interval and the midpoint of the closest `y` interval on the same chromosome.
bed_absdist(x, y, genome)
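| Argument | Description |
|---|---|
| `x` | query intervals, a `tbl_interval()` |
| `y` | reference intervals, a `tbl_interval()` |
| `genome` | chromosome sizes, e.g. as returned by `read_genome()` |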
Returns a `tbl_interval()` with `.absdist` and `.absdist_scaled` columns.
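The midpoint-to-closest-midpoint computation described above can be sketched in base R. This is a minimal illustration under stated assumptions, not valr's implementation: `absdist_sketch()` is a hypothetical helper, intervals are plain data frames with `chrom`, `start` and `end` columns, and midpoints are taken as `(start + end) / 2`, which may differ from how valr rounds them.

```r
# absdist_sketch() is a hypothetical helper, not part of valr.
# Assumes x and y are data frames with chrom, start and end columns.
absdist_sketch <- function(x, y) {
  xc <- as.character(x$chrom)
  yc <- as.character(y$chrom)
  # Assumed midpoint definition; valr may round differently
  qmid <- (x$start + x$end) / 2
  rmid <- (y$start + y$end) / 2
  out <- rep(NA_real_, nrow(x))        # stays NA if no reference chromosome
  for (chr in unique(xc)) {
    r <- sort(rmid[yc == chr])
    if (length(r) == 0) next
    qi <- which(xc == chr)
    q <- qmid[qi]
    # k = number of reference midpoints <= q, so r[k] and r[k + 1] bracket q
    k <- findInterval(q, r)
    left  <- ifelse(k >= 1, q - r[pmax(k, 1)], Inf)
    right <- ifelse(k < length(r), r[pmin(k + 1, length(r))] - q, Inf)
    out[qi] <- pmin(left, right)       # distance to the closer neighbour
  }
  x$.absdist <- out
  x
}

x <- data.frame(chrom = c("chr1", "chr1", "chr2"),
                start = c(100, 500, 10), end = c(200, 600, 20))
y <- data.frame(chrom = c("chr1", "chr1"),
                start = c(0, 1000), end = c(100, 1100))
absdist_sketch(x, y)$.absdist
#> [1] 100 500  NA
```

Sorting the reference midpoints once per chromosome and looking each query up with `findInterval()` avoids comparing every query against every reference; only the two bracketing neighbours need to be checked.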
Absolute distances are scaled by the expected inter-reference gap on each chromosome, that is, the chromosome length divided by the number of reference points. For $Q$ query points and $R$ reference points on a chromosome of length $L$, the distance from each query point $q_i$ ($i = 1, \dots, Q$) to the closest reference point $r_k$ is scaled as follows:

$$d_i(x,y) = \min_k \lvert q_i - r_k \rvert \, \frac{R}{L}$$

If an `x` interval has no matching `y` chromosome, `.absdist` is `NA`.
Both absolute and scaled distances are reported, as `.absdist` and `.absdist_scaled` respectively.
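The scaling step can be sketched the same way. Multiplying by $R/L$ is the same as dividing by $L/R$, the average spacing between reference points if they were spread evenly along the chromosome, so a scaled value near 1 means the query sits about one average gap from its nearest reference. `scale_absdist_sketch()` is a hypothetical helper; it assumes a genome table with `chrom` and `size` columns and counts $R$ as the number of reference intervals on each chromosome.

```r
# scale_absdist_sketch() is a hypothetical helper, not part of valr.
# absdist:   absolute distances, one per query interval
# chrom:     chromosome of each query interval
# ref_chrom: chromosome of each reference interval
# genome:    data frame with chrom and size (chromosome length) columns
scale_absdist_sketch <- function(absdist, chrom, ref_chrom, genome) {
  # R: number of reference intervals per chromosome (NA if none)
  counts <- table(ref_chrom)
  n_ref <- setNames(as.numeric(counts), names(counts))
  R <- unname(n_ref[chrom])
  # L: chromosome length looked up from the genome table
  L <- genome$size[match(chrom, genome$chrom)]
  # d_i * R / L, i.e. distance divided by the mean inter-reference gap L / R
  absdist * R / L
}

genome <- data.frame(chrom = c("chr1", "chr2"), size = c(2000, 500))
scale_absdist_sketch(absdist   = c(100, 500, NA),
                     chrom     = c("chr1", "chr1", "chr2"),
                     ref_chrom = c("chr1", "chr1"),
                     genome    = genome)
#> [1] 0.1 0.5  NA
```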
Interval statistics can be used in combination with `dplyr::group_by()` and `dplyr::do()` to calculate statistics for subsets of data. See `vignette('interval-stats')` for examples.
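The per-subset idea can be illustrated without dplyr. The sketch below is a base-R analogue that uses `tapply()` in place of `dplyr::group_by()` to average `.absdist_scaled` within each chromosome; the `res` table holds made-up values standing in for `bed_absdist()` output, and grouping by chromosome is just one possible choice of subset.

```r
# res stands in for bed_absdist() output; the values are made up.
res <- data.frame(chrom = c("chr1", "chr1", "chr2", "chr2"),
                  .absdist_scaled = c(0.1, 0.3, 0.5, NA))
# Split by chromosome, apply mean() to each subset, combine the results.
per_chrom <- tapply(res$.absdist_scaled, res$chrom, mean, na.rm = TRUE)
per_chrom
```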
Reference: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002529
Other interval statistics: `bed_fisher`, `bed_jaccard`, `bed_projection`, `bed_reldist`.
```r
library(valr)

genome <- read_genome(valr_example('hg19.chrom.sizes.gz'))
x <- bed_random(genome, seed = 1010486)
y <- bed_random(genome, seed = 9203911)
bed_absdist(x, y, genome)
#> # A tibble: 1,000,000 x 5
#>    chrom start   end .absdist .absdist_scaled
#>    <chr> <int> <int>    <dbl>           <dbl>
#>  1 chr1    323  1323      302          0.0977
#>  2 chr1   2032  3032     2011          0.651
#>  3 chr1   2475  3475     2454          0.794
#>  4 chr1   2759  3759     2226          0.720
#>  5 chr1   2766  3766     2219          0.718
#>  6 chr1   3528  4528     1457          0.471
#>  7 chr1   8394  9394      207          0.0670
#>  8 chr1   8819  9819      218          0.0705
#>  9 chr1  12963 13963      788          0.255
#> 10 chr1  24939 25939      270          0.0873
#> # ... with 999,990 more rows
```