Ben is looking at parallelizing the genotype log-likelihood calculation using RcppParallel, for a version of rubias that updates the allele frequencies and hence gets bogged down recomputing those logls.
I looked over the RcppParallel documentation, and it looks like there is a lot of C++ templating to get right. I was wondering whether it might be possible to do the parallelization within R using mclapply.
It actually won’t be possible here, because Ben is doing all of this internally to the MCMC function in Rcpp. But I am still curious. The idea is to just chop a matrix up and then operate on each part in parallel.
I will demo it simply with log(sqrt()).
Here is our matrix:
library(tidyverse)
Registered S3 method overwritten by 'dplyr':
  method           from
  print.rowwise_df
Registered S3 methods overwritten by 'ggplot2':
  method         from
  [.quosures     rlang
  c.quosures     rlang
  print.quosures rlang
── Attaching packages ──────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.1.1     ✔ purrr   0.3.2
✔ tibble  2.1.1     ✔ dplyr   0.8.1
✔ tidyr   0.8.3     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.4.0
── Conflicts ─────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(parallel)
mat <- abs(rnorm(1e8)) %>%
matrix(nrow = 10000)
And here is our calculation:
system.time(ser <- log(sqrt(mat)))
   user  system elapsed
  1.527   0.251   1.779
So, it takes about two seconds. Now, the question is: could we parallelize that to make it any faster?
We will break it up into segments:
# chop the 10,000 rows into 8 contiguous row ranges
starts <- seq(1, 10000, by = floor(10000 / 8))
ends <- c(starts[-1] - 1, 10000)
se <- cbind(starts, ends)
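As an aside, a more compact way to build these chunks (just a sketch of the same idea) is `parallel::splitIndices()`, which partitions an index range into a list of roughly equal-sized index vectors:

```r
library(parallel)

# splitIndices(n, k) partitions 1:n into k roughly equal index vectors,
# so we don't have to compute the starts/ends by hand.
chunks <- splitIndices(10000, 8)
length(chunks)  # 8 chunks, together covering 1:10000
```

Each element of `chunks` could then be used directly as a row index into `mat`.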
system.time({
  par <- mclapply(1:8, function(x) {
    log(sqrt(mat[se[x, 1]:se[x, 2], ]))
  }) %>%
    do.call(rbind, .)
})
   user  system elapsed
  8.800   2.615   6.121
Check to see that we get the same thing:
all.equal(ser, par)
[1] TRUE
All of which shows that the parallel version is actually slower: there is a lot of overhead in breaking the problem up into parallelizable chunks (subsetting copies each submatrix, the results get serialized back from the workers, and then everything is rbind-ed together). OK.
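For what it’s worth, `mclapply()` defaults to `getOption("mc.cores", 2L)` workers, so the run above likely used only two forks unless `mc.cores` was set globally. For a purely elementwise function like this one, `parallel::pvec()` is another option: it splits a vector across forked workers and concatenates the results. A minimal sketch (forking is unavailable on Windows, so this is Unix-alike only; smaller matrix here just for illustration):

```r
library(parallel)

mat_small <- matrix(abs(rnorm(1e6)), nrow = 1000)

# pvec() chops the vector into mc.cores pieces, applies the function to
# each piece in a forked child, and concatenates the results in order.
par2 <- matrix(
  pvec(as.vector(mat_small), function(v) log(sqrt(v)), mc.cores = 4),
  nrow = nrow(mat_small)
)

all.equal(log(sqrt(mat_small)), par2)
```

Whether this beats the serial version still depends on the same copy-and-serialize overhead, of course.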