From Rational Number Reconstruction to Set Reconciliation and File Synchronization
Amarilli, Antoine;
Ben Hamouda, Fabrice;
Bourse, Florian;
Morisset, Robin;
Naccache, David;
Rauzy, Pablo
This work revisits set reconciliation, the problem of synchronizing two multisets of fixed-size values while minimizing transmission complexity. We propose a new number-theoretic reconciliation protocol called Divide and Factor (D&F) that, like previously known alternative algorithms, achieves optimal asymptotic transmission complexity. We analyze the computational complexities of several D&F variants, study the problem of synchronizing sets of variable-size files using hash functions, and apply D&F to synchronize file hierarchies while taking file locations into account.
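The abstract does not spell out the protocol. The code below is a minimal sketch of the divide-then-reconstruct idea suggested by the title, under our own assumptions: each element is already mapped to a distinct prime (the hash-to-prime mapping is omitted), the inputs are sets rather than multisets, the modulus `N = 2**61 - 1` is chosen only for illustration, and both digests are computed in one process instead of being exchanged. The names `digest`, `rational_reconstruct` and `reconcile` are hypothetical. Dividing the two digests cancels the shared primes, leaving the fraction prod(A \ B) / prod(B \ A) modulo N. Rational number reconstruction via the extended Euclidean algorithm recovers that fraction, and trial division then factors it.

```python
from math import gcd, isqrt, prod

# Modulus chosen for illustration only: the Mersenne prime 2**61 - 1 (assumption).
N = 2**61 - 1


def rational_reconstruct(s, n):
    """Find (a, b) with a = s*b (mod n), |a| <= sqrt(n/2) and 0 < b <= sqrt(n/2).

    Runs the extended Euclidean algorithm on (n, s) and stops at the first
    remainder not exceeding the bound (Wang's rational reconstruction).
    """
    bound = isqrt(n // 2)
    r0, r1 = n, s % n
    t0, t1 = 0, 1  # invariant: r_i = t_i * s (mod n)
    while r1 > bound:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        t0, t1 = t1, t0 - q * t1
    a, b = r1, t1
    if b < 0:  # normalize so the denominator is positive
        a, b = -a, -b
    if b > bound or gcd(a, b) != 1 or (s * b - a) % n != 0:
        raise ValueError("difference too large to reconstruct with this modulus")
    return a, b


def digest(primes, n=N):
    """Hypothetical set digest: product of the element primes modulo n."""
    return prod(primes) % n


def reconcile(alice_primes, bob_primes, n=N):
    """Recover the symmetric difference of two sets of distinct primes.

    In a real exchange Alice would send only digest(alice_primes); here both
    sides run in one process for brevity.
    """
    c_a = digest(alice_primes, n)
    c_b = digest(bob_primes, n)
    # Shared primes cancel: s = prod(A \ B) / prod(B \ A) (mod n).
    s = c_a * pow(c_b, -1, n) % n
    a, b = rational_reconstruct(s, n)
    # Factor by trial division against each party's own primes.
    only_alice = sorted(p for p in alice_primes if a % p == 0)
    only_bob = sorted(p for p in bob_primes if b % p == 0)
    return only_alice, only_bob


print(reconcile([2, 3, 5, 7, 11, 13], [2, 3, 5, 17, 19]))  # ([7, 11, 13], [17, 19])
```

With this toy modulus, each product of differing primes must stay below roughly 2^30 for the reconstruction to succeed. A larger difference would require a larger modulus. The sketch only illustrates why a single modular digest can suffice when the difference is small. It does not describe how D&F or btrsync size their parameters.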
We describe btrsync, our open-source D&F implementation, and benchmark it against the popular rsync utility. It appears that btrsync transmits much less data than rsync, at the expense of a relatively modest computational overhead.