
Published December 15, 2020 | Version 1.5.2
Software Open

ekzhu/datasketch: Improved performance for MinHash and MinHashLSH

  • 1. @athenianco
  • 2. Six Five Design
  • 3. University of Illinois, Urbana-Champaign
  • 4. Adobe
  • 5. @blindspot-ai
  • 6. Klaviyo

Description

  • Performance improvement for MinHash's update method.
  • Make MinHash updates 4.5X faster by using the update_batch method for bulk updates on a MinHash. See the API doc (http://ekzhu.com/datasketch/documentation.html#datasketch.MinHash.update_batch).
  • Further performance gains from generating MinHash objects in bulk using MinHash.bulk or MinHash.generator. See API doc and pull request.
  • Optional compression for the MinHash LSH index by hashing the bucket key produced by MinHashLSH._H. See pull request. This reduces the memory/storage space used by the index.
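
To illustrate the bucket-key compression idea from the last item, here is a minimal standard-library sketch. It is not datasketch's implementation: ToyLSHIndex, band_key, and compress_key are hypothetical names, and the choice of BLAKE2b with an 8-byte digest is an assumption made only for the example. It shows that hashing each raw bucket key to a short digest shrinks the stored keys while lookups keep working.

```python
import hashlib
import struct
from collections import defaultdict


def band_key(band):
    """Serialize one band of 64-bit hash values into a raw bucket key."""
    return struct.pack(f">{len(band)}Q", *band)


def compress_key(raw_key, digest_size=8):
    """Hypothetical compressor: hash the raw key down to digest_size bytes."""
    return hashlib.blake2b(raw_key, digest_size=digest_size).digest()


class ToyLSHIndex:
    """Toy banded LSH index (illustration only, not the datasketch class)."""

    def __init__(self, num_bands, rows_per_band, key_hash=None):
        self.num_bands = num_bands
        self.rows = rows_per_band
        self.key_hash = key_hash  # optional function applied to each bucket key
        self.tables = [defaultdict(set) for _ in range(num_bands)]

    def _keys(self, signature):
        # One bucket key per band, optionally compressed by key_hash.
        for i in range(self.num_bands):
            raw = band_key(signature[i * self.rows:(i + 1) * self.rows])
            yield self.key_hash(raw) if self.key_hash else raw

    def insert(self, name, signature):
        for table, key in zip(self.tables, self._keys(signature)):
            table[key].add(name)

    def query(self, signature):
        # Return every name that shares at least one bucket with the query.
        result = set()
        for table, key in zip(self.tables, self._keys(signature)):
            result |= table.get(key, set())
        return result

    def key_bytes(self):
        # Total number of bytes spent on bucket keys across all tables.
        return sum(len(k) for t in self.tables for k in t)


# Signatures of 32 hash values, split into 4 bands of 8 rows each.
sig_a = list(range(32))
sig_b = list(range(8)) + list(range(100, 124))  # shares only the first band

plain = ToyLSHIndex(4, 8)
small = ToyLSHIndex(4, 8, key_hash=compress_key)
for index in (plain, small):
    index.insert("a", sig_a)

print(plain.key_bytes(), small.key_bytes())
print(plain.query(sig_b), small.query(sig_b))
```

In this sketch each raw key occupies 64 bytes (8 values of 8 bytes each) and each compressed key 8 bytes. The trade-off is that distinct raw keys can collide after hashing, which may add a small number of false-positive candidates.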

Thank you @Sinusoidal36!

Files

ekzhu/datasketch-1.5.2.zip

md5:a3d3bce4aa309dab4bcd0ed08870cbc8
795.7 kB

Additional details

Related works