Published February 22, 2024 | Version v3.1
Software
Open
LLNL/kosh: Release 3.1
Description
3.1 Release
This is a minor release with a few bug fixes and new features. We encourage users to upgrade.
New in this release
- When searching for datasets from the store you can request to return them as Kosh Datasets (the default, no change in behaviour), Sina records, or Python dictionaries, which enables faster returns in loops: `store.find([...], load_type='dictionary')`.
- Kosh stores have an `alias_feature` attribute, which is used to allow users to extract features via an aliased name.
- New Auto Epsilon Algorithm for Clustering: the algorithm will find the right epsilon value to use in clustering. The user can specify the amount of allowed information loss due to removing samples from the dataset.
- Requires sina >=1.14
- When opening a mariadb backend, in order to avoid sync errors between ranks you should use: `store = kosh.connect(mariadb, execution_options={"isolation_level": "READ COMMITTED"})`
- `mv` and `cp` from the command line now have `--merge_strategy` and `mk_dirs` options.
- `cp`, `mv` and `tar` are now accessible from Python at the store level: `store.cp()`, `store.mv()` and `store.tar()`.
- There is a README for the Kosh test suite, including a dedicated one for LC users.
- Sina's new ingest capabilities are available in Kosh via dataset, with a decorator to allow the use of functions operating on Sina records.
- Documentation switched to mkdocs.
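The Auto Epsilon item above can be pictured with a small, self-contained sketch. This is not Kosh's implementation: the function names (`thin_samples`, `removed_fraction`, `auto_eps`) are hypothetical, the data is one-dimensional, and the fraction of removed samples is used as a stand-in for "information loss", which is an assumption. It only illustrates the idea of searching for the largest epsilon that stays within a user-supplied loss budget.

```python
def thin_samples(samples, eps):
    """Keep a sample only if it is more than eps away from the last kept one."""
    kept = []
    for value in sorted(samples):
        if not kept or value - kept[-1] > eps:
            kept.append(value)
    return kept


def removed_fraction(samples, eps):
    """Stand-in loss measure (assumption): fraction of samples dropped at this eps."""
    return 1.0 - len(thin_samples(samples, eps)) / len(samples)


def auto_eps(samples, allowed_loss, iterations=40):
    """Bisect for the largest eps whose loss stays within allowed_loss."""
    low, high = 0.0, max(samples) - min(samples)
    for _ in range(iterations):
        mid = (low + high) / 2.0
        if removed_fraction(samples, mid) <= allowed_loss:
            low = mid   # still within budget, try a larger eps
        else:
            high = mid  # too many samples removed, shrink eps
    return low


samples = [0.0, 0.1, 0.2, 1.0, 1.05, 2.0, 3.0, 3.01]
eps = auto_eps(samples, allowed_loss=0.25)
reduced = thin_samples(samples, eps)
print(eps, reduced)
```

Bisection works here because, for sorted one-dimensional data, the number of kept samples never increases as epsilon grows. A real clustering backend and loss measure may not have that property.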
Improvements
- Some internal cleanups (internal kosh attributes are being moved to their own section under the `user_defined` section of the sina record).
- Clustering now has a verbose option.
- When using MPI the clustering can be gathered to your preferred rank (rather than 0) with `gather_to`.
- Batch clustering has a more lenient convergence option resulting in faster clustering sampling.
- A warning is now issued when a loader cannot be loaded into the store.
- Using bash rather than sh for the sbang
- `latin1` encoding of loaders seems to create issues with mariadb, switching to `windows-1252`.
- Test suite gets mariadb from env variable.
- Issue a warning if trying to set an ensemble attribute from a dataset and it matches the existing value. It still produces an error if the values differ.
- KoshCluster is more consistent in what it returns. It will always return a list now even if None is returned.
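The ensemble-attribute rule in the list above (warn when the value matches, error when it differs) can be sketched in plain Python. This is an illustration only, not Kosh code: `set_dataset_attribute` and the two dictionaries are hypothetical stand-ins, and the choice of `ValueError` is an assumption, since the notes do not say which exception Kosh raises.

```python
import warnings


def set_dataset_attribute(dataset_attrs, ensemble_attrs, name, value):
    """Hypothetical helper mimicking the rule described in the release notes."""
    if name in ensemble_attrs:
        if ensemble_attrs[name] == value:
            # Same value as the ensemble already holds: warn and do nothing.
            warnings.warn(f"'{name}' is an ensemble attribute and already has this value")
            return
        # Different value: refuse, the ensemble owns this attribute.
        raise ValueError(f"'{name}' is an ensemble attribute and cannot be changed from a dataset")
    dataset_attrs[name] = value


ensemble_attrs = {"project": "demo"}
dataset_attrs = {}

# A plain dataset attribute is simply stored.
set_dataset_attribute(dataset_attrs, ensemble_attrs, "resolution", 64)

# Matching the existing ensemble value only produces a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    set_dataset_attribute(dataset_attrs, ensemble_attrs, "project", "demo")

# A conflicting value is rejected.
try:
    set_dataset_attribute(dataset_attrs, ensemble_attrs, "project", "other")
    conflict_rejected = False
except ValueError:
    conflict_rejected = True
```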
Bug fixes
- Kosh parallel clustering used to hang when sample size was too small.
- Kosh parallel clustering returned indices as a 1D array rather than a flat array.
- On BlueOS `update_json_file_with_records_and_relationships` used to fail.
- Reassociating a file linked to many datasets used to fail for other datasets if the reassociation was done at the dataset level.
- `use_lock_file` caused hanging while using mariadb.
- `mv` command now works with nested dirs.
- `mv` and `cp` now preserve ensemble membership.
- KoshClustering `operate` uses inputs shape rather than original datasets sizes.
Files
| Name | Size | Checksum |
|---|---|---|
| LLNL/kosh-v3.1.zip | 1.5 MB | md5:99a80c46c4ee7ca3edd1c2ecebb35ca3 |
Additional details
Related works
- Is supplement to
- Software: https://github.com/LLNL/kosh/tree/v3.1 (URL)