
Published April 27, 2024 | Version v0.0.17
Software | Open access

Tensor Extraction of Latent Features (T-ELF)

Description

New Features

  • Introduces VocabularyConsolidator, a new Vulture subclass under TELF.pre_processing.Vulture.tokens_analysis for consolidating vocabularies and textual terms.

  • Refactors NMFk, RESCALk, HNMFk, and SymNMFk for modularity. Shared helper functions now live under TELF.factorization.utilities.

  • Adds a new search criterion for identifying the optimal rank, or K, to NMFk, HNMFk, WNMFk, and RNMFk. Rather than factorizing at every candidate K, the new criterion uses a Binary Search Tree to prune ranks from the search space as results come in. This reduces the number of factorizations needed and can substantially speed up each algorithm. The K search is also compatible with High Performance Computing (HPC) systems: when any node changes the K search space, the change is synchronized across all nodes. NMFk includes new hyper-parameters for these search settings.

    • k_search_method='linear' visits each K given in the Ks argument of fit() in order.
    • k_search_method='bst_post' performs a post-order binary search. When a rank with min(W silhouette, H silhouette) >= sill_thresh is found, all lower ranks are pruned from the search space.
    • k_search_method='bst_pre' performs a pre-order binary search. When a rank with min(W silhouette, H silhouette) >= sill_thresh is found, all lower ranks are pruned from the search space.

    k_search_method : str, optional
        Which approach to use when searching for the rank, or K: 'linear', 'bst_post', or 'bst_pre'. The default is "linear".

    H_sill_thresh : float, optional
        Threshold on the H silhouette for removing higher ranks from the search space. The default is -1. When searching for the optimal rank with binary search (k_search_method='bst_post' or k_search_method='bst_pre'), if a rank K yields an H silhouette below H_sill_thresh, all ranks higher than K are removed from the search space. If H_sill_thresh=-1, this cut-off is not used.
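The pruning rules above can be illustrated with a small serial sketch. It assumes that the sorted candidate Ks are arranged as a balanced binary search tree, with the middle K as the root, and that 'bst_pre' and 'bst_post' visit that tree in pre-order and post-order respectively. The functions `balanced_bst_order` and `k_search` and the `score` callback are hypothetical stand-ins for running a factorization at rank K. This is not T-ELF's implementation, and it omits the HPC synchronization between nodes.

```python
from typing import Callable, Dict, List, Tuple


def balanced_bst_order(ks: List[int], order: str) -> List[int]:
    """Visiting order of sorted ks arranged as a balanced binary search tree.

    The middle element is the root; the lower and upper halves form the left
    and right subtrees. order='pre' visits root, left, right; order='post'
    visits left, right, root.
    """
    if not ks:
        return []
    mid = len(ks) // 2
    left = balanced_bst_order(ks[:mid], order)
    right = balanced_bst_order(ks[mid + 1:], order)
    if order == "pre":
        return [ks[mid]] + left + right
    if order == "post":
        return left + right + [ks[mid]]
    raise ValueError("order must be 'pre' or 'post'")


def k_search(
    ks: List[int],
    score: Callable[[int], Tuple[float, float]],
    sill_thresh: float,
    k_search_method: str = "linear",
    H_sill_thresh: float = -1,
) -> Dict[int, Tuple[float, float]]:
    """Return {K: (W silhouette, H silhouette)} for every K actually factorized.

    score is a hypothetical stand-in for factorizing at rank K and returning
    the minimum W and H silhouettes.
    """
    ks = sorted(ks)
    if k_search_method == "linear":
        queue = list(ks)
    elif k_search_method in ("bst_pre", "bst_post"):
        queue = balanced_bst_order(ks, k_search_method.split("_")[1])
    else:
        raise ValueError(f"unknown k_search_method: {k_search_method!r}")

    visited: Dict[int, Tuple[float, float]] = {}
    while queue:
        k = queue.pop(0)
        sil_w, sil_h = score(k)
        visited[k] = (sil_w, sil_h)
        if k_search_method == "linear":
            continue  # linear search never prunes
        if min(sil_w, sil_h) >= sill_thresh:
            # Rank k is acceptable: drop every lower rank still queued.
            queue = [q for q in queue if q > k]
        if H_sill_thresh != -1 and sil_h < H_sill_thresh:
            # H silhouette too low: drop every higher rank still queued.
            queue = [q for q in queue if q < k]
    return visited
```

Consider Ks 1 through 7 and a toy `score` that returns (0.9, 0.9) for K <= 5 and (0.3, 0.3) otherwise. With sill_thresh=0.8, 'bst_pre' factorizes only K = 4, 6, 5, 7 instead of all seven. Adding H_sill_thresh=0.5 cuts this to K = 4, 6, 5.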

Bugs

  • Fixes a bug in RESCALk plotting where the plotting function expected both W and H silhouettes.
  • Fixes a bug where K prediction failed if none of the W or H silhouettes were above the sill_thresh hyper-parameter. In that case, sill_thresh is now reset using the rule self.sill_thresh = min([max(sils_min_W), max(sils_min_H)]).
  • Fixes a bug in Vulture document substitutions where an error was raised if no corpus substitutions were passed.
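The sill_thresh fallback rule can be illustrated with a short example. This sketch assumes that the predicted K is the largest K whose W and H silhouettes both reach the threshold. That criterion is an assumption, not taken from the release notes. `predict_k` is a hypothetical helper; only the names sils_min_W and sils_min_H come from the rule above.

```python
from typing import List, Optional, Sequence, Tuple


def predict_k(
    ks: Sequence[int],
    sils_min_W: Sequence[float],
    sils_min_H: Sequence[float],
    sill_thresh: float,
) -> Tuple[Optional[int], float]:
    """Return (predicted K, threshold actually used).

    Assumed criterion: the largest K whose W and H silhouettes both reach
    the threshold.
    """
    def candidates(thresh: float) -> List[int]:
        return [k for k, w, h in zip(ks, sils_min_W, sils_min_H)
                if w >= thresh and h >= thresh]

    if not candidates(sill_thresh):
        # Fallback rule: lower the threshold to the smaller of the two maxima.
        sill_thresh = min([max(sils_min_W), max(sils_min_H)])
    found = candidates(sill_thresh)
    # Defensive in this sketch: a K meeting both thresholds may still not exist.
    return (max(found) if found else None), sill_thresh
```

Suppose K = 2, 3, 4 have W silhouettes [0.75, 0.70, 0.50] and H silhouettes [0.78, 0.65, 0.40], and sill_thresh=0.8. No K qualifies, so the threshold falls back to min(0.75, 0.78) = 0.75, and K = 2 is predicted.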

Notes

If you use this software, please cite it as below.

Files

lanl/T-ELF-v0.0.17.zip (23.8 MB)
md5:6229fb994358719003aa24ee1c795e57

Additional details

Related works

Is supplement to
Software: https://github.com/lanl/T-ELF/tree/v0.0.17