Tensor Extraction of Latent Features (T-ELF)
Description
New Features
- Introduces a new Vulture subclass, `VocabularyConsolidator`, under `TELF.pre_processing.Vulture.tokens_analysis`, designed to consolidate vocabularies and textual terms.
- Refactors NMFk, RESCALk, HNMFk, and SymNMFk to enhance modularity. Helper functions are created under `TELF.factorization.utilities` to modularize the code.
- Adds a new search criterion for identifying the optimal rank, or K, to NMFk, HNMFk, WNMFk, and RNMFk. The new criterion uses a binary search tree over the candidate ranks, which reduces the search space and therefore the time needed for factorization, speeding up each algorithm. This K search feature is also compatible with High Performance Computing (HPC) systems: changes made to the K search space by any node are synchronized across all nodes. NMFk has been updated to include new hyper-parameters tailored to these search settings:
  - `k_search_method='linear'` will linearly visit each K given in the `Ks` hyper-parameter of the `fit()` function.
  - `k_search_method='bst_post'` will perform post-order binary search. When an ideal rank is found with `min(W silhouette, H silhouette) >= sill_thresh`, all lower ranks are pruned from the search space.
  - `k_search_method='bst_pre'` will perform pre-order binary search. When an ideal rank is found with `min(W silhouette, H silhouette) >= sill_thresh`, all lower ranks are pruned from the search space.
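To illustrate how the three `k_search_method` settings differ in visiting order, the sketch below builds a balanced binary search tree over a sorted list of candidate ranks and lists the ranks in pre-order and post-order. It is a toy under stated assumptions: `build_bst`, `pre_order`, and `post_order` are hypothetical helper names rather than T-ELF functions, T-ELF's own tree construction may differ, and no pruning is applied here.

```python
def build_bst(ks):
    """Balanced BST over sorted ks, as nested (value, left, right) tuples.

    Hypothetical helper for illustration; not part of T-ELF.
    """
    if not ks:
        return None
    mid = len(ks) // 2
    return (ks[mid], build_bst(ks[:mid]), build_bst(ks[mid + 1:]))


def pre_order(node):
    """Visit the node first, then its left and right subtrees."""
    if node is None:
        return []
    value, left, right = node
    return [value] + pre_order(left) + pre_order(right)


def post_order(node):
    """Visit the left and right subtrees first, then the node."""
    if node is None:
        return []
    value, left, right = node
    return post_order(left) + post_order(right) + [value]


ks = list(range(1, 8))          # candidate ranks 1..7
tree = build_bst(ks)

linear_visits = ks              # 'linear' visits every K in the order given
pre_visits = pre_order(tree)    # [4, 2, 1, 3, 6, 5, 7]
post_visits = post_order(tree)  # [1, 3, 2, 5, 7, 6, 4]

print(linear_visits, pre_visits, post_visits)
```

In this toy, pre-order starts at the middle rank while post-order starts at the smallest one, so the two orders can reach a qualifying rank, and begin pruning, at different points in the search.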
The new NMFk hyper-parameters are documented as follows:
- `k_search_method` : str, optional. Which approach to use when searching for the rank, or K. The default is `"linear"`.
- `H_sill_thresh` : float, optional. Setting for removing higher ranks from the search space. The default is -1. When searching for the optimal rank with binary search using `k_search_method='bst_post'` or `k_search_method='bst_pre'`, this hyper-parameter can be used to cut higher ranks off from the search space. The cut-off is based on a threshold for the H silhouette: when an H silhouette below `H_sill_thresh` is found for a given rank, or K, all higher ranks are removed from the search space. If `H_sill_thresh=-1`, it is not used.
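The two pruning rules described above can be sketched together in a few lines. This is a minimal illustration rather than T-ELF's implementation: `search_k` and `toy_scores` are hypothetical names, the traversal is a plain pre-order walk over the midpoints of a sorted rank list, and the silhouette values are made up so that the "true" rank is 6.

```python
def search_k(ks, score_fn, sill_thresh, h_sill_thresh=-1.0):
    """Pre-order binary search over ranks with two pruning rules (sketch).

    score_fn(k) must return (W silhouette, H silhouette) for rank k.
    Returns (largest qualifying rank or None, list of ranks evaluated).
    """
    ks = sorted(ks)
    bounds = {"lo": float("-inf"), "hi": float("inf")}  # open interval still searched
    visited = []
    best = None

    def visit(lo_idx, hi_idx):
        nonlocal best
        # Stop on an empty range or a range that is entirely pruned.
        if (lo_idx > hi_idx
                or ks[hi_idx] <= bounds["lo"]
                or ks[lo_idx] >= bounds["hi"]):
            return
        mid = (lo_idx + hi_idx) // 2
        k = ks[mid]
        if bounds["lo"] < k < bounds["hi"]:
            w_sil, h_sil = score_fn(k)
            visited.append(k)
            if min(w_sil, h_sil) >= sill_thresh:
                # Ideal rank found: prune all lower ranks.
                best = k if best is None else max(best, k)
                bounds["lo"] = k
            if h_sill_thresh != -1 and h_sil < h_sill_thresh:
                # H silhouette too low: prune all higher ranks.
                bounds["hi"] = k
        visit(lo_idx, mid - 1)
        visit(mid + 1, hi_idx)

    visit(0, len(ks) - 1)
    return best, visited


def toy_scores(k):
    """Made-up silhouettes: stable up to rank 6, unstable above it."""
    return (0.95, 0.90) if k <= 6 else (0.30, 0.20)


best_k, evaluated = search_k(range(1, 16), toy_scores,
                             sill_thresh=0.8, h_sill_thresh=0.5)
print(best_k, evaluated)
```

With these toy scores only four of the fifteen candidate ranks are evaluated, whereas a linear search would evaluate all fifteen. Leaving `h_sill_thresh=-1` disables the upper cut-off, so ranks above a poor H silhouette are still visited.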
Bugs
- Fixes a bug in RESCALk plotting where the plotting function was expecting W and H silhouettes.
- Fixes a bug where k predict would not work if none of the `W` or `H` silhouettes were above the `sill_thresh` hyper-parameter. In that case, the fix selects a new `sill_thresh` based on the rule `self.sill_thresh = min([max(sils_min_W), max(sils_min_H)])`.
- Fixes a bug in document substitutions of Vulture where an error was raised if no corpus substitutions were passed.
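The fallback rule for `sill_thresh` can be illustrated with a small standalone sketch. Only the expression `min([max(sils_min_W), max(sils_min_H)])` comes from the notes above; the function names, the exact trigger condition (here: the best W or the best H minimum silhouette falls below the threshold), and the choice of the largest qualifying rank are assumptions made for illustration and may not match T-ELF's code.

```python
def effective_sill_thresh(sils_min_W, sils_min_H, sill_thresh):
    """Return sill_thresh, or the fallback value when no rank can reach it.

    Assumed trigger: the best W or the best H minimum silhouette is
    below the requested threshold.
    """
    if max(sils_min_W) < sill_thresh or max(sils_min_H) < sill_thresh:
        return min([max(sils_min_W), max(sils_min_H)])
    return sill_thresh


def predict_k(ks, sils_min_W, sils_min_H, sill_thresh):
    """Pick the largest rank whose W and H silhouettes both clear the threshold."""
    thresh = effective_sill_thresh(sils_min_W, sils_min_H, sill_thresh)
    qualifying = [k for k, w, h in zip(ks, sils_min_W, sils_min_H)
                  if min(w, h) >= thresh]
    return max(qualifying) if qualifying else None


# Made-up silhouettes: no rank reaches the requested threshold of 0.8.
ks = [2, 3, 4, 5]
sils_min_W = [0.75, 0.70, 0.60, 0.30]
sils_min_H = [0.72, 0.65, 0.50, 0.20]

new_thresh = effective_sill_thresh(sils_min_W, sils_min_H, sill_thresh=0.8)
k_hat = predict_k(ks, sils_min_W, sils_min_H, sill_thresh=0.8)
print(new_thresh, k_hat)
```

In this example the threshold is lowered to 0.72, so a rank can still be predicted instead of the prediction step failing.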
Files
| Name | Size | md5 |
|---|---|---|
| lanl/T-ELF-v0.0.17.zip | 23.8 MB | 6229fb994358719003aa24ee1c795e57 |
Additional details
Related works
- Is supplement to: Software, https://github.com/lanl/T-ELF/tree/v0.0.17