Published February 5, 2025
| Version v0.21.0
Software
Open
wi2trier/cbrkit: v0.21.0
Authors/Creators
- 1. Trier University / @DFKI
- 2. @semantic-release
Description
0.21.0 (2025-02-05)
⚠ BREAKING CHANGES
- The entire library has largely been rewritten, so there will be additional breaking changes. Please refer to the Readme and the tests for more information.
- The function `cbrkit.reuse.build` now expects a retriever function instead of a similarity function so that more logic can be shared between the phases.
- To better support the new retrieval functions, the arguments `limit`, `min_similarity`, and `max_similarity` of the function `cbrkit.retrieval.build` have been removed. Instead, wrap your call to `cbrkit.retrieval.build` with the new function `cbrkit.retrieval.dropout`, which now exposes these arguments.
- The functions `apply` and `mapply` have been removed to better support processing multiple queries at once. They have been replaced by the functions `apply_query` and `apply_queries`. Both return the same result object, so the return value of `apply_queries` is not identical to that of the previous `mapply` function. The functions `apply` and `apply_query`, however, share the same return type.
- The number of processes to use for retrieval is no longer passed to the `apply` functions, but is instead given to the `build` function.
- CBRkit now provides additional modules for `adapt`, `reuse`, `cycle`, and `eval`.
- We added support for logging via the standard library.
- There is a new `synthesis` module that provides tight integration with various LLM providers. This can, for instance, be used to develop RAG applications using CBR.
- Loading and dumping cases has been reworked; we now provide generators to construct serialization and deserialization functions.
- Caching of similarity values has been added: simply wrap your existing similarity function with the new `cbrkit.sim.cache` wrapper.
- A new embedding module `cbrkit.sim.embed` has been added that provides a better interface to compose string-based similarity functions that rely on vectors. It also includes a cache that can be stored on disk.
- Similarity functions for graphs have been overhauled and now provide a more consistent interface.
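
The move of `limit`, `min_similarity`, and `max_similarity` from the builder into a separate `dropout` wrapper follows a common decorator pattern. The sketch below illustrates that pattern in plain Python only. It does not import cbrkit, and the names `build_retriever`, `dropout`, and `linear_sim`, along with their signatures and types, are hypothetical stand-ins rather than the library's actual API. Refer to the Readme and the tests for the real call signatures.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping

# Hypothetical types for illustration only; cbrkit's real types differ.
Casebase = Mapping[str, float]
Ranking = list[tuple[str, float]]
Retriever = Callable[[Casebase, float], Ranking]


def build_retriever(sim: Callable[[float, float], float]) -> Retriever:
    """Return a retriever that ranks every case by similarity to the query."""

    def retrieve(casebase: Casebase, query: float) -> Ranking:
        scored = [(key, sim(case, query)) for key, case in casebase.items()]
        # Highest similarity first.
        return sorted(scored, key=lambda item: item[1], reverse=True)

    return retrieve


def dropout(
    retriever: Retriever,
    limit: int | None = None,
    min_similarity: float | None = None,
    max_similarity: float | None = None,
) -> Retriever:
    """Wrap a retriever and filter its ranking (assumed semantics)."""

    def wrapped(casebase: Casebase, query: float) -> Ranking:
        ranking = retriever(casebase, query)
        if min_similarity is not None:
            ranking = [(k, s) for k, s in ranking if s >= min_similarity]
        if max_similarity is not None:
            ranking = [(k, s) for k, s in ranking if s <= max_similarity]
        # Truncate only after the similarity filters have been applied.
        return ranking if limit is None else ranking[:limit]

    return wrapped


def linear_sim(x: float, y: float) -> float:
    """Toy numeric similarity in [0, 1] over an assumed value range of 100."""
    return 1.0 - min(abs(x - y) / 100.0, 1.0)


casebase = {"a": 10.0, "b": 50.0, "c": 90.0, "d": 55.0}

# Filtering arguments live on the wrapper, not on the builder.
filtered = dropout(build_retriever(linear_sim), min_similarity=0.6)
top_two = dropout(build_retriever(linear_sim), limit=2)

filtered_keys = [key for key, _ in filtered(casebase, 52.0)]
top_two_keys = [key for key, _ in top_two(casebase, 52.0)]
```

Keeping the filtering in a wrapper means any retriever, however it was built, can be truncated or thresholded in the same way, which appears to be the motivation given in the breaking change above.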
Features
- adapt: add openai function to adapt cases (38cbb26)
- adapt: add similarity delta to pipe function (9d58252)
- add docstrings to export (3991a51)
- add dumpers module for serializing casebases (475f532)
- add dumpers, anthropic provider, update docs (#215) (0f440c5)
- add generation submodule to handle provider-specific code (068b6ff)
- add global handling of asyncio event loop (0ed704a)
- add initial version of rag module (f803b3b)
- add integration with voyageai (6d7b4eb)
- add logging (32fde3d)
- add methods to perform entire r4 cycles more easily (eb08557)
- add openapi schema generator (611370d)
- add rag support to api and cli (57a0334)
- add support for factories (529aa1a)
- add support for factories to more functions (2794ffc)
- add transpose_value wrappers (ebbabc9)
- api: allow passing paths for casebase/query (8cdcbb4)
- api: support passing files (2993fac)
- api: switch query parameters to request body (9604428)
- convert results to pydantic models (3b1c5e0)
- dumpers: make markdown function generic (33d627c)
- embed/openai: add lazy loading (7e5c783)
- embed: add lazy loading for cache (98790cc)
- eval: add helper for arbitrary scores (79ce192)
- eval: add proper support for relevance levels (c414112)
- eval: allow conversion of retrieval result to qrels (c8a7cb5)
- eval: allow custom metric functions (fde869a)
- generate: add memory to openai (e645fde)
- helpers: add getitem_or_getattr (0b94774)
- helpers: allow conversion of functions to base models (b206c1f)
- improve genai providers (8da6269)
- improve handling of multiprocessing (81ac32c)
- improve logging and multiprocessing (3b78deb)
- integrate processing of query collections into the core of cbrkit (b8df8ee)
- make cbrkit project layout more consistent (b738d6f)
- multiprocessing: allow boolean values (e9e3827)
- openai: add support for tool calling via unions (0b3e29e)
- optimize multiprocessing (87ee55f)
- rag: add model similar to retrieval/reuse (ca76c73)
- retrieval: add dropout function (3f50dbf)
- retrieval: add openai function for estimating the similarity (803ff75)
- retrieval: add sentence transformers reranker (bd05b2e)
- retrieval: add transpose helper to simplify conversion of cases (216eca3)
- retrieval: use async clients for cohere and voyage ai (6b37814)
- reuse: allow passing multiple adaptation functions to builder (6493923)
- reuse: allow passing similarities from earlier steps (6732b38)
- reuse: introduce dropout function similar to retrieval (c3254c6)
- rework reuse phase and update apply helpers (f4a11e8)
- rework type structure and improve genai/rag modules (b94e898)
- sim/embed: add chunking/truncation for openai (406c7bc)
- sim/graphs: add dtw alignment (#214) (bc44cb7)
- sim/graphs: add dtw and smith-waterman functions (#201) (040b702)
- sim/graphs: add initial version of exhaustive mapping (818a356)
- sim/graphs: add local sims, update astar heuristics (fe51441)
- sim/graphs: add precompute function (5ba52d7)
- sim/graphs: add smith (#218) (771fe3b)
- sim/graphs: make it easier to define node similarities (0c95d3a)
- sim/graphs: rewrite astar algorithm (9c788f6)
- sim/strings: add vector database (07727f0)
- sim: add cache method (8c9992f)
- sim: add default sim for attribute value (686f270)
- sim: update interface for embed and taxonomy functions (6af97f9)
- sim: update table helpers and move to wrappers (868bd8d)
- synthesis/providers: add delay parameter (0fad3c5)
- synthesis: add google provider (1724ad0)
- synthesis: allow chunking with overlap (ee5bebd)
- synthesis: run chunk helper in parallel (b91c9c0)
- various improvements (da308f2)
Bug Fixes
- adapt/generic: add strategy to pipe function (d943f59)
- api: response types failed to validate (a5fe024)
- api: simplify definition of retrievers/reusers (371cd78)
- astar: convert to batch sim func (cfc0d42)
- astar: improve logic (1643549)
- astar: make naming and exports more consistent (007dfb0)
- astar: restructure legal mapping funcs, add sim precomputer (8d54903)
- chunkify: check arguments (90b8f38)
- cli: disable pretty exceptions (215070b)
- cli: use dumpers for exporting (2293948)
- convert some lambdas to real functions (10994c3)
- correctly construct pydantic models (fc49b1a)
- correctly set sentence_transformers metadata (d3511cc)
- default to structured outputs for openai (247bb05)
- dumpers: properly get name for markdown code block (40ba442)
- embed: add autodump to cache (f7dc949)
- embed: add lazy loading to sentence transformers (0cf74ec)
- embed: add logging (9156846)
- embed: autodump only if new texts are found (bd9d05b)
- embed: check hash before dumping cache (118611b)
- embed: remove unneeded lazy loading (45738c0)
- embed: use modified time instead of hash to detect changes (4ee3963)
- eval: add kendall tau (95eda58)
- eval: add mean_score function (9ac7c30)
- eval: improve conversion of scores to qrels (579de7f)
- eval: improve metric generation (c14f48e)
- export default aggregator (f4d4473)
- extend support for lazy loading (645cad4)
- formatting and typing improvements (d49f3e5)
- genai/prompts: add transpose function (dc02582)
- graph: enhance serialization (d6111a4)
- graphs: add converter callbacks to dump/load (d96e9d8)
- graphs: add load/dump (e61bd92)
- graphs: drop SerializedNode (b4e0939)
- helpers: add log_batch (cf17591)
- helpers: correctly handle bool values for multiprocessing (b02e749)
- helpers: optimize loading of callable maps (22e9443)
- improve dumpers, especially for graphs (cb9f6df)
- improve eval module (a85f817)
- improve handling of defaults for tables (f082511)
- improve logging during multiprocessing (fb038ba)
- improve synthesis (00e95c2)
- improve typing of generic tables (7ca8740)
- improve vector db (31afcc1)
- keep casebase/query in result object when dumping (3cfd623)
- loaders: correctly handle files in directories (afa26f2)
- loaders: properly handle io (088f3c9)
- loaders: properly load binary data (33c3424)
- log only if more than one batches are processed (10d8371)
- make dumper argument ordering more consistent (5e90214)
- make reuse/retrieval functions more robust (e7437b9)
- minor improvements (f1c25b0)
- model: add default_query to top result class (9d33c17)
- model: store unfiltered casebase as well (73ccb69)
- move from TypedDict to BaseModel/dataclass (2afce5c)
- openai: use not_given where necessary (e39f297)
- prompts: allow giving functions as instructions (1646548)
- prompts: remove dedent (8abce29)
- re-add logging to astar (aa99c92)
- remove factories that are no longer needed (a420695)
- remove synthesis-based retriever/reuser until a better interface is defined (7d813c5)
- restore similarity filtering behavior (7ec627d)
- result export (58ba034)
- retrieval: improve metadata (76d90e4)
- retrieval: optimize sentence transformers (553a018)
- sim/astar: add default to max-calls (e2fc133)
- sim/astar: correctly compute sim and loop over the open set (28cb22b)
- sim/astar: force to map all edges in select2 (87ea326)
- sim/astar: remove optimization for edge expansion (f8c1409)
- sim/collections: allow dtw for arbitrary types (#204) (5f7585e)
- sim/collections: update types for dtw (74737f1)
- sim/embed: correctly convert to float (32925d5)
- sim/embed: correctly load/dump cached store (22c8ffc)
- sim/embed: generalize helper functions (3bb7c10)
- sim/graphs: generalize graph sim (7212929)
- sim/graphs: improve is_sequential and conditionally import alignment metrics (33b25e4)
- sim/graphs: improve isomorphism (a6bac6b)
- sim/graphs: merge node_data_sim and node_obj_sim (9b912ad)
- sim/graphs: swap x and y in some cases (070e8fb)
- sim/graphs: use dicts for graph sim return value (4fffbc0)
- sim/graphs: use optional dependencies for alignment (4ea0005)
- sim/strings: gracefully handle empty batches (6f7e86e)
- sim/strings: optimize computation of semantic similarities (cfa6505)
- sim: add type_equality function (c49a3af)
- sim: do not serialize cache (9ef0b26)
- sim: expand functionality of dynamic table (e79cb33)
- sim: improve table similarities (977d798)
- small improvements for sentence transformers (7877c27)
- synthesis: add logging (c59839a)
- synthesis: openai message construction (959d33e)
- synthesis: properly use init vars (dd4ddf1)
- synthesis: update openai parameters (9eac912)
- taxonomy: allow paths (b35e488)
- typing: use np.float64 (fdbd45b)
- update text loaders (c60219c)
- use rag functions in retrieve/adapt and add chunking (f737a2b)
Miscellaneous Chores
- add notable changes (1f1bd17)
Files
| Name | Size |
|---|---|
| wi2trier/cbrkit-v0.21.0.zip (md5:6947a842f691c850c6e06ff7435b8ac5) | 623.7 kB |
Additional details
Related works
- Is supplement to: Software: https://github.com/wi2trier/cbrkit/tree/v0.21.0 (URL)
Software
- Repository URL: https://github.com/wi2trier/cbrkit