Published June 21, 2021 | Version v0.53.0
Software Open

NatLibFi/Annif: Annif 0.53

  1. @NatLibFi
  2. @siilisolutions
  3. @niwa
  4. Koninklijke Bibliotheek
  5. @CSCfi
  6. @UB-Mannheim

Description

This release adds two new backends, YAKE and SVC. The YAKE backend is a wrapper around the YAKE library, which performs lexical unsupervised keyword extraction, so no training data is needed. See the YAKE wiki page for more information. In future Annif releases, it may be possible to extend YAKE support so that it can also suggest new terms for a vocabulary (that is, extracted keywords that are not found in the vocabulary).
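
The idea of suggesting new vocabulary terms can be illustrated with a small sketch. It assumes an extractor has already produced (keyword, score) pairs, with lower scores meaning more relevant, as in YAKE. The sample keywords, the `vocabulary_labels` set, and the `split_keywords` helper are all hypothetical; the code does not call Annif or the YAKE library.

```python
def split_keywords(keywords, vocabulary_labels):
    """Split (keyword, score) pairs into vocabulary matches and candidate new terms.

    Matching is a case-insensitive exact comparison against the labels.
    """
    labels = {label.lower() for label in vocabulary_labels}
    matched, candidates = [], []
    for keyword, score in keywords:
        target = matched if keyword.lower() in labels else candidates
        target.append((keyword, score))
    return matched, candidates


# Hypothetical extractor output: (keyword, score), lower score = more relevant.
keywords = [
    ("machine learning", 0.02),
    ("Subject indexing", 0.05),
    ("zero-shot tagging", 0.09),
]
# Hypothetical vocabulary labels.
vocabulary_labels = {"Machine learning", "subject indexing", "libraries"}

matched, candidates = split_keywords(keywords, vocabulary_labels)
print("In vocabulary:", [k for k, _ in matched])
print("Candidate new terms:", [k for k, _ in candidates])
```

Keywords that match a label could be mapped to existing subjects, while the remainder would be the pool from which new terms might be proposed.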

The SVC backend implements Linear Support Vector Classification. It is well suited for multiclass (but not multilabel) classification, for example classifying documents with the Dewey Decimal Classification or the 20 Newsgroups classification. It requires relatively little training data, and is suitable for classifications of up to around 10,000 classes. See the SVC wiki page for more information.
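
To illustrate what multiclass (one class per document) classification with a linear SVC looks like, here is a minimal scikit-learn sketch. It is not the Annif backend's code: the toy documents, the class names, and the TF-IDF pipeline are assumptions chosen only for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data (made up): each document has exactly one class.
train_docs = [
    "rocket orbit satellite launch",
    "satellite orbit telescope",
    "hockey team goal season",
    "hockey goal playoffs",
    "engine car brakes wheels",
    "car wheels engine repair",
]
train_classes = ["space", "space", "hockey", "hockey", "autos", "autos"]

# TF-IDF features feeding a linear support vector classifier.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_docs, train_classes)

# predict() returns a single class per document (multiclass, not multilabel).
predictions = model.predict(["telescope in orbit", "new brakes for the car"])
print(list(predictions))
```

Each input document receives exactly one predicted class, which is why this kind of model fits classification schemes better than subject indexing with many simultaneous labels.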

This release also upgrades many dependencies, which enables all Annif backends to run on Python 3.9 (previously the nn_ensemble backend was available only on Python 3.6-3.8). The Docker image now uses Python 3.8 instead of 3.7.

Note that nn_ensemble models are not compatible across Python versions: e.g. a model trained on Python 3.7 can be used only on Python 3.7. Training the nn_ensemble models shows a CustomMaskWarning, but it is harmless (caused by a TensorFlow bug) and can be ignored.
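
Because nn_ensemble models are tied to the Python version they were trained on, a deployment script could record that version next to the model and check it before loading. The sketch below uses only the standard library; the metadata file name and the helper functions are hypothetical and are not part of Annif.

```python
import json
import sys
import tempfile
from pathlib import Path

METADATA_FILE = "python-version.json"  # hypothetical file name


def current_version():
    """Return the running interpreter's version as 'major.minor'."""
    return f"{sys.version_info.major}.{sys.version_info.minor}"


def record_python_version(model_dir):
    """Store the current Python version next to a trained model."""
    path = Path(model_dir) / METADATA_FILE
    path.write_text(json.dumps({"python": current_version()}))


def check_python_version(model_dir):
    """Raise RuntimeError if the model was trained on another Python version."""
    path = Path(model_dir) / METADATA_FILE
    trained_on = json.loads(path.read_text())["python"]
    if trained_on != current_version():
        raise RuntimeError(
            f"Model trained on Python {trained_on}, "
            f"but running Python {current_version()}"
        )


with tempfile.TemporaryDirectory() as model_dir:
    record_python_version(model_dir)  # at training time
    check_python_version(model_dir)   # before loading: passes, same interpreter
    same_version_ok = True
```

Failing early with a clear message is usually easier to diagnose than an error raised deep inside model loading.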

Due to the update of scikit-learn, using TFIDF, MLLM or Omikuji models trained on older Annif versions will show warnings about the TfidfVectorizer. To the best of our knowledge, these are harmless and can be ignored. You have to retrain the models to get rid of the warnings.
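
If the warnings clutter logs until the models are retrained, they could be filtered with Python's standard `warnings` module. This sketch assumes the warning is a `UserWarning` whose message starts with "Trying to unpickle estimator", which is how scikit-learn typically reports a version mismatch when loading a pickled estimator; check the actual message in your logs before relying on it. The `warnings.warn` calls below only simulate such warnings.

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # start from a show-everything state
    # Ignore only warnings whose message starts with the assumed prefix.
    warnings.filterwarnings(
        "ignore",
        message="Trying to unpickle estimator",
        category=UserWarning,
    )
    # Simulated version-mismatch warning (suppressed by the filter).
    warnings.warn(
        "Trying to unpickle estimator TfidfVectorizer from an older version",
        UserWarning,
    )
    # An unrelated warning is still reported.
    warnings.warn("something else worth seeing", UserWarning)

remaining = [str(w.message) for w in caught]
print(remaining)
```

Filtering on the message prefix rather than on all `UserWarning`s keeps unrelated warnings visible.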

This release also includes many minor improvements and bug fixes.

New features:

  • #486 New SVC (support vector classification) backend using scikit-learn
  • #439/#461 YAKE backend
  • #490/#494 Make --version option show Annif version

Improvements:

  • #488 Add support for ngram setting in omikuji backend

Maintenance:

  • #499 Update dependencies v0.53
  • #487 Upgrade scikit-learn to 0.24.2
  • #498 Update Dockerfile

Bug fixes:

  • #484/#495 Show error when training MLLM on empty corpus
  • #489 Add Codecov Action to GH workflow for uploading reports
  • #491 Raise NotSupportedException for attempt to train YAKE
  • #497 Remove execute permissions of some files

Files

NatLibFi/Annif-v0.53.0.zip (867.7 kB)
md5:2c420b0bb983f9d1ca0ba08042496963
