Published October 20, 2022 | Version v3.4.2
Software Open

explosion/spaCy: v3.4.2: Latin and Luganda support, Python 3.11 wheels and more

Description

✨ New features and improvements

  • NEW: Luganda language support (#10847).
  • NEW: Latin language support (#11349).
  • NEW: spacy.ConsoleLogger.v2 optionally saves training logs to JSONL (#11214).
  • NEW: New operators for the DependencyMatcher to include matching parents or children to the left or the right of the node (#10371).
  • Prebuilt Python 3.11 wheels are now available for all spaCy dependencies distributed by @explosion.
  • Support pydantic v1.10 and mypy 0.980+, drop mypy support for Python 3.6 (#11546, #11635).
  • Support CuPy v11 and add extras for cuda11x and cuda-autodetect (using cupy-wheel) (#11279).
  • Support custom attributes for tokens and spans in Doc.to_json() and Doc.from_json() (#11125).
  • Make the enable and disable options for spacy.load() more consistent (#11459).
  • Allow a single string argument for enable/disable/exclude for spacy.load() (#11406).
  • New --url flag for spacy info to print the direct download URL for a pipeline (#11175).
  • Add a check for missing requirements in the spacy project CLI (#11226).
  • Add a Levenshtein distance function (#11418).
  • Improvements to the spacy debug data CLI for spancat data (#11504).
  • Allow overriding spacy_version in spacy package metadata (#11552).
  • Improve the error message when using the wrong command for spacy project assets (#11458).
  • Ensure parent directories are created when storing the results of the spacy pretrain command (#11210).
  • Extend support to newer versions of natto-py for the ko extra (#11222).
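
For readers unfamiliar with the Levenshtein distance mentioned in the list above (#11418), the metric counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The following is a minimal pure-Python sketch of that definition. It is an illustration only, not spaCy's implementation, and the function name `levenshtein_distance` is hypothetical.

```python
def levenshtein_distance(source: str, target: str) -> int:
    """Illustrative edit distance; NOT spaCy's own implementation."""
    # previous[j] = distance between the processed prefix of `source`
    # and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current = [i]  # turning i characters into "" takes i deletions
        for j, target_char in enumerate(target, start=1):
            substitution_cost = 0 if source_char == target_char else 1
            current.append(
                min(
                    previous[j] + 1,  # delete source_char
                    current[j - 1] + 1,  # insert target_char
                    previous[j - 1] + substitution_cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


print(levenshtein_distance("kitten", "sitting"))
```

The sketch keeps only two rows of the dynamic-programming table, so memory use grows with the length of `target` rather than with the product of both lengths.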

📦 Trained pipelines updates

This release includes updated English pipelines for spaCy v3.4 with improved NER performance. The updates in en_core_web_* v3.4.1 address issues related to training from data with partial named entity annotation, which led to lower NER recall in English pipeline versions v3.0.0–v3.4.0. In particular, entities that appear in the sections of the OntoNotes training data without NER annotation were not predicted consistently by the earlier pipeline versions, such as names and places that are frequent in the Biblical sections, e.g., "David" and "Egypt" (see #7493).

Use spacy download to update your English pipelines to the newest version. If you'd prefer to keep using an earlier version, you can specify the version directly with e.g. spacy download -d en_core_web_sm-3.4.0. You can check that you are using the new version (v3.4.1) with spacy validate:

NAME                     SPACY            VERSION
en_core_web_md           >=3.4.0,<3.5.0   3.4.1     ✔
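
The SPACY column in the output above is a version range that the installed spaCy version has to fall into for the pipeline to count as compatible. As a rough sketch of what such a range means, the snippet below checks a version string against a comma-separated specifier. It assumes purely numeric versions with the same number of components (no pre-release or dev suffixes), the helper names `parse_version` and `satisfies` are hypothetical, and it does not claim to reflect how spacy validate is implemented.

```python
import operator

# Two-character operators come first so ">=" is not mistaken for ">".
OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text: str) -> tuple:
    """Turn '3.4.1' into (3, 4, 1); assumes purely numeric components."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version: str, spec: str) -> bool:
    """Check `version` against a spec such as '>=3.4.0,<3.5.0'."""
    current = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in OPERATORS:
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                if not compare(current, bound):
                    return False
                break
        else:
            raise ValueError(f"Unsupported clause: {clause!r}")
    return True


print(satisfies("3.4.2", ">=3.4.0,<3.5.0"))
```

Under these assumptions, spaCy v3.4.2 falls inside the range listed for en_core_web_md v3.4.1, while a v3.5.0 installation would not.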

🔴 Bug fixes

  • #11275: Fix Dutch noun chunks to skip overlapping spans.
  • #11276: Fix regex invalid escape sequences.
  • #11312: Better handling of unexpected types in SetPredicate.
  • #11460: Fix config validation failures caused by NVTX pipeline wrappers.
  • #11506: Avoid unwanted side effects in Doc.__init__.
  • #11540: Preserve missing entity annotation in augmenters.
  • #11592: Fix issues with DVC commands.
  • #11631: Fix initialization for pymorphy2_lookup lemmatizer mode for Russian and Ukrainian.

⚠️ Backwards incompatibilities

  • If you're using a custom component that does not return a Doc type, an error will now be raised (#11424).
  • If you're using a dot in a factory name, an error is raised as this is not supported (#11336).

📖 Documentation and examples

👥 Contributors

@adrianeboyd, @bdura, @danieldk, @diyclassics, @DSLituiev, @GabrielePicco, @honnibal, @ines, @JulesBelveze, @kadarakos, @ljvmiranda921, @ninjalu, @pmbaumgartner, @polm, @radandreicristian, @richardpaulhudson, @rmitsch, @shadeMe, @stefawolf, @svlandeg, @thomashacker, @tobiusaolo, @tzussman, @yasufumy

Files

explosion/spaCy-v3.4.2.zip (11.1 MB)
md5:3ffaf729ff140a2d8f8a01d036534cb9
