explosion/spaCy: v3.4.2: Latin and Luganda support, Python 3.11 wheels and more
Creators
- Ines Montani (Founder @explosion)
- Matthew Honnibal (Founder @explosion)
- Sofie Van Landeghem (Explosion & OxyKodit)
- Adriane Boyd
- Henning Peters
- Paul O'Leary McCann (Cotonoha)
- jim geovedi
- Jim O'Regan
- Maxim Samsonov
- Duygu Altinok (@deepgram)
- György Orosz (LogMeIn, Meltwater)
- Daniël de Kok (@explosion)
- Søren Lind Kristiansen
- Raphaël Bournhonesque
- Lj Miranda (@explosion)
- Madeesh Kannan
- Peter Baumgartner (@explosion)
- Edward (@explosion)
- Explosion Bot (@explosion)
- Richard Hudson
- Roman (@kouchtv)
- Leander Fiedler (Nord/LB)
- Grégory Howard
- Wannaphong Phatthiyaphaibun (@PyThaiNLP)
- Yohei Tamura (@indeedeng)
- Raphael Mitsch (@explosion)
- Sam Bozek
- murat
Description
✨ New features and improvements
- NEW: Luganda language support (#10847).
- NEW: Latin language support (#11349).
- NEW: `spacy.ConsoleLogger.v2` optionally saves training logs to JSONL (#11214).
- NEW: New operators for the `DependencyMatcher` to include matching parents or children to the left or the right of the node (#10371).
- Prebuilt Python 3.11 wheels are now available for all spaCy dependencies distributed by @explosion.
- Support pydantic v1.10 and mypy 0.980+, drop mypy support for Python 3.6 (#11546, #11635).
- Support CuPy v11 and add extras for `cuda11x` and `cuda-autodetect` (using `cupy-wheel`) (#11279).
- Support custom attributes for tokens and spans in `Doc.to_json()` and `Doc.from_json()` (#11125).
- Make the `enable` and `disable` options for `spacy.load()` more consistent (#11459).
- Allow a single string argument for `disable`/`enable`/`exclude` for `spacy.load()` (#11406).
- New `--url` flag for `spacy info` to print the direct download URL for a pipeline (#11175).
- Add a check for missing requirements in the `spacy project` CLI (#11226).
- Add a Levenshtein distance function (#11418).
- Improvements to the `spacy debug data` CLI for spancat data (#11504).
- Allow overriding `spacy_version` in `spacy package` metadata (#11552).
- Improve the error message when using the wrong command for `spacy project assets` (#11458).
- Ensure parent directories are created when storing the results of the `spacy pretrain` command (#11210).
- Extend support to newer versions of `natto-py` for the `ko` extra (#11222).
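The list above mentions a new Levenshtein distance function (#11418) without describing its interface. For readers unfamiliar with the metric, here is a plain-Python sketch of what a Levenshtein distance computes. It is not spaCy's implementation, and the name `levenshtein_distance` is hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]
```

A distance like this is commonly used for fuzzy string comparison, where small values indicate near-identical spellings.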
This release includes updated English pipelines for spaCy v3.4 with improved NER performance. The updates in `en_core_web_*` v3.4.1 address issues related to training from data with partial named entity annotation, which led to lower NER recall in English pipeline versions v3.0.0–v3.4.0. In particular, entities that appear in the sections of the OntoNotes training data without NER annotation were not predicted consistently by the earlier pipeline versions, such as names and places that are frequent in the Biblical sections, e.g., "David" and "Egypt" (see #7493).
Use `spacy download` to update your English pipelines to the newest version. If you'd prefer to keep using an earlier version, you can specify the version directly with e.g. `spacy download -d en_core_web_sm-3.4.0`. You can check that you are using the new version (v3.4.1) with `spacy validate`:
NAME SPACY VERSION
en_core_web_md >=3.4.0,<3.5.0 3.4.1 ✔
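If you'd rather check an installed pipeline's version from Python than read it off the `spacy validate` output, a minimal sketch using only the standard library could look like the following. It assumes the pipelines are installed as regular Python packages under their pipeline names; the helper name `installed_pipeline_version` is hypothetical and not part of spaCy.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_pipeline_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Pipeline names taken from the release notes above.
    for name in ("en_core_web_sm", "en_core_web_md"):
        print(name, installed_pipeline_version(name) or "not installed")
```

This only reports the package version. Unlike `spacy validate`, it does not check compatibility with the installed spaCy version.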
🔴 Bug fixes
- #11275: Fix Dutch noun chunks to skip overlapping spans.
- #11276: Fix regex invalid escape sequences.
- #11312: Better handling of unexpected types in `SetPredicate`.
- #11460: Fix config validation failures caused by NVTX pipeline wrappers.
- #11506: Avoid unwanted side effects in `Doc.__init__`.
- #11540: Preserve missing entity annotation in augmenters.
- #11592: Fix issues with DVC commands.
- #11631: Fix initialization for `pymorphy2_lookup` lemmatizer mode for Russian and Ukrainian.
- If you're using a custom component that does not return a `Doc` type, an error will now be raised (#11424).
- If you're using a dot in a factory name, an error is raised as this is not supported (#11336).
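To illustrate the two points above, here is a minimal sketch of a custom component that satisfies both requirements: it returns the `Doc` it receives, and its registered name contains no dot. The component name `count_tokens` and the `n_tokens` key are made up for this example.

```python
import spacy
from spacy.language import Language


# Hypothetical component name, chosen without a dot.
@Language.component("count_tokens")
def count_tokens(doc):
    # Store an illustrative value on the Doc.
    doc.user_data["n_tokens"] = len(doc)
    # Return the Doc so the next component in the pipeline receives it.
    return doc


nlp = spacy.blank("en")
nlp.add_pipe("count_tokens")
doc = nlp("This is a sentence.")
print(doc.user_data["n_tokens"])
```

A component that forgets the final `return doc` hands `None` to the rest of the pipeline, which is the situation the new error is meant to catch.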
- Added documentation for the new experimental coref component.
- Added Ukrainian trained pipelines to the website.
- Added documentation for the `spacy.models_and_pipes_with_nvtx_range.v1` callback.
- Fix English pipeline names in v3.4 release notes.
- Various fixes to the `Example` API documentation.
- Extensions and improvements to the `displacy` docs.
- Fix the example command for `spacy project dvc`.
- Update example code for `spacy-wordnet`.
- Improve API documentation around the `initialize()` function for pipeline components.
- Fix various typos and inconsistencies.
- spaCy universe additions:
  - concepCy: A spaCy wrapper for ConceptNet.
  - spaCy partial tagger: build a CRF tagger with a partially annotated dataset.
  - Zshot: Zero and Few shot named entity & relationships recognition.
@adrianeboyd, @bdura, @danieldk, @diyclassics, @DSLituiev, @GabrielePicco, @honnibal, @ines, @JulesBelveze, @kadarakos, @ljvmiranda921, @ninjalu, @pmbaumgartner, @polm, @radandreicristian, @richardpaulhudson, @rmitsch, @shadeMe, @stefawolf, @svlandeg, @thomashacker, @tobiusaolo, @tzussman, @yasufumy
Files
Name | Size |
---|---|
explosion/spaCy-v3.4.2.zip (md5:3ffaf729ff140a2d8f8a01d036534cb9) | 11.1 MB |
Additional details
Related works
- Is supplement to: https://github.com/explosion/spaCy/tree/v3.4.2 (URL)