Published March 10, 2023 | Version v3.5.1
Software | Open access
explosion/spaCy: v3.5.1: spancat for multi-class labeling, fixes for textcat+transformers and more
Creators
- Ines Montani (1)
- Matthew Honnibal (1)
- Sofie Van Landeghem (2)
- Adriane Boyd
- Henning Peters
- Paul O'Leary McCann (3)
- jim geovedi
- Jim O'Regan
- Maxim Samsonov
- György Orosz (4)
- Daniël de Kok (5)
- Marcus Blättermann (6)
- Duygu Altinok (7)
- Søren Lind Kristiansen
- Madeesh Kannan
- Raphael Mitsch (5)
- Raphaël Bournhonesque
- Edward (5)
- Lj Miranda (5)
- Peter Baumgartner (5)
- Richard Hudson
- Explosion Bot (5)
- Roman (8)
- Leander Fiedler (9)
- Ryn Daniels
- Wannaphong Phatthiyaphaibun (10)
- Grégory Howard
- Yohei Tamura (11)
- 1. Founder @explosion
- 2. Explosion & OxyKodit
- 3. Cotonoha
- 4. LogMeIn, Meltwater
- 5. @explosion
- 6. essenmitsosse
- 7. @deepgram
- 8. @kouchtv
- 9. Nord/LB
- 10. @PyThaiNLP
- 11. @indeedeng
Description
💥 We'd love to hear more about your experience with spaCy! Take our survey here.
✨ New features and improvements
- NEW: `spancat_singlelabel` pipeline component for multi-class and non-overlapping span classification. The `spancat_singlelabel` component predicts at most one label for each suggested span and adds a new setting, `allow_overlap`, to restrict the output to non-overlapping spans (#11365).
- Extend support to mypy v1.0 (#12245).
- Use `transformer` + CNN for efficient GPU `textcat` with `spacy init config` (#11900).
- Support the trainable lemmatizer in `spacy debug data` (#11419).
- Add new operators to the dependency matcher for left/right immediate child/parent nodes (`>+`, `>-`, `<+`, `<-`) (#12334).
- Add `spacy.PlainTextCorpusReader.v1` for plain text input (#12122).
- Add `alignment_mode` and `span_id` to `Span.char_span()` (#12145, #12196).
- Use string formatting types in logging calls (#12215).
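To make the `allow_overlap` setting concrete, here is a minimal, dependency-free sketch of single-label span selection. It is not spaCy's implementation: the candidate format, the `threshold` rule and the function names are hypothetical, and filtering greedily from the highest score down is our assumption about one reasonable way to enforce non-overlap.

```python
from typing import List, Tuple

# Hypothetical candidate: ((start_token, end_token_exclusive), one score per label)
Candidate = Tuple[Tuple[int, int], List[float]]


def overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if the half-open token ranges a and b share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def single_label_spans(
    candidates: List[Candidate],
    labels: List[str],
    threshold: float = 0.5,
    allow_overlap: bool = True,
) -> List[Tuple[int, int, str]]:
    """Give each span at most one label; optionally drop overlapping spans."""
    scored = []
    for (start, end), scores in candidates:
        # Pick the single best label for this span (multi-class, not multi-label).
        best = max(range(len(labels)), key=lambda i: scores[i])
        if scores[best] >= threshold:  # simplified decision rule (assumption)
            scored.append((scores[best], (start, end), labels[best]))
    if allow_overlap:
        return [(s, e, label) for _, (s, e), label in scored]
    # Visit spans from highest to lowest score and keep a span only if it
    # does not overlap any span that has already been kept.
    kept: List[Tuple[Tuple[int, int], str]] = []
    for _, span, label in sorted(scored, key=lambda x: x[0], reverse=True):
        if not any(overlaps(span, other) for other, _ in kept):
            kept.append((span, label))
    return sorted((s, e, label) for (s, e), label in kept)
```

With overlap allowed, two overlapping spans can both be labeled; with `allow_overlap=False`, only the higher-scoring one survives.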
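The four new dependency matcher operators combine a head/child relation with token adjacency. The plain-Python predicates below encode the semantics as we read them in the spaCy `DependencyMatcher` documentation (for example, `A >+ B` means `A` is the head of `B` and `A.i == B.i - 1`); the `heads` list and the function names are hypothetical, so check the docs for your spaCy version before relying on them.

```python
from typing import List

# Hypothetical input: heads[i] is the index of token i's syntactic head
# (a root token points to itself, as in spaCy).


def right_immediate_child(heads: List[int], a: int, b: int) -> bool:
    """A >+ B: A is the head of B, and B is the token right after A."""
    return heads[b] == a and a == b - 1


def left_immediate_child(heads: List[int], a: int, b: int) -> bool:
    """A >- B: A is the head of B, and B is the token right before A."""
    return heads[b] == a and a == b + 1


def right_immediate_parent(heads: List[int], a: int, b: int) -> bool:
    """A <+ B: B is the head of A, and B is the token right after A."""
    return heads[a] == b and a == b - 1


def left_immediate_parent(heads: List[int], a: int, b: int) -> bool:
    """A <- B: B is the head of A, and B is the token right before A."""
    return heads[a] == b and a == b + 1


# "She bought red apples": She->bought, bought is root, red->apples, apples->bought
heads = [1, 1, 3, 1]

# In a DependencyMatcher pattern, a (non-first) element using a new operator
# would look like:
# {"LEFT_ID": "verb", "REL_OP": ">-", "RIGHT_ID": "subj", "RIGHT_ATTRS": {"DEP": "nsubj"}}
```

Here "bought" has "She" as a left immediate child, while "apples" is a right child of "bought" but not an immediate one, because "red" sits in between.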
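`alignment_mode` controls what happens when character offsets do not fall exactly on token boundaries. As documented for `Doc.char_span()`, `"strict"` requires exact token boundaries, `"contract"` keeps only tokens entirely inside the range, and `"expand"` includes every token the range touches. The sketch below reimplements that idea over hypothetical `(start_char, end_char)` token offsets; it is an illustration, not spaCy's code. For `Span.char_span()`, the offsets are, as far as we know, relative to the start of the span's text.

```python
from typing import List, Optional, Tuple

Offsets = List[Tuple[int, int]]  # hypothetical (start_char, end_char) per token


def char_span_to_tokens(
    token_offsets: Offsets, start_char: int, end_char: int, alignment_mode: str = "strict"
) -> Optional[Tuple[int, int]]:
    """Map [start_char, end_char) to a half-open token range, or None."""
    if alignment_mode == "strict":
        # Both ends must coincide with token boundaries.
        starts = [i for i, (s, _) in enumerate(token_offsets) if s == start_char]
        ends = [i for i, (_, e) in enumerate(token_offsets) if e == end_char]
        if not starts or not ends:
            return None
        return (starts[0], ends[0] + 1)
    if alignment_mode == "contract":
        # Keep only tokens fully inside the character range.
        inside = [i for i, (s, e) in enumerate(token_offsets) if s >= start_char and e <= end_char]
    elif alignment_mode == "expand":
        # Keep every token that overlaps the character range at all.
        inside = [i for i, (s, e) in enumerate(token_offsets) if s < end_char and e > start_char]
    else:
        raise ValueError(f"unknown alignment_mode: {alignment_mode!r}")
    if not inside:
        return None
    return (inside[0], inside[-1] + 1)


# "New York City" tokenized as New / York / City
tokens = [(0, 3), (4, 8), (9, 13)]
```

For instance, characters 1 to 8 ("ew York") fail in `"strict"` mode, contract to just "York", and expand to "New York".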
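Our reading of #12215 is that log messages now pass values as arguments to `%`-style placeholders (such as `%s` and `%d`) instead of pre-formatting them, for example with f-strings. The standard-library sketch below, with a hypothetical logger name and helper classes, shows the practical difference: a disabled `debug` call still builds an f-string, whereas placeholder arguments are only formatted when a record is actually emitted.

```python
import logging
from typing import List

logger = logging.getLogger("example")  # hypothetical logger name


class ListHandler(logging.Handler):
    """Collects formatted messages so the example can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


class Expensive:
    """Counts how often its string representation is built."""

    calls = 0

    def __str__(self) -> str:
        Expensive.calls += 1
        return "expensive"


handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.INFO)  # DEBUG messages are disabled
logger.propagate = False

obj = Expensive()
# Eager: the f-string (and str(obj)) is built even though DEBUG is off.
logger.debug(f"value: {obj}")
# Lazy: the record is never created, so str(obj) is never called.
logger.debug("value: %s", obj)
# Emitted: the placeholders are filled in when the handler reads the message.
logger.info("loaded %d docs from %s", 3, "train.txt")
```

After running it, `Expensive.calls` is 1 (only the f-string paid the formatting cost) and the handler holds the single INFO message.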
🔴 Bug fixes
- #12017: Improve speed for `top_k>1` in the trainable lemmatizer.
- #12048: Make the `test_cli_find_threshold()` test more robust.
- #12227: Fix return type of `registry.find()`.
- #12272: Fix speed regression for `Matcher` patterns with extension attributes.
- #12287: Add `grc` to languages with lexeme norms in `spacy-lookups-data`.
- #12320: Make generation of empty `KnowledgeBase` instances configurable.
- #12343: Fix error message for displacy `auto_select_port`.
- #12347: Fix length check for knowledge base in entity linker, add `InMemoryLookupKB.is_empty`.
- #12365: Fix types for `Lexeme.orth` and `Lexeme.lower`.
- #12366: Raise error for non-default vectors with `PretrainVectors`.
- #12368: Partially address pending deprecation of `pkg_resources`.
- Various improvements and fixes for the test suite (#12148, #12157, #12210, #12303, #12372).
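For context on #12017, the trainable lemmatizer's `top_k` setting controls how many of the most probable edit trees are tried, in order, before falling back to a backoff. The sketch below is hypothetical (the scores, the `applies` predicate and the function name are ours) and does not reproduce what the PR changed internally; it only illustrates the selection and one cheap way to pick the k best scores with `heapq.nlargest`.

```python
import heapq
from typing import Callable, List, Optional


def choose_tree(
    scores: List[float], applies: Callable[[int], bool], top_k: int = 1
) -> Optional[int]:
    """Return the first applicable tree among the top_k highest-scoring ones.

    Returns None when none of them applies; a lemmatizer would then use its
    backoff (for example, the token's original form).
    """
    # heapq.nlargest keeps a heap of size top_k instead of sorting every score.
    candidates = heapq.nlargest(top_k, range(len(scores)), key=scores.__getitem__)
    for tree_id in candidates:
        if applies(tree_id):
            return tree_id
    return None
```

With scores `[0.1, 0.6, 0.3]` and a best tree (id 1) that cannot be applied to the form, `top_k=1` falls through to the backoff, while `top_k=2` recovers tree 2.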
📖 Documentation and examples
- Many website updates to improve accessibility.
- Various documentation corrections and updates.
- New projects:
  - Span labeling datasets
  - Comparing embedding layers in spaCy, from the technical report "Multi hash embeddings in spaCy"
👥 Contributors
@adrianeboyd, @andyjessen, @danieldk, @essenmitsosse, @honnibal, @ines, @itssimon, @kadarakos, @kwhumphreys, @ljvmiranda921, @pmbaumgartner, @polm, @richardpaulhudson, @rmitsch, @shadeMe, @svlandeg, @tanloong, @thomashacker, @victorialslocum
Files
Name | Size | MD5 checksum
---|---|---
explosion/spaCy-v3.5.1.zip | 10.5 MB | d12eb8e1ed927b4b50b92d641f445d77
Additional details
Related works
- Is supplement to
- https://github.com/explosion/spaCy/tree/v3.5.1 (URL)