explosion/spaCy: v3.6.0: New span finder component and pipelines for Slovenian
Creators
- Ines Montani1
- Matthew Honnibal1
- Adriane Boyd
- Sofie Van Landeghem2
- Henning Peters
- Paul O'Leary McCann3
- jim geovedi
- Jim O'Regan
- Maxim Samsonov
- Daniël de Kok4
- György Orosz5
- Marcus Blättermann6
- Duygu Altinok7
- Raphael Mitsch4
- Madeesh Kannan
- Søren Lind Kristiansen
- Edward
- Lj Miranda4
- Raphaël Bournhonesque
- Peter Baumgartner8
- Richard Hudson
- Explosion Bot4
- Roman9
- Leander Fiedler10
- Ryn Daniels
- kadarakos
- Wannaphong Phatthiyaphaibun11
- Schero1994
- 1. Founder @explosion
- 2. Explosion & OxyKodit
- 3. Cotonoha
- 4. @explosion
- 5. LogMeIn, Meltwater
- 6. essenmitsosse
- 7. @deepgram
- 8. RTI International
- 9. @kouchtv
- 10. Nord/LB
- 11. @PyThaiNLP
Description
- NEW: `span_finder` pipeline component to identify overlapping, unlabeled spans (#12507).
- Language updates:
  - Add initial support for Malay (#12602).
  - Update Latin defaults to support noun chunks, update lexical/tokenizer defaults and add example sentences (#12538).
- Add option to return scores separately keyed by component name with `spacy evaluate --per-component`, `Language.evaluate(per_component=True)` and `Scorer.score(per_component=True)` (#12540).
- Support custom token/lexeme attribute for vectors (#12625).
- Support `spancat_singlelabel` in `spacy debug data` CLI (#12749).
- Typing updates for `PhraseMatcher` and `SpanGroup` (#12642, #12714).
- #12569: Require that all `SpanGroup` spans come from the current doc.
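As a rough illustration of the new `span_finder` component, the sketch below adds it to a blank English pipeline, initializes it from a single hand-written example and reads the predicted candidate spans. The text, the character offsets, the labels and the config values are made up for illustration, and the component is left untrained, so the spans it proposes here are arbitrary.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
# Illustrative settings: candidate spans are stored under doc.spans["sc"].
nlp.add_pipe(
    "span_finder",
    config={"spans_key": "sc", "threshold": 0.5, "max_length": 5},
)

text = "Welcome to the Bank of China."
# Two overlapping reference spans as (start_char, end_char, label):
# "Bank of China" and "China". The finder itself ignores the labels.
annotations = {"spans": {"sc": [(15, 28, "ORG"), (23, 28, "GPE")]}}
example = Example.from_dict(nlp.make_doc(text), annotations)

# Initialize the model weights from the sample data (no training is done here).
nlp.initialize(lambda: [example])

doc = nlp(text)
# Unlabeled, possibly overlapping candidates proposed by the component.
for span in doc.spans["sc"]:
    print(span.start, span.end, span.text)
```

In a real project the component would be trained with `spacy train` on annotated span data before its predictions are meaningful.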
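The per-component scoring option can be tried without a trained model. The following minimal sketch, assuming a blank English pipeline with only the rule-based `sentencizer`, compares the default flat score dictionary with the one returned by `Language.evaluate(per_component=True)`. The example text and gold sentence boundaries are invented for the demonstration.

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # rule-based, so no training is required

text = "This is a sentence. This is another one."
# Gold sentence starts, one flag per token (the text has 10 tokens).
sent_starts = [True, False, False, False, False,
               True, False, False, False, False]
example = Example.from_dict(nlp.make_doc(text), {"sent_starts": sent_starts})

# Default behaviour: the scores of all components are merged into one dict.
flat = nlp.evaluate([example])
# New in v3.6: scores are grouped under the name of each component.
nested = nlp.evaluate([example], per_component=True)

print(sorted(flat))
print(nested["sentencizer"])
```

Grouping by component name helps when two components report scores under the same key, which the flat dictionary would otherwise merge.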
We have added new pipelines for Slovenian that use the trainable lemmatizer and floret vectors.
Package | UPOS | Parser LAS | NER F |
---|---|---|---|
`sl_core_news_sm` | 96.9 | 82.1 | 62.9 |
`sl_core_news_md` | 97.6 | 84.3 | 73.5 |
`sl_core_news_lg` | 97.7 | 84.3 | 79.0 |
`sl_core_news_trf` | 99.0 | 91.7 | 90.0 |
- 🙏 Special thanks to @orglce for help with the new pipelines!
The English pipelines have been updated to improve handling of contractions with various apostrophes and to lemmatize "get" as a passive auxiliary.
The Danish pipeline `da_core_news_trf` has been updated to use `vesteinn/DanskBERT` with performance improvements across the board.
`SpanGroup` spans are now required to be from the same doc. When initializing a `SpanGroup`, there is a new check to verify that all added spans refer to the current doc. Without this check, it was possible to run into string store or other errors.
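A short sketch of what this check means in practice, using two throwaway docs from a blank English pipeline. The group names and texts are illustrative, and the snippet assumes the check surfaces as a `ValueError`.

```python
import spacy
from spacy.tokens import SpanGroup

nlp = spacy.blank("en")
doc = nlp("Welcome to the Bank of China.")
other_doc = nlp("This text belongs to another doc.")

# Fine: every span was sliced from the doc the group is attached to.
group = SpanGroup(doc, name="same_doc", spans=[doc[0:2], doc[3:6]])
print(len(group))

# Rejected as of v3.6: the second span comes from a different doc.
try:
    SpanGroup(doc, name="mixed", spans=[doc[0:2], other_doc[0:2]])
    rejected = False
except ValueError as err:  # assumed exception type
    rejected = True
    print(err)
```

Code that copied spans between docs needs to recreate them on the target doc, for example with `Doc.char_span` or token indices, before building the group.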
- Various documentation corrections and updates.
- New additions to spaCy Universe:
Contributors: @adrianeboyd, @bdura, @danieldk, @davidberenstein1957, @diyclassics, @essenmitsosse, @honnibal, @ines, @isabelizimm, @jmyerston, @kadarakos, @KennethEnevoldsen, @khursani8, @ljvmiranda921, @rmitsch, @shadeMe, @svlandeg, @tomaarsen, @victorialslocum, @vin-ivar, @ZiadAmerr
Files
Name | Size |
---|---|
explosion/spaCy-v3.6.0.zip (md5:b3b2a18aa658865ae5151260dd6d2456) | 20.3 MB |
Additional details
Related works
- Is supplement to: https://github.com/explosion/spaCy/tree/v3.6.0 (URL)