explosion/spaCy: v3.4.0: Updated types, speed improvements and pipelines for Croatian
Creators
- Ines Montani (1)
- Matthew Honnibal (1)
- Sofie Van Landeghem (2)
- Adriane Boyd
- Henning Peters
- Paul O'Leary McCann (3)
- Jim Geovedi
- Jim O'Regan
- Maxim Samsonov
- Duygu Altinok (4)
- György Orosz (5)
- Daniël de Kok (6)
- Søren Lind Kristiansen
- Lj Miranda (6)
- Explosion Bot (6)
- Roman (7)
- Peter Baumgartner (6)
- Leander Fiedler (8)
- Richard Hudson
- Madeesh Kannan
- Grégory Howard
- Edward (6)
- Wannaphong Phatthiyaphaibun (9)
- Yohei Tamura
- Sam Bozek
- murat
- Ryn Daniels
- Flusskind
- 1. Founder @explosion
- 2. Explosion & OxyKodit
- 3. Cotonoha
- 4. @deepgram
- 5. LogMeIn, Meltwater
- 6. @explosion
- 7. @kouchtv
- 8. Nord/LB
- 9. @PyThaiNLP
Description
✨ New features and improvements
- Support for mypy 0.950+ and pydantic v1.9 (#10786).
- Prebuilt linux aarch64 wheels are now available for all spaCy dependencies distributed by @explosion.
- Min/max `{n,m}` operator for `Matcher` patterns (#10981).
- Language updates:
  - Improve tokenization for Cyrillic combining diacritics (#10837).
  - Improve English tokenizer exceptions for contractions with this/that/these/those (#10873).
- Improved speed of vector lookups (#10992).
- For the parser, use C `saxpy`/`sgemm` provided by the `Ops` implementation in order to use Accelerate through `thinc-apple-ops` (#10773).
- Improved speed of `Example.get_aligned_parse` and `Example.get_aligned` (#10952).
- Improved speed of `StringStore` lookups (#10938).
- Updated `spacy project clone` to try both `main` and `master` branches by default (#10843).
- Added confidence threshold for named entity linker (#11016).
- Improved handling of Typer optional default values for `init_config_cli` (#10788).
- Added cycle detection in parser projectivization methods (#10877).
- Added counts for NER labels in `debug data` (#10960).
- Support for adding NVTX ranges to `TrainablePipe` components (#10965).
- Support env variable `SPACY_NUM_BUILD_JOBS` to specify the number of build jobs to run in parallel with `pip` (#11073).
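As a rough illustration of the min/max `{n,m}` operator for `Matcher` patterns listed above, the sketch below matches two to three consecutive occurrences of a token. It assumes spaCy v3.4 or later is installed; the rule name `REPEATED_VERY` and the sample sentence are made up for this example.

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline only tokenizes, so no trained model is needed.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# "{2,3}" requests at least 2 and at most 3 consecutive tokens for this spec.
# "REPEATED_VERY" is a hypothetical rule name chosen for illustration.
pattern = [{"LOWER": "very", "OP": "{2,3}"}]
matcher.add("REPEATED_VERY", [pattern])

doc = nlp("This is a very very very good release")
matches = matcher(doc)

# Collect the distinct span lengths that were matched.
matched_lengths = sorted({end - start for _, start, end in matches})
for _, start, end in matches:
    print(doc[start:end].text)
```

Like the other quantifiers, the matcher reports every span that satisfies the pattern, so overlapping matches of different lengths can appear in the result.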
We have added new pipelines for Croatian that use the trainable lemmatizer and floret vectors.
Package | UPOS | Parser LAS | NER F |
---|---|---|---|
`hr_core_news_sm` | 96.6 | 77.5 | 76.1 |
`hr_core_news_md` | 97.3 | 80.1 | 81.8 |
`hr_core_news_lg` | 97.5 | 80.4 | 83.0 |
🙏 Special thanks to @gtoffoli for help with the new pipelines!
The English pipelines have new word vectors:
Package | Model Version | TAG | Parser LAS | NER F |
---|---|---|---|---|
`en_core_web_md` | v3.3.0 | 97.3 | 90.1 | 84.6 |
`en_core_web_md` | v3.4.0 | 97.2 | 90.3 | 85.5 |
`en_core_web_lg` | v3.3.0 | 97.4 | 90.1 | 85.3 |
`en_core_web_lg` | v3.4.0 | 97.3 | 90.2 | 85.6 |
All CNN pipelines have been extended to add whitespace augmentation.
🔴 Bug fixes
- Fix issue #10960: Support hyphens in NER labels.
- Fix issue #10994: Fix horizontal spacing for spans in displaCy.
- Fix issue #11013: Check for any token with a vector in `Doc.has_vector`, distinguish 0-vectors and missing vectors in `similarity` warnings.
- Fix issue #11056: Don't use `get_array_module` in `textcat`.
- Fix issue #11092: Fix vertical alignment for spans in displaCy.

`Doc.has_vector` now matches `Token.has_vector` and `Span.has_vector`: it returns `True` if at least one token in the doc has a vector rather than checking only whether the vocab contains vectors.
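
The changed `Doc.has_vector` behavior can be sketched as follows, assuming spaCy v3.4 or later and a blank pipeline with a single toy vector added by hand. The 4-dimensional vector and the sample texts are arbitrary choices for illustration.

```python
import numpy
import spacy

# Blank pipeline: the vocab starts out without any vectors.
nlp = spacy.blank("en")

# Register a toy 4-dimensional vector for one word only (illustrative values).
nlp.vocab.set_vector("apple", numpy.ones((4,), dtype="float32"))

doc_with_vector = nlp("apple pie")         # contains a token that has a vector
doc_without_vector = nlp("banana bread")   # no token has a vector

# The vocab contains vectors in both cases, but only the first doc
# includes a token that actually has one.
print(doc_with_vector.has_vector)
print(doc_without_vector.has_vector)
```

Under the earlier behavior described above, the second check would have depended only on the vocab containing vectors, so code that used `Doc.has_vector` as a proxy for "vectors are loaded" may need a different check.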
- spaCy universe additions:
  - Aim-spacy: An Aim-based spaCy experiment tracker.
  - Asent: Fast, flexible and transparent sentiment analysis.
  - spaCy fishing: Named entity disambiguation and linking on Wikidata in spaCy with Entity-Fishing.
  - spacy-report: Generates interactive reports for spaCy models.
@adrianeboyd, @danieldk, @ericholscher, @gorarakelyan, @honnibal, @ines, @jademlc, @kadarakos, @KennethEnevoldsen, @koaning, @Lucaterre, @maxTarlov, @philipvollet, @pmbaumgartner, @polm, @richardpaulhudson, @rmitsch, @sadovnychyi, @shadeMe, @shen-qin, @single-fingal, @svlandeg, @victorialslocum, @Zackere
Files
Name | Size | Checksum |
---|---|---|
explosion/spaCy-v3.4.0.zip | 11.0 MB | md5:bbb77b802250927d792f87248dd48020 |
Additional details
Related works
- Is supplement to: https://github.com/explosion/spaCy/tree/v3.4.0 (URL)