explosion/spaCy: v2.3.3: Alpha support for Macedonian and Sanskrit, updates for many languages and bug fixes
Creators
- Ines Montani (1)
- Matthew Honnibal (1)
- Sofie Van Landeghem (2)
- Adriane Boyd
- Henning Peters
- Maxim Samsonov
- Jim Geovedi
- Jim Regan
- György Orosz (3)
- Paul O'Leary McCann (4)
- Søren Lind Kristiansen
- Duygu Altinok (5)
- Roman (6)
- Leander Fiedler
- Grégory Howard
- Wannaphong Phatthiyaphaibun (7)
- Explosion Bot (8)
- Sam Bozek
- Mark Amery
- Yohei Tamura (9)
- Björn Böing (10)
- Pradeep Kumar Tippa
- Leif Uwe Vogelsang
- Ramanan Balakrishnan (11)
- Vadim Mazaev
- GregDubbin
- jeannefukumaru
- Jens Dahl Møllerhøj (12)
- Avadh Patel (13)
- 1. Founder @explosion
- 2. Explosion & OxyKodit
- 3. LogMeIn, Meltwater
- 4. Cotonoha
- 5. German Autolabs
- 6. @kouchtv
- 7. @PyThaiNLP
- 8. @explosion
- 9. PKSHA Technology
- 10. @codecentric
- 11. @Semantics3
- 12. BotXO
- 13. SUNY Binghamton - Computer Science
Description
✨ New features and improvements
- NEW: Add alpha support for Macedonian and Sanskrit.
- Update language data for Croatian, Czech, English, Hebrew, Hindi, Indonesian, Swedish, Thai and Turkish.
- Add support for aarch64 and ppc64le on Linux, with binary packages available on conda-forge.
- Fix issue #5610: Make sure `sys.argv` exists.
- Fix issue #5643: Add `ent_id_` to strings serialized with `Doc`.
- Fix issue #5727: Clarify warning for misaligned BILUO tags.
- Fix issue #5768: Improve tag map initialization and updating.
- Fix issue #5794: Improve warnings around normalization tables.
- Fix issue #5796: Update invalid tag maps.
- Fix issue #5799: Remove hard-coded GPU ID from `pretrain`.
- Fix issue #5802: Mark Japanese documents as tagged.
- Fix issue #5823: Fix typo in unit tests.
- Fix issue #5838: Fix `EntityRenderer` to support line breaks (after last entity).
- Fix issue #5843: Prefer earlier spans in `EntityRuler`.
- Fix issue #5849: Allow `Doc.char_span` to snap to token boundaries.
- Fix issue #5853: Fix span boundary handling in Spanish noun chunks.
- Fix issue #5861: Add `Span` index boundary checks.
- Fix issue #5904: Fix typos in comments.
- Fix issue #5910: Update default sentencizer characters for Armenian, Greek and Arabic.
- Fix issue #6014: Fix off-by-one error for best iteration calculation.
- Fix issue #6112: Fix overlapping German noun chunks.
- Fix issue #6148: Identify final `Matcher` pattern node by quantifier.
- Fix issue #6164: Reorder so that the tag map is replaced only if a custom file is provided.
- Fix issue #6218: Fix reproducibility for `TextCategorizer` and `Tok2Vec`.
- Fix issue #6219: Add re-enabled pipe names back to the meta before serializing.
- Fix issue #6300: Fix `on_match` callback and exclude empty match lists from results for `DependencyMatcher`.
- Fix issue #6347: Fix memory leak issues with `beam_parse` (requires `thinc>=7.4.3`).
- Fix issue #6373: Fix textcat reproducibility on GPU (requires `thinc>=7.4.3`).
- Fix issue #6405: Add all vectors to vocab before pruning.
- Fix issue #6413: Use `int8_t` instead of `char` in `Matcher`.
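Issue #5843 concerns how `EntityRuler` chooses between overlapping matches. A minimal sketch of one plausible resolution policy follows, assuming longer spans win and, among equally long spans, the one that starts earlier wins. It is plain Python over hypothetical `(label, start, end)` tuples (token indices, exclusive end). The function name `resolve_overlaps` is illustrative only and is not spaCy's implementation or API.

```python
from typing import List, Tuple

# Hypothetical match format: (label, start_token, end_token), end is exclusive.
Match = Tuple[str, int, int]


def resolve_overlaps(matches: List[Match]) -> List[Match]:
    """Keep non-overlapping matches, preferring longer spans, then earlier ones."""
    # Sort by length (descending), then by start position (ascending),
    # so that between two equally long spans the earlier one is considered first.
    ordered = sorted(matches, key=lambda m: (-(m[2] - m[1]), m[1]))
    taken = set()  # token indices already covered by an accepted span
    kept: List[Match] = []
    for label, start, end in ordered:
        if end <= start:
            continue  # skip empty spans
        span_tokens = set(range(start, end))
        if span_tokens & taken:
            continue  # overlaps a span accepted earlier in the ordering
        taken |= span_tokens
        kept.append((label, start, end))
    # Return the surviving spans in document order.
    return sorted(kept, key=lambda m: m[1])


if __name__ == "__main__":
    # ORG and GPE are both two tokens long and overlap on token 2;
    # the earlier ORG span is kept. PERSON does not overlap anything.
    print(resolve_overlaps([("ORG", 1, 3), ("GPE", 2, 4), ("PERSON", 5, 6)]))
    # [('ORG', 1, 3), ('PERSON', 5, 6)]
```

Sorting once and then accepting spans greedily makes the tie-break explicit in the sort key, so the result does not depend on the order in which the matches were produced.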
Thanks to @abchapman93, @baranitharan2020, @bittlingmayer, @bjascob, @borijang, @BramVanroy, @chopeen, @danielvasic, @delzac, @DuyguA, @erip, @florijanstamenkovic, @graue70, @hiroshi-matsuda-rit, @holubvl3, @idoshr, @jgutix, @KKsharma99, @leyendecker, @lizhe2004, @MartinoMensio, @nipunsadvilkar, @Nuccy90, @oculusrepairo, @rahul1990gupta, @rasyidf, @robertsipek, @SamEdwardes, @snsten, @solarmist, @Stannislav, @tamuhey, @tilusnet, @vha14, @wannaphong, @zaibacu for the pull requests and contributions.
Files
| Name | Size | MD5 |
|---|---|---|
| explosion/spaCy-v2.3.3.zip | 6.2 MB | 2bec4b2fb5cdcf666d273ac0208b140b |
Additional details
Related works
- Is supplement to: https://github.com/explosion/spaCy/tree/v2.3.3 (URL)