explosion/spaCy: v2.2.3: Tokenizer.explain, Korean base support, dependency scores per label and bug fixes

doi:10.5281/zenodo.3550036

Published November 21, 2019 | Version v2.2.3

Software Open


1. Founder @explosion
2. OxyKodit
3. LogMeIn, Meltwater
4. German Autolabs
5. @kouchtv
6. @PyThaiNLP
7. @explosion
8. @Semantics3
9. mollerhoj
10. SUNY Binghamton - Computer Science

✨ New features and improvements

NEW: Tokenizer.explain method to see which rule or pattern was matched.

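from spacy.lang.en import English
nlp = English()  # a blank English pipeline is enough to get the tokenizer
tok_exp = nlp.tokenizer.explain("(don't)")  # list of (rule_name, token_text) tuples
assert [t[0] for t in tok_exp] == ["PREFIX", "SPECIAL-1", "SPECIAL-2", "SUFFIX"]
assert [t[1] for t in tok_exp] == ["(", "do", "n't", ")"]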
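The example above shows explain's labels. To make them concrete, here is a toy, stdlib-only sketch of how a prefix/suffix/special-case tokenizer loop can record which rule produced each piece. It is not spaCy's implementation. The regexes, the SPECIAL_CASES table and the toy_explain name are hypothetical. The sketch has no infix or token_match handling, and the TOKEN label for leftover text is its own choice.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical, much-simplified rules for illustration only; spaCy's real
# English prefix/suffix patterns and tokenizer exceptions are far richer.
PREFIX_RE = re.compile(r"^[(\[\"']")
SUFFIX_RE = re.compile(r"[)\]\"'.,!?]$")
SPECIAL_CASES: Dict[str, List[str]] = {"don't": ["do", "n't"]}


def toy_explain(text: str) -> List[Tuple[str, str]]:
    """Return (rule_name, token_text) pairs for a tiny rule set."""
    tokens: List[Tuple[str, str]] = []
    for substring in text.split():  # whitespace splitting comes first
        suffixes: List[Tuple[str, str]] = []
        while substring:
            if substring in SPECIAL_CASES:
                # A special case expands into a fixed list of tokens.
                tokens.extend(
                    (f"SPECIAL-{i + 1}", piece)
                    for i, piece in enumerate(SPECIAL_CASES[substring])
                )
                break
            prefix = PREFIX_RE.search(substring)
            if prefix:
                tokens.append(("PREFIX", prefix.group()))
                substring = substring[prefix.end():]
                continue
            suffix = SUFFIX_RE.search(substring)
            if suffix:
                suffixes.append(("SUFFIX", suffix.group()))
                substring = substring[: suffix.start()]
                continue
            # Nothing else matched: the remainder is a plain token.
            tokens.append(("TOKEN", substring))
            break
        # Suffixes were peeled off right to left, so restore reading order.
        tokens.extend(reversed(suffixes))
    return tokens


print(toy_explain("(don't)"))
```

Calling toy_explain("(don't)") reproduces the labels from the release-note example. Calling toy_explain("Hello, world!") returns [("TOKEN", "Hello"), ("SUFFIX", ","), ("TOKEN", "world"), ("SUFFIX", "!")].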

NEW: Official Python 3.8 wheels for spaCy and its dependencies.
Base language support for Korean.
Add Scorer.las_per_type (labelled dependency scores per label).
Rework Chinese language initialization and tokenization.
Improve language data for Luxembourgish.
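Scorer.las_per_type breaks the labelled attachment score down by dependency label. The function below is a rough sketch of what such a per-label score measures. It assumes arcs are given as hypothetical (child_index, head_index, label) tuples and computes precision, recall and F-score for each label. An arc only counts as correct if both its head and its label match the gold arc. This is not spaCy's Scorer code, and the dictionary layout returned by Scorer.las_per_type may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical arc representation: (child_index, head_index, dep_label).
Arc = Tuple[int, int, str]


def las_per_label(gold: Iterable[Arc], pred: Iterable[Arc]) -> Dict[str, Dict[str, float]]:
    """Per-label precision/recall/F-score (as percentages) over labelled arcs."""
    gold_set: Set[Arc] = set(gold)
    pred_set: Set[Arc] = set(pred)
    tp: Dict[str, int] = defaultdict(int)      # correct arcs per label
    n_gold: Dict[str, int] = defaultdict(int)  # gold arcs per label
    n_pred: Dict[str, int] = defaultdict(int)  # predicted arcs per label
    for arc in gold_set:
        n_gold[arc[2]] += 1
    for arc in pred_set:
        n_pred[arc[2]] += 1
        if arc in gold_set:  # head AND label must both be right
            tp[arc[2]] += 1
    scores: Dict[str, Dict[str, float]] = {}
    for label in sorted(set(n_gold) | set(n_pred)):
        p = tp[label] / n_pred[label] if n_pred[label] else 0.0
        r = tp[label] / n_gold[label] if n_gold[label] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = {"p": p * 100, "r": r * 100, "f": f * 100}
    return scores


gold = [(0, 1, "nsubj"), (1, 1, "ROOT"), (2, 1, "dobj")]
pred = [(0, 1, "nsubj"), (1, 1, "ROOT"), (2, 1, "attr")]
print(las_per_label(gold, pred))
```

In this example the last arc has the right head but the wrong label. nsubj and ROOT therefore score 100.0, while dobj and attr score 0.0.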

🔴 Bug fixes

Fix issues #4573, #4645: Improve tokenizer usage docs.
Fix issue #4575: Add error in debug-data if no dev docs are available.
Fix issue #4582: Make as_tuples=True in Language.pipe work with multiprocessing.
Fix issue #4590: Correctly call on_match in DependencyMatcher.
Fix issue #4593: Build wheels for Python 3.8.
Fix issue #4604: Fix realloc in Retokenizer.split.
Fix issue #4656: Fix conllu2json converter when -n > 1.
Fix issue #4662: Fix Language.evaluate for components without .pipe method.
Fix issue #4670: Ensure EntityRuler is deserialized correctly from disk.
Fix issue #4680: Raise error if non-string labels are added to Tagger or TextCategorizer.
Fix issue #4691: Make Vectors.find return keys in correct order.
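Fix #4582 covers Language.pipe with as_tuples=True, where each input is a (text, context) pair, used together with multiprocessing (enabled via n_process). The sketch below shows the general pattern in plain Python:

- split off the contexts,
- process the texts in parallel,
- zip the results back with their contexts in input order.

It is not spaCy's code. The names pipe_as_tuples and process are hypothetical. A thread pool stands in for worker processes so the example runs without spaCy.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, Iterator, List, Tuple


def pipe_as_tuples(
    process: Callable[[str], Any],
    data: Iterable[Tuple[str, Any]],
    n_workers: int = 2,
) -> Iterator[Tuple[Any, Any]]:
    """Yield (result, context) pairs in the same order as the input.

    Contexts are never sent to the workers: they are held back and
    re-zipped with the results, which is safe because Executor.map
    returns results in input order.
    """
    pairs: List[Tuple[str, Any]] = list(data)
    texts = [text for text, _ in pairs]
    contexts = [context for _, context in pairs]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(process, texts))
    yield from zip(results, contexts)


for result, context in pipe_as_tuples(str.upper, [("a", {"id": 1}), ("b", {"id": 2})]):
    print(context["id"], result)
```

ProcessPoolExecutor.map also preserves input order, so swapping it in keeps each context aligned with its result. This holds as long as process and the texts can be pickled.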

📖 Documentation and examples

Fix various typos and inconsistencies.

👥 Contributors

Thanks to @yash1994, @walterhenry, @prilopes, @f11r, @questoph, @erip, @richardpaulhudson and @GuiGel for the pull requests and contributions.

Files

Name	Size	MD5
explosion/spaCy-v2.2.3.zip	5.8 MB	4a60c10af9150ff3a0da33eb863fe69f

Additional details

Is supplement to: https://github.com/explosion/spaCy/tree/v2.2.3
