explosion/spaCy: v3.0.6: assemble CLI, Matcher alignments, training from streamed corpora and many bug fixes

doi:10.5281/zenodo.4715444

Published April 23, 2021 | Version v3.0.6

Resource type: Software | Access: Open


1. Founder @explosion
2. Explosion & OxyKodit
3. Cotonoha
4. LogMeIn, Meltwater
5. German Autolabs
6. @kouchtv
7. @PyThaiNLP
8. PKSHA Technology
9. @explosion
10. @codecentric
11. @Semantics3

✨ New features and improvements

New assemble CLI command for assembling a pipeline from a config without training.
Add support for match alignments in the Matcher to align matched tokens with matcher patterns.
Add support for training from streamed corpora.
Add support for W&B data and model checkpoint logging and versioning in spacy.WandbLogger.v2.
Extend Scorer.score_spans to support overlapping and unlabeled spans.
Update debug data for new v3 components.
Improve language data for Italian.
Various improvements to error handling and UX.
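
As a rough illustration of the match alignments listed above, the sketch below assumes the feature is exposed through a with_alignments keyword argument when calling the Matcher. The pattern, the rule name "ABC" and the sample text are invented for this example and do not come from the release notes.

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline only tokenizes, so no trained model is required.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Hypothetical pattern: one "a", one or more "b", then one "c".
pattern = [{"ORTH": "a"}, {"ORTH": "b", "OP": "+"}, {"ORTH": "c"}]
matcher.add("ABC", [pattern])

doc = nlp("a b b c")

# Assumption: with_alignments=True makes each match a 4-tuple whose last
# item gives, for every matched token, the index of the pattern entry
# that the token satisfied.
matches = matcher(doc, with_alignments=True)
for match_id, start, end, alignments in matches:
    for token, pattern_index in zip(doc[start:end], alignments):
        print(token.text, "->", pattern[pattern_index])
```

The alignment list has one entry per matched token, which makes it possible to tell which tokens were consumed by a quantified pattern entry such as the "b" entry here.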

🔴 Bug fixes

Fix issue #7408: Add vocab kwarg to spacy.load.
Fix issue #7419: Exclude user hooks in displacy conversion.
Fix issue #7421: Update --code usage in CLI commands.
Fix issue #7424: Preserve sent starts on retokenization without parse.
Fix issue #7440: Fix pymorphy2 lookup lemmatizer.
Fix issue #7471: Improve warnings related to listening components.
Fix issue #7488: Fix upstream check in pretraining.
Fix issue #7489: Support callbacks entry points.
Fix issue #7497: Merge doc.spans in Doc.from_docs().
Fix issue #7528: Preserve user data for DependencyMatcher on spans.
Fix issue #7557: Fix __add__ method for PRFScore.
Fix issue #7574: Fix conversion of custom extension data in Span.as_doc and Doc.from_docs.
Fix issue #7620: Fix replace_listeners in configs.
Fix issue #7626: Fix vectors data on GPU.
Fix issue #7630: Update NEL for entities crossing sentence boundaries.
Fix issue #7631: Fix parser sourcing in NER converter.
Fix issue #7642: Fix handling of hyphen string value in config files.
Fix issue #7655: Fix sent starts when converting from v2 JSON training format.
Fix issue #7674: Fix handling of unknown tokens in StaticVectors.
Fix issue #7690: Fix pickling of Lemmatizer.
Fix issue #7749: Update Tokenizer.explain for special cases in v3.
Fix issue #7755: Fix config parsing of ints/strings.
Fix issue #7836: Fix tokenizer cache flushing.
Fix issue #7847: Fix handling of boolean values in Example.from_dict for sent starts.
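
To make one of the fixes above concrete, here is a minimal sketch of what the #7497 entry (merging doc.spans in Doc.from_docs()) looks like from the user's side. It assumes spaCy v3.0.6 or later; the span group key "greetings" and the sample sentences are made up for illustration.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Two short docs, each with one span stored under the same (hypothetical) key.
doc1 = nlp("Hello world")
doc1.spans["greetings"] = [doc1[0:1]]
doc2 = nlp("Goodbye moon")
doc2.spans["greetings"] = [doc2[0:1]]

# Concatenate the docs; the span groups of the inputs are expected to be
# carried over into the merged doc with adjusted offsets.
merged = Doc.from_docs([doc1, doc2])
merged_texts = [span.text for span in merged.spans["greetings"]]

print(merged.text)
print(merged_texts)
```

Spans stored under the same key in different input docs end up in a single span group on the merged doc, so downstream code can keep reading merged.spans by key.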

📖 Documentation and examples

Add documentation for legacy functions and architectures.
Add documentation for pretrained pipeline design.
Add more details about pipe and multiprocessing.
Fix various typos and inconsistencies.

👥 Contributors

Thanks to @alvaroabascar, @armsp, @AyushExel, @BramVanroy, @broaddeep, @bryant1410, @bsweileh, @dpalmasan, @Findus23, @graue70, @jaidevd, @koaning, @langdonholmes, @m0canu1, @meghanabhange, @paoloq, @plison, @richardpaulhudson, @SamEdwardes, @Stannislav for the pull requests and contributions!

Files

Name	Size
explosion/spaCy-v3.0.6.zip (md5:649b6bcb7bb22c8f4b1d23e233fc16e8)	10.0 MB

Additional details

Is supplement to: https://github.com/explosion/spaCy/tree/v3.0.6 (URL)

	All versions	This version
Views	22,763	195
Downloads	701	25
Data volume	15.1 GB	260.8 MB
