Published April 23, 2021 | Version v3.0.6

explosion/spaCy: v3.0.6: assemble CLI, Matcher alignments, training from streamed corpora and many bug fixes

Description

✨ New features and improvements

  • New assemble CLI command for assembling a pipeline from a config without training.
  • Add support for match alignments in the Matcher to align matched tokens with matcher patterns.
  • Add support for training from streamed corpora.
  • Add support for W&B data and model checkpoint logging and versioning in spacy.WandbLogger.v2.
  • Extend Scorer.score_spans to support overlapping and unlabeled spans.
  • Update debug data for new v3 components.
  • Improve language data for Italian.
  • Various improvements to error handling and UX.
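As a rough illustration of the Matcher alignments item above, the sketch below assumes that alignments are requested by passing `with_alignments=True` when calling the matcher, and that each match then comes back as a 4-tuple whose last element gives, for every matched token, the index of the pattern entry that matched it. The rule name `A_THEN_B` and the toy tokens are made up for the example.

```python
from spacy.matcher import Matcher
from spacy.tokens import Doc
from spacy.vocab import Vocab

# A bare vocab is enough here: the pattern only looks at the token text.
vocab = Vocab()
matcher = Matcher(vocab)

# Pattern entry 0 matches one or more "a" tokens, entry 1 matches a single "b".
pattern = [{"ORTH": "a", "OP": "+"}, {"ORTH": "b"}]
# greedy="LONGEST" keeps only the longest of the overlapping candidate matches.
matcher.add("A_THEN_B", [pattern], greedy="LONGEST")

doc = Doc(vocab, words=["a", "a", "a", "b"])

# Assumption: with_alignments=True adds a per-token list of pattern indices
# as a fourth element of each match tuple.
matches = matcher(doc, with_alignments=True)

for match_id, start, end, alignments in matches:
    rule = vocab.strings[match_id]
    for token, pattern_index in zip(doc[start:end], alignments):
        print(rule, token.text, "-> pattern entry", pattern_index)
```

The alignment list has one entry per matched token, so it can be zipped with `doc[start:end]` to recover which part of a pattern with operators such as `+` consumed which token.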

🔴 Bug fixes

  • Fix issue #7408: Add vocab kwarg to spacy.load.
  • Fix issue #7419: Exclude user hooks in displacy conversion.
  • Fix issue #7421: Update --code usage in CLI commands.
  • Fix issue #7424: Preserve sent starts on retokenization without parse.
  • Fix issue #7440: Fix pymorphy2 lookup lemmatizer.
  • Fix issue #7471: Improve warnings related to listening components.
  • Fix issue #7488: Fix upstream check in pretraining.
  • Fix issue #7489: Support callbacks entry points.
  • Fix issue #7497: Merge doc.spans in Doc.from_docs().
  • Fix issue #7528: Preserve user data for DependencyMatcher on spans.
  • Fix issue #7557: Fix __add__ method for PRFScore.
  • Fix issue #7574: Fix conversion of custom extension data in Span.as_doc and Doc.from_docs.
  • Fix issue #7620: Fix replace_listeners in configs.
  • Fix issue #7626: Fix vectors data on GPU.
  • Fix issue #7630: Update NEL for entities crossing sentence boundaries.
  • Fix issue #7631: Fix parser sourcing in NER converter.
  • Fix issue #7642: Fix handling of hyphen string value in config files.
  • Fix issue #7655: Fix sent starts when converting from v2 JSON training format.
  • Fix issue #7674: Fix handling of unknown tokens in StaticVectors.
  • Fix issue #7690: Fix pickling of Lemmatizer.
  • Fix issue #7749: Update Tokenizer.explain for special cases in v3.
  • Fix issue #7755: Fix config parsing of ints/strings.
  • Fix issue #7836: Fix tokenizer cache flushing.
  • Fix issue #7847: Fix handling of boolean values in Example.from_dict for sent starts.
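Two of the fixes above are easy to check by hand. The sketch below assumes a blank English pipeline and a made-up span group key, `greetings`; it shows span groups surviving `Doc.from_docs()` (#7497) and two `PRFScore` objects being added together (#7557). The counts passed to `PRFScore` are arbitrary illustration values, not real evaluation results.

```python
import spacy
from spacy.scorer import PRFScore
from spacy.tokens import Doc

# A blank pipeline only tokenizes, so no trained model has to be installed.
nlp = spacy.blank("en")

doc1 = nlp("Hello world")
doc2 = nlp("Good morning")

# "greetings" is a hypothetical span group key chosen for this example.
doc1.spans["greetings"] = [doc1[0:1]]
doc2.spans["greetings"] = [doc2[0:2]]

# With the fix for #7497, span groups from the input docs are merged
# into the combined doc instead of being dropped.
merged = Doc.from_docs([doc1, doc2])
merged_texts = [span.text for span in merged.spans["greetings"]]
print(merged.text, merged_texts)

# With the fix for #7557, adding two PRFScore objects sums their
# true positive, false positive and false negative counts.
batch_a = PRFScore(tp=2, fp=1, fn=0)
batch_b = PRFScore(tp=1, fp=0, fn=2)
total = batch_a + batch_b
print(total.tp, total.fp, total.fn)
```

Summing the raw counts first and reading precision and recall off the total gives a micro-average over the batches, which is usually what is wanted when scores are accumulated across batches.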

📖 Documentation and examples

  • Add documentation for legacy functions and architectures.
  • Add documentation for pretrained pipeline design.
  • Add more details about pipe and multiprocessing.
  • Fix various typos and inconsistencies.

👥 Contributors

Thanks to @alvaroabascar, @armsp, @AyushExel, @BramVanroy, @broaddeep, @bryant1410, @bsweileh, @dpalmasan, @Findus23, @graue70, @jaidevd, @koaning, @langdonholmes, @m0canu1, @meghanabhange, @paoloq, @plison, @richardpaulhudson, @SamEdwardes, @Stannislav for the pull requests and contributions!

Files

explosion/spaCy-v3.0.6.zip (10.0 MB)
md5:649b6bcb7bb22c8f4b1d23e233fc16e8
