
Published December 7, 2021 | Version v3.2.1
Software | Open access

explosion/spaCy: v3.2.1: doc_cleaner component, new Matcher attributes, bug fixes and more

Description

✨ New features and improvements
  • NEW: doc_cleaner component for removing doc.tensor, doc._.trf_data or other Doc attributes at the end of the pipeline to reduce the size of output docs.
  • NEW: ENT_ID and ENT_KB_ID as Matcher pattern attributes.
  • Support kb_id for entities in displaCy from Doc input.
  • Add Span.sents property for spans that extend over more than one sentence.
  • Add EntityRuler.remove to remove patterns by id.
  • Make the Tagger neg_prefix configurable.
  • Use Language.pipe in Language.evaluate for more efficient processing.
  • Test suite updates: move regression tests into core test modules with pytest markers for issue numbers, extend tests for languages with alpha support.
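The doc_cleaner component can be tried with a minimal sketch like the one below. It assumes spaCy v3.2.1 or later is installed and uses a blank English pipeline, so no trained model is needed. The extension `debug_info` and the component `add_debug_info` are hypothetical stand-ins for bulky per-doc data such as `doc._.trf_data`; keys prefixed with `_.` in the `attrs` config refer to custom extensions.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Hypothetical extension standing in for large per-doc data (e.g. doc._.trf_data)
Doc.set_extension("debug_info", default=None, force=True)


@Language.component("add_debug_info")
def add_debug_info(doc: Doc) -> Doc:
    # Hypothetical component that attaches data we don't want in the output docs
    doc._.debug_info = {"n_tokens": len(doc)}
    return doc


nlp = spacy.blank("en")
nlp.add_pipe("add_debug_info")
# doc_cleaner runs last and resets each listed attribute to the given value
nlp.add_pipe("doc_cleaner", config={"attrs": {"_.debug_info": None}})

doc = nlp("This is a sentence.")
print(doc._.debug_info)  # None: the cleaner reset the extension
print(nlp.pipe_names)  # ['add_debug_info', 'doc_cleaner']
```

Placing the cleaner at the end of the pipeline means earlier components can still read the data; only the docs returned to the caller are slimmed down.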
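The new ENT_KB_ID Matcher attribute can be illustrated with a short sketch, assuming spaCy v3.2.1 or later. The entities are annotated by hand on a blank English pipeline, and the knowledge-base IDs ("Q95", "Q72") are illustrative Wikidata-style identifiers, not output from an entity linker. ENT_ID is expected to work the same way against the tokens' entity IDs.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Google opened a new office in Zurich")

# Annotate entities by hand; the KB IDs are illustrative placeholders
doc.ents = [
    Span(doc, 0, 1, label="ORG", kb_id="Q95"),
    Span(doc, 6, 7, label="GPE", kb_id="Q72"),
]

matcher = Matcher(nlp.vocab)
# Match tokens whose entity is linked to the KB ID "Q95"
matcher.add("GOOGLE_BY_KB_ID", [[{"ENT_KB_ID": "Q95"}]])

matches = matcher(doc)
print([doc[start:end].text for _, start, end in matches])  # ['Google']
```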
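For the Span.sents property, a minimal sketch, assuming spaCy v3.2.1 or later, uses the rule-based sentencizer on a blank English pipeline to set sentence boundaries and then takes a span that crosses one of them. The span here yields each sentence that it overlaps.

```python
import spacy

nlp = spacy.blank("en")
# The rule-based sentencizer sets sentence boundaries without a trained model
nlp.add_pipe("sentencizer")

doc = nlp("This is one. This is two. This is three.")
span = doc[2:6]  # "one. This is" crosses the first sentence boundary

# Span.sents yields the sentences that the span overlaps
print([sent.text for sent in span.sents])  # ['This is one.', 'This is two.']
```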
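EntityRuler.remove can be sketched as follows, assuming spaCy v3.2.1 or later and a blank English pipeline. Each pattern is given an "id" (the values "apple" and "san-francisco" are arbitrary examples). Removing by id drops every pattern registered under that id, so the ruler no longer produces the corresponding entity.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
# Each pattern gets an "id" so it can be removed later
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple", "id": "apple"},
    {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}], "id": "san-francisco"},
])

text = "Apple is opening an office in San Francisco"
before = [(ent.text, ent.label_) for ent in nlp(text).ents]
print(before)  # [('Apple', 'ORG'), ('San Francisco', 'GPE')]

# Remove every pattern registered under the id "apple"
ruler.remove("apple")
after = [(ent.text, ent.label_) for ent in nlp(text).ents]
print(after)  # [('San Francisco', 'GPE')]
```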
🔴 Bug fixes
  • Fix issue #9638: Make JsonlCorpus path optional again.
  • Fix issue #9654: Fix spancat for empty docs and zero suggestions.
  • Fix issue #9658: Improve error message for incorrect .jsonl paths in EntityRuler.
  • Fix issue #9674: Fix language-specific factory handling in package CLI.
  • Fix issue #9694: Convert labels to strings for README in package CLI.
  • Fix issue #9697: Exclude strings from source vector checks.
  • Fix issue #9701: Allow Scorer.score_spans to handle predicted docs with missing annotation.
  • Fix issue #9722: Initialize parser from reference parse rather than aligned example.
  • Fix issue #9764: Set annotations more efficiently in tagger and morphologizer.
📖 Documentation and examples

👥 Contributors

@adrianeboyd, @danieldk, @DuyguA, @honnibal, @ines, @ljvmiranda921, @narayanacharya6, @nrodnova, @Pantalaymon, @polm, @richardpaulhudson, @svlandeg, @thiippal, @Vishnunkumar

Files

explosion/spaCy-v3.2.1.zip (10.8 MB, md5:88d0df7c527b86ca4a3f5af3501adf2e)
