There is a newer version of this record available.

Software Open Access

explosion/spaCy: v2.0.11: Alpha Vietnamese support, fixes to vectors, improved errors and more

Matthew Honnibal; Ines Montani; Henning Peters; Maxim Samsonov; Jim Geovedi; Jim Regan; György Orosz; Søren Lind Kristiansen; Roman; Duygu Altinok; Paul O'Leary McCann; Grégory Howard; Alex; Kit; Sam Bozek; Explosion Bot; Mark Amery; Leif Uwe Vogelsang; GregDubbin; Vadim Mazaev; Pradeep Kumar Tippa; wbwseeker; Wannaphong Phatthiyaphaibun; Magnus Burton; mpuels; Yubing Dong (Tom); thomasO; Ramanan Balakrishnan; Avadh Patel

📊 Help us improve spaCy and take the User Survey 2018!

✨ New features and improvements

  • NEW: Alpha Vietnamese support with tokenization via Pyvi.
  • NEW: Improved system for error messages and warnings. Errors now have unique error codes and are referenced in one place, and all unspecified asserts have been replaced with descriptive errors. See #2163 for implementation details, and let us know if you have any suggestions for errors and warnings in #2164!
  • Improve language data for Polish.
  • Tidy up dependencies and drop six, html5lib, ftfy and requests.
  • Improve efficiency (and potentially accuracy) of beam-search training, by randomly using greedy updates for some sentences. This can be controlled by changing the beam_update_prob entry in nlp.parser.cfg. The default value is 0.5, so 50% of beam updates will be done as greedy updates.
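
The mixing of greedy and beam updates described in the last item can be illustrated with a small, library-free sketch. This is not spaCy's implementation: the function `choose_update_modes` is hypothetical, and it assumes that `beam_update_prob` is the probability that a given sentence gets a beam update, with the remainder falling back to greedy updates. The release note does not say which way the value is interpreted; at the default of 0.5 both readings give the same 50/50 split.

```python
import random


def choose_update_modes(n_sentences, beam_update_prob=0.5, seed=None):
    """Pick an update mode for each sentence (hypothetical helper).

    Assumption: beam_update_prob is the chance of a beam update;
    otherwise the sentence falls back to a cheaper greedy update.
    """
    rng = random.Random(seed)
    modes = []
    for _ in range(n_sentences):
        # rng.random() is uniform on [0, 1), so this branch is taken
        # with probability beam_update_prob.
        if rng.random() < beam_update_prob:
            modes.append("beam")
        else:
            modes.append("greedy")
    return modes


if __name__ == "__main__":
    modes = choose_update_modes(1000, beam_update_prob=0.5, seed=0)
    print("beam:", modes.count("beam"), "greedy:", modes.count("greedy"))
```

With the default of 0.5, roughly half of the sentences end up with the cheaper greedy update, which is where the efficiency gain would come from.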
🔴 Bug fixes
  • Fix issues #1554, #1752, #2159: Fix Token.ent_iob after Doc.merge(), and ensure consistency in Doc.ents.
  • Fix issue #1660: Fix loading of multiple vector models.
  • Fix issue #1967: Allow entity types with dashes.
  • Fix issue #2032: Fix accidentally quadratic runtime in Vocab.set_vector.
  • Fix issue #2050: Correct mistakes in Italian lemmatizer data.
  • Fix issue #2073: Make Token.set_extension work as expected.
  • Fix issues #2100, #2151, #2181: Drop six and html5lib and prevent dependency conflict with TensorFlow / Keras.
  • Fix issue #2101: Improve error message if token text is empty string.
  • Fix issue #2121: Fix Language.to_bytes and pickling in Thinc.
  • Fix issue #2156: Fix hashtag example in Matcher docs.
  • Fix issue #2177: Don't raise error in set_extension if getter and setter are specified or if default=None, and add error if setter is specified with no getter.
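
The argument rules described for set_extension in #2177 can be sketched as a standalone check. This is only an illustration, not spaCy's code: `check_extension_args` is a hypothetical helper, the error messages are made up, and the requirement that exactly one of default, getter or method is given is an assumption that the release note does not state.

```python
def check_extension_args(**kwargs):
    """Validate set_extension-style keyword arguments (hypothetical)."""
    getter = kwargs.get("getter")
    setter = kwargs.get("setter")
    method = kwargs.get("method")
    # A setter on its own is rejected: it needs a getter to pair with.
    if setter is not None and getter is None:
        raise ValueError("A setter was specified without a getter.")
    # 'default' counts when it is passed explicitly, even as None.
    # A getter, with or without a setter, counts as a single option.
    options = ("default" in kwargs, getter is not None, method is not None)
    n_defined = sum(options)
    if n_defined != 1:  # assumption: exactly one option is required
        raise ValueError(
            "Expected exactly one of default, getter or method, "
            "got {}.".format(n_defined)
        )
    return kwargs.get("default"), method, getter, setter


if __name__ == "__main__":
    # Both of these are accepted under the rules above.
    check_extension_args(default=None)
    check_extension_args(getter=lambda obj: 1, setter=lambda obj, v: None)
    try:
        check_extension_args(setter=lambda obj, v: None)
    except ValueError as err:
        print("rejected:", err)
```

Checking for the presence of the `default` key, rather than comparing its value to None, is what lets an explicit default=None pass.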
📖 Documentation and examples

👥 Contributors

Thanks to @jimregan, @justindujardin, @trungtv, @katrinleinweber and @skrcode for the pull requests and contributions.

Files (19.0 MB)
explosion/spaCy-v2.0.11.zip (19.0 MB)
md5:26975e7cbd584f8795f42d6bbc70ed69
