Published April 4, 2018 | Version v2.0.11
Software
Open
explosion/spaCy: v2.0.11: Alpha Vietnamese support, fixes to vectors, improved errors and more
Creators
- Matthew Honnibal [1]
- Ines Montani [1]
- Henning Peters [2]
- Maxim Samsonov
- Jim Geovedi
- Jim Regan
- György Orosz [3]
- Søren Lind Kristiansen
- Roman [4]
- Duygu Altinok [5]
- Paul O'Leary McCann
- Grégory Howard
- Alex [6]
- Kit [7]
- Sam Bozek
- Explosion Bot [8]
- Mark Amery
- Leif Uwe Vogelsang
- GregDubbin
- Vadim Mazaev
- Pradeep Kumar Tippa [9]
- wbwseeker
- Wannaphong Phatthiyaphaibun [10]
- Magnus Burton
- mpuels
- Yubing Dong (Tom) [11]
- thomasO
- Ramanan Balakrishnan [12]
- Avadh Patel [13]
- 1. Founder @explosion
- 2. RiseML
- 3. LogMeIn, Meltwater
- 4. LinguaLeo
- 5. 4Com
- 6. NSU
- 7. Founder @talecamp
- 8. @explosion
- 9. @edgeverve
- 10. @PyThaiNLP
- 11. Quora
- 12. @Semantics3
- 13. SUNY Binghamton - Computer Science
Description
Help us improve spaCy and take the User Survey 2018!
New features and improvements
- NEW: Alpha Vietnamese support with tokenization via Pyvi.
- NEW: Improved system for error messages and warnings. Errors now have unique error codes and are referenced in one place, and all unspecified `assert`s have been replaced with descriptive errors. See #2163 for implementation details, and let us know if you have any suggestions for errors and warnings in #2164!
- Improve language data for Polish.
- Tidy up dependencies and drop `six`, `html5lib`, `ftfy` and `requests`.
- Improve efficiency (and potentially accuracy) of beam-search training by randomly using greedy updates for some sentences. This can be controlled by changing the `beam_update_prob` entry in `nlp.parser.cfg`. The default value is 0.5, so 50% of beam updates will be done as greedy updates.
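The mixing of greedy and beam updates described above can be pictured with a short simulation. This is a minimal sketch, not spaCy's training code: the `choose_update` helper and the plain `cfg` dict are hypothetical stand-ins for `nlp.parser.cfg`, and the direction of the comparison (a random draw below `beam_update_prob` selects a greedy update) is an assumption based on the wording of the note.

```python
import random


def choose_update(cfg, rng):
    """Pick 'greedy' or 'beam' for one training example (hypothetical helper)."""
    # Assumption: a draw below beam_update_prob swaps the beam update for a greedy one.
    prob = cfg.get("beam_update_prob", 0.5)
    return "greedy" if rng.random() < prob else "beam"


# Plain dict standing in for nlp.parser.cfg; the values are illustrative.
cfg = {"beam_width": 4, "beam_update_prob": 0.5}
rng = random.Random(0)

choices = [choose_update(cfg, rng) for _ in range(10000)]
greedy_share = choices.count("greedy") / len(choices)
print(f"share of greedy updates: {greedy_share:.2f}")
```

With the default of 0.5, roughly half of the simulated updates come out greedy. Under the same assumption, 0.0 would mean every update uses the beam and 1.0 would mean every update is greedy.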
Bug fixes
- Fix issue #1554, #1752, #2159: Fix `Token.ent_iob` after `Doc.merge()`, and ensure consistency in `Doc.ents`.
- Fix issue #1660: Fix loading of multiple vector models.
- Fix issue #1967: Allow entity types with dashes.
- Fix issue #2032: Fix accidentally quadratic runtime in `Vocab.set_vector`.
- Fix issue #2050: Correct mistakes in Italian lemmatizer data.
- Fix issue #2073: Make `Token.set_extension` work as expected.
- Fix issue #2100, #2151, #2181: Drop `six` and `html5lib` and prevent dependency conflict with TensorFlow / Keras.
- Fix issue #2101: Improve error message if token text is empty string.
- Fix issue #2121: Fix `Language.to_bytes` and pickling in Thinc.
- Fix issue #2156: Fix hashtag example in `Matcher` docs.
- Fix issue #2177: Don't raise error in `set_extension` if `getter` and `setter` are specified or if `default=None`, and add error if `setter` is specified with no `getter`.
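The argument rules named in the #2177 fix can be summarised as a small check. This is a sketch only, not spaCy's implementation: `check_extension_args` is a hypothetical function, the error type and message are made up for illustration, and it covers just the three cases listed in that entry.

```python
def check_extension_args(default=None, getter=None, setter=None):
    """Validate extension arguments under the #2177 rules (hypothetical helper)."""
    # Rule from the note: a setter with no getter is an error.
    if setter is not None and getter is None:
        raise ValueError("setter specified without a getter")
    # getter and setter together are accepted, and so is default=None.
    return {"default": default, "getter": getter, "setter": setter}


store = {}  # illustrative backing storage for the getter/setter pair

both = check_extension_args(
    getter=lambda key: store.get(key),
    setter=lambda key, value: store.update({key: value}),
)
plain = check_extension_args(default=None)

try:
    check_extension_args(setter=lambda key, value: None)
    rejected = False
except ValueError:
    rejected = True
```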
Documentation and examples
- Add example for TensorBoard's standalone embedding projector.
- Improve example for training a new entity type.
- Add formal `CITATION` for assigning a DOI via Zenodo.
Thanks to @jimregan, @justindujardin, @trungtv, @katrinleinweber and @skrcode for the pull requests and contributions.
Files
Name | Size |
---|---|
explosion/spaCy-v2.0.11.zip (md5:26975e7cbd584f8795f42d6bbc70ed69) | 19.0 MB |
Additional details
Related works
- Is supplement to: https://github.com/explosion/spaCy/tree/v2.0.11 (URL)