huggingface/transformers: ELECTRA, Bad word filters, bugfixes & improvements
Creators
- Thomas Wolf (1)
- Lysandre Debut (2)
- Julien Chaumond (2)
- Victor SANH (1)
- Patrick von Platen
- Aymeric Augustin (3)
- Rémi Louf
- Funtowicz Morgan (4)
- Stefan Schweter
- Denis
- Sam Shleifer (5)
- erenup
- Manuel Romero
- Matt
- Piero Molino
- Grégory Châtel (6)
- Bram Vanroy (7)
- Tim Rault (1)
- Gunnlaugur Thor Briem (8)
- Julien Plu (9)
- Anthony MOI (2)
- Malte Pietsch (10)
- Catalin Voss (11)
- Bilal Khan
- Fei Wang (12)
- Martin Malmsten
- Louis Martin
- Davide Fiocco
- Clement (1)
- Ananya Harsh Jha
- 1. @huggingface
- 2. Hugging Face
- 3. @canalplus
- 4. HuggingFace
- 5. Huggingface
- 6. DisAItek & Intel AI Innovators
- 7. @UGent
- 8. Qlik
- 9. Leboncoin Lab
- 10. deepset
- 11. Stanford University
- 12. University of Southern California
Description
ELECTRA Model (@LysandreJik)
ELECTRA is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens from "fake" input tokens generated by another neural network, similar to the discriminator of a GAN. At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the SQuAD 2.0 dataset.
This release comes with 6 ELECTRA checkpoints:
google/electra-small-discriminator
google/electra-small-generator
google/electra-base-discriminator
google/electra-base-generator
google/electra-large-discriminator
google/electra-large-generator
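As a quick illustration (not part of the original notes), the sketch below loads one of the checkpoints listed above and uses the discriminator to score each token as original or replaced. It assumes the ElectraForPreTraining head and ElectraTokenizer ship with this release; the example sentence is only illustrative.

```python
# Minimal sketch, assuming ElectraForPreTraining is available in this release.
# The discriminator outputs one logit per token; higher values mean the token
# is predicted to be a replacement rather than an original input token.
import torch
from transformers import ElectraForPreTraining, ElectraTokenizer

tokenizer = ElectraTokenizer.from_pretrained("google/electra-small-discriminator")
model = ElectraForPreTraining.from_pretrained("google/electra-small-discriminator")

input_ids = tokenizer.encode("The quick brown fox jumps over the lazy dog", return_tensors="pt")
logits = model(input_ids)[0]                      # shape: (batch_size, sequence_length)
predictions = torch.round(torch.sigmoid(logits))  # 1 = predicted "fake", 0 = predicted "real"
print(predictions)
```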
Related:
- Paper
- Official code
- Models available among the community models
- Docs
Thanks to the author @clarkkev for his help during the implementation.
Bad word filters in generate (@patrickvonplaten)
The generate method now has a bad word filter that lets you specify words that must not appear in the generated sequence.
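A minimal sketch of the filter in use, assuming the new argument is passed as bad_words_ids (a list of token-id sequences); the model, prompt, and banned words are only illustrative.

```python
# Minimal sketch: ban a few words from generated text with the new filter.
# bad_words_ids expects token ids, so each banned word is tokenized first
# (a leading space is added so the ids match mid-sentence GPT-2 tokens).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

bad_words_ids = [tokenizer.encode(" " + word) for word in ["terrible", "boring"]]

input_ids = tokenizer.encode("The movie was", return_tensors="pt")
output = model.generate(input_ids, max_length=20, bad_words_ids=bad_words_ids)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```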
- Decoder input ids are no longer necessary for T5 training (@patrickvonplaten)
- Update encoder and decoder on set_input_embeddings for BART (@sshleifer)
- Use the loaded checkpoint with --do_predict (instead of random initialization) in the PyTorch Lightning scripts (@ethanjperez)
- Clean summarization and translation example testing files for T5 and Bart (@patrickvonplaten)
- Cleaner examples (@julien-c)
- Extensive testing for T5 model (@patrickvonplaten)
- Force model outputs to always have batch_size as their first dim (@patrickvonplaten)
- Fix for continuing training in some scripts (@xeb)
- Resizing embedding matrix before sending it to the optimizer (@ngarneau)
- BertJapaneseTokenizer now accepts options for MeCab (@tamuhey)
- Speed up GELU computation with torch.jit (@mryab); a sketch of the idea follows this list
- Fix argument order of the update_mems function in the TF version (@patrickvonplaten, @dmytyar)
- Split the generate test function into beam search and no beam search variants (@patrickvonplaten)
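The GELU speed-up mentioned above is easy to picture with a small sketch: scripting the activation with torch.jit so the chain of element-wise ops is compiled. This only illustrates the general idea, not the exact code that was merged.

```python
# Sketch of the idea: compile the "new" (tanh-approximation) GELU with TorchScript.
import math
import torch

@torch.jit.script
def gelu_new(x):
    # GPT-2 style tanh approximation of the Gaussian Error Linear Unit
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * torch.pow(x, 3.0))))

print(gelu_new(torch.randn(4)))
```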
Files
- huggingface/transformers-v2.8.0.zip (3.4 MB, md5:a809f09fdaceadc5e1191492adbd4078)
Additional details
Related works
- Is supplement to: https://github.com/huggingface/transformers/tree/v2.8.0