Software · Open Access
Wolf, Thomas; Debut, Lysandre; Sanh, Victor; Chaumond, Julien; Delangue, Clement; Moi, Anthony; Cistac, Perric; Ma, Clara; Jernite, Yacine; Plu, Julien; Xu, Canwen; Le Scao, Teven; Gugger, Sylvain; Drame, Mariama; Lhoest, Quentin; Rush, Alexander M.
TrOCR and VisionEncoderDecoderModel

One new model is released as part of the TrOCR implementation: TrOCRForCausalLM, in PyTorch. It comes along with a new VisionEncoderDecoderModel class, which allows mixing and matching any vision Transformer encoder with any text Transformer as decoder, similar to the existing SpeechEncoderDecoderModel class.
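A minimal sketch of the mix-and-match API (the checkpoint pairing here is illustrative, not taken from the release notes):

```python
from transformers import VisionEncoderDecoderModel

# Combine a ViT encoder with a BERT decoder into a single
# image-to-text model (illustrative checkpoint names).
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k", "bert-base-uncased"
)
```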
The TrOCR model was proposed in TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models, by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
The TrOCR model consists of an image transformer encoder and an autoregressive text transformer to perform optical character recognition in an end-to-end manner.
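A short inference sketch; the blank image is a stand-in for a real text-line image, and the checkpoint is one of the TrOCR checkpoints linked below:

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

image = Image.new("RGB", (384, 384), "white")  # placeholder; use a real text-line image
pixel_values = processor(images=image, return_tensors="pt").pixel_values

generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```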
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?other=trocr

SEW & SEW-D
SEW and SEW-D (Squeezed and Efficient Wav2Vec) were proposed in Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
SEW and SEW-D models use a Wav2Vec-style feature encoder and introduce temporal downsampling to reduce the sequence length fed to the transformer encoder. SEW-D additionally replaces the transformer encoder with a DeBERTa one. Both models achieve significant inference speedups without sacrificing speech recognition quality.
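A minimal feature-extraction sketch; the `asapp/sew-tiny-100k` checkpoint name is an assumption (one of the SEW checkpoints published on the Hub), and the random waveform stands in for real audio:

```python
import torch
from transformers import SEWModel

# Assumed checkpoint name; see the SEW checkpoints on the Hub.
model = SEWModel.from_pretrained("asapp/sew-tiny-100k")

waveform = torch.randn(1, 16000)  # random stand-in for 1 s of 16 kHz audio
hidden_states = model(waveform).last_hidden_state
# Thanks to temporal downsampling, hidden_states is shorter along the
# time axis than a comparable Wav2Vec2 model's output.
```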
DistilHuBERT

DistilHuBERT was proposed in DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT, by Heng-Jui Chang, Shu-wen Yang, Hung-yi Lee.
DistilHuBERT is a distilled version of the HuBERT model. Using only two transformer layers, the model scores competitively on the SUPERB benchmark tasks.
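Assuming the checkpoint linked below ships with the existing HuBERT architecture (an assumption here, not stated in the release notes), it can be loaded with the stock Auto classes; a minimal sketch:

```python
import torch
from transformers import AutoModel

# Load the DistilHuBERT checkpoint; AutoModel resolves the architecture
# from the checkpoint's config.
model = AutoModel.from_pretrained("ntu-spml/distilhubert")

waveform = torch.randn(1, 16000)  # random stand-in for 1 s of 16 kHz audio
features = model(waveform).last_hidden_state
```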
A compatible checkpoint is available on the Hub: https://huggingface.co/ntu-spml/distilhubert

TensorFlow improvements
Several bug fixes and UX improvements for TensorFlow.

Keras callback
Introduction of a Keras callback, PushToHubCallback, to push to the Hub each epoch or after a given number of steps.
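A minimal sketch of wiring the callback into Keras training; the model name and target repo are placeholders:

```python
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
from transformers.keras_callbacks import PushToHubCallback

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased")

# Push a checkpoint to the Hub at the end of every epoch; with
# save_strategy="steps" and save_steps=N it pushes every N steps instead.
callback = PushToHubCallback(
    output_dir="./model_save",
    tokenizer=tokenizer,               # uploaded alongside the weights
    hub_model_id="my-user/my-model",   # hypothetical target repo
)

# model.fit(train_dataset, callbacks=[callback])  # train_dataset: your tokenized tf.data.Dataset
```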
Encoder-decoder framework

The encoder-decoder framework is now available in TensorFlow, allowing mixing and matching different encoders and decoders together into a single encoder-decoder architecture!
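A minimal sketch (checkpoint names are illustrative):

```python
from transformers import TFEncoderDecoderModel

# Build a TensorFlow seq2seq model from a BERT encoder and a GPT-2 decoder.
model = TFEncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-cased", "gpt2"
)
```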
Besides this, the `EncoderDecoderModel` classes have been updated to work like models such as BART and T5. From now on, users don't need to pass `decoder_input_ids` to the model themselves anymore. Instead, they are created automatically based on the `labels` (namely by shifting them one position to the right, replacing -100 by the `pad_token_id`, and prepending the `decoder_start_token_id`). Note that this may result in training discrepancies when fine-tuning a model trained with versions anterior to 4.12.0 that set `decoder_input_ids = labels`.
Auto model classes

To make it easier to extend the Transformers library, every Auto class has a new `register` method that allows you to register your own custom models, configurations, or tokenizers. See more in the documentation.
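A minimal sketch of the `register` API; the custom class names are placeholders:

```python
from transformers import AutoConfig, AutoModel, PretrainedConfig, PreTrainedModel

class CustomConfig(PretrainedConfig):
    model_type = "custom-model"

class CustomModel(PreTrainedModel):
    config_class = CustomConfig

# Make the Auto classes aware of the custom config/model pair, so
# AutoConfig/AutoModel can resolve "custom-model" checkpoints.
AutoConfig.register("custom-model", CustomConfig)
AutoModel.register(CustomConfig, CustomModel)
```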
Bugfixes and improvements

[`run_glue.py`] missing requirements: `sklearn` by @stas00 in https://github.com/huggingface/transformers/pull/13768
`PreTrainedModel.framework` attribute by @StellaAthena in https://github.com/huggingface/transformers/pull/13817
`find_unused_parameters` in Trainer when gradient checkpointing is enabled by @patrickvonplaten in https://github.com/huggingface/transformers/pull/13961
`pad_to_multiple_of` by @affjljoo3581 in https://github.com/huggingface/transformers/pull/13949
`hf-internal-testing` ... by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14008
`modeling_speech_to_text` by @mishig25 in https://github.com/huggingface/transformers/pull/14044
`to_tensor()` in TF inline example by @Rocketknight1 in https://github.com/huggingface/transformers/pull/14140