Software Open Access
Wolf, Thomas; Debut, Lysandre; Sanh, Victor; Chaumond, Julien; Delangue, Clement; Moi, Anthony; Cistac, Perric; Ma, Clara; Jernite, Yacine; Plu, Julien; Xu, Canwen; Le Scao, Teven; Gugger, Sylvain; Drame, Mariama; Lhoest, Quentin; Rush, Alexander M.
The Nyströmformer model was proposed in Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, and Vikas Singh.
The Nyströmformer model overcomes the quadratic complexity of self-attention on the input sequence length by adapting the Nyström method to approximate standard self-attention, enabling longer sequences with thousands of tokens as input.
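As an illustration of the idea, the Nyström approximation at the core of the model can be sketched in a few lines of NumPy. This is a simplified, hypothetical single-head version with hand-picked sizes (sequence length divisible by the number of landmarks, exact pseudo-inverse instead of the paper's iterative one, no convolutional residual):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def nystrom_attention(q, k, v, num_landmarks=8):
    """Approximate softmax(q k^T / sqrt(d)) @ v using Nystrom landmarks."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    # Landmarks are segment means of the queries and keys.
    q_l = q.reshape(num_landmarks, n // num_landmarks, d).mean(axis=1)
    k_l = k.reshape(num_landmarks, n // num_landmarks, d).mean(axis=1)
    # Three small softmax kernels stand in for the full n x n attention matrix.
    f = softmax(q @ k_l.T * scale)    # (n, m)
    a = softmax(q_l @ k_l.T * scale)  # (m, m)
    b = softmax(q_l @ k.T * scale)    # (m, n)
    return f @ np.linalg.pinv(a) @ (b @ v)  # (n, d), never materializes (n, n)

rng = np.random.default_rng(0)
n, d = 64, 16
q, k, v = rng.normal(size=(3, n, d))
out = nystrom_attention(q, k, v)
exact = softmax(q @ k.T / np.sqrt(d)) @ v
print(out.shape, np.abs(out - exact).mean())
```

The three small matrix products cost O(n · m) instead of O(n²), which is what makes thousands-of-tokens inputs tractable.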
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=nystromformer
REALM
The REALM model was proposed in REALM: Retrieval-Augmented Language Model Pre-Training by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
It is a retrieval-augmented language model that first retrieves documents from a textual knowledge corpus and then uses the retrieved documents to answer questions.
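The retrieve step can be sketched schematically in NumPy. All names and sizes below are hypothetical toy stand-ins: REALM's embedders are BERT-style encoders trained end-to-end with the reader, not random matrices, and the reader itself is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
num_docs, dim, top_k = 100, 32, 5

# Toy stand-ins for REALM's learned embedders.
doc_embeddings = rng.normal(size=(num_docs, dim))
question_embedding = rng.normal(size=(dim,))

# The retrieval score is an inner product; keep the top-k documents,
# which are then handed to the reader to extract the answer.
scores = doc_embeddings @ question_embedding
top_docs = np.argsort(scores)[::-1][:top_k]
print(top_docs)
```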
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=realm
ViTMAE
The ViTMAE model was proposed in Masked Autoencoders Are Scalable Vision Learners by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
The paper shows that, by pre-training a Vision Transformer (ViT) to reconstruct pixel values for masked patches, one can get results after fine-tuning that outperform supervised pre-training.
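The random masking that drives MAE pre-training is easy to sketch in NumPy. This mirrors the noise-argsort scheme from the paper (sizes below match ViT-B/16 on 224px images; only the bookkeeping is shown, not the encoder or decoder):

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, mask_ratio = 196, 0.75  # 14x14 patches for a 224px image

# Per-patch random noise; the lowest-noise patches are kept, as in MAE.
noise = rng.random(num_patches)
len_keep = int(num_patches * (1 - mask_ratio))
ids_shuffle = np.argsort(noise)
ids_keep = ids_shuffle[:len_keep]

# Only the visible 25% of patches go through the encoder, which keeps
# pre-training cheap; the decoder reconstructs pixels for the masked 75%.
mask = np.ones(num_patches)
mask[ids_keep] = 0
print(len_keep, int(mask.sum()))  # 49 visible, 147 masked
```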
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=vit_mae
ViLT
The ViLT model was proposed in ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision by Wonjae Kim, Bokyung Son, Ildoo Kim.
ViLT incorporates text embeddings into a Vision Transformer (ViT), allowing it to have a minimal design for Vision-and-Language Pre-training (VLP).
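Schematically, ViLT just concatenates word embeddings and linearly projected patch embeddings (plus modality-type embeddings) into one sequence for a single shared transformer. A toy NumPy sketch with hypothetical sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, text_len, num_patches = 64, 12, 16

text_emb = rng.normal(size=(text_len, hidden))
patch_emb = rng.normal(size=(num_patches, hidden))  # linear patch projection, no CNN backbone

# Modality-type embeddings tell the shared transformer which tokens
# come from text and which come from the image.
type_text, type_image = rng.normal(size=(2, hidden))
sequence = np.concatenate([text_emb + type_text, patch_emb + type_image], axis=0)
print(sequence.shape)  # one joint sequence: (28, 64)
```

Dropping the heavy CNN region extractor in favor of this linear patch projection is what gives ViLT its minimal design.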
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=vilt
Swin Transformer
The Swin Transformer was proposed in Swin Transformer: Hierarchical Vision Transformer using Shifted Windows by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
The Swin Transformer serves as a general-purpose backbone for computer vision. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection. This hierarchical architecture has the flexibility to model at various scales and has linear computational complexity with respect to image size.
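The window partitioning behind the linear complexity can be sketched in NumPy (a hypothetical helper using Swin-T's stage-1 sizes; the shifted variant rolls the feature map before partitioning):

```python
import numpy as np

def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping windows."""
    h, w, c = x.shape
    x = x.reshape(h // window_size, window_size, w // window_size, window_size, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size * window_size, c)

rng = np.random.default_rng(0)
feat = rng.normal(size=(56, 56, 96))  # stage-1 feature map of Swin-T
windows = window_partition(feat, window_size=7)
print(windows.shape)  # self-attention runs inside each 7x7 window
```

Because attention is computed per window, the cost grows linearly with the number of windows (and hence with image size) rather than quadratically with the number of patches.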
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=swin
YOSO
The YOSO model was proposed in You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
YOSO approximates standard softmax self-attention via a Bernoulli sampling scheme based on Locality Sensitive Hashing (LSH). In principle, all the Bernoulli random variables can be sampled with a single hash.
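The LSH ingredient can be illustrated with sign random projections: two vectors collide on a hash bit with probability 1 − θ/π, where θ is the angle between them, so hash collisions are Bernoulli samples of an angular similarity. A toy NumPy sketch (this shows the LSH estimator only, not YOSO's full attention approximation):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, num_hashes = 32, 2000

q = rng.normal(size=dim)
k = rng.normal(size=dim)

# Each random hyperplane yields one hash bit; q and k collide on a bit
# with probability 1 - angle(q, k) / pi.
planes = rng.normal(size=(num_hashes, dim))
collisions = (planes @ q > 0) == (planes @ k > 0)  # Bernoulli samples

angle = np.arccos(q @ k / (np.linalg.norm(q) * np.linalg.norm(k)))
print(collisions.mean(), 1 - angle / np.pi)  # empirical vs. exact collision rate
```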
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=yoso
Add model like
To help contributors add new models to Transformers more easily, there is a new command that clones an existing model and sets up the various hooks in the library, so that you only have to write the tweaks needed in the modeling file. Just run transformers-cli add-new-model-like and fill in the questionnaire!
New training scripts were introduced: one for speech seq2seq models and an image pre-training script leveraging the ViTMAE models. Finally, an image captioning example in Flax was added to the library.
Support for long files was added to the automatic-speech-recognition (ASR) pipeline, along with support for decoding audio models with a language model, which reduces the word error rate (WER) on many tasks. See the blogpost.
The pipelines also continue to gain more homogeneous arguments and broader framework support.
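Long-file support works by splitting the waveform into overlapping chunks; the strided edges of each chunk's predictions are later dropped so the outputs can be concatenated seamlessly. A simplified NumPy sketch of the chunking step (hypothetical helper and sizes, not the pipeline's actual code):

```python
import numpy as np

def chunk_with_stride(audio, chunk_len, stride):
    """Overlapping chunks; each chunk carries `stride` extra samples on both
    sides, whose predictions are discarded before concatenation."""
    step = chunk_len - 2 * stride
    chunks = []
    for start in range(0, len(audio), step):
        chunks.append(audio[start:start + chunk_len])
        if start + chunk_len >= len(audio):
            break
    return chunks

audio = np.zeros(16000 * 60)  # one minute of 16 kHz audio
chunks = chunk_with_stride(audio, chunk_len=16000 * 20, stride=16000 * 2)
print(len(chunks), len(chunks[0]))
```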
image-classification pipeline. by @Narsil in https://github.com/huggingface/transformers/pull/15030
qa pipelines. by @Narsil in https://github.com/huggingface/transformers/pull/14225
The ELECTRA model can now be used as a decoder, enabling an ELECTRA encoder-decoder model.
ElectraForCausalLM -> Enable Electra encoder-decoder model by @stancld in https://github.com/huggingface/transformers/pull/14729
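A minimal sketch of wiring two ELECTRA configs into an encoder-decoder model, using tiny, randomly initialised (illustrative) sizes rather than a real checkpoint:

```python
import torch
from transformers import ElectraConfig, EncoderDecoderConfig, EncoderDecoderModel

# Tiny illustrative configs; a real setup would use pretrained checkpoints
# via EncoderDecoderModel.from_encoder_decoder_pretrained.
encoder_config = ElectraConfig(vocab_size=100, embedding_size=16, hidden_size=32,
                               num_hidden_layers=2, num_attention_heads=2,
                               intermediate_size=64)
decoder_config = ElectraConfig(vocab_size=100, embedding_size=16, hidden_size=32,
                               num_hidden_layers=2, num_attention_heads=2,
                               intermediate_size=64)

# This marks the decoder config as a causal LM with cross-attention,
# so the decoder is instantiated as an ElectraForCausalLM.
config = EncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config)
model = EncoderDecoderModel(config=config)

input_ids = torch.randint(0, 100, (1, 8))
decoder_input_ids = torch.randint(0, 100, (1, 5))
outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
print(outputs.logits.shape)
```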
The vision encoder decoder model can now be used in TensorFlow.
CLIP gets ported to TensorFlow.
RoFormer gets ported to Flax.
--optim by @manuelciosici in https://github.com/huggingface/transformers/pull/14744
The documentation has been fully migrated to Markdown. If you are contributing, make sure to read the upgraded guide on how to write good docstrings.
PreTrainedTokenizerFast.decoder by @aphedges in https://github.com/huggingface/transformers/pull/14691
run_name in MLflowCallback by @YangDong2002 in https://github.com/huggingface/transformers/pull/14894
num_return_sequences support for text2text generation. by @Narsil in https://github.com/huggingface/transformers/pull/14988
tokenizers upgrade. by @Narsil in https://github.com/huggingface/transformers/pull/14941
_ms. by @Narsil in https://github.com/huggingface/transformers/pull/15029
batch_size arg (like others enabled everywhere). by @Narsil in https://github.com/huggingface/transformers/pull/15027
with torch.no_grad() to DistilBERT integration test forward pass by @jaketae in https://github.com/huggingface/transformers/pull/14979
tokenize_chinese_chars arg by @SaulLu in https://github.com/huggingface/transformers/pull/15158
np.ndarray optional arguments by @gante in https://github.com/huggingface/transformers/pull/15074
is_ctc needs to be updated to self.type == "ctc". by @Narsil in https://github.com/huggingface/transformers/pull/15194
from_encoder_decoder_pretrained in encoder-decoder models by @jsnfly in https://github.com/huggingface/transformers/pull/15056
The community contributors below have significantly contributed to the v4.16.0 release. Thank you!