Software Open Access

Transformers: State-of-the-Art Natural Language Processing

Wolf, Thomas; Debut, Lysandre; Sanh, Victor; Chaumond, Julien; Delangue, Clement; Moi, Anthony; Cistac, Perric; Ma, Clara; Jernite, Yacine; Plu, Julien; Xu, Canwen; Le Scao, Teven; Gugger, Sylvain; Drame, Mariama; Lhoest, Quentin; Rush, Alexander M.

TensorFlow XLA Text Generation

The TensorFlow text generation method can now be wrapped with tf.function and compiled to XLA. You should be able to achieve up to 100x speedup this way. See our blog post and our benchmarks. You can also see XLA generation in action in our example notebooks, particularly for summarization and translation.

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Main changes with respect to the original generate workflow: `tf.function` and `pad_to_multiple_of`
xla_generate = tf.function(model.generate, jit_compile=True)
tokenization_kwargs = {"pad_to_multiple_of": 32, "padding": True, "return_tensors": "tf"}

# The first prompt will be slow (compiling), the others will be very fast!
input_prompts = [
    f"translate English to {language}: I have four cats and three dogs."
    for language in ["German", "French", "Romanian"]
]
for input_prompt in input_prompts:
    tokenized_inputs = tokenizer([input_prompt], **tokenization_kwargs)
    generated_text = xla_generate(**tokenized_inputs, max_new_tokens=32)
    print(tokenizer.decode(generated_text[0], skip_special_tokens=True))
  • Generate: deprecate default max_length by @gante in #18018
  • TF: GPT-J compatible with XLA generation by @gante in #17986
  • TF: T5 can now handle a padded past (i.e. XLA generation) by @gante in #17969
  • TF: XLA beam search + most generation-compatible models are now also XLA-generate-compatible by @gante in #17857
  • TF: generate without tf.TensorArray by @gante in #17801
  • TF: BART compatible with XLA generation by @gante in #17479
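
A function wrapped with `tf.function(..., jit_compile=True)` is compiled again for every new input shape, which is why the example above pads with `pad_to_multiple_of`. The following pure-Python sketch only illustrates that bucketing arithmetic. The helper name `padded_length` and the sample prompt lengths are hypothetical and are not part of the library.

```python
import math


def padded_length(num_tokens, multiple=32):
    """Round a sequence length up to the next multiple (hypothetical helper)."""
    return math.ceil(num_tokens / multiple) * multiple


# Hypothetical token counts for a handful of prompts.
prompt_lengths = [7, 12, 19, 31, 33, 40, 64]

# Without padding, every distinct length is a distinct input shape.
raw_shapes = set(prompt_lengths)

# With padding to a multiple of 32, many lengths collapse into one shape.
padded_shapes = {padded_length(n) for n in prompt_lengths}

print(len(raw_shapes), len(padded_shapes))  # prints "7 2"
```

Fewer distinct shapes means fewer compilations, at the cost of some wasted computation on padding tokens.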
New model additions

OwlViT

The OWL-ViT model (short for Vision Transformer for Open-World Localization) was proposed in Simple Open-Vocabulary Object Detection with Vision Transformers by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby. OWL-ViT is an open-vocabulary object detection network trained on a variety of (image, text) pairs. It can be used to query an image with one or multiple text queries to search for and detect target objects described in text.

  • Add OWL-ViT model for zero-shot object detection by @alaradirik in #17938
  • Fix OwlViT tests by @sgugger in #18253
NLLB

The NLLB model was presented in No Language Left Behind: Scaling Human-Centered Machine Translation by Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, and Jeff Wang. No Language Left Behind (NLLB) is a model capable of delivering high-quality translations directly between any pair of 200+ languages — including low-resource languages like Asturian, Luganda, Urdu and more.

  • [M2M100] update conversion script by @patil-suraj in #17916
  • NLLB tokenizer by @LysandreJik in #18126
MobileViT

The MobileViT model was proposed in MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer by Sachin Mehta and Mohammad Rastegari. MobileViT introduces a new layer that replaces local processing in convolutions with global processing using transformers.

  • add MobileViT model by @hollance in #17354
Nezha

The Nezha model was proposed in NEZHA: Neural Contextualized Representation for Chinese Language Understanding by Junqiu Wei et al. NEZHA is a language model based on BERT with a collection of proven improvements, which include Functional Relative Positional Encoding as an effective positional encoding scheme, Whole Word Masking strategy, Mixed Precision Training and the LAMB Optimizer in training the models.

  • Nezha Pytorch implementation by @sijunhe in #17776
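
As a rough illustration of the functional relative positional encoding mentioned above, the sketch below computes a fixed sinusoidal vector from the relative distance between two positions, following the formulation in the NEZHA paper. It is a minimal pure-Python sketch, not the library's implementation: the function name, the `max_distance` clipping value, and the assumption that `depth` is even are all choices made for this example.

```python
import math


def relative_position_encoding(i, j, depth, max_distance=64):
    """Sinusoidal encoding of the relative distance j - i (illustrative sketch).

    Assumes `depth` is even. Clipping to `max_distance` is an assumption of
    this sketch.
    """
    distance = max(-max_distance, min(max_distance, j - i))
    encoding = []
    for k in range(depth // 2):
        angle = distance / (10000 ** (2 * k / depth))
        # Even dimensions use sine, odd dimensions use cosine.
        encoding.extend([math.sin(angle), math.cos(angle)])
    return encoding


# The encoding depends only on the distance, not on the absolute positions.
same = relative_position_encoding(0, 2, 8) == relative_position_encoding(5, 7, 8)
print(same)  # prints "True"
```

Because the values are a fixed function of the distance, nothing here is learned during training.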
GroupViT

The GroupViT model was proposed in GroupViT: Semantic Segmentation Emerges from Text Supervision by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang. Inspired by CLIP, GroupViT is a vision-language model that can perform zero-shot semantic segmentation on any given vocabulary categories.

  • Adding GroupViT Models by @xvjiarui in #17313
MVP

The MVP model was proposed in MVP: Multi-task Supervised Pre-training for Natural Language Generation by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen. MVP is a generative language model, pre-trained on a labeled pre-training corpus from 45 datasets over seven generation tasks. For each task, the model is further pre-trained using specific soft prompts to stimulate the model's capacity to perform that task.

  • Add MVP model by @StevenTang1998 in #17787
CodeGen

The CodeGen model was proposed in A Conversational Paradigm for Program Synthesis by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. CodeGen is an autoregressive language model for program synthesis trained sequentially on The Pile, BigQuery, and BigPython.

  • Add CodeGen model by @rooa in #17443
  • [CodeGen] support device_map="auto" for sharded checkpoints by @patil-suraj in #17871
UL2

The UL2 model was presented in Unifying Language Learning Paradigms by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler. UL2 is a unified framework for pretraining models that are universally effective across datasets and setups. UL2 uses Mixture-of-Denoisers (MoD), a pre-training objective that combines diverse pre-training paradigms together. UL2 introduces a notion of mode switching, wherein downstream fine-tuning is associated with specific pre-training schemes.

  • Add UL2 (just docs) by @patrickvonplaten in #17740
Custom pipelines

This adds the ability to host custom pipelines on the Hub and share them with everyone else. As with the code-on-the-Hub feature for models, tokenizers, etc., the user has to pass trust_remote_code=True to use one. Beyond that, the best way to get familiar with the feature is to read the added documentation.

  • Custom pipeline by @sgugger in #18079
PyTorch to TensorFlow CLI utility

This adds a CLI to convert PT weights into TF weights, validate them, and (optionally) open a PR.

TensorFlow-specific improvements

The following models have been ported to be used in TensorFlow: SegFormer, DeiT, ResNet and RegNet.

  • [SegFormer] TensorFlow port by @sayakpaul in #17910
  • Add TF DeiT implementation by @amyeroberts in #17806
  • Add TF ResNet model by @amyeroberts in #17427
  • TF implementation of RegNets by @ariG23498 in #17554

Additionally, our TF models now support loading sharded checkpoints:

  • TF Sharded by @ArthurZucker in #17713
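
As a rough illustration of what a sharded checkpoint is: the weights are split across several files, and an index maps each weight name to the file that holds it. The sketch below is pure Python and is not the library's implementation. The function names, the greedy grouping strategy, and the file-name pattern are assumptions made for this example.

```python
def shard_weights(weight_sizes, max_shard_size):
    """Greedily group weight names into shards of at most max_shard_size bytes.

    A single weight larger than the limit still gets a shard of its own.
    """
    shards = [[]]
    current_size = 0
    for name, size in weight_sizes.items():
        # Start a new shard when adding this weight would exceed the limit.
        if shards[-1] and current_size + size > max_shard_size:
            shards.append([])
            current_size = 0
        shards[-1].append(name)
        current_size += size
    return shards


def build_weight_map(shards, pattern="tf_model-{:05d}-of-{:05d}.h5"):
    """Map each weight name to its shard file (the pattern is illustrative)."""
    total = len(shards)
    return {
        name: pattern.format(index + 1, total)
        for index, names in enumerate(shards)
        for name in names
    }


# Hypothetical weight names and sizes in bytes.
sizes = {"embeddings": 4, "encoder": 4, "decoder": 4}
weight_map = build_weight_map(shard_weights(sizes, max_shard_size=8))
print(weight_map["decoder"])  # prints "tf_model-00002-of-00002.h5"
```

At load time, such an index lets a loader open one shard at a time instead of reading the whole checkpoint at once.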
Flax-specific improvements

The following models have been ported to be used in JAX:

  • Flax t5 Encoder by @crystina-z in #17784

Additionally, our JAX models now support loading sharded checkpoints:

  • Flax sharded by @ArthurZucker in #17760
Additional model heads

The following models now have a brand new head for new tasks:

  • Add ViltForTokenClassification e.g. for Named-Entity-Recognition (NER) by @gilad19 in #17924
  • Adding OPTForSeqClassification class by @oneraghavan in #18123
ONNX support

A continued community effort provides ONNX converters for an increasing number of models.

  • add ONNX support for LeVit by @gcheron in #18154
  • add ONNX support for BLOOM by @NouamaneTazi in #17961
  • Add ONNX support for LayoutLMv3 by @regisss in #17953
  • Mrbean/codegen onnx by @sam-h-bean in #17903
  • Add ONNX support for DETR by @regisss in #17904
  • add onnx support for deberta and debertav2 by @sam-h-bean in #17617
Documentation translation

The community effort to translate the documentation into several languages continues.

Portuguese
  • Added translation of index.mdx to Portuguese Issue #16824 by @rzimmerdev in #17565
Spanish
  • Add Spanish translation of custom_models.mdx by @donelianc in #17807
Italian
  • Add Italian translation of sharing_custom_models.mdx by @Xpiri in #17631
  • Add Italian translation of converting_tensorflow_models.mdx by @Xpiri in #18283
  • Add Italian translation of create_model.mdx and serialization.mdx by @F02934 in #17640
  • Italian/accelerate by @mfumanelli in #17698
  • Italian/model sharing by @mfumanelli in #17828
  • Italian translation of run_scripts.mdx gh-17459 by @lorenzobalzani in #17642
  • Translation/debugging by @nickprock in #18230
  • Translation/training: italian translation training.mdx by @nickprock in #17662
  • Translation italian: multilingual.mdx by @nickprock in #17768
  • Added preprocessing.mdx italian translation by @nickprock in #17600
Improvements and bugfixes
  • [EncoderDecoder] Improve docs by @NielsRogge in #18271
  • [DETR] Improve code examples by @NielsRogge in #18262
  • patch for smddp import by @carolynwang in #18244
  • Fix Sylvain's nits on the original KerasMetricCallback PR by @Rocketknight1 in #18300
  • Add PYTEST_TIMEOUT for CircleCI test jobs by @ydshieh in #18251
  • Add PyTorch 1.11 to past CI by @ydshieh in #18302
  • Raise a TF-specific error when importing Torch classes by @Rocketknight1 in #18280
  • [ create_a_model.mdx ] translate to pt by @Fellip15 in #18098
  • Update translation.mdx by @gorkemozkaya in #18169
  • Add TFAutoModelForImageClassification to pipelines.py by @ydshieh in #18292
  • Adding type hints of TF:OpenAIGPT by @Mathews-Tom in #18263
  • Adding type hints of TF:CTRL by @Mathews-Tom in #18264
  • Replace false parameter by a buffer by @sgugger in #18259
  • Fix ORTTrainer failure on gpt2 fp16 training by @JingyaHuang in #18017
  • Owlvit docs test by @alaradirik in #18257
  • Good difficult issue override for the stalebot by @LysandreJik in #18094
  • Fix dtype of input_features in docstring by @ydshieh in #18258
  • Fix command of doc tests for local testing by @oneraghavan in #18236
  • Fix TF bad words filter with XLA by @Rocketknight1 in #18286
  • Allows KerasMetricCallback to use XLA generation by @Rocketknight1 in #18265
  • Skip passes report for --make-reports by @ydshieh in #18250
  • Update serving code to enable saved_model=True by @amyeroberts in #18153
  • Change how take_along_axis is computed in DeBERTa to stop confusing XLA by @Rocketknight1 in #18256
  • Fix torch version check in Vilt by @ydshieh in #18260
  • change bloom parameters to 176B by @muhammad-ahmed-ghani in #18235
  • TF: use the correct config with (...)EncoderDecoder models by @gante in #18097
  • Fix no_trainer CI by @muellerzr in #18242
  • Update notification service by @ydshieh in #17921
  • Make errors for loss-less models more user-friendly by @sgugger in #18233
  • Fix TrainingArguments help section by @sgugger in #18232
  • Better messaging and fix for incorrect shape when collating data. by @CakeCrusher in #18119
  • Add support for Sagemaker Model Parallel >= 1.10 new checkpoint API by @viclzhu in #18221
  • Update add_new_pipeline.mdx by @zh-zheng in #18224
  • Add custom config to quicktour by @stevhliu in #18115
  • skip some test_multi_gpu_data_parallel_forward by @ydshieh in #18188
  • Change to FlavaProcessor in PROCESSOR_MAPPING_NAMES by @ydshieh in #18213
  • Fix LayoutXLM docstrings by @qqaatw in #17038
  • update cache to v0.5 by @ydshieh in #18203
  • Reduce console spam when using the KerasMetricCallback by @Rocketknight1 in #18202
  • TF: Add missing cast to GPT-J by @gante in #18201
  • Use next-gen CircleCI convenience images by @ydshieh in #18197
  • Typo in readme by @flozi00 in #18195
  • [From pretrained] Allow download from subfolder inside model repo by @patrickvonplaten in #18184
  • Update docs README with instructions on locally previewing docs by @snehankekre in #18196
  • bugfix: div-->dim by @orgoro in #18135
  • Add vision example to README by @sgugger in #18194
  • Remove use_auth_token from the from_config method by @duongna21 in #18192
  • FSDP integration enhancements and fixes by @pacman100 in #18134
  • BLOOM minor fixes small test by @younesbelkada in #18175
  • fix typo inside bloom documentation by @SaulLu in #18187
  • Better default for offload_state_dict in from_pretrained by @sgugger in #18183
  • Fix template for new models in README by @sgugger in #18182
  • FIX: Typo by @ayansengupta17 in #18156
  • Update TF(Vision)EncoderDecoderModel PT/TF equivalence tests by @ydshieh in #18073
  • Fix expected loss values in some (m)T5 tests by @ydshieh in #18177
  • [HPO] update to sigopt new experiment api by @sywangyi in #18147
  • Fix incorrect type hint for lang by @JohnGiorgi in #18161
  • Fix check for falsey inputs in run_summarization by @JohnGiorgi in #18155
  • Adding support for device_map directly in pipeline(..) function. by @Narsil in #17902
  • Fixing a hard to trigger bug for text-generation pipeline. by @Narsil in #18131
  • Enable torchdynamo with torch_tensorrt(fx path) by @frank-wei in #17765
  • Make sharded checkpoints work in offline mode by @sgugger in #18125
  • add dataset split and config to model-index in TrainingSummary.from_trainer by @loicmagne in #18064
  • Add summarization name mapping for MultiNews by @JohnGiorgi in #18117
  • supported python versions reference by @CakeCrusher in #18116
  • TF: unpack_inputs decorator independent from main_input_name by @gante in #18110
  • TF: remove graph mode distinction when processing boolean options by @gante in #18102
  • Fix BLOOM dtype by @Muennighoff in #17995
  • CLI: reenable pt_to_tf test by @gante in #18108
  • Report value for a step instead of epoch. by @zhawe01 in #18095
  • speed up test by @sijunhe in #18106
  • Enhance IPEX integration in Trainer by @jianan-gu in #18072
  • Bloom Optimize operations by @younesbelkada in #17866
  • Add filename to info displayed when downloading things in from_pretrained by @sgugger in #18099
  • Fix image segmentation and object detection pipeline tests by @sgugger in #18100
  • Fix RESOURCE_EXHAUSTED error when dealing with large datasets in Flax example scripts by @duongna21 in #18069
  • Fix torchscript tests for GPT-NeoX by @ydshieh in #18012
  • Fix some typos. by @Yulv-git in #17560
  • [bloom] fix alibi device placement by @stas00 in #18087
  • Make predict() close progress bars after finishing by @neverix in #17952
  • Update localized READMES when template is filled. by @sgugger in #18062
  • Fix type issue in using bucketing with Trainer by @seopbo in #18051
  • Fix slow CI by pinning resampy by @sgugger in #18077
  • Drop columns after loading samples in prepare_tf_dataset by @Rocketknight1 in #17967
  • [Generate Tests] Make sure no tokens are force-generated by @patrickvonplaten in #18053
  • Added Command for windows VENV activation in installation docs by @darthvader2 in #18008
  • Sort doc toc by @sgugger in #18034
  • Place inputs on device when include_inputs_for_metrics is True by @sgugger in #18046
  • Doc to dataset by @sgugger in #18037
  • Protect TFGenerationMixin.seed_generator so it's not created at import by @Rocketknight1 in #18044
  • Fix T5 incorrect weight decay in Trainer and official summarization example by @ADAning in #18002
  • Squash commits by @NielsRogge in #17981
  • Enable Past CI by @ydshieh in #17919
  • Fix T5/mT5 tests by @Rocketknight1 in #18029
  • [Flax] Bump to v0.4.1 by @sanchit-gandhi in #17966
  • Update expected values in DecisionTransformerModelIntegrationTest by @ydshieh in #18016
  • fixed calculation of ctc loss in TFWav2Vec2ForCTC by @Sreyan88 in #18014
  • Return scalar losses instead of per-sample means by @Rocketknight1 in #18013
  • sort list of models by @hollance in #18011
  • Replace BloomTokenizer by BloomTokenizerFast in doc by @regisss in #18005
  • Fix typo in error message in generation_utils by @regisss in #18000
  • Refactor to inherit from nn.Module instead of nn.ModuleList by @amyeroberts in #17501
  • Add link to existing documentation by @LysandreJik in #17931
  • only a stupid typo, but it can lead to confusion by @Dobatymo in #17930
  • Exclude Databricks from notebook env only if the runtime is below 11.0 by @davidheryanto in #17988
  • Shifting labels for causal LM when using label smoother by @seungeunrho in #17987
  • Restore original task in test_warning_logs by @ydshieh in #17985
  • Ensure PT model is in evaluation mode and lightweight forward pass done by @amyeroberts in #17970
  • XLA train step fixes by @Rocketknight1 in #17973
  • [Flax] Add remat (gradient checkpointing) by @sanchit-gandhi in #17843
  • higher atol to avoid flaky trainer test failure by @ydshieh in #17979
  • Fix FlaxBigBirdEmbeddings by @ydshieh in #17842
  • fixing fsdp autowrap functionality by @pacman100 in #17922
  • fix bias keyword argument in TFDebertaEmbeddings by @WissamAntoun in #17940
  • Update expected values in CodeGen tests by @ydshieh in #17888
  • Fix typo in perf_train_gpu_one.mdx by @aliencaocao in #17983
  • skip some gpt_neox tests that require 80G RAM by @ydshieh in #17923
  • feat: add pipeline registry abstraction by @aarnphm in #17905
  • skip some ipex tests until it works with torch 1.12 by @ydshieh in #17964
  • Fix number of examples for iterable dataset in distributed training by @sgugger in #17951
  • [Pipelines] Add revision tag to all default pipelines by @patrickvonplaten in #17667
  • Unifying training argument type annotations by @jannisborn in #17934
  • Fix GPT-NeoX-20B past handling, attention computation by @zphang in #17811
  • Fix #17893, removed dead code by @clefourrier in #17917
  • Fix prepare_tf_dataset when drop_remainder is not supplied by @Rocketknight1 in #17950
  • ExplicitEnum subclass str (JSON dump compatible) by @BramVanroy in #17933
  • PyTorch 1.12.0 for scheduled CI by @ydshieh in #17949
  • OPT - Fix Softmax NaN in half precision mode by @younesbelkada in #17437
  • Use explicit torch version in deepspeed CI by @ydshieh in #17942
  • fix regexes with escape sequence by @stas00 in #17943
  • Fix all is_torch_tpu_available issues by @muellerzr in #17936
  • Fix img seg tests (load checkpoints from hf-internal-testing) by @mishig25 in #17939
  • Remove imports and use forward references in ONNX feature by @sgugger in #17926
  • Fix job links in Slack report by @ydshieh in #17892
  • Add missing comment quotes by @leondz in #17379
  • Remove render tags by @NielsRogge in #17897
  • Fix the Conda package build by @bryant1410 in #16737
  • Remove DT_DOUBLE from the T5 graph by @szutenberg in #17891
  • Compute min_resolution in prepare_image_inputs by @ydshieh in #17915
  • Fixing a regression with return_all_scores introduced in #17606 by @Narsil in #17906
  • In group_texts function, drop last block if smaller than block_size by @billray0259 in #17908
  • Move logic into pixelshuffle layer by @amyeroberts in #17899
  • Fix loss computation in TFBertForPreTraining by @Rocketknight1 in #17898
  • Pin black to 22.3.0 to benefit from a stable --preview flag by @LysandreJik in #17918
  • Fix PyTorch/TF Auto tests by @ydshieh in #17895
  • Fix test_number_of_steps_in_training_with_ipex by @ydshieh in #17889
  • Update expected values in constrained beam search tests by @ydshieh in #17887
  • Fix bug in gpt2's (from-scratch) special scaled weight initialization by @karpathy in #17877
  • Update README_zh-hans.md by @mmdjiji in #17861
  • bert: add conversion script for BERT Token Dropping TF2 checkpoints by @stefan-it in #17142
  • Fix add new model like frameworks by @sgugger in #17869
  • Add type annotations for RoFormer models by @donelianc in #17878
  • fix by @ydshieh in #17890
  • fix mask by @younesbelkada in #17837
  • Add a TF in-graph tokenizer for BERT by @Rocketknight1 in #17701
  • Fix TF GPT2 test_onnx_runtime_optimize by @ydshieh in #17874
  • CLI: handle multimodal inputs by @gante in #17839
  • Properly get tests deps in test_fetcher by @sgugger in #17870
  • Fix test_inference_instance_segmentation_head by @ydshieh in #17872
  • Skip test_multi_gpu_data_parallel_forward for MaskFormer by @ydshieh in #17864
  • Use higher value for hidden_size in Flax BigBird test by @ydshieh in #17822
  • Fix: torch.utils.checkpoint import error. by @kumapo in #17849
  • Add type hints for gptneox models by @willtai in #17858
  • Fix Splinter test by @ydshieh in #17854
  • [tests/VisionEncoderDecoder] import to_2tuple from test utils by @patil-suraj in #17865
  • Fix Constrained beam search duplication and weird output issue by @boy2000-007man in #17814
  • Improve encoder decoder model docs by @Threepointone4 in #17815
  • Improve vision models by @NielsRogge in #17731
  • Auto-build Docker images before on-merge if setup.py was changed by @muellerzr in #17573
  • Properly calculate the total train iterations and recalculate num epochs in no_trainer scripts by @muellerzr in #17856
  • Index RNG states by global rank in saves by @sgugger in #17852
  • Change no trainer image_classification test by @muellerzr in #17635
  • Update modeling_cvt.py by @F02934 in #17846
  • Fix broken test for models with batchnorm by @Rocketknight1 in #17841
  • BLOOM minor changes on tokenizer by @younesbelkada in #17823
  • Improve performance docs by @lvwerra in #17750
  • Fix an error message in BigBird by @ydshieh in #17840
  • Fix properties of unset special tokens in non verbose mode by @guillaumekln in #17797
  • change message by @SaulLu in #17836
  • Add missing type hints for QDQBertModel by @willtai in #17783
  • Update type hints modeling_yoso.py by @F02934 in #17827
  • add doctests for DETR by @qherreros in #17786
  • Fix push CI artifact path by @ydshieh in #17788
  • Offload fixes by @sgugger in #17810
  • CLI: use hub's create_commit by @gante in #17755
  • initial commit by @ArthurZucker in #17818
  • Add logits_processor parameter, used by generate, to Seq2SeqTrainer methods evaluate and predict by @eranhirs in #17805
  • Fix top_k_top_p_filtering having unexpected behavior by @unifyh in #17744
  • Remove duplicate code by @lkm2835 in #17708
  • CLI: convert sharded PT models by @gante in #17959
  • Improve error message Union not allowed by @BramVanroy in #17769
  • Add final_layer_norm to OPT model by @thomasw21 in #17785
  • Properly check for a TPU device by @muellerzr in #17802
  • Fix test for BF16 detection by @sgugger in #17803
  • Use 5e-5 For BigBird PT/Flax equivalence tests by @ydshieh in #17780
  • Prepare transformers for v0.8.0 huggingface-hub release by @LysandreJik in #17716
  • Fix forward reference imports in DeBERTa configs by @sgugger in #17800
  • Fix Automatic Download of Pretrained Weights in DETR by @AnugunjNaman in #17712
  • [ViTMAE] Fix docstrings and variable names by @NielsRogge in #17710
  • Add link to notebook by @NielsRogge in #17791
  • [CodeParrot] Near-deduplication with jaccard similarity by @liyongsea in #17054
  • Update modeling_longt5.py by @bjascob in #17777
  • Not use -1e4 as attn mask by @ydshieh in #17306
  • Fix cache for GPT-Neo-X by @sgugger in #17764
  • deprecate is_torch_bf16_available by @stas00 in #17738
  • Attempt to change Push CI to workflow_run by @ydshieh in #17753
  • Save huggingface checkpoint as artifact in mlflow callback by @swethmandava in #17686
  • Migrate HFDeepSpeedConfig from trfrs to accelerate by @pacman100 in #17623
  • feat: add num_workers arg to DataLoader by @greg2451 in #17751
  • Enable PyTorch nightly build CI by @ydshieh in #17335
Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @donelianc
    • Add Spanish translation of custom_models.mdx (#17807)
    • Add type annotations for RoFormer models (#17878)
  • @Xpiri
    • Add Italian translation of sharing_custom_models.mdx (#17631)
    • Add Italian translation of converting_tensorflow_models.mdx (#18283)
  • @F02934
    • Add Italian translation of create_model.mdx and serialization.mdx (#17640)
    • Update modeling_cvt.py (#17846)
    • Update type hints modeling_yoso.py (#17827)
  • @sayakpaul
    • [SegFormer] TensorFlow port (#17910)
  • @mfumanelli
    • Italian/accelerate (#17698)
    • Italian/model sharing (#17828)
  • @nickprock
    • Translation/debugging (#18230)
    • Translation/training: italian translation training.mdx (#17662)
    • Translation italian: multilingual.mdx (#17768)
    • Added preprocessing.mdx italian translation (#17600)
  • @sijunhe
    • speed up test (#18106)
    • Nezha Pytorch implementation (#17776)
  • @StevenTang1998
    • Add MVP model (#17787)
  • @ariG23498
    • TF implementation of RegNets (#17554)
  • @xvjiarui
    • Adding GroupViT Models (#17313)
  • @rooa
    • Add CodeGen model (#17443)
Files (12.4 MB)
Name: huggingface/transformers-v4.21.0.zip
Size: 12.4 MB
md5: 6601fe8fdd6085b3f593a1c64f3e47f5