
Software Open Access

Transformers: State-of-the-Art Natural Language Processing

Wolf, Thomas; Debut, Lysandre; Sanh, Victor; Chaumond, Julien; Delangue, Clement; Moi, Anthony; Cistac, Perric; Ma, Clara; Jernite, Yacine; Plu, Julien; Xu, Canwen; Le Scao, Teven; Gugger, Sylvain; Drame, Mariama; Lhoest, Quentin; Rush, Alexander M.

v4.10.0: LayoutLM-v2, LayoutXLM, BEiT

LayoutLM-v2 and LayoutXLM

Four new models are released as part of the LayoutLM-v2 implementation: LayoutLMv2ForSequenceClassification, LayoutLMv2Model, LayoutLMv2ForTokenClassification and LayoutLMv2ForQuestionAnswering, in PyTorch.

The LayoutLMv2 model was proposed in LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou. LayoutLMv2 improves on LayoutLM and obtains state-of-the-art results across several document image understanding benchmarks.

  • Add LayoutLMv2 + LayoutXLM #12604 (@NielsRogge)

Compatible checkpoints can be found on the Hub.


BEiT

Three new models are released as part of the BEiT implementation: BeitModel, BeitForMaskedImageModeling, and BeitForImageClassification, in PyTorch.

The BEiT model was proposed in BEiT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong and Furu Wei. Inspired by BERT, BEiT is the first paper that makes self-supervised pre-training of Vision Transformers (ViTs) outperform supervised pre-training. Rather than pre-training the model to predict the class of an image (as done in the original ViT paper), BEiT models are pre-trained to predict visual tokens from the codebook of OpenAI's DALL-E model given masked patches.

  • Add BEiT #12994 (@NielsRogge)
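
The pre-training objective described above (predicting visual tokens for masked patches) can be illustrated with a small sketch. It is not the library's implementation: the function name `masked_token_loss`, the toy vocabulary of 4 visual tokens, and the random targets are all assumptions made for illustration. The only idea taken from the text is that the loss is computed on masked patches, against discrete visual-token ids.

```python
import math
import random


def masked_token_loss(logits, target_tokens, mask):
    """Average cross-entropy over masked patches only.

    logits: per patch, a list of scores over the visual-token vocabulary.
    target_tokens: per patch, the visual-token id (assumed to come from an image tokenizer).
    mask: per patch, True if the patch was masked out in the input.
    """
    total, count = 0.0, 0
    for scores, target, is_masked in zip(logits, target_tokens, mask):
        if not is_masked:
            continue  # visible patches do not contribute to the loss
        # Numerically stable log-sum-exp, then negative log-likelihood of the target.
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_norm - scores[target]
        count += 1
    return total / count if count else 0.0


# Toy setup: 6 patches and a hypothetical vocabulary of 4 visual tokens.
rng = random.Random(0)
num_patches, vocab_size = 6, 4
target_tokens = [rng.randrange(vocab_size) for _ in range(num_patches)]
mask = [True, False, True, False, False, True]
uniform_logits = [[0.0] * vocab_size for _ in range(num_patches)]

loss = masked_token_loss(uniform_logits, target_tokens, mask)
print(f"loss with uniform predictions: {loss:.4f}")
```

With uniform predictions the loss equals the log of the vocabulary size, which is a convenient sanity check for any implementation of this kind of objective.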

Compatible checkpoints can be found on the Hub.

Speech improvements

The Wav2Vec2 and HuBERT models now have a sequence classification head available.

  • Add Wav2Vec2 & Hubert ForSequenceClassification #13153 (@anton-l)
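
As a rough illustration of what a sequence classification head does on top of a speech encoder, the sketch below pools frame-level features into a single vector and applies a linear classifier. This is a conceptual sketch in plain Python, assuming mean pooling over frames; the function `classify_sequence` and the toy weights are hypothetical and do not reproduce the library's classes.

```python
def classify_sequence(hidden_states, weights, bias):
    """Mean-pool frame-level features, then apply a linear classifier.

    hidden_states: list of frames, each a list of floats (length = hidden size).
    weights: one row of hidden-size floats per class.
    bias: one float per class.
    """
    num_frames = len(hidden_states)
    hidden_size = len(hidden_states[0])
    # One vector for the whole utterance: the average of all frames.
    pooled = [sum(frame[i] for frame in hidden_states) / num_frames for i in range(hidden_size)]
    # One score per class.
    logits = [sum(w * x for w, x in zip(row, pooled)) + b for row, b in zip(weights, bias)]
    predicted = max(range(len(logits)), key=logits.__getitem__)
    return logits, predicted


# Toy input: 3 frames with hidden size 2, and 2 classes.
frames = [[1.0, 0.0], [3.0, 2.0], [2.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.5]
logits, predicted = classify_sequence(frames, weights, bias)
print(logits, predicted)
```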

DeBERTa in TensorFlow (@kamalkraj)

The DeBERTa and DeBERTa-v2 models have been converted from PyTorch to TensorFlow.

  • Deberta tf #12972 (@kamalkraj)
  • Deberta_v2 tf #13120 (@kamalkraj)

Flax model additions

EncoderDecoder, DistilBERT, and ALBERT now have support in Flax!

  • FlaxEncoderDecoder allowing Bert2Bert and Bert2GPT2 in Flax #13008 (@ydshieh)
  • FlaxDistilBERT #13324 (@kamalkraj)
  • FlaxAlBERT #13294 (@kamalkraj)

TensorFlow examples

A new example has been added in TensorFlow: multiple choice! Data collators have become framework agnostic and now work with TensorFlow and NumPy in addition to PyTorch.

  • Add TF multiple choice example #12865 (@Rocketknight1)
  • TF/Numpy variants for all DataCollator classes #13105 (@Rocketknight1)
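
The idea of a framework-agnostic collator can be sketched as two steps: padding that works on plain Python lists, followed by a conversion to whichever framework is requested. The functions `pad_batch` and `convert` below are hypothetical and are not the library's DataCollator classes; the `"np"`, `"pt"` and `"tf"` labels mirror the `return_tensors` convention used by Transformers tokenizers.

```python
def pad_batch(features, pad_id=0):
    """Right-pad each `input_ids` list to the longest length in the batch."""
    max_len = max(len(f["input_ids"]) for f in features)
    input_ids, attention_mask = [], []
    for f in features:
        ids = list(f["input_ids"])
        padding = max_len - len(ids)
        input_ids.append(ids + [pad_id] * padding)
        attention_mask.append([1] * len(ids) + [0] * padding)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def convert(batch, return_tensors=None):
    """Convert padded lists to the requested framework; imports happen lazily."""
    if return_tensors is None:
        return batch
    if return_tensors == "np":
        import numpy as np
        return {k: np.array(v) for k, v in batch.items()}
    if return_tensors == "pt":
        import torch
        return {k: torch.tensor(v) for k, v in batch.items()}
    if return_tensors == "tf":
        import tensorflow as tf
        return {k: tf.constant(v) for k, v in batch.items()}
    raise ValueError(f"unknown return_tensors value: {return_tensors!r}")


features = [{"input_ids": [101, 7592, 102]}, {"input_ids": [101, 102]}]
batch = convert(pad_batch(features), return_tensors=None)
print(batch)
```

Keeping the padding logic free of any framework import means only the final conversion depends on what is installed.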

Auto API refactor

The Auto APIs have been disentangled from all the other model modules of the Transformers library, so you can now safely import the Auto classes without importing all the models (and possibly getting errors if your setup is not compatible with one specific model). The actual model classes are only imported when needed.

  • Disentangle auto modules from other modeling files #13023 (@sgugger)
  • Fix AutoTokenizer when no fast tokenizer is available #13336 (@sgugger)
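
The "imported only when needed" behavior can be illustrated with a lazy mapping that stores module and attribute names and defers the import until a key is accessed. This is a minimal sketch, not the library's code: `LazyClassMapping` and the registry entries are hypothetical, and standard-library modules stand in for per-model modeling files.

```python
import importlib


class LazyClassMapping:
    """Maps a model-type name to a class, importing its module on first access."""

    def __init__(self, registry):
        # registry: model_type -> (module_name, attribute_name)
        self._registry = dict(registry)
        self._cache = {}

    def __getitem__(self, model_type):
        if model_type not in self._cache:
            module_name, attr = self._registry[model_type]
            module = importlib.import_module(module_name)  # import deferred until here
            self._cache[model_type] = getattr(module, attr)
        return self._cache[model_type]


# Building the mapping succeeds even though one entry points at a module
# that cannot be imported in this environment.
mapping = LazyClassMapping({
    "fraction": ("fractions", "Fraction"),
    "missing": ("module_that_is_not_installed", "Whatever"),
})

fraction_cls = mapping["fraction"]  # only `fractions` is imported at this point
print(fraction_cls(1, 2))

try:
    mapping["missing"]
except ImportError as err:
    print("import failed only when requested:", type(err).__name__)
```

The broken entry causes an error only if that specific entry is requested, which matches the motivation given above.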

Slight breaking change

When loading some kinds of corrupted model state dictionaries, the PreTrainedModel.from_pretrained method sometimes silently ignored weights. Such cases now raise an error.

  • Fix from_pretrained with corrupted state_dict #12939 (@sgugger)
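
These notes do not spell out which corruptions trigger the new error, so the sketch below only illustrates the general change in behavior: failing loudly instead of dropping weights that cannot be matched. The function `load_weights`, its `strict` flag, and the key-matching rule are hypothetical; the idea resembles `strict=True` in PyTorch's `torch.nn.Module.load_state_dict`.

```python
def load_weights(model_params, state_dict, strict=True):
    """Copy matching entries of `state_dict` into `model_params`.

    With strict=False, mismatched keys are skipped and only reported back.
    With strict=True, any mismatch raises instead of being ignored.
    """
    unexpected = sorted(k for k in state_dict if k not in model_params)
    missing = sorted(k for k in model_params if k not in state_dict)
    if strict and (unexpected or missing):
        raise RuntimeError(
            f"state dict mismatch: missing={missing}, unexpected={unexpected}"
        )
    for key in model_params:
        if key in state_dict:
            model_params[key] = state_dict[key]
    return missing, unexpected


params = {"encoder.weight": None, "classifier.weight": None}
checkpoint = {"encoder.weight": [1.0], "classifer.weight": [2.0]}  # misspelled key

# Lenient mode: the misspelled weight is skipped without raising.
missing, unexpected = load_weights(dict(params), checkpoint, strict=False)
print("skipped:", missing, unexpected)

# Strict mode: the same checkpoint is rejected.
try:
    load_weights(dict(params), checkpoint, strict=True)
except RuntimeError as err:
    print("rejected:", err)
```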

General improvements and bugfixes

  • Improving pipeline tests #12784 (@Narsil)
  • Pin git python to <3.1.19 #12858 (@patrickvonplaten)
  • [tests] fix logging_steps requirements #12860 (@stas00)
  • [Sequence Feature Extraction] Add truncation #12804 (@patrickvonplaten)
  • add classifier_dropout to classification heads #12794 (@PhilipMay)
  • Fix barrier for SM distributed #12853 (@sgugger)
  • Add possibility to ignore imports in test_fetcher #12801 (@sgugger)
  • Add accelerate to examples requirements #12888 (@sgugger)
  • Fix documentation of BigBird tokenizer #12889 (@sgugger)
  • Better heuristic for token-classification pipeline. #12611 (@Narsil)
  • Fix push_to_hub for TPUs #12895 (@sgugger)
  • Seq2SeqTrainer set max_length and num_beams only when non None #12899 (@cchen-dialpad)
  • [FLAX] Minor fixes in CLM example #12914 (@stefan-it)
  • Correct validation_split_percentage argument from int (ex:5) to float (0.05) #12897 (@Elysium1436)
  • Fix typo in the example of MobileBertForPreTraining #12919 (@buddhics)
  • Add option to set max_len in run_ner #12929 (@sgugger)
  • Fix QA examples for roberta tokenizer #12928 (@sgugger)
  • Print defaults when using --help for scripts #12930 (@sgugger)
  • Fix StoppingCriteria ABC signature #12918 (@willfrey)
  • Add missing @classmethod decorators #12927 (@willfrey)
  • fix #12910 (@chutaklee)
  • Update #12901 (@willfrey)
  • Update #12900 (@willfrey)
  • Update #12896 (@willfrey)
  • Fix docstring typo in #12891 (@willfrey)
  • [Flax] Correctly Add MT5 #12988 (@patrickvonplaten)
  • ONNX v2 raises an Exception when using PyTorch < 1.8.0 #12933 (@mfuntowicz)
  • Moving feature-extraction pipeline to new testing scheme #12843 (@Narsil)
  • Add CpmTokenizerFast #12938 (@JetRunner)
  • fix typo in gradient_checkpointing arg #12855 (@21jun)
  • Log Azure ML metrics only for rank 0 #12766 (@harshithapv)
  • Add substep end callback method #12951 (@wulu473)
  • Add multilingual documentation support #12952 (@JetRunner)
  • Fix division by zero in NotebookProgressBar #12953 (@sgugger)
  • [FLAX] Minor fixes in LM example #12947 (@stefan-it)
  • Prevent Trainer.evaluate() crash when using only tensorboardX #12963 (@aphedges)
  • Fix typo in example of DPRReader #12954 (@tadejsv)
  • Place BigBirdTokenizer in sentencepiece-only objects #12975 (@sgugger)
  • fix typo in example/text-classification README #12974 (@fullyz)
  • Fix template for inputs docstrings #12976 (@sgugger)
  • fix Trainer.train(resume_from_checkpoint=False) is causing an exception #12981 (@PhilipMay)
  • Cast logits from bf16 to fp32 at the end of TF_T5 #12332 (@szutenberg)
  • Update CANINE test #12453 (@NielsRogge)
  • pad_to_multiple_of added to DataCollatorForWholeWordMask #12999 (@Aktsvigun)
  • [Flax] Align jax flax device name #12987 (@patrickvonplaten)
  • [Flax] Correct flax docs #12782 (@patrickvonplaten)
  • T5: Create position related tensors directly on device instead of CPU #12846 (@armancohan)
  • Skip ProphetNet test #12462 (@LysandreJik)
  • Create perplexity.rst #13004 (@sashavor)
  • GPT-Neo ONNX export #12911 (@michaelbenayoun)
  • Update generate method - Fix floor_divide warning #13013 (@nreimers)
  • [Flax] Correct pt to flax conversion if from base to head #13006 (@patrickvonplaten)
  • [Flax T5] Speed up t5 training #13012 (@patrickvonplaten)
  • FX submodule naming fix #13016 (@michaelbenayoun)
  • T5 with past ONNX export #13014 (@michaelbenayoun)
  • Fix ONNX test: Put smaller ALBERT model #13028 (@LysandreJik)
  • Tpu tie weights #13030 (@sgugger)
  • Use min version for huggingface-hub dependency #12961 (@lewtun)
  • -> #12565 (@abhishekkrthakur)
  • [Flax] Refactor gpt2 & bert example docs #13024 (@patrickvonplaten)
  • Add MBART to models exportable with ONNX #13049 (@LysandreJik)
  • Add to ONNX docs #13048 (@LysandreJik)
  • Fix small typo in M2M100 doc #13061 (@SaulLu)
  • Add try-except for torch_scatter #13040 (@JetRunner)
  • docs: add HuggingArtists to community notebooks #13050 (@AlekseyKorshuk)
  • Fix ModelOutput instantiation from dictionaries #13067 (@sgugger)
  • Roll out the test fetcher on push tests #13055 (@sgugger)
  • Fix fallback of test_fetcher #13071 (@sgugger)
  • Revert to all tests while we debug what's wrong #13072 (@sgugger)
  • Use original key for label in DataCollatorForTokenClassification #13057 (@ibraheem-moosa)
  • [Doctest] Setup, quicktour and task_summary #13078 (@sgugger)
  • Add VisualBERT demo notebook #12263 (@gchhablani)
  • Install git #13091 (@LysandreJik)
  • Fix classifier dropout in AlbertForMultipleChoice #13087 (@ibraheem-moosa)
  • Doctests job #13088 (@LysandreJik)
  • Fix VisualBert Embeddings #13017 (@gchhablani)
  • Proper import for unittest.mock.patch #13085 (@sgugger)
  • Reactivate test fetchers on scheduled test with proper git install #13097 (@sgugger)
  • Change a parameter name in FlaxBartForConditionalGeneration.decode() #13074 (@ydshieh)
  • [Flax/JAX] Run jitted tests at every commit #13090 (@patrickvonplaten)
  • Rely on huggingface_hub for common tools #13100 (@sgugger)
  • [FlaxCLIP] allow passing params to image and text feature methods #13099 (@patil-suraj)
  • Ci last fix #13103 (@sgugger)
  • Improve type checker performance #13094 (@bschnurr)
  • Fix VisualBERT docs #13106 (@gchhablani)
  • Fix CircleCI nightly tests #13113 (@sgugger)
  • Create py.typed #12893 (@willfrey)
  • Fix flax gpt2 hidden states #13109 (@ydshieh)
  • Moving fill-mask pipeline to new testing scheme #12943 (@Narsil)
  • Fix omitted lazy import for xlm-prophetnet #13052 (@minwhoo)
  • Fix classifier dropout in bertForMultipleChoice #13129 (@mandelbrot-walker)
  • Fix frameworks table so it's alphabetical #13118 (@osanseviero)
  • [Feature Processing Sequence] Remove duplicated code #13051 (@patrickvonplaten)
  • Ci continue through smi failure #13140 (@LysandreJik)
  • Fix missing seq_len in electra model when inputs_embeds is used. #13128 (@sararb)
  • Optimizes ByT5 tokenizer #13119 (@Narsil)
  • Add splinter #12955 (@oriram)
  • [AutoFeatureExtractor] Fix loading of local folders if config.json exists #13166 (@patrickvonplaten)
  • Fix generation docstrings regarding input_ids=None #12823 (@jvamvas)
  • Update namespaces inside to the latest. #13167 (@qqaatw)
  • Fix the loss calculation of ProphetNet #13132 (@StevenTang1998)
  • Fix LUKE tests #13183 (@NielsRogge)
  • Add min and max question length options to TapasTokenizer #12803 (@NielsRogge)
  • SageMaker: Fix sagemaker DDP & metric logs #13181 (@philschmid)
  • correcting group beam search function output score bug #13211 (@sourabh112)
  • Change how "additional_special_tokens" argument in the ".from_pretrained" method of the tokenizer is taken into account #13056 (@SaulLu)
  • remove unwanted control-flow code from DeBERTa-V2 #13145 (@kamalkraj)
  • Fix load_tf_weights alias. #13159 (@qqaatw)
  • Add RemBert to AutoTokenizer #13224 (@LysandreJik)
  • Allow local_files_only for fast pretrained tokenizers #13225 (@BramVanroy)
  • fix AutoModel.from_pretrained(..., torch_dtype=...) #13209 (@stas00)
  • Fix broken links in Splinter documentation #13237 (@oriram)
  • Custom errors and BatchSizeError #13184 (@AmbiTyga)
  • Bump notebook from 6.1.5 to 6.4.1 in /examples/research_projects/lxmert #13226 (@dependabot[bot])
  • Update #12671 (@willfrey)
  • Remove side effects of disabling gradient computation #13257 (@LysandreJik)
  • Replace assert statement with if condition and raise ValueError #13263 (@nishprabhu)
  • Better notification service #13267 (@LysandreJik)
  • Fix failing Hubert test #13261 (@LysandreJik)
  • Add CLIP tokenizer to AutoTokenizer #13258 (@LysandreJik)
  • Some model_types cannot be in the mapping #13259 (@LysandreJik)
  • Add require flax to MT5 Flax test #13260 (@LysandreJik)
  • Migrating conversational pipeline tests to new testing format #13114 (@Narsil)
  • fix tokenizer_class_from_name for models with - in the name #13251 (@stas00)
  • Add error message concerning revision #13266 (@BramVanroy)
  • Move image-classification pipeline to new testing #13272 (@Narsil)
  • [Hotfix] Fixing the test (warnings was incorrect.) #13278 (@Narsil)
  • Moving question_answering tests to the new testing scheme. Had to tweak a little some ModelTesterConfig for pipelines. #13277 (@Narsil)
  • Moving summarization pipeline to new testing format. #13279 (@Narsil)
  • Moving table-question-answering pipeline to new testing. #13280 (@Narsil)
  • Moving table-question-answering pipeline to new testing #13281 (@Narsil)
  • Hotfixing master tests. #13282 (@Narsil)
  • Moving text2text-generation to new pipeline testing mechanism #13283 (@Narsil)
  • Add DINO conversion script #13265 (@NielsRogge)
  • Moving text-generation pipeline to new testing framework. #13285 (@Narsil)
  • Moving token-classification pipeline to new testing. #13286 (@Narsil)
  • examples: add keep_linebreaks option to CLM examples #13150 (@stefan-it)
  • Moving translation pipeline to new testing scheme. #13297 (@Narsil)
  • Fix BeitForMaskedImageModeling #13275 (@NielsRogge)
  • Moving zero-shot-classification pipeline to new testing. #13299 (@Narsil)
  • Fixing mbart50 with return_tensors argument too. #13301 (@Narsil)
  • [Flax] Correct all return tensors to numpy #13307 (@patrickvonplaten)
  • examples: only use keep_linebreaks when reading TXT files #13320 (@stefan-it)
  • Slow tests - run rag token in half precision #13304 (@patrickvonplaten)
  • [Slow tests] Disable Wav2Vec2 pretraining test for now #13303 (@patrickvonplaten)
  • Announcing the default model used by the pipeline (with a link). #13276 (@Narsil)
  • use float 16 in causal mask and masked bias #13194 (@hwijeen)
  • ✨ add citation file #13214 (@flaxel)
  • Improve documentation of pooler_output in ModelOutput #13228 (@navjotts)
  • fix: typo spelling grammar #13212 (@slowy07)
  • Check None before going through iteration #13250 (@qqaatw)
  • Use existing functionality for #13251 #13333 (@sgugger)
  • logger: add ability to connect to a run #13319 (@fcakyon)
  • Update label2id in the model config for run_glue #13334 (@sgugger)
  • :bug: fix small model card bugs #13310 (@nateraw)
  • Fall back to observed_batch_size when the dataloader does not know the batch_size. #13188 (@mbforbes)
  • Fixes #12941 where use_auth_token not been set up early enough #13205 (@bennimmo)
  • Correct wrong function signatures on the docs website #13198 (@qqaatw)
  • Fix release utils #13337 (@sgugger)
  • Add missing module spec #13321 (@laurahanu)
  • Use DS callable API to allow hf_scheduler + ds_optimizer #13216 (@tjruwase)
  • Tests fetcher tests #13340 (@sgugger)
  • [Testing] Add Flax Tests on GPU, Add Speech and Vision to Flax & TF tests #13313 (@patrickvonplaten)
  • Fixing a typo in the data_collator documentation #13309 (@Serhiy-Shekhovtsov)
  • Add GPT2ForTokenClassification #13290 (@tucan9389)
  • Doc mismatch fixed #13345 (@Apoorvgarg-creator)
  • Handle nested dict/lists of tensors as inputs in the Trainer #13338 (@sgugger)
  • [doc] correct TP implementation resources #13248 (@stas00)
  • Fix minor typo in parallelism doc #13289 (@jaketae)
  • Set missing seq_length variable when using inputs_embeds with ALBERT & Remove code duplication #13152 (@olenmg)
  • TF CLM example fix typo #13002 (@Rocketknight1)
  • Add generate kwargs to Seq2SeqTrainingArguments #13339 (@sgugger)

If you use this software, please cite it using these metadata.


Cite as