Published October 1, 2020 | Version v4.23.0
Software Open

Transformers: State-of-the-Art Natural Language Processing

Description

Whisper

The Whisper model was proposed in Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.

The abstract from the paper is the following:

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results but in a zero-shot transfer setting without the need for any fine-tuning. When compared to humans, the models approach their accuracy and robustness. We are releasing models and inference code to serve as a foundation for further work on robust speech processing.

  • Add WhisperModel to transformers by @ArthurZucker in #19166
  • Add TF whisper by @amyeroberts in #19378
Time series

The Time Series Transformer model is a vanilla encoder-decoder Transformer for time series forecasting.

⚠️ This is a recently introduced model and modality, so the API hasn't been tested extensively. There may be some bugs, and fixing them may require slight breaking changes in the future. If you see something strange, file a GitHub issue.

  • time series forecasting model by @kashif in #17965
Conditional DETR

The Conditional DETR model was proposed in Conditional DETR for Fast Training Convergence by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang. Conditional DETR presents a conditional cross-attention mechanism for fast DETR training. Conditional DETR converges 6.7× to 10× faster than DETR.

The abstract from the paper is the following:

The recently-developed DETR approach applies the transformer encoder and decoder architecture to object detection and achieves promising performance. In this paper, we handle the critical issue, slow training convergence, and present a conditional cross-attention mechanism for fast DETR training. Our approach is motivated by that the cross-attention in DETR relies highly on the content embeddings for localizing the four extremities and predicting the box, which increases the need for high-quality content embeddings and thus the training difficulty. Our approach, named conditional DETR, learns a conditional spatial query from the decoder embedding for decoder multi-head cross-attention. The benefit is that through the conditional spatial query, each cross-attention head is able to attend to a band containing a distinct region, e.g., one object extremity or a region inside the object box. This narrows down the spatial range for localizing the distinct regions for object classification and box regression, thus relaxing the dependence on the content embeddings and easing the training. Empirical results show that conditional DETR converges 6.7× faster for the backbones R50 and R101 and 10× faster for stronger backbones DC5-R50 and DC5-R101.

  • Add support for conditional detr by @DeppMeng in #18948
  • Improve conditional detr docs by @NielsRogge in #19154
Masked Siamese Networks

The ViTMSN model was proposed in Masked Siamese Networks for Label-Efficient Learning by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas. The paper presents a joint-embedding architecture to match the prototypes of masked patches with those of the unmasked patches. With this setup, their method yields excellent performance in the low-shot and extreme low-shot regimes.

The abstract from the paper is the following:

We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations. Our approach matches the representation of an image view containing randomly masked patches to the representation of the original unmasked image. This self-supervised pre-training strategy is particularly scalable when applied to Vision Transformers since only the unmasked patches are processed by the network. As a result, MSNs improve the scalability of joint-embedding architectures, while producing representations of a high semantic level that perform competitively on low-shot image classification. For instance, on ImageNet-1K, with only 5,000 annotated images, our base MSN model achieves 72.4% top-1 accuracy, and with 1% of ImageNet-1K labels, we achieve 75.7% top-1 accuracy, setting a new state-of-the-art for self-supervised learning on this benchmark.

  • MSN (Masked Siamese Networks) for ViT by @sayakpaul in #18815
MarkupLM

The MarkupLM model was proposed in MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei. MarkupLM is BERT, but applied to HTML pages instead of raw text documents. The model incorporates additional embedding layers to improve performance, similar to LayoutLM.

The model can be used for tasks like question answering on web pages or information extraction from web pages. It obtains state-of-the-art results on two important benchmarks:

  • WebSRC, a dataset for Web-Based Structural Reading Comprehension (a bit like SQuAD but for web pages)
  • SWDE, a dataset for information extraction from web pages (basically named-entity recognition on web pages)

The abstract from the paper is the following:

Multimodal pre-training with text, layout, and image has made significant progress for Visually-rich Document Understanding (VrDU), especially the fixed-layout documents such as scanned document images. While, there are still a large number of digital documents where the layout information is not fixed and needs to be interactively and dynamically rendered for visualization, making existing layout-based pre-training approaches not easy to apply. In this paper, we propose MarkupLM for document understanding tasks with markup languages as the backbone such as HTML/XML-based documents, where text and markup information is jointly pre-trained. Experiment results show that the pre-trained MarkupLM significantly outperforms the existing strong baseline models on several document understanding tasks. The pre-trained model and code will be publicly available.

  • Add MarkupLM by @NielsRogge in #19198
Security & safety

We are exploring a new serialization format that can be used in all three frameworks we support (PyTorch, TensorFlow, and JAX). It relies on the safetensors library.

Support for this is still experimental.

  • Poc to use safetensors by @sgugger in #19175
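
To give a feel for why a format like this is attractive from a safety standpoint, here is a minimal sketch of the safetensors file layout as it is publicly documented: an 8-byte little-endian header length, a JSON header describing each tensor, then the raw tensor bytes. Loading only parses JSON and copies bytes, so nothing is executed the way it can be with pickle-based checkpoints. The helper names `pack_tensors` and `unpack_tensors` are hypothetical and written for illustration only; this is not the library's implementation, it handles float32 values only, and it skips the validation the real library performs.

```python
import json
import struct


def pack_tensors(tensors):
    """Serialize {name: (shape, flat list of floats)} into safetensors-style bytes.

    Hypothetical helper for illustration; only float32 ("F32") is handled.
    """
    header = {}
    buffer = bytearray()
    for name, (shape, values) in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {
            "dtype": "F32",
            "shape": list(shape),
            # Offsets are relative to the start of the data section.
            "data_offsets": [len(buffer), len(buffer) + len(raw)],
        }
        buffer += raw
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    # Layout: u64 header length, JSON header, raw tensor data.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + bytes(buffer)


def unpack_tensors(blob):
    """Inverse of pack_tensors: parse the header, then slice the raw bytes."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8 : 8 + header_len].decode("utf-8"))
    data = blob[8 + header_len :]
    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata entry
            continue
        begin, end = info["data_offsets"]
        count = (end - begin) // 4  # 4 bytes per float32
        values = list(struct.unpack(f"<{count}f", data[begin:end]))
        tensors[name] = (tuple(info["shape"]), values)
    return tensors
```

In practice you would use the safetensors library itself rather than hand-rolled helpers like these; the sketch only shows that the format is plain data with a self-describing header.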
Computer vision post-processing methods overhaul

The processors for computer vision have been overhauled to ensure they have consistent naming, input arguments and outputs.

⚠️ The existing methods that are superseded by the newly introduced methods post_process_object_detection, post_process_semantic_segmentation, post_process_instance_segmentation and post_process_panoptic_segmentation are now deprecated.

  • Improve DETR post-processing methods by @alaradirik in #19205
  • Beit postprocessing by @alaradirik in #19099
  • Fix BeitFeatureExtractor postprocessing by @alaradirik in #19119
  • Add post_process_semantic_segmentation method to SegFormer by @alaradirik in #19072
  • Add post_process_semantic_segmentation method to DPTFeatureExtractor by @alaradirik in #19107
  • Add semantic segmentation post-processing method to MobileViT by @alaradirik in #19105
  • Detr preprocessor fix by @alaradirik in #19007
  • Improve and fix ImageSegmentationPipeline by @alaradirik in #19367
  • Restructure DETR post-processing, return prediction scores by @alaradirik in #19262
  • Maskformer post-processing fixes and improvements by @alaradirik in #19172
  • Fix MaskFormer failing postprocess tests by @alaradirik in #19354
  • Fix DETR segmentation postprocessing output by @alaradirik in #19363
  • fix docs example, add object_detection to DETR docs by @alaradirik in #19377
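
As a rough illustration of what a semantic segmentation post-processing step does conceptually, the sketch below turns per-class logits into a label map by taking the argmax over classes at each pixel, then optionally resizes the map to a target size. The function name `semantic_map` and its arguments are hypothetical, and the nearest-neighbour resize of the label map is a simplifying assumption; the library's own methods operate on framework tensors and may resize differently.

```python
def semantic_map(logits, target_size=None):
    """Convert logits shaped [num_classes][height][width] into a label map.

    Hypothetical helper for illustration; inputs are plain nested lists.
    """
    num_classes = len(logits)
    height, width = len(logits[0]), len(logits[0][0])
    # Per-pixel argmax over the class dimension.
    labels = [
        [max(range(num_classes), key=lambda c: logits[c][y][x]) for x in range(width)]
        for y in range(height)
    ]
    if target_size is None:
        return labels
    out_h, out_w = target_size
    # Nearest-neighbour resize of the label map to the requested size.
    return [
        [labels[y * height // out_h][x * width // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]
```

Called with two classes on a 2×2 grid, the function returns one class index per pixel; passing `target_size=(4, 4)` repeats each label over a 2×2 block.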
🚨 Breaking changes

The following changes are bugfixes that we have chosen to make even though they change the resulting behavior. We mark them as breaking changes, so if you are using this part of the codebase, we recommend you take a look at the PRs to understand exactly what was changed.

Breaking change for ViT parameter initialization

  • 🚨🚨🚨 Fix ViT parameter initialization by @alaradirik in #19341

Breaking change for the top_p argument of the TopPLogitsWarper of the generate method.

  • 🚨🚨🚨 Optimize Top P Sampler and fix edge case by @ekagra-ranjan in #18984
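
For readers unfamiliar with what a top-p (nucleus) warper does, here is a minimal pure-Python sketch of the general technique: keep the smallest set of highest-probability tokens whose cumulative probability reaches `top_p`, and mask out the rest. The function `top_p_filter` is hypothetical and is not the library's TopPLogitsWarper; in particular, it does not attempt to reproduce the specific edge case addressed in the PR above, so check the PR for the exact behavior change.

```python
import math


def top_p_filter(logits, top_p, min_tokens_to_keep=1):
    """Mask logits outside the nucleus with -inf (illustrative sketch only)."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in the interval (0, 1]")
    # Numerically stable softmax over the logits.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Visit tokens from most to least probable.
    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    keep = set()
    cumulative = 0.0
    for rank, index in enumerate(order):
        # Stop once the kept tokens already cover top_p of the probability mass.
        if rank >= min_tokens_to_keep and cumulative >= top_p:
            break
        keep.add(index)
        cumulative += probs[index]
    return [value if i in keep else -math.inf for i, value in enumerate(logits)]
```

With token probabilities of roughly 0.5, 0.3, 0.15 and 0.05, a `top_p` of 0.7 keeps the first two tokens and a `top_p` of 0.9 keeps the first three.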
Model head additions

OPT and BLOOM now have question answering heads available.

  • Add OPTForQuestionAnswering by @clementapa in #19402
  • Add BloomForQuestionAnswering by @younesbelkada in #19310
Pipelines

There is now a zero-shot object detection pipeline.

  • Add ZeroShotObjectDetectionPipeline by @sahamrit in #18445
TensorFlow architectures

The GroupViT model is now available in TensorFlow.

  • [TensorFlow] Adding GroupViT by @ariG23498 in #18020
Bugfixes and improvements
  • Fix a broken link for deepspeed ZeRO inference in the docs by @nijkah in #19001
  • [doc] debug: fix import by @stas00 in #19042
  • [bnb] Small improvements on utils by @younesbelkada in #18646
  • Update image segmentation pipeline test by @amyeroberts in #18731
  • Fix test_save_load for TFViTMAEModelTest by @ydshieh in #19040
  • Pin minimum PyTorch version for BLOOM ONNX export by @lewtun in #19046
  • Update serving signatures and make sure we actually use them by @Rocketknight1 in #19034
  • Move cache: expand error message by @sgugger in #19051
  • Fixing OPT fast tokenizer option. by @Narsil in #18753
  • Fix custom tokenizers test by @sgugger in #19052
  • Run torchdynamo tests by @ydshieh in #19056
  • [fix] Add DeformableDetrFeatureExtractor by @NielsRogge in #19140
  • fix arg name in BLOOM testing and remove unused arg document by @shijie-wu in #18843
  • Adds package and requirement spec output to version check exception by @colindean in #18702
  • fix use_cache by @younesbelkada in #19060
  • FX support for ConvNext, Wav2Vec2 and ResNet by @michaelbenayoun in #19053
  • [doc] Fix link in PreTrainedModel documentation by @tomaarsen in #19065
  • Add FP32 cast in ConvNext LayerNorm to prevent rounding errors with FP16 input by @jimypbr in #18746
  • Organize test jobs by @sgugger in #19058
  • Automatically tag CLIP repos as zero-shot-image-classification by @osanseviero in #19064
  • Fix LeViT checkpoint by @ydshieh in #19069
  • TF: tests for (de)serializable models with resized tokens by @gante in #19013
  • Add type hints for PyTorch UniSpeech, MPNet and Nystromformer by @daspartho in #19039
  • replace logger.warn by logger.warning by @fxmarty in #19068
  • Fix tokenizer load from one file by @sgugger in #19073
  • Note about developer mode by @LysandreJik in #19075
  • german autoclass by @flozi00 in #19049
  • Add tests for legacy load by url and fix bugs by @sgugger in #19078
  • Add runner availability check by @ydshieh in #19054
  • fix working dir by @ydshieh in #19101
  • Added type hints for TFConvBertModel by @kishore-s-15 in #19088
  • Added Type hints for VIT MAE by @kishore-s-15 in #19085
  • Add type hints for TF MPNet models by @kishore-s-15 in #19089
  • Added type hints to ResNetForImageClassification by @kishore-s-15 in #19084
  • added type hints by @daspartho in #19076
  • Improve vision models docs by @NielsRogge in #19103
  • correct spelling in README by @flozi00 in #19092
  • Don't warn of move if cache is empty by @sgugger in #19109
  • HPO: keep the original logic if there's only one process, pass the trial to trainer by @sywangyi in #19096
  • Add documentation of Trainer.create_model_card by @sgugger in #19110
  • Added type hints for YolosForObjectDetection by @kishore-s-15 in #19086
  • Fix the wrong schedule by @ydshieh in #19117
  • Change document question answering pipeline to always return an array by @ankrgyl in #19071
  • german processing by @flozi00 in #19121
  • Fix: update ltp word segmentation call in mlm_wwm by @xyh1756 in #19047
  • Add a missing space in a script arg documentation by @bryant1410 in #19113
  • Skip test_export_to_onnx for LongT5 if torch < 1.11 by @ydshieh in #19122
  • Fix GLUE MNLI when using max_eval_samples by @lvwerra in #18722
  • [BugFix] Fix fsdp option on shard_grad_op. by @ZHUI in #19131
  • Fix FlaxPretTrainedModel pt weights check by @mishig25 in #19133
  • suppoer deps from github by @lhoestq in #19141
  • Fix dummy creation for multi-frameworks objects by @sgugger in #19144
  • Allowing users to use the latest tokenizers release ! by @Narsil in #19139
  • Add some tests for check_dummies by @sgugger in #19146
  • Fixed typo in generation_utils.py by @nbalepur in #19145
  • Add accelerate support for ViLT by @younesbelkada in #18683
  • TF: check embeddings range by @gante in #19102
  • Reduce LR for TF MLM example test by @Rocketknight1 in #19156
  • update perf_train_cpu_many doc by @sywangyi in #19151
  • fix: ckpt paths. by @sayakpaul in #19159
  • Fix TrainingArguments documentation by @sgugger in #19162
  • fix HPO DDP GPU problem by @sywangyi in #19168
  • [WIP] Trainer supporting evaluation on multiple datasets by @timbmg in #19158
  • Add doctests to Perceiver examples by @stevenmanton in #19129
  • Add offline runners info in the Slack report by @ydshieh in #19169
  • Fix incorrect comments about atten mask for pytorch backend by @lygztq in #18728
  • Fixed type hint for pipelines/check_task by @Fei-Wang in #19150
  • Update run_clip.py by @enze5088 in #19130
  • german training, accelerate and model sharing by @flozi00 in #19171
  • Separate Push CI images from Scheduled CI by @ydshieh in #19170
  • Remove pos arg from Perceiver's Pre/Postprocessors by @aielawady in #18602
  • Use assertAlmostEqual in BloomEmbeddingTest.test_logits by @ydshieh in #19200
  • Move the model type check by @ankrgyl in #19027
  • Use repo_type instead of deprecated datasets repo IDs by @sgugger in #19202
  • Updated hf_argparser.py by @IMvision12 in #19188
  • Add warning for torchaudio <= 0.10 in MCTCTFeatureExtractor by @ydshieh in #19203
  • Fix cached_file in offline mode for cached non-existing files by @sgugger in #19206
  • Remove unused cur_len in generation_utils.py by @ekagra-ranjan in #18874
  • add wav2vec2_alignment by @arijitx in #16782
  • add doc for hyperparameter search by @sywangyi in #19192
  • Add a use_parallel_residual argument to control the residual computing way by @NinedayWang in #18695
  • translated add_new_pipeline by @nickprock in #19215
  • More tests for regression in cached non existence by @sgugger in #19216
  • Use math.pi instead of torch.pi in MaskFormer by @ydshieh in #19201
  • Added tests for yaml and json parser by @IMvision12 in #19219
  • Fix small use_cache typo in the docs by @ankrgyl in #19191
  • Generate: add warning when left padding should be used by @gante in #19067
  • Fix deprecation warning for return_all_scores by @ogabrielluiz in #19217
  • Fix doctest for TFDeiTForImageClassification by @ydshieh in #19173
  • Document and validate typical_p in generation by @mapmeld in #19128
  • Fix trainer seq2seq qa.py evaluate log and ft script by @iamtatsuki05 in #19208
  • Fix cache names in CircleCI jobs by @ydshieh in #19223
  • Move AutoClasses under Main Classes by @stevhliu in #19163
  • Focus doc around preprocessing classes by @stevhliu in #18768
  • Fix confusing working directory in Push CI by @ydshieh in #19234
  • XGLM - Fix Softmax NaNs when using FP16 by @gsarti in #18057
  • Add a getattr method, which replaces _module_getattr in torch.fx.Tracer from PyTorch 1.13+ by @michaelbenayoun in #19233
  • Fix m2m_100.mdx doc example missing labels by @Mustapha-AJEGHRIR in #19149
  • Fix opt softmax small nit by @younesbelkada in #19243
  • Use hf_raise_for_status instead of deprecated _raise_for_status by @Wauplin in #19244
  • Fix TrainingArgs argument serialization by @atturaioe in #19239
  • Fix test fetching for examples by @sgugger in #19237
  • Cast TF generate() inputs by @Rocketknight1 in #19232
  • Skip pipeline tests by @sgugger in #19248
  • Add job names in Past CI artifacts by @ydshieh in #19235
  • Update Past CI report script by @ydshieh in #19228
  • [Wav2Vec2] Fix None loss in doc examples by @rbsteinm in #19218
  • Catch HFValidationError in TrainingSummary by @ydshieh in #19252
  • Add expected output to the sample code for ViTMSNForImageClassification by @sayakpaul in #19183
  • Add stop sequence to text generation pipeline by @KMFODA in #18444
  • Add notebooks by @JingyaHuang in #19259
  • Add beautifulsoup4 to the dependency list by @ydshieh in #19253
  • Fix Encoder-Decoder testing issue about repo. names by @ydshieh in #19250
  • Fix cached lookup filepath on windows for hub by @kjerk in #19178
  • Docs - Guide to add a new TensorFlow model by @gante in #19256
  • Update no_trainer script for summarization by @divyanshugit in #19277
  • Don't automatically add bug label by @sgugger in #19302
  • Breakup export guide by @stevhliu in #19271
  • Update Protobuf dependency version to fix known vulnerability by @qthequartermasterman in #19247
  • Update README.md by @ShubhamJagtap2000 in #19309
  • [Docs] Fix link by @patrickvonplaten in #19313
  • Fix for sequence regression fit() in TF by @Rocketknight1 in #19316
  • Added Type hints for LED TF by @IMvision12 in #19315
  • Added type hints for TF: rag model by @debjit-bw in #19284
  • alter retrived to retrieved by @gouqi666 in #18863
  • ci(stale.yml): upgrade actions/setup-python to v4 by @oscard0m in #19281
  • ci(workflows): update actions/checkout to v3 by @oscard0m in #19280
  • wrap forward passes with torch.no_grad() by @daspartho in #19279
  • wrap forward passes with torch.no_grad() by @daspartho in #19278
  • wrap forward passes with torch.no_grad() by @daspartho in #19274
  • wrap forward passes with torch.no_grad() by @daspartho in #19273
  • Removing BertConfig inheritance from LayoutLMConfig by @arnaudstiegler in #19307
  • docker-build: Update actions/checkout to v3 by @Sushrut1101 in #19288
  • Clamping hidden state values to allow FP16 by @SSamDav in #19229
  • Remove interdependency from OpenAI tokenizer by @E-Aho in #19327
  • removing XLMConfig inheritance from FlaubertConfig by @D3xter1922 in #19326
  • Removed interdependency of BERT's Tokenizer in tokenization of prophetnet by @divyanshugit in #19331
  • Remove bert interdependency from clip tokenizer by @shyamsn97 in #19332
  • [WIP]remove XLMTokenizer inheritance from FlaubertTokenizer by @D3xter1922 in #19330
  • Making camembert independent from roberta, clean by @Mustapha-AJEGHRIR in #19337
  • Add sudachi and jumanpp tokenizers for bert_japanese by @r-terada in #19043
  • Frees LongformerTokenizer of the Roberta dependency by @srhrshr in #19346
  • Change BloomConfig docstring by @younesbelkada in #19336
  • Test failing test while we resolve the issue. by @sgugger in #19355
  • Call _set_save_spec() when creating TF models by @Rocketknight1 in #19321
  • correct typos in README by @paulaxisabel in #19304
  • Removes Roberta and Bert config dependencies from Longformer by @srhrshr in #19343
  • Fix gather for metrics by @muellerzr in #19360
  • Fix pipeline tests for Roberta-like tokenizers by @sgugger in #19365
  • Change link of repojacking vulnerable link by @Ilaygoldman in #19393
  • Making ConvBert Tokenizer independent from bert Tokenizer by @IMvision12 in #19347
  • Fix gather for metrics by @muellerzr in #19389
  • Added Type hints for XLM TF by @IMvision12 in #19333
  • add ONNX support for swin transformer by @bibhabasumohapatra in #19390
  • removes prophet config dependencies from xlm-prophet by @srhrshr in #19400
  • Added type hints for TF: TransfoXL by @thliang01 in #19380
  • HF <-> megatron checkpoint reshaping and conversion for GPT by @pacman100 in #19317
  • Remove unneded words from audio-related feature extractors by @osanseviero in #19405
  • edit: cast attention_mask to long in DataCollatorCTCWithPadding by @ddobokki in #19369
  • Copy BertTokenizer dependency into retribert tokenizer by @Davidy22 in #19371
  • Export TensorFlow models to ONNX with dynamic input shapes by @dwyatte in #19255
  • update attention mask handling by @ArthurZucker in #19385
  • Remove dependency of Bert from Squeezebert tokenizer by @rchan26 in #19403
  • Removed Bert and XML Dependency from Herbert by @harry7337 in #19410
  • Clip device map by @patrickvonplaten in #19409
  • Remove Dependency between Bart and LED (slow/fast) by @Infrared1029 in #19408
  • Removed Bert interdependency in tokenization_electra.py by @OtherHorizon in #19356
  • Make Camembert TF version independent from Roberta by @Mustapha-AJEGHRIR in #19364
  • Removed Bert dependency from BertGeneration code base. by @Threepointone4 in #19370
  • Rework pipeline tests by @sgugger in #19366
  • Fix ViTMSNForImageClassification doctest by @ydshieh in #19275
  • Skip BloomEmbeddingTest.test_embeddings for PyTorch < 1.10 by @ydshieh in #19261
  • remove RobertaConfig inheritance from MarkupLMConfig by @D3xter1922 in #19404
  • Backtick fixed (paragraph 68) by @kant in #19440
  • Fixed duplicated line (paragraph #83) Documentation: @sgugger by @kant in #19436
  • fix marianMT convertion to onnx by @kventinel in #19287
  • Fix typo in image-classification/README.md by @zhawe01 in #19424
  • Stop relying on huggingface_hub's private methods by @LysandreJik in #19392
  • Add onnx support for VisionEncoderDecoder by @mht-sharma in #19254
  • Remove dependency of Roberta in Blenderbot by @rchan26 in #19411
  • fix: renamed variable name by @ariG23498 in #18850
  • Fix the error message in run_t5_mlm_flax.py by @yangky11 in #19282
  • Add Italian translation for add_new_model.mdx by @Steboss89 in #18713
  • Fix momentum and epsilon values by @amyeroberts in #19454
  • Generate: corrected exponential_decay_length_penalty type hint by @ShivangMishra in #19376
  • Fix misspelled word in docstring by @Bearnardd in #19415
  • Fixed a non-working hyperlink in the README.md file by @MikailINTech in #19434
  • fix by @ydshieh in #19469
  • wrap forward passes with torch.no_grad() by @daspartho in #19439
  • wrap forward passes with torch.no_grad() by @daspartho in #19438
  • wrap forward passes with torch.no_grad() by @daspartho in #19416
  • wrap forward passes with torch.no_grad() by @daspartho in #19414
  • wrap forward passes with torch.no_grad() by @daspartho in #19413
  • wrap forward passes with torch.no_grad() by @daspartho in #19412
Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @flozi00
    • german autoclass (#19049)
    • correct spelling in README (#19092)
    • german processing (#19121)
    • german training, accelerate and model sharing (#19171)
  • @DeppMeng
    • Add support for conditional detr (#18948)
  • @sayakpaul
    • MSN (Masked Siamese Networks) for ViT (#18815)
    • fix: ckpt paths. (#19159)
    • Add expected output to the sample code for ViTMSNForImageClassification (#19183)
  • @IMvision12
    • Updated hf_argparser.py (#19188)
    • Added tests for yaml and json parser (#19219)
    • Added Type hints for LED TF (#19315)
    • Making ConvBert Tokenizer independent from bert Tokenizer (#19347)
    • Added Type hints for XLM TF (#19333)
  • @ariG23498
    • [TensorFlow] Adding GroupViT (#18020)
    • fix: renamed variable name (#18850)
  • @Mustapha-AJEGHRIR
    • Fix m2m_100.mdx doc example missing labels (#19149)
    • Making camembert independent from roberta, clean (#19337)
    • Make Camembert TF version independent from Roberta (#19364)
  • @D3xter1922
    • removing XLMConfig inheritance from FlaubertConfig (#19326)
    • [WIP]remove XLMTokenizer inheritance from FlaubertTokenizer (#19330)
    • remove RobertaConfig inheritance from MarkupLMConfig (#19404)
  • @srhrshr
    • Frees LongformerTokenizer of the Roberta dependency (#19346)
    • Removes Roberta and Bert config dependencies from Longformer (#19343)
    • removes prophet config dependencies from xlm-prophet (#19400)
  • @sahamrit
    • [WIP] Add ZeroShotObjectDetectionPipeline (#18445) (#18930)
  • @Davidy22
    • Copy BertTokenizer dependency into retribert tokenizer (#19371)
  • @rchan26
    • Remove dependency of Bert from Squeezebert tokenizer (#19403)
    • Remove dependency of Roberta in Blenderbot (#19411)
  • @harry7337
    • Removed Bert and XML Dependency from Herbert (#19410)
  • @Infrared1029
    • Remove Dependency between Bart and LED (slow/fast) (#19408)
  • @Steboss89
    • Add Italian translation for add_new_model.mdx (#18713)

Notes

If you use this software, please cite it using these metadata.

Files

huggingface/transformers-v4.23.0.zip

Size: 13.4 MB
md5:6be7665758cc113f82aaddfdfb3c0d3c

Additional details