There is a newer version of this record available.

Software Open Access

Transformers: State-of-the-Art Natural Language Processing

Wolf, Thomas; Debut, Lysandre; Sanh, Victor; Chaumond, Julien; Delangue, Clement; Moi, Anthony; Cistac, Perric; Ma, Clara; Jernite, Yacine; Plu, Julien; Xu, Canwen; Le Scao, Teven; Gugger, Sylvain; Drame, Mariama; Lhoest, Quentin; Rush, Alexander M.

MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="">
  <controlfield tag="005">20211117163849.0</controlfield>
  <datafield tag="500" ind1=" " ind2=" ">
    <subfield code="a">If you use this software, please cite it using these metadata.</subfield>
  </datafield>
  <controlfield tag="001">5532744</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Debut, Lysandre</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Sanh, Victor</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Chaumond, Julien</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Delangue, Clement</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Moi, Anthony</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Cistac, Perric</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Ma, Clara</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Jernite, Yacine</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Plu, Julien</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Xu, Canwen</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Le Scao, Teven</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Gugger, Sylvain</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Drame, Mariama</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Lhoest, Quentin</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Rush, Alexander M.</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">11863034</subfield>
    <subfield code="z">md5:0681badc7f0ca1c00c7157bf3c2b7876</subfield>
    <subfield code="u"></subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2020-10-01</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">software</subfield>
    <subfield code="p">user-zenodo</subfield>
    <subfield code="o"></subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Wolf, Thomas</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Transformers: State-of-the-Art Natural Language Processing</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-zenodo</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="a">Other (Open)</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2"></subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">v4.11.0: GPT-J, Speech2Text2, FNet, Pipeline GPU utilization, dynamic model code loading
&lt;p&gt;Three new models are released as part of the GPT-J implementation: &lt;code&gt;GPTJModel&lt;/code&gt;, &lt;code&gt;GPTJForCausalLM&lt;/code&gt;, &lt;code&gt;GPTJForSequenceClassification&lt;/code&gt;, in PyTorch.&lt;/p&gt;
&lt;p&gt;The GPT-J model was released in the &lt;a href=""&gt;kingoflolz/mesh-transformer-jax&lt;/a&gt; repository by Ben Wang and Aran Komatsuzaki. It is a GPT-2-like causal language model trained on the Pile dataset.&lt;/p&gt;
&lt;p&gt;It was contributed by @StellaAthena, @kurumuz, @EricHallahan, and @leogao2.&lt;/p&gt;
&lt;li&gt;GPT-J-6B #13022 (@StellaAthena)&lt;/li&gt;
&lt;p&gt;Compatible checkpoints can be found on the Hub: &lt;a href=""&gt;&lt;/a&gt;&lt;/p&gt;
SpeechEncoderDecoder &amp;amp; Speech2Text2
&lt;p&gt;One new model is released as part of the Speech2Text2 implementation: &lt;code&gt;Speech2Text2ForCausalLM&lt;/code&gt;, in PyTorch.&lt;/p&gt;
&lt;p&gt;The Speech2Text2 model is used together with Wav2Vec2 for Speech Translation models proposed in &lt;a href=""&gt;Large-Scale Self- and Semi-Supervised Learning for Speech Translation&lt;/a&gt; by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.&lt;/p&gt;
&lt;p&gt;Speech2Text2 is a decoder-only transformer model that can be used with any speech encoder-only model, such as Wav2Vec2 or HuBERT, for speech-to-text tasks. Please refer to the &lt;a href=""&gt;SpeechEncoderDecoder&lt;/a&gt; class for how to combine Speech2Text2 with any speech encoder-only model.&lt;/p&gt;
&lt;li&gt;Add SpeechEncoderDecoder &amp;amp; Speech2Text2 #13186 (@patrickvonplaten)&lt;/li&gt;
&lt;p&gt;Compatible checkpoints can be found on the Hub: &lt;a href=""&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Eight new models are released as part of the FNet implementation: &lt;code&gt;FNetModel&lt;/code&gt;, &lt;code&gt;FNetForPreTraining&lt;/code&gt;, &lt;code&gt;FNetForMaskedLM&lt;/code&gt;, &lt;code&gt;FNetForNextSentencePrediction&lt;/code&gt;, &lt;code&gt;FNetForSequenceClassification&lt;/code&gt;, &lt;code&gt;FNetForMultipleChoice&lt;/code&gt;, &lt;code&gt;FNetForTokenClassification&lt;/code&gt;, &lt;code&gt;FNetForQuestionAnswering&lt;/code&gt;,  in PyTorch.&lt;/p&gt;
&lt;p&gt;The FNet model was proposed in &lt;a href=""&gt;FNet: Mixing Tokens with Fourier Transforms&lt;/a&gt; by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon. The model replaces the self-attention layer in a BERT model with a Fourier transform and keeps only the real part of the result. It is significantly faster than BERT because it has fewer parameters and is more memory efficient. It reaches about 92-97% of the accuracy of its BERT counterparts on the GLUE benchmark and trains much faster.&lt;/p&gt;
&lt;li&gt;Add FNet #13045 (@gchhablani)&lt;/li&gt;
&lt;p&gt;Compatible checkpoints can be found on the Hub: &lt;a href=""&gt;&lt;/a&gt;&lt;/p&gt;
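&lt;p&gt;The Fourier mixing step described for FNet can be illustrated with a minimal, standard-library-only sketch. It assumes the transform is applied along the hidden dimension and then along the sequence dimension, with only the real part kept; the function names &lt;code&gt;dft&lt;/code&gt; and &lt;code&gt;fourier_mixing&lt;/code&gt; are hypothetical and this is not the library's implementation.&lt;/p&gt;

```python
import cmath


def dft(values):
    """Naive discrete Fourier transform of a 1-D list of numbers."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fourier_mixing(hidden_states):
    """Mix tokens with a 2-D DFT and keep only the real part.

    hidden_states is a list of seq_len rows, each a list of hidden_size floats.
    """
    rows = [dft(row) for row in hidden_states]        # along the hidden dimension
    cols = [dft(list(col)) for col in zip(*rows)]     # along the sequence dimension
    mixed = [list(row) for row in zip(*cols)]         # back to (seq_len, hidden_size)
    return [[value.real for value in row] for row in mixed]


# Illustrative input: two tokens with two hidden features each.
tokens = [[1.0, 1.0], [1.0, 1.0]]
mixed = fourier_mixing(tokens)
print(mixed)
```

&lt;p&gt;Because the mixing has no learned weights, a layer built this way carries fewer parameters than a self-attention layer, which is consistent with the speed claim above.&lt;/p&gt;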
TensorFlow improvements
&lt;p&gt;Several bug fixes and UX improvements for TensorFlow:&lt;/p&gt;
&lt;li&gt;Users should notice far fewer unnecessary warnings and less 'console spam' in general while using Transformers with TensorFlow.&lt;/li&gt;
&lt;li&gt;TensorFlow models should be less picky about the specific integer dtypes (int32/int64) that are passed as input.&lt;/li&gt;
&lt;p&gt;Changes to compile() and train_step()&lt;/p&gt;
&lt;li&gt;You can now compile our TensorFlow models without passing a loss argument! If you do, the model will compute loss internally during the forward pass and then use this value to fit() on. This makes it much more convenient to get the right loss, particularly since many models have unique losses for certain tasks that are easy to overlook and annoying to reimplement. Remember to pass your labels as the "labels" key of your input dict when doing this, so that they're accessible to the model during the forward pass. There is no change to the behavior if you pass a loss argument, so all old code should remain unaffected by this change.&lt;/li&gt;
&lt;p&gt;Associated PRs:&lt;/p&gt;
&lt;li&gt;Modified TF train_step #13678 (@Rocketknight1)&lt;/li&gt;
&lt;li&gt;Fix Tensorflow T5 with int64 input #13479 (@Rocketknight1)&lt;/li&gt;
&lt;li&gt;MarianMT int dtype fix #13496 (@Rocketknight1)&lt;/li&gt;
&lt;li&gt;Removed console spam from misfiring warnings #13625 (@Rocketknight1)&lt;/li&gt;
Pipeline refactor
&lt;p&gt;The pipelines underwent a large refactor that should make contributing pipelines much simpler, and much less error-prone. As part of this refactor, PyTorch-based pipelines are now optimized for GPU performance based on PyTorch's &lt;code&gt;Dataset&lt;/code&gt;s and &lt;code&gt;DataLoader&lt;/code&gt;s.&lt;/p&gt;
&lt;p&gt;See below for an example leveraging the &lt;code&gt;superb&lt;/code&gt; dataset.&lt;/p&gt;
&lt;pre&gt;&lt;code class="lang-py"&gt;import datasets
import tqdm
from transformers import pipeline
# Import path as of v4.11; it may differ in other versions.
from transformers.pipelines.base import KeyDataset

pipe = pipeline(&amp;quot;automatic-speech-recognition&amp;quot;, model=&amp;quot;facebook/wav2vec2-base-960h&amp;quot;, device=0)
dataset = datasets.load_dataset(&amp;quot;superb&amp;quot;, name=&amp;quot;asr&amp;quot;, split=&amp;quot;test&amp;quot;)

# KeyDataset (only `pt`) will simply return the item in the dict returned by the dataset item
# as we&amp;#39;re not interested in the `target` part of the dataset.
for out in tqdm.tqdm(pipe(KeyDataset(dataset, &amp;quot;file&amp;quot;))):
    print(out)
    # {&amp;quot;text&amp;quot;: &amp;quot;NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND&amp;quot;}
    # {&amp;quot;text&amp;quot;: ....}
    # ....
&lt;/code&gt;&lt;/pre&gt;
&lt;li&gt;[Large PR] Entire rework of pipelines. #13308 (@Narsil)&lt;/li&gt;
Audio classification pipeline
&lt;p&gt;A new pipeline is also available for audio classification.&lt;/p&gt;
&lt;li&gt;Add the &lt;code&gt;AudioClassificationPipeline&lt;/code&gt; #13342 (@anton-l)&lt;/li&gt;
&lt;li&gt;Enabling automatic loading of tokenizer with &lt;code&gt;pipeline&lt;/code&gt; for &lt;code&gt;audio-classification&lt;/code&gt;. #13376 (@Narsil)&lt;/li&gt;
Setters for common properties
&lt;p&gt;Version v4.11.0 introduces setters for common configuration properties. Configurations name the same property differently because they come from different implementations.&lt;/p&gt;
&lt;p&gt;For example, &lt;code&gt;BertConfig&lt;/code&gt; has the &lt;code&gt;hidden_size&lt;/code&gt; attribute while &lt;code&gt;GPT2Config&lt;/code&gt; has the &lt;code&gt;n_embd&lt;/code&gt; attribute, and the two are essentially the same.&lt;/p&gt;
&lt;p&gt;The newly introduced setters allow setting such properties through a standardized naming scheme, even on configuration objects that do not have them by default.&lt;/p&gt;
&lt;p&gt;See the following code sample for an example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from transformers import GPT2Config

config = GPT2Config()
config.hidden_size = 4  # Failed previously
config = GPT2Config(hidden_size=4)  # Failed previously

config.n_embd  # returns 4
config.hidden_size  # returns 4
&lt;/code&gt;&lt;/pre&gt;
&lt;li&gt;Update model configs - Allow setters for common properties #13026 (@nreimers)&lt;/li&gt;
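&lt;p&gt;One way such standardized setters could work is by redirecting an alias to the implementation-specific attribute. The sketch below is a standalone illustration under that assumption: the class &lt;code&gt;AliasedConfig&lt;/code&gt;, its mapping, and its default value are hypothetical and are not taken from the library's code.&lt;/p&gt;

```python
class AliasedConfig:
    """Hypothetical config whose standardized names alias native attributes."""

    # Standardized name mapped to the implementation-specific name (illustrative).
    attribute_map = {"hidden_size": "n_embd"}

    def __init__(self, **kwargs):
        self.n_embd = 768  # illustrative default
        for name, value in kwargs.items():
            setattr(self, name, value)

    def __setattr__(self, name, value):
        # Redirect writes on an alias to the underlying attribute.
        name = type(self).attribute_map.get(name, name)
        super().__setattr__(name, value)

    def __getattr__(self, name):
        # Called only when normal lookup fails, which is the case for an alias.
        attribute_map = type(self).attribute_map
        if name in attribute_map:
            return getattr(self, attribute_map[name])
        raise AttributeError(name)


config = AliasedConfig(hidden_size=4)
print(config.n_embd, config.hidden_size)
```

&lt;p&gt;Both names read and write the same stored value, so code written against the standardized name works on a configuration that only defines the native one.&lt;/p&gt;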
Dynamic model code loading
&lt;p&gt;An experimental feature adding support for model files hosted on the hub is added as part of this release. A walkthrough is available in the &lt;a href=""&gt;PR description&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;:warning: This means that code files will be fetched from the hub to be executed locally. An additional argument, &lt;code&gt;trust_remote_code&lt;/code&gt;, is required when instantiating the model from the hub. We strongly encourage you to also specify a &lt;code&gt;revision&lt;/code&gt; when using code from another user's or organization's repository.&lt;/p&gt;
&lt;li&gt;Dynamically load model code from the Hub #13467 (@sgugger)&lt;/li&gt;
&lt;p&gt;The &lt;code&gt;Trainer&lt;/code&gt; has received several new features, the main one being that models are uploaded to the Hub each time you save them locally (you can specify another strategy). This push is asynchronous, so training continues normally without interruption.&lt;/p&gt;
&lt;li&gt;The SigOpt optimization framework is now integrated in the &lt;code&gt;Trainer&lt;/code&gt; API as an opt-in component.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Trainer&lt;/code&gt; API now supports fine-tuning on distributed CPUs.&lt;/li&gt;
&lt;p&gt;Associated PRs:&lt;/p&gt;
&lt;li&gt;Push to hub when saving checkpoints #13503 (@sgugger)&lt;/li&gt;
&lt;li&gt;Add SigOpt HPO to transformers trainer api #13572 (@kding1)&lt;/li&gt;
&lt;li&gt;Add cpu distributed fine-tuning support for transformers Trainer API #13574 (@kding1)&lt;/li&gt;
Model size CPU memory usage reduction
&lt;p&gt;Loading a model with PyTorch's &lt;code&gt;torch.load&lt;/code&gt; requires twice the model's size in memory. Version v4.11.0 adds an experimental feature that loads a model while using only about the model's size in memory.&lt;/p&gt;
&lt;p&gt;Enable it by passing the &lt;code&gt;low_cpu_mem_usage=True&lt;/code&gt; argument when loading PyTorch pretrained models.&lt;/p&gt;
&lt;li&gt;1x model size CPU memory usage for &lt;code&gt;from_pretrained&lt;/code&gt; #13466 (@stas00)&lt;/li&gt;
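&lt;p&gt;To get a feel for the difference between 2x and 1x model size, the following back-of-the-envelope sketch estimates peak CPU memory for loading weights. The helper &lt;code&gt;peak_load_memory_gib&lt;/code&gt;, the 4-bytes-per-parameter (float32) assumption, and the 6-billion-parameter figure are illustrative assumptions, not measurements.&lt;/p&gt;

```python
def peak_load_memory_gib(num_parameters, bytes_per_parameter=4, low_cpu_mem_usage=False):
    """Rough peak CPU memory in GiB needed to load a model's weights."""
    model_size = num_parameters * bytes_per_parameter
    # Default loading needs about 2x the model size; the low-memory path about 1x.
    factor = 1 if low_cpu_mem_usage else 2
    return factor * model_size / 2**30


params = 6_000_000_000  # illustrative 6-billion-parameter model stored as float32
default_peak = peak_load_memory_gib(params)
low_mem_peak = peak_load_memory_gib(params, low_cpu_mem_usage=True)
print(f"default: {default_peak:.1f} GiB, low_cpu_mem_usage: {low_mem_peak:.1f} GiB")
```

&lt;p&gt;Under these assumptions the estimate drops from about 44.7 GiB to about 22.4 GiB, which can decide whether a large checkpoint fits on a given machine.&lt;/p&gt;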
GPT-Neo: simplified local attention
&lt;p&gt;The GPT-Neo local attention was greatly simplified with no loss of performance.&lt;/p&gt;
&lt;li&gt;[GPT-Neo] Simplify local attention #13491 (@finetuneanon, @patil-suraj)&lt;/li&gt;
Breaking changes
&lt;p&gt;&lt;em&gt;We strive for no breaking changes between releases - however, some bugs are not discovered for long periods of time, and users may eventually rely on such bugs. We document here such changes that may affect users when updating to a recent version.&lt;/em&gt;&lt;/p&gt;
Order of overflowing tokens
&lt;p&gt;The overflowing tokens returned by the slow tokenizers were returned in the wrong order. This is changed in the PR below.&lt;/p&gt;
&lt;li&gt;Correct order of overflowing_tokens for slow tokenizer #13179 (@Apoorvgarg-creator)&lt;/li&gt;
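&lt;p&gt;As an illustration of what the corrected order means, the sketch below truncates a list of token ids on the right and returns the overflow in the same left-to-right order as the input. The function &lt;code&gt;truncate_right&lt;/code&gt; and the ids are hypothetical; this is not the tokenizer's implementation.&lt;/p&gt;

```python
def truncate_right(token_ids, max_length):
    """Keep the first max_length ids; return the rest in their original order."""
    kept = token_ids[:max_length]
    overflowing = token_ids[max_length:]
    return kept, overflowing


kept, overflowing = truncate_right([101, 7, 8, 9, 10, 102], max_length=4)
# kept is [101, 7, 8, 9]; overflowing is [10, 102], not the reversed [102, 10]
print(kept, overflowing)
```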
Non-prefixed tokens for token classification pipeline
&lt;p&gt;Updates the behavior of &lt;code&gt;aggregation_strategy&lt;/code&gt; to more closely mimic the deprecated &lt;code&gt;grouped_entities&lt;/code&gt; pipeline argument.&lt;/p&gt;
&lt;li&gt;Fixing backward compatiblity for non prefixed tokens (B-, I-). #13493 (@Narsil)&lt;/li&gt;
Inputs normalization for Wav2Vec2 feature extractor
&lt;p&gt;The changes in v4.10 (#12804) introduced a bug in inputs normalization for non-padded tensors that affected Wav2Vec2 fine-tuning.
This is fixed in the PR below.&lt;/p&gt;
&lt;li&gt;[Wav2Vec2] Fix normalization for non-padded tensors #13512 (@patrickvonplaten)&lt;/li&gt;
General bug fixes and improvements
&lt;li&gt;Fixes for the documentation #13361 (@sgugger)&lt;/li&gt;
&lt;li&gt;fix wrong 'cls' masking for bigbird qa model output #13143 (@donggyukimc)&lt;/li&gt;
&lt;li&gt;Improve T5 docs #13240 (@NielsRogge)&lt;/li&gt;
&lt;li&gt;Fix tokenizer saving during training with &lt;code&gt;Trainer&lt;/code&gt; #12806 (@SaulLu)&lt;/li&gt;
&lt;li&gt;Fix DINO #13369 (@NielsRogge)&lt;/li&gt;
&lt;li&gt;Properly register missing submodules in main init #13372 (@sgugger)&lt;/li&gt;
&lt;li&gt;Add &lt;code&gt;Hubert&lt;/code&gt; to the &lt;code&gt;AutoFeatureExtractor&lt;/code&gt; #13366 (@anton-l)&lt;/li&gt;
&lt;li&gt;Add missing feature extractors #13374 (@LysandreJik)&lt;/li&gt;
&lt;li&gt;Fix RemBERT tokenizer initialization #13375 (@LysandreJik)&lt;/li&gt;
&lt;li&gt;[Flax] Fix BigBird #13380 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;[GPU Tests] Fix SpeechEncoderDecoder GPU tests #13383 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;Fix name and get_class method in AutoFeatureExtractor #13385 (@sgugger)&lt;/li&gt;
&lt;li&gt;[Flax/run_hybrid_clip] Fix duplicating images when captions_per_image exceeds the number of captions, enable truncation #12752 (@edugp)&lt;/li&gt;
&lt;li&gt;Move Flax self-push to test machine #13364 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;Torchscript test #13350 (@LysandreJik)&lt;/li&gt;
&lt;li&gt;Torchscript test for DistilBERT #13351 (@LysandreJik)&lt;/li&gt;
&lt;li&gt;Torchscript test for ConvBERT #13352 (@LysandreJik)&lt;/li&gt;
&lt;li&gt;Torchscript test for Flaubert #13353 (@LysandreJik)&lt;/li&gt;
&lt;li&gt;Fix GPT-J _CHECKPOINT_FOR_DOC typo #13368 (@LysandreJik)&lt;/li&gt;
&lt;li&gt;Update clip loss calculation #13217 (@sachinruk)&lt;/li&gt;
&lt;li&gt;Add LayoutXLM tokenizer docs #13373 (@NielsRogge)&lt;/li&gt;
&lt;li&gt;[doc] fix mBART example #13387 (@patil-suraj)&lt;/li&gt;
&lt;li&gt;[docs] Update perplexity.rst to use negative log likelihood #13386 (@madaan)&lt;/li&gt;
&lt;li&gt;[Tests] Fix SpeechEncoderDecoder tests #13395 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;[SpeechEncoderDecoder] Fix final test #13396 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;✨ Add PyTorch image classification example #13134 (@nateraw)&lt;/li&gt;
&lt;li&gt;Fix tests without any real effect in EncoderDecoderMixin #13406 (@ydshieh)&lt;/li&gt;
&lt;li&gt;Fix scheduled tests for &lt;code&gt;SpeechEncoderDecoderModel&lt;/code&gt; #13422 (@anton-l)&lt;/li&gt;
&lt;li&gt;add torchvision in example test requirements #13438 (@patil-suraj)&lt;/li&gt;
&lt;li&gt;[EncoderDecoder] Fix torch device in tests #13448 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;Adding a test for multibytes unicode. #13447 (@Narsil)&lt;/li&gt;
&lt;li&gt;skip image classification example test #13451 (@patil-suraj)&lt;/li&gt;
&lt;li&gt;Add TAPAS MLM-only models #13408 (@NielsRogge)&lt;/li&gt;
&lt;li&gt;Fix scheduled TF Speech tests #13403 (@anton-l)&lt;/li&gt;
&lt;li&gt;Update version of &lt;code&gt;packaging&lt;/code&gt; package #13454 (@shivdhar)&lt;/li&gt;
&lt;li&gt;Update #13421 (@anukaal)&lt;/li&gt;
&lt;li&gt;Fix img classification tests #13456 (@nateraw)&lt;/li&gt;
&lt;li&gt;Making it raise real errors on ByT5. #13449 (@Narsil)&lt;/li&gt;
&lt;li&gt;Optimized bad word ids #13433 (@guillaume-be)&lt;/li&gt;
&lt;li&gt;Use powers of 2 in download size calculations #13468 (@anton-l)&lt;/li&gt;
&lt;li&gt;[docs] update dead quickstart link on resuing past for GPT2 #13455 (@shabie)&lt;/li&gt;
&lt;li&gt;fix CLIP conversion script. #13474 (@patil-suraj)&lt;/li&gt;
&lt;li&gt;Deprecate Mirror #13470 (@JetRunner)&lt;/li&gt;
&lt;li&gt;[CLIP] fix logit_scale init #13436 (@patil-suraj)&lt;/li&gt;
&lt;li&gt;Don't modify labels inplace in &lt;code&gt;LabelSmoother&lt;/code&gt; #13464 (@sgugger)&lt;/li&gt;
&lt;li&gt;Enable automated model list copying for localized READMEs #13465 (@qqaatw)&lt;/li&gt;
&lt;li&gt;Better error raised when cloned without lfs #13401 (@LysandreJik)&lt;/li&gt;
&lt;li&gt;Throw ValueError for mirror downloads #13478 (@JetRunner)&lt;/li&gt;
&lt;li&gt;Fix Tensorflow T5 with int64 input #13479 (@Rocketknight1)&lt;/li&gt;
&lt;li&gt;Object detection pipeline #12886 (@mishig25)&lt;/li&gt;
&lt;li&gt;Typo in "end_of_word_suffix" #13477 (@KoichiYasuoka)&lt;/li&gt;
&lt;li&gt;Fixed the MultilabelTrainer document, which would cause a potential bug when executing the code originally documented. #13414 (@Mohan-Zhang-u)&lt;/li&gt;
&lt;li&gt;Fix integration tests for &lt;code&gt;TFWav2Vec2&lt;/code&gt; and &lt;code&gt;TFHubert&lt;/code&gt; #13480 (@anton-l)&lt;/li&gt;
&lt;li&gt;Fix typo in deepspeed documentation #13482 (@apohllo)&lt;/li&gt;
&lt;li&gt;flax ner example #13365 (@kamalkraj)&lt;/li&gt;
&lt;li&gt;Fix typo in documentation #13494 (@apohllo)&lt;/li&gt;
&lt;li&gt;MarianMT int dtype fix #13496 (@Rocketknight1)&lt;/li&gt;
&lt;li&gt;[Tentative] Moving slow tokenizer to the Trie world. #13220 (@Narsil)&lt;/li&gt;
&lt;li&gt;Refactor internals for Trainer push_to_hub #13486 (@sgugger)&lt;/li&gt;
&lt;li&gt;examples: minor fixes in flax example readme #13502 (@stefan-it)&lt;/li&gt;
&lt;li&gt;[Wav2Vec2] Fix normalization for non-padded tensors #13512 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;TF multiple choice loss fix #13513 (@Rocketknight1)&lt;/li&gt;
&lt;li&gt;[Wav2Vec2] Fix dtype 64 bug #13517 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;fix PhophetNet 'use_cache' assignment of no effect #13532 (@holazzer)&lt;/li&gt;
&lt;li&gt;Ignore &lt;code&gt;past_key_values&lt;/code&gt; during GPT-Neo inference #13521 (@aphedges)&lt;/li&gt;
&lt;li&gt;Fix attention mask size checking for CLIP #13535 (@Renovamen)&lt;/li&gt;
&lt;li&gt;[Speech2Text2] Skip newly added tokenizer test #13536 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;[Speech2Text] Give feature extraction higher tolerance #13538 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;[tokenizer] use use_auth_token for config #13523 (@stas00)&lt;/li&gt;
&lt;li&gt;Small changes in &lt;code&gt;perplexity.rst&lt;/code&gt;to make the notebook executable on google collaboratory #13541 (@SaulLu)&lt;/li&gt;
&lt;li&gt;[Feature Extractors] Return attention mask always in int32 #13543 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;Nightly torch ci #13550 (@LysandreJik)&lt;/li&gt;
&lt;li&gt;Add long overdue link to the Google TRC project #13501 (@avital)&lt;/li&gt;
&lt;li&gt;Fixing #13381 #13400 (@Narsil)&lt;/li&gt;
&lt;li&gt;fixing BC in &lt;code&gt;fill-mask&lt;/code&gt; (wasn't tested in theses test suites apparently). #13540 (@Narsil)&lt;/li&gt;
&lt;li&gt;add flax mbart in auto seq2seq lm #13560 (@patil-suraj)&lt;/li&gt;
&lt;li&gt;[Flax] Addition of FlaxPegasus #13420 (@bhadreshpsavani)&lt;/li&gt;
&lt;li&gt;Add checks to build cleaner model cards #13542 (@sgugger)&lt;/li&gt;
&lt;li&gt;separate model card git push from the rest #13514 (@elishowk)&lt;/li&gt;
&lt;li&gt;Fix test_fetcher when setup is updated #13566 (@sgugger)&lt;/li&gt;
&lt;li&gt;[Flax] Fixes typo in Bart based Flax Models #13565 (@bhadreshpsavani)&lt;/li&gt;
&lt;li&gt;Fix GPTNeo onnx export #13524 (@patil-suraj)&lt;/li&gt;
&lt;li&gt;upgrade sentencepiece version #13564 (@elishowk)&lt;/li&gt;
&lt;li&gt;[Pretrained Model] Add resize_position_embeddings #13559 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;[ci] nightly: add deepspeed master #13589 (@stas00)&lt;/li&gt;
&lt;li&gt;[Tests] Disable flaky s2t test #13585 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;Correct device when resizing position embeddings #13593 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;Fix DataCollatorForSeq2Seq when labels are supplied as Numpy array instead of list #13582 (@Rocketknight1)&lt;/li&gt;
&lt;li&gt;Fix a pipeline test with the newly updated weights #13608 (@LysandreJik)&lt;/li&gt;
&lt;li&gt;Fix make fix-copies with type annotations #13586 (@sgugger)&lt;/li&gt;
&lt;li&gt;DataCollatorForTokenClassification numpy fix #13609 (@Rocketknight1)&lt;/li&gt;
&lt;li&gt;Feature Extractor: Wav2Vec2 &amp;amp; Speech2Text - Allow truncation + padding=longest #13600 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;[deepspeed] replaced deprecated init arg #13587 (@stas00)&lt;/li&gt;
&lt;li&gt;Properly use test_fetcher for examples #13604 (@sgugger)&lt;/li&gt;
&lt;li&gt;XLMR tokenizer is fully picklable #13577 (@ben-davidson-6)&lt;/li&gt;
&lt;li&gt;Optimize Token Classification models for TPU #13096 (@ibraheem-moosa)&lt;/li&gt;
&lt;li&gt;[Trainer] Add nan/inf logging filter #13619 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;Fix special tokens not correctly tokenized #13489 (@qqaatw)&lt;/li&gt;
&lt;li&gt;Removed console spam from misfiring warnings #13625 (@Rocketknight1)&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;config_dict_or_path&lt;/code&gt; for #13614 (@aphedges)&lt;/li&gt;
&lt;li&gt;Fixes issues with backward pass in LED/Longformer Self-attention #13613 (@aleSuglia)&lt;/li&gt;
&lt;li&gt;fix some docstring in encoder-decoder models #13611 (@ydshieh)&lt;/li&gt;
&lt;li&gt;Updated tiny distilbert models #13631 (@LysandreJik)&lt;/li&gt;
&lt;li&gt;Fix GPT2Config parameters in GPT2ModelTester #13630 (@calpt)&lt;/li&gt;
&lt;li&gt;[run_summarization] fix typo #13647 (@patil-suraj)&lt;/li&gt;
&lt;li&gt;[Fix]Make sure the args tb_writer passed to the TensorBoardCallback works #13636 (@iamlockelightning)&lt;/li&gt;
&lt;li&gt;Fix mT5 documentation #13639 (@ayaka14732)&lt;/li&gt;
&lt;li&gt;Update #13654 (@kamalkraj)&lt;/li&gt;
&lt;li&gt;[megatron_gpt2] checkpoint v3 #13508 (@stas00)&lt;/li&gt;
&lt;li&gt;Change https:/ to https:// to dataset GitHub repo #13644 (@flozi00)&lt;/li&gt;
&lt;li&gt;fix research_projects/mlm_wwm examples #13646 (@LowinLi)&lt;/li&gt;
&lt;li&gt;Fix typo distilbert doc to code link #13643 (@flozi00)&lt;/li&gt;
&lt;li&gt;Add Speech AutoModels #13655 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;beit-flax #13515 (@kamalkraj)&lt;/li&gt;
&lt;li&gt;[FLAX] Question Answering Example #13649 (@kamalkraj)&lt;/li&gt;
&lt;li&gt;Typo "UNKWOWN" -&amp;gt; "UNKNOWN" #13675 (@kamalkraj)&lt;/li&gt;
&lt;li&gt;[SequenceFeatureExtractor] Rewrite padding logic from pure python to numpy #13650 (@anton-l)&lt;/li&gt;
&lt;li&gt;[SinusoidalPositionalEmbedding] incorrect dtype when resizing in &lt;code&gt;forward&lt;/code&gt; #13665 (@stas00)&lt;/li&gt;
&lt;li&gt;Add push_to_hub to no_trainer examples #13659 (@sgugger)&lt;/li&gt;
&lt;li&gt;Layoutlm onnx support (Issue #13300) #13562 (@nishprabhu)&lt;/li&gt;
&lt;li&gt;Update #13680 (@kamalkraj)&lt;/li&gt;
&lt;li&gt;[FlaxWav2Vec2] Revive Test #13688 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;[AutoTokenizer] Allow creation of tokenizers by tokenizer type #13668 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;[Wav2Vec2FeatureExtractor] Fix &lt;code&gt;extractor.pad()&lt;/code&gt; dtype backwards compatibility #13693 (@anton-l)&lt;/li&gt;
&lt;li&gt;Make gradient_checkpointing a training argument #13657 (@sgugger)&lt;/li&gt;
&lt;li&gt;Assertions to exceptions #13692 (@MocktaiLEngineer)&lt;/li&gt;
&lt;li&gt;Fix non-negligible difference between GPT2 and TFGP2 #13679 (@ydshieh)&lt;/li&gt;
&lt;li&gt;Allow only textual inputs to VisualBert #13687 (@gchhablani)&lt;/li&gt;
&lt;li&gt;Patch training arguments issue #13699 (@LysandreJik)&lt;/li&gt;
&lt;li&gt;Patch training arguments issue #13700 (@LysandreJik)&lt;/li&gt;
&lt;li&gt;[GPT-J] Use the &lt;code&gt;float16&lt;/code&gt; checkpoints in integration tests #13676 (@anton-l)&lt;/li&gt;
&lt;li&gt;[docs/gpt-j] add a note about tokenizer #13696 (@patil-suraj)&lt;/li&gt;
&lt;li&gt;Fix FNet reference to tpu short seq length #13686 (@gchhablani)&lt;/li&gt;
&lt;li&gt;Add BlenderBot small tokenizer to the init #13367 (@LysandreJik)&lt;/li&gt;
&lt;li&gt;Fix typo in torchscript tests #13701 (@LysandreJik)&lt;/li&gt;
&lt;li&gt;Handle &lt;code&gt;UnicodeDecodeError&lt;/code&gt; when loading config file #13717 (@qqaatw)&lt;/li&gt;
&lt;li&gt;Add FSNER example in research_projects #13712 (@sayef)&lt;/li&gt;
&lt;li&gt;Replace torch.set_grad_enabled by torch.no_grad #13703 (@LysandreJik)&lt;/li&gt;
&lt;li&gt;[ASR] Add official ASR CTC example to &lt;code&gt;examples/pytorch/speech-recognition&lt;/code&gt; #13620 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;Make assertions only if actually chunking forward #13598 (@joshdevins)&lt;/li&gt;
&lt;li&gt;Use torch.unique_consecutive to check elements are same #13637 (@oToToT)&lt;/li&gt;
&lt;li&gt;Fixing zero-shot backward compatiblity #13725 (@Narsil)&lt;/li&gt;
&lt;li&gt;[Tests] FNetTokenizer #13729 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;Warn for unexpected argument combinations #13509 (@shirayu)&lt;/li&gt;
&lt;li&gt;Add model card creation snippet to example scripts #13730 (@gchhablani)&lt;/li&gt;
&lt;li&gt;[Examples] speech recognition - remove gradient checkpointing #13733 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;Update test dependence for torch examples #13738 (@sgugger)&lt;/li&gt;
&lt;li&gt;[Tests] Add decorator to FlaxBeit #13743 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;Update requirements for speech example #13745 (@sgugger)&lt;/li&gt;
&lt;li&gt;[Trainer] Make sure shown loss in distributed training is correctly averaged over all workers #13681 (@patrickvonplaten)&lt;/li&gt;
&lt;li&gt;[megatron gpt checkpoint conversion] causal mask requires pos_embed dimension #13735 (@stas00)&lt;/li&gt;
&lt;li&gt;[Tests] Cast Hubert model tests to fp16 #13755 (@anton-l)&lt;/li&gt;
&lt;li&gt;Fix type annotations for &lt;code&gt;distributed_concat()&lt;/code&gt; #13746 (@Renovamen)&lt;/li&gt;
&lt;li&gt;Fix loss computation in Trainer #13760 (@sgugger)&lt;/li&gt;
&lt;li&gt;Silence warning in gradient checkpointing when it's False #13734 (@sgugger)&lt;/li&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">url</subfield>
    <subfield code="i">isSupplementTo</subfield>
    <subfield code="a"></subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.3385997</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.5532744</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">software</subfield>
  </datafield>
</record>
                    All versions    This version
Views               34,343          337
Downloads           1,198           12
Data volume         8.9 GB          142.4 MB
Unique views        28,581          262
Unique downloads    592             12

