Software Open Access
Wolf, Thomas; Debut, Lysandre; Sanh, Victor; Chaumond, Julien; Delangue, Clement; Moi, Anthony; Cistac, Perric; Ma, Clara; Jernite, Yacine; Plu, Julien; Xu, Canwen; Le Scao, Teven; Gugger, Sylvain; Drame, Mariama; Lhoest, Quentin; Rush, Alexander M.
WavLM was proposed in WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
WavLM sets a new state of the art (SOTA) on the SUPERB benchmark.
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=wavlm
Wav2Vec2Phoneme was proposed in Simple and Effective Zero-shot Cross-lingual Phoneme Recognition by Qiantong Xu, Alexei Baevski, Michael Auli. Wav2Vec2Phoneme makes it possible to perform phoneme classification as part of automatic speech recognition.
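Wav2Vec2Phoneme checkpoints are understood to be CTC-fine-tuned models that predict phoneme labels. Turning their per-frame predictions into a phoneme sequence therefore follows the usual greedy CTC rule: take the most likely label for each frame, merge consecutive repeats, then drop the blank label.

The plain-Python sketch below shows only that decoding step. Several names are illustrative assumptions and are not the library's tokenizer API:
- the toy vocabulary,
- the use of `<pad>` as the blank symbol,
- the helper names `ctc_greedy_decode` and `one_hot`.

```python
from typing import List, Sequence

# Hypothetical toy phoneme vocabulary; index 0 is assumed to be the CTC blank.
VOCAB = ["<pad>", "h", "ə", "l", "oʊ"]
BLANK_ID = 0


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      vocab: Sequence[str] = VOCAB,
                      blank_id: int = BLANK_ID) -> List[str]:
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    # Step 1: pick the highest-scoring label index for every frame.
    best_ids = [max(range(len(scores)), key=scores.__getitem__)
                for scores in frame_scores]

    phonemes: List[str] = []
    previous = None
    for label_id in best_ids:
        # Step 2: a label repeated on consecutive frames is emitted once.
        # Step 3: blanks are never emitted; they only separate true repeats.
        if label_id != previous and label_id != blank_id:
            phonemes.append(vocab[label_id])
        previous = label_id
    return phonemes


def one_hot(index: int, size: int = len(VOCAB)) -> List[float]:
    """Hypothetical helper: a score vector that is 1.0 at one label."""
    return [1.0 if j == index else 0.0 for j in range(size)]


# Frames spelling h h _ ə l l _ l oʊ, where _ is the blank.
frames = [one_hot(i) for i in [1, 1, 0, 2, 3, 3, 0, 3, 4]]
print(ctc_greedy_decode(frames))  # ['h', 'ə', 'l', 'l', 'oʊ']
```

The blank between the two runs of "l" is what allows a genuinely doubled phoneme to survive the repeat-merging step.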
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=phoneme-recognition
UniSpeech-SAT
UniSpeech-SAT was proposed in UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
UniSpeech-SAT performs especially well on speaker-related tasks.
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=unispeech-sat
UniSpeech
UniSpeech was proposed in UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
Three new models are released as part of the ImageGPT integration:
ImageGPTForImageClassification, in PyTorch.
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=unispeech
New Tasks
Speaker Diarization and Verification
Wav2Vec2-like architectures now have speaker diarization and speaker verification heads. You can try out speaker verification here: https://huggingface.co/spaces/microsoft/wavlm-speaker-verification
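Speaker verification is commonly framed as a comparison of two fixed-size speaker embeddings (x-vectors), such as those produced by a head like WavLMForXVector. The two recordings are accepted as the same speaker when the cosine similarity of their embeddings reaches a tuned threshold.

The plain-Python sketch below shows only that decision step, and several parts of it are assumptions:
- The embeddings are hand-written toy vectors rather than real model outputs.
- The 0.86 threshold is a placeholder, since a suitable value depends on the data.
- The function names are hypothetical.

```python
import math
from typing import Sequence

# Assumed decision threshold; in practice it is tuned on held-out data.
THRESHOLD = 0.86


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(emb_1: Sequence[float], emb_2: Sequence[float],
                 threshold: float = THRESHOLD) -> bool:
    """Hypothetical helper: accept the pair when similarity clears the threshold."""
    return cosine_similarity(emb_1, emb_2) >= threshold


# Toy 3-dimensional "x-vectors"; real speaker embeddings are much larger.
enrolled = [0.9, 0.1, 0.4]
probe_same = [0.8, 0.2, 0.5]
probe_other = [-0.3, 0.9, 0.1]

print(same_speaker(enrolled, probe_same))   # True
print(same_speaker(enrolled, probe_other))  # False
```

Cosine similarity ignores the embeddings' magnitudes, so only their direction in embedding space decides the outcome.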
require_datasets testing utility by @LysandreJik in https://github.com/huggingface/transformers/pull/14795
generate by @lvwerra in https://github.com/huggingface/transformers/pull/14779
FlaxMarianMTModel return block. by @sgugger in https://github.com/huggingface/transformers/pull/14873
RobertaTokenizerFast by @SaulLu in https://github.com/huggingface/transformers/pull/14752