Published May 19, 2023 | Version v2
Conference paper
Open
Pengi 🐧: An Audio Language Model for Audio Tasks
- 1. Microsoft
- 2. Carnegie Mellon University
Description
Pengi is an Audio Language Model that leverages transfer learning by framing all audio tasks as text-generation tasks. It takes an audio recording and text as input and generates free-form text as output. This unified architecture supports both open-ended and close-ended tasks without any additional fine-tuning or task-specific extensions.
The code is available in the microsoft/Pengi repository on GitHub (github.com).
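To make the "every audio task is text generation" framing concrete, the following is a minimal sketch of how different tasks could be reduced to the same (audio, input text) to output text shape. The class name, helper functions, prompt strings, and file names are all hypothetical and chosen only for illustration; they are not taken from the Pengi code or paper.

```python
from dataclasses import dataclass


@dataclass
class TextGenExample:
    """One training or inference example in a unified text-generation format."""
    audio_path: str   # path to the input audio recording
    input_text: str   # text prompt supplied alongside the audio
    target_text: str  # free-form text the model is expected to generate


# The prompt strings below are illustrative assumptions, not Pengi's templates.
def frame_captioning(audio_path: str, caption: str) -> TextGenExample:
    # Open-ended task: the target is a free-form description.
    return TextGenExample(audio_path, "generate audio caption", caption)


def frame_classification(audio_path: str, label: str) -> TextGenExample:
    # Close-ended task: the class label is emitted as plain text.
    return TextGenExample(audio_path, "this is a sound of", label)


def frame_question_answering(audio_path: str, question: str, answer: str) -> TextGenExample:
    # Open-ended task: the question becomes part of the text input.
    return TextGenExample(audio_path, f"question: {question}", answer)


examples = [
    frame_captioning("clip_001.wav", "a dog barks while cars pass by"),
    frame_classification("clip_002.wav", "siren"),
    frame_question_answering("clip_003.wav", "how many people are speaking?", "two"),
]

for example in examples:
    print(f"{example.audio_path} | {example.input_text!r} -> {example.target_text!r}")
```

Because every task shares one example type, a single model with a single text decoder can in principle serve all of them, which is the property the description attributes to the unified architecture.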
Files
(4.1 GB)
MD5 checksum | Size |
---|---|
md5:af1b59432bbd383999d3cd4a3e2e0a5d | 2.4 GB |
md5:87550397e4cc3c1e34bde2fb765988e9 | 1.7 GB |
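Since the record lists an MD5 checksum for each file, a download can be checked for corruption before use. The sketch below uses only Python's standard `hashlib`. The file names in `EXPECTED_MD5` are placeholders (the record's actual file names are not shown here), so substitute the names of the files you downloaded.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing; the file names are hypothetical
# placeholders and must be replaced with the real downloaded file names.
EXPECTED_MD5 = {
    "first_download.bin": "af1b59432bbd383999d3cd4a3e2e0a5d",   # 2.4 GB file
    "second_download.bin": "87550397e4cc3c1e34bde2fb765988e9",  # 1.7 GB file
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 digest matches the expected hex string."""
    return md5_of_file(path) == expected_md5.lower()


if __name__ == "__main__":
    for name, expected in EXPECTED_MD5.items():
        if Path(name).exists():
            print(name, "OK" if verify(name, expected) else "MISMATCH")
        else:
            print(name, "not found, skipping")
```

Reading in chunks keeps memory use constant, which matters for multi-gigabyte files like these.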