Published May 19, 2023 | Version v2
Conference paper Open

Pengi 🐧: An Audio Language Model for Audio Tasks

  • 1. Microsoft
  • 2. Carnegie Mellon University

Description

Pengi is an Audio Language Model that leverages transfer learning by framing all audio tasks as text-generation tasks. It takes an audio recording and text as input and generates free-form text as output. This unified architecture lets Pengi handle both open-ended and close-ended tasks without any additional fine-tuning or task-specific extensions.

Code repository: microsoft/Pengi, "An Audio Language model for Audio Tasks" (github.com)

Files (4.1 GB)

Checksum                                Size
md5:af1b59432bbd383999d3cd4a3e2e0a5d    2.4 GB
md5:87550397e4cc3c1e34bde2fb765988e9    1.7 GB
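
The MD5 checksums listed for the two files can be used to confirm that a multi-gigabyte download completed without corruption. The following is a minimal sketch using only the Python standard library. The file names are not shown in this record, so the dictionary keys are hypothetical placeholders, and the helper names `md5_of_file` and `verify_download` are introduced here for illustration only.

```python
import hashlib

# Checksums copied from this record. The keys are hypothetical placeholders,
# because the actual file names are not shown on this page.
EXPECTED_MD5 = {
    "first_file": "af1b59432bbd383999d3cd4a3e2e0a5d",   # listed as 2.4 GB
    "second_file": "87550397e4cc3c1e34bde2fb765988e9",  # listed as 1.7 GB
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks
    so that large files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 digest matches the expected hex string
    (comparison is case-insensitive)."""
    return md5_of_file(path) == expected_md5.lower()
```

For example, `verify_download("path/to/downloaded_file", EXPECTED_MD5["first_file"])` would return `True` only if the local copy matches the 2.4 GB file listed in this record. The path is a placeholder to be replaced with the actual download location.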