Pengi 🐧: An Audio Language Model for Audio Tasks

Deshmukh, Soham; Elizalde, Benjamin; Singh, Rita; Wang, Huaming

doi:10.5281/zenodo.8387083

Published May 19, 2023 | Version v2

Conference paper | Open access

1. Microsoft
2. Carnegie Mellon University

Pengi is an Audio Language Model that leverages transfer learning by framing all audio tasks as text-generation tasks. It takes an audio recording and text as input and generates free-form text as output. This unified architecture supports both open-ended and close-ended tasks without any additional fine-tuning or task-specific extensions.

The code repository is microsoft/Pengi, "An Audio Language model for Audio Tasks" (github.com).

Files (4.1 GB)

Name	MD5	Size
base.pth	af1b59432bbd383999d3cd4a3e2e0a5d	2.4 GB
base_no_text_enc.pth	87550397e4cc3c1e34bde2fb765988e9	1.7 GB
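
Because the checkpoints are several gigabytes each, it can be worth checking a download against the MD5 digests listed for the files before loading it. The following is a minimal sketch using only the Python standard library. The function names `md5_of_file` and `verify_checkpoint` are illustrative and not part of the Pengi repository, and the script assumes the files were saved under their listed names.

```python
import hashlib
from pathlib import Path

# MD5 digests as listed for the two files in this record.
EXPECTED_MD5 = {
    "base.pth": "af1b59432bbd383999d3cd4a3e2e0a5d",
    "base_no_text_enc.pth": "87550397e4cc3c1e34bde2fb765988e9",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat for multi-gigabyte files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path):
    """Return True if the file's MD5 matches the digest listed for its name."""
    name = Path(path).name
    if name not in EXPECTED_MD5:
        raise KeyError(f"no listed digest for {name!r}")
    return md5_of_file(path) == EXPECTED_MD5[name]


if __name__ == "__main__":
    # Hypothetical usage: checks whichever checkpoints sit in the current directory.
    for name in EXPECTED_MD5:
        if Path(name).exists():
            print(name, "OK" if verify_checkpoint(name) else "MISMATCH")
        else:
            print(name, "not found in the current directory")
```

A mismatch usually indicates a truncated or corrupted download, in which case the file should be fetched again.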

	All versions	This version
Views	1,971	1,800
Downloads	1,962	1,903
Data volume	6.2 TB	6.1 TB

DOI: 10.5281/zenodo.8387083
Resource type: Conference paper
Publisher: Zenodo

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: September 28, 2023
Modified: September 29, 2023
