Mellow: small audio language model for reasoning

Deshmukh, Soham; Dixit, Satvik; Singh, Rita; Raj, Bhiksha

doi:10.5281/zenodo.15036628

Published March 16, 2025 | Version v2

Model Open

1. Carnegie Mellon University

Mellow is a small audio-language model that takes two audio clips and a text prompt as input and produces free-form text as output. It has 167M parameters, is trained on ~155 hours of audio (AudioCaps and Clotho), and achieves state-of-the-art (SoTA) performance on multiple tasks with 50x fewer parameters.

The code repository is soham97/mellow: https://github.com/soham97/mellow

Files

Files (780.7 MB)

Name	MD5	Size
reasonaqa.zip	258cb947dda6b558a27660198f39c986	110.5 MB
v0.ckpt	909c504703805a291a109fdea109a182	670.2 MB
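
The MD5 checksums in the file listing can be used to confirm that a download arrived intact. The following is a minimal sketch in Python, assuming the two files were saved under their listed names in a local directory. The helper names `md5_of` and `verify_downloads` are illustrative and are not part of the Mellow repository.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "reasonaqa.zip": "258cb947dda6b558a27660198f39c986",
    "v0.ckpt": "909c504703805a291a109fdea109a182",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True/False (checksum match), or None if absent."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name  # assumed local location of the download
        results[name] = (md5_of(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        status = "missing" if ok is None else ("OK" if ok else "MISMATCH")
        print(f"{name}: {status}")
```

Reading in chunks keeps memory use flat for the 670.2 MB checkpoint. A mismatch usually indicates a truncated or corrupted download, in which case the file should be fetched again.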

Additional details

Repository URL: https://github.com/soham97/mellow
Programming language: Python

	All versions	This version
Views	635	452
Downloads	409	339
Data volume	181.7 GB	132.1 GB

Resource type: Model
Publisher: Zenodo

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: March 16, 2025
Modified: March 16, 2025
