Published March 16, 2025 | Version v2
Resource type: Model | Access: Open

Mellow: small audio language model for reasoning

  • 1. Carnegie Mellon University

Description

Mellow is a small audio-language model that takes two audio clips and a text prompt as input and produces free-form text as output. It has 167M parameters, is trained on ~155 hours of audio (AudioCaps and Clotho), and achieves SoTA performance on several tasks while using 50x fewer parameters than the models it is compared against.
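As a rough reading of the 50x figure (an inference from the numbers above, not a value stated in this record), the models Mellow is compared against would be on the order of 8B parameters:

$$
167\,\text{M} \times 50 = 8{,}350\,\text{M} \approx 8.35\,\text{B parameters}
$$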

The code is available in the soham97/mellow GitHub repository.

Files (780.7 MB total)

reasonaqa.zip

Size       MD5 checksum
110.5 MB   258cb947dda6b558a27660198f39c986
670.2 MB   909c504703805a291a109fdea109a182
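Because the record lists an MD5 checksum for each file, a download can be checked locally before use. The sketch below is a minimal example using only Python's standard `hashlib`; the helper names `md5_of_file` and `is_listed_download` are hypothetical, and since the record page does not pair every checksum with a file name, it simply tests whether a file's digest matches any listed checksum.

```python
import hashlib

# Checksums copied from the file list above. The record does not show a
# name for every file, so they are keyed only by the listed size.
LISTED_MD5 = {
    "258cb947dda6b558a27660198f39c986",  # 110.5 MB file
    "909c504703805a291a109fdea109a182",  # 670.2 MB file
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks
    so large archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed_download(path: str) -> bool:
    """Return True if the file's MD5 matches one of the listed checksums."""
    return md5_of_file(path) in LISTED_MD5
```

For example, `is_listed_download("path/to/downloaded/file")` returns `True` only for an intact copy of one of the two listed files; a `False` result suggests a truncated or corrupted download.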

Additional details

Software
  • Repository URL: https://github.com/soham97/mellow
  • Programming language: Python