Published March 16, 2025 | Version v2
Model
Open
Mellow: small audio language model for reasoning
Description
Mellow is a small Audio-Language Model that takes two audio clips and a text prompt as input and produces free-form text as output. It has 167M parameters, is trained on ~155 hours of audio (AudioCaps and Clotho), and achieves SoTA performance on different tasks with 50x fewer parameters.
The code repository is soham97/mellow (https://github.com/soham97/mellow).
Files
reasonaqa.zip
Total size: 780.7 MB

| Checksum | Size |
|---|---|
| md5:258cb947dda6b558a27660198f39c986 | 110.5 MB |
| md5:909c504703805a291a109fdea109a182 | 670.2 MB |
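The MD5 checksums listed for the files can be used to check that a download completed intact. The following is a minimal sketch, not part of the record or the Mellow repository: the helper names `md5_of_file` and `matches_record` are hypothetical, and the path passed in is whatever local path the archive was saved to.

```python
import hashlib

# MD5 checksums copied from the file listing of this record.
RECORD_MD5 = {
    "258cb947dda6b558a27660198f39c986",  # 110.5 MB file
    "909c504703805a291a109fdea109a182",  # 670.2 MB file
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large archives never need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path):
    """True if the file's MD5 equals one of the checksums listed above."""
    return md5_of_file(path) in RECORD_MD5
```

Calling `matches_record` on a downloaded archive returns `False` if the file was truncated or corrupted in transit, in which case downloading it again is the usual remedy.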
Additional details
Software
- Repository URL: https://github.com/soham97/mellow
- Programming language: Python