Published September 25, 2023 | Version v1
Other (Open Access)

CLAP: Learning Audio Concepts From Natural Language Supervision (Pretrained Model)

Description

CLAP (Contrastive Language-Audio Pretraining) is a model that learns acoustic concepts from natural language supervision and enables zero-shot inference. The model has been extensively evaluated on 26 downstream audio tasks, achieving state-of-the-art (SoTA) results on several of them, including classification, retrieval, and captioning.
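As a minimal sketch of zero-shot classification (assuming the msclap Python package that accompanies the repository; the CLAP class and its get_text_embeddings, get_audio_embeddings, and compute_similarity methods follow the repository's documented usage and may differ between releases), an audio clip is scored against natural-language prompts and the highest-scoring prompt is taken as the prediction:

    from msclap import CLAP

    # Load the 2023 encoders; the package fetches the pretrained
    # weights when no local checkpoint path is supplied.
    clap_model = CLAP(version='2023', use_cuda=False)

    # Candidate classes phrased as natural-language prompts.
    class_labels = ['a dog barking', 'rain falling', 'a siren wailing']
    audio_files = ['example.wav']  # hypothetical input file

    # Embed both modalities and compare them; the highest-scoring
    # prompt is the zero-shot prediction for the clip.
    text_emb = clap_model.get_text_embeddings(class_labels)
    audio_emb = clap_model.get_audio_embeddings(audio_files)
    similarity = clap_model.compute_similarity(audio_emb, text_emb)
    print(class_labels[similarity.argmax(dim=1)[0].item()])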

This record contains weights for the Microsoft CLAP models published in 2022 and 2023. clapcap is the audio captioning model that uses the 2023 encoders.

Refer to the GitHub repository for the code.

microsoft/CLAP: Learning audio concepts from natural language supervision (https://github.com/microsoft/CLAP)
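
For audio captioning with clapcap, a similarly hedged sketch (the 'clapcap' version string and the generate_caption method are taken from the repository's examples; exact arguments may vary between releases):

    from msclap import CLAP

    # Load the clapcap captioning model built on the 2023 encoders.
    clapcap_model = CLAP(version='clapcap', use_cuda=False)

    # Generate a natural-language caption for each audio file.
    captions = clapcap_model.generate_caption(['example.wav'])  # hypothetical file
    print(captions[0])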

 

Files (4.7 GB)

MD5 checksum                              Size
md5:0731ffb09d8567ba5610be34aa577a62      2.3 GB
md5:1006a9206ccb48982dfb3b46581b8a27      690.0 MB
md5:521913b023dcc38853d3c27ad177a997      1.7 GB