CLAP: Learning Audio Concepts From Natural Language Supervision (Pretrained Model)
Description
CLAP (Contrastive Language-Audio Pretraining) is a model that learns acoustic concepts from natural language supervision and enables zero-shot inference. The model has been extensively evaluated on 26 audio downstream tasks, achieving state-of-the-art results on several of them, including classification, retrieval, and captioning.
This record provides weights for the Microsoft CLAP models published in 2022 and 2023. clapcap is the audio captioning model that uses the 2023 encoders.
Refer to the GitHub repository for the code.
microsoft/CLAP: Learning audio concepts from natural language supervision (github.com)
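The zero-shot inference mentioned above works by embedding an audio clip and a set of candidate label prompts into a shared space, then ranking labels by similarity. A minimal sketch of that scoring step, using hypothetical placeholder embeddings (in practice the model's audio and text encoders produce these vectors):

```python
import numpy as np

def zero_shot_scores(audio_emb, text_embs):
    """Rank candidate labels by cosine similarity to an audio embedding,
    then convert the similarities to probabilities with a softmax,
    mirroring the contrastive setup used in CLAP-style models."""
    a = audio_emb / np.linalg.norm(audio_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = t @ a                       # cosine similarity per label
    exp = np.exp(sims - sims.max())    # numerically stable softmax
    return exp / exp.sum()

# Hypothetical 4-dimensional embeddings, for illustration only.
audio = np.array([0.9, 0.1, 0.0, 0.2])
labels = np.array([
    [0.8, 0.2, 0.1, 0.1],   # e.g. "a dog barking"
    [0.0, 0.9, 0.3, 0.0],   # e.g. "rain falling"
])
probs = zero_shot_scores(audio, labels)
```

The key point is that no labeled audio is needed at inference time: any set of text prompts can serve as the candidate classes.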
Files (4.7 GB)
Name | Size | MD5 |
---|---|---|
| 2.3 GB | md5:0731ffb09d8567ba5610be34aa577a62 |
| 690.0 MB | md5:1006a9206ccb48982dfb3b46581b8a27 |
| 1.7 GB | md5:521913b023dcc38853d3c27ad177a997 |
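After downloading a checkpoint, it is worth verifying that the file matches its published MD5 checksum before loading it. This helper is a generic sketch (not part of the CLAP codebase) that streams the file so multi-gigabyte checkpoints are not read into memory at once:

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Compute a file's MD5 hex digest by streaming it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage: compare against the table above, e.g. (filename hypothetical)
# md5_of("checkpoint.pth") == "0731ffb09d8567ba5610be34aa577a62"
```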