Published November 11, 2022
Version v0
                  CLAP: Learning Audio Concepts From Natural Language Supervision (Pretrained Model)
Description
CLAP (Contrastive Language-Audio Pretraining) is a neural network model that learns acoustic concepts from natural language supervision. It achieved state-of-the-art results in zero-shot audio classification and in audio-to-text and text-to-audio retrieval, and surpassed prior results on several datasets when fine-tuned.
These are the weights for the Microsoft CLAP model published in 2022. Refer to the GitHub repository for the code:
microsoft/CLAP: Learning audio concepts from natural language supervision (github.com)
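In a contrastive audio-text model like CLAP, zero-shot classification reduces to ranking candidate label prompts by cosine similarity between the audio embedding and each prompt's text embedding. Below is a minimal sketch of that ranking step using placeholder embeddings; the actual CLAP encoders and loading API live in the GitHub repository above, and the toy vectors here are purely illustrative:

```python
import numpy as np

def zero_shot_classify(audio_emb, text_embs, labels):
    """Rank candidate labels by cosine similarity to an audio embedding."""
    a = audio_emb / np.linalg.norm(audio_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = t @ a  # cosine similarity of each prompt to the audio clip
    order = np.argsort(-sims)
    return [(labels[i], float(sims[i])) for i in order]

# Toy embeddings standing in for CLAP encoder outputs (hypothetical values):
# the "correct" prompt embedding is correlated with the audio embedding.
rng = np.random.default_rng(0)
audio_emb = rng.normal(size=64)
labels = ["a dog barking", "rain falling", "a siren"]
text_embs = np.stack([
    audio_emb + 0.1 * rng.normal(size=64),  # matching prompt
    rng.normal(size=64),                    # unrelated prompt
    rng.normal(size=64),                    # unrelated prompt
])
ranked = zero_shot_classify(audio_emb, text_embs, labels)
print(ranked[0][0])  # the matching prompt ranks first
```

With real CLAP embeddings the same ranking logic applies; only the source of `audio_emb` and `text_embs` changes.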
Files
| Name | Size |
|---|---|
| md5:0731ffb09d8567ba5610be34aa577a62 | 2.3 GB |