There is a newer version of this record available.

Dataset Open Access

HumBugDB: a large-scale acoustic mosquito dataset

Ivan Kiskin; Lawrence Wang; Marianne Sinka; Adam D. Cobb; Benjamin Gutteridge; Davide Zilli; Waqas Rafique; Rinita Dam; Theodoros Marinos; Yunpeng Li; Gerard Killeen; Dickson Msaky; Emmanuel Kaindoa; Kathy Willis; Steve J. Roberts

A large-scale multi-species dataset of acoustic recordings

Dataset accompanying code and paper: HumBugDB: a large-scale acoustic mosquito dataset.

This is a large-scale multi-species dataset containing recordings of mosquitoes collected from multiple locations globally and via different collection methods. In total, we present 71,286 seconds (about 20 hours) of labelled mosquito data with 53,227 seconds (about 15 hours) of corresponding background noise, recorded at the sites of 8 experiments. Of these, 64,843 seconds contain species metadata, covering 36 species (or species complexes).
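As a quick sanity check on the durations quoted above, the second counts can be converted to hours. This is a minimal sketch; the helper name to_hours is illustrative and is not part of the dataset's code.

```python
SECONDS_PER_HOUR = 3600


def to_hours(seconds):
    """Convert a duration in seconds to hours."""
    return seconds / SECONDS_PER_HOUR


# Second counts as stated in the dataset description.
durations = {
    "labelled mosquito data": 71_286,
    "background noise": 53_227,
    "data with species metadata": 64_843,
}

for name, seconds in durations.items():
    print(f"{name}: {to_hours(seconds):.1f} h")
# labelled mosquito data: 19.8 h
# background noise: 14.8 h
# data with species metadata: 18.0 h
```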

This repository contains:

This data is supplemented by a GitHub repository, which helps as follows:

  • The multi-part zip is intended to be extracted into the /data/audio/ folder of the repository.
  • The latest metadata is hosted on GitHub, so that it can be extended as additional metadata becomes available in the database and corrected when bugs are found.
  • Documentation for code use and a complete Datasheet for Datasets are also available on GitHub.
  • Example code for data splitting, feature extraction, model training, and evaluation is provided in the top-level notebook main.ipynb.
  • Bayesian Convolutional Neural Network models, in both Keras and PyTorch, trained on this data are available at GitHub release v1.0.
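The extraction step from the first bullet above can be sketched in Python. This is a hedged example, not the authors' procedure: it assumes each downloaded part is an independently readable .zip archive (a true split archive would need a different tool), and the function name extract_parts and both directory arguments are hypothetical placeholders.

```python
import zipfile
from pathlib import Path


def extract_parts(download_dir, target_dir):
    """Extract every .zip archive found in download_dir into target_dir.

    Assumption: each part is a standalone zip archive that zipfile can read.
    Returns the list of archive paths that were extracted, in sorted order.
    """
    download_dir = Path(download_dir)
    target_dir = Path(target_dir)
    # Create the target folder (e.g. data/audio) if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)

    extracted = []
    for archive in sorted(download_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target_dir)
        extracted.append(archive)
    return extracted
```

A call might look like extract_parts("downloads", "path/to/repository/data/audio"), where both paths are placeholders for wherever the parts were saved and wherever the GitHub repository was cloned.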


Funding from the 2014 Google Impact Challenge Award, and The Bill and Melinda Gates Foundation (
Files (4.1 GB)
Name Size
1.0 GB
1.1 GB
1.3 GB
592.8 MB
1.5 MB