Published February 12, 2026 | Version v1
Dataset Open

Bacterial Colony Images on Multiple Media (10-class dataset)

  • 1. Pakistan Supercomputing
  • 2. PakASIC
  • 3. Namal University
  • 4. UCERD

Description

# Bacterial Colony Images on Multiple Media (10-class dataset)

This repository provides **documentation, metadata, scripts, and a lightweight GitHub subset** of a bacterial colony image dataset organized by **species × culture medium** (10 classes).

✅ **Full dataset (recommended for research use):** hosted on **Zenodo** (~9.2 GB)  
✅ **GitHub subset:** a reduced version (~1.9 GB) containing images from **all 10 classes**, intended for quick access, preview, and reproducible folder structure (not the complete set).

---

## Zenodo (full archival dataset)
- Full-resolution dataset (**~9.2 GB**) intended for experiments and publications.
- DOI-based, citable, versioned archive.

---

## Quick facts (Full Zenodo release)

- **Total images (full):** 2317  
- **File formats:** JPG, PNG  
- **Classes:** 10 (4 species across multiple media)  
- **Full dataset size:** ~9.2 GB (Zenodo)  
- **GitHub subset size:** ~1.9 GB (this repo; not complete)  
- **Imaging protocol (paper):** fixed ≥16 MP camera, controlled lighting, ~30 cm standoff; consistent framing and preprocessing.

---

## Class taxonomy (species × medium) — Full dataset counts (Zenodo)

| Species | Medium (folder name) | Images (full) |
|---|---|---:|
| E_Coli | E_Coli on EMB agar medium | 203 |
| E_Coli | E_Coli on MacConkey_Agar medium | 203 |
| E_Coli | E_Coli on Nutrients agar medium | 216 |
| Salmonella | Salmonella on XLD agar medium | 211 |
| Salmonella | Salmonella on MacConkey agar medium | 215 |
| Salmonella | Salmonella on Nutrients agar medium | 294 |
| Enterococcus | Enterococcus on Slantz_and_Bartley agar medium | 208 |
| Enterococcus | Enterococcus on Nutrients agar medium | 285 |
| Staphylococcus | Staphylococcus on MSA agar medium | 175 |
| Staphylococcus | Staphylococcus on Nutrient Agar | 307 |
| **Total** |  | **2317** |

> Note: The **GitHub subset contains fewer images than the table above** (subset is ~1.9 GB).  
> The table reflects the **full Zenodo release**.
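
To check a local copy against the table above, the images can be counted per medium folder. The following is a minimal sketch, not a script shipped with the dataset. It assumes that each image sits directly inside a folder named exactly as in the "Medium (folder name)" column and that images use `.jpg`, `.jpeg`, or `.png` extensions. The function names and the root path in the usage comment are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Extensions treated as images (assumption: the release uses only JPG and PNG).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

# Full-release counts copied from the class taxonomy table.
EXPECTED_FULL_COUNTS = {
    "E_Coli on EMB agar medium": 203,
    "E_Coli on MacConkey_Agar medium": 203,
    "E_Coli on Nutrients agar medium": 216,
    "Salmonella on XLD agar medium": 211,
    "Salmonella on MacConkey agar medium": 215,
    "Salmonella on Nutrients agar medium": 294,
    "Enterococcus on Slantz_and_Bartley agar medium": 208,
    "Enterococcus on Nutrients agar medium": 285,
    "Staphylococcus on MSA agar medium": 175,
    "Staphylococcus on Nutrient Agar": 307,
}


def count_images_by_folder(root):
    """Count image files under `root`, keyed by the name of their parent folder."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            counts[path.parent.name] += 1
    return counts


def compare_to_expected(counts, expected=EXPECTED_FULL_COUNTS):
    """Return {folder: (found, expected)} for every folder whose count differs."""
    return {
        folder: (counts[folder], wanted)
        for folder, wanted in expected.items()
        if counts[folder] != wanted
    }


# Usage (hypothetical path to the extracted archives):
#   counts = count_images_by_folder("path/to/extracted/dataset")
#   for folder, (found, wanted) in compare_to_expected(counts).items():
#       print(f"{folder}: found {found}, expected {wanted}")
```

For the GitHub subset, mismatches are expected because it contains fewer images than the full release.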

---

 

Description_file/conversations.jsonl

This is a JSON Lines annotation file (one JSON object per line) that links each colony image to multiple natural-language question–answer pairs in an instruction/VQA format. Each record contains an id, an image filename (matching the corresponding file under Images/...), and a conversations array with a human question and an assistant answer (keys like {"from":"human","value":...} and {"from":"gpt","value":...}).

In the current subset it includes 16,501 Q/A items covering 592 images. The questions are designed to capture:

- colony morphology and diagnostic cues (e.g., colour, size range, margin, elevation, surface texture, opacity);
- growth/distribution patterns (crowding, isolation suitability, approximate count, purity/contamination hints);
- medium-specific indicators (e.g., lactose fermentation cues, EMB sheen, swarming/motility, hemolysis).

The file can be used to:

- fine-tune or benchmark vision-language models for colony interpretation;
- build an interactive teaching/QA assistant for microbiology lab training;
- generate standardized captions/attribute labels for weak supervision, retrieval, or dataset search;
- reproduce the exact prompts/answers used in downstream experiments by grouping records by image and pairing them with the corresponding plate image files in the GitHub subset or Zenodo archive.
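
Grouping the annotation records by image, as described for conversations.jsonl, might look like the sketch below. The keys `conversations`, `from`, and `value` follow the description; the top-level key names `id` and `image` are assumptions and should be checked against the actual file. The sample records are invented for illustration and are not taken from the dataset.

```python
import json
from collections import defaultdict


def group_qa_by_image(lines):
    """Group (question, answer) pairs by image filename.

    `lines` is any iterable of JSON Lines strings, for example an open file.
    """
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        turns = record["conversations"]
        # Pair each human turn with the assistant turn that directly follows it.
        for prev, nxt in zip(turns, turns[1:]):
            if prev.get("from") == "human" and nxt.get("from") == "gpt":
                grouped[record["image"]].append((prev["value"], nxt["value"]))
    return dict(grouped)


# Invented sample records (hypothetical filenames and answers).
sample_lines = [
    json.dumps({
        "id": "0",
        "image": "plate_001.jpg",
        "conversations": [
            {"from": "human", "value": "What colour are the colonies?"},
            {"from": "gpt", "value": "Pink."},
        ],
    }),
    json.dumps({
        "id": "1",
        "image": "plate_001.jpg",
        "conversations": [
            {"from": "human", "value": "Are the colonies well isolated?"},
            {"from": "gpt", "value": "Yes."},
        ],
    }),
]

qa_by_image = group_qa_by_image(sample_lines)
for image_name, pairs in qa_by_image.items():
    print(image_name, len(pairs))
```

For the real file, the same function can be called on an open file handle, for example `with open("Description_file/conversations.jsonl", encoding="utf-8") as f: qa_by_image = group_qa_by_image(f)`.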

 

Files

1.E_Coli.zip

Files (10.7 GB)

| MD5 checksum | Size |
|---|---:|
| 6da0b29b8fadecf00d652a4e59502752 | 3.9 GB |
| 72cfd3a906f7ee295ddeef4ef784303b | 1.1 GB |
| 359d358c4d87fdac3df33d0630907cd4 | 2.4 GB |
| f57d1340a5f8b5f39cf98993d5947ad5 | 340.2 MB |
| 7bb341e0f9b9818430af5130e6c9de62 | 3.0 GB |
| 518acfabada51b1deaecf0e6254c78c5 | 222 Bytes |
| e158bea1b95c2d2224441edb263d2ddd | 780 Bytes |
| 3896ab71b2882c923c7154f7904bef89 | 2.3 kB |
| 7a508ff01d61ce64631718f86d51ab8e | 219 Bytes |
| 756de7979449512bd0aad091cf766365 | 3.1 kB |
| 18f87854aa0e382d9b00d1fbfda3875b | 738 Bytes |
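
The MD5 checksums listed for each file can be used to confirm that a download is intact. The sketch below is one way to do this with Python's standard `hashlib` module. The helper names are hypothetical, and the pairing of a given filename with its checksum is not shown in this listing, so it has to be taken from the Zenodo record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Return True if the file's MD5 equals `expected`.

    `expected` may be given with or without the "md5:" prefix used in the listing.
    """
    expected_hex = expected.strip().lower().split(":")[-1]
    return md5_of_file(path) == expected_hex


# Usage (placeholder path and checksum; substitute the values for the file you downloaded):
#   ok = matches_checksum("path/to/downloaded_file.zip", "md5:<checksum from the record page>")
#   print("intact" if ok else "checksum mismatch, download again")
```

Reading in chunks keeps memory use low, which matters for the multi-gigabyte archives in this record.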