Bacterial Colony Images on Multiple Media (10-class dataset)
Authors/Creators
- 1. Pakistan Supercomputing
- 2. PakASIC
- 3. Namal University
- 4. UCERD
Description
# Bacterial Colony Images on Multiple Media (10-class dataset)
This repository provides **documentation, metadata, scripts, and a lightweight GitHub subset** of a bacterial colony image dataset organized by **species × culture medium** (10 classes).
✅ **Full dataset (recommended for research use):** hosted on **Zenodo** (~9.2 GB)
✅ **GitHub subset:** a reduced version (~1.9 GB) containing images from **all 10 classes**, intended for quick access, preview, and reproducible folder structure (not the complete set).
---
### Zenodo (full archival dataset)
- Full-resolution dataset (**~9.2 GB**) intended for experiments and publications.
- DOI-based, citable, versioned archive.
---
## Quick facts (Full Zenodo release)
- **Total images (full):** 2317
- **File formats:** JPG, PNG
- **Classes:** 10 (4 species across multiple media)
- **Full dataset size:** ~9.2 GB (Zenodo)
- **GitHub subset size:** ~1.9 GB (this repo; not complete)
- **Imaging protocol (paper):** fixed ≥16 MP camera, controlled lighting, ~30 cm standoff; consistent framing and preprocessing.
---
## Class taxonomy (species × medium) — Full dataset counts (Zenodo)
| Species | Medium (folder name) | Images (full) |
|---|---|---:|
| E_Coli | E_Coli on EMB agar medium | 203 |
| E_Coli | E_Coli on MacConkey_Agar medium | 203 |
| E_Coli | E_Coli on Nutrients agar medium | 216 |
| Salmonella | Salmonella on XLD agar medium | 211 |
| Salmonella | Salmonella on MacConkey agar medium | 215 |
| Salmonella | Salmonella on Nutrients agar medium | 294 |
| Enterococcus | Enterococcus on Slantz_and_Bartley agar medium | 208 |
| Enterococcus | Enterococcus on Nutrients agar medium | 285 |
| Staphylococcus | Staphylococcus on MSA agar medium | 175 |
| Staphylococcus | Staphylococcus on Nutrient Agar | 307 |
| **Total** | | **2317** |
> Note: The **GitHub subset contains fewer images than the table above** (subset is ~1.9 GB).
> The table reflects the **full Zenodo release**.
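
After downloading and extracting the archive, the per-class counts in the table can be checked against what is on disk. The sketch below is a minimal example and not part of the dataset's own scripts. It assumes each image sits directly inside a folder named as in the "Medium (folder name)" column. The function names, the `.jpeg` suffix, and the root path in the usage comment are all hypothetical.

```python
from collections import Counter
from pathlib import Path

# JPG and PNG are the documented formats; ".jpeg" is included as an assumption.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

# Full-release counts copied from the table above (folder name -> images).
EXPECTED_FULL_COUNTS = {
    "E_Coli on EMB agar medium": 203,
    "E_Coli on MacConkey_Agar medium": 203,
    "E_Coli on Nutrients agar medium": 216,
    "Salmonella on XLD agar medium": 211,
    "Salmonella on MacConkey agar medium": 215,
    "Salmonella on Nutrients agar medium": 294,
    "Enterococcus on Slantz_and_Bartley agar medium": 208,
    "Enterococcus on Nutrients agar medium": 285,
    "Staphylococcus on MSA agar medium": 175,
    "Staphylococcus on Nutrient Agar": 307,
}


def count_images_per_class(root):
    """Count image files, keyed by the name of the folder that directly holds them."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            counts[path.parent.name] += 1
    return dict(counts)


def compare_with_expected(counts, expected=EXPECTED_FULL_COUNTS):
    """Return {folder: (found, expected)} for every folder whose count differs."""
    folders = set(counts) | set(expected)
    return {
        name: (counts.get(name, 0), expected.get(name, 0))
        for name in sorted(folders)
        if counts.get(name, 0) != expected.get(name, 0)
    }


# Example usage (hypothetical extraction directory):
#   counts = count_images_per_class("bacterial_colony_dataset")
#   print(compare_with_expected(counts))  # empty dict if every class matches
```

Run against the GitHub subset, the comparison is expected to report differences, since the subset contains fewer images than the full release.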
---
`Description_file/conversations.jsonl` is a JSON Lines annotation file (one JSON object per line) that links each colony image to multiple natural-language question–answer pairs in an instruction/VQA format.

Each record contains:
- an id;
- an image filename (matching the corresponding file under Images/...);
- a conversations array with a human question and an assistant answer (keys like {"from":"human","value":...} and {"from":"gpt","value":...}).

In the current subset it includes **16,501 Q/A items covering 592 images**. The questions are designed to capture:
- **Colony morphology and diagnostic cues:** e.g., colour, size range, margin, elevation, surface texture, opacity.
- **Growth/distribution patterns:** crowding, isolation suitability, approximate count, purity/contamination hints.
- **Medium-specific indicators:** e.g., lactose fermentation cues, EMB sheen, swarming/motility, hemolysis.

The file can be used to:
1. fine-tune or benchmark vision-language models for colony interpretation;
2. build an interactive teaching/QA assistant for microbiology lab training;
3. generate standardized captions/attribute labels for weak supervision, retrieval, or dataset search;
4. reproduce the exact prompts/answers used in downstream experiments by grouping records by image and pairing them with the corresponding plate image files in the GitHub subset or Zenodo archive.
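
The record layout described above can be read with a few lines of Python. This is a minimal sketch, not the authors' loader. It assumes the top-level keys are literally named `id`, `image`, and `conversations`, and that each human turn is followed by its `gpt` answer. The function name `load_qa_by_image` is hypothetical.

```python
import json
from collections import defaultdict


def load_qa_by_image(jsonl_path):
    """Group (question, answer) pairs by image filename.

    Assumes one JSON object per line with "image" and "conversations" keys,
    where "conversations" alternates {"from": "human"} and {"from": "gpt"} turns.
    """
    qa_by_image = defaultdict(list)
    with open(jsonl_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            question = None
            for turn in record.get("conversations", []):
                if turn.get("from") == "human":
                    question = turn.get("value")
                elif turn.get("from") == "gpt" and question is not None:
                    qa_by_image[record.get("image")].append(
                        (question, turn.get("value"))
                    )
                    question = None
    return dict(qa_by_image)


# Example usage:
#   qa = load_qa_by_image("Description_file/conversations.jsonl")
#   print(len(qa), "images,", sum(len(v) for v in qa.values()), "Q/A pairs")
```

Grouping by image keeps all questions about one plate together, which is the form needed when pairing the annotations with the image files.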
Files
1.E_Coli.zip
Total size: 10.7 GB

| MD5 checksum | Size |
|---|---:|
| 6da0b29b8fadecf00d652a4e59502752 | 3.9 GB |
| 72cfd3a906f7ee295ddeef4ef784303b | 1.1 GB |
| 359d358c4d87fdac3df33d0630907cd4 | 2.4 GB |
| f57d1340a5f8b5f39cf98993d5947ad5 | 340.2 MB |
| 7bb341e0f9b9818430af5130e6c9de62 | 3.0 GB |
| 518acfabada51b1deaecf0e6254c78c5 | 222 Bytes |
| e158bea1b95c2d2224441edb263d2ddd | 780 Bytes |
| 3896ab71b2882c923c7154f7904bef89 | 2.3 kB |
| 7a508ff01d61ce64631718f86d51ab8e | 219 Bytes |
| 756de7979449512bd0aad091cf766365 | 3.1 kB |
| 18f87854aa0e382d9b00d1fbfda3875b | 738 Bytes |
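
The MD5 checksums listed above can be used to confirm that a multi-gigabyte download arrived intact. The following is a minimal sketch using Python's standard `hashlib`. The function names are hypothetical, and the usage comment leaves the expected checksum as a placeholder because the listing above does not say which checksum belongs to which file.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Return True if the file's MD5 matches `expected` (with or without an 'md5:' prefix)."""
    expected_hex = expected.split(":")[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Example usage (substitute the checksum shown for the file you downloaded):
#   ok = verify_md5("1.E_Coli.zip", "md5:<checksum from the record page>")
#   print("checksum OK" if ok else "checksum mismatch, re-download the file")
```

MD5 is used here only because it is the checksum the record publishes. It detects transfer corruption but is not a defence against deliberate tampering.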