Published February 17, 2026 | Version v1
Dataset | Open access

Full Blood Count Dataset from Sri Lankan Healthcare Institutions (222,042 records)

  • 1. Postgraduate Institute of Medicine, University of Colombo, Sri Lanka
  • 2. Ministry of Health, Sri Lanka

Description

Full Blood Count Dataset from Sri Lankan Healthcare Institutions

This dataset contains 222,042 anonymized Full Blood Count (FBC) laboratory result records from four healthcare institutions in Sri Lanka, spanning April 2014 to August 2023.

Background

Data were extracted from institutional database instances of an open-source Hospital Management Information System (HMIS; github.com/hmislk/hmis) that has been in continuous production use across more than 40 healthcare institutions in Sri Lanka since 2004. Patient data were captured routinely during clinical care under institutional privacy notices displayed prominently in the participating healthcare facilities. Patients retained the right to opt out of data storage and usage at any time. This study was approved by the Ethics Review Committee of the Postgraduate Institute of Medicine, University of Colombo, Sri Lanka (ERC/PGIM/2023/135).

Parameters

The dataset includes 13 standardized haematological parameters:

  • Red cell parameters (all 4 institutions): Haemoglobin (HB), Red Blood Cell Count (RBC), Packed Cell Volume (PCV), Mean Corpuscular Volume (MCV), Mean Corpuscular Haemoglobin (MCH), Mean Corpuscular Haemoglobin Concentration (MCHC)

  • White cell and platelet parameters (1 institution): White Blood Cell Count (WBC), Platelet Count (PLT), Neutrophils (NEU), Lymphocytes (LYM), Monocytes (MON), Eosinophils (EOS), Basophils (BAS)
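When filtering or grouping rows, it can help to keep the two parameter groups above in a lookup table. The sketch below assumes that the `parameter` column of the CSV uses exactly the abbreviations listed here (HB, RBC, and so on); the dictionary names and the `parameter_group` helper are hypothetical, not part of the dataset.

```python
# Hypothetical lookup tables; assumes the `parameter` column uses these codes.
RED_CELL = {  # available for all 4 institutions
    "HB": "Haemoglobin",
    "RBC": "Red Blood Cell Count",
    "PCV": "Packed Cell Volume",
    "MCV": "Mean Corpuscular Volume",
    "MCH": "Mean Corpuscular Haemoglobin",
    "MCHC": "Mean Corpuscular Haemoglobin Concentration",
}
WHITE_CELL_AND_PLATELET = {  # available for 1 institution only
    "WBC": "White Blood Cell Count",
    "PLT": "Platelet Count",
    "NEU": "Neutrophils",
    "LYM": "Lymphocytes",
    "MON": "Monocytes",
    "EOS": "Eosinophils",
    "BAS": "Basophils",
}


def parameter_group(code: str) -> str:
    """Return 'red_cell', 'white_cell_platelet', or 'unknown' for a parameter code."""
    key = code.strip().upper()
    if key in RED_CELL:
        return "red_cell"
    if key in WHITE_CELL_AND_PLATELET:
        return "white_cell_platelet"
    return "unknown"
```

Any code that maps to `unknown` is worth inspecting before analysis, since it would fall outside the 13 documented parameters.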

Demographics

Each record includes patient sex (57.0% female, 42.9% male), age at test (available for 99.8% of records), and year of birth. The age distribution covers paediatric through geriatric populations (0–17 years: 21.0%, 18–39: 34.4%, 40–59: 24.0%, 60–79: 17.9%, 80+: 2.5%).
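The age bands above can be reproduced from `age_at_test`. This minimal sketch assumes `age_at_test` is a numeric age in years stored as text (possibly empty); the band boundaries are treated as half-open intervals, so 17.9 falls in 0–17. Percentages are taken over all records, including those without a usable age, which is consistent with the listed shares summing to about 99.8%. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Optional

# Half-open bands [low, high) matching the description; 80 and above is "80+".
AGE_BANDS = [(0, 18, "0-17"), (18, 40, "18-39"), (40, 60, "40-59"), (60, 80, "60-79")]


def age_band(age_at_test: Optional[str]) -> Optional[str]:
    """Return the band label, or None for missing, unparseable, or negative ages."""
    try:
        age = float(age_at_test)
    except (TypeError, ValueError):
        return None
    if math.isnan(age) or age < 0:
        return None  # negative ages are source-data errors (see Important Notes)
    for low, high, label in AGE_BANDS:
        if low <= age < high:
            return label
    return "80+"


def band_shares(ages: Iterable[Optional[str]]) -> Dict[str, float]:
    """Percentage of ALL records in each band; unbanded records still count in the total."""
    ages = list(ages)
    counts = Counter(age_band(a) for a in ages)
    total = len(ages)
    return {band: round(100 * n / total, 1) for band, n in counts.items() if band is not None}
```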

Anonymization

All direct patient identifiers (names, identity numbers, addresses, contact details) were excluded. Date of birth was reduced to year only. Institution names were replaced with anonymous labels (Institution_A through Institution_D). Internal report identifiers were removed.
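Although date of birth was reduced to year only, `birth_year` and the year of `bill_date` still constrain the age: someone born in year B and tested in year Y has a completed age of either Y − B − 1 or Y − B. The sketch below uses that as a consistency check. It assumes that `bill_date` begins with a four-digit year (for example ISO `YYYY-MM-DD`) and that `age_at_test` is in years; both are assumptions about the file format, and `age_consistent` is a hypothetical helper.

```python
import math


def age_consistent(age_at_test: str, birth_year: str, bill_date: str) -> bool:
    """True if floor(age) equals bill_year - birth_year or one less.

    Assumes bill_date starts with a four-digit year and age_at_test is in years.
    Missing, unparseable, non-finite, or negative ages are reported as inconsistent.
    """
    try:
        age = float(age_at_test)
        born = int(birth_year)
        billed = int(bill_date.strip()[:4])
    except (TypeError, ValueError, AttributeError):
        return False
    if not math.isfinite(age) or age < 0:
        return False
    span = billed - born
    return math.floor(age) in (span - 1, span)
```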

File

  • fbc_dataset_public.csv — 222,042 rows, 7 columns (institution, sex, age_at_test, birth_year, bill_date, parameter, value)
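The column list suggests a long format, with one parameter result per row rather than one row per FBC. The following stdlib-only sketch reads the file, checks that the seven documented columns are present, and tallies rows by any column. The function names are hypothetical, and the `utf-8-sig` encoding in the usage comment is an assumption (it reads plain UTF-8 and also strips a byte-order mark if one is present).

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List

# Column names as documented for fbc_dataset_public.csv.
EXPECTED_COLUMNS = ("institution", "sex", "age_at_test", "birth_year",
                    "bill_date", "parameter", "value")


def read_fbc(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse the CSV (an open file or any iterable of lines) into one dict per row."""
    reader = csv.DictReader(lines)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def count_by(rows: Iterable[Dict[str, str]], column: str) -> Counter:
    """Count rows per distinct value of `column` (e.g. 'parameter' or 'institution')."""
    return Counter(row[column] for row in rows)


# Example usage (encoding is an assumption):
# with open("fbc_dataset_public.csv", newline="", encoding="utf-8-sig") as f:
#     rows = read_fbc(f)
# print(len(rows), count_by(rows, "parameter").most_common())
```

Because report identifiers were removed, rows cannot be reliably regrouped into individual FBC reports, so analyses are most naturally done per parameter.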

Important Notes

  1. Platelet values from Institution_A are recorded as raw counts per microlitre (e.g., 258,000). Divide by 1,000 to convert to the conventional ×10⁹/L unit (258,000/µL = 258 ×10⁹/L).

  2. A small number of physiologically implausible values exist (<1% for any parameter). Users should apply plausibility range filters appropriate to each parameter and its unit before analysis.

  3. 25 records have negative calculated ages due to erroneous source data entries.
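The three notes above can be applied row by row. In this sketch the platelet code `PLT` and the label `Institution_A` come from this record, while the plausibility limits are deliberately left to the caller: the `ranges` argument maps a parameter code to `(low, high)` limits that the analyst chooses, and no limits are proposed here. Limits are checked after the platelet unit conversion. Both function names are hypothetical.

```python
import math
from typing import Mapping, Optional, Tuple


def clean_value(row: Mapping[str, str],
                ranges: Mapping[str, Tuple[float, float]]) -> Optional[float]:
    """Return the row's value in conventional units, or None if it should be excluded.

    `ranges` maps a parameter code to analyst-supplied (low, high) plausibility
    limits; parameters without an entry are not range-filtered.
    """
    try:
        value = float(row["value"])
    except (KeyError, TypeError, ValueError):
        return None
    if not math.isfinite(value):
        return None
    parameter = (row.get("parameter") or "").strip().upper()
    # Note 1: Institution_A platelets are raw counts (e.g. 258,000); /1000 gives x10^9/L.
    if parameter == "PLT" and row.get("institution") == "Institution_A":
        value /= 1000.0
    # Note 2: apply the plausibility range after unit conversion.
    bounds = ranges.get(parameter)
    if bounds is not None and not bounds[0] <= value <= bounds[1]:
        return None
    return value


def has_negative_age(row: Mapping[str, str]) -> bool:
    """Note 3: flag rows whose age_at_test is negative (erroneous source entries)."""
    try:
        return float(row["age_at_test"]) < 0
    except (KeyError, TypeError, ValueError):
        return False
```

Negative-age rows are flagged rather than dropped, since their laboratory values may still be usable for analyses that do not depend on age.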

Potential Uses

  • Establishing population-specific FBC reference intervals for Sri Lanka

  • Age- and sex-stratified analysis of haematological parameters

  • Temporal trend analysis across nearly a decade of clinical data

  • Inter-institutional variation studies

  • Benchmarking and methodology development for HMIS-derived research datasets

Keywords

full blood count, haematology, Sri Lanka, reference intervals, open data, hospital information system, CBC, FBC, South Asia, electronic health records

Files

DATASET_README.md

Files (9.8 MB)

  • 4.1 kB, md5:c24371b97f85de1159378c769c18b57f

  • 9.8 MB, md5:583a5d383fde20696940b11b38efd760

Additional details

Software

Repository URL: https://github.com/hmislk/hmis/

Programming language: Java, C#, Python

Development status: Active