NHIF Bulgaria. Expenditures and Patient Counts for Home Treatment Medicines and Medical Products 2021-2024 (by Region, NHIF Code, and ICD Code)
Description
📊 Dataset Description
Title: NHIF Bulgaria: Outpatient Pharmacy Reimbursement Expenditures and Patient Counts for Home Treatment (🇧🇬 Лекарствени продукти за домашно лечение)
Summary: This dataset contains detailed records of pharmaceutical expenditures and patient counts for home treatment in Bulgaria, as reported by the National Health Insurance Fund (NHIF). The data covers all reimbursed medicinal products, medical devices, and dietary foods for special medical purposes dispensed through community pharmacies.
The data are aggregated by Regional Health Insurance Office (RZOK) and grouped by NHIF reimbursement code and ICD-10 diagnosis code.
Key Characteristics:
- Aggregated by Regional Health Insurance Office (RZOK) across all 28 regions.
- Linked to NHIF reimbursement codes, ATC classification codes, and ICD-10 diagnosis codes.
- Covers the full spectrum of NHIF-reimbursed outpatient pharmaceuticals for home treatment.
- Sub-monthly reporting granularity preserved via the part variable.
- Dual-currency columns (BGN and EUR) derived using the official fixed exchange rate of 1 EUR = 1.95583 BGN, with a currency flag indicating the original denomination (BGN for pre-2026 records, EUR from 2026 onward).
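The dual-currency derivation can be illustrated with a minimal sketch. It assumes that costs holds the amount in its original denomination and that currency is either BGN or EUR. The function name `to_dual_currency` and the rounding to two decimals are illustrative assumptions, not the authors' implementation.

```python
from decimal import Decimal, ROUND_HALF_UP

# Official fixed conversion rate stated in the dataset description.
BGN_PER_EUR = Decimal("1.95583")
CENT = Decimal("0.01")

def to_dual_currency(costs, currency):
    """Return (costs_bgn, costs_eur) for an amount in its original currency.

    Rounding to two decimals is an assumption made for this sketch.
    """
    amount = Decimal(str(costs))
    if currency == "BGN":
        bgn, eur = amount, amount / BGN_PER_EUR
    elif currency == "EUR":
        bgn, eur = amount * BGN_PER_EUR, amount
    else:
        raise ValueError(f"unexpected currency flag: {currency!r}")
    return (bgn.quantize(CENT, rounding=ROUND_HALF_UP),
            eur.quantize(CENT, rounding=ROUND_HALF_UP))
```

Using `Decimal` rather than floats keeps the conversion exact up to the final rounding step, which matters when many small amounts are summed.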
Temporal Granularity: Each calendar month is reported in three sub-periods, preserved in the part variable:
- Part 01: days 1–10 of the month
- Part 02: days 11–20 of the month
- Part 03: days 21–end of the month
The period variable always represents the first day of the calendar month regardless of sub-period, enabling straightforward monthly aggregation by grouping on period while retaining the option for intra-month dispensing pattern analysis via part. This structure is useful for identifying trends, delays, or spikes in utilisation within reporting cycles.
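The monthly aggregation described above can be sketched as follows. This is a standard-library illustration on a few made-up rows that uses the documented period, part, and costs_bgn columns. The function name `monthly_costs` is hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

def monthly_costs(rows):
    """Sum costs_bgn over all sub-periods (parts) of each calendar month."""
    totals = defaultdict(Decimal)
    for row in rows:
        # period is the first day of the month for every part, so it
        # serves directly as the monthly grouping key.
        totals[row["period"]] += Decimal(row["costs_bgn"])
    return dict(totals)

# Made-up example rows (not taken from the dataset).
rows = [
    {"period": "2021-03-01", "part": "01", "costs_bgn": "120.50"},
    {"period": "2021-03-01", "part": "02", "costs_bgn": "80.25"},
    {"period": "2021-03-01", "part": "03", "costs_bgn": "99.25"},
    {"period": "2021-04-01", "part": "01", "costs_bgn": "10.00"},
]
print(monthly_costs(rows))
```

Rows read from the real CSV with `csv.DictReader` have the same dictionary shape. Costs are additive across parts. Sums of patients_num should be treated with more care, since the same person may appear in more than one part.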
Data Cleaning and Preprocessing: The raw XLS files (198 source files, three per calendar month, named costs_part_NN_mmm_YYYY.xls) were processed in R (version 4.5.1; tidyverse, readxl, janitor, lubridate) using a standardised pipeline that:
- Harmonised Bulgarian column names to English and R-compatible identifiers via a defined column mapping dictionary
- Padded region codes to a uniform 2-digit zero-padded format
- Standardised region names using a canonical 28-region mapping
- Parsed temporal identifiers (month, year) and sub-period part numbers from filenames
- Added dual-currency columns (costs_bgn, costs_eur) and a currency flag based on the reporting period
- Tracked all file-level issues (read failures, missing columns, unmatched headers) and reported them in a structured summary
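Two of the pipeline steps above, parsing the filename and padding region codes, can be sketched in Python. The original pipeline was written in R, so this is only an illustration. It assumes that the mmm token is an English three-letter month abbreviation, which the description does not confirm. The function names and the example filename are hypothetical.

```python
import re
from datetime import date

# Assumption: "mmm" is an English three-letter month abbreviation.
MONTHS = {m: i for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

FILENAME_RE = re.compile(
    r"^costs_part_(\d{2})_([a-z]{3})_(\d{4})\.xls$", re.IGNORECASE)

def parse_filename(name):
    """Extract (period, part) from a name like costs_part_02_mar_2021.xls."""
    match = FILENAME_RE.match(name)
    if match is None or match.group(2).lower() not in MONTHS:
        raise ValueError(f"unrecognised filename: {name!r}")
    part, mon, year = match.groups()
    # period is always the first day of the calendar month.
    return date(int(year), MONTHS[mon.lower()], 1), part

def pad_region_code(code):
    """Zero-pad a region code to two digits, e.g. 1 -> '01'."""
    return str(code).strip().zfill(2)
```

Raising on an unrecognised name, rather than skipping the file silently, mirrors the pipeline's stated aim of tracking file-level issues.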
All character variables have been preprocessed with the following transformations:
- Lowercasing all strings (except ICD-10 disease names, which preserve original case)
- Trimming and squishing whitespace
- Removing quotes (" and ')
- Standardizing " - " to "-"
- Removing trailing punctuation
All 198 source files were processed successfully with no file-level issues. No records were imputed, modified, or excluded beyond the transformations described above.
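The character-variable transformations listed above can be approximated with the sketch below. The order of the steps and the exact set of trailing punctuation characters are assumptions, and `clean_text` is a hypothetical name. The `lowercase` flag covers the stated exception for ICD-10 disease names.

```python
import re

def clean_text(value, lowercase=True):
    """Apply the listed character-variable transformations to one string.

    The order of steps and the set of trailing punctuation characters
    are assumptions of this sketch.
    """
    text = value.lower() if lowercase else value
    text = text.replace('"', "").replace("'", "")   # remove quotes
    text = re.sub(r"\s+", " ", text).strip()        # trim and squish whitespace
    text = text.replace(" - ", "-")                 # standardise spaced hyphens
    text = re.sub(r"[.,;:!?\s]+$", "", text)        # drop trailing punctuation
    return text
```

For ICD-10 names, call `clean_text(name, lowercase=False)` so the original case is preserved while the other steps still apply.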
Data quality validation identified:
- 2,474 extreme cost outliers exceeding 10 times the 99th percentile of the reimbursement distribution
- 95 records with negative costs (refunds/corrections), totalling 40,307.85 BGN
- No missing cost or num_in_pack data (0.0%)
- No duplicate records, no invalid region codes, no missing or invalid part values
- All 66 months have complete 3-part sub-period coverage
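The outlier and negative-cost checks can be illustrated as follows. The description does not say how the 99th percentile was computed, so the interpolation method used here, and the choice to compute it over all records including negative ones, are assumptions. The data are synthetic and `validate_costs` is a hypothetical name.

```python
from statistics import quantiles

def validate_costs(costs, factor=10):
    """Flag extreme outliers and negative amounts in a list of costs."""
    # 99th percentile; the interpolation method is an assumption.
    p99 = quantiles(costs, n=100, method="inclusive")[98]
    threshold = factor * p99
    outliers = [c for c in costs if c > threshold]
    negatives = [c for c in costs if c < 0]
    return {"threshold": threshold, "outliers": outliers,
            "negatives": negatives, "negative_total": sum(negatives)}

# Synthetic example data (not taken from the dataset).
costs = [float(v) for v in range(1, 101)] + [5000.0, -5.0]
report = validate_costs(costs)
print(report["outliers"], report["negatives"])
```

The flagged records are only reported, not removed, which matches the statement that no records were excluded from the dataset.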
Structure:
- 📦 Rows: 7,266,074
- 📁 Columns: 19
- 📆 Temporal coverage: July 2020 – December 2025 (66 months, 198 source files in 3 sub-periods each)
- 🌍 Geographical scope: All 28 NHIF regions in Bulgaria
- 💊 Distinct NHIF medication codes: 3,367
- 🧪 Distinct ATC codes: 526
- 💰 Total reimbursement: 6,686,965,176 BGN (3,418,991,004 EUR)
Sub-period (part) distribution:
- Part 01 (days 1–10): 2,564,109 records (35.3%)
- Part 02 (days 11–20): 2,530,737 records (34.8%)
- Part 03 (days 21–end): 2,171,228 records (29.9%)
Records by year:
- 2020: 651,667 (9.0%)
- 2021: 1,287,060 (17.7%)
- 2022: 1,299,797 (17.9%)
- 2023: 1,313,999 (18.1%)
- 2024: 1,351,516 (18.6%)
- 2025: 1,362,035 (18.7%)
Key Variables:
| Variable | Description |
|---|---|
| region_num | NHIF regional code (2-digit, zero-padded, e.g. 01) |
| region_name | Name of the NHIF regional office (lowercase Cyrillic) |
| atc_code | Anatomical Therapeutic Chemical (ATC) classification code |
| atc_name | International nonproprietary name (INN) of the active substance |
| nhif_code | NHIF-specific reimbursement product code |
| market_name | Marketed product name (brand name) |
| packaging | Dosage form and packaging format |
| concentration | Strength or concentration per unit |
| num_in_pack | Number of units per package |
| icd_code | ICD-10 code of the diagnosed disease |
| icd_name | Diagnosis name (in Bulgarian, original case preserved) |
| patients_num | Number of insured persons (ЗОЛ) reimbursed for the product during the period |
| pack_num | Number of reimbursed packages |
| costs | Reimbursement amount in original currency (BGN pre-2026; EUR from 2026) |
| period | First day of the reporting month (YYYY-MM-DD); identical across all three sub-periods of the same month |
| part | Sub-period indicator: 01 = days 1–10, 02 = days 11–20, 03 = days 21–end |
| currency | Original currency denomination (BGN or EUR) |
| costs_bgn | Reimbursement amount standardised to BGN (1 EUR = 1.95583 BGN) |
| costs_eur | Reimbursement amount standardised to EUR (1 EUR = 1.95583 BGN) |
Use Cases: This dataset is suitable for:
- Time series analysis of outpatient pharmaceutical expenditure
- Pharmacoepidemiology and drug utilisation research
- Regional inequality studies in access to reimbursed medicines
- Health economics research and budget impact analyses
- Pharmaceutical policy evaluation at national and regional levels
- Intra-month dispensing pattern analysis (via the three-part reporting structure)
Note: The unit of observation is an administrative reimbursement record aggregated at the region–product–diagnosis–period–part level. The patients_num field counts insured persons reimbursed for the given product within each stratum and does not represent unique patient identifiers across strata, parts, or periods. To obtain monthly totals, group by period and aggregate across parts. Product names and diagnoses are in Bulgarian; ATC codes follow the WHO international classification.
Source: National Health Insurance Fund (NHIF), Bulgaria — https://www.nhif.bg/
License: Unless otherwise restricted by NHIF, this dataset is shared under Creative Commons Attribution 4.0 International (CC BY 4.0)
Files included:
- nhif_outpatient_pharmacy_combined.csv — merged analytical dataset (UTF-8)
- nhif_outpatient_pharmacy_combined_metadata.csv — variable-level data dictionary with English and Bulgarian descriptions, source column mappings, data types, and value formats
Files (1.7 GB)
| Checksum | Size |
|---|---|
| md5:64e13fdd76e215823fbcfaf2682181ed | 3.1 kB |
| md5:b43fb62d3d44525de74f930f472d2f03 | 1.7 GB |