Published March 22, 2026 | Version v3
Dataset Open

NHIF Bulgaria. Expenditures and Patient Counts for Home Treatment Medicines and Medical Products 2021-2024 (by Region, NHIF Code, and ICD Code)

  • 1. Medical University Plovdiv
  • 2. D. A. Tsenov Academy of Economics
  • 3. Medical University of Plovdiv, Center for Translational Neuroscience

Description

📊 Dataset Description

Title: NHIF Bulgaria: Outpatient Pharmacy Reimbursement Expenditures and Patient Counts for Home Treatment (🇧🇬 Лекарствени продукти за домашно лечение)

Summary: This dataset contains detailed records of pharmaceutical expenditures and patient counts for home treatment in Bulgaria, as reported by the National Health Insurance Fund (NHIF). The data covers all reimbursed medicinal products, medical devices, and dietary foods for special medical purposes dispensed through community pharmacies.

The data are aggregated by Regional Health Insurance Office (RZOK) and grouped by NHIF reimbursement code and ICD-10 diagnosis code.

Key Characteristics:

  • Aggregated by Regional Health Insurance Office (RZOK) across all 28 regions.
  • Linked to NHIF reimbursement codes, ATC classification codes, and ICD-10 diagnosis codes.
  • Covers the full spectrum of NHIF-reimbursed outpatient pharmaceuticals for home treatment.
  • Sub-monthly reporting granularity preserved via the part variable.
  • Dual-currency columns (BGN and EUR) derived using the official fixed exchange rate of 1 EUR = 1.95583 BGN, with a currency flag indicating the original denomination (BGN for pre-2026 records, EUR from 2026 onward).
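
The dual-currency rule above can be illustrated with a minimal Python sketch. It uses only the fixed rate and the 2026 cut-over stated in the description; the function name `dual_currency`, the rounding to two decimals, and the half-up rounding mode are assumptions made for illustration and are not taken from the original R pipeline.

```python
from decimal import Decimal, ROUND_HALF_UP

BGN_PER_EUR = Decimal("1.95583")  # official fixed rate quoted in the description
CENT = Decimal("0.01")            # assumption: amounts are rounded to two decimals


def dual_currency(costs: str, year: int) -> dict:
    """Return the currency flag plus BGN and EUR amounts for one record."""
    amount = Decimal(costs)
    if year < 2026:
        # Pre-2026 records are denominated in BGN.
        currency, bgn, eur = "BGN", amount, amount / BGN_PER_EUR
    else:
        # Records from 2026 onward are denominated in EUR.
        currency, bgn, eur = "EUR", amount * BGN_PER_EUR, amount
    return {
        "currency": currency,
        "costs_bgn": bgn.quantize(CENT, rounding=ROUND_HALF_UP),
        "costs_eur": eur.quantize(CENT, rounding=ROUND_HALF_UP),
    }


row = dual_currency("1955.83", 2025)
print(row["currency"], row["costs_bgn"], row["costs_eur"])
```

Decimal arithmetic is used here so that the conversion does not pick up binary floating-point noise; whether the published columns were rounded at all is not stated in the description.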

Temporal Granularity: Each calendar month is reported in three sub-periods, preserved in the part variable:

  • Part 01: days 1–10 of the month
  • Part 02: days 11–20 of the month
  • Part 03: days 21–end of the month

The period variable always represents the first day of the calendar month regardless of sub-period, enabling straightforward monthly aggregation by grouping on period while retaining the option for intra-month dispensing pattern analysis via part. This structure is useful for identifying trends, delays, or spikes in utilisation within reporting cycles.
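
As a rough illustration of the two aggregation levels described above, the following Python sketch sums costs by period alone (monthly totals) and by period and part (intra-month view). The sample records are invented and carry only the columns needed here; the helper names `monthly_totals` and `part_totals` are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

# Invented records with only the columns needed for this example.
records = [
    {"period": "2024-03-01", "part": "01", "costs_bgn": "120.50"},
    {"period": "2024-03-01", "part": "02", "costs_bgn": "80.25"},
    {"period": "2024-03-01", "part": "03", "costs_bgn": "99.25"},
    {"period": "2024-04-01", "part": "01", "costs_bgn": "10.00"},
]


def monthly_totals(rows):
    """Sum costs_bgn per calendar month by grouping on period only."""
    totals = defaultdict(Decimal)
    for row in rows:
        totals[row["period"]] += Decimal(row["costs_bgn"])
    return dict(totals)


def part_totals(rows):
    """Sum costs_bgn per (period, part) to inspect intra-month patterns."""
    totals = defaultdict(Decimal)
    for row in rows:
        totals[(row["period"], row["part"])] += Decimal(row["costs_bgn"])
    return dict(totals)


print(monthly_totals(records))
print(part_totals(records))
```

Because period is identical across the three parts of a month, no date arithmetic is needed to roll sub-periods up to months.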

Data Cleaning and Preprocessing: The raw XLS files (198 source files, one per sub-period, named costs_part_NN_mmm_YYYY.xls) were processed in R (version 4.5.1; tidyverse, readxl, janitor, lubridate) using a standardised pipeline that:

  • Harmonised Bulgarian column names to English and R-compatible identifiers via a defined column mapping dictionary
  • Zero-padded region codes to a uniform 2-digit format
  • Standardised region names using a canonical 28-region mapping
  • Parsed temporal identifiers (month, year) and sub-period part numbers from filenames
  • Added dual-currency columns (costs_bgn, costs_eur) and a currency flag based on the reporting period
  • Tracked all file-level issues (read failures, missing columns, unmatched headers) and reported them in a structured summary
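
Two of the steps above, parsing the filename and zero-padding region codes, can be sketched in Python as follows. The original pipeline was written in R and its code is not reproduced here. The sketch assumes that "mmm" in costs_part_NN_mmm_YYYY.xls is a three-letter English month abbreviation, which the description does not confirm; `parse_filename` and `pad_region` are hypothetical helper names.

```python
import re
from datetime import date

# Assumption: "mmm" is a three-letter English month abbreviation.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}

FILENAME_RE = re.compile(r"^costs_part_(\d{2})_([A-Za-z]{3})_(\d{4})\.xls$")


def parse_filename(name: str) -> dict:
    """Extract the sub-period part and the first day of the month."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name!r}")
    part, month, year = match.groups()
    month_number = MONTHS.get(month.lower())
    if month_number is None:
        raise ValueError(f"unknown month abbreviation: {month!r}")
    return {"part": part, "period": date(int(year), month_number, 1).isoformat()}


def pad_region(code) -> str:
    """Zero-pad an integer-like region code to two digits."""
    return str(int(code)).zfill(2)


print(parse_filename("costs_part_02_jul_2020.xls"))
print(pad_region(1), pad_region("28"))
```

Raising on an unmatched filename mirrors the idea of tracking file-level issues rather than silently skipping files.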

All character variables have been preprocessed with the following transformations:

  • Lowercasing all strings (except ICD-10 disease names, which preserve original case)
  • Trimming and squishing whitespace
  • Removing quotes (" and ')
  • Standardising " - " to "-"
  • Removing trailing punctuation
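
The transformations listed above could look roughly like this in Python, applied in the order in which they are listed. Which characters count as trailing punctuation is an assumption, as is the `keep_case` switch used for ICD-10 disease names; the function name `clean_text` is hypothetical.

```python
import re


def clean_text(value: str, keep_case: bool = False) -> str:
    """Apply the listed character-variable transformations in order."""
    text = value if keep_case else value.lower()      # lowercase unless case is preserved
    text = re.sub(r"\s+", " ", text).strip()          # trim and squish whitespace
    text = text.replace('"', "").replace("'", "")     # remove double and single quotes
    text = text.replace(" - ", "-")                   # standardise spaced hyphens
    text = re.sub(r"[\s.,;:!?]+$", "", text)          # assumed set of trailing punctuation
    return text


print(clean_text('  Tablets  "Forte" - 10 mg. '))
print(clean_text('  Tablets  "Forte" - 10 mg. ', keep_case=True))
```

Python's `str.lower()` also handles Cyrillic text, so the same function applies to the Bulgarian product and region names.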

All 198 source files were processed successfully with no file-level issues. No records were imputed, modified, or excluded beyond the transformations described above.

Data quality validation identified:

  • 2,474 extreme cost outliers exceeding 10 times the 99th percentile of the reimbursement distribution
  • 95 records with negative costs (refunds/corrections), totalling 40,307.85 BGN
  • No missing cost or num_in_pack data (0.0%)
  • No duplicate records, no invalid region codes, no missing or invalid part values
  • All 66 months have complete 3-part sub-period coverage
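
The outlier and negative-cost checks above can be sketched as follows. The description does not say how the 99th percentile was computed or whether negative records were included in the distribution, so this sketch assumes a simple nearest-rank percentile over all values; the sample data and the names `nearest_rank_percentile` and `flag_costs` are invented for illustration.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (an assumption; other definitions exist)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def flag_costs(costs, multiplier=10, pct=99):
    """Flag values above multiplier x percentile and values below zero."""
    threshold = multiplier * nearest_rank_percentile(costs, pct)
    return {
        "threshold": threshold,
        "outliers": [c for c in costs if c > threshold],
        "negatives": [c for c in costs if c < 0],
    }


# Invented values: 199 ordinary records, one extreme value, one correction.
sample = [10.0] * 199 + [5000.0, -25.0]
result = flag_costs(sample)
print(result["threshold"], result["outliers"], result["negatives"])
```

Flagging rather than dropping matches the statement that no records were excluded from the published data.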

Structure:

  • 📦 Rows: 7,266,074
  • 📁 Columns: 19
  • 📆 Temporal coverage: July 2020 – December 2025 (66 months × 3 sub-periods = 198 source files)
  • 🌍 Geographical scope: All 28 NHIF regions in Bulgaria
  • 💊 Distinct NHIF medication codes: 3,367
  • 🧪 Distinct ATC codes: 526
  • 💰 Total reimbursement: 6,686,965,176 BGN (3,418,991,004 EUR)

Sub-period (part) distribution:

  • Part 01 (days 1–10): 2,564,109 records (35.3%)
  • Part 02 (days 11–20): 2,530,737 records (34.8%)
  • Part 03 (days 21–end): 2,171,228 records (29.9%)

Records by year:

  • 2020: 651,667 (9.0%)
  • 2021: 1,287,060 (17.7%)
  • 2022: 1,299,797 (17.9%)
  • 2023: 1,313,999 (18.1%)
  • 2024: 1,351,516 (18.6%)
  • 2025: 1,362,035 (18.7%)

Key Variables:

Variable | Description
--- | ---
region_num | NHIF regional code (2-digit, zero-padded, e.g. 01)
region_name | Name of the NHIF regional office (lowercase Cyrillic)
atc_code | Anatomical Therapeutic Chemical (ATC) classification code
atc_name | International nonproprietary name (INN) of the active substance
nhif_code | NHIF-specific reimbursement product code
market_name | Marketed product name (brand name)
packaging | Dosage form and packaging format
concentration | Strength or concentration per unit
num_in_pack | Number of units per package
icd_code | ICD-10 code of the diagnosed disease
icd_name | Diagnosis name (in Bulgarian, original case preserved)
patients_num | Number of insured persons (ЗОЛ) reimbursed for the product during the period
pack_num | Number of reimbursed packages
costs | Reimbursement amount in original currency (BGN pre-2026; EUR from 2026)
period | First day of the reporting month (YYYY-MM-DD); identical across all three sub-periods of the same month
part | Sub-period indicator: 01 = days 1–10, 02 = days 11–20, 03 = days 21–end
currency | Original currency denomination (BGN or EUR)
costs_bgn | Reimbursement amount standardised to BGN (1 EUR = 1.95583 BGN)
costs_eur | Reimbursement amount standardised to EUR (1 EUR = 1.95583 BGN)

Use Cases: This dataset is suitable for:

  • Time series analysis of outpatient pharmaceutical expenditure
  • Pharmacoepidemiology and drug utilisation research
  • Regional inequality studies in access to reimbursed medicines
  • Health economics research and budget impact analyses
  • Pharmaceutical policy evaluation at national and regional levels
  • Intra-month dispensing pattern analysis (via the three-part reporting structure)

Note: The unit of observation is an administrative reimbursement record aggregated at the region–product–diagnosis–period–part level. The patients_num field counts insured persons reimbursed for the given product within each stratum. The same person can appear in several strata, parts, or periods, so summing patients_num does not yield a count of unique patients. To obtain monthly totals of costs or packages, group by period and aggregate across parts. Product names and diagnoses are in Bulgarian; ATC codes follow the WHO international classification.

Source: National Health Insurance Fund (NHIF), Bulgaria — https://www.nhif.bg/

License: Unless otherwise restricted by NHIF, this dataset is shared under Creative Commons Attribution 4.0 International (CC BY 4.0).

Files included:

  • nhif_outpatient_pharmacy_combined.csv — merged analytical dataset (UTF-8)
  • nhif_outpatient_pharmacy_combined_metadata.csv — variable-level data dictionary with English and Bulgarian descriptions, source column mappings, data types, and value formats
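
Given the size of the combined file, one way to read it is row by row. The sketch below uses Python's standard csv module and assumes a comma-delimited file whose header row matches the variable names in the Key Variables table; neither detail is confirmed by the description. The in-memory sample is invented and `read_records` is a hypothetical helper.

```python
import csv
import io
from decimal import Decimal


def read_records(handle):
    """Yield rows one at a time so the full file never has to fit in memory."""
    for row in csv.DictReader(handle):
        # region_num and part stay as text, which preserves the leading zeros.
        row["costs_bgn"] = Decimal(row["costs_bgn"])
        yield row


# Invented sample with a subset of columns, standing in for the real file.
sample = io.StringIO(
    "region_num,part,period,costs_bgn\n"
    "01,03,2024-03-01,56.80\n"
)
rows = list(read_records(sample))
print(rows[0]["region_num"], rows[0]["part"], rows[0]["costs_bgn"])

# With the real file (delimiter and header layout are assumptions):
# with open("nhif_outpatient_pharmacy_combined.csv", encoding="utf-8", newline="") as fh:
#     for row in read_records(fh):
#         ...
```

Tools that infer column types may convert region_num and part to integers and drop the zero padding, so reading them as text is the safer default.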

Files (1.7 GB)

  • 3.1 kB, md5:64e13fdd76e215823fbcfaf2682181ed
  • 1.7 GB, md5:b43fb62d3d44525de74f930f472d2f03