# PEC Framework: Cross-Domain Validation Datasets (Climate Risk, Fintech Credit, and Health Outcomes)

**DOI:** 10.5281/zenodo.20546322 (reserved; becomes active upon publication)

## Overview

This dataset collection supports the cross-domain validation of the **Predict-Explain-Certify (PEC) framework** on three external domains beyond the original Madagascar food security application. It is the companion dataset to the article *"Generalizing the Predict-Explain-Certify Framework: Validation on Climate Risk, Fintech Credit, and Health Outcomes"*, submitted to **Applied Soft Computing**.

The PEC framework achieves certification scores of **87.4 to 99.2/100** (Grades A and B) across the nine external targets in three domains, which supports its generalizability. Ridge regression is the best model for **9 of the 12 targets** (the nine external targets plus the three Madagascar targets), consistent with an advantage for interpretable models in data-scarce settings.

## Datasets

### Domain 1: Climate Risk — Southeast Asia

| Property | Value |
|----------|-------|
| **Observations** | 120 |
| **Regions** | 12 (Vietnam, Cambodia, Laos) |
| **Time period** | 2014–2024 (10 years) |
| **Features** | 30 |
| **Targets** | 3 (FRI, ED, PAR) |

**Targets:**

| Target | Full Name | Unit | Description |
|--------|-----------|------|-------------|
| **FRI** | Flood Risk Index | 0–100 | Composite flood vulnerability score |
| **ED** | Economic Damages | Millions USD | Direct and indirect flood losses |
| **PAR** | Population at Risk | Millions | People potentially affected by flooding |

**Data Sources:** EM-DAT International Disaster Database, World Bank Climate Change Knowledge Portal, Mekong River Commission, national meteorological agencies.

### Domain 2: Fintech Credit Risk — West Africa

| Property | Value |
|----------|-------|
| **Observations** | 135 |
| **Countries** | 15 (WAEMU member states, focus: Senegal, Côte d'Ivoire, Cameroon) |
| **Time period** | 2016–2024 (9 years) |
| **Features** | 27 |
| **Targets** | 3 (DR, NPL, FP) |

**Targets:**

| Target | Full Name | Unit | Description |
|--------|-----------|------|-------------|
| **DR** | Default Rate | % | Percentage of loans in default |
| **NPL** | Non-Performing Loan Ratio | % | NPL as share of total loans |
| **FP** | Fintech Penetration | % | Mobile financial services adoption |

**Data Sources:** IMF Financial Access Survey, World Bank Global Findex Database, BCEAO banking statistics, national central banks.

### Domain 3: Health Outcomes — East Africa

| Property | Value |
|----------|-------|
| **Observations** | 140 |
| **Regions** | 14 (Kenya, Tanzania, Uganda DHS regions) |
| **Time period** | 2013–2023 (11 years) |
| **Features** | 31 |
| **Targets** | 3 (IMR, VC, QD) |

**Targets:**

| Target | Full Name | Unit | Description |
|--------|-----------|------|-------------|
| **IMR** | Infant Mortality Rate | Per 1000 | Deaths per 1000 live births |
| **VC** | Vaccination Coverage | % | Children with complete vaccination |
| **QD** | Qualified Deliveries | % | Births with skilled attendant |

**Data Sources:** Demographic and Health Surveys (DHS), WHO Global Health Observatory, UNICEF data.

## PEC Certification Results (Cross-Domain)

| Domain | Target | Performance | Stability | Fairness | Score | Grade | Best Model |
|--------|--------|-------------|-----------|----------|-------|-------|------------|
| Madagascar | IP | 91.05 | 93.2 | 94.0 | 94.5 | A | Ridge |
| Madagascar | MC | 95.88 | 95.5 | 94.2 | 95.8 | A | Ridge |
| Madagascar | MA | 96.70 | 97.8 | 97.5 | 98.0 | A | Ridge |
| Climate (SE Asia) | FRI | 88.93 | 87.67 | 92.4 | 89.6 | B | Random Forest |
| Climate (SE Asia) | ED | 96.22 | 95.68 | 97.7 | 96.5 | A | Ridge |
| Climate (SE Asia) | PAR | 99.07 | 99.00 | 99.6 | 99.2 | A | Ridge |
| Fintech (W Africa) | DR | 85.18 | 84.70 | 93.1 | 87.4 | B | CatBoost |
| Fintech (W Africa) | NPL | 87.08 | 86.40 | 93.8 | 88.9 | B | CatBoost |
| Fintech (W Africa) | FP | 98.73 | 98.65 | 99.4 | 98.9 | A | Ridge |
| Health (E Africa) | IMR | 97.44 | 97.37 | 98.4 | 97.7 | A | Ridge |
| Health (E Africa) | VC | 98.76 | 98.79 | 99.2 | 98.9 | A | Ridge |
| Health (E Africa) | QD | 99.09 | 99.07 | 99.5 | 99.2 | A | Ridge |
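
The Score column for the nine external targets can be reproduced arithmetically from the three sub-scores. The sketch below assumes a weighted mean with weights 0.4 (Performance), 0.3 (Stability), and 0.3 (Fairness). These weights are inferred from the tabulated values, not taken from a documented PEC definition, and the function name `composite_score` is hypothetical. The three Madagascar rows are not reproduced by this weighting, so they are left out.

```python
# Sub-scores (Performance, Stability, Fairness) and the published Score for
# the nine external targets, copied from the certification table.
ROWS = {
    "FRI": (88.93, 87.67, 92.4, 89.6),
    "ED": (96.22, 95.68, 97.7, 96.5),
    "PAR": (99.07, 99.00, 99.6, 99.2),
    "DR": (85.18, 84.70, 93.1, 87.4),
    "NPL": (87.08, 86.40, 93.8, 88.9),
    "FP": (98.73, 98.65, 99.4, 98.9),
    "IMR": (97.44, 97.37, 98.4, 97.7),
    "VC": (98.76, 98.79, 99.2, 98.9),
    "QD": (99.09, 99.07, 99.5, 99.2),
}


def composite_score(performance, stability, fairness, weights=(0.4, 0.3, 0.3)):
    """Weighted mean of the three sub-scores.

    The default weights are an assumption inferred from the table,
    not a documented part of the PEC framework.
    """
    w_p, w_s, w_f = weights
    if abs(w_p + w_s + w_f - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w_p * performance + w_s * stability + w_f * fairness


# Print the recomputed score next to the published one for each target.
for target, (p, s, f, published) in ROWS.items():
    print(target, round(composite_score(p, s, f), 1), published)
```

Keeping the weights as a parameter lets a domain-specific weighting be passed in without changing the function.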

## Key Findings

1. **Ridge wins 9/12 targets** across all four domains (Madagascar plus the three external domains), supporting the interpretable model advantage
2. **Temporal features dominate SHAP** in all 9 external targets (MA3 is the top-ranked feature), indicating that temporal persistence is a robust cross-domain phenomenon
3. **Multi-method triangulation** (SHAP + LIME + Permutation Importance) achieves Kendall's τ > 0.75 for all domain-target pairs
4. **Three Grade B targets** identified: FRI (data quality), DR and NPL (structural discriminatory bias)
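
The triangulation in finding 3 compares feature rankings produced by different explanation methods. The following is a minimal sketch of that comparison, assuming Kendall's tau-a on tie-free rankings; the source does not say which tau variant is used or how the pairwise values are aggregated. The rankings and the names `feat_a` and `feat_b` are hypothetical.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings of the same items (no ties).

    Each argument maps a feature name to its rank (1 = most important).
    """
    items = list(rank_a)
    if set(items) != set(rank_b):
        raise ValueError("both rankings must cover the same features")
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        sign = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / n_pairs


def pairwise_taus(rankings):
    """Tau for every pair of explanation methods."""
    return {
        (m1, m2): kendall_tau(rankings[m1], rankings[m2])
        for m1, m2 in combinations(rankings, 2)
    }


# Hypothetical rankings of five features by three methods.
rankings = {
    "SHAP": {"MA3": 1, "Lag1": 2, "Delta": 3, "feat_a": 4, "feat_b": 5},
    "LIME": {"MA3": 1, "Lag1": 3, "Delta": 2, "feat_a": 4, "feat_b": 5},
    "Permutation": {"MA3": 1, "Lag1": 2, "Delta": 3, "feat_a": 5, "feat_b": 4},
}

for pair, tau in pairwise_taus(rankings).items():
    print(pair, tau)
```

With real importance scores, ties are likely, in which case a tie-corrected variant such as `scipy.stats.kendalltau` would be the more appropriate choice.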

## Domain-Specific Adaptation Requirements

1. **Financial services**: Fairness weight increased to w_T = 0.4 (from 0.3)
2. **Climate data**: Stability threshold relaxed to CV < 15% (from CV < 10%)
3. **Imbalanced classification**: AUC metrics replace R² in the Performance sub-score
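
The stability threshold in item 2 can be illustrated with a short check. This sketch assumes that CV means the coefficient of variation (sample standard deviation divided by the mean, in percent) of a metric across resampling folds. The source does not spell out this definition, and the fold scores and function names below are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(scores):
    """CV in percent: sample standard deviation divided by the mean."""
    m = mean(scores)
    if m == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * stdev(scores) / abs(m)


def is_stable(scores, threshold_pct=10.0):
    """True when the CV is strictly below the threshold (in percent)."""
    return coefficient_of_variation(scores) < threshold_pct


# Hypothetical per-fold scores for one target.
fold_scores = [0.70, 0.80, 0.90, 0.70, 0.90]

print(coefficient_of_variation(fold_scores))       # about 12.5
print(is_stable(fold_scores, threshold_pct=10.0))  # default threshold
print(is_stable(fold_scores, threshold_pct=15.0))  # relaxed threshold
```

In this example the same fold scores fail the default 10% threshold and pass the relaxed 15% one, which is the practical effect of the climate-data adaptation.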

## Temporal Feature Engineering

All three datasets include the same engineered temporal features:

- **MA3**: 3-year moving average
- **Lag1**: 1-year lag feature
- **Delta**: Year-over-year change
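
As an illustration, the three features can be derived from one region's yearly series as follows. This is a sketch under stated assumptions: the moving average is taken as a trailing window that includes the current year, and years without enough history are left as `None`. The exact window alignment used in the datasets is documented in the data dictionaries, not here, and the function name and input values are hypothetical.

```python
def temporal_features(values):
    """Derive MA3, Lag1, and Delta for a yearly series of one region.

    `values` must be ordered from oldest to newest year.
    """
    rows = []
    for t, value in enumerate(values):
        # Lag1: the previous year's value, unavailable for the first year.
        lag1 = values[t - 1] if t >= 1 else None
        # Delta: year-over-year change.
        delta = value - lag1 if lag1 is not None else None
        # MA3: mean of the current and two preceding years (assumed alignment).
        ma3 = sum(values[t - 2 : t + 1]) / 3 if t >= 2 else None
        rows.append({"value": value, "MA3": ma3, "Lag1": lag1, "Delta": delta})
    return rows


# Hypothetical four-year series for a single region.
for row in temporal_features([10.0, 12.0, 11.0, 15.0]):
    print(row)
```

If the smoothed series is the prediction target itself, a window that excludes the current year (`values[t - 3 : t]`) is the alternative that avoids using the value being predicted.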

## Operational Web Platforms

| Platform | URL | Description |
|----------|-----|-------------|
| PEC Gen | https://pec-gen.streamlit.app | Generalization dashboard (all domains) |
| PEC Mada | https://pec-mada.streamlit.app | Madagascar dashboard |
| PEC Aide | https://pec-aide.streamlit.app | Decision support |
| PEC What-If | https://pec-what.streamlit.app | What-If simulation |

## File Inventory

| File | Description |
|------|-------------|
| `climate_risk_se_asia.csv` | Climate risk dataset (120 rows × 33 columns) |
| `fintech_credit_w_africa.csv` | Fintech credit dataset (135 rows × 30 columns) |
| `health_outcomes_e_africa.csv` | Health outcomes dataset (140 rows × 34 columns) |
| `data_dictionary_climate.csv` | Variable definitions for climate dataset |
| `data_dictionary_fintech.csv` | Variable definitions for fintech dataset |
| `data_dictionary_health.csv` | Variable definitions for health dataset |
| `crossdomain_certification.csv` | PEC certification scores for all 12 targets |
| `model_comparison_crossdomain.csv` | Model comparison across all 4 domains |
| `shap_values_all_domains.csv` | Top-5 SHAP values for all 9 external targets |
| `triangulation_crossdomain.csv` | Kendall's τ triangulation results by domain |
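
After downloading, the three dataset files can be checked against the dimensions listed above. This is a minimal sketch that assumes comma-separated UTF-8 files with a single header row. The function names are hypothetical, and the in-memory sample stands in for a real file so the snippet runs on its own.

```python
import csv
import io
from pathlib import Path

# (rows, columns) as listed in the file inventory.
EXPECTED_SHAPES = {
    "climate_risk_se_asia.csv": (120, 33),
    "fintech_credit_w_africa.csv": (135, 30),
    "health_outcomes_e_africa.csv": (140, 34),
}


def csv_shape(handle):
    """Return (data_rows, columns) for a CSV file object with one header row."""
    reader = csv.reader(handle)
    header = next(reader)
    n_rows = sum(1 for row in reader if row)  # blank lines are not counted
    return n_rows, len(header)


def check_inventory(directory="."):
    """Map each dataset file to True if its shape matches the inventory."""
    report = {}
    for name, expected in EXPECTED_SHAPES.items():
        path = Path(directory) / name
        with open(path, newline="", encoding="utf-8") as handle:
            report[name] = csv_shape(handle) == expected
    return report


# Hypothetical in-memory sample so the sketch runs without the real files.
sample = io.StringIO("region,year,FRI\nA,2014,41.2\nB,2014,38.7\n")
print(csv_shape(sample))
```

Calling `check_inventory("path/to/download")` on the extracted archive returns one boolean per dataset file.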

## Funding

This research received no external funding.

## License

This dataset is licensed under the **Creative Commons Attribution 4.0 International (CC-BY-4.0)** license.

## Citation

If you use this dataset, please cite:

```bibtex
@dataset{ralinirina2026crossdomain,
  author    = {Ralinirina, Rosa Elysabeth and Ralaivao, Jean Christian and Ralaivao, Niaiko Michaël and Ratovondrahona, Alain Josué and Mahatody, Thomas},
  title     = {PEC Framework: Cross-Domain Validation Datasets (Climate Risk, Fintech Credit, and Health Outcomes)},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.20546322},
  url       = {https://doi.org/10.5281/zenodo.20546322}
}
```

## Contact

Rosa Elysabeth Ralinirina  
EDMI, École Normale Informatique (ENI)  
University of Fianarantsoa, Madagascar  
Email: rosa@eni.mg
