Read Me

Description of replication files for

"Identification of Time-Inconsistent Models: The Case of Insecticide Treated Nets"

by Aprajit Mahajan, Christian Michel, and Alessandro Tarozzi

Overview

The replication package includes (1) data in Stata (.dta) and Matlab-readable (.mat, .csv) formats, and (2) Stata and Matlab code to reproduce all results. We construct the variables in the replication files from data collected during field work for the evaluation of a program in which insecticide treated bed nets (ITNs) were delivered for free or sold at market prices; see Tarozzi et al. (2014). The package includes both the data used directly in the reduced form and structural estimation and the source data used to construct the variables.

To protect privacy, we have dropped all names of individuals and villages from the data, so no identifier corresponds to any recognizable ID.

We provide separate instructions for:

  1. Replicating Figure 1 and Tables 1 and 2 (done in Stata)
  2. Replicating Tables 3-5 (done in Matlab)
  3. Replicating the Appendix tables (done in Matlab)

Data Availability and Citations

All data files are included in the replication package and are publicly available at: https://doi.org/10.5281/zenodo.15699365

Data Citations

Primary Replication Data

Mahajan, Aprajit; Michel, Christian; Tarozzi, Alessandro (2025), "Replication Data for: Identification of Time-Inconsistent Models: The Case of Insecticide Treated Nets", Zenodo, V1, https://doi.org/10.5281/zenodo.15699365

Original Data Collection

Tarozzi, Alessandro, Aprajit Mahajan, Brian Blackburn, Dan Kopf, Lakshmi Krishnan, and Joanne Yoong. 2014. "Micro-loans, bednets and malaria: Evidence from a randomized controlled trial in Orissa (India)." American Economic Review 104(7): 1909-1941. https://doi.org/10.1257/aer.104.7.1909

See the separate Data Availability Statement for detailed information about data sources, access procedures, and citations.

Data Sources Summary

Data Name                               Data Files
ITN Field Experiment - Baseline Data    hhinfo_bl.dta
ITN Field Experiment - Follow-up Data   hhinfo_fup.dta
Malaria Biomarker Data                  panel_biomarkers.dta
Individual Roster Data                  data_01_roster.dta
Time Preference Survey Data             data_08_timediscount.dta
Analysis-Ready Data (Stata)             data_mahajan_et_al.dta
Analysis-Ready Data (Matlab)            data_mahajan_et_al_matlab.csv

Computational Requirements

Software Requirements

MATLAB

  • Version: MATLAB 9.9.0.1592791 (R2020b) Update 5
  • Required Toolboxes:
    • Econometrics Toolbox (Version 5.5)
    • Global Optimization Toolbox (Version 4.4)
    • Optimization Toolbox (Version 9.0)
    • Statistics and Machine Learning Toolbox (Version 12.0)
    • Symbolic Math Toolbox (Version 8.6)

Stata

  • Version: Stata 14 (for Figure 1 and Tables 1-2)

LaTeX (Optional)

  • Required for: Formatted table output from MATLAB code
  • Dependencies: Standard LaTeX installation with packages: amsmath, booktabs, pdflscape, geometry
  • Note: If LaTeX is not available, MATLAB code will still run but formatted tables will not be generated
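
A quick terminal check of the LaTeX dependencies listed above might look like the following sketch. It assumes the tables are compiled with pdflatex (the README does not name the engine) and that kpsewhich, which ships with common TeX distributions, is available for locating package files. The helper name have_cmd is hypothetical.

```shell
# Hypothetical helper: succeed if a command is on PATH.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

# Assumption: tables are compiled with pdflatex; adjust if another engine is used.
if have_cmd pdflatex; then
    echo "pdflatex found"
    for pkg in amsmath booktabs pdflscape geometry; do
        # kpsewhich prints the path of an installed package file, if any.
        if have_cmd kpsewhich && kpsewhich "$pkg.sty" >/dev/null 2>&1; then
            echo "  $pkg: installed"
        else
            echo "  $pkg: not found"
        fi
    done
else
    echo "pdflatex not found: MATLAB code will still run, but formatted tables will not be generated"
fi
```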

Operating System Requirements

  • Tested on: Linux (Ubuntu)
  • LaTeX compilation: Works on *nix environments and macOS with appropriate LaTeX dependencies

Hardware Requirements

  • Minimum RAM: 8GB (16GB recommended)
  • Processor: Multi-core processor recommended for optimization routines
  • Storage: Approximately 2GB of free space for temporary files and output

Runtime Information

  • Figure 1 and Tables 1-2: Approximately 5-10 minutes
  • Tables 3-5: Approximately 1-2 hours (depending on hardware)
  • Appendix Tables: Approximately 2-3 hours

Additional Notes

  • Random number generation is seeded for reproducibility where applicable
  • Directory structure: Code expects basedir/temp and basedir/outputdata directories to exist
  • The basedir variable should be set in setdirectories.m and not overwritten in subsequent files
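
The two directories can be created from a terminal before the first run. This is a minimal sketch: the function name make_replication_dirs and the example path are placeholders, and the path passed in must match whatever basedir is set to in setdirectories.m.

```shell
# Hypothetical helper, not part of the replication package:
# create the temp/ and outputdata/ folders under a given base directory.
make_replication_dirs() {
    base="$1"
    # -p creates missing parents and does nothing if the folders already exist.
    mkdir -p "$base/temp" "$base/outputdata"
}

# Example (placeholder path; use the same value as basedir in setdirectories.m):
#   make_replication_dirs "/path/to/basedir"
```

Because mkdir -p leaves existing folders untouched, running it on a package that already contains outputdata/ is harmless.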

Numerical Reproducibility Across Computational Environments

All Linux systems with MATLAB R2022a or earlier using CNR branch AVX produce numerically identical results to our original submission, regardless of toolbox versions or specific MATLAB release within that range.

Instructions for Exact Replication on Linux

  • Step 1: Check Your Current MKL Configuration

    Start MATLAB and run:

    version -blas
    

    Look for the CNR branch in the output:

    • CNR branch AVX → Will produce exact replication
  • Step 2: Force AVX Instruction Set (if needed and if MATLAB ≤ R2022a)

    If you have MATLAB R2022a or earlier and see a different CNR branch:

    1. Exit MATLAB completely
    2. In your terminal, set these environment variables:

      export MKL_ENABLE_INSTRUCTIONS=AVX
      export MKL_DEBUG_CPU_TYPE=5
      
    3. Start MATLAB from the same terminal:

      matlab
      
    4. Verify the configuration changed:

      version -blas
      

      You should now see "CNR branch AVX"

  • Step 3: Run Replication Code

    Execute the replication as normal. With CNR branch AVX active on Linux with MATLAB ≤R2022a, one should obtain exact replication of all published results.
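
The check in Steps 1 and 2 can also be scripted from the terminal. The sketch below assumes only that the `version -blas` output contains the text "CNR branch <NAME>", as described above. The function names cnr_branch and is_avx_branch are hypothetical and not part of the replication package.

```shell
# Hypothetical helpers, not part of the replication package.

# Print the CNR branch named in a `version -blas` output string.
# Assumes the string contains "CNR branch <NAME>".
cnr_branch() {
    printf '%s\n' "$1" | sed -n 's/.*CNR branch \([A-Za-z0-9._]*\).*/\1/p'
}

# Succeed only if the branch is exactly AVX. This checks the branch alone;
# exact replication also requires Linux and MATLAB R2022a or earlier.
is_avx_branch() {
    [ "$(cnr_branch "$1")" = "AVX" ]
}

# Example (the -batch option requires MATLAB R2019a or later):
#   blas_info=$(matlab -batch "disp(version('-blas'))")
#   if is_avx_branch "$blas_info"; then
#       echo "CNR branch AVX"
#   else
#       echo "Other branch: $(cnr_branch "$blas_info")"
#   fi
```

If the environment variables from Step 2 are needed, export them in the same shell before calling matlab so that the MATLAB process inherits them.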

Systems Producing Exact Replication:

We have tested the code on the following systems and replicated the results exactly:

  • Linux + MATLAB R2020b with CNR branch AVX (original submission)
  • Linux + MATLAB R2021b with CNR branch AVX
  • Linux + MATLAB R2022a with CNR branch AVX

Systems Producing Different (but internally consistent) Results:

  • Mac systems (use CNR branch AVX2)
  • Windows systems (use CNR branch AVX2)
  • Linux + MATLAB R2024b and later (use CNR branch SSE4.2 due to AVX deprecation)

Root of Discrepancy: Intel Math Kernel Library (MKL) Instruction Sets

The differences arise from Intel MKL's Conditional Numerical Reproducibility (CNR) feature, which automatically selects different instruction set architectures based on available hardware. These instruction sets (AVX, AVX2, SSE4.2) implement floating-point operations with subtle differences in vectorization and rounding behavior that get amplified through optimization algorithms and Hessian matrix calculations.

Summary of Estimation in Different Environments

All computational environments mentioned above produce:

  • Nearly identical parameter estimates (maximum difference ~10^-5)
  • Identical qualitative results and economic conclusions
  • Valid statistical inference (proven via fixed-parameter validation tests)

The differences affect numerical precision only, not economic substance.
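
A replicator who wants to check the ~10^-5 bound on their own system could compare two sets of parameter estimates numerically. The sketch below assumes both sets have been exported to plain text files with one number per line in the same order. That export step, the file names, and the function name within_tolerance are assumptions and not part of the replication package.

```shell
# Hypothetical helper: succeed if every pair of corresponding values in two
# one-number-per-line text files differs by at most the given tolerance.
within_tolerance() {
    paste "$1" "$2" | awk -v tol="$3" '
        { d = $1 - $2; if (d < 0) d = -d; if (d > max) max = d }
        END { if (max <= tol + 0) exit 0; exit 1 }'
}

# Example (placeholder file names):
#   if within_tolerance my_estimates.txt benchmark_estimates.txt 1e-5; then
#       echo "estimates agree within 1e-5"
#   else
#       echo "estimates differ by more than 1e-5"
#   fi
```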

What to Expect on Different Systems

  • Linux with MATLAB R2022a or Earlier + AVX:
    • Parameter estimates: Exact match to published results
    • Standard errors: Exact match to published results
  • Linux with MATLAB R2024b or Later:
    • MATLAB has deprecated AVX support in newer versions
    • CNR branch automatically falls back to SSE4.2
    • Parameter estimates: Nearly identical (~10^-5 differences)
    • Standard errors: Will differ from published values
    • Environment variable forcing will not work
  • macOS or Windows (Any MATLAB Version):
    • Parameter estimates: Nearly identical (~10^-5 differences)
    • Standard errors: Will differ from published values
    • These systems use AVX2 by default
    • Operating system differences prevent exact replication even with instruction set forcing

Validation and Verification

We provide hard-coded benchmark parameter estimates (corresponding to the CNR branch AVX results) in the file benchmark_parameters.mat contained in the folder (/outputdata). Replicators can:

  1. Compare their optimization results to these benchmarks
  2. Compute standard errors using the exact benchmark parameters to verify their system produces correct inference. When all systems use identical parameter values, they produce identical standard errors.

Technical Background: CNR Branch Explanation

What is CNR (Conditional Numerical Reproducibility)?

Intel MKL automatically detects CPU capabilities at runtime and selects the most advanced instruction set the processor supports:

  • SSE4.2: Oldest, most compatible (128-bit operations)
  • AVX: Advanced Vector Extensions, introduced 2011 (256-bit operations)
  • AVX2: Enhanced version, introduced 2013

Why This Causes Numerical Differences

While all instruction sets comply with IEEE floating-point standards, they differ in:

  • Vectorization strategies for batching matrix operations
  • Rounding behavior in numerical linear algebra routines
  • Memory alignment and operation ordering

These tiny differences (at machine precision level ~10^-16) accumulate through iterative optimization algorithms and get amplified in Hessian matrix calculations, producing different standard errors despite nearly identical parameter estimates.

Evidence Differences Are Benign

  1. Fixed Parameter Tests: When standard errors are computed using identical parameter values across all systems, we get identical standard errors
  2. Optimization Path Sensitivity: Different numerical libraries guide optimization along slightly different paths, all converging to nearly the same solution
  3. Economic Robustness: All qualitative conclusions remain unchanged across computational environments
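
The point about operation ordering can be seen in a small, generic example. It is not taken from MKL or from the replication code; it only shows that adding the same three double-precision numbers in a different order changes the last digits of the result, which is the kind of difference that different vectorization strategies can introduce. The function names are hypothetical.

```shell
# Sum the same three numbers in two different orders using awk,
# which performs arithmetic in IEEE double precision.
sum_left()  { awk 'BEGIN { printf "%.17g\n", (0.1 + 0.2) + 0.3 }'; }
sum_right() { awk 'BEGIN { printf "%.17g\n", 0.1 + (0.2 + 0.3) }'; }

echo "(0.1 + 0.2) + 0.3 = $(sum_left)"
echo "0.1 + (0.2 + 0.3) = $(sum_right)"
```

The two printed values differ only at the level of machine precision, which is harmless for a single sum but can shift the path of an iterative optimizer.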

Diagnostic Script

The replication package includes (in the folder Matlab_Replication/) the file check_mkl_config.m, which automatically:

  • Detects your CNR branch
  • Reports whether you will achieve exact replication
  • Provides system-specific guidance

Run this before executing the main replication code. For detailed technical documentation of our systematic testing across 7 computational environments and the root cause analysis of standard error differences, see computational_reproducibility_analysis.pdf in the replication folder.

Replicating Figure 1 and Tables 1 and 2

Data

  • Source file: hhinfo_bl.dta. Household-level data, with unique identifier given by the combination of id_v (village ID) and id_hhno (within-village household ID). This file includes variables recorded in the baseline (pre-intervention) household survey (t=0), at the time of the ITN sales (t=1), and at the time of the first (t=2) as well as second (t=3) re-treatment of bed nets.
  • Source file: hhinfo_fup.dta. Household-level data, with unique identifier given by the combination of id_v (village ID) and id_hhno (within-village household ID). This file includes variables recorded in the endline (post-intervention, or "follow-up") household survey (t=4).
  • Source file: panel_biomarkers.dta. Individual-level data, with unique identifier given by the combination of id_v (village ID), id_hhno (within-village household ID) and memid (within-household ID). This includes individual-level information on malaria incidence and prevalence collected at baseline and endline, as well as other individual-level characteristics.
  • Source file: data_01_roster.dta. Individual-level data, with unique identifier given by the combination of id_v (village ID), id_hhno (within-village household ID) and memid (within-household ID). This includes individual-level information on malaria incidence and prevalence collected at baseline and endline, as well as other individual-level characteristics.
  • Source file: data_08_timediscount.dta. Household-level data, with unique identifier given by the combination of id_v (village ID) and id_hhno (within-village household ID). This includes the responses to inter-temporal choices made by respondents at baseline (t=0). These responses are key for the construction of the type signal (r); see panel B of Table 1 for summary statistics.

Replication codes

  • Build data_mahajan_et_al.do. Uses Stata data sets hhinfo_bl.dta, hhinfo_fup.dta, panel_biomarkers.dta, and data_08_timediscount.dta to construct Stata file data_mahajan_et_al.dta, which includes variables used in the reduced form estimation.
  • Build data_mahajan_et_al_matlab.do. Uses the Stata data set data_mahajan_et_al.dta to construct data_mahajan_et_al_matlab.dta, which is the starting point for all the structural estimation. This .dta file is converted to a .csv file named data_mahajan_et_al_matlab.csv, which is used in replicating Tables 3-5 below.
  • Stata replication reduced form.do. Uses the Stata file data_mahajan_et_al.dta to produce the results in Figure 1 and Tables 1 and 2.

Replicating Tables 3-5

  • The folder Matlab_Replication contains the files necessary to replicate Tables 3-5. All file references below are to files in this folder unless mentioned otherwise.
  • Set the basedir variable following the directions in the file master.m. This sets the base directory for the code.
  • Ensure you have all the toolboxes as specified in master.m
  • Run master.m
  • Details of the .m files and their call order are outlined in readme.txt
  • If you are running a *nix environment or a Mac with the appropriate LaTeX dependencies, the final output will be a set of formatted LaTeX tables in the folder latex/ in the Matlab_Replication folder.

Replicating the Appendix Tables

  • The folder Appendix_replication contains the files necessary to replicate the Appendix tables. All file references below are to files in this folder unless mentioned otherwise.
  • Set the startdir variable in the file master_appendix.m
  • Set the basedir variable as directed in the file master_appendix.m
  • Ensure you have all the toolboxes as specified in master_appendix.m
  • Run master_appendix.m
  • Details of the .m files are in readme.txt
  • If you are running a *nix environment (or a Mac) with the appropriate LaTeX dependencies, the final output will be a set of formatted LaTeX tables saved in the tableB1/outputdata/, tableB2/outputdata/, and tableB3/outputdata/ folders.

References

Tarozzi, Alessandro, Aprajit Mahajan, Brian Blackburn, Dan Kopf, Lakshmi Krishnan, and Joanne Yoong. 2014. "Micro-loans, bednets and malaria: Evidence from a randomized controlled trial in Orissa (India)." American Economic Review 104(7): 1909-1941. https://doi.org/10.1257/aer.104.7.1909

Author: amahajan

Created: 2025-09-26 Fri 21:21
