============================================================================================================
START readme
============================================================================================================

This file focuses more on the details of the data package. For 'administrative' metadata on this data,
please view the metadata files within the parent folder of this data package (where this readme file
is also found).


============================================================================================================
# General
============================================================================================================

- Author: Dominique van Wonderen
- Affiliations: Data Science & Innovation, Wageningen Social & Economic Research, Wageningen University & Research
- ORCID number: 0000-0002-4298-0428
- Project: Diet optimization: modeling iron and zinc absorption by nonlinear programming and piecewise linear 
  approximation using National Health and Nutrition Examination Survey
- Contact: rodrigo.romerosilva@wur.nl


============================================================================================================
# Title
============================================================================================================

Data and code underlying the publication 
"Diet optimization: modeling iron and zinc absorption by nonlinear programming and piecewise linear 
approximation using National Health and Nutrition Examination Survey"
(https://doi.org/10.1016/j.ajcnut.2025.06.022) 


============================================================================================================
# TableOfContents 
============================================================================================================

- General
- Title
- TableOfContents
- Methods  
    - Introduction  
    - Description  
- FolderStructure
- FolderContents
- ResearchSoftware
    - RSLanguage  
    - RSRequirements  
- FileFormats
- CodeBook
- Contributing


============================================================================================================
# Methods
============================================================================================================

## Introduction

The aim of this study was to evaluate the effectiveness of nonlinear programming (NLP) and piecewise linear 
approximation (PLA) for solving diet models with nonlinear equations for nonheme iron and zinc absorption.
Please view https://doi.org/10.1016/j.ajcnut.2025.06.022 for additional information on materials and methods.


## Description

- Meals and observed consumption
Meals and consumption data used for modeling were based on, among other sources, NHANES consumption data. 
The input data is further described in FolderContents - 1 Raw Data.
The processing of the data was done in R, see FolderContents - 4 Model input data. This includes the estimation 
of food components such as phytate, which are necessary to estimate nonheme iron and zinc absorption.

- Estimation of absorbable iron and zinc
The formulas as described in literature were used to calculate the absorbable iron and zinc content of meals 
(see FolderContents - 2 Absorption equations).
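
As an illustration of what such an absorption equation can look like, the sketch below evaluates a 
saturation-response model of the form commonly attributed to Miller et al. for zinc. It is written in Python 
for brevity and is not taken from this data package; the package's own implementation is the .R script in 
2 Absorption equations. The function name, the argument names, and the parameter values are assumptions 
chosen for illustration. Please consult the publication for the equations and parameters actually used.

```python
import math


def absorbed_zinc(tdz, tdp, a_max, k_r, k_p):
    """Absorbed zinc for a given dietary zinc (tdz) and dietary phytate (tdp).

    Assumed saturation-response form (illustrative only):
        TAZ = 0.5 * (B - sqrt(B^2 - 4 * a_max * tdz)),
        B   = a_max + tdz + k_r * (1 + tdp / k_p)
    All quantities are expected in the same molar unit per day.
    """
    b = a_max + tdz + k_r * (1.0 + tdp / k_p)
    # The discriminant is never negative because b >= a_max + tdz.
    return 0.5 * (b - math.sqrt(b * b - 4.0 * a_max * tdz))


# Placeholder parameter values (assumption, not necessarily the study's values).
PARAMS = {"a_max": 0.13, "k_r": 0.10, "k_p": 1.2}

low_phytate = absorbed_zinc(tdz=0.15, tdp=0.5, **PARAMS)
high_phytate = absorbed_zinc(tdz=0.15, tdp=3.0, **PARAMS)
print(f"absorbed zinc, low phytate:  {low_phytate:.4f}")
print(f"absorbed zinc, high phytate: {high_phytate:.4f}")
```

In this form, absorbed zinc rises with zinc intake, levels off below a_max, and falls as phytate intake 
increases. The dependence on two variables is the reason a bivariate approximation is needed for zinc.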

- Piecewise linear approximation
For iron and zinc, univariate and bivariate piecewise linear approximation was performed, respectively
(see FolderContents - 3 Piecewise linear approximation).
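
The idea behind univariate piecewise linear approximation can be sketched as follows. The example is in 
Python and is not the code of this data package (the PLA parameters used in the study are produced by the 
.R scripts). The curve `saturating`, the equally spaced breakpoints, and all function names are assumptions 
made for illustration only.

```python
def saturating(x, a_max=1.0, k=2.0):
    """Hypothetical concave saturation curve (not an equation from the study)."""
    return a_max * x / (k + x)


def make_breakpoints(lo, hi, n_segments):
    """Return n_segments + 1 equally spaced breakpoints on [lo, hi]."""
    step = (hi - lo) / n_segments
    xs = [lo + i * step for i in range(n_segments + 1)]
    xs[-1] = hi  # guard against floating-point drift at the upper end
    return xs


def pla(x, xs, ys):
    """Evaluate the piecewise linear interpolant through the points (xs, ys)."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the breakpoint range")
    for i in range(len(xs) - 1):
        if x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return (1.0 - t) * ys[i] + t * ys[i + 1]


def max_abs_error(f, xs, ys, n_grid=1000):
    """Largest absolute gap between f and its approximation on a fine grid."""
    lo, hi = xs[0], xs[-1]
    grid = [lo + (hi - lo) * j / n_grid for j in range(n_grid + 1)]
    return max(abs(f(x) - pla(x, xs, ys)) for x in grid)


for n in (4, 16):
    xs = make_breakpoints(0.0, 10.0, n)
    ys = [saturating(x) for x in xs]
    print(n, "segments, max abs error:", max_abs_error(saturating, xs, ys))
```

More segments reduce the approximation error at the cost of a larger model, since each segment adds 
variables or constraints once the approximation is embedded in a linear or mixed-integer program.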

- Diet models
A mixed-integer and a continuous diet model were developed to optimize absorbable iron and zinc intake, 
using different absorption equations available from the literature (Conway and Hallberg for iron, and Miller
for zinc). With the mixed-integer diet model, 3 types of 2-wk menu plans were created: omnivorous, vegetarian, 
and vegan. With the continuous diet model, diet plans were generated with a varying degree of allowed deviations 
from the observed diet. We tested the performance of NLP and PLA for both models. For NLP, 2 different nonlinear 
solvers were applied: LINDO and SCIP. In addition, the efficiency of multistart and initialization 
functionalities and different time limits were tested (see FolderContents - 5 Diet models). 
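
To illustrate why multistart can matter for nonlinear models, the sketch below runs a simple local search 
from several starting points and keeps the best result. It is a generic Python illustration, not the 
multistart functionality of LINDO or SCIP and not code from this data package. The objective function, the 
step-halving local search, and the evenly spaced starting points are all assumptions made for the example.

```python
import math


def objective(x):
    """Hypothetical objective with several local maxima on [0, 4]."""
    return math.sin(3.0 * x) + 0.5 * x


def local_search(f, x0, lo, hi, step=0.5, tol=1e-7):
    """Hill climbing: move to the better neighbour, halve the step when stuck."""
    x = x0
    while step > tol:
        left, right = max(lo, x - step), min(hi, x + step)
        best = left if f(left) >= f(right) else right
        if f(best) > f(x):
            x = best
        else:
            step /= 2.0
    return x


def multistart(f, lo, hi, n_starts):
    """Run the local search from evenly spaced starts and keep the best point."""
    width = (hi - lo) / n_starts
    starts = [lo + (i + 0.5) * width for i in range(n_starts)]
    return max((local_search(f, s, lo, hi) for s in starts), key=f)


single = local_search(objective, 0.25, 0.0, 4.0)
best = multistart(objective, 0.0, 4.0, n_starts=8)
print(f"single start: x = {single:.4f}, objective = {objective(single):.4f}")
print(f"multistart:   x = {best:.4f}, objective = {objective(best):.4f}")
```

A single start can end in a local optimum, whereas several starts raise the chance of reaching a better 
one, at the price of additional solving time.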

- Processing and analysis scripts
All processing of data (modifications, calculations, etc.) and analysis of data was performed in R. All 
scripts are extensively commented, describing each step taken and the reason for it.

- Figures and tables
All figures and tables were created in R. Refer to the comments in the R scripts for detailed information.


============================================================================================================
# FolderStructure
============================================================================================================

- Project
-- 1 Raw data 
-- 2 Absorption equations
-- 3 Piecewise linear approximation
-- 4 Model input data
--- Constraints
--- DRV
--- Meals and observed consumption
-- 5 Diet models
--- Iron
---- Continuous
---- Integer
---- Model output
----- Continuous
----- Integer - 1h time limit
----- Integer - 30min time limit
--- Zinc
---- Continuous
---- Integer
---- Model output
----- Continuous
----- Integer - 1h time limit
----- Integer - 30min time limit
-- 6 Results analysis
--- Figures



============================================================================================================
# FolderContents
============================================================================================================

-- 1 Raw Data/  
Contains the raw input data used for modeling, namely consumption data from the publicly available datasets 
NHANES and FPED. The 4 .xpt files contain NHANES food consumption data, demographics, and food names; the 
variable names are listed in an .xlsx file. The .sas7bdat files contain the standardized food group 
equivalents (FPED) of the NHANES consumption data, and an .xlsx file contains the WWEIA food group 
classification. Both are necessary to estimate the phytate content of consumption. The phytate content of 
the food groups as reported by Larvie & Armah is given in an .xlsx file. 

-- 2 Absorption equations/
The .R script contains the iron and zinc absorption equations available from literature, with a list of 
enhancers and inhibitors in the .xlsx file.

-- 3 Piecewise linear approximation/
The .R files contain the scripts to obtain the PLA parameters for both iron and zinc, with the outputs stored
in .RData files.

-- 4 Model input data/
The folder Constraints holds an .xlsx file that contains the food intake constraints applied in the diet model.

The folder DRV includes an .xlsx file that contains the dietary reference values (DRV) as available from literature.
The .R script formats the data so it can be applied in the diet model, for which the output is stored in the 
.RData file.

The folder 'Meals and observed consumption' stores the .R script that contains the code used to transform 
the NHANES consumption dataset into the meal dataset used by the mixed-integer diet model. For the continuous 
diet model, observed consumption data is used. Both datasets were enriched. Enrichment includes the 
estimation of the phytate and absorbable iron content of the meals, as well as the classification of foods 
as fortified or not. In total, 3 meal datasets were created, varying in meal type (Omnivorous, Vegetarian, 
Vegan), and 1 consumption dataset (.RData).

-- 5 Diet models/
For iron and zinc, and for the continuous and integer diet models separately, the .R files convert the .RData 
files to .gdx and run the diet models in GAMS (.gms). The .xlsx file contains the scenarios run by the diet 
models. The model output is stored in .RData files.

-- 6 Results analysis/
Contains four .R scripts that were used to analyse modeling results. Figures (2 .jpeg, 3 .tiff, and 5 .pdf 
files) are stored in a separate folder.


============================================================================================================
# ResearchSoftware
============================================================================================================

## RSLanguage

All scripts are written using R in RStudio and GAMS in GAMS Studio. 


## RSRequirements 

- R version 4.2.2 (2022-10-31 ucrt)
- RStudio 2022.7.2.576
- GAMS version 40
- GAMS Studio 1.11.1
- Windows 10 x64 build 19044

- R libraries 
    - Cairo CRAN v1.6-2
    - extrafont CRAN v0.19
    - foreign CRAN v0.8-82
    - gdxdt CRAN v0.1.0
    - gdxrrw CRAN v1.0.10
    - haven CRAN v2.4.1
    - lattice CRAN v0.20-45
    - openxlsx CRAN v4.2.4
    - patchwork CRAN v1.2.0
    - rvest CRAN v1.0.0
    - scales CRAN v1.3.0
    - segmented CRAN v2.0-1
    - tidyverse CRAN v1.3.1


============================================================================================================
# FileFormats
============================================================================================================

- .R 
- .RData
- .csv
- .txt
- .tiff
- .pdf
- .jpeg
- .xlsx
- .XPT
- .sas7bdat
- .gms


============================================================================================================
# CodeBook
============================================================================================================

Please view codebook.csv (located in the same folder as this readme file) for the variable names used in 
the scripts and their definitions. 

============================================================================================================
# Contributing 
============================================================================================================

You are free to use the materials published under a CC BY license and can thus expand on the existing data.
If you find any errors within the data, please contact us so that we can check and improve where appropriate. 
If concerns are validated, we will update the publication package to a new version. We do not intend to 
maintain or update the research scripts we developed. These scripts are not hosted in any Git repository, 
so no formal issue tracking is available.


============================================================================================================
END readme
============================================================================================================