Targeted Newborn Metabolomics: Prediction of gestational age from cord blood

Objective: Our study sought to determine whether metabolites measured in a retrospective collection of banked cord blood specimens could accurately estimate gestational age, and to validate these findings in cord blood samples from Busia, Uganda. Study Design: Forty-seven metabolites were measured by tandem mass spectrometry or enzymatic assays in 942 banked cord blood samples. Multiple linear regression was performed, and the best model was used to predict gestational age, in weeks, for 150 newborns from Busia, Uganda. Results: The model including metabolites and birth weight predicted gestational age within 2 weeks for 76.7% of the Ugandan cohort. Importantly, this model's estimate of the prevalence of preterm birth <34 weeks was close to the actual prevalence (4.67% and 4.00%, respectively), whereas a model with only birth weight overestimated the prevalence by 283%. Conclusion: Models that include cord blood metabolites and birth weight appear to offer improvement in gestational age estimation over birth weight alone.


Introduction
Preterm birth (<37 weeks completed gestation) is the leading cause of child death worldwide, with the greatest burden in low-resource regions [1]. Precise population estimates of gestational age are essential for determining the burden of preterm birth as well as identifying regions with greater than average preterm birth rates where interventions would have the greatest impact [2]. Additionally, accurate estimation of gestational age is important for research studies focused on identifying the risk factors and causes of preterm birth.
There are numerous methods for estimating gestational age during pregnancy, with first-trimester ultrasound being the standard of care in high-resource settings [3]. In low-resource areas where women have limited access to prenatal care, ultrasound imaging may not be readily accessible [4,5]. Additionally, accurate gestational age dating by ultrasound is complicated in fetuses that are growth restricted or small for gestational age (SGA) in utero, a common problem in developing areas of the world [6,7]. In the absence of ultrasound, gestational age may be estimated from a woman's last menstrual period (LMP). LMP-based estimation is considerably less accurate than ultrasound dating because it relies heavily on a woman remembering the date of her last menstrual period [8-11].
Gestational age can also be estimated after birth using standardized scoring systems based on the newborn's neuromuscular and physical characteristics, including birth weight [12,13]. These estimates are less precise than obstetric estimation and often overestimate the number of infants born at less than 40 weeks gestation [8,14-16]. Using cohorts from the U.S. and Canada, several research groups, including ours, have demonstrated that metabolic and endocrine markers captured in routine newborn screening, in combination with birth weight, provide a better estimation of gestational age than birth weight alone [17-19]. Furthermore, our group and others have shown that birth weight is a poor surrogate for gestational age, particularly in the presence of growth restriction, which is important in low- and middle-income countries with higher rates of infants born SGA [18,20-22]. Because infants born SGA face distinct risks of morbidity and mortality, surveillance and research efforts need accurate measures of gestational age that distinguish infants born SGA at term from those born SGA preterm.
While these results are promising, particularly for the surveillance of preterm birth and for improving estimation of gestational age in studies examining the risk factors and causes of preterm birth, newborn screening methodologies are not uniform within the United States, and access to such records is increasingly challenging. Furthermore, in low-resource regions of the world, newborn screening is often not performed or is available only to high-income families. Cord blood is readily available at birth and acceptable to parents for sampling, as no invasive procedures on the infant are required. It therefore offers an alternative approach to estimating gestational age after birth for preterm birth surveillance and research.
A recent study demonstrated the success of metabolite-based methods of gestational age estimation using either heel-prick or cord blood measurements in a population from Matlab, Bangladesh [23]. Our research team also demonstrated the effectiveness of applying U.S.-based metabolite models to gestational age estimation from heel-prick blood spots in Busia, Uganda, in East Africa [24]. In this proof-of-concept study, we aimed to determine whether cord blood metabolites could improve estimation of gestational age above and beyond birth weight alone. We used banked cord blood specimens from an Iowa biorepository to build a gestational age model and validated this model in 150 cord blood specimens from Busia, Uganda. We hypothesize that metabolic and endocrine markers captured through cord blood can provide an accurate estimation of preterm birth prevalence for surveillance and research purposes.

Study Population
A retrospective analysis was performed using 938 banked newborn cord blood samples collected at the University of Iowa Hospitals and Clinics. These samples served as the model-building dataset for developing the predictive model, whose final performance was then tested in an independent population of 150 newborns from Busia, Uganda. Laboratory methods have been described previously [25-27]. Briefly, tandem mass spectrometry (MS/MS) was used to measure amino acids, acylcarnitines, and free carnitine; thyroid-stimulating hormone (TSH) and 17-hydroxyprogesterone (17-OHP) were measured by high-performance liquid chromatography; and galactose-1-phosphate uridyltransferase (GALT) was measured using a fluorometric enzyme assay. Completeness of metabolic data varied across individuals.
Four hundred ninety-nine cards were made from preterm infants' samples (gestational ages 24 to 36 weeks) and 443 cards from term infants' samples (37 to 41 weeks). Gestational age was determined by ultrasound for 80% of the samples (N=753); for the remaining samples (N=189), it was determined by LMP or the dating method was not recorded in the electronic medical record. Birthweight, in grams, was obtained from the medical record for each newborn. One hundred twenty-one samples were from twins or higher-order multiple births.
Uganda Cohort: Participants from Uganda were enrolled in the Prevention of Malaria in HIV-uninfected Pregnant Women and Infants trial (NCT02793622), a double-blinded randomized controlled trial comparing the risk of adverse birth outcomes among HIV-uninfected pregnant women randomized to receive intermittent preventive therapy (IPTp) with monthly sulfadoxine-pyrimethamine (SP) versus monthly dihydroartemisinin-piperaquine (DP). Participants were enrolled at a clinic within Masafu General Hospital. Women were included if they were pregnant, had a gestational age estimated by ultrasound between 12 weeks 0 days and 20 weeks 0 days at enrollment, were confirmed to be HIV-uninfected by a rapid test, were 16 years or older, resided in the Busia District of Uganda, provided informed consent, agreed to come to the clinic for any febrile episode or other illness, agreed to avoid medication given outside the study protocol, and planned to deliver at the hospital. Women were excluded if they had a history of serious adverse events to SP or DP, had an active medical problem requiring inpatient evaluation at the time of screening or a chronic medical condition requiring frequent medical attention, or had received prior SP preventive therapy or any other antimalarial therapy during the pregnancy. Women enrolled in this study also consented to cord blood analysis of metabolites. In this proof-of-concept study, we used the first 150 births in the cohort to test the developed models. Infants from Uganda included in this study were born between December 22, 2016 and April 7, 2017. Birthweight was recorded in grams for each newborn. Cord blood samples from Uganda were obtained following birth, spotted onto Whatman 903 cards, and shipped to the University of Iowa for metabolite testing within 2 weeks, kept at −20℃ or colder.
Samples were tested at the State Hygienic Laboratory of Iowa for the same metabolites measured in the Iowa population, using the same protocols described above.

Statistical Analysis
Forty-seven metabolites were examined. Each metabolite was first tested for association with gestational age by univariate analysis, and the linearity between single metabolite levels and gestational age was examined through plots of residuals versus predicted values. Multiple linear regression modeling was then performed with gestational age, in weeks, as the outcome, and the regression was estimated by ordinary least squares. In the model-building dataset, all metabolites significant (p < 0.1) in the univariate analysis were included in the initial model, and significant terms (p < 0.05) were maintained for subsequent modeling. To address nonlinearity between metabolites and gestational age, squared terms and then cubed terms for significant metabolites were sequentially incorporated into the model, with nonsignificant terms (p ≥ 0.05) removed; cubed terms were only examined when the corresponding squared terms were significant. The relationship between gestational age and birthweight was inspected visually using plots of residuals versus predicted values. Only one individual was missing a birthweight measurement and was therefore excluded from models including this term. Adjusted coefficients of determination (R²), root-mean-square error (RMSE), and the area under the receiver operating characteristic curve (AUC) were used to evaluate models.
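The term-selection procedure described above can be sketched in Python. This is a minimal, hypothetical illustration, not the SAS or Stata code used in the study. The helper names (`ols_pvalues`, `select_terms`) are invented here, and several details are assumptions: metabolites are standardized before squaring and cubing, p-values use a large-sample normal approximation rather than the exact t distribution, and nonsignificant terms are dropped one at a time (backward elimination), with only the newly added higher-order terms eligible for removal at each step. The demonstration data are synthetic.

```python
import math
import random


def _invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[k:] for row in a]


def ols_pvalues(cols, y):
    """OLS fit of y on an intercept plus the given columns; two-sided p-value per column.

    Assumption: p-values use a large-sample normal approximation to the t distribution.
    """
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    inv = _invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    rss = sum((yi - sum(x * b for x, b in zip(row, beta))) ** 2 for row, yi in zip(X, y))
    sigma2 = rss / (len(y) - p)
    # P(|Z| > |b / se|) = erfc(|b| / (se * sqrt(2)))
    return [math.erfc(abs(beta[a]) / math.sqrt(2 * sigma2 * inv[a][a])) for a in range(1, p)]


def select_terms(metabolites, ga, screen_p=0.1, keep_p=0.05):
    """Return retained (metabolite_index, power) terms; metabolites is a list of value lists."""
    # Standardize each metabolite so squared and cubed terms stay numerically stable (assumption).
    z = []
    for values in metabolites:
        mean = sum(values) / len(values)
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
        z.append([(v - mean) / sd for v in values])

    def column(term):
        j, power = term
        return [v ** power for v in z[j]]

    def prune(terms, candidates):
        # Backward elimination (assumption): drop the least significant candidate until all have p < keep_p.
        terms = list(terms)
        while True:
            pvals = ols_pvalues([column(t) for t in terms], ga)
            worst = max(((pv, t) for pv, t in zip(pvals, terms) if t in candidates), default=None)
            if worst is None or worst[0] < keep_p:
                return terms
            terms.remove(worst[1])

    # Step 1: univariate screen at p < screen_p.
    linear = [(j, 1) for j in range(len(z)) if ols_pvalues([z[j]], ga)[0] < screen_p]
    # Step 2: multivariable model of the screened metabolites; keep terms with p < keep_p.
    terms = prune(linear, set(linear))
    # Step 3: add squared terms for retained metabolites, removing nonsignificant ones.
    squares = [(j, 2) for j, _ in terms]
    terms = prune(terms + squares, set(squares))
    # Step 4: cubed terms only for metabolites whose squared term was significant.
    cubes = [(j, 3) for j, power in terms if power == 2]
    return prune(terms + cubes, set(cubes))


# Synthetic demonstration data (not study data): three hypothetical metabolites.
random.seed(1)
x = [[random.gauss(0, 1) for _ in range(500)] for _ in range(3)]
ga = [38 + 1.5 * a + 0.7 * b + 0.5 * b * b + random.gauss(0, 1) for a, b in zip(x[0], x[1])]
print(select_terms(x, ga))
```

In this synthetic example the first metabolite enters linearly and the second enters with linear and squared terms, so those terms should survive selection; the third metabolite is pure noise. A production analysis would normally use a statistics package's OLS routine with exact t-based p-values instead of the hand-rolled solver shown here.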
Three models (birthweight only, metabolites only, and metabolites plus birthweight) were used to predict gestational age in the Ugandan model-testing dataset (n = 150), and predicted gestational age was compared with the best-known gestational age for each individual. We examined performance among all preterm newborns (<37 weeks). Prevalence estimates were also calculated for 2-week intervals of gestational age to determine whether metabolites improved the prediction of gestational age above and beyond birth weight for earlier preterm births, which carry the highest risk of neonatal morbidity and mortality. All analyses were performed in SAS version 9.4 (SAS Institute Inc, Cary, NC) or Stata/SE version 12.1 (StataCorp LP, College Station, TX).
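One way to tabulate predicted and actual preterm prevalence, along with the distribution of infants across 2-week gestational age intervals, is sketched below. The helper names, the 24-week starting point for binning, the cut-offs, and the ten-infant example are assumptions for illustration, not study data. Note that applying a threshold as `<` rather than `≤` changes which infants at the boundary are counted.

```python
import math
from collections import Counter


def prevalence_below(ga_weeks, thresholds):
    """Percentage of infants whose gestational age (weeks) is strictly below each threshold."""
    n = len(ga_weeks)
    return {t: 100.0 * sum(g < t for g in ga_weeks) / n for t in thresholds}


def two_week_bins(ga_weeks, start=24):
    """Count infants in 2-week intervals [start, start+2), [start+2, start+4), ... keyed by lower edge."""
    return Counter(start + 2 * ((math.floor(g) - start) // 2) for g in ga_weeks)


# Hypothetical actual and model-predicted gestational ages (weeks) for ten infants.
actual = [30.0, 33.5, 36.9, 38.0, 39.2, 40.1, 41.0, 39.0, 38.5, 40.0]
predicted = [31.2, 35.0, 36.0, 36.5, 38.8, 39.5, 40.2, 37.4, 38.0, 39.1]
thresholds = (34, 37)  # illustrative cut-offs: <34 and <37 weeks
print(prevalence_below(actual, thresholds))
print(prevalence_below(predicted, thresholds))
print(sorted(two_week_bins(predicted).items()))
```

Comparing the two prevalence dictionaries threshold by threshold shows where a model over- or underestimates preterm birth; in this made-up example the predictions undercount births <34 weeks but overcount births <37 weeks.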

General characteristics of the Iowa study population and the newborn metabolic model
In the Iowa dataset, there were 938 individuals. Approximately 49% of these individuals were born preterm (<37 weeks). Gestational age of the infants ranged from 24 weeks to 41 weeks. The mean birthweight was 2643.58 grams. Within the model-building dataset, the full model included 17 metabolites (11 acylcarnitines and 6 amino acids), 12 squared metabolite terms, and 6 cubed metabolite terms (Table 1).

Model performance in the Iowa model-building dataset
The model including only metabolites explained 73.3% of the variation in gestational age in the Iowa model-building dataset. The average difference between gestational ages predicted by the model and the actual gestational age was 2.3 weeks (Table 2). The metabolite model accurately differentiated between infants born preterm (<37 weeks) and those born at term (≥37 weeks; AUC = 94.8%). In the model-building dataset, birthweight alone explained 81.8% of the variation in gestational age, and the average difference between the actual and predicted gestational ages was 1.9 weeks (Table 2). The model using birthweight only was also able to distinguish between preterm and term infants (AUC = 96.7%). Adding birthweight to the metabolite model reduced the average difference between predicted and actual gestational age to 1.55 weeks and explained 87.8% of the variation in gestational age (Table 2); the metabolite measurements thus explained an additional 6% of the variation in gestational age above and beyond birthweight alone. The model including both metabolites and birthweight differentiated preterm from term infants better (AUC = 98.2%) than either the metabolite-only or the birthweight-only model (Figure 1a). Gestational age was predicted within 1 week for 66.2% of individuals and within 2 weeks for 88.0% of individuals in the Iowa model-building dataset.
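The performance measures reported here (adjusted R², RMSE, AUC, and the share of infants predicted within a given number of weeks) can be computed as in the minimal sketch below. It assumes that the preterm-versus-term AUC is obtained by ranking infants on their predicted gestational age and uses the pairwise (Mann-Whitney) form of the AUC; the function names and the six-infant example are hypothetical and are not study data.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted gestational age (weeks)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def adjusted_r2(actual, predicted, n_predictors):
    """Adjusted coefficient of determination for a model with n_predictors terms (excluding the intercept)."""
    n = len(actual)
    mean = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def preterm_auc(actual, predicted, cutoff=37.0):
    """AUC for preterm (actual < cutoff) vs term infants, ranking by predicted gestational age.

    Pairwise (Mann-Whitney) form: the probability that a randomly chosen preterm infant has a
    lower predicted gestational age than a randomly chosen term infant; ties count 1/2.
    """
    pre = [p for a, p in zip(actual, predicted) if a < cutoff]
    term = [p for a, p in zip(actual, predicted) if a >= cutoff]
    score = 0.0
    for p in pre:
        for q in term:
            score += 1.0 if p < q else 0.5 if p == q else 0.0
    return score / (len(pre) * len(term))


def share_within(actual, predicted, weeks):
    """Fraction of infants whose predicted gestational age is within +/- weeks of the actual value."""
    return sum(abs(a - p) <= weeks for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical gestational ages (weeks) for six infants.
actual = [30, 34, 36, 38, 39, 40]
predicted = [31, 33, 37, 38, 38, 41]
print(rmse(actual, predicted), adjusted_r2(actual, predicted, 1))
print(preterm_auc(actual, predicted), share_within(actual, predicted, 2))
```

The pairwise AUC is quadratic in sample size, which is fine for cohorts of a few hundred infants; for larger datasets a rank-based computation or a library routine would be preferable.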

Model performance in the Ugandan model-testing dataset
Eleven (7.33%) of the 150 infants in the Ugandan model-testing dataset were born preterm. Gestational ages in the Ugandan population ranged from 28 to 43 weeks, and the infants in the model-testing dataset had a mean birthweight of 2991.4 grams. Using the model including metabolites and birthweight, 76.7% of the cohort had predicted gestational ages within 2 weeks of their actual gestational age, compared with 61.3% for the model including only metabolites and 46.0% for the model including only birthweight.
When examining the prevalence of preterm birth in the Ugandan population (Table 3), the model containing both metabolites and birthweight comes considerably closer to the true population prevalence of preterm birth than the models containing only birthweight or only metabolites. The combined model overestimates the prevalence of preterm birth <37 weeks by 191.1%, much less than the metabolite-only model (overestimate of 463.8%) or the birthweight-only model (overestimate of 664.0%). Furthermore, the combined model estimates the prevalence of preterm birth ≤34 weeks closely (4.67% predicted versus an actual prevalence of 4.00%) (Table 3), whereas the metabolite-only model overestimates this prevalence by 208.3% and the birthweight-only model by 283.3%.
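The overestimation percentages appear to be expressed relative to the actual prevalence, that is, 100 × (predicted − actual) / actual. This reading is an assumption, but combining the birth-weight-only prevalence of 56% given in the Discussion with the actual prevalence of 7.33% reproduces the 664.0% figure for preterm birth <37 weeks. The helper name below is hypothetical.

```python
def relative_overestimate(predicted_pct, actual_pct):
    """Percent by which a predicted prevalence exceeds the actual prevalence.

    Assumes the definition 100 * (predicted - actual) / actual.
    """
    if actual_pct <= 0:
        raise ValueError("actual prevalence must be positive")
    return 100.0 * (predicted_pct - actual_pct) / actual_pct


# Birth-weight-only model, preterm birth <37 weeks: 56% predicted vs 7.33% actual.
print(round(relative_overestimate(56.0, 7.33), 1))  # 664.0
# Metabolites plus birth weight, preterm birth <=34 weeks: 4.67% predicted vs 4.00% actual.
print(round(relative_overestimate(4.67, 4.00)))  # 17
```

Under this definition the combined model's estimate for preterm birth ≤34 weeks is about 17% above the actual prevalence, which is small compared with the overestimates of the single-component models.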

Discussion
In this study, we demonstrate that the newborn metabolic profile measured in cord blood, in combination with birth weight, is comparable in accuracy for estimating gestational age to methods that use metabolites collected by heel stick for routine newborn screening. The cord blood model including only metabolites explained 73.3% of the variation in gestational age in the Iowa model-building dataset and accurately differentiated between infants born preterm (<37 weeks) and those born at term (≥37 weeks; AUC = 94.8%). This is comparable to our previously published work examining metabolite measurements from heel stick samples collected 24-72 hours after birth as part of the newborn screening program in Iowa, as validated in heel stick samples from Busia, Uganda [18,24]. Although newborn screening is expanding, it is still not offered in all parts of the world. Furthermore, some programs around the world find cord blood easier to collect, as it can be drawn at the time of delivery rather than 24-48 hours after birth, when many women will have already returned home. Although newborn screening is optimally performed 1-2 days after birth, our findings show that models using cord blood metabolites are a feasible method for gestational age estimation.
Our cord blood model was validated in a preliminary cohort of infants from Busia, Uganda. The model including metabolites and birthweight predicted gestational age within 2 weeks of the actual gestational age for 76.7% of the cohort. Furthermore, the model differentiated preterm (<37 weeks) infants from those born at term (≥37 weeks) better than a model containing only birth weight (AUC = 85.1% vs. 83.6%). When metabolites were added to birth weight, the prevalence of preterm birth (<37 weeks) was still overestimated (21% versus 7%), although to a lesser extent than with the birth weight only model (56%). This is similar to what is seen with the Ballard score, a common method of postnatal gestational age estimation: a study of 688 singleton pregnancies from rural Papua New Guinea found that the Ballard score estimated preterm birth prevalence at 8.2 to 21.3%, compared with an actual prevalence of 5.2% [14]. Notably, our metabolite and birth weight model was much more effective than birth weight alone in correctly estimating the prevalence of preterm birth <34 weeks, and this may represent an improvement over what is traditionally seen with the Ballard score.
While it is unlikely that newborn screening will soon be comprehensively implemented in every part of the world, we have demonstrated that obtaining cord blood samples for targeted surveillance of preterm birth is feasible. The cost of newborn screening varies but remains reasonable compared with other high-throughput technologies such as genome sequencing. In our model-building dataset of 942 cord blood samples from Iowa, gestational age was not determined by ultrasound for 20% of the cohort, which limited our ability to accurately determine their gestational age. While this is a limitation, it is unlikely to have biased our results, and future studies using gestational age dated by first-trimester ultrasound will likely see similar if not stronger findings than we report here. Our Ugandan validation cohort was a proof-of-concept convenience sample and as such was limited to the first 150 births, with only 11 preterm births. Future studies examining larger validation cohorts are needed.
This study was strengthened by the ability to validate the predictive models in an independent population. Furthermore, while the population used to create these models was predominantly white and generally had good access to resources and healthcare, the population used for validation differed markedly [24]. The performance of the model in such a population provides further support for its potential utility. Additionally, although the samples used for model building were older, and metabolite levels may therefore have deteriorated during storage, the model still validated adequately in an independent cohort whose samples were collected, frozen, and measured within a short time frame to preserve sample integrity. Nonetheless, future work incorporating population-specific and storage-time information could lead to even better predictive algorithms.
Accurate estimation of gestational age is critical for surveillance work and epidemiologic research. Models that include cord blood metabolites and birth weight appear to offer improvement in gestational age estimation over birth weight alone. The newborn metabolic profile derived from cord blood, in combination with birth weight, is an accurate method for estimating gestational age and preterm birth prevalence. Future studies building population-specific models of gestational age are needed to further optimize model performance.