Development and Evaluation of an Automated Algorithm to Estimate the Nutrient Intake of Infants from an Electronic Complementary Food Frequency Questionnaire

Background: We previously validated a four-day complementary food frequency questionnaire (CFFQ) to estimate nutrient intake in New Zealand infants aged 9-12 months. However, manual entry of the CFFQ data into nutritional analysis software was time-consuming. Therefore, we developed an automated algorithm and evaluated its accuracy by comparing the nutrient estimates with those obtained from the nutritional analysis software. Methods: We analysed 50 CFFQ completed at 9 and 12 months using Food Works nutritional analysis software. The automated algorithm was programmed in SAS by multiplying the average daily consumption of each food item by the nutrient content of the portion size. We considered the most common brands for commercially prepared baby foods. Intakes of energy, macronutrients, and micronutrients were compared between methods using Bland-Altman analysis. Results: The automated algorithm did not have any significant bias for estimates of energy (kJ) (MD 15, 95% CI -27, 58), carbohydrate (g) (MD -0.1, 95% CI -1.2, 1.0), and fat (g) (MD -0.1, 95% CI -0.3, 0.1), but slightly underestimated intake of protein (MD -0.4 g, 95% CI -0.7, -0.1), saturated fat, PUFA, dietary fibre, and niacin. The algorithm provided accurate estimates for other micronutrients. The limits of agreement were relatively narrow. Conclusion: This automated algorithm is an efficient tool to estimate nutrient intakes from the CFFQ accurately. The small negative bias observed for a few nutrients was clinically insignificant and can be minimised. This algorithm is suitable for use in large clinical trials and cohort studies without the need for proprietary software.


INTRODUCTION
Nutrition in the first year can have a profound impact on later metabolic health. Both under-and overnutrition may increase the risk of obesity in childhood, and insulin resistance, impaired glucose tolerance, hypertension, and dyslipidaemia in adulthood [1][2][3][4][5]. Hence, assessing nutritional intake in infancy is important in understanding the influence of early life on long-term health and planning intervention strategies to prevent these consequences.
Assessing dietary intake in infants can be challenging due to the different types of milk consumed, the transition from milk feeding to complementary feeding, and wide variations in feeding patterns. Further, it is difficult to estimate the volume of breastmilk consumed during breastfeeding and the portion sizes of solid foods, particularly as infants often consume only a proportion of the food that is offered. There are several different methods for assessing dietary intake, including the four-day weighed food diary, 24-hour food recall, 7-day food record, and food frequency questionnaires (FFQ) [6]. There is an ongoing debate about which tool is most predictive of absolute intake, but FFQ are popular in large clinical studies because they are relatively inexpensive to administer and have a low burden on respondents. They usually consist of questions about the frequency of consumption of pre-listed food items over the past few weeks, supplemented by a description of additional items and commercial food brands. It is important that FFQ are current, specific to the population of interest, and assessed for reproducibility and validity [7].
*Address correspondence to this author at the Liggins Institute, University of Auckland, Private Bag 92019, Victoria Street West, Auckland 1142, New Zealand; Tel: +64 274725099; E-mail: c.mckinlay@auckland.ac.nz
Globally, few validated FFQ assess dietary intake in infants aged less than 12 months [8][9][10]. We have previously developed a four-day Complementary Food Frequency Questionnaire (CFFQ) for infants and compared this to a four-day weighed food diary. The CFFQ had acceptable relative validity and good reproducibility for assessing nutrient intake in New Zealand infants aged 9 to 12 months [11]. Nutrient intakes from the CFFQ were estimated by multiplying the daily frequency of consumption of each food by the nutrient content of the portion size and adding these products across all food items. Because this electronic CFFQ is quick and easy for parents to complete and is available online with photos of portion-size measures, it is currently being used in several large clinical trials [12,13].
FFQ are usually analysed by entering the frequency and portion size for all recorded food items into nutritional analysis software, which automatically generates the nutrient estimates using food composition databases. The resulting data can be exported to a spreadsheet or statistical analysis software. Although nutritional analysis software is convenient, the entry of data from FFQ is a complex process. It requires manually coding the food items in the nutrition analysis software, converting the frequency of consumption of food items into a daily average, and entering the reported quantity of food items in prespecified measurement units. Manual data coding and entry is time-consuming, and there is potential to introduce transcribing errors between the data reported and recorded. Despite quality assurance protocols followed by the research teams, such discrepancies are evident in large studies and can have a major impact on the quality of data [14].
Automated dietary analysis can reduce time and costs, and while it may also reduce data entry errors, accounting for individually processed foods and nonstandard items is more difficult. Therefore, we sought to develop an automated algorithm to assess nutrient intakes from the CFFQ output data. Further, to assess the accuracy of the nutrient estimates, we compared them to those obtained using nutritional analysis software.

METHODS
Study Population
CFFQ data were obtained from participants in the BabyGEMS Study, a prospective cohort study of a subgroup of infants born to women enrolled in the ongoing Gestational Diabetes Mellitus Study of Detection Thresholds (GEMS) Trial (ACTRN12615000290594). Women with a singleton pregnancy and without a previous history of diabetes were eligible for the GEMS Trial if they planned to give birth within the Auckland and Counties Manukau District Health Boards, Auckland, New Zealand. For the BabyGEMS Study, infant follow-up included assessment of nutrient intake at 9 and 12 months using a previously validated four-day complementary food frequency questionnaire (CFFQ) [11]. Questionnaires were completed on electronic case report forms (eCRF) using the REDCap system and checked for completeness by a member of the research team. The eCRFs contained appropriate range and logic checks to identify data entry errors. Participants were contacted for any missing information.

CFFQ
The semi-quantitative CFFQ was developed based on data from the Growing Up in New Zealand Study and food and nutrition guidelines for healthy infants and toddlers in New Zealand [15]. It has been designed to be completed as a web-based survey using the REDCap database system [16]. Its use at 9 to 12 months of age to estimate nutrient intake has been validated against a four-day weighed food diary [11]. It provided valid estimates of nutrient intakes for 14 out of 19 nutrients and demonstrated good reproducibility for all nutrients [11].
The CFFQ comprises 65 food items, classified in seven food groups: cereal/carbohydrate (11 items), dairy products (5 items), homemade prepared protein food (10 items), fresh or home-cooked vegetables (18 items), snacks (9 items), various types of milk and fluids (8 items), and commonly used commercially prepared baby foods (4 items). These food items were assigned portion sizes and frequencies according to a list of relevant foods that was generated and prioritised based on the contextual and cultural appropriateness of each item to reflect the dietary intake of infants. For commercial foods, caregivers are asked to record their most-used brand. Additional foods not included in the listed items can be described by free text.
Caregivers are instructed to record all infant food intake over the previous four consecutive days. At the beginning of the questionnaire, pictures of measures and information on portion sizes and amount conversions are given to help caregivers report the infant's food intake accurately.

Food Works Analysis
The data from the CFFQ were analysed for nutrient content in Food Works (Xyris Software, 2018), utilising the inbuilt food databases: New Zealand FOODfiles 2016, Nestle Baby Products 2018, Nutricia 2015, and Nutricia Early Life Nutrition 2014. For food items not available in these databases, the nutrient information was entered from the nutrition information panel of the product.
Breastmilk intake was determined from the number of feeds that the baby received each day and the typical breastfeeding duration. The breastfeed volume was estimated as 10 ml/min for feeds shorter than 10 min and 100 ml for feeds longer than 10 min, and the nutrient content of breastmilk was estimated using published reference data [17,18]. For other milk types and fluids, the intake was measured by the number of feeds received each day and the typically reported volume (ml) of bottle feeds and each drink, respectively. Assumptions for the amount of formula milk powder in grams were made using the standard dilution recommendations for each brand.
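The breastfeed volume rule above can be written as a short function. This is a minimal sketch in Python rather than the SAS code used in the study, and the function names are hypothetical. The text does not state how a feed of exactly 10 min was handled, but both branches of the rule give 100 ml at that point, so a simple cap is assumed here.

```python
def breastfeed_volume_ml(duration_min: float) -> float:
    """Estimated volume (ml) of a single breastfeed from its duration in minutes."""
    if duration_min < 0:
        raise ValueError("duration_min must be non-negative")
    # 10 ml per minute for short feeds, capped at 100 ml for feeds over 10 min.
    return min(10.0 * duration_min, 100.0)


def daily_breastmilk_ml(feeds_per_day: float, typical_duration_min: float) -> float:
    """Estimated daily breastmilk intake (ml): feeds per day x volume per feed."""
    return feeds_per_day * breastfeed_volume_ml(typical_duration_min)
```

For example, under these assumptions an infant with four feeds per day of about 8 min each would be assigned 4 x 80 ml = 320 ml per day.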
The intake of solid foods was determined by the reported frequency of consumption in the last four days, along with estimated portion sizes. For cooked items, portion sizes were estimated as cups or tablespoons and for biscuits, bread, fruits, and snacks, as pieces. The food items in the CFFQ were matched to the food names in the New Zealand food composition tables to estimate the intake in grams (from the reported portion sizes) and the corresponding nutrient values.

Automated Analysis
An automated algorithm was written using Base SAS ® 9.4 Software (SAS Institute Inc., Cary, NC, USA) by multiplying the average per day consumption of each food by the nutrient content of the portion size and adding these products across all food items. Additional recorded items not included in the CFFQ food list were excluded from the automated algorithm. For commercially prepared baby foods, a single representative brand was chosen a priori for this validation study. Two authors (KM and CM) checked the code in an iterative manner, comparing the Food Works and automated algorithm output to eliminate any coding errors.
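The core calculation can be illustrated as follows. This is a sketch in Python, not the authors' SAS code. The food item names and nutrient values are hypothetical placeholders rather than food composition data, and dividing the four-day count by four to obtain the daily average is an assumption about how the frequency was converted.

```python
from typing import Dict

# Hypothetical nutrient content per standard portion (illustrative values only).
NUTRIENTS_PER_PORTION: Dict[str, Dict[str, float]] = {
    "porridge_half_cup": {"energy_kj": 300.0, "protein_g": 2.5},
    "banana_piece": {"energy_kj": 200.0, "protein_g": 0.5},
}

RECALL_DAYS = 4  # the CFFQ covers the previous four consecutive days


def daily_nutrient_intake(
    portions_over_recall: Dict[str, float],
    table: Dict[str, Dict[str, float]] = NUTRIENTS_PER_PORTION,
    recall_days: int = RECALL_DAYS,
) -> Dict[str, float]:
    """Sum (average portions per day x nutrient content per portion) over listed foods."""
    totals: Dict[str, float] = {}
    for item, n_portions in portions_over_recall.items():
        if item not in table:
            # Items outside the CFFQ food list (free text) are excluded.
            continue
        per_day = n_portions / recall_days
        for nutrient, amount in table[item].items():
            totals[nutrient] = totals.get(nutrient, 0.0) + per_day * amount
    return totals
```

With these placeholder values, eight portions of porridge and two pieces of banana over four days give 700 kJ and 5.25 g of protein per day, and an unlisted item such as potato sticks contributes nothing.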

Statistical Analysis
Statistical analysis was performed in JMP 14.2.0 (SAS Institute Inc., Cary, NC, USA). Categorical data are presented as number and percentage, and continuous data as mean and standard deviation. The Bland-Altman method was used to compare the automated algorithm and Food Works analysis, with agreement presented as bias, determined as mean difference (MD) with 95% confidence interval (CI), and limits of agreement (LOA), calculated as the bias ± 1.96 standard deviations of the between-method differences [19]. If the bias is significant (confidence limits exclude zero), then the methods are deemed to differ systematically, on average. The LOA describe the range within which 95% of the differences between methods are expected to fall for any given paired measurement. To assess the accuracy of the automated algorithm for identifying infants with adequate nutritional intake, the proportion of children meeting Recommended Dietary Allowances (RDA) for protein, zinc, and iron and Estimated Energy Requirements (EER) were compared between methods using the kappa coefficient. Assessment of individual adequacy of intake for other nutrients was not possible as only Adequate Intake (AI) values, representing population mean values, were available under 12 months [20,21].
There is no widely accepted method for calculating sample size in method-comparison studies, but it is generally accepted that at least 50 paired measurements are needed to assess the level of agreement [19,22]. Therefore, we aimed to analyse 50 questionnaires using both methods.
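For readers who wish to reproduce the agreement statistics outside JMP, the bias and limits of agreement can be computed as sketched below. This is an illustrative Python function, not the analysis code used in the study. Differences are taken as automated algorithm minus Food Works, so a negative bias indicates underestimation by the algorithm. The confidence interval uses a normal approximation, which is an assumption; a t-based interval would be slightly wider for 50 pairs.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev
from typing import Dict, Sequence, Tuple, Union

Result = Dict[str, Union[float, Tuple[float, float]]]


def bland_altman(algorithm: Sequence[float], reference: Sequence[float],
                 level: float = 0.95) -> Result:
    """Bias (mean difference), its CI, and limits of agreement for paired estimates."""
    if len(algorithm) != len(reference) or len(algorithm) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    diffs = [a - r for a, r in zip(algorithm, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)                          # sample SD of the differences
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    se = sd / sqrt(len(diffs))                 # standard error of the bias
    return {
        "bias": bias,
        "ci": (bias - z * se, bias + z * se),    # normal approximation (assumption)
        "loa": (bias - z * sd, bias + z * sd),   # limits of agreement
    }
```

The bias is deemed significant when the returned confidence interval excludes zero.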

RESULTS
Of the 34 participants, 16 completed both the 9- and 12-month CFFQ, 9 completed only the 9-month questionnaire, and 9 completed only the 12-month questionnaire. The mean maternal BMI was 30.3 ± 9.8 kg/m², and 38% of mothers were Asian, 24% were European, 23% were Pacific, and 15% were Māori (Table 1). All infants were born at term, and 59% were male. Most women initiated breastfeeding, but only 32% of infants had continued breastfeeding at ≥9 months, and 88% had ever received formula milk. Solids were commenced at a mean (SD) age of 5.1 (0.7) months. The mean (SD) weight of infants was 8.9 (0.9) kg at 9 months and 10.0 (1.0) kg at 12 months.
There was no significant bias in values obtained by the automated algorithm compared with the Food Works analysis for energy intake (Table 2, Figure 1A), nor for carbohydrate or fat intake (Figures 1C and 1D). The automated algorithm slightly underestimated protein intake (MD -0.4 g, 95% CI -0.7, -0.1, p=0.02) (Figure 1B). This difference was due to the varying nutrient content of commercially prepared baby food brands. For energy and all macronutrients, the limits of agreement were within 5% to 10% of mean values.
There was a small negative bias with the automated algorithm in the estimation of saturated fat intake (MD -0.1 g, 95% CI -0.2, 0.0, p=0.004), PUFA (MD -0.1 g, 95% CI -0.1, 0.0, p=0.01), and dietary fibre (MD -0.3 g, 95% CI -0.5, -0.1, p=0.03), but no bias was observed for MUFA. There was no bias in the estimates for the intake of fat- and water-soluble vitamins, except for vitamin K, where there was a small positive bias (MD 0.5 g, 95% CI 0.3, 0.7, p<0.001), and niacin, where there was a small negative bias (MD -0.5 g, 95% CI -0.7, -0.4, p<0.001). There was no bias in estimates of mineral and trace element intake. The limits of agreement for vitamins and minerals were within 3% to 18% of the mean values.
The number of infants meeting the EER for energy and RDA for protein, iron, and zinc was 29 (58%), 46 (92%), 14 (28%), and 29 (58%), respectively. There was complete concordance for these outcomes between the automated algorithm and the Food Works analysis (all kappa coefficients =1).
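The kappa coefficient for a binary adequacy outcome (requirement met or not met under each method) can be computed as in the following sketch. The function name and inputs are hypothetical and the code is written in Python for illustration only. A kappa of 1 arises when the two methods classify every questionnaire identically.

```python
from typing import Sequence


def cohen_kappa(method_a: Sequence[bool], method_b: Sequence[bool]) -> float:
    """Cohen's kappa for two binary classifications of the same questionnaires."""
    if len(method_a) != len(method_b) or not method_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(method_a)
    # Observed agreement: proportion of pairs classified the same way.
    observed = sum(a == b for a, b in zip(method_a, method_b)) / n
    # Agreement expected by chance from each method's marginal proportions.
    p_a = sum(method_a) / n
    p_b = sum(method_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)
```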

DISCUSSION
We developed an automated algorithm to estimate nutrient intake from a previously validated CFFQ for infants aged 9 to 12 months and compared its estimates with those obtained from nutritional analysis software. We found that the automated algorithm provided accurate estimates, compared with nutritional analysis software, for the intake of energy and most macronutrients and micronutrients (no bias and narrow limits of agreement), except for a slight negative bias in estimating protein (0.4 g), saturated fat (0.1 g), PUFA (0.1 g), dietary fibre (0.3 g) and niacin (0.5 g). These small differences are unlikely to be clinically significant and did not affect the assessment of dietary protein adequacy. There are two possible reasons for the small negative bias associated with the automated algorithm for several nutrients. First, underestimation of intake for some nutrients in the automated algorithm, such as saturated fat, PUFA, dietary fibre, and niacin, may have been due to the exclusion of non-standard items that were recorded by 6% of participants as free text, e.g., Frooze Balls, Milo drinks, and potato sticks. These additional non-standard items were coded in the Food Works analysis but could not be included in the automated analysis. Second, we could not account for the different brands of baby foods in the automated algorithm as these were recorded as free text in the CFFQ. Instead, we used the nutrient values of a single brand (Heinz Wattie's Ltd., Hastings, New Zealand) that is commonly consumed and widely available in New Zealand supermarkets. However, some participants reported using other baby food products, such as Only Organic and Heinz Organic, which have a higher content of protein and dietary fibre compared to similar items produced by Wattie's.
Accounting for commercially prepared baby foods in the nutritional analysis is challenging as they form a major part of the diet for many infants [23], and there is a wide range of products available.
Despite these limitations, the automated algorithm produced acceptable estimates of nutrient intake of infants aged 9 to 12 months from the CFFQ while dramatically improving efficiency. The SAS computational time for analysis of 50 CFFQ was less than 2 minutes, whereas manual entry of the CFFQ data into Food Works took approximately 29 hours (35 minutes per questionnaire). Therefore, this tool offers researchers substantial efficiency gains and makes it feasible to use the CFFQ in large clinical studies. Furthermore, with automation, the potential for transcription errors with manual data entry and coding is removed, which may be particularly important in large clinical studies involving multiple personnel.
There have been several previous reports of automated analysis of FFQ in children and adults [24][25][26][27][28]. These studies used a dedicated computer application program to analyse food intake directly from an electronic FFQ and generated nutrient output using the food composition tables specific to the population. However, to our knowledge, no study has reported the use of an electronic FFQ and automated nutritional analysis specifically for infants. Our approach of using SAS code combined with an online validated CFFQ provides an efficient, flexible, cost-effective solution for automated generation of nutrient intake estimates in infants without the need for proprietary software.
Although our automated algorithm provided acceptable nutrient estimates compared with manual Food Works analysis, several improvements are possible. For future analysis in large cohort studies or clinical trials, bias may be minimised by substituting the median value of the commonly consumed non-standard items and by using a weighted average of the nutrient composition of all commercially prepared baby foods. Additionally, including a further question in the CFFQ with a drop-down list of common commercial baby food brands would improve the nutrient estimates.
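The weighted-average approach suggested above could take the following form. This is a sketch only, written in Python. The brand names, nutrient values, and weights are hypothetical, and weighting each brand by how often it is reported is an assumption, since the basis for the weights is not specified here.

```python
from typing import Dict


def weighted_average_composition(
    brand_nutrients: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Weighted mean nutrient content per portion across brands of one product type."""
    total_weight = sum(weights[brand] for brand in brand_nutrients)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    averaged: Dict[str, float] = {}
    for brand, nutrients in brand_nutrients.items():
        share = weights[brand] / total_weight  # e.g. share of caregivers reporting the brand
        for nutrient, amount in nutrients.items():
            averaged[nutrient] = averaged.get(nutrient, 0.0) + share * amount
    return averaged
```

For instance, if a hypothetical brand A (2.0 g protein per portion) is reported three times as often as brand B (4.0 g), the weighted value used by the algorithm would be 2.5 g per portion.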

CONCLUSION
Our method-comparison study has shown that the automated algorithm is an efficient tool to estimate nutrient intakes from the CFFQ. Compared to the traditional method of analysis, it provided accurate estimates for the intake of energy and most nutrients. Researchers can incorporate median values for, or explicitly list, the non-standard food items and commercially prepared baby foods to minimise the negative bias associated with a few nutrients. The systematic implementation of this algorithm will decrease the burden of manual entry, lead to faster statistical analysis, and should improve the accuracy and quality of nutritional data in large clinical trials or cohort studies.

DATA SHARING
Copies of the CFFQ SAS code and REDCap data file can be requested from the corresponding author.