Association of CSF Biomarkers With Hippocampal-Dependent Memory in Preclinical Alzheimer Disease

Objective To determine whether memory tasks with demonstrated sensitivity to hippocampal function can detect variance related to preclinical Alzheimer disease (AD) biomarkers, we examined associations between performance in 3 memory tasks and CSF β-amyloid (Aβ)42/Aβ40 and phospho-tau181 (p-tau181) in cognitively unimpaired older adults (CU).
Methods CU enrolled in the Stanford Aging and Memory Study (n = 153; age 68.78 ± 5.81 years; 94 female) completed a lumbar puncture and memory assessments. CSF Aβ42, Aβ40, and p-tau181 were measured with the automated Lumipulse G system in a single-batch analysis. Episodic memory was assayed using a standardized delayed recall composite, paired associate (word–picture) cued recall, and a mnemonic discrimination task that involves discrimination between studied “target” objects, novel “foil” objects, and perceptually similar “lure” objects. Analyses examined cross-sectional relationships among memory performance, age, and CSF measures, controlling for sex and education.
Results Age and lower Aβ42/Aβ40 were independently associated with elevated p-tau181. Age, Aβ42/Aβ40, and p-tau181 were each associated with (1) poorer associative memory and (2) diminished improvement in mnemonic discrimination performance across levels of decreased task difficulty (i.e., target–lure similarity). P-tau181 mediated the effect of Aβ42/Aβ40 on memory. Relationships between CSF proteins and delayed recall were similar but nonsignificant. CSF Aβ42 was not significantly associated with p-tau181 or memory.
Conclusions Tests designed to tax hippocampal function are sensitive to subtle individual differences in memory among CU and correlate with early AD-associated biomarker changes in CSF. These tests may offer utility for identifying CU with preclinical AD pathology.

Identifying cognitively unimpaired older adults (CU) who harbor Alzheimer disease (AD) pathology is critical for developing disease-modifying treatments, which may be most effective during the asymptomatic (preclinical) stage of the disease. 1 Decreases in CSF β-amyloid (Aβ42) and increases in phospho-tau181 (p-tau181) may be the earliest detectable changes in the AD pathophysiologic cascade. 2,3 However, detecting cross-sectional associations between CSF and cognition using traditional standardized cognitive tests has posed a challenge. 4-8 Tests designed to tax core functions of the hippocampus and entorhinal cortex, areas of the medial temporal lobe (MTL) affected early on by tangle pathology, 9,10 may be sensitive to subtle variations in memory that are associated with biomarker abnormalities, particularly elevations in CSF p-tau181, which is known to associate with tangle pathology. 11 Associative memory (figure 1A) and mnemonic discrimination of studied target stimuli and perceptually similar lure stimuli 12 (figure 1B), tasks in which performance is tightly linked with hippocampal function, 13-15 show initial promise. 16-19 However, the ability of these tasks to detect CSF biomarker abnormalities in CU remains unclear.
This study leverages critical developments in CSF protein analysis, including fully automated 20,21 measurement of p-tau181, Aβ42, and Aβ40 22-25 to quantify preclinical AD burden in CU. We examine associations between CSF Aβ42/Aβ40, p-tau181, and memory performance on standardized memory tests and specialized hippocampal-dependent tests: associative memory and mnemonic discrimination. In an exploratory analysis, relationships between CSF proteins and MTL tau, measured by 18F-PI-2620, 26,27 were examined. We predicted that performance on specialized hippocampal-dependent tests would be associated with CSF biomarkers of AD, particularly p-tau181.

Participants
This study includes data from 153 CU (table 1; aged 60-88 years) of an initial 212 enrolled in the Stanford Aging and Memory Study (SAMS; see figure e-1 for participant flowchart, doi:10.5061/dryad.ngf1vhhrp). SAMS is a fluid and neuroimaging biomarker study focused on neuronal and behavioral measures of the MTL. 15 SAMS eligibility included normal or corrected-to-normal vision/hearing, right-handedness, native English speaking, no history of neurologic or psychiatric disease, a Clinical Dementia Rating 28 global score of zero, and performance within the normal range on a standardized neuropsychological test battery. 15 All participants were deemed cognitively unimpaired during a clinical consensus meeting consisting of neurologists and neuropsychologists.

CSF Data
Participants underwent lumbar puncture at 8 or 9 AM following overnight fasting. CSF was stored in either 1.0 or 0.5 mL aliquots at −80°C. A single aliquot for each participant was used to measure Aβ42, Aβ40, p-tau181, and total tau using the fully automated Lumipulse G system (Fujirebio US, Inc., Malvern, PA) in a single-batch analysis using procedures previously described 29 by the Stanford Alzheimer's Disease Research Center Biomarker Core. Our primary measures of amyloid and tau were the Aβ42/Aβ40 ratio, due to greater specificity and sensitivity for detecting AD-related amyloid pathology than Aβ42 alone, 22-25 and p-tau181, due to greater specificity for AD than total tau. 30 For comparison, we also report CSF Aβ42 levels.
We primarily examined CSF Aβ42/Aβ40 and p-tau181 as continuous variables, but also examined amyloid by binary status. We used a Gaussian mixture modeling approach (R package mclust v4.1 31) to classify CU as Aβ+ or Aβ− based on CSF Aβ42/Aβ40 values (figure 2D; see figure e-2 for more detail, doi:10.5061/dryad.ngf1vhhrp). Briefly, this procedure identified a 2-distribution model as optimal, yielding an Aβ42/Aβ40 cutoff of 0.0752. Participants were classified as Aβ+ if they had a greater than 0.5 probability of belonging to the Aβ+ distribution (Aβ42/Aβ40 < 0.0752) or as Aβ− if they had a greater than 0.5 probability of belonging to the Aβ− distribution (Aβ42/Aβ40 > 0.0752). For visualization only, we also categorized participants into tau+ and tau− groups. The top quartile of CSF p-tau181 concentration (>42 pg/mL) was used to define tau+, as the distribution showed a continuum of values. This binary classification was used to plot performance as a function of combined Aβ/tau status (Aβ−/tau−, Aβ+/tau−, Aβ−/tau+, Aβ+/tau+), as described in the biomarker framework of AD. 32

Tau PET Data
An exploratory analysis of the relationship between CSF proteins and MTL tau, measured by 18F-PI-2620 PET, was conducted in 32 participants who had both CSF and tau PET data available (table 1). The data collection and image processing procedures, along with the exploratory outcomes, are reported in the supplement (method e-1; figure e-3, doi:10.5061/dryad.ngf1vhhrp).
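The amyloid classification rule described above (assign each participant to whichever of 2 fitted distributions has posterior probability greater than 0.5) can be illustrated with a minimal sketch. This is not the mclust procedure used in the study: it fixes the number of components at 2, allows unequal variances, skips the model selection that mclust performs, and runs on synthetic ratio values. The function names, starting values, and simulated data are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_two_gaussians(values, n_iter=200):
    """EM for a 2-component 1D Gaussian mixture; returns (weights, means, sds)."""
    lo, hi = min(values), max(values)
    w = [0.5, 0.5]
    mu = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]  # crude starting values
    sd = [(hi - lo) / 4, (hi - lo) / 4]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value
        resp = []
        for v in values:
            p = [w[k] * normal_pdf(v, mu[k], sd[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: update weights, means, and standard deviations
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(values)
            mu[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, values)) / nk
            sd[k] = max(math.sqrt(var), 1e-6)
    return w, mu, sd


def prob_amyloid_positive(value, w, mu, sd):
    """Posterior probability of the lower-mean (Abeta+) component."""
    low = 0 if mu[0] < mu[1] else 1
    p = [w[k] * normal_pdf(value, mu[k], sd[k]) for k in range(2)]
    return p[low] / (p[0] + p[1])


# Synthetic ratios (hypothetical, not study data): a larger Abeta- group with
# higher ratios and a smaller Abeta+ group with lower ratios.
rng = random.Random(0)
ratios = [rng.gauss(0.095, 0.008) for _ in range(110)]
ratios += [rng.gauss(0.055, 0.010) for _ in range(40)]

w, mu, sd = fit_two_gaussians(ratios)
status = ["Abeta+" if prob_amyloid_positive(r, w, mu, sd) > 0.5 else "Abeta-"
          for r in ratios]
n_pos = status.count("Abeta+")
```

The point at which the posterior probability crosses 0.5 plays the role of the ratio cutoff reported in the text; in this sketch it depends entirely on the simulated values.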

Figure 1. (A) Schematic of the associative memory paradigm, reproduced from reference 15 (creativecommons.org/licenses/by/4.0/). Participants intentionally studied word-picture pairs. At test, they viewed studied words intermixed with new words, and were asked to recall the picture associated with each word, if old. Participants responded "face" or "place" if they remembered the associated picture or picture category; "old" if they remembered the word but could not recollect the associate; "new" if they did not remember the word as studied. (B) Schematic of the mnemonic discrimination paradigm. Participants incidentally encoded objects and memory was assessed using a modified recognition memory test with perceptually similar lures, ranging from high (L1) to low similarity (L5), as well as novel (non-lure) foils. Correct responses are indicated next to each stimulus. (C) Performance on the associative memory task. Memory for studied words, irrespective of memory for the associate (old/new d′), was higher than memory for the associations (associative d′). (D, E) Mnemonic discrimination performance by target-lure similarity. Both lure/new d′ (D) and old/lure d′ (E) increased as target-lure similarity decreased. ISI = interstimulus interval.

Episodic Memory Measures

Standardized Neuropsychological Delayed Recall
The composite delayed recall score reflected delayed recall performance across (1) the logical memory subtest of the Wechsler Memory Scale, (2) the Hopkins Verbal Learning Test-Revised, and (3) the Brief Visuospatial Memory Test-Revised. Composite scores were computed by first z-scoring individual subtest scores using the full SAMS sample as reference and then averaging. Data were available from all 153 participants.
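The composite computation described above (z-score each subtest against the reference sample, then average within participant) can be sketched as follows. The raw scores are invented for 4 hypothetical participants, and the use of the sample standard deviation is an assumption; the text does not state which estimator was used.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize a list of scores against its own mean and (sample) SD."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


# Hypothetical delayed recall scores, one list position per participant.
raw = {
    "logical_memory": [12, 15, 9, 14],
    "hvlt_r": [8, 11, 7, 10],
    "bvmt_r": [6, 10, 5, 9],
}

z_by_test = {name: zscores(scores) for name, scores in raw.items()}

# Composite = mean of the 3 z-scores for each participant.
composite = [mean(zs) for zs in zip(*z_by_test.values())]
```

Because every subtest is put on the same scale before averaging, no single test dominates the composite simply because its raw score range is wider.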

Associative Memory
The associative memory task (figure 1A) was administered concurrent with fMRI as previously described. 15 Briefly, this task assessed memory for word-picture pairs comprising concrete nouns (e.g., "banana," "violin") paired with pictures of famous faces (e.g., "Queen Elizabeth," "Ronald Reagan") or well-known places (e.g., "Golden Gate Bridge," "Niagara Falls"). The task consisted of 5 alternating study and test blocks. Each study block included 12 word-face and 12 word-place pairs; participants were instructed to form a link between the word and picture presented. In each test block, participants saw a mix of 24 studied words and 6 novel (foil) words. Memory was assessed using an associative cued-recall test accompanied by a button response: participants selected "face" or "place" if they remembered the word and could recall the associated picture or picture category; they selected "old" if they remembered the word but could not recall the associated picture or picture category; they selected "new" if they did not remember studying the word.
Associative memory performance was estimated using a sensitivity index, associative d′, where hits were defined as correct associative category responses to studied words and false alarms were defined as incorrect category responses to new words. Thus, associative d′ = Z("correct associate category"|old) − Z("associate category"|new). To assess basic task comprehension and the ability to make discriminations between studied and novel words, we also calculated an old/new sensitivity index. Here, hits were defined as correct old responses to studied words, irrespective of successful memory for the associate, and false alarms were defined as any incorrect old response to novel words. Thus, old/new d′ = Z("old" + "face" + "place"|old) − Z("old" + "face" + "place"|new). Analyses included data from 128 participants (table 1; figure e-1, doi:10.5061/dryad.ngf1vhhrp).
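As an illustration of the 2 sensitivity indices defined above, the following sketch computes associative d′ and old/new d′ from response counts. The counts are hypothetical, and the log-linear correction applied to the rates (to keep Z finite when a rate is 0 or 1) is an assumption; the text does not say how extreme rates were handled.

```python
from statistics import NormalDist


def corrected_rate(count, n_trials):
    """Response rate with a log-linear correction (an assumption, see lead-in)."""
    return (count + 0.5) / (n_trials + 1)


def d_prime(hits, n_signal, false_alarms, n_noise):
    """d' = Z(hit rate) - Z(false-alarm rate), Z being the inverse normal CDF."""
    z = NormalDist().inv_cdf
    return z(corrected_rate(hits, n_signal)) - z(corrected_rate(false_alarms, n_noise))


# Trial totals follow the task description: 5 test blocks x 24 studied words
# and 5 x 6 novel words. The response counts below are hypothetical.
n_old, n_new = 5 * 24, 5 * 6

correct_category_to_old = 78   # correct "face"/"place" responses to studied words
category_to_new = 2            # "face"/"place" responses to novel words
any_old_to_old = 102           # "old" + "face" + "place" responses to studied words
any_old_to_new = 4             # "old" + "face" + "place" responses to novel words

associative_d = d_prime(correct_category_to_old, n_old, category_to_new, n_new)
old_new_d = d_prime(any_old_to_old, n_old, any_old_to_new, n_new)
```

With these counts the old/new index exceeds the associative index, the same ordering the figure 1C caption describes for the group.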

Mnemonic Discrimination
The mnemonic discrimination task (figure 1B) was administered using previously described measures and instructions 12 (task and stimuli are available at github.com/celstark/MST). During an incidental encoding phase, participants made indoor/outdoor judgments for 128 pictures of everyday objects. Participants then performed a surprise memory test, in which half of the studied objects (64) were intermixed with 64 perceptually similar lure objects and 64 novel (dissimilar) objects. Participants were to respond "old" if they remembered the object as having been studied, "similar" if they remembered the object as similar, but not identical, to a studied object, or "new" if they remembered the object as not having been studied. Trials with a biologically implausible reaction time (<400 ms; M 1.68 trials/participant; SD 3.36) and trials in which participants did not respond (M 10.50 trials/participant; SD 9.47) were excluded from analysis.
Of particular interest in this task is the ability to correctly identify lures as "similar," avoiding the tendency to label lures as "old"; this ability is thought to be hippocampal-dependent. Due to the 3-response task design, there are 2 measures of memory sensitivity (d′) that can be calculated to quantify lure discrimination ability: (1) lure/new d′, the ability to correctly classify perceptually similar lures and differentiate them from novel objects, computed as Z("similar"|lure) − Z("similar"|novel foil); and (2) old/lure d′, the ability to correctly endorse studied objects and avoid the propensity to incorrectly endorse lures as old, computed as Z("old"|target) − Z("old"|lure). Although the 2 measures are related (r = 0.53), the former may be particularly sensitive to hippocampal function. 14 The 64 lures systematically varied in perceptual similarity to the studied targets (figure 1B), ranging from level 1 (high perceptual similarity; most difficult) to 5 (low perceptual similarity; least difficult). For each of the 5 similarity levels, 13 lures were presented, except for level 1, in which 12 lures were presented. Thus, each lure discrimination measure was computed overall, as described above, as well as at each level of target-lure similarity (for example, Z["similar"|lure bin 1] − Z["similar"|novel foil]; Z["old"|target] − Z["old"|lure bin 1]). In addition to these lure discrimination measures of primary interest, we also computed old/new d′, the ability to differentiate between studied objects and novel objects, which is not selectively dependent on hippocampal function, 33 as Z("old"|target) − Z("old"|novel foil). Analyses included data from 133 participants (table 1; figure e-1, doi:10.5061/dryad.ngf1vhhrp).
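The per-bin lure discrimination measures can be sketched in the same way. The example below computes lure/new d′ and old/lure d′ at each of the 5 similarity levels for one hypothetical participant. All response counts are invented, the denominators ignore excluded trials for simplicity, and the log-linear rate correction is an assumption rather than the documented procedure.

```python
from statistics import NormalDist

Z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def corrected_rate(count, n_trials):
    """Response rate with a log-linear correction (assumed, keeps Z finite)."""
    return (count + 0.5) / (n_trials + 1)


# Lure counts per similarity bin follow the text: 12 at level 1, 13 elsewhere.
lures_per_bin = {1: 12, 2: 13, 3: 13, 4: 13, 5: 13}

# Hypothetical responses for one participant.
similar_to_lure = {1: 2, 2: 4, 3: 5, 4: 6, 5: 7}   # "similar" responses to lures
old_to_lure = {1: 9, 2: 8, 3: 7, 4: 6, 5: 4}       # "old" responses to lures
n_targets, old_to_target = 64, 54                  # "old" responses to targets
n_foils, similar_to_foil = 64, 6                   # "similar" responses to foils

z_similar_foil = Z(corrected_rate(similar_to_foil, n_foils))
z_old_target = Z(corrected_rate(old_to_target, n_targets))

# lure/new d' per bin: Z("similar"|lure bin) - Z("similar"|novel foil)
lure_new_d = {b: Z(corrected_rate(similar_to_lure[b], n)) - z_similar_foil
              for b, n in lures_per_bin.items()}

# old/lure d' per bin: Z("old"|target) - Z("old"|lure bin)
old_lure_d = {b: z_old_target - Z(corrected_rate(old_to_lure[b], n))
              for b, n in lures_per_bin.items()}
```

Only the lure term changes across bins; the foil and target terms are shared, so differences between bins reflect responses to lures alone.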

Statistical Analysis
Statistical analyses were conducted in R version 3.3.1. Multiple linear regression was used to examine the relationship between CSF proteins and memory. Prior to analysis, all continuous predictors were z scored across participants; standardized coefficients are reported. All models included age, sex, and years of education as nuisance regressors. Linear mixed-effects models were used to examine the relationship between CSF proteins and mnemonic discrimination as a function of target-lure similarity (5 levels, treated as ordinal variable and centered), with the inclusion of (1) an interaction term of CSF protein by similarity, (2) interaction terms for age, sex, and education by similarity, and (3) a random intercept and slope for each participant. To visualize interactions, we extracted the slope across lure bins for each participant and plotted against CSF proteins.
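The visualization step at the end of this paragraph (a slope across lure bins for each participant) can be approximated with a simple per-participant least-squares fit. This is a simplification and an assumption: the study may instead have taken slopes from the fitted mixed-effects model. The participant labels and d′ values are hypothetical.

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# Similarity level treated as an ordinal predictor and centered, as in the text.
bins = [1, 2, 3, 4, 5]
centered_bins = [b - mean(bins) for b in bins]

# Hypothetical lure/new d' values at bins 1-5 for 2 participants.
d_by_bin = {
    "sub-01": [0.2, 0.5, 0.9, 1.2, 1.5],   # improves steadily as lures get easier
    "sub-02": [0.3, 0.3, 0.4, 0.5, 0.5],   # little improvement across bins
}

slopes = {pid: ols_slope(centered_bins, d) for pid, d in d_by_bin.items()}
```

A flatter slope, as for the second hypothetical participant, corresponds to the diminished improvement across similarity levels that the analyses relate to CSF measures.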
To mitigate the effect of influential data points, such as individuals with high p-tau 181 , on the outcomes, we used bootstrap resampling with 5,000 iterations of data sampled with replacement to determine effect significance. Thus, for all analyses relating continuous CSF proteins to memory measures, we report 95% confidence intervals (CIs), and consider effects significant only if 0 does not fall within the 95% CI of the bootstrapped estimate of the effect. For mediation analyses, the coefficient of the indirect path was calculated as the product of direct effects a × b, and considered significant if 0 does not fall within the 95% CI of the bootstrapped estimate. All findings were replicated when log-transforming CSF p-tau 181 values as an alternative approach to mitigating the influence of extreme values (data not shown).
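The bootstrap decision rule for the indirect path can be sketched as below: resample participants with replacement, recompute a × b each time, and treat the effect as significant only if the 95% percentile interval excludes 0. This is a reduced illustration on synthetic data. It omits the age, sex, and education covariates, uses a percentile interval (the interval type is not specified in the text), and runs 1,000 iterations in the example for speed where the study used 5,000. Variable and function names are assumptions.

```python
import random
from statistics import mean


def indirect_effect(x, m, y):
    """a * b, where a is the slope of M ~ X and b the M coefficient in Y ~ X + M."""
    mx, mm, my = mean(x), mean(m), mean(y)
    dx = [v - mx for v in x]
    dm = [v - mm for v in m]
    dy = [v - my for v in y]
    sxx = sum(u * u for u in dx)
    smm = sum(u * u for u in dm)
    sxm = sum(u * v for u, v in zip(dx, dm))
    sxy = sum(u * v for u, v in zip(dx, dy))
    smy = sum(u * v for u, v in zip(dm, dy))
    a_path = sxm / sxx
    b_path = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a_path * b_path


def bootstrap_ci(x, m, y, n_boot=5000, seed=0):
    """95% percentile interval of the indirect effect over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


# Synthetic standardized data (not study data): x stands in for Abeta42/Abeta40,
# m for p-tau181, y for a memory score.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(150)]
m = [-0.6 * xi + rng.gauss(0, 0.5) for xi in x]
y = [-0.6 * mi + rng.gauss(0, 0.5) for mi in m]

lo, hi = bootstrap_ci(x, m, y, n_boot=1000)
significant = not (lo <= 0 <= hi)
```

Resampling whole participants keeps each person's values together, so a few extreme observations influence the interval only in proportion to how often they are drawn.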

Standard Protocol Approvals, Registrations, and Patient Consents
All participants provided informed consent in accordance with a protocol approved by the Stanford institutional review board.

Data Availability
Anonymized data will be made available to any qualified investigator upon request.

Standardized Neuropsychological Delayed Recall
Delayed recall composite score declined with age (table 2

Associative Memory
The primary measure of interest from the associative memory task is associative d′, the ability to remember the category of the image initially paired with the cue word (figure 1C; see figure e-5 for results as a function of stimulus category, i.e., face associations and place associations, doi:10.5061/dryad.ngf1vhhrp). Associative d′ declined with age (table 2 and figure 4D), but did not vary by sex or education (all p > 0.175). Including these variables as covariates, we found that lower levels of CSF Aβ42/Aβ40 (table 2 and figure 4E) and Aβ+ status (β = −0.537, p = 0.007) were associated with poorer associative d′, whereas Aβ42 was not (β = 0.049, p = 0.570). Similarly, p-tau181 was negatively related to associative d′ and the bootstrapped effect was significant (table 2 and figure 4F). When CSF Aβ42/Aβ40 and p-tau181 were combined in the same model, p-tau181 remained a significant predictor of associative d′, while age and amyloid were no longer significant (  figure 5B), which demonstrates that pronounced deficits in performance are observed primarily in the A+/T+ group.
The same pattern was observed with respect to old/new d′, the ability to discriminate studied from novel words irrespective of associative recall accuracy (figure 1C), which was highly correlated with associative d′ (r = 0.86). Perfor-

Mnemonic Discrimination
Participants were able to successfully identify studied items as "old" (M 0.85, SD 0.10) and novel foils as "new" (M 0.83, SD 0.11). In contrast, the ability to identify lures as "similar," avoiding the propensity to call lures "old," varied systematically as a function of target-lure similarity, as reflected in the lure/new d′ and old/lure d′ scores (figure 1, D and E). Specifically, the probability of incorrectly calling a lure "old" (i.e., a false alarm) decreased as lures went from high (M 0.76, SD 0.16) to low similarity (M 0.35, SD 0.20). Likewise, the probability of correctly calling a lure "similar" was least likely when similarity was high (M 0.18, SD 0.14) and systematically improved as similarity decreased (M 0.49, SD 0.25).
We modeled each d′ measure in a linear mixed model context to determine whether there was a linear relationship between lure similarity and performance and whether this relationship varied as a function of demographics and CSF variables. Target-lure similarity significantly affected performance (lure/new d′: β = 0.304, p < 10^−16; old/lure d′: β = 0.435, p < 10^−16), suggesting that each similarity bin was associated with a d′ increase of 0.30-0.43 (in z-score units) across the entire group (figure 1, D and E). We also observed an education × similarity interaction (lure/new d′: β = 0.056, p = 0.001; old/lure d′: β = 0.081, p < 10^−5), such that as target-lure similarity decreased, fewer years of education were associated with a smaller enhancement in performance. An age × similarity interaction was also observed, but the bootstrapped effect was not significant for lure/new d′ (table 2

Behavioral Predictors of Preclinical AD Pathology
Given that qualitatively similar relationships were observed between memory performance and CSF biomarkers across multiple memory measures, we next assessed which measures might be most sensitive to individual differences in CSF Aβ 42   Step Thus, associative memory and mnemonic discrimination are stronger predictors, relative to delayed recall, of variance in CSF biomarkers in CU. Moreover, these tasks are not redundant, but explain unique variance in p-tau 181 .

Discussion
This study tested the hypothesis that in a CU population, memory assays designed to tax hippocampal function are sensitive to variations in performance associated with underlying preclinical AD pathology. In the present cohort, over a quarter of individuals were identified as Aβ+ based on CSF Aβ 42 /Aβ 40 distribution. We observed significant associations between CSF Aβ 42 /Aβ 40 , p-tau 181 , and 2 specialized assays of hippocampal-dependent memory, where effects of Aβ 42 /Aβ 40 were mediated by associated increases in p-tau 181 . Although relationships were qualitatively similar, a standardized clinical memory measure-delayed recall composite score-did not exhibit significant associations with CSF proteins. Together, these results suggest that assays that are designed to incisively tax hippocampal function have promise for detecting variance in memory related to the presence of preclinical AD pathology in CU.
These findings have relevance for efforts to identify specialized cognitive screening tools for detecting biomarker positivity in CU, which are gaining increasing interest and popularity 18,34-36 due to limitations of standardized neuropsychological assays for detecting subtle variations in cognition among CU, especially when examining cross-sectional associations. Our results indicate that both associative cued recall and mnemonic discrimination of perceptually similar objects outperform standardized delayed recall tests with respect to detecting variance related to CSF biomarkers of preclinical AD in CU. Thus, although all 3 memory tests here are supported by the hippocampus and surrounding medial temporal lobe structures, tasks designed to tax hippocampal computations (e.g., pattern separation, pattern completion) may offer enhanced sensitivity to detect initial changes in performance related to AD pathology. Notably, we found that associative memory and mnemonic discrimination measures explained unique variance in p-tau181, suggesting that these tasks may be used in combination to improve detection of preclinical AD pathology in older adults.
Critically, however, such relationships within the mnemonic discrimination task were observed only as a function of change in performance (i.e., slope) across levels of target-lure similarity: whereas all participants performed poorly on the most difficult discriminations (i.e., high lure-target similarity), only biomarker-positive individuals failed to systematically improve as discriminations became easier (i.e., lower lure-target similarity; figure 5). The sensitivity of an individual's slope across similarity bins to variance in AD biomarkers replicates recent work using a spatial mnemonic discrimination task and amyloid PET to measure preclinical AD burden. 19 Notably, this pattern also parallels boundary conditions of the ability of the hippocampus to differentiate similar inputs: at extremely high levels of perceptual overlap, even a functional hippocampus will often fail to distinguish between events 37 ; this level therefore lacks utility for measuring differences across CU. As overlap decreases across lure bins, but events nevertheless share overlapping features, performance improves in a linear fashion, reflecting hippocampal-dependent computations supporting performance. The magnitude of improvement across similarity levels, or lack thereof, may provide an index of hippocampal functional integrity. This pattern highlights important boundary conditions regarding the use of mnemonic discrimination tasks for detecting variance related to AD biomarkers in CU, suggesting it may be optimal to measure change in performance across successive levels of difficulty.
More broadly, this pattern is consistent with current hypotheses that the sensitivity of these tasks to biomarker levels in CU is related to links between performance and functional integrity of the hippocampus-entorhinal circuit, areas that are particularly vulnerable to early tangle pathology in CU. For example, we previously demonstrated in the SAMS cohort that the magnitude of hippocampal activity was tightly coupled with the likelihood of accurate associative cued recall on individual trials, and predicted associative d′ across individuals. 15 Similarly, prior work indicates that discrimination of perceptually similar lures from studied objects engages the hippocampus and anterolateral entorhinal cortex, and that functional imbalances of this circuit are associated with worse performance. 13,14 Thus, performance may also be sensitive to alterations in hippocampal-entorhinal functional integrity, such as those arising due to tangle pathology. Consistent with this possibility, prior work in CU has observed associations between MTL tau, altered activity, 18 and functional connectivity 38,39 in these areas and memory performance.
The present observation that CSF p-tau181 was more proximal to behavior, mediating relationships between Aβ42/Aβ40 and performance, is compatible with these hypotheses. CSF p-tau181 is an indirect measure of tangle pathology, and longitudinal data indicate that CSF p-tau181 becomes abnormal relatively early in the disease course, years prior to significant regional uptake using tau PET imaging, 3,40 which provides a measure of focal tangle accumulation. 41 The present results therefore build on findings from Aβ-PET and tau-PET imaging suggesting sensitivity of associative cued recall tasks 16,17 and mnemonic discrimination tasks 18,19,36 to preclinical AD pathology, and provide novel evidence for such relationships using CSF to measure biomarker abnormality in a large sample of CU. Moreover, by measuring amyloid and tau simultaneously, they also provide insights into how these 2 proteins relate to one another and behavior. While the present data are cross-sectional, they suggest that amyloid-dependent increases in p-tau181 are necessary to observe decrements in memory performance using the present measures (i.e., they are not observed in Aβ+ alone or T+ alone, and overt tau elevations are not present in the Aβ− group). This observation is consistent with prior work highlighting a correlation between amyloid plaque and neurofibrillary tangle deposition, as well as work showing that tangles are a closer proxy of cognitive decline and clinical status compared to amyloid plaques. 42-44
Beyond the adoption of specialized tasks, the ability to detect meaningful variability in CSF protein levels within CU may be particularly affected by methodologic precision, given the more limited range in analyte values. We employed fully automated CSF analysis, which reduces experimenter-introduced noise and intralaboratory and interlaboratory variability 45 in the data through incorporation of standardized reference material. 46 We also normalized Aβ42 by Aβ40, which improves sensitivity and specificity for detecting Aβ burden related to AD 22-25 by adjusting for individual differences in Aβ production. Notably, the Aβ42/Aβ40 ratio provided a basis for establishing Aβ positivity within CU in the absence of Aβ-PET or a patient comparison group, yielding a cutoff (<0.0752) that corresponds remarkably well with that from an independent CU cohort (<0.075), 47 despite the use of a different analysis platform (Elecsys). Furthermore, the ratio enabled detection of relationships between amyloid and both p-tau181 and memory, neither of which was achieved using CSF Aβ42 alone. These results demonstrate the value of Aβ40 measurement and suggest it may be particularly critical for detecting initial changes in Aβ in CU.
Importantly, while the present results provide evidence for significant relationships between CSF proteins and specialized tests of hippocampal-dependent memory, we observed qualitatively similar, though nonsignificant, relationships with standardized clinical memory tests. Future work is needed to further define the task characteristics that optimize sensitivity to preclinical AD pathology in CU. Moreover, although these tasks are readily implemented in laboratory contexts, there may be challenges for integration into clinical contexts in their current form. Nevertheless, the present findings add to a growing body of evidence encouraging further exploration of assays that tax hippocampal function for early detection of biomarker positivity in cognitively unimpaired older adults.

Study Funding
Supported by the National Institute on Aging (R01AG048076, R21AG058111, R21AG058859, and P30AG066515), Stanford Wu Tsai Neurosciences Institute, and Stanford Center for Precision Health and Integrated Diagnostics (PHIND).