Predicting Alzheimer's disease development: a comparison of cognitive criteria and associated neuroimaging biomarkers



The definition of “objective cognitive impairment” in current criteria for mild cognitive impairment (MCI) varies considerably between research groups and clinics. This study aims to compare different methods of defining memory impairment to improve prediction models for the development of Alzheimer’s disease (AD) from baseline to 24 months.


The sensitivity and specificity of six methods of defining episodic memory impairment (< −1, −1.5 or −2 standard deviations [SD] on one or two memory tests) were compared in 494 non-demented seniors from the Alzheimer’s Disease Neuroimaging Initiative using the area under the curve (AUC) for receiver operating characteristic analysis. The added value of non-memory measures (language and executive function) and biomarkers (hippocampal and white-matter hyperintensity volume, brain parenchymal fraction [BPF], and APOEε4 status) was investigated using logistic regression.


Baseline scores < −1 SD on two memory tests predicted AD with 75.91 % accuracy (AUC = 0.80). Only APOE ε4 status further improved prediction (B = 1.10, SE = 0.45, p = .016). A < −1.5 SD cut-off on one test had 66.60 % accuracy (AUC = 0.77). Prediction was further improved using Trails B/A ratio (B = 0.27, SE = 0.13, p = .033), BPF (B = −15.97, SE = 7.58, p = .035), and APOEε4 status (B = 1.08, SE = 0.45, p = .017). A cut-off of < −2 SD on one memory test (AUC = 0.77, SE = 0.03, 95 % CI 0.72-0.82) had 76.52 % accuracy in predicting AD. Trails B/A ratio (B = 0.31, SE = 0.13, p = .017) and APOE ε4 status (B = 1.07, SE = 0.46, p = .019) improved predictive accuracy.


Episodic memory impairment in MCI should be defined as scores < −1 SD below normative references on at least two measures. Clinicians or researchers who administer a single test should opt for a more stringent cut-off and collect and analyze whole-brain volume. When feasible, ascertaining APOE ε4 status can further improve prediction.


Patients with mild cognitive impairment due to Alzheimer’s disease (MCI) [1], also known as mild neurocognitive disorder [2], are considered to be at an early stage of dementia. There are now multiple published criteria sets for identifying these individuals at high risk of progression [1–3], all of which include at least: 1) subjective concern; 2) an objective cognitive impairment on formal neuropsychological testing in one or more cognitive domains, typically including memory; 3) preservation of functional independence; and 4) no dementia.

Although these criteria have been a major step forward in the conceptualization of MCI, they leave room for considerable ambiguity, particularly regarding the operational definition of objective cognitive impairment. A number of cognitive tests have been proposed that may be useful for identifying objective episodic memory impairment in MCI, specifically measures that assess both immediate and delayed recall, such as word-list learning or paragraph recall [1, 4]. These suggestions are very useful in providing common ground for clinicians and researchers working with MCI cohorts. However, three critical issues remain.

First, it is unclear which cutoff scores should be used to define impairment. Studies examining MCI patients typically report test performance in the range of one to two standard deviations (SD) below age-adjusted and/or education-adjusted norms. However, using a −1 SD cutoff may be overly inclusive, as cognitive performance in healthy older adults often falls below this limit [5] for a variety of non-pathological reasons (e.g., fatigue, anxiety). Conversely, using a −2 SD cutoff may underestimate the number of individuals who are in the earliest phases of the disease process.

Second, it is unclear how many measures should be used in assessing cognition. In memory clinics, diagnosis is typically based on results of a battery of neuropsychological tests including more than one test probing the same cognitive domain. Longitudinal evidence confirms that using at least two tests to establish impairment greatly increases diagnostic accuracy [6]. In research settings, however, MCI diagnosis is often based on a single test. This is potentially problematic, as research has shown that more than one quarter of healthy elderly adults who are tested using a single memory measure obtain scores in impaired ranges (< −1.5 SD), while this number is reduced to 14.1 % when a second test is added [5]. As mentioned above, impaired performance on a single test in otherwise healthy normal adults may be explained by numerous factors such as anxiety, depression, fatigue, or inattention. Thus, this single-test procedure may not be adequate for identifying individuals who are at highest risk of dementia.

Third, it is unclear which cognitive domain(s) should be assessed, if any, in addition to episodic memory. Originally, Petersen’s [3] diagnostic criteria recommended that a distinction be made between single-domain and multiple-domain MCI, with the assumption that this classification would be of heuristic value in determining the probable etiology of the disorder. This recommendation is echoed in Albert and colleagues’ [1] revised criteria as well. Indeed, some longitudinal evidence suggests that these subtypes evolve differently over time [7], suggesting distinct etiological processes. However, the most recent DSM-5 criteria for mild neurocognitive disorder [2] do not discriminate between single-domain and multiple-domain cognitive impairment. Many research studies also do not make this distinction.

In addition, recent guidelines for diagnosing MCI have emphasized the importance of using genetic and imaging biomarkers in addition to neuropsychological testing. The presence of one or two copies of the epsilon 4 allele (ε4) in the apolipoprotein E (APOE) gene is one commonly accepted genetic characteristic believed to increase the risk of development of dementia due to Alzheimer’s disease (AD) [8]. Additionally, metrics obtained from structural magnetic resonance imaging (MRI) that assess neuronal injury, such as total brain atrophy [9, 10], ventricular enlargement [11–13], hippocampal (HP) volume loss [14, 15], medial temporal lobe atrophy [16], and possibly the presence of small vessel disease [17], may be informative predictors for the development of AD dementia.

Using data obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), the purpose of this study is to determine whether prediction of development of clinical dementia among non-demented participants is improved by: 1) using cutoff scores of −1.0, −1.5 or −2.0 SD to define cognitive impairment; 2) assessing episodic memory using one or two tests; 3) assessing additional non-memory domains; and 4) accounting for commonly used neuroimaging and genetic biomarker data. It was hypothesized that the identification of individuals at risk for the development of dementia would best be predicted by defining objective impairment as performance < −1 SD on two episodic memory tests. Furthermore, it was anticipated that the ability to predict the development of AD would be further optimized by considering performance in at least one other, non-memory domain. Finally, it was expected that the inclusion of imaging and genetic biomarkers known to be associated with AD would further improve prediction.

Materials and methods

Data used in the preparation of this article were obtained from the ADNI database on 3 February 2015. The ADNI was launched in 2003 as a public–private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial MRI, positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD.


Of the 819 participants enrolled in ADNI-1, those who had neuropsychological and genetic data available at baseline and 24-month follow up were selected for this study (n = 630). A 24-month follow-up period was selected to maximize statistical power and to ensure that harmonized imaging outcome measures were available for the majority of the sample. Of these 630 participants, those with a diagnosis of probable AD at baseline were excluded (n = 136). Individuals with a history of neurological or psychiatric illness or substance abuse, or without a study partner able to corroborate reports of functioning, were not eligible for ADNI; complete eligibility criteria for the ADNI study as a whole are available from ADNI. The final sample consisted of the remaining 494 non-demented participants. According to the assigned diagnoses in the ADNI database, 294 of these participants were classified as having MCI, and the remaining 200 were classified as cognitively normal. All participants (201 women, 293 men) were 55–89 years old at baseline (mean = 75.3 ± 6.4) and had 6–20 years of education (mean = 15.9 ± 2.8).


Cognitive measures

A neuropsychological battery was administered to all participants upon admission to ADNI, and raw scores were downloaded from the ADNI Neuropsychological Battery table. Of interest in the present study are tests that measure general cognition (Mini-mental state exam (MMSE)), episodic memory (Logical memory story A delayed recall (LM-II), Rey auditory verbal learning test (AVLT)), language (Category fluency, Boston naming test (BNT)) and executive functioning (Trails A and B). A derived Trails B/Trails A ratio was calculated to obtain a relatively independent measure of executive control, as has been suggested by other authors [18]. Raw scores were transformed to standardized scores (z scores or scaled scores (SS)) based on published age-adjusted norms for the AVLT [19], Category fluency [20], BNT [21] and Trails A & B [18]. Education-adjusted z scores for LM-II story A were obtained using a web-based calculator [22] based on data from a large published report [23]. Higher z scores or SS represent better performance, with the exception of Trails A and B in which higher scores represent poorer performance (i.e., longer time to complete the test).
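The standardization and the derived Trails ratio described above can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the normative mean and SD in the example are placeholder values, not the published norms cited in the text.

```python
def z_score(raw, norm_mean, norm_sd):
    """Standardize a raw test score against a normative mean and SD."""
    if norm_sd <= 0:
        raise ValueError("normative SD must be positive")
    return (raw - norm_mean) / norm_sd


def trails_ratio(trails_b_seconds, trails_a_seconds):
    """Derived Trails B / Trails A ratio (higher = poorer performance)."""
    if trails_a_seconds <= 0:
        raise ValueError("Trails A time must be positive")
    return trails_b_seconds / trails_a_seconds


# Placeholder norms (NOT the published values): mean 8.0, SD 3.0
avlt_z = z_score(raw=3.5, norm_mean=8.0, norm_sd=3.0)
ratio = trails_ratio(trails_b_seconds=120.0, trails_a_seconds=40.0)
print(avlt_z, ratio)
```

In practice the normative mean and SD would be looked up by age band (and, for LM-II, by education) before standardizing.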

Outcome measure

The presence or absence of clinically probable AD was assessed at 24 months and defined as: 1) MMSE <26; 2) Clinical Dementia Rating (CDR) ≥0.5; and 3) positive NINCDS/ADRDA criteria for probable AD [24].
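Read literally, the outcome is a conjunction of the three conditions listed above. A minimal sketch, with a hypothetical function name and argument names:

```python
def probable_ad_at_followup(mmse, cdr, meets_nincds_adrda):
    """Return True only when all three outcome conditions hold.

    mmse: Mini-mental state exam total score at 24 months
    cdr: global Clinical Dementia Rating at 24 months
    meets_nincds_adrda: True if NINCDS/ADRDA criteria for probable AD are met
    """
    return mmse < 26 and cdr >= 0.5 and bool(meets_nincds_adrda)


print(probable_ad_at_followup(mmse=24, cdr=1.0, meets_nincds_adrda=True))
```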

Imaging and genetic biomarkers

Neuroimaging-based biomarkers were obtained from downloaded ADNI database tables (hierarchical parcellation of MRI using multi-atlas labeling methods (UPENN); white matter hyperintensity volumes (UCD)). Whole brain atrophy was assessed using the brain parenchymal fraction (BPF), which was calculated as a ratio of total parenchymal volume (gray matter (GM) and white matter (WM)) to total cranial vault (TCV) volume as follows:

$$ \mathrm{BPF} = \left(\mathrm{GM} + \mathrm{WM}\right)/\mathrm{TCV}. $$

To assess medial and focal atrophy, head-size-corrected ventricular cerebrospinal fluid (vCSF) and HP volume were automatically segmented using previously published and validated methods [11, 14]. Small vessel disease burden was assessed using whole brain white matter hyperintensity (WMH) volumes [25]. Full segmentation methodological details can be obtained from ADNI (see ADNI1_Methods_UCD_WMH_Volumes_Methods.pdf and ADNI_Total_Cranial_Vault_Segmentation_Method_20121108.pdf). In addition, the presence of one or two copies of the APOE ε4 allele was determined for all participants as per standard ADNI protocol.

Statistical analyses

Six binary variables were created based on scores < −1.0, −1.5 or −2.0 SD on one (LM-II or AVLT delayed recall) or two (LM-II and AVLT delayed recall) memory tests, and participants were classified as above or below each cutoff. The predictive accuracy of these six cutoffs was tested using the area under the curve (AUC) for receiver operating characteristic (ROC) analysis. The minimum value for an AUC to be considered clinically significant was >0.75 [26]. Hanley and McNeil’s [27] method was used to test for statistical differences between AUC values. Cutoff scores with AUC values >0.75 were then entered into separate binary logistic regression analyses with hierarchical designs, with probable AD at 24 months as the binary (yes/no) dependent variable. In all models, age, sex, education, MMSE and the selected cutoff score were entered in a first block. A second block included performance on non-memory cognitive measures, specifically standardized Category fluency, BNT, and Trails B/A-derived scores. A third block assessed the potential added predictive value of biomarkers that are known to be associated with probable AD: BPF, vCSF volume, total HP volume, WMH volume, and APOE ε4 status. We verified that all variables met multicollinearity and linearity assumptions.
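The construction of the six binary impairment variables can be sketched as below. Two points are assumptions made for illustration rather than details confirmed by the text: the single-test rule is applied to LM-II alone (the Discussion names LM-II for the one-test cutoffs), and the two-test rule requires both LM-II and AVLT delayed recall to fall below the same cutoff. The function and key names are hypothetical.

```python
CUTOFFS = (-1.0, -1.5, -2.0)


def impairment_flags(lm2_z, avlt_z):
    """Return six binary impairment indicators for one participant.

    Assumption: "one test" means LM-II only; "two tests" means both
    LM-II and AVLT delayed recall fall below the cutoff.
    """
    flags = {}
    for cutoff in CUTOFFS:
        flags[f"one_test_below_{cutoff}"] = lm2_z < cutoff
        flags[f"two_tests_below_{cutoff}"] = lm2_z < cutoff and avlt_z < cutoff
    return flags


example = impairment_flags(lm2_z=-1.7, avlt_z=-1.2)
for name, value in example.items():
    print(name, value)
```

Each of the resulting indicators can then be evaluated against the 24-month outcome as a separate binary classifier.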

Last, in order to see whether participants whose performance fell above and below the best selected cutoff scores were phenotypically different, multivariate analysis of covariance (MANCOVA) was used to compare cognitive and neuroimaging characteristics between these two groups, with age, sex and education entered as covariates. Highly skewed variables exhibiting non-normal distributions were log-transformed (WMH, vCSF) or inverse-transformed (Trails B/A ratio) prior to analysis. Category fluency scores did not meet the equal variance assumption and were therefore log-transformed. Dichotomous variables were compared using the chi-square test.
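The effect of the transformations mentioned above can be illustrated with a short sketch. The sample values are made up, the natural logarithm is an assumption (the base is not stated in the text), and the helper names are hypothetical.

```python
import math


def skewness(values):
    """Moment-based sample skewness (third moment over second moment^1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_transform(values):
    """Natural-log transform; requires strictly positive values."""
    return [math.log(v) for v in values]


def inverse_transform(values):
    """Reciprocal transform; note that it reverses the ordering of values."""
    return [1.0 / v for v in values]


volumes = [1.0, 2.0, 3.0, 4.0, 50.0]  # made-up, right-skewed values
logged = log_transform(volumes)
print(skewness(volumes), skewness(logged))
```

Because the reciprocal reverses rank order, the direction of group differences on an inverse-transformed variable has to be read accordingly.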


Results

At 24 months post-baseline, 112 participants (22.7 %) had received a diagnosis of AD. Sensitivity, specificity and accuracy of the different cutoff scores are illustrated in Fig. 1. On ROC analysis there were three cutoffs with AUC values >0.75. A cutoff of < −1 SD on two memory tests (AUC = 0.80, standard error (SE) = 0.02, 95 % CI 0.75, 0.84) had 75.91 % accuracy in correctly identifying patients who would later develop probable AD (97 true positives) and those who would not (278 true negatives). A cutoff of < −1.5 SD on one memory test (AUC = 0.77, SE = 0.02, 95 % CI 0.73, 0.81) had 66.60 % accuracy (108 true positives, 221 true negatives). A cutoff of < −2 SD on one memory test (AUC = 0.77, SE = 0.03, 95 % CI 0.72, 0.82) had 76.52 % accuracy (87 true positives, 291 true negatives). The AUC values for the three cutoff scores were not statistically different (all comparisons p >0.05, one-tailed).
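The reported accuracies can be recomputed from the true positive and true negative counts, given 112 AD cases among 494 participants. The sketch below also uses the fact that, for a single binary cutoff, the trapezoidal AUC equals the mean of sensitivity and specificity. Whether the original analysis computed AUC exactly this way is an assumption, although the values agree with those reported to two decimals.

```python
N_TOTAL = 494
N_AD = 112                   # probable AD at 24 months
N_NON_AD = N_TOTAL - N_AD    # 382

# (true positives, true negatives) as reported in the Results
REPORTED = {
    "< -1 SD, two tests": (97, 278),
    "< -1.5 SD, one test": (108, 221),
    "< -2 SD, one test": (87, 291),
}


def summarize(tp, tn):
    """Sensitivity, specificity, accuracy and single-cutoff AUC."""
    sensitivity = tp / N_AD
    specificity = tn / N_NON_AD
    accuracy = (tp + tn) / N_TOTAL
    # One interior ROC point: trapezoidal AUC = (sens + spec) / 2
    auc = (sensitivity + specificity) / 2
    return sensitivity, specificity, accuracy, auc


results = {label: summarize(tp, tn) for label, (tp, tn) in REPORTED.items()}
for label, (sens, spec, acc, auc) in results.items():
    print(f"{label}: sens={sens:.3f} spec={spec:.3f} "
          f"acc={100 * acc:.2f}% auc={auc:.2f}")
```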

Fig. 1 Sensitivity, specificity and accuracy of different cutoff scores in 494 non-demented participants at baseline. AD Alzheimer’s disease, LM-II Logical memory story A delayed recall, AVLT Rey auditory verbal learning test

Seven participants were excluded from subsequent analyses because they had missing data (two had missing WMH data, two had missing Trails B data, one had missing BNT data, and two had missing Trails B and BNT data). First, in the logistic regression model testing the added value of non-memory measures and biomarkers, in addition to a cutoff of < −1 SD on two memory tests (B = 2.55, SE = 0.33, p <0.001), MMSE was a significant predictor of future AD (B = −0.34, SE = 0.08, p <0.001). Only the presence of two APOE ε4-positive alleles (B = 1.10, SE = 0.45, p = 0.016) further improved prediction. Altogether, this model accounted for 83.4 % of the variance in risk of probable AD (Table 1).
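Logistic regression coefficients (B) are on the log-odds scale, so exponentiating them gives odds ratios. The sketch below converts the reported APOE ε4 coefficient (B = 1.10, SE = 0.45) into an odds ratio with a Wald-type 95 % interval. The interval and the Wald z statistic are illustrative calculations under a normal approximation; they are not values reported in the text, and the function name is hypothetical.

```python
import math


def odds_ratio_with_ci(b, se, z_crit=1.96):
    """Odds ratio and Wald-type confidence limits from a logit coefficient."""
    return math.exp(b), math.exp(b - z_crit * se), math.exp(b + z_crit * se)


b, se = 1.10, 0.45            # APOE e4 coefficient from the first model
odds_ratio, lower, upper = odds_ratio_with_ci(b, se)
wald_z = b / se               # Wald statistic on the z scale
print(f"OR={odds_ratio:.2f} (95% CI {lower:.2f}, {upper:.2f}), z={wald_z:.2f}")
```

Under these assumptions the coefficient corresponds to roughly a threefold increase in the odds of probable AD at 24 months, with a wide interval.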

Table 1 Variables predicting AD in addition to < −1 SD on two episodic memory tests

In the second model, in addition to a cutoff of < −1.5 SD on one memory test (B = 3.09, SE = 0.54, p <0.001), significant predictors of probable AD were MMSE (B = −0.32, SE = 0.07, p <0.001) and the Trails B/A ratio in the non-memory cognitive measures block (B = 0.27, SE = 0.13, p = 0.033). Biomarkers that significantly improved prediction included BPF (B = −16.58, SE = 7.64, p = 0.030) and presence of two APOE ε4-positive alleles (B = 1.05, SE = 0.45, p = 0.021). This model accounted for 82.3 % of the variance in risk of probable AD (Table 2).

Table 2 Variables predicting AD in addition to < −1.5 SD on one episodic memory test

In the third model, in addition to a cutoff of < −2 SD on one memory test (B = 2.04, SE = 0.28, p <0.001), significant predictors of probable AD were MMSE (B = −0.40, SE = 0.08, p <0.001) and the Trails B/A ratio in the non-memory cognitive measures block (B = 0.31, SE = 0.13, p = 0.017). Presence of two APOE ε4-positive alleles (B = 1.07, SE = 0.46, p = 0.019) further improved prediction. This model accounted for 81.9 % of the variance in risk of probable AD (Table 3).

Table 3 Variables predicting AD in addition to < −2 SD on one episodic memory test

Participants who scored above (n = 291) and below (n = 196) a cutoff score of < −1 SD on two memory tests were compared using MANCOVA. Levene’s test indicated that both groups had equal variances (all variables p >0.05). As summarized in Table 4, it was found that those with episodic memory scores below the cutoff had poorer performance on Category fluency (F (4,482) = 14.23, p <0.001), BNT (F (4,482) = 25.60, p <0.001), and Trails B/A ratio (F (4,482) = 7.18, p <0.001). For brain morphology, patients below the cutoff had smaller BPF (F (4,482) = 49.02, p <0.001), smaller left (F (4,482) = 44.83, p <0.001) and right HP volumes (F (4,482) = 41.03, p <0.001), more vCSF (F (4,482) = 28.99, p <0.001) and larger WMH volume (F (4,482) = 8.69, p <0.001).

Table 4 Characteristics (mean (SD)) of participants above and below selected cutoffs

Participants who scored above (n = 223) and below (n = 264) a cutoff score of < −1.5 SD on one memory test were compared in a second MANCOVA. Two variables violated Levene’s test (Trails B/A ratio and left HP volume), likely due to the large sample sizes. Inspection of the data showed that the variance between both groups was highly similar (in the above-cutoff and below-cutoff groups, the respective variances were 0.010 and 0.016 for Trails B/A ratio, and 0.001 and 0.001 for left HP volume), and therefore parametric analyses were retained. Results revealed that individuals with episodic memory scores below the cutoff had poorer performance on Category fluency (F (4,482) = 14.24, p <0.001), BNT (F (4,482) = 24.00, p <0.001), and Trails B/A ratio (F (4,482) = 3.81, p = 0.005). They also had smaller BPF (F (4,482) = 45.00, p <0.001), smaller left (F (4,482) = 27.38, p <0.001) and right HP volume (F (4,482) = 33.42, p <0.001), more vCSF (F (4,482) = 28.94, p <0.001) and larger WMH volume (F (4,482) = 8.90, p <0.001).

Participants who scored above (n = 313) and below (n = 174) a cutoff score of < −2 SD on one memory test were compared in a third MANCOVA. Trails B/A ratio violated Levene’s test of equality of error variances, but again inspection of the data showed highly similar variances between the above-cutoff (0.010) and below-cutoff (0.016) groups. Parametric analyses were thus retained. Individuals with episodic memory scores below the cutoff had poorer performance on Category fluency (F (4,482) = 11.61, p <0.001), BNT (F (4,482) = 19.23, p <0.001), and Trails B/A ratio (F (4,482) = 3.40, p = 0.009). They also had smaller BPF (F (4,482) = 45.07, p <0.001), smaller left (F (4,482) = 31.79, p <0.001) and right HP volume (F (4,482) = 35.16, p <0.001), more vCSF (F (4,482) = 28.72, p <0.001) and larger WMH volume (F (4,482) = 9.33, p <0.001).


Discussion

This study aimed to assess how various cognitive, neuroimaging and genetic measures collected at baseline can be used to predict the development of probable AD dementia at 24 months in a sample of elderly participants obtained from ADNI. By assessing a series of normative cutoff scores from cognitive test results, the number of episodic memory and non-memory tests used to assess cognitive performance, and other commonly used neuroimaging and genetic biomarkers, a set of recommended criteria was established which may be used in future investigations to improve prediction for the development of probable AD in the elderly.

Consistent with our initial hypotheses, performance < −1 SD on two memory tests (LM-II and AVLT delay) had the best trade-off between sensitivity and specificity for predicting probable AD, followed by performance < −1.5 SD and < −2 SD on one memory test (LM-II). These results suggest that to maximize diagnostic certainty, a minimum of two measures should ideally be used to assess episodic memory performance, and impairment should be defined as scores at least 1 SD below appropriate normative references on both measures. Jak and colleagues [28] were among the first to recommend establishing impairment on at least two measures within a cognitive domain as the best way to increase sensitivity while maintaining reliability, and other authors have since corroborated the value of this approach [6, 29–31]. Our results further indicate that clinicians or researchers with limited resources who administer only a single memory test should opt for a much more stringent cutoff (i.e., 2 SD below normative reference data) to determine episodic memory impairment with comparable accuracy to two measures. Applying a −1.5 SD cutoff to a single test should be avoided when possible, as it remains highly prone to false positive diagnostic errors (cf. [30, 31]), which reached nearly one-third of the sample (32.6 %) in the present study.
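The false positive share quoted above follows from the counts given in the Results: 382 participants did not develop probable AD, so false positives are 382 minus the true negatives for each cutoff. A minimal sketch of that arithmetic, with hypothetical variable names:

```python
N_TOTAL = 494
N_AD = 112
N_NON_AD = N_TOTAL - N_AD    # 382 participants without AD at 24 months

# (true positives, true negatives) as reported in the Results
REPORTED = {
    "< -1 SD, two tests": (97, 278),
    "< -1.5 SD, one test": (108, 221),
    "< -2 SD, one test": (87, 291),
}

errors = {}
for label, (tp, tn) in REPORTED.items():
    false_pos = N_NON_AD - tn    # flagged as impaired but no AD at follow-up
    false_neg = N_AD - tp        # not flagged but developed AD
    errors[label] = (false_pos, false_neg, 100 * false_pos / N_TOTAL)

for label, (fp, fn, fp_pct) in errors.items():
    print(f"{label}: FP={fp} FN={fn} FP share of sample={fp_pct:.1f}%")
```

The single-test −1.5 SD cutoff yields the fewest missed cases but by far the most false positives, which is the trade-off the paragraph above describes.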

The only variable that improved prediction above and beyond episodic memory testing using two measures was APOE status, consistent with previous research recognizing APOE ε4-positive status as a major risk factor for subsequent AD (see [32] for a review). When only one test was used to assess episodic memory, prediction of dementia was improved using a non-memory test, specifically the ratio of Trails B/A, considered to be a measure of executive control [18]. Predictive accuracy was further increased using APOE ε4 status and whole-brain atrophy (as indexed by brain parenchymal fraction). These results suggest that thorough episodic memory testing using several measures is successful in predicting subsequent dementia with at least as much accuracy as using one memory test plus additional non-memory tests and biomarkers. It has previously been reported that the use of sensitive neuropsychological instruments is at least as effective in predicting AD as imaging biomarkers [33–36]. Other authors have also reported that the use of a single memory test is not optimal in predicting AD, and that adding information on brain atrophy and/or cerebrospinal fluid biomarkers is necessary to improve predictive accuracy in regression models [35, 37, 38]. We corroborate these findings, and extend them to specify that “impairment” should be defined as performance more than 1 SD below normative data.

Certain limitations must be considered in interpreting these data. First, the ADNI study specifically set out to recruit patients who represented relatively pure cases of MCI and dementia of the Alzheimer’s type, who are appropriate for clinical trials; this is evident in patients’ relatively low burden of WMH [39] (thought to reflect underlying vascular disease [40]). As such, the sample primarily includes individuals whose suspected etiology is AD, and whose primary (and often only) cognitive deficit involves memory. While ADNI provides a large and rich database to study individuals who are at high risk of developing AD, findings generated from these data have limited generalizability to real-world patient populations [39]. Other, more inclusive cohorts of individuals with MCI are needed. In addition, the standardized scores used in this study were derived from published age-adjusted norms for each test. It is possible that the use of local norms may produce different results (e.g., see [41]).

We have shown that diagnostic accuracy can be improved by approximately 10 % by administering an extra memory test to evaluate memory capacities in persons suspected of MCI. This improved accuracy is mostly the result of reducing false positive results, which other authors have shown are inflated when using a single test [31]. Although adding a test to the diagnostic battery resulted in some patients being missed at baseline, who went on to develop AD at 24 months, our findings suggest that this trade-off is altogether fair. An incorrect diagnosis of AD has serious implications for research and clinical practice. First, studies that employ only LM-II to test for memory impairment in participants are effectively pooling true MCI cases with those who are likely cognitively normal, thus potentially weakening the robustness of the research findings and limiting their generalizability. Clinically, the consequences of an incorrect diagnosis include needless testing, pharmacotherapy, and anxiety incurred by the patient and family. Also, inaccurate diagnosis implies that alternative (potentially reversible) causes of cognitive changes are not being investigated.

In closing, we must acknowledge that expanding cognitive batteries to include an extra memory test has some disadvantages. Namely, more clinician time and additional test materials are required, and research protocols will be slightly lengthened. However, we believe that these caveats are greatly outweighed by the benefit of improved accuracy, and that an additional memory measure should be added to clinical and research cognitive batteries to the extent that it is feasible.


Conclusions

The findings of our study in the ADNI cohort suggest that neuropsychological testing can predict decline with high accuracy regardless of biomarkers, when memory is assessed using delayed recall of a short story and a word list, with impairment defined as scores more than 1 SD below normative references on both tests. This criterion provides the optimal trade-off between specificity and sensitivity for predicting conversion to AD at two years. The increased accuracy that this criterion provides decreases the probability of misdiagnosing a patient and avoids needless testing, pharmacotherapy and anxiety, and provides a high-accuracy, low-cost strategy for identifying individuals at highest risk of dementia. In situations where it is only feasible to administer a single memory test, collecting information on non-memory performance and imaging or genetic biomarkers is necessary to optimize diagnostic accuracy.



Abbreviations

AD: Alzheimer’s disease
ADNI: Alzheimer’s Disease Neuroimaging Initiative
APOE: apolipoprotein E
AUC: area under the curve
AVLT: Rey auditory verbal learning test
BNT: Boston naming test
BPF: brain parenchymal fraction
CDR: Clinical Dementia Rating
FDA: Food and Drug Administration
GM: gray matter
HP: hippocampal
LM-II: Logical memory story A delayed recall
MANCOVA: multivariate analysis of covariance
MCI: mild cognitive impairment due to Alzheimer’s disease
MMSE: Mini-mental state exam
MRI: magnetic resonance imaging
NIA: National Institute on Aging
NIBIB: National Institute of Biomedical Imaging and Bioengineering
NINCDS/ADRDA: National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer's Disease and Related Disorders Association
PET: positron emission tomography
ROC: receiver operating characteristic
SD: standard deviation
SS: scaled scores
TCV: total cranial vault
vCSF: ventricular cerebrospinal fluid
WM: white matter
WMH: white matter hyperintensity


References

  1. Albert MS, DeKosky ST, Dickson D, Dubois B, Feldman HH, Fox NC, et al. The diagnosis of mild cognitive impairment due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer's Association workgroups on diagnostic guidelines for Alzheimer's disease. Alzheimers Dement. 2011;7:270–9.

  2. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. 5th ed. Arlington, VA: American Psychiatric Publishing; 2013.

  3. Petersen RC. Mild cognitive impairment as a diagnostic entity. J Intern Med. 2004;256:183–94.

  4. Belleville S, Fouquet C, Duchesne S, Collins DL, Hudon C. Detecting early preclinical Alzheimer’s disease via cognition, neuropsychiatry, and neuroimaging: qualitative review and recommendations for testing. J Alzheimer’s Dis. 2014;42:S375–82.

  5. Brooks BL, Iverson GL, Holdnack JA, Feldman HH. Potential for misclassification of mild cognitive impairment : A study of memory scores on the Wechsler Memory Scale-III in healthy older adults. J Int Neuropsychol Soc. 2008;14:463–78.

  6. Loewenstein DA, Acevedo A, Potter E, Schinka JA, Raj A, Greig MT, et al. Severity of medial temporal atrophy and amnestic mild cognitive impairment: selecting type and number of memory tests. Am J Geriatr Psychiatry. 2009;17:1050–8.

  7. Summers MJ, Saunders NLJ. Neuropsychological measures predict decline to Alzheimer’s dementia from mild cognitive impairment. Neuropsychology. 2012;26:498–508.

  8. Bertram L, Lill CM, Tanzi RE. The genetics of Alzheimer disease: back to the future. Neuron. 2010;68:270–81.

  9. Barnes J, Carmichael OT, Leung KK, Schwarz C, Ridgway GR, Bartlett JW, et al. Vascular and Alzheimer’s disease markers independently predict brain atrophy rate in Alzheimer's Disease Neuroimaging Initiative controls. Neurobiol Aging. 2013;34:1996–2002.

  10. Swartz RH, Stuss DT, Gao F, Black SE. Independent cognitive effects of atrophy and diffuse subcortical and thalamico-cortical cerebrovascular disease in dementia. Stroke. 2008;39:822–30.

  11. Nestor SM, Rupsingh R, Borrie M, Smith M, Accomazzi V, Wells JL, et al. Ventricular enlargement as a possible measure of Alzheimer’s disease progression validated using the Alzheimer's disease neuroimaging initiative database. Brain. 2008;131:2443–54.

  12. Madsen SK, Gutman BA, Joshi SH, Toga AW, Jack CR, Weiner MW, et al. Mapping Dynamic Changes in Ventricular Volume onto Baseline Cortical Surfaces in Normal Aging, MCI, and Alzheimer’s Disease. Multimodal Brain Image Anal. 2013;8159:84–94. Third Int Work MBIA 2013, held in conjunction with MICCAI 2013, Nagoya, Japan, Sept 22, 2013 Proc/Li Shen, Tianming Liu, Pew-Thian Yap, Heng Huang, Dinggang Shen, Carl-Fre.

  13. Chou YY, Lepore N, Saharan P, Madsen SK, Hua X, Jack CR, et al. Ventricular maps in 804 ADNI subjects: correlations with CSF biomarkers and clinical decline. Neurobiol Aging. 2010;31:1386–400.

  14. Nestor SM, Gibson E, Gao FQ, Kiss A, Black SE. A direct morphometric comparison of five labeling protocols for multi-atlas driven automatic segmentation of the hippocampus in Alzheimer’s disease. Neuroimage. 2012;66C:50–70.

  15. Leung KK, Bartlett JW, Barnes J, Manning EN, Ourselin S, Fox NC. Cerebral atrophy in mild cognitive impairment and Alzheimer disease: rates and acceleration. Neurology. 2013;80:648–54.

  16. Duchesne S, Valdivia F, Mouiha A, Robitaille N. Single time point high-dimensional morphometry in Alzheimer’s disease: group statistics on longitudinally acquired data. Neurobiol Aging. 2015;36:S11–22.

  17. Carmichael O, Schwarz C, Drucker D, Fletcher E, Harvey D, Beckett L, et al. Longitudinal changes in white matter disease and cognition in the first year of the Alzheimer disease neuroimaging initiative. Arch Neurol. 2010;67:1370–8.

  18. Drane D, Yuspeh R. Demographic characteristics and normative observations for derived-Trail Making Test indices. Neuropsychiatry Neuropsychol Behav Neurol. 2002;15:39–43.

  19. Ivnik RJ, Malec JF, Tangalos EG, Petersen RC, Kokmen E, Kurland LT. The Auditory-Verbal Learning Test (AVLT): Norms for ages 55 years and older. Psychol Assess. 1990;2:304–12.

  20. Tombaugh T. Normative Data Stratified by Age and Education for Two Measures of Verbal Fluency FAS and Animal Naming. Arch Clin Neuropsychol. 1999;14:167–77.

  21. Ivnik RJ, Malec JF, Smith GE, Tangalos EG, Petersen RC. Neuropsychological tests’ norms above age 55: COWAT, BNT, MAE token, WRAT-R reading, AMNART, STROOP, TMT, and JLO. Clin Neuropsychol. 1996;10:262–78.

  22. Shirk SD, Mitchell MB, Shaughnessy LW, Sherman JC, Locascio JJ, Weintraub S, et al. A web-based normative calculator for the uniform data set (UDS) neuropsychological test battery. Alzheimers Res Ther. 2011;3:32.

  23. Weintraub S, Salmon D, Mercaldo N, Ferris S, Graff-Radford NR, Chui H, et al. The Alzheimer’s Disease Centers’ Uniform Data Set (UDS): the neuropsychologic test battery. Alzheimer Dis Assoc Disord. 2009;23:91–101.

  24. McKhann G, Drachman D, Folstein M, Katzman R, Price D, Stadlan EM. Clinical diagnosis of Alzheimer’s disease: Report of the NINCDS- ADRDA Work Group under the auspices of Department of Health and Human Services Task Force on Alzheimer's disease. Neurology. 1984;34:939–44.

  25. Schwarz C, Fletcher E, DeCarli C, Carmichael O. Fully-automated white matter hyperintensity detection with anatomical prior knowledge and without FLAIR. Inf Process Med Imaging. 2009;21:239–51.

  26. Fan J, Upadhye S, Worster A. Understanding receiver operating characteristic (ROC) curves. CJEM. 2006;8:19–20.

  27. Hanley J, McNeil B. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36.

  28. Jak AJ, Bondi MW, Delano-Wood L, Wierenga C, Corey-Bloom J, Salmon DP, et al. Quantification of five neuropsychological approaches to defining mild cognitive impairment. Am J Geriatr Psychiatry. 2009;17:368–75.

  29. Clark LR, Delano-Wood L, Libon DJ, McDonald CR, Nation DA, Bangen KJ, et al. Are empirically-derived subtypes of mild cognitive impairment consistent with conventional subtypes? J Int Neuropsychol Soc. 2013;19:635–45.

  30. Bondi MW, Edmonds EC, Jak AJ, Clark LR, Delano-Wood L, McDonald CR, et al. Neuropsychological criteria for mild cognitive impairment improves diagnostic precision, biomarker associations, and progression rates. J Alzheimers Dis. 2014;42:275–89.

  31. Edmonds EC, Delano-Wood L, Clark LR, Jak AJ, Nation DA, McDonald CR, et al. Susceptibility of the conventional criteria for mild cognitive impairment to false-positive diagnostic errors. Alzheimers Dement. 2014;11:415–24.

  32. Yu J-T, Tan L, Hardy J. Apolipoprotein E in Alzheimer’s disease: an update. Annu Rev Neurosci. 2014;37:79–100.

  33. Jedynak BM, Lang A, Liu B, Katz E, Zhang Y, Wyman BT, et al. A computational neurodegenerative disease progression score: method and results with the Alzheimer’s disease Neuroimaging Initiative cohort. Neuroimage. 2012;63:1478–86.

  34. Gomar JJ, Bobes-Bascaran MT, Conejero-Goldberg C, Davies P, Goldberg TE. Utility of combinations of biomarkers, cognitive markers, and risk factors to predict conversion from mild cognitive impairment to Alzheimer disease in patients in the Alzheimer’s disease neuroimaging initiative. Arch Gen Psychiatry. 2011;68:961–9.

  35. Heister D, Brewer JB, Magda S, Blennow K, McEvoy LK. Predicting MCI outcome with clinically available MRI and CSF biomarkers. Neurology. 2011;77:1619–28.

  36. Landau SM, Harvey D, Madison CM, Reiman EM, Foster NL, Aisen PS, et al. Comparing predictors of conversion and decline in mild cognitive impairment. Neurology. 2010;75:230–8.

  37. Richard E, Schmand BA, Eikelenboom P, Van Gool WA. MRI and cerebrospinal fluid biomarkers for predicting progression to Alzheimer’s disease in patients with mild cognitive impairment: a diagnostic accuracy study. BMJ Open. 2013;3. doi:10.1136/bmjopen-2012-002541

  38. Stephan BCM, Tzourio C, Auriacombe S, Amieva H, Dufouil C, Alpérovitch A, et al. Usefulness of data from magnetic resonance imaging to improve prediction of dementia: population based cohort study. BMJ. 2015;350:1–10.

  39. Ramirez J, McNeely AA, Scott CJM, Masellis M, Black SE. White matter hyperintensity burden in elderly cohort studies. The Sunnybrook Dementia Study, Alzheimer Disease Neuroimaging Initiative, and Three-City Study. Alzheimers Dement. 2015. doi:10.1016/j.jalz.2015.06.1886.

  40. Gorelick PB, Scuteri A, Black SE, DeCarli C, Greenberg SM, Iadecola C, et al. Vascular contributions to cognitive impairment and dementia: a statement for healthcare professionals from the American Heart Association/American Stroke Association. Stroke. 2011;42:2672–713.

  41. Arsenault-Lapierre G, Whitehead V, Belleville S, Massoud F, Bergman H, Chertkow H. Mild cognitive impairment subcategories depend on the source of norms. J Clin Exp Neuropsychol. 2011;33:596–603.


We gratefully acknowledge financial support from the Canadian Institutes of Health Research (#125740 & #13129), the Linda C. Campbell Foundation, and Heart & Stroke Foundation Canadian Partnership for Stroke Recovery. BLC is the recipient of a L’Oréal Canada for Women in Science Research Excellence Fellowship, JR receives partial funding from the Canadian Vascular Network, and SD is a Research Scholar from the Fonds de recherche du Québec – Santé. Additionally, we graciously thank the Sunnybrook Health Sciences Centre, Hurvitz Brain Sciences Program at the Sunnybrook Research Institute, Brill Chair Neurology, and the University of Toronto for financial and salary support (SEB). We are grateful to Dr. Larry Leach who provided valuable suggestions regarding methodology. Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. 
The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health. The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at:

Author information

Corresponding author

Correspondence to Brandy L. Callahan.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

BLC: study conception and design, analysis and interpretation of data, manuscript drafting, and approval for publication; JR: contribution to conception and design of study, interpretation of data, critical revision of manuscript for important intellectual content, and approval for publication; CB: analysis and interpretation of data, critical revision of manuscript for important intellectual content, and approval for publication; SD: contribution to interpretation of data, critical revision of manuscript for important intellectual content, and approval for publication; SEB: contribution to interpretation of data, critical revision of manuscript for important intellectual content, and approval for publication. All authors agree to be accountable for all aspects of the work.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Callahan, B.L., Ramirez, J., Berezuk, C. et al. Predicting Alzheimer's disease development: a comparison of cognitive criteria and associated neuroimaging biomarkers. Alz Res Therapy 7, 68 (2015).
