Delirium became a diagnostic entity in American psychiatry only in 1980, when it was formally recognized in DSM-III [1]. The history of the concept before that time and the evolution of diagnostic criteria since then have been reviewed by Tucker [2] and Liptzin [3,4].
DSM-IV diagnostic criteria [5], published in 1994, were based on a literature review [4], discussion of the limitations of DSM-III-R [6], and two data sets, one at the University of Pennsylvania [7] and the other at Harvard University [8]. DSM-IV attempted to identify the core symptoms of delirium, operationalize the diagnostic criteria, and account for the presence of dementia. However, the validity (sensitivity and specificity) of the new criteria has yet to be determined among patients with or without dementia [4].
Because the validity of these criteria has implications for clinical practice and research, the primary objective of this study was to compare the sensitivity and specificity of DSM-IV diagnostic criteria for delirium with the criteria of DSM-III and ICD-10 [9] among elderly medical inpatients with or without dementia. Because there was a major change in the definition of criterion A from DSM-III-R to DSM-IV (inattention versus clouding of consciousness), a secondary objective of this study was to examine the effect that change had on sensitivity and specificity. Another secondary objective was to compare the sensitivity and specificity of different numbers of symptoms of delirium in the same population.
This study was a secondary analysis of data collected in two concurrent studies on delirium: a randomized controlled trial of management of delirium and a nonexperimental prospective study of the prognosis of delirium that included nondelirious subjects. The study was conducted at St. Mary's Hospital, a 400-bed primary acute care university-affiliated hospital in Montreal. A study nurse was responsible for patient screening and enrollment in the two studies. Only patients aged 65 years and over who were admitted from the emergency department to the medical services were included in the studies. We excluded patients with a primary diagnosis of stroke, those admitted to the oncology unit, those admitted to the intensive care unit or the cardiac monitoring unit unless they were transferred to a medical ward within 48 hours of admission, and those who did not speak English or French.
The study nurse administered the Confusion Assessment Method (CAM) [10] to subjects whose initial Short Portable Mental Status Questionnaire (SPMSQ) [11] score was 3 or more or whose nursing notes indicated symptoms of delirium. The study nurse used various sources to complete the CAM (chart, family, and nursing staff) and assessed the patient at several points in time if necessary. Delirium was diagnosed if the patient met DSM-III-R criteria for delirium. Those whose initial SPMSQ score was less than 3 and those whose initial score was 3 or more but who did not meet DSM-III-R criteria for delirium were rescreened with the SPMSQ daily for the following week. The CAM was readministered if the SPMSQ score increased or there was evidence from the nursing notes of symptoms of delirium. Nondelirious subjects were selected from among the patients who were screened for delirium and were found free of this condition. To balance the distributions of age and prior cognitive impairment among patients with and without delirium, the sampling method for nondelirious patients took into account the patient's age and initial SPMSQ score. Thus, nondelirious subjects were selected from among the patients aged 70 years and over, and only a subsample of patients with SPMSQ scores of less than 3 were included. Prior to enrollment, cognitively intact patients were asked to provide informed consent; patients with delirium or dementia were asked for their assent to participate in the study, and a family member was asked to provide informed consent.
The Delirium Index (DI) [12] and the Informant Questionnaire on Cognitive Decline in the Elderly (IQCODE) [13] were completed within 24 hours of the diagnosis by a research assistant who had no knowledge of the patient's diagnosis. The research assistant had been trained only to rate the presence and severity of the seven symptoms on the DI at one point in time.
Other measures completed by the study nurse or research assistant included the Clinical Severity of Illness [14], the Mini-Mental State Examination (MMSE) [15], the Barthel Index [16], and, for premorbid data, the Instrumental Activities of Daily Living questionnaire from the Older Americans Resources and Services Instrument (OARS) [17].
The SPMSQ is a widely used, observer-rated, 10-item questionnaire that evaluates orientation, memory, and concentration. Scores range from 0 (no impairment) to 10 (severe impairment). The test-retest reliability is reported to be 0.8 [11]. At a cut-off point of three or more errors, the instrument is reported to have a sensitivity of 0.84 and a specificity of 0.89 in identifying medical inpatients with organic brain syndromes [18].
The CAM is a structured instrument that operationalizes the ten symptoms of delirium specified in DSM-III-R: acute onset, fluctuating course, inattention, disorganized thinking, altered level of consciousness, disorientation, memory impairment, perceptual disturbances, psychomotor agitation or retardation, and sleep/wake disturbance. The CAM diagnosis of delirium was previously validated against the clinical judgment of a psychiatrist and found to have a sensitivity of 97% and a specificity of 92% [10]. In our study, the study nurse's diagnosis of delirium had a sensitivity of 0.89 and a specificity of 1.0 compared with a consensus diagnosis [19]. In an earlier study [20], interrater agreement (kappa) for the individual symptoms of delirium ranged from 0.64 to 1.0 (n = 11). In this study, kappa values ranged from 0.28 to 1.0 (n = 14). The earlier study included only severe and typical cases of delirium, whereas this study included a broader range of cases, which probably accounts for the lower kappa values.
The DI is an instrument developed by our group for the measurement of the severity of symptoms of delirium. It is based solely on observation of the individual patients, without additional information from family members, nursing staff, or the patient's medical chart. The first five questions of the MMSE constitute the minimum basis of observation. Seven of the ten symptoms assessed on the CAM (including disorders of attention, thought, consciousness, orientation, memory, perception, and psychomotor activity but excluding acute onset, fluctuation, and sleep/wake disturbance) are rated on the following impairment scale: 0 = absent, 1 = mild, 2 = moderate, 3 = severe. The total score ranges from 0 (no symptoms) to 21 (maximum severity). Patients who are unresponsive are scored at maximum severity on inattention, disorganized thinking, disorientation, and memory impairment. We established the interrater reliability, concurrent criterion validity, and sensitivity to change of the DI in 30 patients with delirium who were rated simultaneously and independently by one or two research assistants and a geriatric psychiatrist on up to four occasions. The concordance coefficient between DI ratings by psychiatrists and research assistants was 0.88, and between two research assistants it was 0.78. Pearson correlation coefficients were 0.84 between the DI and the Delirium Rating Scale (DRS) and 0.71 between change in the DI and the DRS. The interrater agreement (kappa) for the individual symptoms of delirium in this study ranged from 0.48 to 1.0 (n = 43).
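The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the instrument itself: the function name, the dictionary keys, and the input format are assumptions made for the example, and only the 0 to 3 rating scale, the seven symptoms, and the rule for unresponsive patients come from the text.

```python
from typing import Dict

# The seven DI symptoms named in the text; the key spellings are illustrative.
DI_SYMPTOMS = (
    "inattention",
    "disorganized_thinking",
    "altered_consciousness",
    "disorientation",
    "memory_impairment",
    "perceptual_disturbance",
    "psychomotor_activity",
)

# Items scored at maximum severity when the patient is unresponsive.
MAX_IF_UNRESPONSIVE = (
    "inattention",
    "disorganized_thinking",
    "disorientation",
    "memory_impairment",
)


def delirium_index_total(ratings: Dict[str, int], unresponsive: bool = False) -> int:
    """Sum the seven symptom ratings (0 = absent ... 3 = severe); range 0 to 21."""
    scores = dict(ratings)
    if unresponsive:
        # Override the four items that cannot be tested in an unresponsive patient.
        for item in MAX_IF_UNRESPONSIVE:
            scores[item] = 3
    missing = [s for s in DI_SYMPTOMS if s not in scores]
    if missing:
        raise ValueError(f"missing ratings: {missing}")
    for symptom in DI_SYMPTOMS:
        if scores[symptom] not in (0, 1, 2, 3):
            raise ValueError(f"{symptom} must be rated 0, 1, 2, or 3")
    return sum(scores[s] for s in DI_SYMPTOMS)
```

For an unresponsive patient, the caller still supplies ratings for the three items that are not overridden (consciousness, perception, and psychomotor activity).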
The IQCODE assesses the presence of dementia prior to admission on the basis of the responses of an informant who has known the patient for at least 5 years; the score is an average of the 16 item scores, each rated from 1 (much improved) to 5 (much worse). In the original publication [13], in which a cutoff of 3.32 was used to identify patients with dementia, the sensitivity was 0.79 and the specificity was 0.82. In a later validation study of a French-language version of the questionnaire conducted in Quebec [21], a cutoff of 3.6 had a sensitivity of 0.75 and a specificity of 0.96. We used a cutoff of 3.51 to identify patients with dementia.
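As an illustration of the IQCODE scoring just described, the following Python sketch averages the 16 item ratings and applies the 3.51 cutoff. The function names are hypothetical, and the text does not say whether a score exactly equal to the cutoff counts as dementia; the sketch assumes "at or above the cutoff".

```python
from typing import Sequence

IQCODE_ITEMS = 16
DEMENTIA_CUTOFF = 3.51  # cutoff used in this study


def iqcode_score(items: Sequence[int]) -> float:
    """Return the mean of the 16 item ratings (1 = much improved ... 5 = much worse)."""
    if len(items) != IQCODE_ITEMS:
        raise ValueError(f"expected {IQCODE_ITEMS} item ratings, got {len(items)}")
    if any(rating not in (1, 2, 3, 4, 5) for rating in items):
        raise ValueError("each item must be rated from 1 to 5")
    return sum(items) / len(items)


def has_dementia(items: Sequence[int], cutoff: float = DEMENTIA_CUTOFF) -> bool:
    # Assumption: a score at or above the cutoff is classified as dementia.
    return iqcode_score(items) >= cutoff
```

Passing the cutoff as a parameter makes it easy to compare classifications under the 3.32 and 3.6 cutoffs reported in the earlier validation studies.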
The Clinical Severity of Illness was assessed at baseline by the study nurse; scores range from 1 (minimal) to 9 (most severe). The MMSE measures cognitive function on a scale of 0 (poor) to 30 (excellent). The Barthel Index measures independence in personal care activities; we used a modified scoring system [22] that ranges from 0 (dependent) to 100 (independent). The OARS Instrumental Activities of Daily Living questionnaire was administered to an informant to assess premorbid function (prior to current illness but not more than 1 month before admission); the scale ranges from 0 (dependent) to 16 (independent).
Patients were classified into four groups according to their delirium and dementia status: both delirium and dementia, delirium only, dementia only, and neither delirium nor dementia. Descriptive statistics of demographic and clinical characteristics and the frequency of symptoms measured by the CAM and the DI were calculated separately for each of these groups.
The accuracy of each set of diagnostic criteria was estimated by two different approaches. In the first, we arbitrarily took DSM-III-R as the criterion standard and estimated the sensitivity, specificity, positive predictive value, and negative predictive value of DSM-III, DSM-IV, and ICD-10 criteria with respect to it. We also analyzed the data for criterion A (clouding of consciousness with inattention) in three different ways: clouding of consciousness alone; clouding of consciousness and inattention; or clouding of consciousness or inattention. We excluded from the analysis the criterion requiring a putative medical cause of delirium because this criterion was met by all of our elderly medical inpatients and therefore had no discriminant value.
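The first approach amounts to a 2 x 2 comparison against a reference diagnosis, repeated for each interpretation of criterion A. The Python sketch below illustrates that logic under simplifying assumptions: it evaluates criterion A on its own (the full criteria sets also require other symptoms), the rule names and function names are invented for the example, and the six patients are made-up data, not study data.

```python
from typing import Dict, Sequence


def criterion_a(clouding: bool, inattention: bool, rule: str) -> bool:
    """Apply one of the three interpretations of criterion A."""
    if rule == "clouding_alone":
        return clouding
    if rule == "clouding_and_inattention":
        return clouding and inattention
    if rule == "clouding_or_inattention":
        return clouding or inattention
    raise ValueError(f"unknown rule: {rule}")


def accuracy(test: Sequence[bool], reference: Sequence[bool]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV of `test` against `reference`."""
    if len(test) != len(reference):
        raise ValueError("test and reference must have the same length")
    tp = sum(1 for t, r in zip(test, reference) if t and r)
    fp = sum(1 for t, r in zip(test, reference) if t and not r)
    fn = sum(1 for t, r in zip(test, reference) if not t and r)
    tn = sum(1 for t, r in zip(test, reference) if not t and not r)

    def ratio(num: int, den: int) -> float:
        # An undefined proportion (empty denominator) is reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical patients: (clouding, inattention, reference diagnosis of delirium).
patients = [
    (True, True, True),
    (False, True, True),
    (False, True, True),
    (True, False, False),
    (False, False, False),
    (False, True, False),
]
reference = [ref for _, _, ref in patients]
for rule in ("clouding_alone", "clouding_and_inattention", "clouding_or_inattention"):
    test = [criterion_a(c, i, rule) for c, i, _ in patients]
    print(rule, accuracy(test, reference))
```

In this toy data set the "or" interpretation catches every reference case at the cost of more false positives, which is the kind of trade-off the three-way analysis is designed to expose.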
In the second approach, we used a latent class model [23,24] to simultaneously estimate the prevalence of delirium and the sensitivity and specificity of DSM-III-R and the three options for criterion A in DSM-III, DSM-IV, and ICD-10. Latent class models have been widely used to estimate prevalence and diagnostic accuracy from non-gold-standard diagnostic tests. This approach does not require DSM-III-R to be arbitrarily selected as a criterion standard and hence is more realistic, since perfect tests seldom exist in practice.
We analyzed results from pairs of tests (DSM-III-R with each of the other tests) using a latent class random effects model described in Dendukuri and Joseph [25] that adjusts for the dependence between tests and uses a Bayesian approach for inference. The Bayesian approach is advantageous because it allows us to explicitly account for prior uncertainty in the accuracy of DSM-III-R in the analysis. Prior information on the sensitivity and specificity of the CAM assessment (using DSM-III-R criteria) for a similar population was available from a previous study by our group [19]. The prior means (and 95% credible intervals) of the sensitivity and specificity of DSM-III-R were 0.83 (0.61, 0.97) and 0.94 (0.79, 0.99), respectively. An exact analytical solution is impossible for this model, so we used a Gibbs sampler, as described in Joseph et al. [26], in which random samples are drawn from the posterior distributions of the parameters by simulation methods. After ensuring convergence, summary statistics of the parameters can be estimated on the basis of these random samples. We ran 10,500 iterations of the Gibbs sampler, the first 500 to ensure convergence and the next 10,000 for inference.
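To give a sense of how such a sampler works, the Python sketch below implements a Gibbs sampler for a much simpler model than the one used here: a single imperfect test with Beta priors on prevalence, sensitivity, and specificity, and no random effects or between-test dependence. It is not the model of the analysis above. The function name is hypothetical, and any Beta parameters passed in are the caller's assumptions; only the burn-in of 500 and the 10,000 retained iterations mirror the text.

```python
import random
from typing import Dict, List, Tuple

BetaPrior = Tuple[float, float]  # (alpha, beta) parameters of a Beta prior


def _binomial(rng: random.Random, n: int, p: float) -> int:
    """Draw from Binomial(n, p) as a sum of Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def gibbs_one_test(
    positives: int,
    negatives: int,
    prior_prev: BetaPrior,
    prior_sens: BetaPrior,
    prior_spec: BetaPrior,
    burn_in: int = 500,
    keep: int = 10_000,
    seed: int = 0,
) -> Dict[str, List[float]]:
    """Simplified one-test latent class model; returns posterior draws."""
    rng = random.Random(seed)
    n = positives + negatives
    prev, sens, spec = 0.5, 0.8, 0.9  # arbitrary starting values
    draws: Dict[str, List[float]] = {"prevalence": [], "sensitivity": [], "specificity": []}
    for iteration in range(burn_in + keep):
        # Step 1: impute the latent number of true cases in each test group.
        p_case_given_pos = prev * sens / (prev * sens + (1 - prev) * (1 - spec))
        p_case_given_neg = prev * (1 - sens) / (prev * (1 - sens) + (1 - prev) * spec)
        cases_pos = _binomial(rng, positives, p_case_given_pos)
        cases_neg = _binomial(rng, negatives, p_case_given_neg)
        # Step 2: update each parameter from its conjugate Beta full conditional.
        prev = rng.betavariate(prior_prev[0] + cases_pos + cases_neg,
                               prior_prev[1] + n - cases_pos - cases_neg)
        sens = rng.betavariate(prior_sens[0] + cases_pos,
                               prior_sens[1] + cases_neg)
        spec = rng.betavariate(prior_spec[0] + negatives - cases_neg,
                               prior_spec[1] + positives - cases_pos)
        if iteration >= burn_in:  # discard the burn-in draws
            draws["prevalence"].append(prev)
            draws["sensitivity"].append(sens)
            draws["specificity"].append(spec)
    return draws
```

A call such as `gibbs_one_test(168, 154, (1, 1), (15, 3), (30, 2))` would use the numbers of DSM-III-R positive and negative patients in the study sample together with illustrative priors: a uniform prior on prevalence, and Beta(15, 3) and Beta(30, 2) priors whose means are roughly 0.83 and 0.94. These Beta parameters were chosen only to match the reported prior means approximately; they are not the priors used in the analysis, and they do not reproduce the reported credible intervals. Posterior means and credible intervals can then be read off the returned lists.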
During the study enrollment period, there were 4,085 medical admissions, of which 1,552 (38.0%) were screened for delirium. The reasons for exclusion were: admission to oncology (452), admission to intensive care or coronary care units (377), transfer to long-term care (332), language barrier (301), stroke (289), not sampled or missed (181), refused screening (164), previously enrolled in study (127), transferred or discharged (113), communication problem (105), residence outside geographic area (69), died (20), and other (3). Of the 1,552 patients screened, 187 met DSM-III-R criteria for delirium; 174 nondelirious subjects were also enrolled in the study. We excluded 19 patients with delirium and 20 nondelirious subjects because data on dementia status were missing because of failure to interview an informant, leaving 168 patients with delirium and 154 nondelirious subjects in the study sample.
At enrollment, there were significant differences between the four patient groups with respect to age, gender, living arrangements, and the measures of function and illness severity (Table 1). Patients with delirium only were more likely to be male and living at home with others. Patients with delirium, with or without dementia, had higher clinical severity and lower Barthel Index scores than the two groups without delirium. Patients with dementia only were more likely to be female.
The frequencies of symptoms of delirium using the CAM or the DI are presented in Tables 2 and 3, respectively. All symptoms of delirium (except memory impairment among patients with dementia) are more frequent among patients with delirium than patients without delirium, although the frequencies and differences in frequencies between groups with or without delirium are smaller when measured by the DI.
The sensitivity and specificity of DSM-III, DSM-IV, and ICD-10 diagnostic criteria (compared with those of DSM-III-R) are similar among patients with or without dementia, although the specificity of DSM-IV is lower (66% vs. 78%) among patients with dementia when criterion A is defined as clouding of consciousness or inattention (Table 4). The sensitivity and specificity of DSM-III, DSM-IV, and ICD-10 criteria in assessing the whole sample are greatly affected by the interpretation of criterion A involving clouding of consciousness and inattention. When clouding of consciousness alone is a required symptom or when both clouding of consciousness and inattention are required symptoms, the sensitivity of the different sets of diagnostic criteria is low. However, when either clouding of consciousness or inattention is a required symptom, the sensitivity is markedly higher.
When the sensitivity and specificity of DSM-III, DSM-IV, and ICD-10 criteria in the whole sample are compared when criterion A is interpreted as requiring either clouding of consciousness or inattention, DSM-IV criteria are the most sensitive (100%) and ICD-10 criteria are the least sensitive (61%); DSM-III and ICD-10 criteria are the most specific (91%) and DSM-IV criteria are the least specific (71%). The relatively low sensitivity of ICD-10 can probably be explained by its requiring five rather than three (DSM-IV) or four (DSM-III) psychopathology criteria for a diagnosis of delirium. The relatively low specificity of DSM-IV can be accounted for by 44 patients diagnosed with delirium according to DSM-IV criteria but not diagnosed with delirium according to DSM-III-R criteria, because of the absence of disorganized thinking. Most of these 44 patients were hypoactive and probably had little verbal production.
The results of latent class analysis for the whole sample are presented in Table 5. The sensitivity and specificity of DSM-III-R remain quite high even though the test is not perfect. However, DSM-III-R has a lower sensitivity compared with DSM-III and DSM-IV, which are more inclusive. The computed sensitivity and specificity of the remaining tests were very similar to the values presented in Table 4. This is because most subjects with inattention alone, who had delirium according to DSM-III-R but not according to the other standards, were classified as truly having delirium according to the latent class model.
The sensitivity and specificity of the number of symptoms of delirium (as determined by the CAM or the DI), irrespective of the type of symptom, are presented in Table 6. The presence of seven or more symptoms on the CAM yields a sensitivity and specificity of 98% and 76%, respectively, among patients with dementia and a sensitivity and specificity of 95% and 83%, respectively, among patients without dementia. An examination of Table 2 suggests that among these seven or more symptoms, inattention, disorganized thinking, disorientation, memory impairment, fluctuation, and acute onset were those most often present.
The presence of five or more symptoms on the DI yields a sensitivity and specificity of 61% and 85%, respectively, among patients with dementia; three or more symptoms yield a sensitivity and specificity of 83% and 63%, respectively, among patients without dementia. An examination of Table 3 suggests that among these five or more symptoms, memory impairment, disorientation, and inattention were those most often present.
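The symptom-count analysis can be pictured as sweeping a cutoff over the number of symptoms present and recomputing sensitivity and specificity at each cutoff. The Python sketch below shows the idea; the function name and the eight symptom counts are hypothetical and are not drawn from the study data.

```python
from typing import Dict, Sequence, Tuple


def accuracy_by_threshold(
    symptom_counts: Sequence[int],
    reference: Sequence[bool],
    thresholds: Sequence[int],
) -> Dict[int, Tuple[float, float]]:
    """Map each cutoff k to (sensitivity, specificity) of the rule 'k or more symptoms'."""
    if len(symptom_counts) != len(reference):
        raise ValueError("symptom_counts and reference must have the same length")
    n_cases = sum(1 for r in reference if r)
    n_noncases = len(reference) - n_cases
    if n_cases == 0 or n_noncases == 0:
        raise ValueError("need at least one case and one non-case")
    table: Dict[int, Tuple[float, float]] = {}
    for k in thresholds:
        # Cases at or above the cutoff are true positives.
        true_pos = sum(1 for c, r in zip(symptom_counts, reference) if r and c >= k)
        # Non-cases below the cutoff are true negatives.
        true_neg = sum(1 for c, r in zip(symptom_counts, reference) if not r and c < k)
        table[k] = (true_pos / n_cases, true_neg / n_noncases)
    return table


# Hypothetical symptom counts for four reference cases and four non-cases.
counts = [8, 7, 5, 3, 6, 2, 1, 4]
delirium = [True, True, True, True, False, False, False, False]
for k, (sens, spec) in accuracy_by_threshold(counts, delirium, [3, 5, 7]).items():
    print(f"{k} or more symptoms: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the cutoff trades sensitivity for specificity, so the cutoff reported for each subgroup reflects a choice along that trade-off rather than a property of the instrument alone.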
The primary purpose of this study was to compare the sensitivity and specificity of DSM-IV criteria for delirium with DSM-III and ICD-10 criteria. We found that DSM-IV criteria were more sensitive than those of DSM-III or ICD-10 but less specific, although the lower specificity was accounted for by the inclusion of patients who had been excluded by DSM-III-R criteria because they were hypoactive/nonverbal and did not demonstrate disorganized thinking. Thus, DSM-IV criteria seem to be the most inclusive. This finding applied to elderly medical inpatients with or without dementia.
In addition, our results suggest that criterion A, clouding of consciousness with inattention, is best interpreted as either clouding of consciousness or inattention in DSM-IV, DSM-III, and ICD-10. When the criterion is interpreted as clouding of consciousness alone or both clouding of consciousness and inattention, the sensitivity of the three sets of criteria decreases markedly.
The requirements for seven or more symptoms on the CAM among patients with or without dementia or five or more symptoms on the DI among patients with dementia are probably not useful. However, the requirement for only three or more DI symptoms among patients without dementia suggests that further simplification of the criteria for delirium may be possible in this population.
These results have implications for clinical practice and research. In clinical practice, use of DSM-IV criteria would probably minimize false negatives (particularly among patients with hypoactive symptoms) and encourage treating clinicians to look for possible causes of delirium that might include underlying medical illness or the toxic effects of medications. In research studies where it is important that the study population be as homogeneous as possible, there may be concerns that the use of DSM-IV criteria would lead to the inclusion of false positive cases; however, our examination of the 44 patients diagnosed with delirium according to DSM-IV but not DSM-III-R criteria suggests that the use of DSM-IV criteria identifies only patients with delirium but is more likely to include patients who are hypoactive, nonverbal, and do not show disorganized thinking.
This study has two strengths. First, the numbers of patients with or without delirium or dementia in the study were relatively large. Second, the symptoms of delirium were rated independently using two different valid and reliable instruments.
Several limitations are worth noting. First, the study was a secondary analysis of data collected for a randomized trial and a prognosis study. Second, our exclusion of patients admitted to the oncology unit or the intensive care unit and those with a diagnosis of stroke (related to the purposes of the two studies on which the analysis is based) may limit the generalizability of the results. Third, our criterion standard, DSM-III-R criteria for delirium, may be controversial; however, the results of the Bayesian analysis that accounted for the imperfect nature of the criterion standard were similar to those of the principal analysis. Fourth, the interrater agreement (kappa) values for some symptoms on the CAM and the DI were low; nonetheless, they were similar to values reported for the Delirium Symptom Interview. Fifth, we used the IQCODE to determine dementia status. The sensitivity and specificity of this instrument are high, but it has not been validated with patients presenting with delirium; nevertheless, the reported demographic and clinical characteristics were consistent with the classification into the four groups. Finally, the prevalence of delirium in this population was 52% (168 of 322 patients). Although this high prevalence is unlikely to have affected sensitivity or specificity, it could be expected to raise the positive predictive value and lower the negative predictive value.
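The dependence of predictive values on prevalence follows directly from Bayes' theorem, and a short Python sketch makes the point concrete. The sensitivity of 0.61 and specificity of 0.91 are the ICD-10 values reported above, and 0.52 is the prevalence in this sample; the comparison prevalence of 0.20 and the function name are assumptions chosen only for illustration.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied at the given prevalence."""
    for value in (sensitivity, specificity, prevalence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must lie between 0 and 1")
    # Expected proportions of the population in each cell of the 2 x 2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive value undefined for these inputs")
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same test accuracy, two prevalences: the study sample and a hypothetical lower one.
for prevalence in (0.52, 0.20):
    ppv, npv = predictive_values(0.61, 0.91, prevalence)
    print(f"prevalence={prevalence:.2f}: PPV={ppv:.3f}, NPV={npv:.3f}")
```

With sensitivity and specificity held fixed, the higher prevalence gives the larger positive predictive value and the smaller negative predictive value, which is the direction of effect noted in the final limitation.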
In conclusion, among elderly medical inpatients, DSM-IV criteria (when criterion A is interpreted as either clouding of consciousness or inattention) seem to be the most inclusive set of criteria yet proposed. Moreover, these criteria seem to be equally useful whether or not patients have dementia.