The Descriptor Differential Scale: applying psychophysical principles to clinical pain assessment

The Descriptor Differential Scale (DDS) applies psychophysical principles to clinical pain assessment. It contains 12 descriptor items for each pain dimension assessed. For each item, subjects indicate whether their pain is equal in magnitude to that implied by the anchoring descriptor or, if not, how much greater or lesser it is on a 10-point graphic scale. The method permits collection of multiple responses, reducing scaling error, and assesses both pain magnitude and scaling consistency. Ninety-one patients completed the sensory intensity and unpleasantness forms of the DDS at both 1 and 2 h after surgical extraction of a lower third molar. Results show that the DDS satisfies standard psychometric criteria for reliability, objectivity and item homogeneity, and that the coefficients improve after elimination of inconsistent profiles.


Introduction
With the notable exception of the McGill Pain Questionnaire [11], most measures of clinical pain magnitude use a single measurement obtained from a visual-analog, numerical, or verbal scale of pain intensity. These simple scales are readily completed and easily scored, and have produced reliable data in many investigations of pain and pain control methods.
Despite their apparent utility, these scales treat pain as a simple unitary dimension, varying only in intensity. They also provide only one measurement per trial, potentially resulting in increased variability in comparison to psychophysical procedures that collect 20-100 observations per trial. In addition, they rely on a spatial judgment or a specific response choice. Consequently they are vulnerable to several scaling biases, including the tendency to use the same category or part of a line repeatedly (reducing sensitivity in the assessment of analgesics) or to remember a past discrete response (rather than the magnitude of previous pain in a study of pain memory).
This study presents data from the Descriptor Differential Scale (DDS), which is based on verbal descriptors used extensively for experimental pain assessment. Previous studies demonstrated the reliability, objectivity and validity of verbal scales of sensory intensity (e.g., mild, moderate, intense) and unpleasantness (e.g., annoying, unpleasant, distressing) quantified by ratio-scaling procedures [4,5]. Additional experiments using pain sensations produced by electrical stimulation of the skin or teeth or by heat applied to the skin have shown that these dimensions respond differentially to various manipulations, including administration of narcotic analgesics [2,3], information about expected sensations [12], or hypnotic suggestion [10].
Conventional psychophysical methods present multiple stimuli of varying intensities and collect multiple responses comparing the sensations evoked by these stimuli to an implicit or explicit subjective standard. These methods cannot be applied directly to the assessment of clinical pain, since there is only a single, uncontrolled 'stimulus.' The DDS, however, performs the converse operation. It presents multiple subjective standard stimuli and collects responses comparing these to the clinical pain stimulus.
Like psychophysical procedures, it collects multiple responses that both minimize response variability and allow an evaluation of scaling consistency.
The multiple descriptors cover the entire pain range, minimizing floor and ceiling effects. These features suggest that the DDS may be less sensitive to biases associated with the use of a single discrete or visual analog response. In addition, the evaluation of scaling consistency may identify non-complying patients or those who perform poorly in a scaling task. These patients can compromise experimental studies of pain and analgesia and may provide unreliable information in a clinical evaluation.

Methods
Ninety-one dental patients aged 18-39 completed the sensory intensity and unpleasantness forms of the DDS at 1 and 2 h after surgical removal of a lower third molar under lidocaine with 1:100,000 epinephrine local anesthesia. The sensory intensity form is shown in Fig. 1. Each descriptor item is centered over 21 horizontal dashes with a minus sign ('-') over the left dash and a plus sign ('+') over the right dash. Subjects indicate their pain in relation to each descriptor by checking a single dash. A check on the dash under the descriptor indicates pain sensation equal to the intensity implied by the descriptor, a check on a dash to the left indicates pain intensity proportionally less than that implied by the descriptor, and a check to the right indicates pain intensity proportionally greater.
If their pain is greater than that implied by the descriptor, they check a dash to the right that indicates the degree to which their pain is greater. If their pain magnitude is less than that implied by the descriptor, they make an appropriate check to the left of the center dash. The result for each descriptor is a rating of pain magnitude on a 0-20 scale that indicates the amount of pain in relation to that descriptor, with 10 indicating a magnitude equal to that implied by the word. The unpleasantness form is similar, with 12 randomly placed descriptors of unpleasantness (see Table I) in place of sensory intensity descriptors.
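The scoring rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the convention of coding each check as a signed offset from the center dash (negative to the left, positive to the right) are hypothetical and are not taken from the original scoring procedure.

```python
from statistics import mean

CENTER_SCORE = 10   # a check on the center dash: pain equal to the descriptor
MAX_OFFSET = 10     # 10 dashes on either side of the center (21 in total)
N_ITEMS = 12        # descriptor items per DDS form


def score_from_offset(offset: int) -> int:
    """Convert a signed dash offset (-10..+10) to a 0-20 item score.

    Assumption: offsets to the left of the center dash are negative
    (less pain than the descriptor implies), offsets to the right positive.
    """
    if not -MAX_OFFSET <= offset <= MAX_OFFSET:
        raise ValueError("offset must lie between -10 and +10")
    return CENTER_SCORE + offset


def total_score(offsets: list[int]) -> float:
    """Mean 0-20 score over the 12 descriptor items of one form."""
    if len(offsets) != N_ITEMS:
        raise ValueError("a DDS form has 12 descriptor items")
    return mean(score_from_offset(o) for o in offsets)
```

A form on which every descriptor is checked at the center dash would therefore yield a total score of 10.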

Reliability
Responses to each DDS form were recorded for each individual descriptor item and for a total mean score averaged over all 12 items. Figs. 2 and 3 compare the total scores for all patients obtained at hours 1 and 2 for the sensory intensity and unpleasantness forms. Means and standard errors at hours 1 and 2 were 10.4 ± 0.40 and 10.3 ± 0.37 for sensory intensity and 8.7 ± 0.41 and 9.1 ± 0.38 for unpleasantness. The Pearson product-moment correlations between hours 1 and 2 were 0.82 for sensory intensity and 0.78 for unpleasantness. Inspection of the figures shows a linear relation with outliers reporting minimal pain at hour 1, or showing a decrease in pain between hours 1 and 2. This measure provides an indication of repeat reliability.

Fig. 3. Unpleasantness repeat reliability. Unpleasantness scores obtained at hour 2 (mean 9.1, S.E. 0.38) are plotted against those at hour 1 (mean 8.7, S.E. 0.41) for 91 subjects. Scores above the line of unit slope increased, and those below decreased, over the hour. Repeat reliability, shown by the correlation between the scores, is 0.78. The open circles identify individuals who scored 4 or greater on any SDC consistency measure. Eliminating these subjects increased reliability to 0.83.

Item reliability and homogeneity
Table I shows Pearson product-moment correlations between hours 1 and 2 for individual items. These item reliability coefficients ranged from 0.61 to 0.83 (mean, 0.71) for the sensory intensity scale and 0.43 to 0.71 (mean, 0.59) for the unpleasantness scale. Item homogeneity can be expressed by correlating the scores on each item with the overall scale score computed from all other items [1]. These Pearson product-moment homogeneity coefficients also are shown in Table I. They range from 0.64 to 0.91 for sensory intensity and from 0.58 to 0.91 for unpleasantness.
The sensory intensity items show an 'inverted-U' function with coefficients maximal for the middle descriptors and decreasing for extreme descriptors. This pattern is found at both hours 1 and 2 (R = 0.73). The unpleasantness descriptors do not show this pattern to the same degree.

Table I. The first column shows Pearson product-moment correlations between hours 1 and 2 for each individual descriptor item and for a total score for the whole form. The second column shows the same correlations for 'consistent profiles' (SDC scores less than 4 on the sensory intensity (n = 75) or unpleasantness (n = 65) form). Columns 3-6 show homogeneity coefficients for each item, defined as the Pearson product-moment correlation between the score on that item and an average score from the remaining items. Columns 3 and 5 show coefficients for all subjects for hours 1 and 2; columns 4 and 6 show the same correlations for 'consistent profiles' (SDC scores less than 4 on the sensory intensity (n = 75) or unpleasantness (n = 65) form). The repeat reliability of these homogeneity coefficients is shown by Pearson product-moment correlations between hours 1 and 2.

... indicates a high degree of correspondence between this individual's responses and those of a similar group using the same response methodology.
It also demonstrates a unique feature of this method. In addition to scaling pain magnitude, this procedure simultaneously determines the amount of intensity or unpleasantness implied by each descriptor. This property, described in detail in an unpublished report [9], allows comparisons of the word values produced during a pain rating with (1) word values determined separately by either the same or different procedures (Fig. 5A), (2) word values from a group in the same situation (Fig. 5B), or (3) word values determined previously by that individual (Fig. 5C). These comparisons check if subjects use the words in a manner consistent with their prior usage as well as with the group usage. Fig. 5A and B compare the responses of a single illustrative individual to those of a group. This comparison results in a measure of 'to-group' consistency.
In contrast, Fig. 5C shows a measure of 'individual' consistency.
The same individual's responses, taken at hour 2, are plotted against that individual's responses observed at hour 1. The high correlation (R = 0.96) between these responses indicates that this subject was consistent with his previous response choices.
The individual-consistency plot shown in Fig. 5C provides information about both consistency and analgesia.
Analgesia is assessed by the decrease in vertical distance from the diagonal in this figure. The reduction in vertical distance indicates that for each individual item, the reported pain after an intervention (shown on the ordinate) is less than the pain reported before the intervention (shown on the abscissa). Analgesia also may be assessed simply as the reduction in total response following administration of a putative analgesic. Scaling consistency can be assessed by the correlation method described above or by other mathematical models. Kwilosz et al. [8] described a simple linear model that assesses consistency for each subject using an index equivalent to the standard deviation of the vertical distance from the diagonal for each descriptor data point. This standard deviation consistency (SDC) index is inversely related to consistency. Consistency decreases as the standard deviation increases; a standard deviation of 0 indicates data points parallel to the diagonal, representing perfect consistency.
Using this model, analgesia is represented by the mean of the differences from the diagonal, and performance consistency is represented by the standard deviation of these differences.

Fig. 6. Each point is the response to a specific sensory intensity descriptor identified in Table I. A shows the responses of an individual at hour 2 plotted against mean responses from the other 90 subjects. B shows responses of this individual at hour 2 against responses made at hour 1. Both plots show inconsistent responding; the points are non-linear.
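The linear model summarized above reduces to two statistics on the item-wise differences, and a minimal sketch may make this concrete. The function name is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption; the text specifies only "an index equivalent to the standard deviation" and does not state which form Kwilosz et al. [8] used.

```python
from statistics import mean, stdev


def analgesia_and_sdc(before: list[float], after: list[float]) -> tuple[float, float]:
    """Return (mean difference, SDC) for one subject.

    before, after: 0-20 ratings of the same descriptor items, in the same
    order, on the abscissa (earlier) and ordinate (later) of the plot.
    The vertical distance from the diagonal for each item is after - before.
    Assumption: SDC is computed as the sample standard deviation.
    """
    if len(before) != len(after):
        raise ValueError("both ratings must cover the same items")
    diffs = [a - b for b, a in zip(before, after)]
    # Mean shift from the diagonal: negative values mean less reported pain.
    # Spread around that shift: 0 means points parallel to the diagonal.
    return mean(diffs), stdev(diffs)
```

For a subject whose ratings all fall by exactly 2 points, this sketch returns a mean difference of -2 and an SDC of 0, corresponding to a uniform reduction in pain with perfect consistency.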
The 'to-group' and 'individual' standard deviation consistency measures (SDC) for Fig. 5B and C are 1.09 and 1.45, respectively. Fig. 6 shows another subject, whose to-group (A) and individual (B) SDC measures are 5.83 and 8.16. This subject used the descriptors in a manner inconsistent with both himself and a similar group.
Fig. 7 shows 'to-group' consistency scores for the sensory intensity form observed at hour 2 plotted against those found at hour 1. Mean SDC and standard errors were 2.45 ± 0.16 and 2.65 ± 0.16 for hours 1 and 2, respectively. Most subjects show high consistency (low SDC), as indicated by the bunching of the scores near the origin. A few subjects performed poorly during the first, second, or both hours. The present analysis uses an SDC criterion of 4 to classify inconsistent subjects. This criterion, shown by the dashed lines, separates the data points bunched near the origin from clear outliers. Statistically, this criterion is greater than 2.5 standard deviation units from the mean to-group or individual SDCs for either the sensory intensity or unpleasantness scales. Thus these inconsistent cases fall in the right tail that contains less than 1% of a normally distributed population. Inspection of individual response patterns of subjects with to-group SDC scores greater than 4 (see Fig. 6B) shows that they could not or did not adequately perform the scaling task. Analysis of subjects with to-group SDCs less than 4 shows that SDC improved significantly (t (74) = 2.12, P < 0.05) between hours 1 and 2.
Unpleasantness scores show a similar plot (hour 1 mean and standard error = 2.92, 0.15; hour 2, 2.39, 0.12). Performance of subjects with to-group SDCs less than 4 also improved between hours 1 and 2 (t (65) = 3.32, P < 0.01). Thus for both sensory intensity and unpleasantness forms, subjects became more consistent with group usage between the first and second assessment times.
These results suggest that these scales can identify individual cases of poor consistency. Elimination of these cases would not bias outcome, since to-group SDC was uncorrelated with pain magnitude for both the sensory intensity (hour 1, R = 0.09; hour 2, R = 0.11) and unpleasantness (hour 1, R = 0.24; hour 2, R = 0.12) forms.
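The statement that a criterion more than 2.5 standard deviation units above the mean isolates less than 1% of a normal population can be checked with a short calculation. The sketch below assumes only the normality stated in the text; the function name is hypothetical.

```python
import math


def upper_tail_probability(z: float) -> float:
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Proportion of a normal population lying more than 2.5 SD above the mean.
tail = upper_tail_probability(2.5)
print(f"{tail:.4f}")  # about 0.0062, i.e. less than 1%
```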

Fig. 8 shows individual SDC computed between hours 1 and 2 plotted against the to-group SDC at hour 2 for the sensory intensity form. This plot assesses item objectivity by comparing variance within an individual to variance between an individual and a similar group. This figure shows results similar to the to-group × to-group plot shown in Fig. 7. Scores bunch at the low end of the unit diagonal and a cut-off of 4 identifies outliers who did not perform consistently either with themselves or with a group norm. The equivalence of the individual (mean 2.51) and to-group (2.45) SDC scores indicates high item objectivity, since the similarity between individually determined and group-determined word scales is the same as the similarity of individually determined word scales over time. An individual's use of the verbal items is predicted a priori equally well by that individual's prior usage or by the usage of a similar group.

Fig. 8. ... (individual SDC) and between hour 2 and a mean response from the rest of the group at hour 2 (to-group SDC). This figure shows within-SDC plotted against between-SDC for the sensory intensity scores from all subjects. It compares how well a subject's responses at hour 2 are predicted by his or her own responses at hour 1 versus how well they are predicted by a group norm. The dashed lines identify subjects with SDCs less than 4 that are consistent with both themselves and a group norm. The remaining subjects were inconsistent with either themselves, the group norm, or both.
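The two consistency measures compared in Fig. 8 can be sketched as follows. This is an illustration rather than the authors' procedure: the function names and data layout (one list of 12 item scores per subject) are hypothetical, the sample standard deviation is assumed, and the group norm is taken as the item-wise mean of all other subjects, following the "rest of the group" wording of the figure caption.

```python
from statistics import mean, stdev


def sdc(reference: list[float], responses: list[float]) -> float:
    """Standard deviation of item-wise differences (sample SD is an assumption)."""
    return stdev(r - ref for ref, r in zip(reference, responses))


def individual_sdc(hour1: list[float], hour2: list[float]) -> float:
    """Consistency of a subject's hour-2 responses with his or her own hour-1 responses."""
    return sdc(hour1, hour2)


def to_group_sdc(all_hour2: list[list[float]], subject: int) -> float:
    """Consistency of one subject's hour-2 responses with the mean of all other subjects."""
    others = [row for i, row in enumerate(all_hour2) if i != subject]
    group_norm = [mean(column) for column in zip(*others)]
    return sdc(group_norm, all_hour2[subject])


def is_consistent(individual: float, to_group: float, cutoff: float = 4.0) -> bool:
    """A profile is kept only if both SDCs fall below the cut-off of 4."""
    return individual < cutoff and to_group < cutoff
```

Under this sketch a subject who rates every item one point above the group norm has a to-group SDC of 0, since a constant shift reflects a difference in pain magnitude rather than inconsistent use of the descriptors.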

Subjects with an individual or a to-group SDC of 4 or greater were classified as 'inconsistent.' Eliminating these subjects improved sensory intensity repeat reliability from 0.82 to 0.84 and unpleasantness repeat reliability from 0.78 to 0.83. These subjects are identified in Figs. 2 and 3. Many show a paradoxical decrease in pain between hours 1 and 2. Table I shows the effect of inconsistent subjects on item reliability and homogeneity.
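The effect of removing inconsistent profiles on repeat reliability can be illustrated with a small sketch. The function names and the toy data in the usage note are hypothetical; only the Pearson product-moment correlation and the cut-off of 4 come from the text.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def repeat_reliability(hour1, hour2, worst_sdc=None, cutoff=4.0):
    """Correlate hour-1 and hour-2 total scores across subjects.

    worst_sdc: optional per-subject maximum of the SDC measures; subjects
    at or above the cut-off are dropped before correlating.
    """
    keep = [i for i in range(len(hour1))
            if worst_sdc is None or worst_sdc[i] < cutoff]
    return pearson_r([hour1[i] for i in keep], [hour2[i] for i in keep])
```

With invented totals such as `hour1 = [5, 8, 10, 12, 14]` and `hour2 = [6, 9, 11, 13, 3]`, where the last subject has an SDC of 6, the coefficient computed on all five subjects is low, and excluding that subject raises it.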

Discussion
These results show that the psychophysically based DDS satisfies standard psychometric criteria. Pain intensity and unpleasantness were reliable over a 1 h period for both mean scores and for scores derived from individual items. The obtained whole-form and item reliability coefficients can be influenced by several factors. Because of the 1 h retest interval, patients may remember and mark their previous responses, resulting in artifactually high reliability. This possibility is unlikely considering the nature of the task and the obtained item reliabilities.
The scales are designed to reduce memory effects by presenting multiple descriptors in randomized sequence and requiring a 21-point rating scale response in relation to each descriptor.
The ability to remember previous responses is assumed to be less than in scales in which specific items are chosen. This difficulty and the range of the reliability values shown in Table I strongly suggest that the duplication of previous responses was not a significant factor in this study.
The use of repeat correlations as reliability coefficients also assumes that pain levels varied considerably between patients and varied little within patients. The known variability of postoperative pain magnitude and the limited range of postoperative dental pain in comparison to more severe pain syndromes should lower repeat correlations. Thus, the obtained reliability is likely a conservative estimate, less than that expected with a population exhibiting a broad range of chronic pain magnitudes.
Item analyses further showed that each item was highly correlated with a mean scale derived from the other items, indicating that each scale is composed of a homogeneous item pool. The sensory form showed a unique pattern of item-scale correlations, maximal for descriptors of intermediate pain intensity and decreasing for more extreme intensities.
This pattern suggests that the intermediate descriptors such as 'moderate' may provide a more representative measure than extreme descriptors such as 'faint' or 'extremely intense.' Alternatively, they may reflect floor and ceiling effects; the extreme dashes are checked more often for these descriptor items than for the less extreme items. Expanding the scale beyond 21 response possibilities may eliminate this effect. The unpleasantness item-scale correlations did not show the 'inverted U' pattern to the same degree as the sensory items. However, these correlations appeared to be specific to each descriptor, since the relative pattern did not vary over time.
These stable patterns suggest that certain descriptors in the DDS may provide a better measure of the dimension assessed. As with other test instruments, overall homogeneity could be improved by an iterative process in which less homogeneous items are replaced by new items in a series of validation stages.
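The homogeneity coefficient used in the item analyses (the correlation of each item with the mean of the remaining items) can be sketched as below. The function names and data layout are hypothetical; the sketch shows only how items with low coefficients might be flagged as candidates for replacement, not a validated revision procedure.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_homogeneity(scores: list[list[float]]) -> list[float]:
    """Correlate each item with the mean of all other items.

    scores[s][j] is the 0-20 rating of subject s on descriptor item j.
    """
    n_items = len(scores[0])
    coefficients = []
    for j in range(n_items):
        item = [row[j] for row in scores]
        # Mean of the remaining items, so the item is not correlated with itself.
        rest = [(sum(row) - row[j]) / (n_items - 1) for row in scores]
        coefficients.append(pearson_r(item, rest))
    return coefficients


def least_homogeneous_item(scores: list[list[float]]) -> int:
    """Index of the item with the lowest homogeneity coefficient."""
    coefficients = item_homogeneity(scores)
    return coefficients.index(min(coefficients))
```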
The DDS also produces independent measures of scaling consistency, a feature found in many psychometric instruments but not systematically applied to pain assessment. The structure of the DDS permits several types of consistency measures, including a correlation method and linear or non-linear variance models. The present analysis used a linear variance model [8] that assessed performance consistency either between an individual and a group norm or within an individual over time. The equivalence of these individual and to-group scores indicates that the magnitude associated with each descriptor item is similar within the group, supporting previous experimental findings [4]. These performance scores identified invalid profiles from subjects who either did not or could not scale their pain in a consistent manner. The correlation between the hour 1 and 2 consistency scores and the performance improvement over time suggest that scaling consistency may be a skill that improves with practice.
The reliability and homogeneity coefficients were improved by the elimination of invalid profiles. This improvement suggests that the inclusion of inconsistent subjects adds little and can possibly degrade the outcome of clinical trials relying on subjective pain reports. Similar to the use of validity scales in personality research, the identification and elimination of inconsistent subjects may significantly improve the power and efficiency of clinical pain studies. The assessment of the negative influence of inconsistent subjects and the identification of factors such as intelligence, personality, social or demographic variables that predict inconsistent subjects are logical goals of further research with this instrument.
In statistical terms, the present consistency model and cut-off criterion minimize type I error (labeling subjects as inconsistent when they are not). All subjects who adequately scale their pain are included; only those who clearly do not consistently perform the scaling task are excluded. The model, as with most measures, is susceptible to type II error. It includes individuals with marginal consistency and may include individuals who produce responses that are internally consistent but not representative of their pain sensation. It identifies many, but not all, instances of poor performance.
The item homogeneity coefficients provide preliminary information about content validity. Additional preliminary evidence for content and construct validity is provided by another study of postoperative dental pain associated with third molar extractions.
The DDS was used to assess pre-, intra-, and post-operative dental pain before and after double-blind intravenous administration of placebo, naloxone, fentanyl or diazepam to patients also receiving local anesthesia. Although no pain should be experienced under local anesthesia, patients reported a significant increase in pain magnitude after naloxone in comparison to placebo. Analysis of DDS responses showed a significant difference in the unpleasantness, but not the sensory intensity, forms [6,7].
The present form of the DDS and consistency analysis are preliminary steps in the development of an adequate pain assessment tool. The linear model may not adequately assess consistency in cases of faint or very intense pain, when many extreme categories are checked. The format may be difficult for some patients and requires comprehension of the verbal items. The DDS may not be useful as a single measure in all clinical situations. It may form part of an ideal test battery that assesses pain in stages that increase in both resolution and difficulty. Consistency measures at each stage would determine the responses that are accepted. All individuals would use the simplest scales, and those capable of using more sophisticated scales would be given an opportunity to do so.
In summary, these data suggest that the DDS is a reliable instrument that assesses pain magnitude and scaling behavior. It identifies inconsistent responding, and eliminating inconsistent profiles improves its psychometric properties. Further studies are needed to assess the validity of the separate sensory intensity and unpleasantness scales, their sensitivity, and the influence of scaling consistency on the assessment of pain and analgesia.