Imperfect Invariance to Object Translation in the Discrimination of Complex Shapes

The positional specificity of short-term visual memory for a variety of 3-D shapes was investigated in a series of ‘same’/‘different’ discrimination experiments, with computer-rendered stimuli displayed either at the same or at different locations in the visual field. For animal-like shapes, we found complete translation invariance, regardless of the interstimulus similarity, and irrespective of direction and size of the displacement (experiments 1 and 2). Invariance to translation was obtained also with animal-like stimuli that had been ‘scrambled’ by randomizing the relative locations of their parts (experiment 3). The invariance broke down when the stimuli were made to differ in their composition, but not in the shapes of the corresponding parts (experiments 4 and 5). We interpret this pattern of findings in the context of several current theories of recognition, focusing in particular on the issue of the representation of object structure.

1 Introduction
Any visual system that aims to attain object constancy must address Höffding's problem: how to treat equivalently two images in which the same object appears in different locations in the field of view (Neisser 1967). The common phrasing of this problem suggests that the human visual system is expected to achieve constancy or invariance in the face of translation as a matter of routine (even when normalization of the stimulus location through eye movements is discounted). In practice, the issues surrounding translation invariance turn out to be more complicated. In the present study we explore the conditions under which translation invariance in ‘same’/‘different’ shape discrimination starts to break down.

Prior results
Our work has been motivated by reports of non-invariant processing of shapes, which qualify the notion of constancy under translation. Early evidence of an effect of translation on shape perception was provided by Wallach and Austin-Adams (1954). In that study, briefly shown 2-D shapes primed the subjects' perception of ambiguous figures displayed at the same location as the prime, but not the perception of figures shown at an analogous location in a different quadrant. A similar confinement of the facilitation effect to a quadrant has been found in the subliminal priming study of Bar and Biederman (1998).
Data from a number of other experiments, mostly involving 2-D patterns, also indicate that the recognition of novel complex stimuli is not completely invariant under translation (Foster and Kahn 1985; Nazir and O'Regan 1990; Fahle 1997a, 1997b). If, for example, subjects have to determine whether two sequentially flashed random-dot clouds are same or different, decisions are faster and more frequently correct when both stimuli are presented at the same rather than at different locations in the visual field (Foster and Kahn 1985; Cave et al 1994; Dill and Fahle 1997a). This displacement effect has been shown to be gradual (ie larger displacements produce poorer performance), and to be specific for ‘same’ trials. Control experiments ruled out explanations in terms of afterimages, eye movements, and shifts of spatial attention (Dill and Fahle 1997a). Similarly, Larsen and Bundesen (1998) found that the d′ measure of performance decreased monotonically with spatial separation, but only if the patterns differed both by a translation and a rotation.
While same–different matching involves only short-term memory in the range of a few seconds, Nazir and O'Regan (1990) also found positional specificity in learning experiments that lasted at least several minutes. They trained subjects to discriminate a complex dot pattern from a number of distractors. Training was restricted to a single location in the parafoveal field of view. Having reached a criterion of 95% correct responses, subjects were tested at three different locations: the training position, the center of the fovea, and the symmetric location in the opposite visual hemisphere. Discrimination accuracy dropped significantly for the two transfer locations, while at the control location the learned discrimination was not different from the training criterion. Dill and Fahle (1997b) have isolated two components of the learning effect in this case. Immediately after the first few trials, subjects recognize patterns at a level clearly above chance. From this rapidly reached level, performance increases in a much slower learning process, until the accuracy criterion is reached. This learning process can last up to several hundred trials. Dill and Fahle showed that accuracy at transfer locations is at about the same level as the performance at the beginning of the slower learning process. This suggests that the fast component (immediate recognition) is translation-invariant, while the slower process involving perceptual learning is much more specific to the location of the training.

The role of novelty
The basic requirement imposed on the stimuli in psychophysical studies of invariance is novelty: if the stimuli are familiar, the subjects are likely to have been exposed to their transformed versions prior to the experiments. Because of this constraint, stimuli both in same–different matching and in learning studies tend to be somewhat unnatural and difficult to process. With more familiar patterns, the performance may well be insensitive to retinal translation. Biederman and Cooper (1991) tested subjects with line drawings of familiar objects and asked them to name the object. Repeated presentation reduced the naming latency in a manner independent of the relative location in the visual field of the priming and the test presentations. Part of the priming effect, however, may have been non-visual: Biederman and Cooper found a reduction of the naming latency also if a different instance of the same object class was presented (eg a flying bird instead of a perched one). As pointed out by Jolicoeur and Humphrey (1998), the visual part of the priming effect may be too small to detect an influence of position, size, or other transformations.
Stimulus novelty as such does not automatically lead to a breakdown of translation invariance. Novel 3-D shapes of the ‘paper-clip’ variety, for which significant deterioration of recognition with rotation in depth had been previously reported (Edelman and Bülthoff 1992), yielded no effects of position (Bricolo and Bülthoff 1992, 1993; Bricolo 1996). Interestingly, reducing the paper-clips to 3-D ‘clouds’ of points by omitting the limbs that connect the vertices brought back the translation effect, in line with the results of Nazir and O'Regan (1990) and Dill and Fahle (1997b).
In view of all these findings, which range from complete invariance to pronounced effects of translation, much additional work is needed to characterize the conditions under which invariance is to be expected, and to understand the processes that support it. We set out, therefore, to investigate the issue of translation invariance (or the lack thereof), using as stimuli tightly controlled animal-like shapes (cf figure 1) that were previously found effective in studying the influence of rotation in depth on object recognition (Edelman 1995a). Here, we report experimental results indicating (i) that translation invariance in the discrimination of complex 3-D objects can be imperfect, and (ii) that the imperfection manifests itself when structure, rather than local information, has to be discriminated.

Experiment 1: Discrimination of animal-like objects
In the first experiment we tested positional specificity of ‘same’/‘different’ discrimination among six animal-like shapes (shown in the left column in figure 1). An earlier study with this class of objects had yielded highly significant effects of orientation in depth on discrimination rate (Edelman 1995a). Our goal was to determine whether translation has a similar effect on performance.
2.1 Methods
2.1.1 Subjects. Ten observers participated in experiment 1. Except for the first author, they were undergraduate or graduate students from the Massachusetts Institute of Technology, who either volunteered or were paid for their participation in 1 h sessions. All observers had normal or corrected-to-normal vision. At the beginning of a session, observers were shown examples of the animal stimuli and were informed about the design of the experiment (type and locations of stimuli, presentation sequence, and task). They were instructed to keep steady fixation throughout each trial. All subjects were explicitly told that their decisions on pattern identity in each trial should be independent of the stimulus position and should rely only on the identity of the animal.
2.1.2 Apparatus and stimuli. The stimuli were generated and displayed on a Silicon Graphics Indigo workstation (19-inch color monitor; refresh rate 120 Hz). The display was viewed binocularly at a distance of 60 cm. Each animal-like stimulus shape was defined by a set of 56 parameters representing characteristics such as length, diameter, and orientation of individual limbs (Edelman 1995a). Six animal classes were used throughout the experiments; see figure 1. Stimulus images were about 3 deg wide and 2 deg high, and could appear at four locations in the upper left, lower left, upper right, or lower right quadrant (always at an eccentricity of about 4 deg). The objects in the images were always tilted and slanted in depth at 45° relative to the observer. The surface color of the animal objects was yellow; the background was dark gray and covered the entire computer screen. The stimuli were presented for 100 ms, a time too short to foveate the stimulus by a rapid saccade (Saslow 1967). To avoid afterimages due to delayed phosphor decay, a stimulus presentation was always immediately followed by four masks. These were composed of 20 random cylinders each, and were presented simultaneously at the four possible stimulus locations for 300 ms. Fixation was aided by a yellow spot of about 0.1 deg diameter in the middle of the screen. Decisions were communicated by pressing the left (for ‘same’) or the right (for ‘different’) mouse button. A computer beep provided negative feedback immediately after incorrect responses.

Figure 1. Three levels of similarity (columns) for the six animal-like computer-generated objects. The left column shows the original animals. The similarity between different animals, ie within one column, is increasingly larger in the middle and right columns. The shapes in the top and the bottom rows in this figure are the original objects used in the study of Edelman (1995a); the others are parametric variations, courtesy of T Sugihara, Riken Institute.

Experimental design.
At the beginning of a trial, the fixation spot appeared for 1 s, followed by the brief display of the first animal stimulus at one location, and the random-cylinder masks displayed at all four locations. After the second presentation of the fixation spot (1 s), the second animal either appeared at the same location (control) or at one of the other three positions corresponding to lateral, vertical, and diagonal transfer (figure 2). Lateral and vertical transfer corresponded to displacements of about 5.5 deg, while the diagonal displacement was 8 deg. The onset asynchrony of the two animal stimuli was 1.4 s. This long interval and the employment of masks after the first and second stimuli abolish the effects of apparent motion and iconic afterimages (Phillips 1974). For each ‘same’ trial, the computer randomly chose one of the six animals; in ‘different’ trials, two different animals were randomly selected. Successive trials were separated by a 1 s interval.
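The stated displacements can be checked against the stimulus layout. The short sketch below assumes (the text does not say so explicitly) that the four stimulus locations lie on the 45° diagonals through the fixation spot, each at an eccentricity of 4 deg; the names `locations` and `displacement` are illustrative only. Under that assumption the lateral and vertical displacements come out near 5.7 deg, in line with the reported value of about 5.5 deg, and the diagonal displacement is 8 deg.

```python
import math

ECCENTRICITY_DEG = 4.0  # from the text: "an eccentricity of about 4 deg"

# Assumption: each location sits on a 45-degree diagonal through fixation,
# so its horizontal and vertical offsets from fixation are equal.
offset = ECCENTRICITY_DEG / math.sqrt(2)
locations = {
    "upper_left": (-offset, offset),
    "upper_right": (offset, offset),
    "lower_left": (-offset, -offset),
    "lower_right": (offset, -offset),
}


def displacement(a, b):
    """Distance in degrees of visual angle between two stimulus locations."""
    return math.dist(locations[a], locations[b])


lateral = displacement("upper_left", "upper_right")
vertical = displacement("upper_left", "lower_left")
diagonal = displacement("upper_left", "lower_right")
print(f"lateral {lateral:.2f}, vertical {vertical:.2f}, diagonal {diagonal:.2f} deg")
```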
Experiment 1 comprised 288 trials, divided into three blocks. Observers initiated a block by pressing a mouse button. Trials in each block were balanced for identity (same versus different), quadrant in the visual field, and four displacement conditions (control, and lateral, vertical, or diagonal transfer), which were presented in a randomized order.

Figure 2. The second stimulus may appear in the same location (control condition), or it may be translated to generate the lateral, vertical, and diagonal conditions.
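The balancing of trials in experiment 1 can be illustrated with a minimal sketch. It rests on several assumptions that go beyond the text: that "quadrant" refers to the location of the first stimulus, that each of the 2 × 4 × 4 = 32 cells is repeated three times per 96-trial block (which follows arithmetically from 288 trials in three blocks, but is not stated), and that animals are drawn uniformly at random. All names (`make_block`, `second_quadrant`, the dictionary keys) are hypothetical.

```python
import itertools
import random

IDENTITIES = ("same", "different")
QUADRANTS = ("upper_left", "lower_left", "upper_right", "lower_right")
DISPLACEMENTS = ("control", "lateral", "vertical", "diagonal")
ANIMALS = tuple(range(6))  # indices standing in for the six animal shapes


def second_quadrant(first, displacement):
    """Quadrant of the second stimulus, given the first and the condition."""
    vert, horiz = first.split("_")
    if displacement in ("lateral", "diagonal"):
        horiz = "right" if horiz == "left" else "left"
    if displacement in ("vertical", "diagonal"):
        vert = "lower" if vert == "upper" else "upper"
    return f"{vert}_{horiz}"


def make_block(reps_per_cell=3, rng=None):
    """Build one randomized block, balanced over identity, quadrant, displacement."""
    if rng is None:
        rng = random.Random()
    trials = []
    for identity, quadrant, disp in itertools.product(
        IDENTITIES, QUADRANTS, DISPLACEMENTS
    ):
        for _ in range(reps_per_cell):
            if identity == "same":
                first = second = rng.choice(ANIMALS)
            else:
                first, second = rng.sample(ANIMALS, 2)  # two distinct animals
            trials.append({
                "identity": identity,
                "displacement": disp,
                "first_quadrant": quadrant,
                "second_quadrant": second_quadrant(quadrant, disp),
                "first_animal": first,
                "second_animal": second,
            })
    rng.shuffle(trials)  # randomized presentation order within the block
    return trials


blocks = [make_block(rng=random.Random(seed)) for seed in range(3)]
print(sum(len(b) for b in blocks))  # total number of trials across three blocks
```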

Results
For each of the subjects in this and all the following experiments, percentages of correct responses and mean response times (RTs) were calculated separately for each of the four displacement (control, lateral, vertical, diagonal) and two identity (same, different) conditions. Trials with RTs longer than 3 s (0.42%) were discarded prior to any further analysis. The mean RT was 488 ms; the correct response rates ranged from 77.0% to 94.5% (mean 85.0%). Figure 3 shows the correct response rates, averaged across the ten observers. For ‘same’ trials, a 6.9% difference was observed between the control condition, ie when both animals were presented at the same location, and the mean of the three transfer conditions (89.4% compared to 82.5%). For ‘different’ trials, all conditions yielded approximately the same performance. These qualitative observations were confirmed by a two-way analysis of variance (ANOVA), testing the influence of translation (control, lateral, vertical, diagonal) and identity (same, different) variables. Neither of the main effects was significant, but there was a marginal interaction (F(3,72) = 2.17, p < 0.1), reflecting differential effects of transfer in ‘same’ versus ‘different’ trials. A posteriori effects estimated separately for ‘same’ trials turned out to be significant for the following contrasts: control versus others (F = 4.1, p < 0.05), control versus diagonal (F = 4.5, p < 0.04); the contrast for control versus lateral was marginal (F = 2.8, p < 0.10).
In this and the other experiments described below, we also computed, for each level of translation, a bias-free measure of performance derived from signal detection theory (Green and Swets 1966): d′ = z(H) − z(F). Here, H is the ‘hit’ rate (in the context of the present task, the proportion of correct responses in identity ‘same’ trials), F is the ‘false alarm’ rate (proportion of errors in identity ‘different’ trials), and z is the inverse normal cumulative probability function (z-score). The mean d′ in experiment 1 was 2.2, with no discernible effects of translation. Finally, there were no significant RT effects, and no indication of a speed–accuracy tradeoff.
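The sensitivity measure defined above, the difference between the z-scores of the hit rate and the false-alarm rate, can be computed as in the following minimal sketch. The function name `d_prime` and the example rates are hypothetical and are not values from the experiments. The sketch does not handle hit or false-alarm rates of exactly 0 or 1, for which the z-score is undefined; how such cells were treated in the study is not stated here.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index: z(H) - z(F), with z the inverse normal CDF.

    hit_rate: proportion of correct responses in 'same' trials.
    false_alarm_rate: proportion of errors in 'different' trials.
    Both must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates, for illustration only.
print(round(d_prime(0.85, 0.15), 2))
```

Because the same criterion shift raises or lowers both rates together, a displacement that lowers accuracy in ‘same’ trials while raising it in ‘different’ trials can leave this measure nearly unchanged.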

Discussion
In experiment 1, a weak effect of translation could be discerned only when the analysis was restricted to ‘same’ trials. The overall influence of translation was not significant, as indicated by the lack of effect on d′. It should be noted that in many previous studies of invariance, analysis has been restricted to ‘same’ trials. Following a variety of arguments (eg that a ‘different’ trial does not uniquely correspond to a particular kind of ‘same’ trial, or that recognition can only be investigated for matches, but not for non-matches), ‘different’ trials were either discarded completely or only mentioned in footnotes or appendices. Given the complex nature of decision processes in same–different experiments, such omission of ‘different’ trials may lead to an overestimation of the effects, and may result in a wrong interpretation of the available data.
We chose to concentrate on two possible reasons for the difference between our results and the published findings of incomplete translation invariance for dot cloud and checkerboard stimuli. First, the task in experiment 1 may have been too easy. Dill and Fahle (1997a) report that increasing the similarity between stimuli leads to more pronounced positional specificity. Likewise, Edelman (1995b) found that detrimental effects of changes in orientation are larger for similar than for more distinct objects. The animal shapes may have been too easy to discriminate to allow for any significant effect of translation. Second, although the shapes generated by our computer graphics program may not look entirely natural, the class of real animal objects which inspired them is familiar to human observers. It is likely that our subjects had had prior exposure to thousands of animal images, and they may have seen these images at many different locations in the visual field. Any positional specificity that may be observed with novel stimuli could long be lost for a familiar object class, owing to this pre-experimental learning process. In the remaining experiments, we examine each of these possibilities in turn.

Experiment 2: Similarity and invariance
In experiment 2 we investigated the influence of similarity among animal-like shapes on translation invariance. As pointed out above, evidence from translation studies with dot clouds indicates that a higher degree of stimulus similarity can lead to a stronger effect of stimulus displacement. Because Edelman (1995a) also found an interaction between similarity and invariance for animal-like shapes, we expected to detect a more pronounced positional specificity following an increase in the similarity of the six animals. To test this idea, we created three sets of six animals by interpolating between the original six animals and a ‘mean’ shape that was computed by averaging each of the 56 model parameters across the six class prototypes. We tested each subject with all three levels of similarity (corresponding to the three columns in figure 1). To neutralize serial presentation effects, half of the subjects started with the easiest discrimination task, and then proceeded to the intermediate and the most difficult tests. The remaining observers were tested with the difficult (highly similar) stimulus set first.
3.1 Method
3.1.1 Stimuli. The same apparatus and stimulus conditions as in experiment 1 were used. To control the level of similarity, we varied the parametric difference between the six animals. For that purpose, the mean 56-parameter vector was computed by averaging the six animal vectors. The experimental objects were then made by interpolating between each of the six original parameter vectors and the mean-animal vector. Under this scheme, the smaller the distance between the interpolated objects and the mean animal, the higher the similarity between the interpolated shapes. We varied this distance by multiplying the parametric difference between the mean and the original vectors by a constant (dis)similarity factor f. Three different factor values were used for the experiment: f = 1 (corresponding to the original animals), f = 0.7, and f = 0.4 (note that f = 0 would have produced six interpolated animals identical to the mean).
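The interpolation scheme just described amounts to scaling each animal's deviation from the mean parameter vector by the factor f. The sketch below illustrates this with toy three-parameter vectors in place of the actual 56-parameter animal models, which are not reproduced here; the function names and the numerical values are hypothetical.

```python
def mean_vector(vectors):
    """Average a list of equal-length parameter vectors, parameter by parameter."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def interpolate(original, mean, f):
    """Return mean + f * (original - mean).

    f = 1 reproduces the original shape; f = 0 collapses it onto the mean.
    """
    return [m + f * (x - m) for x, m in zip(original, mean)]


# Toy stand-ins for the six animal parameter vectors (hypothetical values).
animals = [
    [1.0, 4.0, 2.0],
    [3.0, 2.0, 6.0],
    [5.0, 0.0, 4.0],
    [2.0, 6.0, 8.0],
    [4.0, 8.0, 0.0],
    [3.0, 4.0, 4.0],
]
mean_animal = mean_vector(animals)

# One stimulus set per (dis)similarity factor used in experiment 2.
stimulus_sets = {
    f: [interpolate(a, mean_animal, f) for a in animals] for f in (1.0, 0.7, 0.4)
}
```

Since every deviation from the mean is multiplied by f, all pairwise parametric distances between the six shapes shrink by the same factor, which is what makes the f = 0.4 set the most similar.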
3.1.2 Experimental design. Each subject was tested in three partial experiments, each with stimuli of a single similarity level only. Half of the sixteen subjects started with the original animals (low similarity), followed by medium and high similarity levels, while the remaining subjects were tested in the opposite order. Each part of the experiment consisted of 192 trials, separated into two blocks, and lasted about 15 min. Between successive parts, the subjects were offered a short break. Individual trials followed exactly the same design as in experiment 1. Except for the first author, none of the subjects in this experiment had participated in experiment 1.

Results
Trials with RTs longer than 3 s (1.2%) were discarded prior to any further analysis. The mean RT was 548 ms; the correct response rates ranged from 61.1% to 91.5% (mean 78.6%).
The mean accuracy results are shown in figure 4, which suggests a decrement in the mean correct rate and a strengthening of the effects of translation with increased similarity. A three-way ANOVA (translation × identity × similarity) indicated that similarity of the animals strongly affected performance (F(2,360) = 52.8, p < 0.001). Not surprisingly, performance was the best when animal shapes were the least similar to each other (dissimilarity factor f = 1.0). As in experiment 1, translation and identity had no significant main effects (F < 1), but interacted strongly with each other (F(3,360) = 15.1, p < 0.001). Other interactions were not significant.
Separate two-way ANOVAs (translation × similarity, by identity) revealed, for identity ‘different’, a significant effect of translation only in the high-similarity condition (F(3,60) = 3.4, p < 0.02). For identity ‘same’, the effect of translation was significant at all three similarity levels (for f = 0.4, or high similarity: F(3,60) = 4.9, p < 0.004; for f = 0.7: F(3,60) = 5.3, p < 0.0026; for f = 1: F(3,60) = 4.5, p < 0.0065). These results were confirmed by a posteriori contrast analysis.
Although translation had strong effects on correct rate in this experiment, the effects were opposite for ‘same’ and for ‘different’ trials. This conclusion is consistent with the observed pattern of d′ values. A general linear models analysis (used instead of regular ANOVA because of the presence of unbalanced cells in the d′ data) revealed only one significant effect, that of similarity (F(2,163) = 36.0, p < 0.0001). The mean d′ values were 1.1 for f = 0.4, 1.9 for f = 0.7, and 2.2 for f = 1.0. Similarity was also the only significant effect for RT (F(2,360) = 3.9, p < 0.0215).
As noted above, we had separated our pool of subjects into two, to control for possible serial adaptation effects: eight observers proceeded from easy to difficult tasks, and the other eight were tested in the opposite order. The effects described above for the complete data set were identical for the two subgroups.

Discussion
Even more than in experiment 1, the effect of translation in ‘same’ trials in experiment 2 was offset by a nearly opposite effect in ‘different’ trials. Increasing the similarity among the stimuli made both these effects more pronounced. This result is different from the observations made by Dill and Fahle (1997a), who found that positional specificity increased with a decrease in the discriminability of random-dot clouds and checkerboard patterns. In this sense, the recognition of novel, complex patterns seems to be qualitatively different from the recognition of more familiar objects such as our animal-like stimuli, regardless of the similarity of the latter to each other.

4 Experiment 3: ‘Scrambled’ animals, local feature cues
One major difference between our first two experiments and the earlier studies with complex random patterns (Foster and Kahn 1985; Dill and Fahle 1997a) was the general prior familiarity of the subjects with animal-like shapes. Although our computer-rendered shapes were not naturalistic copies of real animals, subjects readily named the animals when being introduced to the experiment and the stimuli. Experiment 3 was designed to test whether the familiarity of the objects, ie their resemblance to already experienced real or toy animals, leads to robust translation invariance that is not observed for novel patterns. To reduce familiarity and still be able to compare results directly with the above two experiments, we rendered the six animals as sets of ‘limbs’, while randomizing the location of limbs relative to each other. This produced ‘scrambled’ animals that contained the same ‘local’ features (limbs) as the original ones, but did not form a meaningful object (see figure 5). Additionally, since the configuration of the limbs could be changed from trial to trial, repetition of the stimuli and the possible concomitant learning effects were avoided.

Method
The same apparatus and stimulus conditions as in experiment 1 were used. Scrambled animals were designed from the same set of limbs as the animal models in experiment 1. Instead of composing the seven parts (head, body, two forelegs, two hind legs, tail) into complete 3-D animal objects, each one was translated by small random amounts in three mutually orthogonal directions. In ‘different’ trials, the second scrambled animal differed from the first one parametrically, in the shapes of its parts. The random scrambling, however, was the same for both animals: homologous parts (eg the heads) were shifted by the same amount in both stimuli. Thus, the subjects could base their discrimination decision on local shape cues. For each trial, the displacement of part types was newly randomized. The design of the individual trials, presentation times, masking, etc were exactly as in experiment 1. Eight subjects were tested in three blocks of 96 trials each. Only two of these subjects had participated in experiment 1 or 2.
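The scrambling procedure of experiment 3 can be sketched as follows: one set of random 3-D offsets is drawn per trial and applied to the homologous parts of both stimuli, so that the two scrambled animals share a spatial layout and can differ only in the shapes of their parts. The offset distribution (uniform) and its magnitude (`max_shift`) are assumptions; the text specifies only "small random amounts". Part positions are reduced here to a single 3-D point per part, and all names are hypothetical.

```python
import random

PARTS = (
    "head", "body", "left_foreleg", "right_foreleg",
    "left_hind_leg", "right_hind_leg", "tail",
)


def random_offsets(max_shift=1.0, rng=None):
    """Draw one random 3-D shift per part (uniform distribution is an assumption)."""
    if rng is None:
        rng = random.Random()
    return {
        part: tuple(rng.uniform(-max_shift, max_shift) for _ in range(3))
        for part in PARTS
    }


def scramble(part_positions, offsets):
    """Shift each part's position by the offset drawn for that part type."""
    return {
        part: tuple(p + d for p, d in zip(part_positions[part], offsets[part]))
        for part in PARTS
    }


# Within a trial the same offsets are applied to both stimuli; a fresh set of
# offsets is drawn for the next trial.
rng = random.Random(0)
offsets = random_offsets(rng=rng)
intact_layout = {part: (0.0, 0.0, 0.0) for part in PARTS}  # placeholder positions
first_stimulus = scramble(intact_layout, offsets)
second_stimulus = scramble(intact_layout, offsets)
```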

Results and discussion
Trials with RTs longer than 3 s (0.47%) were discarded prior to any further analysis. The mean RT was 485 ms; the correct response rates ranged from 62.8% to 80.6% (mean 71.4%).
A two-way ANOVA (translation × identity) revealed, as before, a strong interaction between these two variables (F(3,56) = 8.3, p < 0.0001), and a main effect of identity (F(1,56) = 20.6, p < 0.0001); the main effect of translation was not significant. As can be seen in figure 6, this stemmed from opposing tendencies at the translation control level, where in ‘different’ trials the performance was at chance, whereas in ‘same’ trials it was as high as 86% (in the three translation conditions, correct rates were flat and nearly the same in ‘same’ and ‘different’ trials). The difficulty of the task involving scrambled animal shapes is indicated by the low mean d′ value of 1.2 (diagonal: 1.1; control: 1.2; horizontal: 1.3; vertical: 1.4). The effect of translation on d′ was not significant. The RT data also showed no significant effects. As in the other experiments, there was no indication of a speed–accuracy tradeoff.
The results of experiment 3 suggest that the meaningful shape of the animal-like stimuli was unlikely to have been responsible for the pattern of performance found in experiments 1 and 2 (namely, a decrease of correct rate with translation only in ‘same’ trials, and no effect on d′). The scrambling of the animal-like shapes did not, by itself, result in positional specificity. In the next section, we introduce a simple variation on the scrambling method that does lead to a significant overall effect of translation.

Experiment 4: ‘Scrambled’ animals, global configuration cues
Both the identities of local features of an object and their spatial relations can help discriminate it from other objects. In experiment 3, the spatial relations among the parts were identical for the two stimuli in any given trial; the pair only differed in the shapes of the parts employed. In experiment 4, we created a complementary situation: both scrambled animals in a given trial were now composed of identically shaped parts, and differed only in their spatial arrangement. If, for example, the first stimulus was a particular scrambled monkey, then the second stimulus was a differently scrambled monkey (cf the rows in figure 5). In comparison, in experiment 3, the second object would have been, for example, a scrambled dog or mouse (cf the columns in figure 5). Both experiments, therefore, employed the same type of scrambled objects, but separated the effects of local features (part shapes) from those of their global layout (part configuration).

Method
Experiment 4 involved exactly the same experimental procedure as experiment 3, including the same kind of scrambled animals. However, unlike in experiment 3, the stimuli in each trial always consisted of the same set of parts, scrambled in two different manners. Experiment 4 was carried out at two separate locations, with two distinct subject populations, as detailed below. Experiment 4a, conducted at the Massachusetts Institute of Technology, involved nine subjects, three of whom were new to this experiment series; the remaining six had already participated in this study. Experiment 4b, conducted at Cornell University, involved thirteen subjects, all undergraduates enrolled in a summer course. The data from these two experiments are presented separately in the appendix; an analysis of the pooled data appears next.

Results
Data from the sixteen subjects whose correct rate exceeded 55% were included in this analysis. Trials with RTs longer than 3 s (1.0%) were omitted from further analysis. The mean RT was 494 ms; the correct response rates for these sixteen subjects ranged from 58.0% to 81.5% (mean 69.9%).
A two-way ANOVA (translation × identity) yielded a marginal main effect of translation (correct rate dropping from 73.7% for control to 66.7% for diagonal; F(3,120) = 2.5, p < 0.06); a main effect of identity (correct rate of 67.7% for ‘different’ versus 72.1% for ‘same’; F(1,120) = 5.6, p < 0.02), and the by now familiar very strong interaction (F(3,120) = 21.8, p < 0.0001). Figure 7 shows a pattern similar to that of experiments 4a and 4b (cf figure A1 in the appendix): in ‘same’ trials correct rate in the control condition was better than in the three translation conditions, with a relatively uniform performance across the four conditions in ‘different’ trials.
The mean d′ in experiment 4 was 1.2. The mean for the diagonal condition was 0.9, for horizontal 1.0, for vertical 1.1, and for control 1.5. Importantly, the effect of translation on d′ estimated by ANOVA was significant (F(3,60) = 4.8, p < 0.005). A posteriori contrasts between control and the other conditions, which were all significant, confirmed this outcome. The RT data showed no significant effects. As in the other experiments, there was no indication of a speed–accuracy tradeoff.

Discussion
The seemingly slight modification of the task between experiments 3 and 4 (from discrimination by local features to discrimination by their spatial configuration) produced a considerable difference in the results. In experiment 4, the occurrence of a particular part was not diagnostic for discrimination, unless via a chance occlusion. Unlike the discrimination by local features in experiment 3, the performance based on configurational (structural) cues was not completely invariant to translation.
It is tempting to attribute this distinction to two different subsystems (or stages) of object vision: one that is translation-invariant and allows recognition of local features and one that is at least partially position-specific and is responsible for the processing of configurational or structural information. Note that achieving translation-invariant recognition of a particular stimulus feature implies downplaying its position in the visual field. To be able to discriminate objects solely on the basis of the spatial relations of some simpler features, the system may have to rely on evidence from mechanisms that are not fully shift-invariant. We shall return to this point in the general discussion.

Experiment 5: Chimerae
To explore further the distinction between local and global or structural shape representation, in experiment 5 we used a new class of chimeric objects by randomly combining parts from different animals (see figure 8). Aside from random similarity to ‘regular’ animals, these chimerae were difficult to categorize into familiar animal classes. Another major difference with respect to experiments 1 and 2 was that new chimerae could be created for each new trial, thereby avoiding the development of a classification scheme by the subject. Note that the identification of a particular feature (eg the head) in two chimerae does not necessarily indicate that they are identical, because all the other features may still be different. Subjects, therefore, were forced to attend to the entire configuration of each chimera.

6.1 Method
Experiment 5 followed the same basic design as that of experiment 1. The single difference between the two experiments was that while in experiment 1 only the original set of six animals was used, random mixtures of the original models were composed for experiment 5. Each chimera was produced by randomly choosing four components (head, body and tail, forelegs, hind legs) each from one of the six animals. For example, a stimulus could consist of the head of the tiger, body and tail of the monkey, forelegs of the mouse, and hind legs of the horse. In each trial, new components were chosen at random. In ‘different’ trials, both chimerae were randomly different. Experiment 5, like the previous one, was carried out at two separate locations, with two distinct subject populations. Experiment 5a, conducted at the Massachusetts Institute of Technology, involved eight subjects. Experiment 5b, conducted at Cornell University, involved fourteen undergraduates enrolled in a summer course. As before, the data from these two experiments are presented separately in the appendix. An analysis of the pooled data from the twenty-two subjects appears next.
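The construction of chimerae can be sketched as below. Each of the four components is assigned a source animal drawn independently from the six models. Two points are assumptions rather than details given in the text: that the draws are uniform, and that in ‘different’ trials the second chimera is redrawn until it differs from the first in at least one component. Animals are represented by indices 0 to 5, and the function names are hypothetical.

```python
import random

COMPONENTS = ("head", "body_and_tail", "forelegs", "hind_legs")
N_ANIMALS = 6  # the six original animal models, referred to by index


def make_chimera(rng):
    """Pick a source animal for each component, independently and uniformly."""
    return {component: rng.randrange(N_ANIMALS) for component in COMPONENTS}


def make_trial(identity, rng=None):
    """Return the pair of chimerae for a 'same' or a 'different' trial."""
    if rng is None:
        rng = random.Random()
    first = make_chimera(rng)
    if identity == "same":
        second = dict(first)
    else:
        # Assumption: redraw until at least one component differs.
        second = make_chimera(rng)
        while second == first:
            second = make_chimera(rng)
    return first, second


first, second = make_trial("different", rng=random.Random(0))
shared = [c for c in COMPONENTS if first[c] == second[c]]
```

Under these assumptions two chimerae in a ‘different’ trial may still share some components (the list `shared` above), so matching a single part is not sufficient to decide that the two stimuli are identical.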

Results
Trials with RTs longer than 3 s (1.8%) were discarded prior to further analysis. The mean RT was 486 ms; the correct response rates ranged from 64.1% to 87.8% (mean 76.7%).
A two-way ANOVA (translation × identity) showed only the interaction as significant (F(3, 168) = 7.4, p < 0.0001). Figure 9 reveals a decrease in correct rate with translation in ‘same’ trials (and a slight increase in ‘different’ trials). As before, we conducted separate ANOVAs by identity; for ‘different’ trials, the effect of translation was not significant (p > 0.3); in comparison, an ANOVA for identity ‘same’ resulted in a highly significant effect of translation (F(3, 84) = 7.5, p < 0.0002).
The mean d′ in experiment 5 was 1.6. The mean for the diagonal condition was 1.5, for horizontal 1.6, for vertical 1.5, and for control 1.9. The effect of translation on d′ was marginal (F(3, 84) = 2.2, p < 0.09); a posteriori contrasts also showed differences between control and the other conditions (at p < 0.06 or better). The RT data showed no significant effects. As in the other experiments, there was no indication of a speed–accuracy tradeoff.
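For readers unfamiliar with the sensitivity index reported above, the following sketch shows the textbook yes/no form of d′, computed as the difference between the z-transformed hit and false-alarm rates. This is an assumption about the general form of the measure: the section does not state which response counted as a hit or whether a model specific to ‘same’/‘different’ designs was used, and the function name is illustrative.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Textbook sensitivity index: z(hit rate) - z(false-alarm rate).

    Both rates must lie strictly between 0 and 1; the inverse normal CDF
    is undefined at the endpoints.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates, not data from the study.
example = d_prime(0.80, 0.25)
```

Unlike the percentage of correct responses, this index separates sensitivity from any overall bias toward one of the two responses.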

General discussion
In the present study we examined the degree of translation invariance in ‘same’/‘different’ discrimination of complex 3-D objects. Our major findings can be summarized as follows. First, translation invariance is more likely to hold in trials where the correct response is ‘different’ than in ‘same’ trials. In other words, it is more difficult to label two objects as same when they are spatially offset; labeling objects as different depends less on the spatial displacement. Second, translation invariance holds for 3-D stimuli (both familiar and unfamiliar) that can be discriminated on the basis of local shape information, but not for stimuli whose only distinguishing cues are configurational or structural. Rather than confirming or refuting in toto earlier reports of translation invariance (Biederman and Cooper 1991; Bricolo and Bülthoff 1992; Bricolo 1996) or of position specificity (Foster and Kahn 1985; Nazir and O'Regan 1990; Cave et al 1994; Fahle 1997a, 1997b; Larsen and Bundesen 1998), our results provide a certain insight into the mechanism that underlies the comparison of objects related through translation. Apparently, this mechanism can produce behavior that is more invariant or less so, depending on the conditions. The pattern of results of experiments 1 through 5 suggests that this mechanism treats local cues and configurational or structural information differentially.
It is difficult to reconcile this conclusion either with the structural theories of object representation, which predict complete invariance to translation (Biederman 1987), or with the holistic appearance-based theories, which predict imperfect invariance across the board (Edelman 1995b). There does exist, however, a computational approach to object representation that embodies a distinction between local features and structural information analogous to the one that emerges from our data.
In computer vision, this approach takes the form of appearance-based methods modified to treat structure explicitly. For example, Burl et al (1998) combine ‘local photometry’ (features that are basically templates for small snippets of images) with ‘global geometry’ (a probabilistic quantification of spatial relations between pairs or triplets of features). Likewise, Camps et al (1998) represent objects in terms of appearance-based ‘parts’ (6) and their approximate relations. In both these methods, recognition and categorization are based on an interplay of local shape cues and approximate location information. Such hybrid methods constitute an attractive alternative to holistic appearance-based models, if only because they may eventually meet the systematicity (7) challenge for shape representation, without opting for the problematic structural approach (Edelman 1999; Edelman and Intrator 2000).
(6) Actually, projections of image fragments onto the principal components of stacks of such fragments.
(7) The problem of systematicity (Fodor 1998) refers to the need to represent and manipulate structure explicitly, so that making sense of an object composed of, say, a cube positioned above a sphere would entail an equally successful processing of a sphere above a cube (Hummel 2000).

In the present context, a separate treatment of local and structural cues may explain why the former, but not the latter, support translation-invariant processing (cf our experiments 1 through 3 on the one hand, and experiments 4 and 5 on the other hand). Suppose that ‘units’ selectively responding to local cues (according to the models mentioned above, such cues can be as simple as random snippets of images) are replicated throughout the visual field, and, moreover, that the receptive field of each such unit is spatially localized (rather than extending over all of the central visual field). The response of an ensemble of such units would signal ‘where’ in addition to ‘what’ the stimulus components are, and would, therefore, carry information sufficient for recognition and categorization, as well as for other tasks that may require explicit representation of shape structure (Edelman and Intrator 2000). Importantly, in such a system structure would be represented in a distributed fashion; unlike local features (presumably replicated all over the visual field), it would not, therefore, be amenable to translation-invariant priming, which is precisely what we found in the present study.
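The argument of the preceding paragraph can be made concrete with a deliberately simplified toy example. It is not the model of Edelman and Intrator (2000), nor any of the computer-vision methods cited above: the one-dimensional ‘visual field’, the part labels, and both coding functions are hypothetical stand-ins. The sketch contrasts a code that pools local cues over location with one that binds each cue to its position.

```python
from collections import Counter


def bag_of_parts(stimulus):
    """Location-pooled code: which local parts are present, ignoring where."""
    return Counter(part for part, _pos in stimulus)


def what_where(stimulus):
    """Location-bound code: each part tagged with its absolute position."""
    return frozenset(stimulus)


def translate(stimulus, shift):
    """Displace every part of the stimulus by the same amount."""
    return [(part, pos + shift) for part, pos in stimulus]


# Hypothetical 1-D stimuli: (part label, position in the visual field).
obj_a = [("head", 0), ("body", 1), ("legs", 2)]
obj_b = [("legs", 0), ("body", 1), ("head", 2)]  # same parts, rearranged
shifted_a = translate(obj_a, 5)

# The pooled code survives translation, but cannot tell obj_a from obj_b.
local_invariant = bag_of_parts(obj_a) == bag_of_parts(shifted_a)
local_tells_apart = bag_of_parts(obj_a) != bag_of_parts(obj_b)

# The bound code tells obj_a from obj_b, but changes under translation.
struct_invariant = what_where(obj_a) == what_where(shifted_a)
struct_tells_apart = what_where(obj_a) != what_where(obj_b)
```

In this toy setting, objects that differ in their parts can be matched across locations using the pooled code alone, whereas objects that differ only in arrangement require the position-bound code, which does not match itself after a displacement.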
Encouragingly, neuronal mechanisms corresponding functionally to the shape-tuned ‘units’ have been described by a number of groups (eg Fujita et al 1992; Logothetis et al 1995; see the reviews in Logothetis and Sheinberg 1996; Rolls 1996; and Tanaka 1996). In line with the human psychophysics, most of the shape-tuned cells in the monkey respond selectively to some particular views of an object, and nearly equally to a range of stimulus sizes and locations (Tovee et al 1994; Ito et al 1995; Logothetis et al 1995). Finally, the ‘what+where’ cells needed specifically for implementing the structure representation scheme outlined above (Edelman and Intrator 2000) have also been found, in cortical areas V4 and posterior IT (Kobatake and Tanaka 1994; Ito et al 1995), and in the prefrontal cortex (Rao et al 1997; Rainer et al 1998). A practical test of these intriguing parallels between psychophysics, neurobiology, and computational considerations should be possible, once the ‘what+where’ scheme is implemented as a working model.

a posteriori contrast between control and diagonal conditions was significant at a p < 0.05 level. The RT data showed no significant effects. As in the other experiments, there was no indication of a speed–accuracy tradeoff.

Experiment 5a
Eight observers participated in experiment 5a, which consisted of three blocks of 96 trials each. Experiments 4a and 5a were run with the same subjects on a single day, separated by a 5–10 min break. Four of the subjects were tested with scrambled animals first, and another four started with the chimerae (one of the subjects participated only in experiment 4). No difference between the two groups could be detected in the results.
The single trial with RT longer than 3 s was discarded prior to further analysis. The mean RT was 396 ms; the correct response rates ranged from 67.7% to 87.8% (mean 78.8%).
A two-way ANOVA (translation × identity) revealed a significant main effect of identity (F(1, 56) = 12.9, p < 0.0007), no main effect of translation, and a significant interaction (F(3, 56) = 4.1, p < 0.01). Figure A2a shows that, as in experiment 4, in ‘same’ trials the correct rate in the ‘control’ condition was better than in the three translation conditions; in ‘different’ trials the performance was quite uniform across the four conditions. In a separate ANOVA for identity ‘different’, the effect of translation did not reach significance (F < 1); in comparison, an ANOVA for identity ‘same’ resulted in a strong effect of translation (F(3, 28) = 4.9, p < 0.008).
The mean d′ in experiment 5a was 1.8. The mean for the diagonal condition was 1.6, for horizontal 1.8, for vertical 1.6, and for control 2.1. The effect of translation on d′ was not significant (p > 0.3), but a posteriori contrasts did show a marginal difference between control and the other conditions (p < 0.09). The RT data showed no significant effects. As in the other experiments, there was no indication of a speed–accuracy tradeoff.

Experiment 5b
Fourteen observers participated in experiment 5b, which consisted of three blocks of 96 trials each. Trials with RTs longer than 3 s (2.7%) were discarded prior to further analysis. The mean RT was 537 ms; the correct response rates ranged from 64.1% to 86.7% (mean 75.6%).
A two-way ANOVA (translation × identity) showed only the interaction as significant (F(3, 104) = 4.0, p < 0.01). Figure A2b reveals the same pattern as in experiment 5a, albeit with larger standard errors. As before, we conducted separate ANOVAs by identity; for ‘different’ trials, the effect of translation was not significant (F < 1); in comparison, an ANOVA for identity ‘same’ resulted in a significant effect of translation (F(3, 52) = 3.4, p < 0.02).
The mean d′ in experiment 5b was 1.5. The mean for the diagonal condition was 1.4, for horizontal 1.4, for vertical 1.5, and for control 1.8. The effect of translation on d′ was not significant (p > 0.3), but a posteriori contrasts did show a marginal difference between control and the other conditions (p < 0.07), as in experiment 5a. The RT data showed no significant effects. As in the other experiments, there was no indication of a speed–accuracy tradeoff.

On pooling the data
A visual comparison between the results of experiments 4a and 4b (figure A1) and between those of 5a and 5b (figure A2) reveals a qualitative similarity between the performance patterns of the different groups of subjects involved in the original and repeated experiments. Specifically, while the performance in ‘different’ trials (dashed lines) showed little dependence on translation, the percentage of correct responses in ‘same’ trials (solid lines) decreased with translation. This visual impression is supported by quantitative analyses: the ANOVA produced the same significant identity × translation interaction in experiment 4a as in 4b, and again in 5a as in 5b. Thus, the pooling of the data that we reported in sections 5 and 6 is perfectly reasonable.