Published in Gestalt Theory 21 (2), pp 122-139, 1999.
This paper was awarded the Wolfgang Metzger Award for significant contribution to Gestalt theory.
For further developments of these ideas see
Gestalt Isomorphism and the Primacy of the Subjective Conscious Experience: A Gestalt Bubble Model,
The World In Your Head: A Gestalt view of the mechanism of conscious experience.
According to the naive realist view, the world we see around us is identified as the objective external world, even though the limitations of our senses and the properties of light allow us to experience only a small subset of the properties of that world. In other words, the naive realist view holds that the world we see is the world itself. This is the natural intuitive understanding of vision that we accept from the earliest days of childhood. The problem with this view, however, becomes clear on consideration of the role of the eye as the sense organ of vision. For the flow of visual information occurs exclusively in one direction, from the world through the eye to the brain. If the brain is the organ of consciousness, then it cannot in principle experience the world directly, but only indirectly, in response to the two-dimensional images sent to it from the eyes. This fact is in conflict with our subjective experience of objects and surfaces outside of ourselves, because our conscious experience appears to escape the confines of our physical being, to extend into the external world beyond our sensory receptors. The causal chain of vision therefore refutes the naive realist view of vision, as explained by KÖHLER (1929). It is due to this naive realist view that consciousness is often considered to be somehow mysterious, forever beyond our capacity to comprehend, for there is no known physical mechanism that could possibly account for the external nature of visual experience.
The solution to this paradox was discovered centuries ago by Immanuel KANT (1781) in the principle of epistemological dualism. KANT reasoned that we cannot actually experience the world itself as it is, but only an internal perceptual replica of the world. There are, in other words, two worlds of reality, the noumenal and the phenomenal world. The noumenal world is the objective external world, which is the source of the light that stimulates the retina. This is the world studied by science, and is populated by invisible entities such as atoms, electrons, and invisible forms of radiation. The phenomenal world is the internal perceptual world of conscious experience, which is a copy of the external world of objective reality constructed in our brain on the basis of the image received from the retina. The only way we can perceive the noumenal world is by its effects on the phenomenal world. Therefore the world we experience as external to our bodies is not actually the world itself, but only an internal virtual reality replica of that world generated by perceptual processes within our head.
Curiously, this most central issue of vision has not received much attention in recent decades, and failure to understand this most significant issue has led to endless confusion in theories of visual representation. For the naive realist view suggests a greatly simplified concept of the nature of the internal representation in vision. In the context of naive realism, introspective examination of the internal representation of vision, i.e. examination of the sensation within one's apparent head while viewing the world, reveals an abstract non-spatial entity as the internal code for external objects. This naive realist perspective therefore makes plausible many of the simplistic models of vision proposed over the centuries, and continues to cause confusion in modern neural network models of visual representation.
One reason for the persistent confusion on this issue is that even the description of the causal chain of vision is somewhat ambiguous, since it can be interpreted in two alternative ways. Consider the statement that light from this page stimulates an image in your eye which in turn promotes the formation of a percept of the page. The ambiguity inherent in this statement can be revealed by the question "where is the percept?". There are two alternative correct answers to this question, although each is correct in a different spatial context. One answer is that the percept is up in your head, which is correct in the external or naive realist context of your perceived head being identified with your objective physical head, and since your visual cortex is contained within your head, that must also be the location of the patterns of energy corresponding to your percept of the page. The problem with this answer however is that no percept is experienced within your head where you imagine your visual cortex to be located. The other correct answer is that the percept of the page is right here in front of you where you experience the image of a page. This answer is correct in the internal spatial context of the entire perceived world around you being within your head. However the problem with this answer is that there is now no evidence of the objective external page that serves as the source of the light. The problem is that the vivid spatial structure you see before you is serving two mutually inconsistent roles, both as a mental icon representing the objective external page which is the original source of the light, and as an icon of the final percept of the page; i.e. the page you see before you represents both ends of the causal chain. And our mental image of the problem switches effortlessly between the internal and external contexts to focus on each end of the causal chain in turn.
It is this automatic switching of mental context that makes this issue so elusive, because it hinders a consideration of the problem as a whole.
I propose an alternative mental image to disambiguate the two spatial contexts. I propose that out beyond the farthest things you can perceive in all directions, i.e. above the dome of the sky, and below the solid earth under your feet, or beyond the walls and ceiling of the room you see around you, is located the inner surface of your true physical skull, beyond which is an unimaginably immense external world of which the world you see around you is merely a miniature internal replica. In other words, the head you have come to know as your own is not your true physical head, but only a miniature perceptual copy of your head in a perceptual copy of the world, all of which is contained within your real head in the external objective world. This mental image is more than just a metaphorical device, for the perceived and objective worlds are not spatially superimposed, as is often assumed, but the perceived world is completely contained within your head in the objective world (KOFFKA 1935, p. 27-36). The advantage of this mental image is that it provides two separate and distinct icons for the separate and distinct internal and external worlds, that can now coexist within the same mental image. This no longer allows the automatic switching between spatial contexts that tends to confuse the issue. Furthermore, this insight emphasizes the indisputable fact that every aspect of the solid spatial world that we perceive to surround us is in fact primarily a manifestation of activity within an internal representation, and only in secondary fashion is it also representative of more distant objects and events in the external world.
Curiously, in the realm of spatial perception this very obvious principle has not been accepted in contemporary psychology. Phenomenological examination of spatial perception reveals a world composed of solid volumes bounded by colored surfaces embedded in a spatial void. Every point on every visible surface is perceived at an explicit spatial location in three dimensions, and all of the visible points on a perceived object like a cube or a sphere are perceived simultaneously in the form of continuous surfaces in depth. Furthermore, the perception of multiple transparent surfaces reveals that multiple depth values can be perceived at any spatial location. However proposed models of spatial perception very rarely allow for such an explicit representation of depth. MARR's 2½-D sketch (MARR 1982) for example encodes the spatial percept as a two-dimensional map of surface orientations, like a two-dimensional array of needles pointing normal to the perceived surface. KOENDERINK & VAN DOORN (1976, 1980, 1982) propose a representation where each point in the two-dimensional map is labeled as either elliptic, hyperbolic, or parabolic, together with a number expressing the Gaussian curvature of the perceived surface at that point. TODD & REICHEL (1989) propose an ordinal map where each point in a two-dimensional map records the order relations of depth and/or orientation among neighboring surface regions. GROSSBERG (1987a, 1987b, McLOUGHLIN & GROSSBERG 1998) proposes a depth mapping based on disparity between two-dimensional left and right eye maps. None of these compressed representations are isomorphic with our subjective perception of a full volumetric depth world. In particular, all of these representations have a problem with encoding multiple surfaces at different depths, as in the perception of transparency, or encoding the volume of empty space that is perceived between the observer and a visible surface.
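The distinction can be made concrete with a minimal sketch (a hypothetical data structure, not drawn from any of the cited models): a two-dimensional depth map necessarily stores a single depth per image location, whereas a volumetric occupancy array can represent several surfaces in depth at the same (x, y) location, as required for perceived transparency.

```python
import numpy as np

W, H, D = 8, 8, 16          # width, height, and depth resolution of the volume

# 2-D depth map: one depth value per image location (cannot encode transparency)
depth_map = np.full((W, H), D - 1)        # everything at the far plane
depth_map[2:6, 2:6] = 4                   # an opaque square at depth 4

# Volumetric representation: a boolean voxel for every (x, y, depth)
volume = np.zeros((W, H, D), dtype=bool)
volume[:, :, D - 1] = True                # the far background surface
volume[2:6, 2:6, 4] = True                # a transparent pane in front of it

# At location (3, 3) the depth map holds only one value, while the volume
# records both the pane at depth 4 and the background behind it at depth 15.
print(depth_map[3, 3])
print(np.flatnonzero(volume[3, 3]))
```

The volumetric form also represents the empty space between observer and surface explicitly, as the voxels left unset along each line of sight.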
However the abstracted or reduced representation, while undoubtedly an essential component of perception, is not sufficient by itself to account for the nature of visual experience. For the subjective experience of perception is not of an edge image, but of a filled-in surface brightness image. If the retinal ganglion cells do in fact encode only transitions of brightness across image edges, then some process downstream of the retinal image must reverse the process and fill in the surface brightness values to account for the subjective experience of visual perception. In fact the identification of this constructive or generative aspect of perception represents one of the most significant contributions of Gestalt theory.
I propose a perceptual modeling approach, i.e. to model the percept as observed subjectively rather than the neurophysiological mechanism by which it is supposedly subserved. In other words the perceptual model should be expressed in terms of solid volumes bounded by colored surfaces embedded in a spatial void, as observed in visual experience. This perceptual modeling approach must eventually converge with theories of neural representation, at which point it will be possible to relate the perceptual variables of color and shape to neurophysiological variables such as voltages or spiking frequencies as required. In fact, until a mapping is established between subjective experience and the neurophysiological state, a perceptual model is the only valid model to match to psychophysical data, which explicitly measures the subjective experience of perception rather than the corresponding neurophysiological state.
Given this kind of explicit spatial representation of subjective experience, the function of visual perception can now be expressed as a transformation from the two-dimensional visual input (or pair of two-dimensional images in the binocular case) to a solid three-dimensional volumetric representation of the spatial percept generated by that input. Whatever the neurophysiological reality of the perceptual mechanism, at least this information must be encoded neurophysiologically to account for the subjective experience of spatial perception. Merely expressing the problem in these terms eliminates a number of commonly accepted models of spatial representation.
Consider the phenomenon of perspective, for example how railroad tracks viewed in perspective appear to converge to a point in the distance. The reason why they converge has nothing to do with their objective geometrical arrangement, for parallel lines neither converge, nor do they meet at a point. However in perceived space the tracks are observed both to converge and to meet at a point, and that point is perceived at a finite distance beyond which the tracks are no longer represented. This property of perceived space is so familiar in everyday experience as to seem totally unremarkable. And yet this most prominent violation of Euclidean geometry offers clear evidence for the non-Euclidean nature of perceived space. For the two rails are perceived to be straight and parallel throughout their length, even though they are also perceived to meet at a point up ahead and behind, while at the same time passing to either side of a percipient standing between them. The tracks must therefore in some sense be perceived as being bowed, and yet while bowed, they are also perceived as being straight. This can only mean that the space itself must be curved.
The curved properties of perceived space have been quantified in psychophysical experiments dating to observations by HELMHOLTZ (1925). Subjects in a dark room were presented with a horizontal line of point lights at eye level in the frontoparallel plane, and instructed to adjust their displacement in depth until they were perceived to lie in a straight line in depth. The resultant line of lights curves inwards towards the observer, the amount of curvature being a function of the distance of the line of lights from the observer. The HILLEBRAND-BLUMENFELD alley experiments (HILLEBRAND 1902, BLUMENFELD 1913) extended this work with different configurations of lights, and mathematical analysis of the results (LUNEBURG 1950, BLANK 1958) characterized the nature of perceived space as Riemannian with constant Gaussian curvature (see GRAHAM 1965 and FOLEY 1978 for a review). In other words, perceived space bows outward around the observer, as seen in the bowed railway tracks.
The observed warping of perceived space is exactly the property that allows the finite representational space to encode an infinite external space. This property is achieved by using a variable representational scale, i.e. the ratio of the physical distance in the manifold relative to the distance in external space that it represents. This scale is observed to vary as a function of distance from the center of the manifold, such that objects close to the body are encoded at a larger representational scale than objects in the distance, and beyond a certain limiting distance the representational scale, at least in the depth dimension, falls to zero, i.e. objects beyond a certain distance lose all perceptual depth. This is seen for example where the sun and moon and distant mountains appear as if cut out of paper and pasted against the dome of the sky.
LEHAR & McLOUGHLIN (1998) propose a transformation to perceptual space using a polar coordinate system centered on the percipient, in which azimuth and elevation angles are preserved, but the radial distance is encoded in terms of vergence, or the angle of convergence between the eyes in a binocular system. In other words, point P(a,b,r) in Euclidean space is transformed to point Q(a,b,(pi-v)) in perceptual space, where a and b represent azimuth and elevation angles, while the radial distance r is compressed to the vergence representation v. For an interocular separation s, standard binocular geometry gives the vergence of a point at radial distance r as

v = 2 arctan( s / 2r )

so that v falls from pi at the egocenter to zero at infinite distance, and the perceptual radial coordinate (pi - v) maps the infinite range of r into the finite interval [0, pi).
Since azimuth and elevation angles are also closed dimensions, this transformation maps the infinity of Euclidean space into a finite spherical space as suggested in Figure 1b.
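This mapping can be sketched in a few lines of code. The interocular separation s is a hypothetical parameter, and the vergence relation v = 2·atan(s/2r) is assumed from standard binocular geometry; the point of the sketch is only that the unbounded depth axis is compressed monotonically into a finite radial coordinate.

```python
import math

def to_perceptual(azimuth, elevation, r, s=0.065):
    """Map Euclidean polar point P(a, b, r) to perceptual point Q(a, b, pi - v).

    Azimuth and elevation pass through unchanged; the radial distance r is
    replaced by (pi - v), where v is the vergence angle of a point at
    distance r for interocular separation s.
    """
    # At r = 0 the lines of sight are fully converged, so v = pi.
    v = math.pi if r == 0 else 2.0 * math.atan(s / (2.0 * r))
    return azimuth, elevation, math.pi - v

# The perceptual radius grows monotonically with r but never reaches pi:
for r in (0.0, 0.1, 1.0, 10.0, 1e9):
    print(r, round(to_perceptual(0.0, 0.0, r)[2], 4))
```

Near objects thus occupy most of the radial extent of the representation, while everything beyond a few meters is squeezed against the outer boundary, reproducing the variable representational scale described above.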
Figure 1c shows how such a compression of the depth dimension would encode the visual space around a man walking down a road.
The fact that the distortion of this space is not immediately apparent to the percipient is explained by the fact that the percipient's sense of scale is itself distorted along with the space. For example the vertical and horizontal grid lines depicted in Figure 1d would be perceived to be straight and parallel, and separated by uniform intervals.
If the reference grid of Figure 1d is used to measure lines and distances in Figure 1c, the bowed line of the road on which the man is walking is aligned with the bowed reference grid, and is therefore perceived to be straight. Likewise, the vertical walls of the houses in Figure 1c bow outwards away from the observer, but in doing so they follow the curvature of the reference grid in Figure 1d, and are therefore perceived to be both straight and vertical. Similarly, the houses in Figure 1c would be perceived to be of approximately the same size and depth, although the farther houses are experienced at a lower perceptual resolution. This distortion of the perceptual reference scale accounts for the paradoxical but familiar property of perceived space, whereby more distant objects are perceived to be both smaller, and yet at the same time undiminished in size. This corresponds to the difference in subjects' reports depending on whether they are given objective vs. projective instructions (COREN, WARD, & ENNS 1979, p. 500) in how to report their observations, showing that both types of information are available perceptually.
This "picture-in-the-head" or "Cartesian theatre" concept of visual representation has been criticized on the grounds that there would have to be a miniature observer to view this miniature internal scene, resulting in an infinite regress of observers within observers. PINKER (1984, p. 38) points out however that there is no need for an internal observer of the scene, since the internal representation is simply a data structure like any other data in a computer, except that this data is expressed in spatial form. The little man at the center of this spherical world therefore is not a miniature observer of the internal scene, but is itself a spatial percept, constructed of the same perceptual material as the rest of the spatial scene, for that scene would be incomplete without a replica of the percipient's own body in his perceived world.
However an argument can be made for the adaptive value of a neural representation of the external world that could break free of the tissue of the sensory or cortical surface in order to lock on to the more meaningful coordinates of the external world, if only a plausible mechanism could be conceived to achieve this useful property. The issue therefore is whether we have enough knowledge about the theory of information processing systems to make a judgement about the plausibility of such a rotation invariant representation of spatial structure. The history of psychology is replete with examples of plausibility arguments based on the limited technology of the time which were later invalidated by the emergence of new technologies. The outstanding achievements of modern technology, especially in the field of information processing systems, might seem to justify our confidence to judge the plausibility of proposed processing algorithms. And yet, despite the remarkable capabilities of modern computers, there remain certain classes of problems that appear to be fundamentally beyond the capacity of the digital computer. In fact the very problems that are most difficult for computers to address, such as extraction of spatial structure from a visual scene especially in the presence of attached shadows, cast shadows, specular reflections, occlusions, perspective distortions, as well as the problems of navigation in a natural environment, etc. are problems that are routinely handled by biological vision systems, even those of simpler animals. On the other hand, the kinds of problems that are easily solved by computers, such as perfect recall of vast quantities of meaningless data, perfect memory over indefinite periods, detection of the tiniest variation in otherwise identical data, exact repeatability of even the most complex computations, are the kinds of problems that are inordinately difficult for biological intelligence, even that of the most complex of animals. 
It is therefore safe to assume that the computational principles of biological vision are fundamentally different from those of digital computation, and therefore plausibility arguments predicated on contemporary concepts of what is computable are not applicable to biological vision.
Indeed many of the most difficult aspects of vision are exactly those that were characterized by the Gestalt movement. A central focus of Gestalt theory was the issue of invariance, i.e. how an object, like a square or a triangle, can be recognized regardless of its rotation, translation, or scale, or whatever its contrast polarity against the background, or whether it is depicted solid or in outline form, or whether it is defined in terms of texture, motion, or binocular disparity. The ease with which these invariances are handled in biological vision suggests that invariance is fundamental to the visual representation. Even in the absence of a neural model with the required properties, the invariance property can be encoded in a perceptual model. In the case of rotation invariance, this property can be quantified by proposing that the spatial structure of a perceived object and its orientation are encoded as separable variables. This would allow the structural representation to be updated progressively from successive views of an object that is rotating through a range of orientations. However the rotation invariance property does not mean that the encoded form has no defined orientation, but rather that the perceived form is presented to consciousness at the orientation and rate of rotation that the external object is currently perceived to possess. In other words, when viewing a rotating object, like a person doing a cartwheel, or a skater spinning about their vertical axis, every part of that visual stimulus is used to update the corresponding part of the internal percept even as that percept rotates within the perceptual manifold to remain in synchrony with the rotation of the external object. The perceptual model need not explain how this invariance is achieved computationally, it must merely reflect the invariance property manifest in the subjective experience of perception. 
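The proposal that structure and orientation are separable variables can be sketched as a data structure (hypothetical, offered only as an illustration of the separability claim, not as a model of the neural mechanism): the structure is stored in object-centered coordinates, a single orientation variable tracks the object's current rotation, and each new view is rotated back into the canonical frame before updating the structure.

```python
import math

class RotationInvariantPercept:
    """Structure and orientation held as separable variables."""

    def __init__(self):
        self.points = set()       # structure, in object-centered (x, y) coords
        self.orientation = 0.0    # currently perceived orientation (radians)

    def update(self, observed_points, orientation):
        """Fold a view seen at some orientation back into the canonical frame."""
        self.orientation = orientation
        c, s = math.cos(-orientation), math.sin(-orientation)
        for x, y in observed_points:
            self.points.add((round(x * c - y * s, 6), round(x * s + y * c, 6)))

percept = RotationInvariantPercept()
percept.update({(1.0, 0.0)}, 0.0)             # a point seen at 0 degrees
percept.update({(0.0, 1.0)}, math.pi / 2)     # the same point, seen rotated 90 deg
print(percept.points)                         # one canonical point, not two
```

Successive views of a rotating object thus accumulate into a single structural representation, while the orientation variable is free to track the rotation itself, as in the percept of the spinning skater.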
The property of translation invariance can be similarly quantified in the representation by proposing that the structural representation can be updated from a stimulus that is translating across the sensory surface, to update a perceptual effigy that translates with respect to the representational manifold. This accounts for the structural constancy of the perceived world as it scrolls past a percipient walking through a scene, with each element of that scene following the proper curved perspective lines as depicted in Figure 1d, expanding outwards from a point up ahead, and collapsing back to a point behind, as would be seen in a cartoon movie rendition of Figure 1c. Whatever the computational mechanism behind this remarkable performance, these are the observed properties of the spatial percept.
The fundamental invariance of such a representation offers an explanation for another property of visual perception, i.e. the way that the individual impressions left by each visual saccade are observed to appear phenomenally at the appropriate location within the global framework of visual space depending on the direction of gaze. This property can be quantified in the perceptual model by proposing that the sensory image from the retina is copied onto the front surface of the eye of the perceptual homunculus, from whence that image is projected outward into perceived space in the direction of gaze, taking into account eye, head, and body orientation relative to the perceived world. Proprioceptive and kinesthetic information are used to update the body posture and orientation of the perceptual effigy of the body including the ocular orientation, to ensure that the retinal projection occurs in the appropriate direction in perceived space. In the case of binocular viewing, the projections from the two eyes are crossed in perceptual space, where their intersection in depth defines the three-dimensional binocular percept, as suggested by the projection field theory of binocular vision (BORING 1933, CHARNWOOD 1951, KAUFMAN 1974, JULESZ 1971, MARR & POGGIO 1976).
The percept of the surrounding environment therefore serves as a kind of three-dimensional frame buffer expressed in global coordinates, that accumulates the information gathered in successive visual saccades and maintains an image of that external environment in the proper orientation relative to a spatial model of the body, compensating for body rotations or translations through the world. Portions of the environment that have not been updated recently gradually fade from perceptual memory, which is why it is easy to bump one's head after bending for some time under an overhanging shelf, or why it is possible to advance only a few steps safely after closing one's eyes while walking. Given the rotation invariance of the representation described above, it is immaterial whether the body percept rotates relative to a static world percept as suggested above, or whether the body or head percept remains fixed as the world percept rotates around it, either way would be isomorphic to the subjective experience.
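The frame-buffer-with-fading idea above can be sketched as follows (the decay constants are hypothetical, chosen only to make the fading behavior visible): regions of the world-anchored store are restored to full confidence when a saccade lands on them, and decay toward oblivion otherwise.

```python
confidence = {}                       # (x, y, z) voxel -> confidence in [0, 1]

def refresh(voxel):
    """A saccade lands on this region: restore it to full confidence."""
    confidence[voxel] = 1.0

def tick(decay=0.5, floor=0.1):
    """Time passes: every stored region fades; fully faded regions are forgotten."""
    for voxel in list(confidence):
        confidence[voxel] *= decay
        if confidence[voxel] < floor:
            del confidence[voxel]     # the overhanging shelf is forgotten

refresh((1, 2, 3))                    # glance at the shelf overhead
tick(); tick()                        # two moments without looking back
print(confidence.get((1, 2, 3)))      # faded to 0.25, still represented
tick(); tick()                        # two more: below the floor
print((1, 2, 3) in confidence)        # gone from perceptual memory
```

Refreshing and fading are deliberately independent of body position, consistent with the observation above that the buffer is expressed in global coordinates and merely compensated for body rotations and translations.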
The neurophysiological studies of the cortex using single cell recordings might appear to be inconsistent with the non-anchored representation proposed here. However the only cortical areas which are clearly defined spatial maps are the primary areas, such as the primary visual and somatosensory cortices. Cells in the higher cortical areas, while still somewhat topographic, exhibit progressively reduced spatial specificity, and in the highest level "association cortex" areas cells appear to lose all detectable spatial organization. This is exactly the property that would be expected in a non-anchored representation that is coupled in hierarchical stages to a brain-anchored map. Indeed the location of the parietal cortex between visual and somatosensory areas would suggest its function should be to associate the sensory-surface-mapped areas of vision and touch. But the spaces defined by the surface of the skin and the visual image on the retina can only be meaningfully related in a fully spatial context and by way of a non-anchored representation. It should come as no surprise that non-anchored patterns of activation in the cortex have not been detected in single-cell recordings, since the very nature of the brain-anchored electrode is predicated on an assumption of a brain-anchored representation.
"American psychology all too often makes no attempt to look naively, without bias, at the facts of direct experience, with the result that American experiments quite often are futile. In reality experimenting and observing must go hand in hand. A good description of a phenomenon may by itself rule out a number of theories. ... Without describing the environmental field we should not know what we had to explain." (KOFFKA 1935, p. 73).
This statement remains as true today as it was six decades ago.
ATTNEAVE, F. (1954) Some Informational Aspects of Visual Perception. Psychological Review, 61, 183-193.
ATTNEAVE, F. (1977) The Visual World Behind the Head. American Journal of Psychology, 90 (4), 549-563.
BIEDERMAN, I. (1987) Recognition-by-Components: A Theory of Human Image Understanding. Psychological Review, 94, 115-147.
BLUMENFELD, W. (1913) Untersuchungen über die Scheinbare Grösse im Sehraume. Z. Psychol., 65, 241-404.
BLANK, A. A. (1958) Analysis of Experiments in Binocular Space Perception. J. Opt. Soc. Amer., 48, 911-925.
BORING, E. G. (1933) The Physical Dimensions of Consciousness. New York: Century.
BROAD, C. D. (1978) Kant - an Introduction. Cambridge: Cambridge University Press.
CHARNWOOD, J. R. B. (1951) Essay on Binocular Vision. London: Halton Press.
COREN, S., WARD, L. M. & ENNS, J. J. (1979) Sensation and Perception. Ft Worth TX: Harcourt Brace.
FOLEY, J. M. (1978) Primary Distance Perception. In: Handbook of Sensory Physiology, Vol VII: Perception. R. Held, H. W. Leibowitz, & H.-L. Teuber (Eds.) Berlin: Springer Verlag, pp. 181-213.
GALLI, A. (1932) Über mittels verschiedener Sinnesreize erweckte Wahrnehmung von Scheinbewegung. Arch. f. d. Ges. Psych., 85, 137-180.
GIBSON, J. J. (1966) The Senses Considered as Perceptual Systems. Boston: Houghton Mifflin.
GRAHAM, C. H. (1965) Visual Space Perception. In: C. H. Graham (Ed.) Vision and Visual Perception. New York: John Wiley, 504-547.
GROSSBERG, S. (1987a) Cortical dynamics of three-dimensional form, color and brightness perception. I. Monocular theory. Perception & Psychophysics 41 87-116.
GROSSBERG, S. (1987b) Cortical dynamics of three-dimensional form, color and brightness perception. II. Binocular theory. Perception & Psychophysics 41 117-158.
HEELAN, P. A. (1983) Space Perception and the Philosophy of Science. Berkeley: University of California Press.
HELMHOLTZ, H. (1925) Physiological Optics. Optical Society of America 3 318.
HILLEBRAND, F. (1902) Theorie der Scheinbaren Grösse bei Binocularem Sehen. Denkschr. Acad. Wiss. Wien (Math. Nat. Kl.), 72 255-307.
HUBEL, D. (1988) Eye, Brain, and Vision. New York, Scientific American Library.
JULESZ B. (1971) Foundations of Cyclopean Perception. Chicago, University of Chicago Press.
KANIZSA, G. (1979) Organization in Vision. New York, Praeger.
KANT, I. (1781) Critique of Pure Reason.
KAUFMAN, L. (1974) Sight and Mind. New York, Oxford University Press.
KOENDERINK, J. & Van DOORN A. (1976) The singularities of the visual mapping. Biological Cybernetics 24, 51-59.
KOENDERINK, J. & Van DOORN A. (1980) Photometric invariants related to solid shape. Optica Acta 27 981-996.
KOENDERINK, J. & Van DOORN A. (1982) The shape of smooth objects and the way contours end. Perception 11 129-137.
KOFFKA, K. (1935). Principles of Gestalt Psychology. New York, Harcourt Brace & Co.
KÖHLER, W. (1938) The Place of Value in a World of Facts. New York: Liveright.
KÖHLER, W. (1947) Gestalt Psychology. New York: Liveright.
KÖHLER, W. & HELD, R. (1949) The Cortical Correlate of Pattern Vision. Science 110, 414-419.
KÖHLER, W. (1929) Ein altes Scheinproblem. Die Naturwissenschaften 17, 395-401. Reprinted in Henle M. (Ed.) (1971) The Selected Papers of Wolfgang Köhler. New York, Liveright.
LEHAR, S. & McLOUGHLIN, N. (1998) Gestalt Isomorphism II: The Interaction Between Brightness Perception and Three-Dimensional Form. Perception (submitted for publication).
LUNEBURG, R. K. (1950) The Metric of Binocular Visual Space. J. Opt. Soc. Amer., 40, 627-642.
MARR, D. & POGGIO, T. (1976) Cooperative Computation of Stereo Disparity. Science 194, 283-287.
MARR, D. (1982) Vision. New York, W. H. Freeman.
McLOUGHLIN, N. & GROSSBERG, S. (1998) Cortical Computation of Stereo Disparity. Vision Research 38 91-99.
MÜLLER G. E. (1896) Zur Psychophysik der Gesichtsempfindungen. Zts. f. Psych. 10.
O'REGAN, K. J. (1992) Solving the `Real' Mysteries of Visual Perception: The World as an Outside Memory. Canadian Journal of Psychology 46, 461-488.
PINKER, S. (1984) "Visual Cognition: An Introduction." Cognition 18, 1-63.
REED, E. S. (1988) James J. Gibson and the Psychology of Perception. New Haven CT, Yale University Press.
TAMPIERI, G. (1956) Sul Completamento Amodale di Rappresentazioni Prospettiche di Solidi Geometrici. Atti dell' XI Congresso degli Psicologi Italiani, ed. L. Ancona, pp. 1-3. Milano: Vita e Pensiero.
TODD, J. & REICHEL, F. (1989) Ordinal structure in the visual perception and cognition of smoothly curved surfaces. Psychological Review 96, 643-657.