Ph. D. Thesis
Boston University 1994
Attneave [1] discusses the relevance of information theory [41] to the issue of representation in the visual system. According to the tenets of information theory, information compression is an essential component of visual abstraction. This work reveals a duality in the nature of visual processing, which can be separated into two complementary functions, that of a bottom-up abstraction, and of top-down completion of visual information. These complementary functions will be discussed in the following sections, together with a discussion of the Boundary Contour System / Feature Contour System (BCS / FCS) vision models [18, 19, 21], where the operations of boundary completion and brightness filling-in are examples of the latter function of visual processing. In the following chapters I will present two models that inherit many properties of the BCS model. The first model uses a directed diffusion to allow boundary completion to occur across gaps that are larger than the size of the filter responsible for the completion. The second model is a theory that invokes harmonic oscillations in the oriented representation of the model in order to account for boundary completion through image vertices defined by multiple orientations at a single location. This model makes specific predictions about a wide range of visual illusions. Finally, a model of abstraction and representation of visual information is presented as an extension to the orientational harmonic theory. This model will be shown to have properties of bottom-up top-down resonant matching as seen in the Adaptive Resonance Theory (ART) [5].
The phenomenon of illusory boundaries offers an invaluable tool for probing the mechanisms of visual perception because the illusory nature of these manifestations reveals their origin as being exclusively from within the visual system, rather than due to actual features present in the input. In other words, illusory phenomena factor out the perceptual mechanism from the object of perception. The study of these illusions therefore allows a precise characterization of internal interactions within the visual system. A large body of research has emerged over the last decades with the objective of quantifying precisely the conditions under which illusory phenomena occur, the types of illusions that occur under those conditions, and the exact nature and salience of the resulting illusions.
As a result of these studies, it has become apparent that the mechanism responsible for illusory phenomena exhibits lawful interactions between inducing features, and these interactions can be characterized by general principles that apply in a large number of individual cases. The first serious attempt to codify these laws was made by the Gestalt psychologists [67, 68, 40]. These researchers proposed a list of general and often abstract properties that are seen in the illusory phenomena. The Gestalt grouping laws suggest that similarity, proximity, good continuation, symmetry, good gestalt, common fate, etc. are properties of visual stimuli that tend to support global grouping percepts. While these laws are meaningful in a qualitative sense, they are difficult to quantify with sufficient precision as to make reliable predictions about illusory phenomena. As a result, it was difficult to propose specific mechanisms of visual perception based on these general, empirical laws.
Certain significant observations about the nature of visual perception were however revealed by the Gestalt laws. Chief among these was the concept that visual perception is not a simple feed-forward system amenable to reductionist analysis, because the perception of basic elements or visual primitives appears to be strongly influenced by global groupings of those elements. This poses somewhat of a chicken-and-egg paradox, because the global groupings themselves are defined by the visual primitives of which they are composed. The Gestalt insight therefore was that visual elements interact with one another by way of field-like influences, reminiscent of electrical or magnetic fields, whereby every element is influenced simultaneously by every other element, and global groupings emerge as a result of the simultaneous interactions between local elements, even as local elements emerge under the influence of the global groupings that they engender. Gestalt modelers exemplified this concept by way of physical analogies, such as the soap bubble, whereby a global symmetry is seen to emerge by way of purely local interactions between individual elements in the soap membrane; or the wooden spline, which assumes the shape of a globally smooth curve between clamped endpoints by way of local elastic interactions.
The recent emergence of neural network models has provided a more quantitative paradigm for expression of the field-like interactions proposed by the Gestalt models. Neural network models make use of neural receptive or projective fields defined by smooth spatial functions to mediate the spatial interactions between individual units representing specific visual features. The simultaneous emergence of local features and global groupings can be modeled by way of a parallel analog relaxation in a dynamic neural network model, implemented in computer simulations by iterative calculation of both bottom-up and top-down interactions; the bottom-up to represent the influence of the visual element on the global grouping, and the top-down to represent the influence of the global grouping on the perception of the elements. Neural network models are also consistent with neurophysiology, forming a bridge between theories of mind and brain. On the one hand the visual system detects visual features and represents them in an abstract, compressed form for recognition and recall, while on the other hand it maintains a veridical facsimile of the external visual world in a form available for internal use.
Attneave has formalized the Gestalt notion of field-like interactions within the context of information theory. Specifically, he points out that visual perception involves the two separate and complementary functions of abstraction and completion. Abstraction involves the elimination of redundant information, in order to reduce to manageable proportions the overwhelming volume of data available at the retina. For example, Attneave [1] shows how image information can be encoded more compactly as a function of the transitions which occur at image edges, rather than as an explicit representation of image intensity. This principle is well known in the field of image processing, where image compression techniques convert images to representations with a minimum of redundancy. A fundamental principle behind such techniques relies on the observation that regions of uniform or repeated values can be encoded as a single value for the whole region, together with the boundaries of the region over which that value holds. Since the dimensionality of the boundaries is always less than that of the region they delimit, a considerable savings of storage resource can often be realized. The encoded information itself may contain redundancy which can be further reduced by higher levels of encoding. For example, in the case of an image containing uniform regions, if the boundaries of those regions consist of regular forms such as lines or arc segments, a further reduction can be achieved by encoding the regularity or pattern of those forms, together with the bounds between which they apply. This process can be repeated any number of times as long as there remains some regularity or redundancy in the representation. The result is a compact, hierarchical description of the original data in terms of patterns of regularity found at each level.
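The region-plus-boundary principle described above can be made concrete with a minimal sketch for a single scanline. The function names `encode_regions` and `decode_regions` are hypothetical and are introduced only for illustration; this is a simple run-length style encoding under the assumption of exactly uniform regions, not a scheme taken from Attneave [1].

```python
from itertools import groupby


def encode_regions(scanline):
    """Encode a 1-D scanline as (value, start, end) regions, end exclusive.

    Each uniform run is stored as one value plus its two boundaries,
    instead of one value per pixel.
    """
    regions = []
    start = 0
    for value, run in groupby(scanline):
        length = len(list(run))
        regions.append((value, start, start + length))
        start += length
    return regions


def decode_regions(regions):
    """Reconstruct the original scanline from its region description."""
    scanline = []
    for value, start, end in regions:
        scanline.extend([value] * (end - start))
    return scanline


# A scanline with three uniform regions: 16 samples, 3 region records.
scanline = [0] * 5 + [9] * 8 + [0] * 3
regions = encode_regions(scanline)
restored = decode_regions(regions)
```

The encoding is lossless for this input, and the saving grows with the length of the uniform runs, which is the sense in which the boundaries carry the information and the interiors are redundant.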
The notion that this kind of compression occurs also in natural vision is supported by neurophysiological studies of the retina which indicate that even at this earliest stage of visual processing, the information available at the photoreceptors is transformed by retinal processing into spatial and temporal derivatives of the input. Adelson and Bergen [38] discuss how a spatial derivative of the local average of intensity in two dimensions is similar to the on-center off-surround response of the retinal ganglion cell, and that the derivative taken in one dimension is equivalent to an oriented edge representation, as in the simple cell response of the visual cortex. Within this type of system, a long straight edge would produce a repeated or redundant pattern of response, allowing for a possible compression of such edge information. For example, the edge could be encoded by a representation of the derivative of the orientation, i.e. the curvature. Points of high curvature therefore could be used to represent the bounds of intervening lines of low curvature without loss of information.
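As a rough illustration of encoding a contour by the derivative of its orientation, the following sketch measures the change of orientation at each interior point of a sampled path and keeps only the points where that change is large. The function names and the 30 degree threshold are assumptions made for this example; they are not taken from any of the cited studies.

```python
import math


def turning_angles(points):
    """Absolute change of orientation (radians) at each interior point
    of an open polyline given as a list of (x, y) tuples."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        angle_in = math.atan2(y1 - y0, x1 - x0)
        angle_out = math.atan2(y2 - y1, x2 - x1)
        delta = angle_out - angle_in
        # Wrap the difference into [-pi, pi) before taking its magnitude.
        delta = (delta + math.pi) % (2 * math.pi) - math.pi
        angles.append(abs(delta))
    return angles


def high_curvature_points(points, threshold=math.radians(30)):
    """Keep the two endpoints plus every interior point whose turning
    angle exceeds the (assumed) threshold."""
    kept = [points[0]]
    for point, angle in zip(points[1:-1], turning_angles(points)):
        if angle > threshold:
            kept.append(point)
    kept.append(points[-1])
    return kept


# An L-shaped path sampled at nine points reduces to its corner and ends.
path = [(x, 0) for x in range(5)] + [(4, y) for y in range(1, 5)]
vertices = high_curvature_points(path)
```

Joining the retained points with straight lines recovers the original path exactly in this example, since the discarded samples all lie on segments of zero curvature.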
Attneave [1] illustrates how the points of highest curvature in a line drawing are sufficient to communicate the subject of the drawing. He presents as evidence "Attneave's Cat", shown in Figure 1 (A). This is a line drawing in which the lines of low curvature are replaced by straight lines, while preserving the recognizability of the cat. Similar evidence is presented by Biederman [4], who shows that removal of the low curvature lines from a line drawing preserves its recognizability, as shown in Figure 1 (B), while removal of the points of high curvature makes the drawing unrecognizable, as shown in Figure 1 (C). These examples suggest that the points of high curvature may be sufficient to uniquely characterize a figure for human perception, and that the low curvature connecting lines can be reconstructed from the information found at the points of high curvature. In this sense, therefore, the lines of low curvature can be considered redundant information.
Attneave's cat (A); a sketch where the low curvature connecting lines have been replaced by straight lines, preserving only the points of high curvature, or vertices. Biederman's cup showing removal of lines of low curvature (B), or high curvature (C).
Further support for this notion comes from visual illusions such as the Kanizsa figures, as shown in Figure 2. In these figures, short oriented line segments, or inducers, are seen to generate illusory boundaries when the inducers are sufficiently aligned to be connected by low curvature lines. Perception of the illusory line drops off as a function of difference in orientation or displacement between the inducers. These illusions clearly illustrate that the visual system is capable of connecting high (and low) curvature points with low curvature illusory boundaries.
Kanizsa triangle (A) and curved Kanizsa triangle (B) showing the formation of an illusory triangle between inducers.
It is unlikely that this property of the visual system serves only to create visual illusions. A more likely conclusion is that the illusory boundary phenomenon reveals a fundamental mechanism of vision, which serves to complete boundaries that are broken or incomplete, active in real figures as well as illusory figures. If the low curvature boundaries of these illusory figures are represented implicitly by the boundary completion process, then explicit representation of those same boundaries would constitute redundant information. According to information theory, as discussed by Attneave, the visual system would most economically encode visual forms by encoding only vertices and points of high curvature, leaving the intervening edges to be encoded implicitly by the boundary completion process. Indeed, studies of saccadic eye movements such as those by Yarbus in 1957 (described in Hubel [25]) lend further support to this notion by revealing that visual saccades tend to jump between points of high curvature, with little time devoted to the areas in between. This indicates that these high curvature points contain information important for recognition. This raises an important issue on the subject of detection in vision models. Can it be said that a low curvature boundary has been "detected" by the visual system unless its presence has activated a specific cell tuned to that feature, which would remain inactive in the absence of that feature?
Zucker [71], Koenderink [34], and Parent [49] discuss the issue of curve detection in natural vision from a different perspective. In their view, curves represent specific features to be detected by specific curvature detectors. Zucker derives a mathematical form for curve detectors using considerations of cocircularity between oriented edges, to detect low curvature boundaries. In other words, Zucker proposes specific feature detectors for features which are redundant for recognition. A principal problem with this approach is that of combinatorial explosion. If curves are to be detected as explicit features rather than implicit completions, the visual cortex must possess specific detectors for every curvature at every orientation at every spatial location represented in the system. While the number of such detectors required may not be inconsistent with the known physiology of the visual cortex, this kind of combinatorial branching, especially when extended to higher level features beyond curvature, works contrary to the principles of information theory, geometrically increasing rather than decreasing the number of explicit representations of the visual stimulus at higher levels of the representational hierarchy.
The Boundary Contour System (BCS) [18, 19, 21] presents an alternative approach to curve detection that is more consistent with the concepts of information theory. Together with the Feature Contour System (FCS) [18, 19, 21] this model accounts for a wide range of psychophysical phenomena including visual illusions such as the Kanizsa figures. These two models in combination suggest that visual perception involves two distinct but interacting mechanisms, a boundary system which represents image edges and the interactions between them, and a feature system which mediates surface and brightness perception between boundaries represented in the BCS system. It is the grouping properties of the BCS that are responsible for the illusory boundary phenomena seen in the Kanizsa figures, and the BCS will therefore represent the principal focus of this study. Figure 3 illustrates the basic architecture of the BCS. The cells at Figure 3(A) represent a layer of light sensitive cells such as the ganglion cells from the retina. The cells at Figure 3(B) represent cortical simple cells that receive input from the ganglion cells through oriented receptive fields, so that different cells at Figure 3(B) respond to edges of different orientations at Figure 3(A). For example the horizontal dark/light cell at Figure 3(B), highlighted in the figure, receives input from the highlighted elliptical region in layer (A), excitatory from the light half and inhibitory from the shaded half. All of the cells depicted at Figure 3(B) receive input from the same spatial location at Figure 3(A) through overlapping receptive fields at different orientations. The cells in layer (C) receive input from pairs of cells in layer (B) which represent edges that are parallel in orientation but of opposite direction of contrast. For example the highlighted cell at Figure 3(C) receives input from the horizontal dark/light and the horizontal light/dark cells at Figure 3(B), producing an oriented representation that is independent of direction of contrast. The three big blocks at each layer represent three horizontally adjacent locations in the visual field.
Boundary Contour System (BCS) architectural overview. An image layer (A) consists of photodetectors which provide input to a set of contrast sensitive oriented edge detectors (B) in the next level by way of oriented receptive fields. A higher level oriented representation (C) receives input from pairs of cells of opposite direction of contrast in the previous layer resulting in a contrast insensitive response. Finally, a cooperative layer (D) receives input from contrast insensitive oriented cells at adjacent locations in order to respond to extended edges that pass through the central location. A feedback loop performs boundary completion at the central location when the appropriate inputs are found at adjacent locations.
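The first three layers of this architecture can be sketched in a few lines for a single spatial location. The sketch below is a deliberate simplification and not the BCS equations themselves: it assumes only two orientations, half-field sums in place of elliptical receptive fields, and a convention (chosen here for illustration) that a horizontal "dark/light" cell prefers dark above and light below. The function names are hypothetical.

```python
def rectify(x):
    """Half-wave rectification: cells cannot fire at negative rates."""
    return max(0.0, x)


def simple_cells(patch):
    """Layer (B): contrast-sensitive oriented responses to one image patch.

    `patch` is a square list of rows of luminance values with an even side
    length (layer A). Each cell is excited by one half of the patch and
    inhibited by the other half.
    """
    n = len(patch)
    half = n // 2
    top = sum(sum(row) for row in patch[:half])
    bottom = sum(sum(row) for row in patch[half:])
    left = sum(sum(row[:half]) for row in patch)
    right = sum(sum(row[half:]) for row in patch)
    area = n * half  # number of pixels in each half-field
    return {
        ("horizontal", "dark/light"): rectify((bottom - top) / area),
        ("horizontal", "light/dark"): rectify((top - bottom) / area),
        ("vertical", "dark/light"): rectify((right - left) / area),
        ("vertical", "light/dark"): rectify((left - right) / area),
    }


def complex_cells(simple):
    """Layer (C): pool the two contrast polarities at each orientation."""
    return {
        orientation: simple[(orientation, "dark/light")]
        + simple[(orientation, "light/dark")]
        for orientation in ("horizontal", "vertical")
    }


dark_above = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
light_above = list(reversed(dark_above))

response_a = complex_cells(simple_cells(dark_above))
response_b = complex_cells(simple_cells(light_above))
```

The two patches contain the same horizontal edge with opposite direction of contrast. They activate different cells in layer (B) but produce the same response in layer (C), which is the contrast insensitivity described in the caption above.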
A principal feature of the BCS model is its ability to perform boundary completion between oriented edges that are approximately aligned, like the inducers of the Kanizsa figures which produce illusory boundaries. The mechanism responsible for this boundary completion is a layer of cooperative cells depicted in Figure 3(D) which receives input from layer (C) through large, bipolar oriented receptive fields. Like the receptive fields of layer (B), these bipolar receptive fields occur at every orientation at each spatial location, but unlike those fields the cooperative cell receptive field spans many spatial locations (although only two are shown in the figure) in a direction parallel to the orientation of the inputs preferred by the cooperative cell. For example, the horizontal cooperative cell depicted in the figure has a receptive field that is horizontally aligned to receive input from layer (C) horizontal cells at horizontally adjacent locations.
Parametric studies by Kellman and Shipley [31] show that illusory boundaries can form even when the inducers are not perfectly aligned, although the salience of the boundaries drops off smoothly with increasing misalignment. The BCS model accounts for completion between such misaligned inducers by incorporating a certain spatial and orientational uncertainty in the receptive field of each bipole, that is, each bipole receives somewhat attenuated input from inducers that are nearly but not perfectly aligned in spatial location, spatial orientation, or orientation of the input inducer. It is this spatial and orientational uncertainty in the receptive field that allows the BCS to perform boundary completion across curved boundaries, such as that shown in Figure 2 (B).
Grossberg and Mingolla [18] observe that the boundary completion process occurs only inwards between inducers, never outward beyond inducers. This feature is implemented in the model by a conjunctive requirement between the two lobes of the filter, which specifies that the cooperative cell will not fire unless it receives input from both lobes simultaneously. Neurophysiological measurements from single cells in the visual cortex by von der Heydt et al. [63] confirm the existence of cells with such properties. These authors report the existence of cells in area 18 of the visual cortex that help to "extrapolate lines to connect parts of the stimulus which might belong to the same object" (p. 1261). They found these cells by using visual images that induce a percept of illusory figures in humans, as in Figure 2. Concerning the existence of a cooperative boundary completion process between similarly oriented and spatially aligned cells they write "Responses of cells in area 18 that require appropriately positioned and oriented luminance gradients when conventional stimuli were used could often be evoked also by the corresponding illusory contour stimuli" (pp. 1261-1262). This is explained in the BCS model by a feedback signal from the cooperative cell to the oriented edge representation at that same spatial location, as shown in Figure 3. The complete BCS model includes many additional spatial and orientational competitive mechanisms that are of less interest for our analysis.
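The conjunctive requirement between the two lobes can be illustrated with a one-dimensional sketch along a single row of horizontally tuned layer (C) cells. This is not the published BCS cooperative cell equation: the Gaussian falloff with distance, the parameters `reach` and `sigma`, and the use of `min` to express the conjunction are all assumptions chosen for brevity, and the orientational uncertainty of the receptive field is omitted.

```python
import math


def bipole_response(activity, center, reach=4, sigma=2.0):
    """Response of a cooperative cell centered at index `center`.

    `activity` holds layer (C) responses along the preferred orientation
    of the cell. Each lobe sums distance-weighted input from one side,
    and the cell fires only when both lobes are active.
    """

    def lobe(direction):
        total = 0.0
        for k in range(1, reach + 1):
            j = center + direction * k
            if 0 <= j < len(activity):
                total += activity[j] * math.exp(-(k * k) / (2 * sigma * sigma))
        return total

    # Conjunction of the lobes: zero input on either side gives zero output.
    return min(lobe(-1), lobe(+1))


# Two inducers (indices 2-3 and 7-8) separated by a gap.
row = [0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0]
inside = bipole_response(row, center=5)   # in the gap between the inducers
outside = bipole_response(row, center=9)  # just beyond the right inducer
```

The cell in the gap receives input through both lobes and responds, while the cell beyond the outer inducer receives input through one lobe only and stays silent, so completion proceeds inward but never outward.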
In the context of curve detection, the cooperative cell of the BCS is not a specific curve detector in the sense of Zucker's model of curve detection [71], but rather a generalized curve detector which responds principally to colinear alignment, but will tolerate a range of gentle curvatures, although it does not distinguish between them. A single BCS cooperative cell therefore performs the function of a bank of curvature filters found in the Zucker [71] model. Hence, while the Zucker [71] curve detector is considered as a specialized feature detector tuned to recognize specific curved stimuli, the BCS boundary completion operation serves to reconstruct boundaries between oriented inducers, rather than to recognize the curves as features in their own right. In this sense, the response of the BCS represents a mirroring, or veridical facsimile of the visual input, recreated in terms of dynamic interactions between components of the visual system. The higher level representations used for recognition would presumably be defined in terms only of the points of high curvature, or the vertices, rather than their connecting boundaries. Indeed, it is the existence of the operation of boundary completion which obviates the need for an explicit representation of those boundaries by providing a mechanism capable of reconstructing the boundaries on demand, given the stimulus of the inducers at the vertices. In this sense, the boundary completion mechanism can be seen as an image decompression system for visual recall, converting the higher level vertex representation into a more veridical boundary representation.
One question raised by the above discussion is whether an explicit reconstruction or decompression need actually be performed by the visual system, or whether the compressed version alone suffices for internal use. If the vertices of a Kanizsa figure uniquely characterize that figure, and if the visual system has abstracted that figure to a representation of those vertices, then for what reason would the system need the veridical or decompressed representation? There is indeed much debate on this issue between those who profess that the visual system regenerates perceptions at a low level, for example when filling in data missing due to blind spots or scotomas [55, 56], and those who claim that a high level representation is sufficient explanation for the low level perceptions observed in the blind spot phenomena [8]. Without wishing to become entangled in this debate, we can say with some certainty that on the basis of the Kanizsa illusions alone, it is clear that whether or not the visual system requires a high resolution reconstruction of the illusory forms, it certainly does perform such a reconstruction based only on the vertices, and that this reconstructed figure is sufficiently veridical as to be virtually indistinguishable from an actual geometrical form defined by a luminance difference relative to the background. Indeed, naïve observers often find it difficult to believe that such a luminance difference is not actually present in these illusions.
Kennedy [32] disputes the claim that illusory boundaries are indistinguishable from actual brightness differences seen in real figures. He quotes naïve observers of these figures who report seeing a line that "doesn't exist", or that "isn't there", which would indicate that the observers can clearly distinguish illusory boundaries from real physical boundaries. It is not clear however whether the subject's reported perception is a property of the perceived boundary itself, or whether it is a function of the observer's cognitive expectation that the paper on which the figure is printed should not exhibit such subtle shades of brightness as seen in the Kanizsa figures. Kennedy also discusses three dimensional illusory figures generated by solid three dimensional constructions, which produce illusory boundaries that seem to float in empty space [32]. These figures were shown to children as young as three years old, who reported, in answer to questions, that the figures "cannot be touched" or were "made of nothing", or "made of air". Again, Kennedy concludes that the percept itself is therefore less real than a brightness difference occasioned by a real solid object. An alternative explanation remains however, that the children saw a "real looking" three dimensional boundary, but could see no physical object responsible for producing that boundary, and therefore acknowledged this apparent contradiction between their perceptual sensation and their cognitive understanding of the physical world by calling the anomalous boundary unreal. Indeed, a similar sensation is produced by reflections through a concave mirror, which can produce a "real image" in the optical sense, i.e. a point in space through which light rays cross, producing a full three dimensional image of an object floating in empty space. Optically, the image is as real as the reflected object itself, although the perceptual contradiction makes the image seem ethereal or unreal, especially when it is occluded by a hand passing behind it.
Whether or not illusory figures are seen with exactly the same subjective sense of "reality" as real figures is open to question. Nevertheless, these figures do unquestionably produce a percept of brightness that is "real" enough to be measured psychophysically, both by comparison with a test patch, and by nulling experiments where the surface under the illusory form is presented somewhat darker than the surrounding areas in order to exactly cancel the brightness effect [61]. This clearly distinguishes such modal illusory forms, like the illusory triangles in Figure 2, from purely amodal percepts, like the missing "pie slices" of the three dark circles in Figure 2, which are seen to be occluded by the illusory triangle, and are thus perceived invisibly "behind" that figure. The missing arc segments can thus be imagined, but by no means "seen" with the same vividness as the occluding illusory triangle.
Kennedy [32] argues that the "apparent reality" of a figure is indicative of the level of representation in the visual system, which, if one accepts his contention that these figures are seen to be "unreal", would indicate that they are represented somewhat higher in the visual system than real brightness edges. Information theory suggests on the other hand that the quantity of redundant information, or high spatial detail, is a better measure of representational level in the visual system. The fact that the illusory triangle is seen in full spatial resolution, including a specific curvature along its edges in Figure 2 (B), and with a perceived brightness at every point over its entire surface, indicates that the higher level "cognitive" recognition of the triangle elicited by the vertices of the Kanizsa figure is transformed by the visual system into a high resolution (and therefore, by information theory, a low level) percept in the same format as an actual brightness percept of a real triangle, even if that percept is arguably somewhat less perceptually "real". An information theory interpretation of the Kanizsa figure therefore leads to the conclusion that the visual system does indeed perform a low level, high resolution reconstruction of perceived high level figures. In this context therefore, the BCS can be seen as a higher level abstraction or compressed representation of the luminance patterns represented in the retinal image, and the BCS / FCS interaction represents a "top-down completion" or decompression of higher level boundary abstractions to a low level, more complete spatial representation. Within the BCS model the cooperative cells can also be seen as a higher level abstraction of more complete information in the oriented representation, and the top-down feedback from the cooperative layer can be seen as a de-compression or "filling in" of the oriented information at the lower layer.
The properties of illusory boundaries can be studied in order to derive the computational properties of the visual mechanism. Two types of illusory boundaries are observed, those that perform a colinear or smooth curvilinear completion between visual inducers, and those that include a sharp kink or vertex between colinear segments. Psychophysical phenomena suggest that these two types of illusory boundary are related, in that they are seen to occur under similar circumstances, but that a continuous variation of the inducers from a colinear to a vertex configuration produces a sharp transition in the resulting illusory boundary, suggesting that two distinct mechanisms might be involved in the formation of colinear versus vertex boundary completion. In the next section I will discuss the properties of colinear illusory boundaries, and in the following section I will discuss illusory boundary completion through image vertices.
A colinear illusory boundary is seen to appear between visual inducers that are aligned so as to be easily connected by a straight, or nearly straight line. In the case of the Kanizsa figures, the illusory boundary will form parallel to the inducer where they meet, although under other circumstances such as the Ehrenstein illusion the illusory boundary tends to form orthogonal to the inducer, as will be discussed below.
The properties of the colinear illusory boundary have been studied in a number of quantitative psychophysical tests. Figure 4 summarizes a number of these properties schematically. Shipley & Kellman [59] show that the salience of the illusory contour decreases as a function of spatial separation between the inducers, as suggested schematically in Figure 4 (A). Banton and Levi [2] show that the strength of the illusory contour is a function of the salience of the inducer, as measured either by the contrast of that inducer as suggested in Figure 4 (B), or the size of the inducer relative to the length of the illusory boundary, as suggested in Figure 4 (C). If the size of the inducers and the distance between them are increased proportionately, however, the salience of the illusory contour remains unchanged.
Factors that influence the salience of illusory boundaries, illustrated schematically: distance between inducers reduces salience (A); contrast of inducers increases salience (B); size of inducers relative to the gap to be completed increases salience (C); bending misalignment between inducers reduces salience (D).
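The scale invariance noted above follows directly if salience depends on the size of the inducers only through their ratio to the length of the gap. The sketch below assumes exactly that and nothing more; the function name `relative_inducer_size` is hypothetical, and no particular mapping from this ratio to measured salience is implied.

```python
def relative_inducer_size(inducer_size, gap_length):
    """Size of the inducer relative to the length of the gap to be completed."""
    return inducer_size / gap_length


# Scaling the whole figure by a common factor leaves the ratio unchanged.
original = relative_inducer_size(2.0, 4.0)
scaled = relative_inducer_size(3 * 2.0, 3 * 4.0)

# Enlarging only the gap lowers the ratio, as in Figure 4 (A).
wider_gap = relative_inducer_size(2.0, 8.0)
```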
Kellman & Shipley [31] show that the strength of the illusory contour varies with the angle between the inducers, as suggested in Figure 4 (D), as long as certain relatability criteria hold. These criteria were derived empirically from psychophysical studies of a number of inducer configurations, and are summarized schematically in Figure 5. The relatability criteria are defined in terms of the linear extensions to the inducers, as indicated by the dotted lines in the figure, and they note that illusory boundaries can only form when the extensions intersect at an obtuse angle, as in Figure 5 (A). If the inducers are parallel, but somewhat misaligned, as shown in Figure 5 (B), the extensions to the inducing edges will not intersect, and therefore the edges are not relatable. In fact, Kellman & Shipley note that an illusory boundary can still be seen under these conditions, but only if the amount of misalignment between the inducers is very small; otherwise no illusory boundary will be seen with such a shearing misalignment. Finally, edges are not relatable if their extensions intersect only within the inducer, as shown in Figure 5 (C), i.e. the extensions are only defined beyond the inducers. It is interesting to note that relatable edges can be connected by a single inflection curve, as shown by the shaded line in Figure 5 (A), whereas non-relatable edges can only be joined by a double inflection curve, as shown in Figure 5 (B) and (C). I will refer to a relatable misalignment due to rotation of the inducers as shown in Figure 5 (A) as a bending misalignment, and a non-relatable misalignment due to translation, as shown in Figure 5 (B), as a shearing misalignment. Figure 5 (C) exhibits both a bending and a shearing misalignment.
Relatability criteria defined by intersection of linear extensions of the inducers, illustrated schematically; relatable edges are those whose linear extensions intersect at an obtuse angle (A), non-relatable parallel inducers (B), non-relatable edges because the extension of one intersects the edge of the other, rather than the extension to the edge (C).
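The three cases of Figure 5 can be expressed as a small geometric test. The sketch below is one possible reading of the criteria as summarized here, not Kellman & Shipley's own formulation: each inducing edge is described by its endpoint `p` and the direction `d` in which it is extended beyond the inducer, the extensions are treated as rays, and the angle at their intersection is required to be strictly obtuse. The function name and the tolerance `eps` are assumptions, and the small tolerance for shearing misalignment reported by Kellman & Shipley is deliberately ignored.

```python
def relatable(p1, d1, p2, d2, eps=1e-9):
    """Return True if two inducing edges are relatable.

    p1, p2: (x, y) endpoints of the edges, where each extension begins.
    d1, d2: (dx, dy) directions of the extensions, pointing away from
            the inducers.
    """
    cross = d1[0] * d2[1] - d1[1] * d2[0]
    dot = d1[0] * d2[0] + d1[1] * d2[1]
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]

    if abs(cross) < eps:
        # Parallel extensions, Figure 5 (B): relatable only when the edges
        # are exactly collinear and extend toward each other.
        offset = rx * d1[1] - ry * d1[0]
        facing = dot < 0 and (rx * d1[0] + ry * d1[1]) > 0
        return abs(offset) < eps and facing

    # Solve p1 + t1*d1 == p2 + t2*d2 for the ray parameters.
    t1 = (rx * d2[1] - ry * d2[0]) / cross
    t2 = (rx * d1[1] - ry * d1[0]) / cross
    if t1 <= 0 or t2 <= 0:
        # The lines cross behind an endpoint, i.e. within an inducer,
        # as in Figure 5 (C).
        return False

    # The angle at the intersection is obtuse when the two extension
    # directions oppose each other, as in Figure 5 (A).
    return dot < 0


bending = relatable((0, 0), (1, 0), (4, 1), (-1, -0.5))   # Figure 5 (A)
shearing = relatable((0, 0), (1, 0), (4, 1), (-1, 0))     # Figure 5 (B)
inside_edge = relatable((0, 0), (1, 0), (2, 3), (0, 1))   # Figure 5 (C)
aligned = relatable((0, 0), (1, 0), (4, 0), (-1, 0))      # colinear inducers
acute = relatable((0, 0), (1, 0), (1, 2), (1, -1))        # acute intersection
```

Only the bending and the perfectly aligned configurations pass the test; the shearing, inside-edge, and acute configurations are rejected.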
Illusory boundaries are also seen to project orthogonal to inducing lines, as illustrated by the Ehrenstein illusion, in Figure 6 (A). If each line of the Ehrenstein figure is considered as a long, thin rectangle, the short side of this rectangle is in fact parallel to the illusory contour, and thus could be considered to be the actual inducer in this case. Parametric studies by Lesher & Mingolla [39] however indicate that the salience of the illusory contour is not a simple function of the length of this short edge, which indicates that a separate orthogonal mechanism is involved in this case. Illusory boundaries are also seen to form at other angles besides exactly orthogonal, as shown in Figure 6 (B), although Nodine [46] shows that the salience of the resultant boundary is somewhat diminished. The illusory boundary therefore appears to be a function both of the local orientation of each inducer, and the global configuration of the inducers relative to each other. One possible explanation for this phenomenon derives from the fact that a line ending, such as those in the Ehrenstein illusion, stimulates oriented edge detectors through a range of orientations, producing a multi-orientation signal at the ends of the lines. An emergent global grouping then "selects" from the oriented responses only those that are consistent with the global grouping. This would explain why a small circular dot, which produces oriented responses uniformly at all orientations, can readily participate in a wide range of grouping phenomena. For example, Kanizsa [28] shows how lines of dots in smoothly curving configurations produce the percept of the smooth curves, as shown in Figure 6 (C). This property makes the small circular dot a very useful tool for the analysis of global groupings, because it factors the global grouping phenomenon from the bias introduced by the orientations of the local inducers. 
This property of the small circular dot will be exploited later to explore the global grouping phenomenon in the absence of the contribution of the local oriented signal.
Ehrenstein illusion (A); angled Ehrenstein illusion (B); colinear grouping of a line of dots (C); illusory vertex completion through dots (D).
The grouping phenomena I have discussed so far all result in the appearance of a linear illusory boundary. A number of visual illusions result in illusory contours that define sharp vertices. For example the illusory triangle in Figure 6 (D) passes through vertices defined by the three dots. Kanizsa [28] shows that when curves defined by lines of dots are given excessive curvature, the percept of a smooth colinear grouping breaks into a percept of straight line segments joined by sharp vertices located at the dots, as shown in Figure 7 (A) where circles of dots of increasing curvature begin to appear as polygons. Wilson and Richards [69] present psychophysical evidence that the visual system uses two distinct mechanisms for the perception of gentle and steep curves. Subjects were presented with curves constructed of a parabola flanked by straight line segments, as shown in Figure 7 (B), and were asked to discriminate between two similar curves for a range of different curvatures. It was found that the accuracy of discrimination was greater for the more highly curved lines, with a sudden discontinuity at a certain curvature, suggesting that the visual system uses two separate mechanisms for the perception of gentle and steep curves. Lines of dots presented in similar curves illustrate the same phenomenon, as shown in Figure 7 (C), appearing to "break" into a sharp vertex at a certain curvature. These phenomena indicate that illusory boundary formation can occur in two modes, either completing smooth "colinear" curves, or through sharp vertices. The gradual transition between these two modes however suggests that a similar mechanism underlies both types of boundary completion.
Circular grouping changes to polygonal grouping with increased curvature (A); curvature discrimination stimuli used by Wilson and Richards (B); colinear grouping changes to sharp vertex percept with increased curvature (C).
Another example of the unity of colinear and vertex boundary completion is provided by the minimal Ehrenstein figure shown in Figure 8, which can produce the percept of an illusory circle, square, diamond, or amorphous blob. It is interesting that in this figure the corners of the illusory square are not located by inducing dots, as in the case of the illusory triangle in Figure 6 (D), but appear spontaneously in a featureless region of the image.
Ambiguous illusion with minimal Ehrenstein figure (A), with three commonly seen illusory forms (B, C, D).
The vertices defined by the illusory boundaries in Figures 6 (D), 7 (A) and (C), and 8 (B) and (C), all consist of an intersection of two illusory boundaries at that vertex. Illusory boundary completion can also occur through vertices defined by multiple orientations in a large range of combinations. Figure 9(A), (B), (C) and (D) shows arrangements of dots that produce illusory boundaries through vertices defined by one, two, three, and four dots respectively. A distance dependent relationship is seen in the grouping patterns of these dots, where the pattern defined by the nearest neighboring dots determines the percept of the vertex, and masks the groupings defined by more remote dots. For example, in Figure 9 (B) the horizontal grouping is masked by the stronger vertical grouping because the vertical distances are smaller than the horizontal ones. This phenomenon is discussed at length by Zucker et al. [70]. If alternate rows of Figure 9 (A) were removed, the horizontal grouping would emerge, showing that the horizontally adjacent dots are sufficiently close to generate a horizontal grouping, but that that grouping is masked by the nearer vertical grouping. Similarly, in Figure 9 (C) there are three nearest neighbors for each dot, which defines a three-way vertex passing through every dot in this pattern. If rows 2, 3, 6, 7, and 10 were removed from Figure 9 (C), a strong vertical grouping would immediately appear. Similarly, notice how Figure 9 (D) promotes a four-way grouping at each dot, suppressing an alternative diagonal grouping between the dots.
Illusory boundary formation through vertices composed of one (A), two (B), three (C), and four (D) oriented edges.
Two additional properties of illusory figures are worthy of note, and will be of significance in later discussions. First, illusory contours are seen to form inwards between inducers, but they are rarely seen to extend outwards beyond inducers. This phenomenon is seen clearly in Figure 9 (A), where an illusory boundary forms between pairs of dots, but stops abruptly at those dots. It is also seen in Figure 9 (B), (C), and (D) at the boundaries of those figures where the regular lattice of illusory contours between the dots is seen to end abruptly. This is the property of illusory boundaries that motivates the conjunctive constraint in the BCS model, which was discussed above.
Another property of significance to our discussion is the apparent occlusion of visual features behind an illusory figure, as seen in the Kanizsa and the Ehrenstein illusions. For example in the minimal Ehrenstein illusion, a cross shaped configuration is clearly recognized as being occluded behind a central illusory figure. The question for computer models of illusory phenomena is whether these amodally perceived objects should be represented as a part of the percept. The solution proposed by Grossberg and Mingolla [18] is to represent the modal and amodal percepts in separate layers of the system, where the amodal percept, represented by the BCS, would indeed encode the hidden figure as an explicit pattern of activation, whereas the modal percept, represented by the FCS would represent only the occluding form, corresponding to the visible percept of the image.
I have discussed illusory contour formation in terms of two types of boundary completion, colinear, which includes smooth curvilinear completion, and vertex completion. In the next chapter I will discuss neural network models of colinear boundary completion and problems that can arise in connection with these models. I will then propose a model of colinear boundary completion by a directed diffusion of oriented information in a neural network model, as an extension to the BCS model. I will show that the properties of this directed diffusion model are consistent with the properties of illusory contours as observed in the psychophysical studies. In the following chapter I will extend this model to handle the case of boundary completion through image vertices defined by a number of oriented edges that intersect at a point in the image. Figure 27.
Colinear boundary completion is a spatial operation that relates adjacent or nearby oriented edges which are found in an approximately colinear configuration. Colinearity between pairs of oriented edges can be defined as oriented signals that are at the same time parallel in orientation and spatially aligned in a direction that is parallel to that orientation. Figure 10 illustrates this principle. In Figure 10 (A), the barred circles represent two oriented detectors that are responding to a horizontally oriented edge at each location. The colinearity of these local edges depends on both the orientation of each local edge, and the locations of those edges with respect to each other. In this case, for example, colinearity of these edges can be detected by the fact that they are both horizontal, and at the same time horizontally disposed with respect to each other. This global colinearity relation between the two local edges is represented in Figure 10 (B) by the horizontally oriented detector which expects to receive horizontally oriented inputs, indicated by the barred circles, at two horizontally displaced locations, indicated by the dumbbell shape.
Colinearity between a pair of oriented inputs (A) implies that they are both parallel and spatially aligned, as suggested by the colinearity detector (B) which is sensitive to pairs of oriented inputs that are both parallel and aligned.
In a neural network model this spatial relation can be detected by a cell with a spatial receptive field that receives colinear input from nearby regions. The dumbbell in Figure 10 (B) can be considered as a central cell whose receptive field is sensitive only to horizontally oriented input signals at a specific spatial separation. In fact, the detector depicted in Figure 10 (B) would be intolerant to any deviation from exact colinearity. The properties of illusory boundary completion discussed in the previous chapter indicate a considerable tolerance for small deviations from colinearity, so that a better model would allow a certain tolerance to deviation in both location and orientation of the input signals. This is the approach used in the BCS model, where the bipolar receptive field of the cooperative cell samples two nearby areas for appropriately oriented features. The distance between the sample points determines the separation over which boundary completion can occur.
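A minimal sketch of such a tolerant bipolar detector is given here for illustration. It is not the BCS cooperative cell equation: the Gaussian tolerance functions, the multiplicative combination of the two lobes, and all names and parameter values (`lobe_input`, `cooperative_response`, `sigma_xy`, `sigma_theta`) are assumptions chosen to keep the example short.

```python
import math

def lobe_input(edges, center, theta_pref, sigma_xy, sigma_theta):
    """Weighted sum of oriented inputs falling in one lobe.

    edges: iterable of (x, y, theta) oriented edge signals.
    The weight falls off smoothly with distance from the lobe center
    and with deviation from the preferred orientation (both Gaussian,
    which is an assumption of this sketch).
    """
    total = 0.0
    for x, y, theta in edges:
        dist2 = (x - center[0]) ** 2 + (y - center[1]) ** 2
        # Orientation is periodic in pi; wrap the difference to [-pi/2, pi/2).
        dtheta = (theta - theta_pref + math.pi / 2) % math.pi - math.pi / 2
        total += (math.exp(-dist2 / (2 * sigma_xy ** 2))
                  * math.exp(-dtheta ** 2 / (2 * sigma_theta ** 2)))
    return total

def cooperative_response(edges, cell_xy, separation,
                         sigma_xy=0.5, sigma_theta=0.3):
    """Response of a horizontal bipolar cell at cell_xy.

    The two lobes sample points at +/- separation along the horizontal
    axis; multiplying them means both lobes must be active for the
    cell to respond.
    """
    left = lobe_input(edges, (cell_xy[0] - separation, cell_xy[1]),
                      0.0, sigma_xy, sigma_theta)
    right = lobe_input(edges, (cell_xy[0] + separation, cell_xy[1]),
                       0.0, sigma_xy, sigma_theta)
    return left * right

aligned = [(-2.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
sheared = [(-2.0, 0.3, 0.0), (2.0, -0.3, 0.0)]
one_sided = [(-2.0, 0.0, 0.0)]
```

With these inputs, `cooperative_response(aligned, (0.0, 0.0), 2.0)` is close to its maximum, the sheared pair gives a reduced but nonzero response, and the one-sided input gives a response that is effectively zero.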
Psychophysically, illusory contour formation is observed over a wide range of separations. In a neural network model this can be achieved in two ways. First, the system can be defined at multiple spatial scales, with cooperative cells defined at each of those scales. Second, a spatial tolerance can be defined as a property of the receptive field, allowing it to respond over a range of spatial locations at each spatial scale. The BCS model employs a combination of both of these strategies, where spatial tolerance in the receptive field allows a single cell to perform boundary completion through a range of separations, and cells of different scales allow boundary completion through different ranges of separations. In this system therefore there is a trade-off between the spatial tolerance within each spatial scale, and the number of scales needed to cover the total range; the greater the tolerance within each scale, the fewer scales are required in the system as a whole.
At a single spatial scale, spatial tolerance in the receptive field implies that oriented inputs that are somewhat misaligned spatially from perfect colinearity can still produce a colinearity response, although the magnitude of that response should be diminished as a function of the misalignment in order to conform to the psychophysical data, as described in the previous chapter. This can be represented by a spatial spread in the receptive field of the cooperative cell, with a smooth fall-off in magnitude with spatial displacement, as suggested in Figure 11 (A). The psychophysical studies discussed above also indicate an orientational tolerance in illusory boundary completion, which can be represented by an orientational tolerance in the receptive field, as shown schematically in Figure 11 (B). In this figure each radial line represents a sensitivity in the receptive field to the orientation represented by that line, with a magnitude which is proportional to the length of each line. In both figures 11 (A) and (B), the spatial and orientational tolerance is depicted at discrete displacements and orientations for convenience, although in both cases the symbols represent a continuous response function through a range of locations and orientations, with a smooth fall-off in the response of the receptive field as a function of the deviation from the ideal displacement and orientation. The spatial and orientational tolerances can be combined in a single receptive field, as depicted in Figure 11 (C), which will perform colinear boundary completion through a range of spatial and orientational displacements. The spatial and orientational functions are however not independent, because the cooperative cell at a particular location represents an edge passing through that location, so that a deviation in spatial location requires a corresponding deviation in orientation in order to still pass through the center of the cell. 
For example, an oriented edge from the left that is displaced upwards would form colinear completion with an edge from the right, displaced downwards, with a common orientation that passes diagonally through the center of the cell. The optimal input orientation could vary therefore as a function of spatial displacement in a radial manner from the center of the filter, as shown in Figure 11 (D). This radial input function is the one employed by the BCS model as originally defined by Grossberg & Mingolla [18], although different input functions have been proposed for smoother boundary completion. For example Cruthirds, Gove, Grossberg & Mingolla [personal communication] have proposed a parabolic input function, as shown in Figure 11 (E), which defines a smoother curve through the center of the filter.
Input tolerances of the cooperative cell receptive field. Spatial tolerance only (A); orientational tolerance only (B); spatial and orientational tolerance (C); related spatial and orientational tolerance with radial input orientation function (D); with parabolic input orientation function (E).
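The two input orientation functions can be written down compactly under one simple reading of Figure 11 (D) and (E). This is a sketch under stated assumptions: the filter's main axis is taken as the x axis, the radial function is taken to be the orientation of the line joining the filter center to the input location, and the parabolic function is taken to be the tangent of the parabola y = a x^2 that passes through the input location and is tangent to the main axis at the center. The function names are hypothetical, and the parabolic form in particular is my interpretation of an unpublished proposal.

```python
import math

def radial_orientation(dx, dy):
    """Preferred input orientation at displacement (dx, dy) from the
    filter center for a radial input function: the orientation of the
    line through the center and the input location."""
    if dx == 0:
        raise ValueError("dx must be nonzero (input lies off the center column)")
    return math.atan(dy / dx)

def parabolic_orientation(dx, dy):
    """Preferred input orientation for a parabolic input function.

    The parabola y = a*x**2 through (dx, dy) has a = dy / dx**2, so its
    slope at x = dx is 2*a*dx = 2*dy/dx (assumed form)."""
    if dx == 0:
        raise ValueError("dx must be nonzero (input lies off the center column)")
    return math.atan(2.0 * dy / dx)
```

On the main axis (dy = 0) both functions return zero, so the two filters agree for exactly colinear inputs and differ only for displaced ones, where the parabolic function expects a steeper local orientation than the radial one.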
A smooth curve can be approximated well locally by a short line segment tangent to that curve. Even a cooperative cell with a strictly colinear receptive field therefore, as in Figure 11 (D) will produce a good match to smooth curves as long as the receptive field is small relative to the degree of curvature. The response of the cell to a curved stimulus will become increasingly sensitive however to small deviations of the curve from the idealized curvature encoded in the filter. In the case of the radial input function, that idealized curvature is defined by two straight line segments that meet at the center of the filter. This approximation will work well as long as the angle between the line segments remains small, i.e. for lines of low curvature. When the filter is used to complete steeper curves however, errors will be introduced especially near the edges of the filter. For example, Figure 12 (A) illustrates a parabolic curve that passes through the center of the filter tangent to the main axis of the filter. The local orientation of the parabola continues to match fairly well with the radial orientation lines near the center, but becomes progressively worse towards the periphery.
Error due to mismatch between input orientation function and actual curves, for a radial (A) and parabolic function (C), showing that the error increases with distance from the center. Error due to the radial input function which predicts the peak response at the intersection of the linear extensions of the inducing edges (B), although the actual illusory boundary is perceived somewhat below that point. Error due to the symmetry of the parabolic input function (D) which predicts an equal response from relatable inputs a and b, as for non-relatable inputs a and g. All of these errors become negligible when the size of the filter is small relative to the curvature of the boundary to be completed.
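The growth of the orientational mismatch in Figure 12 (A) can be illustrated numerically. The sketch below assumes a parabola y = a x^2 through the filter center and compares its true tangent orientation, atan(2 a x), with the orientation expected by a radial input function at the same point, atan(a x). The function name and the sample values of `a` and `x` are assumptions made for the example.

```python
import math

def radial_mismatch(a, x):
    """Orientation error (radians) at horizontal distance x from the
    filter center, when a parabola y = a*x**2 is matched against a
    radial input function."""
    tangent = math.atan(2.0 * a * x)   # true local orientation of the curve
    radial = math.atan(a * x)          # orientation expected by the filter
    return tangent - radial

distances = [0.5, 1.0, 1.5, 2.0]
steep = [radial_mismatch(0.2, x) for x in distances]
gentle = [radial_mismatch(0.1, x) for x in distances]
```

Over this range the mismatch grows with distance from the center, and it is smaller everywhere for the gentler curve, which is the sense in which the errors become negligible when the filter is small relative to the curvature.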
Another problem with this input function is that for all orientations but the central one parallel to the principal axis of the cell, the path defined by the input function experiences a sharp kink, or orientational discontinuity at the center of the cell. For example, this input function would produce a peak response when the center of the cell is located at the intersection of the linear extensions of the two inducing edges, as shown in Figure 12 (B), rather than somewhat below, where the illusory contour is observed. This can be accounted for by the orientational tolerance function, as will be discussed below.
The parabolic input function alleviates this problem somewhat because it represents a better match to a smooth curve rather than a sharp vertex. The match can only be perfect however when applied to a parabolic input. Figure 12 (C) for example shows that a circular arc segment matches well to the parabolic function near the center of the filter, but again, the match becomes increasingly worse with distance from the center of the filter. The parabolic input function introduces a more serious error however, due to the fact that the optimal curves defined in the two lobes of the filter are actually not independent, but rather the optimal curve in one lobe is determined by the input present in the other lobe. This is shown in the example of Figure 12 (D), where the presence of an oriented input a in the left lobe of the filter defines a parabolic arc through the center of the filter which is consistent with a boundary that passes through a point b in the right lobe of the filter. The essential symmetry of the filter however would dictate that an input g in the right lobe would produce an equally strong response in this cooperative cell. This is a violation of the relatability criteria of Kellman & Shipley, because inputs a and g are parallel but misaligned, and therefore should not produce an illusory contour, while a and b should.
Some of the problems described above can be alleviated by the orientational tolerance function, whereby the input need not match exactly the function defined by the filter at each location in order to produce an appreciable response in the cooperative cell. For example, in Figure 12 (A) an orientational tolerance would allow a large response for this configuration despite a certain orientational mismatch at the distal parts of the filter. The tolerance function could also account for the lower curvilinear boundary in Figure 12 (B). The tolerance function however introduces errors of its own to the system. Figure 13 (A) shows how a large orientational tolerance in even the radial input function would allow two oriented inputs related by a shear misalignment to produce a response in the cooperative cell in violation of the relatability criteria, resulting in an illusory contour with a double inflection curve. The magnitude of the observed shear misalignment tolerance, or the separation within which boundary completion can continue to occur through such a shear misalignment, is a function both of the orientational tolerance itself, and the spatial scale of the filter. For example, Figure 13 (B) and (C) show two filters of different sizes but with the same orientational tolerance, to illustrate how the tolerance scales with the size of the filter. Kellman & Shipley [31] have determined psychophysically that the tolerance to shear misalignment is very small, on the order of 15 minutes of arc. This would indicate that either the orientational tolerance or the scale of the cooperative receptive fields must be small. A small orientational tolerance would lead to the other problems described above.
Error due to orientational tolerance for the radial input function which allows non-relatable inputs to produce a response (A); for a given orientational tolerance, the error is larger for larger scale filters (B) than for smaller scale filters (C).
Another problem relating to large scale receptive fields is a spatial averaging effect. When the area sampled by the receptive field is much larger than the region within which the input boundary is to be found, a number of spurious signals can contribute to the response of that same cell, reducing the signal to noise ratio of that cell, and possibly introducing additional errors to the response of that cell.
All of the errors in the receptive field response of the cooperative cell described above are exacerbated when the degree of curvature to be completed is large relative to the size of the filter, requiring the use of peripheral regions of the receptive field, where the function is no longer a good approximation to a short line tangent to the curve at the center of the filter. The use of multiple spatial scales is unlikely to help in this case because the location of the center of the illusory boundary is determined by the large scale cell which spans the gap between the inducers, and the smaller spatial scales serve only to bridge the smaller gaps between the inducers and that central point. If an error at the largest scale results in a bad location for that central point, the smaller scale filters will also bridge to that erroneous location. All of these problems can however be diminished to negligible proportions by the use of small scale filters to perform boundary completion. The difficulty with the use of small scale filters lies in the conjunctive constraint, which requires that the filters be at least as large as the gap across which completion is to occur.
In the next section I will discuss the feasibility of performing smooth long range boundary completion by way of multiple local interactions through short range receptive fields whose spatial extent can be considerably shorter than the gap across which the illusory boundary must form. I will later discuss how the conjunctive constraint can also be implemented as a global emergent property of the local interactions between individual small scale filters.
Boundary Completion by Elastic Interactions
Grossberg [17] discusses the distinction between structural and functional scales in neural architectures by showing how large scale spatial functions of a neural network model do not necessarily imply large scale structural or computational components. Instead, the large scale phenomena can arise as emergent properties of small scale local interactions. This principle is exemplified by the behavior of the wooden spline. In the early days of wooden shipbuilding, flexible wooden splines were used to interpolate smooth curves between fixed reference points in the hull plan by fixing metal spikes at those reference points and bending the splines around them to define the curvature of the hull between those points. The early Gestalt theorists [40, 67, 68] favored this kind of mechanical analogy as models of visual perception, because a mechanism like a spline does not explicitly represent any particular geometrical form, but rather, that form emerges as a natural result of simple elastic interactions between local elements in the spline, which together produce a unified global pattern.
The operation of the spline is analogous to the smooth boundary completion process seen in illusory contours, where the large scale global curvature of the spline arises from the multiple local elastic forces in the spline. This suggests that local forces between small scale cooperative filters can in principle also lead to large scale globally smooth boundaries.
Finite element model of elastic spline for smooth boundary completion.
The physical response of elastic bodies under stress can be analyzed in computer models using the technique of finite element analysis [64], whereby an elastic object is modeled by a number of discrete rigid elemental components interconnected by elastic forces. In the limit, as the size of the elements is reduced while increasing the number of elements, the behavior of the finite element model will approximate the smooth analog behavior of the elastic body. Figure 14 represents a finite element model of a wooden spline, consisting of a segmented chain made up of rigid links that are connected by pivots. Springs spanning across the pivots are attached to the links on either side of the pivot, and apply a rotational force at each pivot which tends to hold the pivot straight, i.e. the spring is in its resting state when the angle between the links at the pivot is zero, and applies a restoring force due to extension or compression as a function of the angle between the links, as suggested in the sketch at the lower right in the figure. The clamps at the ends of the chain represent inputs that constrain the location and orientation of the terminal links, and thereby define the boundary conditions for the relaxation of the chain between them. Whether the spring function is linear or nonlinear, the chain will always relax into a smooth curve between those inducing end points, as long as the spring function is smooth and monotonic increasing. Furthermore, the shape of the resulting curve at equilibrium corresponds to the optimal configuration of the entire chain which minimizes the total deviation from straightness summed over the whole length of the chain. The spring function here acts as an error function in the measure of deviation from straightness, and that error function in turn acts directly on the chain in the form of a physical torque. 
A simultaneous relaxation of all the spring forces against each other which serves to minimize the total torque at each spring therefore serves also to minimize the total deviation from straightness of the chain. For example, if the spring function is linear, then the equilibrium configuration will minimize the sum of the angles between the links of the chain summed over the whole chain; if the spring function is quadratic, the equilibrium configuration will minimize the sum of the squares of the angles between the links, summed over the length of the chain. This system therefore can minimize the total deviation of the chain from straightness defined by any chosen error criterion by simply matching the spring function to the desired error function.
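The relaxation of the segmented chain can be sketched numerically for the quadratic spring function. The sketch below is a simplification and not a full finite element model: the nodes are assumed to be evenly spaced along the horizontal axis and free to move only vertically, and the angle at each pivot is approximated by the second difference of the node heights, which is reasonable only for small angles. The first two and last two nodes are held fixed to play the role of the clamps. The function names, step size, and iteration count are assumptions.

```python
import numpy as np

def bending_energy(y):
    """Sum of squared pivot 'angles', approximated by second differences."""
    d = y[:-2] - 2.0 * y[1:-1] + y[2:]
    return float(np.sum(d ** 2))

def relax_chain(y_init, n_iter=20000, step=0.05):
    """Relax the interior nodes by gradient descent on the bending energy.

    The first two and last two nodes are clamped, fixing the position
    and orientation of the terminal links.
    """
    y = np.array(y_init, dtype=float)
    for _ in range(n_iter):
        d = y[:-2] - 2.0 * y[1:-1] + y[2:]
        # Each second difference pulls on the three nodes that define it.
        grad = np.zeros_like(y)
        grad[:-2] += 2.0 * d
        grad[1:-1] -= 4.0 * d
        grad[2:] += 2.0 * d
        grad[:2] = 0.0     # left clamp
        grad[-2:] = 0.0    # right clamp
        y -= step * grad
    return y

# Clamps: left link tilted upward, right link tilted downward.
y0 = np.zeros(11)
y0[1] = 0.5
y0[-2] = 0.5
y_relaxed = relax_chain(y0)
```

Starting from a kinked chain, the purely local updates settle into a smooth symmetric arch between the clamps, and the summed squared deviation from straightness decreases from its initial value. A different spring function would correspond to replacing the squared term in `bending_energy` and its gradient.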
Finite element analysis with a relaxation algorithm has been used for pattern recognition by matching an elastic template to the visual input and deforming that template in order to minimize the differences between it and the input. Pentland et al. [50] demonstrate a relaxation scheme that matches an input to objects such as the human body or face by distorting the stored elastic template in such a way as to minimize the difference between it and the input to be matched. They state, however, that this system "is not suited to generalized object recognition where the object is not known a priori". In other words, their scheme can recognize only those forms for which a finite element model is defined, and cannot generalize to novel objects.
A more general class of finite element relaxation models is represented by the snakes model [29] which defines an elastic finite element spline (the snake) to represent a generalized smoothly curving boundary. The snake is applied to a visual edge in an image, and a relaxation algorithm then causes the snake to drift into a configuration that minimizes the difference between the snake and the visual edge, while respecting the internal stiffness constraints of the snake. In the simulations, the snake is seen to drift into position along the edge, spanning across missing boundary segments, while maintaining a globally smooth curvature. The snakes model has more general application than a specific finite element model such as that proposed by Pentland et al. [50], because the object represented by the snake, a smooth continuous curve, is a generalized feature that is encountered frequently in most natural images. The snakes model however is still too specific for a biological model of vision because several key parameters, i.e. the length of the snake, the number of snakes, and the initial position of the snake, must be determined a priori by the user.
A model such as the BCS can be considered to be an even more general example of a finite element model, in which the entire visual field can be considered to be populated by "potential snakes", and when a visual input is presented, some of the "potential snakes" become active, and begin to interact with one another, grouping into smooth curves. This model has the distinct advantage that it can represent any number of edges of any length and at any location in the image, and that the selection of appropriate snakes occurs automatically based on the features detected in the input image.
I will now present a model of boundary completion that inherits many of the properties of the BCS model, but performs long range boundary completion by way of small scale local interactions within the cooperative representation of the type illustrated by the finite element model described above. First, I will describe the principles of operation of the model, and show how it can perform boundary completion in a manner similar to the BCS. I will then show how it satisfies the conjunctive constraint by pooling of activation between inducers.
I will now present a model of illusory boundary completion by way of a diffusion of oriented information, with the properties of a generalized finite element relaxation model that performs long range boundary completion by way of short range local interactions. I will first describe the one-dimensional behavior of the model in order to illustrate the operation of the conjunctive constraint by local interactions along the line joining two inducers, after which I will describe the full two-dimensional model, with computer simulations that show how the properties of this model are consistent with the psychophysical properties of illusory boundary completion.
The directed diffusion model inherits many properties of the BCS model [18]. As in the case of the BCS, the directed diffusion model acts in concert with the Feature Contour System (FCS) [21] which represents a brightness percept that is influenced by the action of the directed diffusion model. Like the BCS, the directed diffusion model receives oriented input from a layer of orientation selective cells in which the signal from opposite directions of contrast has been summed, in order to produce an oriented signal that is insensitive to the direction of contrast. The directed diffusion model also performs boundary completion between oriented inducers by way of a cooperative / competitive feedback interaction in the orientational representation, producing the final boundary percept by way of a parallel relaxation between all of the forces active within the system, in a manner consistent with the Gestalt notion of analog field-like interactions between local elements. In the BCS model, the cooperative and competitive interactions are computed separately in cooperative and competitive layers, with a feedback signal to close the loop. In the directed diffusion model, all cooperative and competitive interactions take place within a single layer of the model, whereby the feedback occurs by virtue of the fact that the cooperative cells receive input directly from other neighboring cooperative cells, rather than from a previous layer. The cooperative layer of the directed diffusion model therefore represents a lumped version of the entire CC loop in the BCS model.
The principle of operation of the directed diffusion model is an oriented diffusion of cooperative activation outwards from each visual inducer in a direction that is parallel to the orientation of that inducer. Figure 15 (A) depicts two horizontal inducers that stimulate horizontal cells in the oriented layer, which in turn send activation to horizontal cells in the cooperative layer. Interconnections between cells in this layer propagate the horizontal signal in a horizontal direction by way of oriented receptive fields, generating a line of horizontally oriented activation in both directions from each of the two inducers. A pooling of activation between the inducers, shown in Figure 15 (B), fills in the illusory boundary between the inducers.
Two dimensional directed diffusion model. The horizontal input from the oriented layer generates a horizontally oriented response in the cooperative layer (A), which diffuses outward in a horizontal direction. Pooling of activation (B) between the inducers generates the illusory boundary.
In the computer simulations that will be presented in the next section, a standard diffusion equation is employed, i.e. the rate of change of activation in any cell is proportional to the difference in activation between that cell and the average activation of its neighbors, so that activation will tend to spread and diminish from the point of input. An equation of this type is presented below.
In order to allow completion of curved boundaries, a certain amount of "crosstalk" must be allowed between adjacent orientations. For example, in Figure 16 (A), a horizontal cooperative cell located at the dotted ellipse must be able to complete the curved boundary between the two angled fields of activation due to the angled inducers. In the proposed model this is accomplished by yet another form of diffusion of activation, this time between adjacent orientations at the same location in the cooperative representation, as shown in Figure 16 (B) and (C). For example, an isolated horizontal inducer would stimulate the horizontal cooperative cell at that location, represented in the middle layer in Figure 16 (B), resulting in a horizontal diffusion to neighboring regions. The activation of this cell will also partially stimulate the adjacent orientations at the same location, as suggested by the arrows to the upper and lower layers of Figure 16 (C). These cells in turn will propagate a somewhat attenuated signal outwards by diffusion to neighboring regions at their respective orientations. The combined effect of this orientational diffusion is a "fanning out" of the oriented signal by diffusion from an isolated inducer, as indicated in Figure 16 (C), which shows the total diffusion from a single horizontal inducer for three near-horizontal orientations simultaneously. The effect of orientational diffusion is somewhat similar to the orientational tolerance of the BCS cooperative cell, as defined by the input orientation term, which is the feature of the cooperative receptive field that allows it to receive oriented input directly from orientations other than that of the receiving cell.
Orientational diffusion to adjacent orientations, producing orientational fanning.
In the system described so far, the broad receptive field profile of the cooperative cell will lead to a considerable spread of activation with distance from an inducer, resulting in broad, fuzzy illusory boundary formation. In the BCS model this kind of spatial blurring is compensated for by spatial competition in the CC loop, which performs spatial sharpening along the peak of broad regions of activation. The competitive receptive field used in the BCS model is an isotropic on-center off-surround feedback interaction which performs spatial competition uniformly in all directions. A better choice for spatial sharpening is an anisotropic competitive interaction that suppresses activation only in a direction orthogonal to the represented orientation, so as not to counteract the cooperative interaction in the oriented direction. This anisotropic receptive field can be combined with the cooperative receptive field in the simulations, producing a compound receptive field that performs both cooperation (for directed diffusion) and competition (for boundary thinning) simultaneously in a single operation. Application of the anisotropic competitive interaction to the cooperative receptive fields produces the desired boundary completion and thinning of the boundaries, as will be shown in the section on simulation results.
The operation of this processing stage is reminiscent of the Rho Space model of Walters [65] which also attempts to explain the psychophysical phenomena of colinear grouping using a similar orientational cooperative / competitive architecture. The principal difference, as described by Walters herself, is that "the neural networks perform static, noniterative, discrete computations of the type that could be easily implemented in a clocked, discrete, parallel digital structure". Furthermore, Walters' model does not make a distinction between the visible or modal percept, as modeled by the FCS, and the invisible, or amodal percept modeled by the BCS, and thus is a less complete model of the perceptual phenomena. Finally, the Walters model is insensitive to adjacent orientations, so that boundary completion around curves is not accounted for.
The purpose of the conjunctive constraint in the BCS model is to account for the phenomenon observed in visual perception that illusory boundaries only form inwards between two inducers; they will not develop outwards beyond an inducer into featureless space. If the large scale cooperative filters of the BCS model are to be replaced by multiple smaller scale filters, some means must be devised to communicate the conjunctive constraint between such cells across the gap between inducers in order to allow the growth of illusory boundaries only in those areas.
One solution to this problem is to allow "boundary completion" to extend outward from isolated inducers as in Figure 15 (A) and (B), but have such fields remain subliminal, or imperceptible, unless they are located between two inducers, and thereby receive activation from both directions, in which case they would become superliminal, or perceptible. This can be done, for example, by establishing a perceptibility threshold in the cooperative representation, in order to render the outward extensions subliminal and thereby imperceptible. An explicit perceptibility threshold however need not necessarily be postulated, since according to Grossberg [18] the grouping performed by the cooperative cells is by itself "invisible", i.e. produces no brightness percept, except by way of interaction with the FCS, where it may influence the diffusion of brightness signal. Since the outward extensions in Figure 15 (A) and (B) are "dead ends", i.e. they do not define an enclosed contour, the brightness signal in the FCS will be able to freely diffuse around them, and thus render them invisible. I will attempt to satisfy the conjunctive constraint therefore by defining a system which performs strong boundary completion inwards between inducers, and only weak boundary signals outward from inducers which generally remain invisible, so as to reproduce the appearance of the conjunctive constraint in conjunction with the FCS brightness diffusion. Sambin [58] proposes a similar scheme involving invisible fields of influence emanating from visual inducers, based on psychophysical observations of the distance dependence of interactions between visual elements.
There is one other aspect of this model that deserves mention at this point, concerning the issue of boundary effects at the edge of the simulation. Since the activation of a cell in this model is influenced by the activation in neighboring units, a special condition exists at the edge of the image, where the units have no neighbors. The purpose of this model is to perform illusory boundary completion between oriented inducers, and to suppress completion outwards from oriented inducers. It is therefore desirable to quench any stray outward boundaries such as those depicted in Figure 15, to prevent boundary completion from occurring outward between the inducers and the edge. For this reason, the nodes at the edges of the simulation were clamped to zero activation. I will show however that in most cases this will have no practical influence on the performance of the model because, except in the case of zero decay, the outward diffusion decays away with distance from the inducer, and thus would have tapered off to nothing anyway.
Layout of one-dimensional directed diffusion simulations. Oriented edges in the image layer activate cells in the oriented layer by way of oriented receptive fields, and the oriented cells in turn stimulate cells in the cooperative layer at those locations. A diffusion of oriented activation in the cooperative layer then spreads the activation outwards from the inducers, as suggested by the simulated activation plot, shown above.
The following simulations represent a one-dimensional line of cooperative cells that lie in a straight line across one or more oriented inducers in a direction parallel to the orientation represented by those inducers, as illustrated in Figure 17. The oriented cells receive input from the image layer by way of oriented receptive fields, and communicate activation upward to the cooperative layer. The plot at the top of the figure shows the activation of those cooperative cells at equilibrium in response to the applied inputs. The cells at either end of the cooperative layer are clamped to zero activation. The cells within the cooperative layer receive input from both the oriented layer, and from each of their adjacent neighbors in the cooperative layer and respond according to the diffusion equation
$$\frac{dx_i}{dt} = -A\,x_i + \left(\frac{x_{i-1} + x_{i+1}}{2} - x_i\right) + I_i \qquad \text{(EQ 1)}$$
where xi represents the activation of the ith cooperative cell, A is the decay rate (a small positive constant), and Ii represents the input from the ith oriented cell. This equation is a diffusion equation because it tracks the difference between the activation of the cell xi, and the average activation of its two neighbors. In the absence of input Ii, if the activation xi of the ith cell is less than its neighbors, then its activation will grow, whereas if it is greater than its neighbors it will decay. This will tend to spread any pattern of activation (or clamped inactivation) outward from the source, as in the FCS diffusion.
Equation 1 is a linear differential equation that describes the activation of a neuron i as a function of the activation of its immediate neighbors i-1 and i+1. The use of a linear equation is questionable in the context of neural activation because it predicts that the neuron's activation can grow without bound as the input grows without bound. This choice, however, simplifies the analysis and reduces the computational load of the simulations. Furthermore, as I will discuss later, the results that follow should hold when a more realistic activation function is used. In fact, a similar computational simplification has been used to describe the diffusion process in the FCS model [21].
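The behavior described for Equation 1 can be sketched numerically. The following minimal Python sketch assumes the form dx_i/dt = -A x_i + ((x_{i-1} + x_{i+1})/2 - x_i) + I_i, which is inferred from the verbal description above rather than quoted from it. The function name `relax_1d`, the time step `dt`, the number of steps, and the 21-node line are illustrative choices, not the values used in the simulations reported here.

```python
def relax_1d(inputs, decay, dt=0.2, steps=2000):
    """Euler-integrate the assumed diffusion equation on a line of cells.

    The two end cells are held at zero activation (the clamped nodes).
    """
    n = len(inputs)
    x = [0.0] * n
    for _ in range(steps):
        new = x[:]                          # end cells are never updated
        for i in range(1, n - 1):
            neighbour_avg = 0.5 * (x[i - 1] + x[i + 1])
            dx = -decay * x[i] + (neighbour_avg - x[i]) + inputs[i]
            new[i] = x[i] + dt * dx
        x = new
    return x


inputs = [0.0] * 21
inputs[10] = 1.0                            # single inducer at the centre
profile = relax_1d(inputs, decay=0.1)       # hypothetical decay rate
```

With these settings the profile should settle to a single peak over the inducer that tapers symmetrically toward the clamped ends, which is the qualitative behavior the text attributes to the model.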
Directed diffusion simulation with a single input and various decay rates. The high activation due to the input located at the arrow spreads outward in both directions by diffusion. In the case of zero decay, the activation spreads all the way out to the clamped nodes at the ends of the simulation.
Figure 18 (A) shows the result of a simulation of this system with 100 nodes in the cooperative layer, a single oriented input of value 1.0 (indicated by the arrow in the figure), and a decay rate of 0.1. At equilibrium, a peak is observed at the location of the input, with a spread of activation in either direction, tapering away with distance from the inducer. If the decay rate is reduced to 0.01, as in Figure 18 (B), the activation spreads a greater distance from the inducer in both directions. With the decay constant reduced to zero, as in Figure 18 (C), the activation spreads linearly in both directions to the nodes at the ends of the display, which are clamped to zero activation. Examination of the equilibrium state of the diffusion explains this behavior. The equilibrium value of a node xi is given by
$$x_i = \frac{\tfrac{1}{2}\left(x_{i-1} + x_{i+1}\right) + I_i}{1 + A} \qquad \text{(EQ 2)}$$
This equation assumes that the neighboring node values xi-1 and xi+1 remain fixed, i.e. this is a local equilibrium. If Ii is zero, and the decay rate is zero, Equation 2 shows that such a node will equilibrate to the average value of its two neighbors, assuming that they themselves remain fixed. In fact, of course, a change in the value of the ith node xi will directly influence both the activations xi-1 and xi+1. Nevertheless, at the global equilibrium, the local equilibrium condition of Equation 2 will hold for all nodes with respect to their neighbors. If all the nodes between the inducer and the clamped nodes at the ends in Figure 18 (C) satisfy this equation, then at the equilibrium of the whole system, their activations must define a straight line that performs an interpolation between the value of the node over the inducer and the activation of the clamped nodes, which are held at zero, as seen in Figure 18 (C). The deviation from this linear interpolation seen in Figure 18 (A) and (B) due to a non-zero decay rate leads to an overall lowering of activation as well as a faster spatial decay. This result suggests that in Figure 18 (A) the activation pattern due to the single input must spread all the way to the clamp points, i.e. node 99 must be greater than zero when full equilibrium is established, even though its actual value may be exceedingly small. In practice however the actual range of such spatial diffusion will be limited by the mathematical precision of the simulation, i.e. beyond a certain distance from the input the activation value becomes so small as to be rounded to zero in the computer, and further spread of activation is quenched. A similar roundoff would also occur in a biological implementation when the activations of the nodes that are far enough from the input become so low as to be lost in the noise of baseline activation. The range of diffusion in such a system therefore is mathematically infinite, but practically finite.
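The statement that the range of diffusion is mathematically infinite but practically finite can be made explicit with a short derivation. The sketch assumes that the local equilibrium takes the form $x_i = \left(\tfrac{1}{2}(x_{i-1} + x_{i+1}) + I_i\right)/(1 + A)$, which is inferred from the description of Equation 2 rather than quoted from it, and it ignores the clamped end nodes, i.e. the line of cells is taken to be long compared with the decay length. The symbols $\lambda$ and $c$ (the index of the input node) are introduced here for illustration only.

$$
\begin{aligned}
&\text{away from the input } (I_i = 0): && x_{i+1} - 2(1 + A)\,x_i + x_{i-1} = 0, \\
&\text{trial solution } x_i = x_c\,\lambda^{|i - c|}: && \lambda^2 - 2(1 + A)\,\lambda + 1 = 0, \\
&\text{decaying root:} && \lambda = (1 + A) - \sqrt{A\,(2 + A)}, \qquad 0 < \lambda < 1 \ \text{for } A > 0 .
\end{aligned}
$$

Under these assumptions the activation falls by the constant factor $\lambda$ per node, approximately 0.64 for A = 0.1 and 0.87 for A = 0.01, so that it is never exactly zero at any finite distance but drops below any fixed numerical precision after a finite number of nodes. As A approaches zero, $\lambda$ approaches one and the geometric decay gives way to the linear ramp set by the clamped nodes.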
The set of simulations that follow explore the phenomenon of pooling of activation between pairs of inducers. Figure 19 (A), (B) and (C) plot the equilibrium values of cooperative cells in response to two inducers at the same separation but with decay rates set to 0.001, 0.0001, and 0.0 respectively. The pooling effect can be seen in the activations of nodes that are between the two inducers, which remain higher than they would be due to either inducer alone. The effect is particularly pronounced with low decay rates, and becomes "perfect pooling" with a zero decay rate, where the activations perform a linear interpolation between the activations at the two inputs. With a non-zero decay rate however the activations of the nodes between the inducers fall off non-linearly with distance from the inducers, reaching a minimum at the midpoint. This distance dependent decay determines the range across which boundary completion can occur. The directed diffusion model predicts that a shorter illusory boundary will also form outwards from the two inducers, as well as the stronger illusory boundary that forms inwards between the inducers, as seen in the decaying pattern of activation between the inducers and the clamped endpoints.
Effect of decay rate on the range of boundary completion between two inducers. With a zero decay rate, the boundary performs a linear interpolation between the activations at the two inducers, marked with arrows.
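Because the diffusion equation is linear, the pooling between two inducers can be understood as the superposition of the two single-inducer responses. The sketch below illustrates this under the assumption that the local equilibrium rule is x_i = ((x_{i-1} + x_{i+1})/2 + I_i)/(1 + A), applied repeatedly until the line settles. The helper names, the 41-node line, the inducer positions, and the decay rate of 0.01 are hypothetical choices for illustration.

```python
def equilibrium_1d(inputs, decay, sweeps=2000):
    """Iterate the assumed local equilibrium rule over the whole line."""
    x = [0.0] * len(inputs)
    for _ in range(sweeps):
        for i in range(1, len(x) - 1):      # end cells remain clamped at zero
            x[i] = (0.5 * (x[i - 1] + x[i + 1]) + inputs[i]) / (1.0 + decay)
    return x


def line_with_inducers(n, positions):
    """Input vector with unit inducers at the given node positions."""
    inputs = [0.0] * n
    for p in positions:
        inputs[p] = 1.0
    return inputs


N, DECAY = 41, 0.01
left = equilibrium_1d(line_with_inducers(N, [13]), DECAY)
right = equilibrium_1d(line_with_inducers(N, [27]), DECAY)
both = equilibrium_1d(line_with_inducers(N, [13, 27]), DECAY)

midpoint = both[20]     # seven nodes inward from each inducer
outward = both[6]       # seven nodes outward from the left inducer
```

In this sketch `both[20]` equals `left[20] + right[20]` to within rounding, and the inward value exceeds the outward value at the same distance, which is the asymmetry that stands in for the conjunctive constraint.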
In the following section I will show that this model makes several predictions that are in qualitative agreement with psychophysical studies by Shipley & Kellman [59] and Banton & Levi [2], that indicate that the salience of the illusory boundaries in a Kanizsa figure diminishes as the distance between the inducers is increased. Furthermore, the salience increases with the "strength" of the inducing edges, whether measured as the length of the edge, or the contrast across that edge. I will now show that the directed diffusion model is consistent with these findings.
Figure 20 shows the behavior of the system in response to two inputs as the separation between the inducers varies in the range of 50, 40, 30, and 20 nodes for Figure 20 (A), (B), (C), and (D) respectively. The simulations show that the strength of the illusory boundary diminishes steadily with distance between the inducers. This can be seen by the diminishing height of the plot of activations between the inducers. This is shown graphically in Figure 21, which plots the equilibrium activation of the node that is midway between the two inducers, i.e. where the boundary is the weakest, for a set of simulations using the same parameters as above, as the separation between the inducers is varied.
Simulation of pooling of cooperative activation between two inducers, showing how the salience of the illusory boundary, indicated by the height of the activation plot between the inducers, diminishes with separation between the inducers.
Plot of "salience" of illusory boundary measured at its midpoint, as a function of the separation between two isolated inducers.
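The dependence of midpoint salience on inducer separation can be sketched by solving the equilibrium condition directly instead of relaxing to it. The code below assumes the equilibrium form (1 + A) x_i - (x_{i-1} + x_{i+1})/2 = I_i with both end nodes clamped to zero, and solves the resulting tridiagonal system with the Thomas algorithm. The function names, the 101-node line, and the decay rate of 0.01 are illustrative assumptions, not the parameters of the simulations plotted in Figure 21.

```python
def solve_equilibrium(inputs, decay):
    """Solve the assumed tridiagonal equilibrium system exactly."""
    n = len(inputs)
    m = n - 2                               # number of unclamped cells
    diag, off = 1.0 + decay, -0.5
    c_prime = [0.0] * m
    d_prime = [0.0] * m
    c_prime[0] = off / diag
    d_prime[0] = inputs[1] / diag
    for k in range(1, m):                   # forward elimination
        denom = diag - off * c_prime[k - 1]
        c_prime[k] = off / denom
        d_prime[k] = (inputs[k + 1] - off * d_prime[k - 1]) / denom
    x = [0.0] * n                           # x[0] and x[n-1] stay clamped
    x[m] = d_prime[m - 1]
    for k in range(m - 2, -1, -1):          # back substitution
        x[k + 1] = d_prime[k] - c_prime[k] * x[k + 2]
    return x


def midpoint_salience(separation, n=101, decay=0.01):
    """Equilibrium activation midway between two unit inducers."""
    centre = n // 2
    inputs = [0.0] * n
    inputs[centre - separation // 2] = 1.0
    inputs[centre + separation // 2] = 1.0
    return solve_equilibrium(inputs, decay)[centre]


separations = [20, 30, 40, 50]
salience = [midpoint_salience(s) for s in separations]
```

Under these assumptions the list `salience` decreases monotonically as the separation grows, in qualitative agreement with the trend described for Figure 21.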
Psychophysical studies by Banton & Levi [2] indicate that the salience of illusory boundaries is also a function of the contrast of the input stimulus: the greater the contrast, the more salient the resultant illusory boundary. Greater contrast of the oriented inducer would translate to a larger magnitude in the response of the oriented cells of the system. This is illustrated in Figure 22 (A), (B), and (C), which plot the response of the directed diffusion system to two oriented inputs of magnitudes 1, 2, and 3 respectively, resulting in progressively greater activations along the entire length of the illusory boundary.
Effect of contrast, as represented by the input magnitude, on the salience of the illusory boundary, showing that the salience of the illusory boundary, as indicated by the height of the plot between the inducers, is increased as the input magnitude varies through one (A), two (B), and three (C).
Psychophysical studies by Shipley & Kellman [59] and Banton & Levi [2] also show that the salience of the illusory boundary is a function of the length of the oriented inducer, a longer inducer producing a more salient illusory boundary. Figure 23 (A), (B), and (C) show the response of the directed diffusion system to inputs of magnitude 1, but with a spatial extent of one, two, and three adjacent nodes respectively. Due to the pooling of activation between adjacent nodes, the local response to such adjacent inducers is not only spread out, but produces a peak of greater magnitude than the response to isolated inducers. This results in a more salient illusory boundary, or a greater range of boundary completion, as seen in the psychophysical studies.
Effect of spatial extent of inducer on the equilibrium activation value. The arrows indicate input signals with a width of one (A), two (B), and three (C) pixels, showing how the height of the plot, indicating salience, and the spatial extent of the diffusion, both increase with the width of the input stimulus.
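The effects of contrast and of inducer length can both be sketched with the same relaxation. The code assumes the local equilibrium rule x_i = ((x_{i-1} + x_{i+1})/2 + I_i)/(1 + A); the 61-node line, the inducer positions at nodes 20 and 40, the decay rate of 0.01, and the convention of widening each inducer away from the gap are hypothetical choices made for illustration.

```python
def equilibrium_1d(inputs, decay, sweeps=3000):
    """Iterate the assumed local equilibrium rule over the whole line."""
    x = [0.0] * len(inputs)
    for _ in range(sweeps):
        for i in range(1, len(x) - 1):      # end cells remain clamped at zero
            x[i] = (0.5 * (x[i - 1] + x[i + 1]) + inputs[i]) / (1.0 + decay)
    return x


def pair_response(magnitude, width, n=61, decay=0.01):
    """Return (midpoint activation, peak activation) for a pair of inducers."""
    inputs = [0.0] * n
    for k in range(width):
        inputs[20 - k] = magnitude          # left inducer widens leftward
        inputs[40 + k] = magnitude          # right inducer widens rightward
    profile = equilibrium_1d(inputs, decay)
    return profile[30], max(profile)


by_contrast = [pair_response(m, 1) for m in (1.0, 2.0, 3.0)]
by_width = [pair_response(1.0, w) for w in (1, 2, 3)]
```

Because the assumed equation is linear, the midpoint activation in `by_contrast` scales in proportion to the input magnitude, while in `by_width` both the midpoint and the peak activation grow as more adjacent nodes are driven.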
Finally, psychophysical studies by Kojo, Liinasuo, & Rovamo [36] indicate that the same factors that were found in the other studies to increase the salience and range of illusory boundaries, i.e. greater length of the inducers, and less separation between inducers, were also found to increase the speed with which the illusory boundaries are perceived, presumably as a result of a faster propagation of the illusory boundaries between the inducers. This phenomenon is also consistent with the directed diffusion model because a higher peak of activation will result in a faster rate of diffusion, as shown in Figure 24. In this figure, inputs of various magnitudes are presented for a fixed interval of time, 100 time units in this case, rather than allowed to run to equilibrium, after which time the range of the spread of activation was measured at some arbitrary perceptibility threshold (50 was chosen in this case) indicated by the horizontal dotted lines in the figure. It can be seen that the smaller magnitude inputs have not spread as far in these 100 time units as have the larger magnitude inputs, indicating that the rate of diffusion of activation is a function of the magnitude of the input. Since we have shown in the previous experiment that a longer input stimulus also results in a greater magnitude of activation, this result generalizes to the conclusion that longer inputs will also spread activation faster than shorter inputs of the same magnitude, as was found in the psychophysical studies.
Effect of input magnitude on speed of diffusion. In these simulations the system was not allowed to run to equilibrium, but instead only to 100 iterations. In that time, the range of the illusory boundary was measured at a threshold of 50, showing that a larger contrast input, indicated by a larger input magnitude, propagated a greater distance in the same time than a smaller input magnitude, through a range of 10 (A), 20 (B), and 30 (C) arbitrary units corresponding to spiking frequency.
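The time-limited experiment can be sketched by integrating the diffusion for a fixed duration and counting the nodes that exceed a threshold. The code assumes the form dx_i/dt = -A x_i + ((x_{i-1} + x_{i+1})/2 - x_i) + I_i. The duration of 100 time units, the threshold of 50, and the input magnitudes of 10, 20, and 30 follow the text, whereas the decay rate of 0.001, the Euler step `dt`, and the 101-node line are assumptions.

```python
def diffuse_for(inputs, decay, duration, dt=0.1):
    """Euler-integrate the assumed diffusion equation for a fixed time."""
    n = len(inputs)
    x = [0.0] * n
    for _ in range(int(round(duration / dt))):
        new = x[:]                          # end cells stay clamped at zero
        for i in range(1, n - 1):
            neighbour_avg = 0.5 * (x[i - 1] + x[i + 1])
            new[i] = x[i] + dt * (-decay * x[i] + neighbour_avg - x[i] + inputs[i])
        x = new
    return x


def range_above(profile, threshold):
    """Number of nodes whose activation exceeds the threshold."""
    return sum(1 for value in profile if value > threshold)


ranges = []
for magnitude in (10.0, 20.0, 30.0):
    inputs = [0.0] * 101
    inputs[50] = magnitude                  # single inducer at the centre
    snapshot = diffuse_for(inputs, decay=0.001, duration=100.0)
    ranges.append(range_above(snapshot, threshold=50.0))
```

Under these assumptions the supra-threshold range grows with the input magnitude after the same elapsed time, which is the sense in which a stronger inducer propagates its boundary faster.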
The foregoing simulations have revealed that the directed diffusion model can explain certain significant properties of illusory boundary formation. Specifically, the directed diffusion model shows that global boundary completion can be achieved by way of purely local interactions between units, as can the conjunctive constraint, so that the large scale receptive fields seen in the BCS model are not required to reproduce these large scale phenomena. Furthermore, the fine-grained architecture of the directed diffusion model avoids some of the problems that result from the large and complex receptive fields of the BCS. The directed diffusion model is also consistent with a body of psychophysical data on illusory boundary formation, specifically that illusory boundary salience, completion range, and completion speed are a function of inducer length, inducer magnitude (contrast), and inducer proximity. I will now demonstrate how the directed diffusion model can be extended to two dimensions, and will present simulations of the two dimensional system.
The architecture for the two-dimensional simulation of the directed diffusion is depicted in Figure 25. An input image is projected onto the image layer which consists of a two dimensional matrix of cells Ixy. Cells Oxyr in the oriented layer receive activation from the input layer by way of oriented receptive fields centered at location (x,y), and with orientation r. This is accomplished by way of spatial convolution of the input image with the oriented filters, such that
$$O_{xyr} = \left|\,\sum_{i,j} I_{x+i,\,y+j}\,F_{ijr}\,\right| \qquad \text{(EQ 3)}$$
where Fijr is the oriented edge detector. The absolute value function in this equation implements the insensitivity to direction of contrast proposed for the BCS model. Twelve orientations were represented in this manner in the simulations (only four are shown in the figure for simplicity) which corresponds to twenty-four possible orientations in a full contrast image. In the directed diffusion simulation, this oriented image convolution was bypassed, and instead, a simulated oriented image was applied directly to the oriented layer, in the form of individual points of oriented activation in the oriented layer, simulating the response of the oriented layer to an isolated oriented edge in the input layer. While in nature, such an isolated oriented signal would never be found, this approximation was sufficient to demonstrate the properties of the orientational interactions due to directed diffusion.
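The role of the absolute value can be illustrated with a small sketch of oriented filtering. The code assumes that the oriented response is the magnitude of the sum of image values weighted by the filter, evaluated wherever the filter fits entirely inside the image. The 3 by 3 odd-symmetric vertical edge filter and the two test images are hypothetical examples, not the filters used in the simulations.

```python
def oriented_response(image, kernel):
    """Magnitude of the filter-weighted sum at every valid image position."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for y in range(h - kh + 1):
        row = []
        for x in range(w - kw + 1):
            total = sum(image[y + j][x + i] * kernel[j][i]
                        for j in range(kh) for i in range(kw))
            row.append(abs(total))          # discard the direction of contrast
        out.append(row)
    return out


# Hypothetical vertical edge detector: negative on the left, positive on the right.
edge_filter = [[-1, 0, 1]] * 3

dark_to_light = [[0, 0, 0, 1, 1, 1]] * 4
light_to_dark = [[1, 1, 1, 0, 0, 0]] * 4

response_a = oriented_response(dark_to_light, edge_filter)
response_b = oriented_response(light_to_dark, edge_filter)
```

The two images contain the same edge with opposite directions of contrast, and the two responses are identical, which is the insensitivity to direction of contrast described above.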
Directed diffusion simulation architecture. Oriented edges in the image layer stimulate activation in oriented cells of the oriented layer by way of oriented receptive fields. The oriented signal is then propagated up to cooperative cells at the same location. The cooperative cells also receive input from adjacent cells of like orientation in the same cooperative layer, resulting in a feedback interaction within the cooperative layer.
Oriented cell activation is collected by the left and right lobes of the cooperative cell receptive fields, described by the functions L and R as defined below.
(EQ 4)
(EQ 5)
Each of these terms is made up of a product of two Gaussian functions. The first Gaussian, referred to as the orientation term, is a function of deviation from the central orientation r, with a standard deviation of σr. This function produces a peak along a line of orientation r. The second Gaussian, known as the radial term, decays with increasing spatial separation from the center of the filter. The product of these two terms defines a two-dimensional oriented receptive field that extends outward in the oriented direction.
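The only difference between the equations for L and R is in the exponent of the orientational Gaussian term, which determines the direction of the receptive field, such that the orientations of L and R differ by π. The arctangent function used in these equations is the two-argument atan2 function, which returns the angle of the point (x, y) measured from the positive x axis over the full range from -π to π, rather than the half range returned by the single-argument arctangent.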
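One way to build the two lobes from the verbal description is sketched below. Each lobe is the product of an orientation Gaussian, evaluated on the angular deviation obtained with atan2, and a radial Gaussian. The exact exponents, the names `lobe`, `sigma_r`, and `sigma_d`, the parameter values, and the exclusion of the central pixel are assumptions made for illustration.

```python
import math


def wrap(angle):
    """Wrap an angle into the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def lobe(i, j, theta, sigma_r, sigma_d, flip):
    """Value of one lobe at the offset (i, j) from the cell.

    flip=False gives the lobe pointing along theta, flip=True the lobe
    pointing along theta + pi.
    """
    if i == 0 and j == 0:
        return 0.0                           # assumed: no self-input via the lobes
    direction = theta + (math.pi if flip else 0.0)
    deviation = wrap(math.atan2(j, i) - direction)
    orientation_term = math.exp(-deviation ** 2 / (2 * sigma_r ** 2))
    radial_term = math.exp(-(i * i + j * j) / (2 * sigma_d ** 2))
    return orientation_term * radial_term


def right(i, j):
    """Right lobe of a horizontal cell (hypothetical widths)."""
    return lobe(i, j, 0.0, 0.3, 2.0, flip=False)


def left(i, j):
    """Left lobe of a horizontal cell (hypothetical widths)."""
    return lobe(i, j, 0.0, 0.3, 2.0, flip=True)
```

In this sketch the left lobe is the mirror image of the right lobe through the cell, so that `left(i, j)` matches `right(-i, -j)`, and each lobe is negligible on the side facing away from its direction.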

A cooperative cell Cxyr at location (x,y) and of orientation r receives input both directly from the oriented cell Oxyr at location (x,y) and orientation r, as well as through the "left" and "right" lobes of its bipolar receptive field Lxyr and Rxyr from adjacent cooperative cells of the same orientation:
(EQ 6)
(EQ 7)
The resultant activation of the cooperative cell is governed by the differential equation
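$$\frac{dC_{xyr}}{dt} = -A\,C_{xyr} + \left(\frac{L_{xyr} + R_{xyr}}{N} - C_{xyr}\right) + O_{xyr} \qquad \text{(EQ 8)}$$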
In this equation, the first term is the passive decay term governed by the decay rate A, a positive constant. Following this is a difference of two terms that represents the difference between the activation of the cell Cxyr and the average activations in the regions of the cooperative layer governed by the left and right lobes of the cooperative receptive field, where N is a normalizing constant. In the absence of direct input from the oriented layer, this differential equation tracks the difference terms to equilibrate at an activation between the average activations in the two lobes. As for the 1-dimensional case, if the decay term A is zero, the equilibrium point is exactly midway between the neighboring activations, resulting in a linear interpolation. The last term in Equation 8 is the direct input from the oriented layer, which biases the equilibrium state towards the pattern of activation present in the oriented layer. In the computer simulations, the equilibrium value of Equation 8 was used, which is in the form
$$C_{xyr} = \frac{\dfrac{L_{xyr} + R_{xyr}}{N} + O_{xyr}}{1 + A} \qquad \text{(EQ 9)}$$
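The two-dimensional relaxation can be sketched for a single (horizontal) orientation as follows. The code assumes the equilibrium update C = ((L + R)/N + O)/(1 + A), where L + R is the kernel-weighted sum of neighboring cooperative activations and N is taken to be the sum of the kernel weights. The kernel widths, the grid size, the decay rate of 0.05, and the number of sweeps are hypothetical, and neither orientational diffusion nor inhibitory side-lobes are included.

```python
import math


def bipolar_kernel(radius, sigma_r, sigma_d):
    """Left plus right lobe of a horizontal cell, as {(dy, dx): weight}."""
    weights = {}
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx == 0 and dy == 0:
                continue
            angle = abs(math.atan2(dy, dx))         # 0 = rightward, pi = leftward
            right = math.exp(-angle ** 2 / (2 * sigma_r ** 2))
            left = math.exp(-(math.pi - angle) ** 2 / (2 * sigma_r ** 2))
            radial = math.exp(-(dx * dx + dy * dy) / (2 * sigma_d ** 2))
            w = (left + right) * radial
            if w > 1e-6:
                weights[(dy, dx)] = w
    return weights


def relax_2d(oriented, kernel, decay, sweeps=150):
    """Repeatedly apply the assumed equilibrium update to every interior cell."""
    h, w = len(oriented), len(oriented[0])
    norm = sum(kernel.values())                     # N, the normalizing constant
    coop = [[0.0] * w for _ in range(h)]
    for _ in range(sweeps):
        for y in range(1, h - 1):                   # border cells stay clamped
            for x in range(1, w - 1):
                pooled = 0.0
                for (dy, dx), wt in kernel.items():
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        pooled += wt * coop[yy][xx]
                coop[y][x] = (pooled / norm + oriented[y][x]) / (1.0 + decay)
    return coop


H, W, ROW = 9, 31, 4
oriented = [[0.0] * W for _ in range(H)]
oriented[ROW][11] = 1.0                             # two horizontal inducers,
oriented[ROW][19] = 1.0                             # eight pixels apart
kernel = bipolar_kernel(radius=3, sigma_r=0.3, sigma_d=2.0)
coop = relax_2d(oriented, kernel, decay=0.05)

inward = coop[ROW][15]                              # midway between the inducers
outward = coop[ROW][7]                              # same distance, outside the pair
```

Although the kernel reaches only three pixels, activation pools across the eight-pixel gap, and the inward value exceeds the outward value at the same distance from an inducer.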
I have mentioned earlier the necessity for a certain cross-talk between adjacent orientations in order to account for curved boundary completion, as suggested in Figure 16. In the two-dimensional simulations this was accomplished by a certain diffusion of the oriented filter response between adjacent orientations. First, the filter response is computed for each oriented filter as described in Equations 6 and 7, i.e. each oriented filter receives input exclusively from like oriented cooperative cells. Then, the filter response is modified by an orientational blurring with adjacent oriented responses at that same location, using the formulae
(EQ 10)
(EQ 11)
which were evaluated at each iteration, where f is a small positive diffusion function. In this manner, for example, a strong horizontal oriented response would stimulate a weaker response in both adjacent orientations, and in the next iteration those adjacent responses would diffuse to more distant orientations with still further diminished magnitudes, as suggested in Figure 16.
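The fanning out of activity across orientations can be sketched in isolation. The code assumes that each filter response receives a fraction f of the responses at the two adjacent orientations at the same location, and that the twelve orientations form a closed ring. The name `orientational_blur`, the additive form of the update, and the value f = 0.1 are assumptions, and the spatial relaxation and decay that accompany each iteration in the full model are omitted.

```python
def orientational_blur(responses, f):
    """Add a fraction f of each neighbouring orientation's response."""
    n = len(responses)
    return [responses[r] + f * (responses[(r - 1) % n] + responses[(r + 1) % n])
            for r in range(n)]


F = 0.1                                   # hypothetical diffusion fraction
initial = [0.0] * 12
initial[0] = 1.0                          # strong horizontal response only

once = orientational_blur(initial, F)
twice = orientational_blur(once, F)
```

After one pass only the two orientations adjacent to horizontal are active, at the fraction f of the original response. After a second pass the next orientations out become active at the still smaller level of f squared.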
The spatial competition required for thinning of the illusory boundaries was accomplished by replacing the orientational Gaussian term of Equations 4 and 5 with an orientational difference of Gaussians, which produced inhibitory side-lobes flanking the excitatory orientational peak on either side. In other words, Equations 4 and 5 were replaced by
(EQ 12)
(EQ 13)
The mathematical shape of these receptive fields is depicted in Figure 26 (A), where the background gray denotes zero filter values, while lighter shades denote positive values, and darker shades denote negative values. The figure shows both the "left" and "right" receptive fields for a horizontal cooperative cell in one plot, to indicate the total field of influence of a single cell. In the computer simulations, the actual field used was further quantized as illustrated in Figure 26 (B). Note that the two pixels that are horizontally adjacent to the central pixel of the filter have near zero values. This is an effect of the quantization, as shown in Figure 26 (C), which depicts a magnified section of that field just to the left of the center of the function. The central pixel and the pixel to its left average between both positive and negative functional values, resulting in a near zero sum. It is only when the width of the central excitatory lobe approaches the width of a pixel that the resultant value averaged over the area of the pixel becomes significantly positive. As will be seen in the next section, this quantization error has a small influence on the simulation results, without however affecting the general properties of the model.
Receptive field profile for directed diffusion cooperative cell with inhibitory side-lobes.
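The quantization effect described for Figure 26 can be sketched by comparing the receptive field sampled at a pixel center with the same field averaged over the area of that pixel. The code assumes an orientational difference of Gaussians multiplied by a radial Gaussian, and considers only the right lobe of a horizontal cell. All widths, the inhibitory weight `k`, and the sub-sampling density are hypothetical values chosen for illustration.

```python
import math


def dog_orientation(deviation, sigma_centre=0.1, sigma_surround=0.3, k=0.5):
    """Narrow excitatory Gaussian minus a broader, weaker inhibitory one."""
    centre = math.exp(-deviation ** 2 / (2 * sigma_centre ** 2))
    surround = k * math.exp(-deviation ** 2 / (2 * sigma_surround ** 2))
    return centre - surround


def right_lobe(x, y, sigma_d=3.0):
    """Right lobe of a horizontal cell at the continuous offset (x, y)."""
    deviation = math.atan2(y, x)              # angle away from the horizontal
    radial = math.exp(-(x * x + y * y) / (2 * sigma_d ** 2))
    return dog_orientation(deviation) * radial


def pixel_average(px, py, samples=20):
    """Average the lobe over the unit square centred on pixel (px, py)."""
    total = 0.0
    for a in range(samples):
        for b in range(samples):
            x = px - 0.5 + (a + 0.5) / samples
            y = py - 0.5 + (b + 0.5) / samples
            total += right_lobe(x, y)
    return total / (samples * samples)


adjacent_point = right_lobe(1.0, 0.0)         # sampled at the pixel centre
adjacent_pixel = pixel_average(1, 0)          # averaged over the whole pixel
distant_pixel = pixel_average(4, 0)
```

Close to the cell, a single pixel subtends both the excitatory peak and the inhibitory flanks, so its averaged value is far smaller in magnitude than the value sampled at its center. Farther out, the pixel lies almost wholly within the excitatory peak and its average is clearly positive.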
Figure 27 shows the results of a computer simulation of the directed diffusion system, where the activation in the cooperative layer is plotted in response to two horizontal inputs at various separations. The level of cell activation is denoted by the darkness of the gray shade in these plots, with darker pixels representing higher activation. Figure 27 (A), (B) and (C) show the equilibrium activations for inputs separated by 10, 15, and 20 pixels respectively, with a decay rate value of A = 0.01. The location of the inputs is visible as a pair of very dark pixels, surrounded by horizontal regions of activation that they stimulate. Note that the pixels immediately adjacent to the most active nodes over the inducer exhibit less activation than pixels that are somewhat farther away. This is an effect of the quantization discussed in the previous section, whereby the cooperative receptive field values are near zero immediately adjacent to the central pixel of the filter. In a finer scale simulation this effect would be absent. Note that the salience of the illusory boundary as measured by the activation levels of the cells located between the inducers diminishes as the separation between the inducers is increased, as observed in the psychophysical data reported above. Also, boundary completion is seen to occur over distances greater than the size of the receptive field of the cooperative cell, which spans five pixels in each direction from a central pixel.
Directed diffusion system response to two parallel oriented inputs as a function of separation between the inducers.
Figure 28 (A), (B) and (C) show the response of the cooperative cells to two oriented inputs as a bending misalignment is applied. The strength of the illusory boundary is seen to diminish smoothly as a function of the angle of the misalignment. According to the relatability criteria of Kellman & Shipley [31] the inducers remain relatable until the angle between them becomes 90 degrees, at which point they become abruptly non-relatable. In this model the illusory boundary fades away continuously with increasing angle between the inducers.
Directed diffusion system response to two oriented inputs as a function of angle between the inducers.
Figure 29 (A), (B) and (C) show the response of the cooperative cells to two oriented inputs as a shearing misalignment is applied, i.e. as the parallel inducers are shifted laterally relative to one another. The shear increases from zero in Figure 29 (A), to two pixels in Figure 29 (B), and three pixels in Figure 29 (C). The strength of the illusory boundary is seen to diminish rapidly even at very small shear values, as noted by Kellman & Shipley [31]. The illusory boundary disappears completely at a shear of only three pixels.
Directed diffusion system response to two parallel oriented inputs as a function of stagger, or lateral shift between the inducers.
I have described a model, similar to the BCS, that performs two-dimensional boundary completion between oriented inducers in a manner consistent with psychophysical observations of the formation of illusory boundaries. The mechanism I propose generates these illusory boundaries as an emergent property of local interactions between individual elements in the system, in a manner that is consistent with the field-like interactions suggested by Gestalt psychology. In the next chapter I will discuss the issue of image vertices where multiple orientations are present at the same spatial location. Both the BCS and the directed diffusion model have difficulty with this condition, because of competition between orientations at a single location. The model I will present in the next chapter will address these issues, and will lead to many more predictions of perceptual phenomena.
In Chapter 2 I presented a neural network model that performs illusory boundary completion in a colinear, or smooth curvilinear, manner that is consistent with observed psychophysical data. I showed in Chapter 1, however, that certain visual illusions indicate that boundary completion can also occur through image vertices consisting of some number of edges that meet at a point. In this context, the colinear boundary can be seen as a special case of a vertex composed of two edges separated by 180 degrees which meet at the center of the vertex. In this chapter I will show how the mechanism of directed diffusion can be extended to perform both colinear and vertex completion with a single mechanism, by way of harmonic interactions in the orientational representation that promote orientational periodicity of edges that meet at a vertex. I will show that this mechanism predicts many properties of vertex completion that are consistent with perceptual grouping. Finally, I will show that this model leads to compression of information of the type discussed by Attneave.
The directed diffusion model presented in the previous chapter was designed specifically to perform boundary completion along a straight line or curved edge. Specifically, oriented input in one lobe of the bipolar receptive field propagates oriented activation in the direction of the other lobe. The shape of the cooperative receptive field can thus be considered as a "template" of the idealized colinear vertex. A similar observation can be made about the BCS model.
In order to generalize colinear boundary completion to perform vertex completion, cells can be defined whose receptive fields represent different vertex types defined by the intersection of one, two, three, or more edges that intersect at a point in a variety of different angular configurations. For example, Figure 30 (A) illustrates a cell that represents an "L" vertex defined by the intersection of two edges separated by 90 degrees. In this vein, Grossberg & Mingolla [19] propose such specialized receptive fields for boundary completion through image vertices. A large number of vertex patterns would have to be represented at each spatial location, and each of these receptive field combinations would have to be reproduced at all orientations, just as in the case of the colinear cell. Figure 30 (B) illustrates the combinatorial explosion that can result from a representation of every combination of any number of edges at all possible orientations. It may be possible to alleviate this problem by compression or coarse coding, e.g. by allowing each pattern to represent a range of similar patterns, in the same manner that the colinear cell in the directed diffusion model represents a range of smooth curvilinear edges besides the strict colinear one of a perfectly straight edge. Even with coarse coding however, the representation of vertices by way of specific cells with hard wired receptive fields represents a combinatorial expansion in the representation, similar to the curvature representation of the Zucker [71] model.
The representation of multiple orientations at a single spatial location by way of specialized receptive fields, for example a right angled vertex detector (A), and a combinatorial assembly of other vertex representations (B).
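The size of this combinatorial expansion can be estimated with a short calculation. The following is a rough sketch under assumptions that are not part of the model itself: edges are restricted to a set of discrete directions around the vertex, and every non-empty subset of those directions counts as a distinct hard-wired vertex cell. The function name `count_vertex_cells` and the direction counts are illustrative only.

```python
import math

def count_vertex_cells(num_directions, max_edges=None):
    """Count distinct vertex patterns, taken here to be the non-empty
    subsets of edge directions containing at most max_edges edges."""
    if max_edges is None:
        max_edges = num_directions
    return sum(math.comb(num_directions, k) for k in range(1, max_edges + 1))

# Illustrative direction counts; the thesis does not fix these values.
for n in (8, 12, 24):
    print(n, "directions:", count_vertex_cells(n), "vertex patterns")
```

Under these assumptions the count grows as 2^n - 1 per spatial location, which is why coarse coding alone reduces but does not remove the expansion.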
Heiko Neumann [44] proposes a more general solution in the form of a "rosette" of receptive fields at every orientation, as shown in Figure 31, with a separate cell body receiving input from each of the oriented monopolar receptive fields. The conjunctive constraint is implemented in this model by a requirement that at least two cells must receive oriented input in order to allow boundary completion through the vertex. This mechanism is able to perform boundary completion around vertices of any combination of two or more orientations. One problem with this model is that it does not account for boundary completion through a vertex defined by a single oriented edge, of the type shown in Figure 9 (A), where a boundary is seen to form between two adjacent dots, but it does not extend beyond those dots, so that the vertex defined at each dot consists of one edge only. Furthermore, in Figure 9 (B), (C), and (D), this model would predict boundary completion to all nearby dots, not just the ones that are immediately adjacent, so that the vertices in Figure 9 would all appear as stars made up of a great number of orientations. A full model of vertex completion would have to account for the distance dependent behavior discussed in Chapter 1.
Architecture of the generalized cooperative cell, or "rosette" of cells around a central vertex, each of which receives oriented input only from one direction and orientation from that vertex, so that different vertex configurations are represented by particular patterns of activation in the ring of cells.
An advantage of Neumann's "rosette" model is that orientational combinations are no longer encoded by hard-wired receptive fields, but rather they are represented by patterns of activation in the ring of oriented cells. Boundary completion in this architecture therefore consists of the completion of particular patterns of activation in the oriented cells. The question remains however which of the many possible particular vertex patterns should be completed for a specific input stimulus, and what computational architecture might be responsible for performing such a completion.
An interesting view on this problem can be found again in Attneave's discussion of information theory [1]. Attneave suggests that the Gestalt principles of perceptual grouping represent patterns of regularity or redundancy in the visual world. In the context of Neumann's rosette model, regularity can be seen in the orientational representation as a periodicity of the oriented signal around the ring of cells. This periodicity can be encoded by a Fourier decomposition of the pattern of activation in the orientational representation, as suggested by Figure 32. For example, a vertex consisting of four edges equally spaced in a "+" vertex would be represented by the fourth coefficient of the Fourier series, because the edges subdivide the full circle into four equally spaced segments, representing an orientational frequency of four edges per cycle. A three way vertex with arms separated by 120 degrees would be represented by the third coefficient. A straight line through the vertex would be represented by the second coefficient, corresponding to the periodicity of two edges separated by 180 degrees. A vertex defined by a single orientation extending in one direction from the vertex is represented by the first coefficient of the Fourier series. The zeroth coefficient represents the DC term of the Fourier transform, i.e. the overall magnitude of the oriented signal summed over all orientations. This coefficient corresponds to a small circular dot centered at the vertex, which would stimulate cells of all orientations simultaneously. The first five harmonics from R0 to R4 and the vertices that they represent are depicted in Figure 32. These harmonics represent the first five terms of a circular harmonic series that can be used as a descriptor for the configuration of lines joined at a vertex.
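The correspondence between equally spaced edges and harmonic number can be checked numerically. The following is a minimal sketch, not the simulation code of this thesis: it assumes a ring of N = 24 cells (an arbitrary choice), represents each edge as a unit spike of activation, and uses a plain discrete Fourier transform. The names `ring_pattern`, `harmonic_magnitudes` and `lowest_harmonic` are illustrative.

```python
import cmath
import math

N = 24  # assumed number of oriented cells in the ring (15 degrees apart)

def ring_pattern(num_edges, n=N):
    """Unit activation at num_edges equally spaced positions around the ring."""
    pattern = [0.0] * n
    for e in range(num_edges):
        pattern[(e * n) // num_edges] = 1.0
    return pattern

def harmonic_magnitudes(pattern, max_harmonic=4):
    """Magnitudes of the DFT coefficients 0..max_harmonic of a ring pattern."""
    n = len(pattern)
    mags = []
    for h in range(max_harmonic + 1):
        coeff = sum(a * cmath.exp(-2j * math.pi * h * k / n)
                    for k, a in enumerate(pattern))
        mags.append(abs(coeff))
    return mags

def lowest_harmonic(pattern, max_harmonic=4, tol=1e-9):
    """Index of the lowest non-DC harmonic with non-zero magnitude."""
    mags = harmonic_magnitudes(pattern, max_harmonic)
    for h in range(1, max_harmonic + 1):
        if mags[h] > tol:
            return h
    return None

for edges in (1, 2, 3, 4):
    print(edges, "edge(s): lowest harmonic",
          lowest_harmonic(ring_pattern(edges)))
```

Because the edges are modeled here as narrow spikes, each pattern also contains integer multiples of its lowest harmonic; a smoother activation profile would concentrate the energy in the lowest one.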
One advantage of a harmonic representation is that the translation invariance of the magnitudes in a linear Fourier representation corresponds to a rotation invariance in a circular Fourier representation, so that each coefficient represents the corresponding vertex pattern in a rotation invariant manner.
Decomposition of orientational combinations into a Fourier series of orientational frequency showing the Fourier coefficients, the corresponding pattern of orientational activity, the vertex pattern represented by that coefficient, and the symmetry of each harmonic.
The rotation invariance of this representation also represents a form of information compression, owing to the fact that each coefficient, represented by a single value, corresponds to an entire pattern of orientational activity, and the rotation invariance means that all rotations of that same pattern correspond to the same Fourier coefficient. Notice also that the periodicity in the orientational representation corresponds to a measure of the symmetry in the represented pattern, as shown on the right in Figure 32. This is consistent with Attneave's observation that symmetry represents a redundancy in the visual world for the purpose of information compression.
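The rotation invariance can be illustrated with a small numerical check. This is a sketch under the same kind of assumptions as before: a 24-cell ring, unit spikes for edges, and a plain discrete Fourier transform; the helper names are hypothetical. It rotates a three-edge pattern by an arbitrary number of cells and compares the harmonic magnitudes before and after.

```python
import cmath
import math

def harmonic_magnitudes(pattern, max_harmonic=4):
    """Magnitudes of the DFT coefficients 0..max_harmonic of a ring pattern."""
    n = len(pattern)
    return [abs(sum(a * cmath.exp(-2j * math.pi * h * k / n)
                    for k, a in enumerate(pattern)))
            for h in range(max_harmonic + 1)]

def rotate(pattern, steps):
    """Circularly shift the ring pattern by a whole number of cells."""
    steps %= len(pattern)
    return pattern[-steps:] + pattern[:-steps]

# A three-edge pattern on an assumed 24-cell ring: edges at 0, 90 and 180 degrees.
pattern = [0.0] * 24
for cell in (0, 6, 12):
    pattern[cell] = 1.0

original = harmonic_magnitudes(pattern)
rotated = harmonic_magnitudes(rotate(pattern, 5))  # rotate by 75 degrees
print(max(abs(a - b) for a, b in zip(original, rotated)))
```

The rotation changes only the phase of each coefficient, so the printed difference is zero up to floating point error.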
Other vertex combinations besides those represented by the first five harmonics of the Fourier series can be obtained by combinations of the harmonics, as is true for any Fourier representation. Figure 33 shows the harmonic combinations that represent the "L" vertex (A), the "V" vertex (B), and the "T" vertex (C). For example, by analysis or simulation, one can see that the "L" vertex results from a combination of R1, R3 and R4. Note the similarity to the "V" vertex, which is composed of R1, R2 and R4. In fact, the relative magnitudes of the second and third coefficients will determine the exact angle between the two arms of this figure.
Construction of additional vertex types by combinations of the harmonics. The "L" vertex is composed of the first, third, and fourth harmonics (A), the "V" vertex is composed of the first, second, and fourth harmonics (B), and the "T" vertex is composed of the first four harmonics (C).
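These harmonic compositions can be reproduced by direct calculation. The sketch below is an illustration rather than the simulation used for Figure 33: it assumes a 24-cell ring, unit spikes for edges, and a 60 degree opening for the "V" vertex (the angle is my assumption; the composition of the "V" changes with its angle). The names `vertex` and `present_harmonics` are hypothetical.

```python
import cmath
import math

N = 24  # assumed number of cells in the ring (15 degrees apart)

def vertex(angles_deg, n=N):
    """Ring pattern with a unit spike at each edge direction (in degrees)."""
    pattern = [0.0] * n
    for angle in angles_deg:
        pattern[round(angle / 360 * n) % n] = 1.0
    return pattern

def present_harmonics(pattern, max_harmonic=4, tol=1e-9):
    """Harmonics 1..max_harmonic whose DFT coefficient is non-zero."""
    n = len(pattern)
    present = []
    for h in range(1, max_harmonic + 1):
        coeff = sum(a * cmath.exp(-2j * math.pi * h * k / n)
                    for k, a in enumerate(pattern))
        if abs(coeff) > tol:
            present.append(h)
    return present

print("L:", present_harmonics(vertex([0, 90])))       # [1, 3, 4]
print("V:", present_harmonics(vertex([0, 60])))       # [1, 2, 4]
print("T:", present_harmonics(vertex([0, 90, 180])))  # [1, 2, 3, 4]
```

For two edges separated by an angle a, harmonic h has magnitude |1 + exp(-i h a)|, which vanishes when h a is an odd multiple of 180 degrees. This is why the second harmonic drops out of the 90 degree "L" and the third drops out of the 60 degree "V".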
A number of different mechanisms can be proposed to implement a Fourier filtering of the orientational signal, which requires a boosting of the orientational periodicity in the oriented signal in the cells of the rosette of cooperative cells. Grossberg [17] has shown that periodicity in a neural network architecture can be achieved by a recurrent cooperative-competitive field of neurons connected by spatial receptive fields with an on-center, off-surround profile.
An alternative scheme is suggested by the fact that a simple harmonic resonance, such as an acoustic wave in a resonant cavity, automatically performs a Fourier filtering of exactly the sort required in this model. Portnoff [52] has shown that sound waves in a uniform lossless tube satisfy the following pair of equations
$$-\frac{\partial p}{\partial x} = \rho\,\frac{\partial (u/A)}{\partial t} \qquad \text{(EQ 14)}$$

$$-\frac{\partial u}{\partial x} = \frac{1}{\rho c^{2}}\,\frac{\partial (pA)}{\partial t} + \frac{\partial A}{\partial t} \qquad \text{(EQ 15)}$$

where p = p(x,t) is the variation in sound pressure in the tube at position x and time t, u = u(x,t) is the variation in volume velocity flow at position x and time t, ρ is the density of air in the tube, c is the speed of sound, and A = A(x,t) is the cross sectional area of the tube. Rabiner and Schafer [54] show how these equations can be solved to derive the frequency response of the uniform lossless tube, i.e. the ratio of the volume velocity at the output of the tube to that at its input, as a function of frequency. Figure 34 (A) illustrates this frequency response function, showing periodic peaks corresponding to the poles of the equation, where the response function goes to infinity. These peaks indicate the harmonics of the tube, the lowest one being the fundamental, i.e. the frequency at which one half wave exactly fits in the tube. In an actual physical tube this response function will be somewhat different, as shown in Figure 34 (B), which shows the frequency response function for a uniform tube with yielding walls, friction, and thermal loss. The periodic peaks are still in evidence, although they no longer go to infinity, and the whole function remains within some bounding envelope that diminishes with increasing frequency. The reason why the envelope function decreases with frequency is that the radiation impedance, or resistance to oscillation, increases with higher frequencies. The negative portions of the frequency response in Figure 34 (B) are frequencies at which oscillation is actively suppressed by the harmonic properties of the tube.
Frequency response of a uniform lossless tube showing harmonic peaks corresponding to the fundamental frequencies (A), and a similar response in a lossy tube where the higher harmonics are suppressed by the impedance of the tube (B). The computer simulations employed the discrete approximation shown in (C).
The principles of harmonic resonance are common to oscillations of sound waves in a resonant cavity, elastic vibrations in a solid object, alternating electric current in an electrical conductor, electromagnetic fields such as a microwave maser or visible light in a laser, and even chemical harmonic resonances in a reaction-diffusion system. In fact, harmonic resonances arise in a very wide range of physical systems.
There are several different ways in which harmonic resonances can be supposed to occur in a neural system. Grossberg & Somers [19] show how the oscillatory firing of cortical neurons can be modeled by excitatory and inhibitory interactions between connected neurons in a variety of neural architectures, including the BCS model of cortical visual processing. As an alternative, synchronous firing has also been observed in neurons that are connected by way of gap junctions, specialized synapses which physically connect the cytoplasm of adjacent neurons, allowing a direct flow of ionic current between them and producing a direct electrotonic coupling with a transmission speed that is orders of magnitude faster than synaptic transmission. Kandel and Siegelbaum [26] have shown that transmission across electrical synapses is very rapid, and that electrical synapses can cause a group of interconnected cells to fire synchronously. A slight propagation delay between the cells in the syncytium, due either to the transmission delay through the gap or to the natural capacitance and inductance of the neurons, would allow for a phase shift between the synchronously firing cells and thereby establish standing waves of electrical activation consisting of periodically alternating regions of activation and quiescence.
Whatever the exact mechanism responsible for the harmonic resonance, the properties of such resonating systems are mathematically related, and can therefore be modeled independent of the specific implementation involved.
A property of these harmonic resonances is that they are governed only by simple interactions between local elements, and yet these systems are capable of producing a wide array of spatial patterns. The scale of the interaction between vibrating elements (local molecular forces, for example) is much smaller than the size of the emergent waveforms. A principal property of harmonic systems is a tendency to form regular periodic patterns whose wavelengths are integer fractions of some fundamental wavelength, which is determined by the dimensions of the physical system. For example, the harmonic frequencies of a resonant acoustic cavity establish patterns of standing waves that subdivide the cavity into integer numbers of equal intervals, as shown in Figure 35 (A).
Acoustical harmonics in a linear tube (A), and a circular tube (B), showing how alternating regions of high and low amplitude oscillations subdivide the cavity into integer numbers of equal intervals. In the linear tube the harmonic pattern is fixed by the boundary conditions at the ends of the tube, while in the circular tube the harmonic patterns can be established at any orientation.
Whereas cavities can be tuned to any fundamental frequency, in the case of an enclosed circular tube the harmonics subdivide the interval from 0 to 2π into an integer number of equal segments of the full circle, as shown in Figure 35 (B), which is no longer an arbitrary measure. While the spatial pattern of nodes is fixed in a linear tube by the boundary conditions at the ends of the tube, in the circular tube a harmonic pattern can occur at any rotation.
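The two properties of the circular tube, integer periodicity and freedom of rotation, can be checked with a simple idealization. The sketch below assumes that a standing wave pattern around the ring has the form cos(nθ + φ), which is my simplification rather than a solution of the tube equations; the names `ring_mode` and `closes_on_ring` are hypothetical. A pattern is accepted only if its value and slope match where the ring closes on itself.

```python
import math

def ring_mode(n, theta, phase=0.0):
    """Idealized standing wave amplitude at angle theta for frequency n."""
    return math.cos(n * theta + phase)

def closes_on_ring(n, phase=0.0, tol=1e-9):
    """True if the mode has the same value and slope at theta = 0 and 2*pi."""
    two_pi = 2 * math.pi
    same_value = abs(ring_mode(n, 0.0, phase) - ring_mode(n, two_pi, phase)) < tol
    # The slope of cos(n*theta + phase) is -n*sin(n*theta + phase).
    slope_start = -n * math.sin(phase)
    slope_end = -n * math.sin(n * two_pi + phase)
    return same_value and abs(slope_start - slope_end) < tol

for n in (1, 2, 2.5, 3, 4):
    # Test each frequency at two arbitrary rotations of the pattern.
    ok = closes_on_ring(n, 0.0) and closes_on_ring(n, math.pi / 2)
    print(n, "cycles per circle:", "allowed" if ok else "not allowed")
```

Integer frequencies pass at every rotation φ, while a non-integer frequency such as 2.5 fails, which is the sense in which the unit of the full circle is not arbitrary.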
I will now present a model of orientational harmonics which performs a Fourier filtering of the orientational signal in the cooperative rosette by way of harmonic resonances in those cells. The effect of this filtering is to complete and regularize any periodicity that is inherent in the activation pattern of the rosette.
Architecture for orientational harmonic simulation showing a rosette of cooperative cells Ci that receive input from both oriented cells Oi and from cooperative receptive fields Li that receive input from the cooperative layer.
The general architecture for the orientational harmonic model is depicted in Figure 36. A ring of N cooperative cells Ci receives input from N oriented cells Oi in the oriented layer, as well as N inputs Li from cooperative receptive fields, which sample regions of the cooperative layer by way of monopolar receptive fields. These receptive fields are defined exactly as the fields Li and Ri of the directed diffusion model, in Equations 12 and 13. The final pattern of activation of the cooperative cell is also influenced by harmonic oscillations within the cooperative rosette, which can be calculated as another input Hi to the cooperative cell. The activation of the cooperative cell is therefore governed by the differential equation
|
(EQ 16) |
The inputs to this equation are Oi, the oriented signal, and Li, the cooperative activation in an adjacent neighborhood as sampled by the cooperative receptive field, which is given by the equation
|
(EQ 17) |
The harmonic oscillations within the ring of cooperative cells perform a Fourier filtering of the cooperative signal within the rosette by the filtering function depicted in Figure 34 (B). In the simulations, this function was approximated by a finite comb function with values of unity at the harmonic frequencies, and zeros elsewhere, as shown in Figure 34 (C). A Fourier filtering with this simplified function can be equivalently calculated by convolution with a series of sinusoids Fj with orientational frequency j defined by
|
(EQ 18) |
for filter Fj of the harmonic j of a total of M harmonics, calculated for N values. A zeroth component filter is also constructed in order to measure the DC component of the orientational signal. The filter for the zeroth harmonic is defined by
|
(EQ 19) |
where c is a positive constant. These filter profiles look exactly like the waveforms sketched in Figure 32. The filters are convolved with the N cooperative cell values of the N orientations producing a set of response values r given by
|
(EQ 20) |
for each harmonic coefficient j and for each oriented location i. A total response R is computed for each harmonic by summing over the individual responses at each orientation using the formula
|
(EQ 21) |
The magnitude of this coefficient for each harmonic j represents the magnitude of the response of the system to this harmonic. For example, a large value of R2 would indicate the presence of a strong second harmonic component in the pattern of orientations in the cooperative cells.
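The feedforward stage can be sketched as follows. The exact filter and summation formulas are not reproduced here, so every definition in this block is an assumption of mine: the filters are taken to be cosines of orientational frequency j around a ring of N = 24 cells, the convolution is circular, and the total response is taken as the mean rectified response, since a plain sum of a sinusoidal response around the full ring vanishes. The zeroth harmonic filter is omitted. All names are hypothetical.

```python
import math

N = 24  # assumed number of cooperative cells in the rosette
M = 4   # assumed number of harmonics analyzed

def harmonic_filter(j, n=N):
    """Assumed filter: a cosine with j full cycles around the ring."""
    return [math.cos(2 * math.pi * j * k / n) for k in range(n)]

def circular_convolve(signal, kernel):
    """Circular convolution of two equal-length ring signals."""
    n = len(signal)
    return [sum(signal[k] * kernel[(i - k) % n] for k in range(n))
            for i in range(n)]

def harmonic_responses(coop, m=M):
    """Responses r[j][i] for harmonics j = 1..m at every orientation i."""
    return {j: circular_convolve(coop, harmonic_filter(j, len(coop)))
            for j in range(1, m + 1)}

def total_response(r_j):
    """Assumed total response: mean absolute response over orientations."""
    return sum(abs(v) for v in r_j) / len(r_j)

# A straight line through the vertex: activity in two opposite cells.
coop = [0.0] * N
coop[0] = coop[N // 2] = 1.0

R = {j: total_response(r) for j, r in harmonic_responses(coop).items()}
for j in sorted(R):
    print("R%d = %.3f" % (j, R[j]))
```

With this input the odd harmonics cancel and the even harmonics respond, so a strong R2 signals the two-fold periodicity of a straight line, as described above.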
So far what has been described is a feedforward process to detect the presence of various orientational harmonics in the cooperative signal, much like a Fourier analysis of that signal. The harmonic resonances in the ring of cells also influence the resultant pattern in those cells by constructive and destructive interference between competing harmonics in the representation. This was calculated by summing the total harmonic contribution Hi of all the various harmonic responses for each cell in the ring of cells, using the formula
|
(EQ 22) |
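The completing effect of filtering with a finite comb function can be illustrated in isolation. The sketch below is not the formula for Hi; it only applies a comb with unit gain at harmonics 0 to M and zero gain elsewhere, implemented with a direct discrete Fourier transform on an assumed 24-cell ring, to a "+" vertex with one arm missing. The function name `comb_filter` and all parameter values are hypothetical.

```python
import cmath
import math

def comb_filter(pattern, max_harmonic=4):
    """Keep harmonics 0..max_harmonic of a ring pattern and discard the rest."""
    n = len(pattern)
    # Forward DFT of the ring pattern.
    coeffs = [sum(a * cmath.exp(-2j * math.pi * h * k / n)
                  for k, a in enumerate(pattern))
              for h in range(n)]
    # Unit gain at the kept harmonics and their mirror images, zero elsewhere.
    kept = [c if (h <= max_harmonic or h >= n - max_harmonic) else 0.0
            for h, c in enumerate(coeffs)]
    # Inverse DFT; the result is real because the kept set is symmetric.
    return [sum(c * cmath.exp(2j * math.pi * h * k / n)
                for h, c in enumerate(kept)).real / n
            for k in range(n)]

# A "+" vertex on 24 cells with the arm at cell 18 (270 degrees) missing.
incomplete = [0.0] * 24
for cell in (0, 6, 12):
    incomplete[cell] = 1.0

filtered = comb_filter(incomplete)
print("missing arm (cell 18): %.3f" % filtered[18])
print("between arms (cell 3): %.3f" % filtered[3])
```

Although both cells receive no input, the filtered signal is positive at the location of the missing arm and negative midway between arms, which is the kind of completion and regularization of periodicity that the harmonic feedback is intended to provide.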