November 13, 2006

Case Notes 2

The human visual system is pretty remarkable in various ways, many of which probably didn't become altogether apparent until people started trying to reproduce them technologically. Even defining just what constitutes vision is not always easy; a reasonable working definition (due to important early machine vision researcher David Marr) holds vision to be the process of forming a description of a scene -- the objects present, their surface qualities, the lighting conditions, etc -- given an image of it (eg, on the retina of the eye). This turns out to be extremely difficult because only a fraction of the relevant information is directly present in the image; the rest must be deduced, with the deductive process recruiting all sorts of domain-specific knowledge.

Marr argued that, rather than starting with the particular ways vision is implemented by different existing creatures, as a biologist or neuroscientist might, it would make more sense to try to map the space of potential solutions to the vision problem in the abstract and then consider the implementation details on the basis of that higher level understanding. This approach can lead to models that are rather far removed from the physiological mechanisms of seeing.

The subset of vision under consideration here is the perception of motion, and in particular that of second-order motion, where the movement is carried not by luminance boundaries (the edges of shapes, let's say) but by changes in texture, contrast, flicker etc. (See here for an example that may clarify the idea.) Such motion is interesting because, while people readily perceive it, some standard models are either incapable of explaining it or require significant fudging to do so. (The latter does not necessarily invalidate them -- nature fudges stuff all the bloody time -- but it at least casts doubt.)
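
To make the distinction concrete, here's a quick Python sketch (all names and parameters are mine, purely for illustration) of a classic second-order stimulus: static noise whose contrast is modulated by a drifting envelope. There are no moving luminance edges anywhere, yet the drift is plainly visible.

```python
import numpy as np

# Contrast-modulated noise: a static, zero-mean noise carrier whose
# contrast is scaled by a rightward-drifting sinusoidal envelope.
# Local mean luminance never changes, so no luminance edge moves --
# only the pattern of contrast does.
rng = np.random.default_rng(0)
n_x, n_t = 256, 64                           # space samples, time steps
carrier = rng.choice([-1.0, 1.0], size=n_x)  # static binary noise

x = np.arange(n_x)
stimulus = np.empty((n_t, n_x))
for t in range(n_t):
    envelope = 0.5 * (1 + np.sin(2 * np.pi * (x / 32 - t / 8)))
    stimulus[t] = envelope * carrier         # zero-mean at every pixel
```

A detector that only tracks moving luminance boundaries has nothing to latch onto here, which is exactly the trouble described below.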

Models of motion perception consider the input across the visual field over time, typically represented as a space-time diagram (most often with only one spatial dimension, like a single scan-line, so the diagram can be nicely 2D). Motion is recognised when neurons are stimulated by particular patterns in this space-time domain.
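
As a toy illustration (again in throwaway Python), here is a space-time diagram for a bar drifting along a single scan-line; the motion shows up as an oblique streak whose slope is the velocity.

```python
import numpy as np

# Space-time diagram: rows are time steps, columns are positions on a
# single scan-line. A bar moving rightward at 2 pixels per frame traces
# an oblique streak in (x, t); a filter oriented along that streak is,
# in effect, tuned to that velocity.
n_x, n_t, width, speed = 128, 32, 8, 2
diagram = np.zeros((n_t, n_x))
for t in range(n_t):
    left = (10 + speed * t) % n_x
    diagram[t, left:left + width] = 1.0
```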

A simple model of how this might work is the Reichardt correlation model, which is actually pretty similar to the Jeffress model for hearing mentioned in the previous case notes: a particular neuron -- or, less prejudicially, a filter -- tuned to motion in a particular direction is fed by stimuli from different points in the visual field via various delay lines; when the delayed and undelayed stimuli coincide, it registers motion at the corresponding velocity.
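
In code, a single half-detector of this kind is almost embarrassingly simple -- something like the following sketch, where the delay is just a one-frame shift (the function name and toy inputs are mine):

```python
import numpy as np

def reichardt(signal_a, signal_b, delay=1):
    # One Reichardt subunit: delay the signal from receptor A, multiply
    # it with the undelayed signal from neighbouring receptor B, and
    # average over time. A big response means whatever passed A turned
    # up at B about `delay` frames later -- ie, motion at the velocity
    # (receptor spacing / delay) this unit is tuned to.
    delayed_a = np.concatenate([np.zeros(delay), signal_a[:-delay]])
    return float(np.mean(delayed_a * signal_b))

# A spot crossing the two receptors one frame apart:
a = np.array([0., 1., 0., 0., 1., 0., 0., 1., 0.])
b = np.array([0., 0., 1., 0., 0., 1., 0., 0., 1.])
print(reichardt(a, b))   # strong: motion in the preferred direction
print(reichardt(b, a))   # zero: motion the wrong way for this unit
```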

There are many problems with this model that we shall ignore, but one important one is that the input stimuli are unlikely to be conveniently arranged for the viewer's benefit: there will often be spatial periodicity as well as motion, in which case there is more or less limitless scope for aliasing. To reduce this, we incorporate some form of directional differencing -- ie, subtracting the effects in one direction from those in the opposite direction, cancelling periodic aspects and leaving (more of) the things we're interested in.
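
Reusing the `reichardt` function from the sketch above, the opponent version is just the rightward half-detector minus its mirror image:

```python
def opponent_reichardt(signal_a, signal_b, delay=1):
    # Directional differencing: subtract the mirror-image half-detector
    # (tuned to the opposite direction) from the first. Static and
    # counterphase-flickering periodic patterns drive both halves about
    # equally and cancel out; genuinely directional motion survives,
    # with the sign of the output giving its direction.
    return (reichardt(signal_a, signal_b, delay) -
            reichardt(signal_b, signal_a, delay))
```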

Given arbitrary connectivity within such a setup -- which is to say, imagining that each such motion-sensing filter can be tuned to any combination of temporal and spatial correspondences -- we could probably identify most forms of motion. However, nature tends to be rather parsimonious, while artificial implementations are always constrained by computational complexity, so we really need a higher-level model of the sort of pattern such filters might reasonably pick up. (The directional differencing mentioned above is a rudimentary example of such a pattern.)

A common model (or really the basis for a whole range of them) is the motion energy model, in which pairs of filters oriented in space-time -- each pair in quadrature, ie identical but for a 90 degree phase shift -- are combined. The motion energy for a given direction is the sum of the squares of the quadrature pair's outputs, and an opponent stage takes the difference between the energies for the two opposite directions. Often this model is transformed Fourierwise into the frequency domain, but either way it can't spot second-order motion.
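
A bare-bones version of the energy computation might look like this (a sketch only: the filter sizes, frequencies and the `scipy` convolution are my choices, not anything canonical):

```python
import numpy as np
from scipy.signal import fftconvolve

def quadrature_pair(f_x, f_t, size=21, sigma=4.0):
    # A pair of space-time Gabor filters (rows are time, columns are
    # space) sharing one Gaussian envelope, their carriers 90 degrees
    # apart in phase. Together they prefer drift at velocity -f_t / f_x.
    t, x = np.mgrid[-(size // 2):size // 2 + 1, -(size // 2):size // 2 + 1]
    envelope = np.exp(-(x**2 + t**2) / (2 * sigma**2))
    phase = 2 * np.pi * (f_x * x + f_t * t)
    return envelope * np.cos(phase), envelope * np.sin(phase)

def motion_energy(stimulus, f_x, f_t):
    # Energy for one direction: square and sum the quadrature outputs,
    # which makes the response indifferent to the stimulus's phase.
    even, odd = quadrature_pair(f_x, f_t)
    return (fftconvolve(stimulus, even, mode='same')**2 +
            fftconvolve(stimulus, odd, mode='same')**2)

def opponent_energy(stimulus, f_x=1/16, f_t=1/8):
    # Rightward energy minus leftward energy, pooled over space-time.
    return float(np.mean(motion_energy(stimulus, f_x, -f_t) -
                         motion_energy(stimulus, f_x, f_t)))
```

Fed the contrast-modulated noise from earlier (even with the filters tuned to the envelope's frequencies), the opponent output hovers around zero on average: the drifting envelope contributes no net oriented luminance energy, which is precisely the failure in question.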

A possible response to this failure is to posit additional parallel perceptual channels tuned to the different kinds of motion stimulus: one vision subsystem identifies the first-order movement, while another is tuned to second-order. Such a model can, with appropriate dicking around (surely "rectification"? -- Ed), be persuaded to work, but it reeks of epicycles.
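
The fudge, roughly, is a preprocessing stage: full-wave rectify the input (this sketch skips the band-pass filtering that usually precedes it), which turns contrast modulation into luminance modulation the standard energy machinery can see. Reusing `opponent_energy` from above:

```python
def second_order_energy(stimulus, f_x=1/32, f_t=1/8):
    # The second channel: remove the mean, full-wave rectify, then run
    # the same opponent-energy computation as before. Rectification
    # turns regions of high contrast into regions of high intensity,
    # so the drifting envelope becomes ordinary first-order motion.
    rectified = np.abs(stimulus - stimulus.mean())
    return opponent_energy(rectified, f_x, f_t)
```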

A preferable model would recognise both first- and second-order motion in a single processing pathway, and this can be managed by having the sensors tuned to gradient differences rather than luminance differences. In mathematical terms, motion in this model is detected from the ratio of partial derivatives of signal intensity with respect to space and time: if the intensity I(x, t) is conserved as it moves, then I_x v + I_t = 0, giving a velocity v = -I_t / I_x.
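
In its most naive form this is nearly a one-liner (with a crude dodge for the zero-denominator problem about to be discussed):

```python
import numpy as np

def gradient_velocity(stimulus, eps=1e-6):
    # Basic gradient model: if intensity I(x, t) is conserved as it
    # moves, I_x * v + I_t = 0, so v = -I_t / I_x. Derivatives are
    # estimated with finite differences; eps merely postpones the
    # division-by-zero trouble discussed next.
    i_t, i_x = np.gradient(stimulus)   # rows are time, columns space
    return -i_t / (i_x + eps)
```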

Now, this is a dismayingly abstract model, and it gets worse: to get around the perceptual singularities that would be introduced whenever the denominator is zero, it is necessary to add in higher-order derivatives both above and below the line. Such hackery makes for a fairly robust model, able to correctly identify both kinds of motion and even make a good stab at such degenerate inputs as simultaneous overlapping coherent motion in different directions (which to a human observer suggests transparency: seeing through a layer of things moving in one direction to another layer moving in the other), but it's asking a lot of the visual cortex to be constantly doing all that partial differentiation.
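
One way to cash that out, continuing the sketch above -- and this is a sketch of the general idea only, not any particular published model -- is a least-squares-flavoured ratio over several derivative orders, so the denominator only vanishes where every spatial derivative does at once:

```python
def regularised_velocity(stimulus):
    # Higher-order derivatives above and below the line: combine the
    # first- and second-order spatial derivatives so the estimate only
    # becomes singular where all of them hit zero together. For a
    # rigidly translating pattern this still recovers v exactly.
    i_t, i_x = np.gradient(stimulus)
    i_xt, i_xx = np.gradient(i_x)      # derivatives of I_x in t and x
    numerator = i_x * i_t + i_xx * i_xt
    denominator = i_x**2 + i_xx**2 + 1e-12
    return -numerator / denominator
```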

As it turns out, at least some of this mathematical complexity can be rendered into more physiologically-plausible terms as convolution kernels -- the space-time patterns mentioned earlier to which neurons might be tuned. There is experimental evidence that brain cells can be receptive to input patterns cognate with the multiple orders of partial derivatives in this gradient model.
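
Concretely, differentiating a smoothed signal is the same as convolving it with a derivative-of-Gaussian kernel, and those kernels look a lot like familiar receptive field profiles. A sketch:

```python
import numpy as np

def gaussian_derivative_kernels(sigma=2.0, size=13):
    # Analytic Gaussian, first- and second-derivative-of-Gaussian
    # kernels. Convolving the stimulus with these gives the smoothed
    # derivatives the gradient model needs -- no symbolic calculus
    # required, just fixed weighting patterns of the kind a receptive
    # field could plausibly implement.
    u = np.arange(size) - size // 2
    g = np.exp(-u**2 / (2 * sigma**2))
    g /= g.sum()
    g1 = -u / sigma**2 * g                   # d/du of the Gaussian
    g2 = (u**2 - sigma**2) / sigma**4 * g    # second derivative
    return g, g1, g2
```

Convolving along the rows of the space-time diagram with g1 and g2 approximates I_x and I_xx; convolving down the columns with g1 approximates I_t.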

Still, there do seem to be steps along the way where this model is far too enslaved by its own -- inevitably approximate -- mathematics. Nature, as we've already observed, is profoundly pragmatic, and she'll perpetrate any kind of sleazy kludge in the pursuit of evolutionary fitness. Why on Earth would she piss around with the human formalities of arithmetic, let alone calculus, when a bunch of plastic lookup tables might achieve the same results?
Posted by matt at November 13, 2006 05:19 PM
