We only perceive a small fraction of the information that enters our eyes. What information is represented and what is discarded? Freeman and Simoncelli (2011) introduced a neat method of psychophysically testing image-based models of visual processing. If images that produce identical model responses also appear identical to human observers, it implies that the model is only discarding information that does not matter for perception (and conversely, retaining image structure that matters). The images are metamers: physically different images that appear the same (the term originates in the study of colour vision).
Our latest paper extends this approach and sets a higher bar for model testing. In the original study, Freeman and Simoncelli synthesised images from a model of peripheral visual processing, and showed that observers could not tell two synthesised images apart from each other at an appropriate level of information loss (in this case, the scaling of pooling regions spanning into the visual periphery). However, observers in these experiments never compared the model images to the original (unmodified) images. If we’re interested in the appearance of natural scenes, this is not a sufficient test. To take one extreme, a “blind” model that discarded all visible information would produce images that were indiscriminable from each other, but would no doubt fail to match the appearance of a natural source image.
To address this, we had observers compare model-compressed images to the original image. If a model is a good candidate for describing the information preserved in the periphery, it should succeed in matching the appearance of the original scenes (that is, the information it preserves is sufficient to match appearance).
We apply this logic to two models of peripheral appearance: one in which high spatial frequency information (fine detail) is lost in the periphery (simulated using Gaussian blur), and another in which image structure in the periphery is assumed to be “texturised”. We had observers discriminate images using a three-alternative temporal oddity task. Three image patches were presented consecutively; two were identical to each other, and one was different. The “oddball” could be either the original or the modified image. The observer indicated whether image 1, 2 or 3 was different to the other two. If the images appear identical, the observer can only guess, and will achieve chance performance (one in three, or about 33% correct) on average.
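To make the chance level concrete, here is a minimal sketch of the oddity task with an observer who cannot tell the images apart and so guesses. This is not the experiment code from the paper: the function names (make_trial, guessing_observer, proportion_correct), the 50/50 choice of which image serves as the oddball, and the number of trials are all assumptions made for illustration.

```python
import random


def make_trial(rng):
    """Build one three-interval oddity trial.

    Returns the list of three interval labels and the index of the oddball.
    Assumption: the oddball is the original or the modified image with
    equal probability, and is equally likely to fall in any interval.
    """
    if rng.random() < 0.5:
        odd, standard = "original", "modified"
    else:
        odd, standard = "modified", "original"
    odd_interval = rng.randrange(3)
    intervals = [standard] * 3
    intervals[odd_interval] = odd
    return intervals, odd_interval


def guessing_observer(intervals, rng):
    """An observer for whom the images are metamers: picks an interval at random."""
    return rng.randrange(3)


def proportion_correct(observer, n_trials, seed=0):
    """Run n_trials simulated trials and return the proportion answered correctly."""
    rng = random.Random(seed)
    n_correct = 0
    for _ in range(n_trials):
        intervals, odd_interval = make_trial(rng)
        if observer(intervals, rng) == odd_interval:
            n_correct += 1
    return n_correct / n_trials


p_guess = proportion_correct(guessing_observer, n_trials=30000)
print(f"Proportion correct when guessing: {p_guess:.3f}")
```

With enough trials the printed proportion settles close to one third. Performance reliably above that level means the observer can see a difference between the original and the model image, which is the signature of a model that fails to match appearance.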
Our results show that neither a blur model nor a texture model is particularly good at matching peripheral appearance. Human observers were more sensitive to natural appearance than might be expected from either of these models, implying that richer representations than the ones we examined will be required to match the appearance of natural scenes in the periphery. That is, the models discard too much information.
Finally, we note that appearance matching alone is not sufficient. A model that discards no information would match appearance perfectly. We instead seek the most compressed (parsimonious) model that also matches appearance. Therefore, the psychophysical approach we outline here must ultimately be paired with information-theoretic model comparison techniques to adjudicate between multiple models that successfully match appearance.
Wallis, T. S. A., Bethge, M., & Wichmann, F. A. (2016). Testing models of peripheral encoding using metamerism in an oddity paradigm. Journal of Vision, 16(2), 4.
Freeman, J., & Simoncelli, E. P. (2011). Metamers of the ventral stream. Nature Neuroscience, 14(9), 1195–1201.