New Paper

Open peer review!

Our manuscript “Detecting distortions of peripherally-presented letter stimuli under crowded conditions” (see here) has received an open peer review from Will Harrison. Thanks for your comments Will! They will be valuable in improving the manuscript in a future revision.

Hurrah for open science!*

* as suggested here, from now on “Open Science” will just be called “Science”, and everything else will be called “Closed Science”.

New paper: testing models of peripheral encoding using appearance matching

We only perceive a small fraction of the information that enters our eyes. What information is represented and what is discarded? Freeman and Simoncelli (2011) introduced a neat method of psychophysically testing image-based models of visual processing. If images that produce identical model responses also appear identical to human observers, it implies that the model is only discarding information that does not matter for perception (and conversely, retaining image structure that matters). The images are metamers: physically different images that appear the same (the term originates in the study of colour vision).

Our latest paper extends this approach and sets a higher bar for model testing. In the original study, Freeman and Simoncelli synthesised images from a model of peripheral visual processing, and showed that observers could not tell two synthesised images apart from each other at an appropriate level of information loss (in this case, the scaling of pooling regions spanning into the visual periphery). However, observers in these experiments never compared the model images to the original (unmodified) images. If we’re interested in the appearance of natural scenes, this is not a sufficient test. To take one extreme, a “blind” model that discarded all visible information would produce images that were indiscriminable from each other, but would no doubt fail to match the appearance of a natural source image.

We extend this approach by having observers compare model-compressed images to the original image. If models are good candidates for the kind of information preserved in the periphery, they should succeed in matching the appearance of the original scenes (that is, the information they preserve is sufficient to match appearance).

We apply this logic to two models of peripheral appearance: one in which high spatial frequency information (fine detail) is lost in the periphery (simulated using Gaussian blur), and another in which image structure in the periphery is assumed to be “texturised”. We had observers discriminate images using a three-alternative temporal oddity task. Three image patches are presented consecutively; two are identical to each other, and one is different. The “oddball” could be either the original or the modified image. The observer indicates whether image 1, 2 or 3 was different to the other two. If the images appear identical, the observer will achieve 33% correct, on average.

Our results show that neither a blur model nor a texture model are particularly good at matching peripheral appearance. Human observers were more sensitive to natural appearance than might be expected from either of these models, implying that richer representations than the ones we examined will be required to match the appearance of natural scenes in the periphery. That is, the models discard too much information.

Finally, we note that appearance matching alone is not sufficient. A model that discards no information would match appearance perfectly. We instead seek the most compressed (parsimonious) model that also matches appearance. Therefore, the psychophysical approach we outline here must ultimately be paired with information-theoretic model comparison techniques to adjudicate between multiple models that successfully match appearance.

You can read our paper here, and the code, data and materials are also available (you can also find the code on Github).

Wallis, T. S. A., Bethge, M., & Wichmann, F. A. (2016). Testing models of peripheral encoding using metamerism in an oddity paradigm. Journal of Vision, 16(2), 4.

Freeman, J., & Simoncelli, E. P. (2011). Metamers of the ventral stream. Nature Neuroscience, 14(9), 1195–1201.

New paper: Unifying saliency metrics


How well can we predict where people will look in an image? A large variety of models have been proposed that try to predict where people look using only the information provided by the image itself. The MIT Saliency Benchmark, for example, compares 47 models.

So which model is doing the best? Well, it depends which metric you use to compare them. That particular benchmark lists 7 metrics; ordering by a new one changes the model rankings. That’s a bit confusing. Ideally we would find a metric that unambiguously tells us what we want to know.

Over a year ago, I wrote this blog post telling you about our preprint How close are we to understanding image-based saliency?. After reflecting on the work, we realised that the most useful contribution of that paper was buried in the appendix (Figure 10). Specifically, putting the models onto a common probabalistic scale makes all the metrics* agree. It also allows the model performance per se to be separated from nuisance factors like centre bias and spatial precision, and for model predictions to be evaluated within individual images.

We re-wrote the paper to highlight this contribution, and it’s now available hereThe code is available here.


Kümmerer, M., Wallis, T.S.A. and Bethge, M. (2015). Information-theoretic model comparison unifies saliency metrics. Proceedings of the National Academy of Sciences.

*all the metrics we evaluated, at least.

NOTE: I don’t endorse the ads below. I’d have to pay wordpress to remove them.



How close are we to understanding image-based saliency?

How well can we predict where people look in stationary natural images? While the scope of this question addresses only a fraction of what it means to understand eye movements in natural environments, it nevertheless remains a starting point to study this complex topic. It’s also of great interest to both cognitive scientists and computer vision researchers since it has applications from advertising to robotics.

Matthias Kümmerer has come up with a statistical framework that answers this question in a principled way. Building on the nice work of Simon Barthelmé and colleagues, Matthias has shown how saliency models can be compared in units of information (i.e. using log-likelihoods). Since information provides a meaningful linear metric, it allows us to compare the distance between model predictions, a baseline (capturing image-independent spatial biases in fixations) and the gold standard (i.e. how well you could possibly do, knowing only the image).

So how close are we to understanding image-based saliency? Turns out, not very. The best model we tested (a state-of-the-art model from 2014 by Vig, Dorr and Cox) explained about one third of the possible information gain between the baseline and the gold standard in the dataset we used. If you want to predict where people look in stationary images, there’s still a long way to go.

In addition, our paper introduces methods to show, in individual images, where and by how much a model fails (see the image above). We think this is going to be really useful for people who are developing saliency models. Finally, we extend the approach to the temporal domain, showing that knowing about both spatial and temporal biases, but nothing about the image, gives you a better prediction than the best saliency model using only spatial information.

The nice thing about this last point is that it shows that Matthias’ method is very general. If you don’t think that measuring where people look in stationary images tells you much about eye movements in the real world, that’s fine. You can still use the method to quantify and compare data and models in your exciting new experiment.

A paper that goes into much more detail than this blog post is now available on arXiv. In particular, saliency experts should check out the appendices, where we think we’ve resolved some of the reasons why the saliency model comparison literature was so muddled.

We’re close to submitting it, so we’d love to hear feedback on the pitch and nuance of our story, or anything that may not be clear in the paper. You can send me an email to pass on your thoughts.

When the paper is out we will also be making a software framework available for model comparison and evaluation. We hope the community will find these to be useful tools.

As usual, everything that appears below this line are ads and not endorsed by me. I don’t make money from this site; I would have to pay WordPress to remove the ads.