Quick link: Slow science

Deborah Apthorp pointed me to this excellent post by Uta Frith. It reflects a lot of my own feelings about how science could be improved (though I haven’t worked out how to balance these ideas with my personal need to get a permanent job in a fast science environment).

It reminds me of an idea I heard from Felix Wichmann (I’m not sure of its origin) that every scientist would only be allowed to publish some number (e.g. 20) of papers in their entire career. This would create pretty strong incentives not to “salami slice”, and to publish only results that you felt were deeply insightful. Of course, how to evaluate people for tenure / promotion in such a system is unclear.

Quick link: How to give a good talk

I came across this blog on how to give good scientific talks. I’m up to the second post and I agree with almost all of it so far. Eliminating words on slides and using presenter notes is something I’ve recently started to do. It takes some practice, but I feel that people are indeed more engaged in the talk.

Exploring GLMs with Python

A few months ago Philipp Berens and I ran a six-week course introducing the Generalised Linear Model (GLM) to graduate students. We decided to run the course in Python and to use the IPython Notebook to present the lectures. You can view the lecture notes here and get the source code here.

The six lectures were:

  1. Introduction by us, then student group presentations of a Points of Significance article, which we hoped would provide a refresher on basic statistics.
  2. Finishing off the student presentations (which took too long in the first week), followed by an introduction to Python, dataframes and exploratory plotting.
  3. Design matrices.
  4. Interactions (a rough sketch of design matrices and interactions follows this list).
  5. ANOVA and link functions.
  6. Current issues in statistical practice, in which we touched on things like exploratory vs confirmatory data analysis, p-hacking, and the idea that statistics is not simply a cookbook one can mechanically follow.
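
To give a flavour of lectures 3 and 4, here’s a minimal sketch of the kind of thing one can walk through in a notebook using pandas and patsy. To be clear, this isn’t lifted from the actual course notebooks — the data and variable names are invented for illustration.

```python
# Not the actual course material -- an invented example for illustration.
import numpy as np
import pandas as pd
import patsy

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "condition": np.repeat(["A", "B"], 10),             # categorical predictor
    "contrast": np.tile(np.linspace(0.1, 1.0, 10), 2),  # continuous predictor
})
# simulate a response with a main effect of contrast plus an interaction
df["rt"] = (0.5 + 0.3 * df["contrast"]
            + 0.2 * (df["condition"] == "B") * df["contrast"]
            + rng.normal(0, 0.05, len(df)))

# Lecture 3: the design matrix for the main effects only
X_main = patsy.dmatrix("condition + contrast", df)
print(X_main.design_info.column_names)
# ['Intercept', 'condition[T.B]', 'contrast']

# Lecture 4: adding an interaction adds one column to the design matrix
X_int = patsy.dmatrix("condition * contrast", df)
print(X_int.design_info.column_names)
# ['Intercept', 'condition[T.B]', 'contrast', 'condition[T.B]:contrast']
```

Watching the interaction column appear in the design matrix is the kind of concrete link between a model formula and the numbers that the notebook format makes easy to show.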

What worked

  • In general the feedback suggests that most students felt they benefitted from the course. Many reported that they enjoyed using the notebooks and working through the course materials and exercises themselves.
  • The main goal of the course was to give students a broad understanding of how GLMs work and how many statistical procedures can be thought of as special cases of the GLM (the sketch after this list gives a feel for that idea). Our teaching evaluations suggest that many people felt they achieved this.
  • The notebooks allow the theory and equations to be tied very concretely to the computations one performs to do the analysis. I think many students found this helpful, particularly when working through the design matrices.
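
As a rough sketch of the “special cases” point (again with invented data and variable names, not course material): a one-way ANOVA-style model and a logistic regression are both GLMs in statsmodels, differing only in the distribution family and link function.

```python
# Invented data: the point is only that the same formula interface covers
# a Gaussian linear model ("ANOVA") and a binomial GLM (logistic regression).
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame({"group": np.repeat(["a", "b", "c"], 30)})
df["score"] = df["group"].map({"a": 0.0, "b": 0.5, "c": 1.0}) + rng.normal(0, 1, len(df))
df["correct"] = rng.binomial(1, df["group"].map({"a": 0.3, "b": 0.5, "c": 0.7}))

# One-way ANOVA is a Gaussian GLM with an identity link
anova_fit = smf.glm("score ~ group", data=df, family=sm.families.Gaussian()).fit()

# Logistic regression is a binomial GLM with a logit link
logit_fit = smf.glm("correct ~ group", data=df, family=sm.families.Binomial()).fit()

print(anova_fit.summary())
print(logit_fit.summary())
```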

What didn’t

  • Many students felt the course was too short for the material that we wanted to cover. I agree.
  • Some students found the lectures boring (weeks 2–5 consisted of me working through the notebooks and presenting extra material).
  • Despite the niceness of the Anaconda distribution, Python is unfortunately still not as easy to set up as (for example) MATLAB across different environments (primarily Windows, some OSX). We spent more time than we would have liked troubleshooting issues (mainly to do with different Windows versions not playing nicely with other things).
  • We didn’t spend a lot of time discussing the homework assignments in class.
  • Some students (graduate students in the neural and behavioural school) were more familiar with MATLAB and would have strongly preferred that we teach the course using it instead.

If I were to run the course again

The main thing I would change if I ran the course again would be to make it longer. As a “taste test” I think six weeks was fine, but really getting into the material would require more lectures.

I also think it would be beneficial to split the content into lectures and practicals. The IPython notebooks, used the way I used them, would be excellent for teaching practicals / small-class tutorials, but the lectures would probably benefit from higher-level overviews and less scrolling through code*.

I would also plan the homework projects better. One student suggested that rather than having the introduction presentations at the beginning of the course, it would be nice to have the last lecture dedicated to students running through their analysis of a particular dataset in the notebook. I think that’s a great idea.

Finally, I would ignore requests to do the course in MATLAB. Part of why we wanted to use Python or R to do the course was to equip students with tools that they could continue using (for free) if they leave the university environment. Perhaps this is more valuable to undergraduates than PhD students (who already have some MATLAB experience), but I think it’s good for those students to get exposed to free alternatives as well. Plus, the IPython notebooks are just, well, pretty boss.

*I know you can hide cells using the slide mode, but I found slide mode in general quite clunky and so I opted not to use it.

New paper: contrast, eye movements, movies and modelling

Sensitivity to spatial distributions of luminance is a fundamental component of visual encoding. In vision science, contrast sensitivity has been studied most often using stimuli like sinusoidal luminance patterns (“gratings”). The contrast information in these stimuli is limited to a narrow band of scales and orientations. Visual encoding can be studied using these tiny building blocks, and the hope is that the full architectural design will become clear if we piece together enough of the tiny bricks and examine how they interact.

Real world visual input is messy and the interactions between the building blocks are strong. Furthermore, in the real world we make about 3 eye movements per second, whereas in traditional experiments observers keep their eyes still.

In this paper, we attempt to quantify the effects of this real-world mess on sensitivity to luminance contrast. We had observers watch naturalistic movie stimuli (a nature documentary) while their gaze position was tracked. Every few seconds, a local patch of the movie was modified by increasing its contrast. The patch was yoked to the observer’s gaze position, so that it always remained at the same position on the retina as the observer moved their eyes. The observer had to report the position of the target relative to their centre of gaze.

We quantified how sensitivity depended on various factors, including when and where observers moved their eyes and the content (in terms of low-level image structure) of the movies. We then asked how well one traditional model of contrast processing could account for our data, comparing it to an atheoretical generalised linear model. In short, the parameters of the mechanistic model were poorly constrained by our data and hard to interpret, whereas the atheoretical model did just as well in terms of prediction. More complex mechanistic models may need to be tested to provide concise descriptions of behaviour in our task.
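
To give a sense of what I mean by an “atheoretical” GLM, here’s a purely illustrative toy in Python (the actual analysis was done in R with Bayesian parameter estimation, and the variable names and data below are invented rather than taken from the paper): you regress trial-by-trial performance on stimulus and eye-movement covariates without committing to a mechanistic model of contrast processing.

```python
# Toy illustration only -- not the paper's analysis or data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_trials = 500
trials = pd.DataFrame({
    "pedestal_contrast": rng.uniform(0.05, 0.5, n_trials),  # local movie contrast at the target
    "time_since_saccade": rng.exponential(0.3, n_trials),   # seconds since the last eye movement
})
# fake observer whose performance depends on both covariates
logit_p = 0.5 - 2.0 * trials["pedestal_contrast"] + 1.5 * trials["time_since_saccade"]
trials["correct"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

# binomial GLM: trial-by-trial correctness as a function of the covariates
fit = smf.glm("correct ~ pedestal_contrast + time_since_saccade",
              data=trials, family=sm.families.Binomial()).fit()
print(fit.params)  # coefficients on the logit scale
```

Such a model says nothing about why sensitivity changes, but it provides a useful predictive baseline against which to judge a mechanistic account.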

I’m really happy that this paper is finally out. It’s been a hard slog – the data were collected way back in 2010, and since then it has been through countless internal iterations. Happily, the reviews we received upon eventual submission were very positive (thanks to our two anonymous reviewers for slogging through this long paper and suggesting some great improvements). I’ve got plenty of ideas for what I would do differently next time we try something like this, and it’s something I intend to follow up sometime. In the meantime, I’m interested to see what the larger community think about this work!

To summarise, you might like to check out the paper if you’re interested in:

  • contrast perception / processing
  • eye movements
  • naturalistic stimuli and image structure
  • Bayesian estimation of model parameters
  • combining all of the above in one piece of work

You can download the paper here.

The data analysis and modelling were all done in R; you can find the data and code here. Note that I’m violating one of my own suggestions (see also here) in that the data are not shared in a text-only format. This is purely a constraint of file size – the R binary files are much more compressed than the raw CSV files, so they fit into a GitHub repository. Not ideal, but the best I can do right now.