
My current direction in scientific computing

During my PhD I learned to program in Matlab. I’d never done any programming before that, and I found it to be a rewarding experience. As is typical for people in vision science, I did pretty much everything in Matlab. Stimuli were generated and presented to human subjects using the CRS Visage (in my PhD; programming this thing can be hell) and, more recently, the excellent Psychtoolbox. Early on in my PhD I also moved from SPSS to Matlab for data analysis.

An early project in my postdoc (see here) involved some more sophisticated statistical analyses than what I had done before. For this, Matlab was an absolute pain. For example, the inability (in base Matlab) to have named columns in a numerical matrix meant that my code contained references to column numbers throughout. This meant that if I wanted to change the order or number of variables going into the analysis I had to carefully check all the column references. Ugly, and ripe for human error.
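To make the column-number problem concrete, here is a minimal sketch in Python using pandas (one of the libraries mentioned later in this post). The data, the column names (`subject`, `contrast`, `correct`) and the variable names are all made up for illustration; this is not code from the project described above.

```python
import numpy as np
import pandas as pd

# Hypothetical trial data: columns are subject ID, stimulus contrast, correct (0/1).
raw = np.array([
    [1, 0.05, 1],
    [1, 0.10, 1],
    [2, 0.05, 0],
    [2, 0.10, 1],
])

# Positional style: the code only works while "contrast" stays in column 1.
mean_contrast_positional = raw[:, 1].mean()

# Named style: the column is looked up by name, wherever it sits.
trials = pd.DataFrame(raw, columns=["subject", "contrast", "correct"])
mean_contrast_named = trials["contrast"].mean()

# Reordering the variables silently changes what column 1 means...
reordered = trials[["correct", "subject", "contrast"]]
column_one_after_reorder = reordered.to_numpy()[:, 1].mean()  # now the mean subject ID

# ...while the named lookup still returns the contrast column.
mean_contrast_after_reorder = reordered["contrast"].mean()
```

The positional version gives a plausible-looking number after the reorder, which is exactly the kind of silent error that named columns avoid.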

Cue my switch to R. For statistical analyses R is pretty damn excellent. There are thousands of packages implementing pretty much every statistical tool ever conceived, often written by the statistician who thought up the method. Plus, it does brilliant plotting and data visualisation. Add the ability to define a function anywhere, real namespaces and the excellent RStudio IDE and I was hooked. I would try to avoid using Matlab again for anything on the analysis side except some light data munging (this is also wrapped up in my preference for science in open software).

For several years now I’ve been doing pretty much everything in R. For our latest paper, I also did my best to make the analysis fully reproducible by using knitr, a package that lets you include and run R analyses in a LaTeX document. You can see all the code for reproducing the analysis, figures and paper here. I’m going to walk through the workflow I used for this in the next few blog posts.

While R is great for stats and plotting, unfortunately I’m not going to be able to fully replace Matlab with R. Why? First, last I checked, R’s existing tools for image processing are pretty terrible. A typical image processing task I might do to prepare an experiment is take an image and filter it in the Fourier domain (say, to limit the orientations and spatial frequencies to a specific band). I spent about a day trying to do this in R a year or so ago, and it was miserable. Second, R has no ability to present stimuli to the screen with any degree of timing or spatial precision. In fact, that would be going well outside its intended purpose (which is usually a bad idea – see Matlab).
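For readers unfamiliar with the Fourier-domain filtering mentioned above, here is a minimal sketch of the idea in Python with NumPy. It is an illustration under stated assumptions, not the procedure used for any actual experiment: the function name and its parameters are hypothetical, the image is assumed to be square and greyscale, spatial frequency is expressed in cycles per image, and "orientation" here means the direction of the frequency vector (so 0 degrees corresponds to luminance varying horizontally).

```python
import numpy as np

def bandpass_orientation_filter(image, sf_low, sf_high,
                                ori_centre_deg, ori_halfwidth_deg):
    """Keep only Fourier components inside a spatial frequency band
    (cycles per image) and an orientation band (degrees).
    Hypothetical helper; assumes a square, greyscale image."""
    rows, cols = image.shape

    # Frequency coordinates in cycles per image, in np.fft.fft2's layout.
    fy = np.fft.fftfreq(rows) * rows
    fx = np.fft.fftfreq(cols) * cols
    fx_grid, fy_grid = np.meshgrid(fx, fy)

    # Radial spatial frequency of each component.
    radius = np.hypot(fx_grid, fy_grid)

    # Direction of each frequency vector, folded into [0, 180) because
    # a component and its conjugate partner share an orientation.
    theta = np.degrees(np.arctan2(fy_grid, fx_grid)) % 180.0

    # Angular distance from the band centre, wrapping around at 180 degrees.
    d_theta = np.abs(theta - (ori_centre_deg % 180.0))
    d_theta = np.minimum(d_theta, 180.0 - d_theta)

    # Hard-edged (ideal) mask; a smooth mask would reduce ringing.
    mask = ((radius >= sf_low) & (radius <= sf_high)
            & (d_theta <= ori_halfwidth_deg))

    spectrum = np.fft.fft2(image)
    # Discard the tiny imaginary part left by floating-point error.
    return np.real(np.fft.ifft2(spectrum * mask))

# Usage: a grating with 8 cycles per image, varying along the horizontal axis.
n = 64
x = np.arange(n)
grating = np.cos(2 * np.pi * 8 * x / n)[None, :] * np.ones((n, 1))

kept = bandpass_orientation_filter(grating, 4, 16, 0, 15)      # band includes it
removed = bandpass_orientation_filter(grating, 4, 16, 90, 15)  # band excludes it
```

The hard-edged mask keeps the sketch short. In practice a smooth falloff (for example, a raised cosine or Gaussian in frequency and orientation) is a common choice to limit ringing artefacts in the filtered image.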

So my “professional development” project for this year is to learn some Python, and test out the PsychoPy toolbox. In addition, I’m interested in the data analysis and image processing capabilities of Python – see for example scikit-learn, scikit-image and pandas. I’ve had some recent early success with this, which I’ll share in a future post. It would be so great to one day have all my scientific computing happen in a single, powerful, cross-platform, open and shareable software package. I think the signs point to that being a Python-based set of tools.