My current direction in scientific computing

During my PhD I learned to program in Matlab. I’d never done any programming before that, and I found it to be a rewarding experience. As is typical for people in vision science, I did pretty much everything in Matlab. During my PhD, stimuli were generated and presented to human subjects using the CRS Visage (programming this thing can be hell); now I use the excellent Psychtoolbox. Early in my PhD I also moved my data analysis from SPSS to Matlab.

An early project in my postdoc (see here) involved more sophisticated statistical analyses than I had done before. For this, Matlab was an absolute pain. For example, base Matlab has no way to give the columns of a numerical matrix names, so my code referred to columns by number throughout. If I wanted to change the order or number of variables going into the analysis, I had to carefully check every column reference. Ugly, and ripe for human error.
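R’s data frames fix this by letting you refer to columns by name. The same idea in Python’s pandas (which I come back to below) looks roughly like the sketch here. The column names and numbers are invented purely for illustration; the point is only the contrast between positional and named column access when the column order changes.

```python
import numpy as np
import pandas as pd

# Invented example data: one row per trial (subject, stimulus contrast, accuracy).
raw = np.array([
    [1, 0.25, 0.82],
    [2, 0.50, 0.64],
    [3, 0.75, 0.91],
])

# Matlab-style positional indexing: "column 2 is contrast" lives only in your head.
contrast_by_position = raw[:, 1]

# Named columns: the code states which variable it means.
df = pd.DataFrame(raw, columns=["subject", "contrast", "accuracy"])
contrast_by_name = df["contrast"]

# Reorder the variables: position 1 now holds "subject", so the old index is wrong,
# while the named lookup still returns the contrast values.
reordered = df[["accuracy", "subject", "contrast"]]
assert np.allclose(reordered["contrast"], contrast_by_position)
assert not np.allclose(reordered.to_numpy()[:, 1], contrast_by_position)
```

With named columns, adding or reordering variables no longer means hunting down every hard-coded column number.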

Cue my switch to R. For statistical analyses R is pretty damn excellent. There are thousands of packages implementing pretty much every statistical tool ever conceived, often written by the statistician who came up with the method. Plus, it does brilliant plotting and data visualisation. Add the ability to define a function anywhere, real namespaces and the excellent RStudio IDE, and I was hooked. I now try to avoid using Matlab for anything on the analysis side beyond some light data munging (this is also wrapped up in my preference for doing science with open-source software).

For several years now I’ve been doing pretty much everything in R. For our latest paper, I also did my best to make the analysis fully reproducible by using knitr, a package that lets you include and run R analyses in a LaTeX document. You can see all the code for reproducing the analysis, figures and paper here. I’m going to work through the workflow I used in the next few blog posts.

While R is great for stats and plotting, unfortunately I’m not going to be able to fully replace Matlab with R. Why? First, last I checked, R’s existing tools for image processing are pretty terrible. A typical image processing task I might do to prepare an experiment is take an image and filter it in the Fourier domain (say, to limit the orientations and spatial frequencies to a specific band). I spent about a day trying to do this in R a year or so ago, and it was miserable. Second, R has no ability to present stimuli to the screen with any degree of timing or spatial precision. In fact, that would be going well outside its intended purpose (which is usually a bad idea – see Matlab).
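For comparison, here is a minimal sketch of that kind of Fourier-domain filter in Python, using only NumPy. The function name `bandpass_filter`, its parameters and the hard-edged (ideal) mask are my own illustrative choices, not taken from any particular toolbox. Spatial frequencies are in cycles per image and orientations in degrees; a real stimulus would usually want a smooth mask to reduce ringing.

```python
import numpy as np

def bandpass_filter(image, sf_low, sf_high, ori_center_deg, ori_bandwidth_deg):
    """Keep only Fourier components inside an annulus of spatial frequency
    (cycles per image) and a wedge of orientation (degrees).

    Illustrative sketch: the mask is hard-edged, so expect some ringing.
    """
    rows, cols = image.shape
    # Frequency coordinates in cycles per image, zero frequency at the centre.
    fy = np.fft.fftshift(np.fft.fftfreq(rows)) * rows
    fx = np.fft.fftshift(np.fft.fftfreq(cols)) * cols
    fx_grid, fy_grid = np.meshgrid(fx, fy)  # both have shape (rows, cols)

    radius = np.hypot(fx_grid, fy_grid)
    # Direction of each frequency vector, folded into [0, 180) so that f and -f match.
    # Note: this direction is perpendicular to the stripes of the matching grating.
    theta = np.degrees(np.arctan2(fy_grid, fx_grid)) % 180.0

    # Angular distance to the target orientation, wrapping around at 180 degrees.
    d_theta = np.abs(theta - ori_center_deg % 180.0)
    d_theta = np.minimum(d_theta, 180.0 - d_theta)

    mask = (radius >= sf_low) & (radius <= sf_high) & (d_theta <= ori_bandwidth_deg / 2)

    spectrum = np.fft.fftshift(np.fft.fft2(image))
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    # The mask is (almost) symmetric, so any imaginary part is numerical noise.
    return np.real(filtered)
```

For example, `bandpass_filter(img, 4, 16, 0, 30)` would keep components between 4 and 16 cycles per image whose frequency vector lies within ±15° of horizontal, i.e. roughly vertical stripes.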

So my “professional development” project for this year is to learn some Python, and test out the PsychoPy toolbox. In addition I’m interested in the data analysis and image processing capabilities of Python – see for example scikit-learn, scikit-image and pandas. I’ve had some recent early success with this, which I’ll share in a future post. It would be so great to one day have all my scientific computing happen in a single, powerful, cross-platform, open and shareable software package. I think the signs point to that being a Python-based set of tools.


7 comments

  1. Hi Tom

    I’m interested in what you’re talking about, and I know I’m going to be encountering Python sooner rather than later (I just figure I’ll deal with it when I get to it). And R definitely, clearly, is valuable (I’m slowly getting acquainted with it, though I haven’t made use of it yet).

    But I don’t really understand the issue you’ve brought up. From my limited perspective, just about everyone knows Matlab and is comfortable communicating computational ideas in .m files. You can read this in different ways (e.g. my world hasn’t really intersected yet with a non-Matlab lab), but it is what it is. So this lingua franca status is a big reason, beyond being hugely familiar with it, not to want to move away from Matlab. What exactly do I gain by moving my computational work to Python?

    -Andrew

    1. Hi Andrew, thanks for commenting. Like I say in the post, I’m trying out Python this year as an experiment, and I will likely be doing some things with Matlab for years to come (largely because of legacy code). However, what I can say from my very limited experience with Python (3 days over Christmas coding some image processing) as far as a “lingua franca” goes is that the basic syntax of numpy arrays is very similar to Matlab. I think that anyone comfortable with Matlab could port simple scripts over to Python without much trouble, and more importantly could read Python scripts without making too many leaps from Matlab code. For simply porting single scripts over, I reckon you would be comfortable in Python within a week.

      There are a couple of issues I have with “communicating computational ideas in .m files”. One small but extremely nice thing is that every Python script will have a bunch of “imports” at the front. This tells the user of your script what other packages they need to get the script to run. I don’t know about you, but whenever I send someone a Matlab script it ends up in an email back-and-forth because I’ve forgotten to include some other .m files of functions that are on my Matlab search path somewhere but not explicitly loaded. A pain, and bad for version control.

      Second, of course, is the free software bit. If I don’t have Matlab, you can’t communicate your idea in an .m file to me (Octave blah blah, but last I tried Octave it was painful). Yes, most institutions pay for Matlab licenses, but in my recent experience it’s increasingly painful to actually use. For example, Tübingen has recently switched to network-only licenses, which means that if I want to open Matlab when I’m not at the university, I need to connect via VPN (so I must have an internet connection) or remember to check out a license before I leave the office.

      So what do you gain? Well, I guess I’ll have a better answer for you in a year, but I would say that you don’t lose much. Your lingua franca will translate, like how Danish speakers understand Swedish… Finally, since Python is a real programming language built to do anything, you can do anything in it – from processing text files to writing web pages.

      1. Those are all good points, and I think my position at present is just lack of knowledge (and lack of intersection, yet, with cases where I need a new suite of tools). But I do dread a situation where I’m in a lurch because of some network license thing. If I didn’t have something so basic at my fingertips I would be pretty frustrated and seeking new options.

        On communication, I’ve learned either to use single scripts to illustrate something straightforward to someone remotely, or to put together concise packages of functions to do more complex jobs, a habit that grew out of all those missing .m files here and there (some four-line operation you decide needs its own file and then forget about)… But it’s mainly that it’s worked so far, which, like the lingua franca issue, is a type of inertia. So Matlab has a lot of inertia on its side. (And a good way to deal with a problem with a lot of inertia is to slowly transfer it, piece by piece, to a better system, so a yearlong transition sounds about right.)

        I expect to be learning Python by the end of the year, though; it sounds like it’s going to be a part of my next position (after the Monash thing). It’s good to have someone on the frontier! Press on!

        -andrew

  2. Great post! I’m looking forward to learning some Python! I guess the real issue here is that, since one has to deviate from Matlab at some point (i.e., any serious data analysis) and to learn a new tool (R), why not use the open-source software in the first place? Python holds a great promise.

  3. You might enjoy http://abandonmatlab.wordpress.com/. I’ve gotten a kick out of some rants there. Apparently they are a user (or developer?) of psychopy.

    I suspect that you won’t find one single language that satisfies all of your needs/wants/desires, even Python. My approach recently has been to use a wide variety of programming languages. I’ve found this helpful in my field (electrical engineering/telecommunications/mathematics) because it encourages me to think in many different ways and different notations. MATLAB is extremely common in engineering fields, to the point where many people think almost entirely in finite-dimensional linear algebra (matrices and vectors). Sometimes there are problems, considered difficult or fiddly, that are actually simple if you think in something other than MATLAB.

    All this is potentially overkill for psychologists.

  4. Hi Tom,
    I love your blog. Since you haven’t mentioned it, I thought I’d point you to a new scientific computing language, Julia (http://julialang.org) that promises to fulfill your dreams by combining the advantages of Matlab, Python, C++, and R (what we would call an “eierlegende Wollmilchsau”; see more ambition here: http://julialang.org/blog/2012/02/why-we-created-julia/). Some great developers of the tools that we have been using so far have undertaken this daunting endeavor. And some of them seem to have moved from R to Julia (e.g., Doug Bates, the father of linear mixed effects modelling toolboxes in R). Currently, Julia is still very much in development, but you can already import functionality from python etc.
    Have a look!
    Martin

    1. Hi Martin, thanks for the comments. Yes, I’ve heard of Julia as the next big thing on the horizon for analysis (I follow http://www.johnmyleswhite.com/, who has done some development for them). I agree it looks very cool, but as far as I’m aware there’s no plan (or in my opinion, need) to have any experiment / stimulus presentation capability in Julia. So it wouldn’t do *everything* that experimental psychologists need it for. Still, calling it from Python sounds cool!

      In any case, as Rob said (previous post), I could probably roll back my final comment (a single language that dominates at everything) and be happy with using several different open source languages that have different strengths but interface well with one another. Still, I’m intending to move further away from Matlab.

      P.S. For some reason I got your comment twice, once each from two different accounts. I deleted the other.
