Matlab and R - statistical arrays, how to work with them

Oct 11, 2010 11:33

R is a great open-source language for doing statistics. MATLAB is a powerful tool and language for all sorts of numerical computation, popular in academic settings.

In R, one common workflow is to collect some data and put it into a data.frame, which is a lot like a spreadsheet. Each column has a name and a particular data type, so you can mix nominal, numeric, and other variables in a single data.frame. The MATLAB equivalent comes only with the Statistics Toolbox, where it's called a dataset array.

Now, a large portion of R is centered around manipulating data.frames in fiendish and ingenious ways, but MATLAB is much less specialized. Still, there are some work-alikes that I have been finding (usually after implementing them myself... oops).

R Tool | MATLAB Tool
data.frame | dataset
Hadley Wickham's reshape package | melt: stack; cast: unstack and grpstats (surprisingly, not reshape)
merge | join
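If the reshape vocabulary is new to you, here is a minimal plain-Python sketch of what "melt" (wide to long) and "cast" (long to wide) mean. It uses neither R nor MATLAB; the function names, the "variable"/"value" column names, and the tiny table are all made up for illustration, and the real packages do far more (aggregation in particular).

```python
def melt(rows, id_keys):
    """Wide to long: one output row per (input row, measured column)."""
    long_rows = []
    for row in rows:
        ids = {k: row[k] for k in id_keys}
        for key, value in row.items():
            if key not in id_keys:
                long_rows.append({**ids, "variable": key, "value": value})
    return long_rows


def cast(long_rows, id_keys):
    """Long to wide: regroup variable/value pairs that share the same ids."""
    wide = {}
    for row in long_rows:
        ident = tuple(row[k] for k in id_keys)
        target = wide.setdefault(ident, {k: row[k] for k in id_keys})
        target[row["variable"]] = row["value"]
    return list(wide.values())


# Made-up example data: one row per subject, one column per measurement.
wide_table = [
    {"subject": "a", "height": 170, "weight": 65},
    {"subject": "b", "height": 182, "weight": 80},
]
long_table = melt(wide_table, ["subject"])   # four rows, one per measurement
round_trip = cast(long_table, ["subject"])   # back to the two wide rows
```

Melting and then casting on the same id column gives back the original table, which is a handy sanity check when you go looking for the equivalent pair of functions in another tool.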

Those are the ones that I've wished I had, and then eventually found. If you can think of others that it would be useful to add to the table, say so in the comments!

By the way, if you program in MATLAB and want to write simple unit and behavior tests, I've ported a (basic) version of Python's doctest library to MATLAB. So you can write a usage example for your function in its help comment, and then automatically run the example to make sure the function still behaves as expected. This is also vaguely similar to R's Rd documentation format, which has an \examples section where you can run examples and compare their output with known-good results. This is done by R CMD check.
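For anyone who hasn't seen the Python original, this is roughly what the doctest idea looks like there. It's only a sketch using Python's standard doctest module, not the MATLAB port; add_one and run_examples are hypothetical names I made up for the example.

```python
import doctest


def add_one(x):
    """Return x plus one.

    The usage examples below double as tests: doctest runs each line
    that starts with >>> and compares the result with the text under it.

    >>> add_one(2)
    3
    >>> add_one(-1)
    0
    """
    return x + 1


def run_examples():
    """Run the docstring examples of add_one; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = attempted = 0
    # Passing globs explicitly makes add_one visible to the examples.
    for test in finder.find(add_one, name="add_one", globs={"add_one": add_one}):
        result = runner.run(test)
        failed += result.failed
        attempted += result.attempted
    return failed, attempted
```

Calling run_examples() reports how many examples were tried and how many failed, so if someone later changes add_one and breaks the documented behavior, the help text itself catches it.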

Doctest for Matlab on Matlab File Exchange: doctest

On BitBucket: here