Oct 12, 2015 12:00
art,
temperature,
fraud,
death,
viadrdoug,
patriarchy,
society,
women,
law,
trees,
movies,
care,
3d,
economics,
usa,
engineering,
crows,
josswhedon,
stereotypes,
cooking,
research,
reviews,
architecture,
nazis,
links,
hitler,
history,
healthcare,
technology,
sexism,
nature,
rivers,
achievement,
intelligence,
students,
design,
marijuana,
banking,
money,
legalisation,
internet,
perspective,
disney,
animation,
photos,
avengers,
tax,
viaswampers,
replication,
disabilities
A lot of my papers are a combination of mathematics, computer code and data.
The mathematics (if I don't screw up) is replicable with effort (we rarely publish enough "steps" to follow an argument without intellectual work because papers are limited in length). The data may be proprietary, but if it is not then I try to publish it somewhere. The code is the problem. Code is code... but a few months ago someone emailed me about a seven-year-old paper with the request "send me the code".
I still had the code. If I did not I am relatively sure none of the other authors would have had it -- I wrote the code and did all the runs. The code was stored in a subversion repository -- but not all academics use even that. Version control came late to academia compared with industry because usually people are coding on their own.
Because I am still friends with the people who I worked with then I still have access to the subversion repository and it has not been shut down. Had it been shut down the code would be gone except for local copies on my machines. I might be able to find a copy of the code from a backup -- but how often do you check your "seven years ago" backup?
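For what it's worth, checking an old backup against a working copy can be mostly automated. Here is a minimal Python sketch, assuming the code lives in an ordinary directory tree; the function names `tree_digests` and `compare_trees` are made up for illustration and this is not what I actually ran.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(original, backup):
    """Return (missing, changed): files absent from the backup, and files that differ."""
    original_digests = tree_digests(original)
    backup_digests = tree_digests(backup)
    missing = sorted(set(original_digests) - set(backup_digests))
    changed = sorted(
        name
        for name in original_digests.keys() & backup_digests.keys()
        if original_digests[name] != backup_digests[name]
    )
    return missing, changed
```

Calling `compare_trees("working_copy", "backup_copy")` (both paths hypothetical) gives two lists to eyeball; it reads whole files into memory, so it suits source trees rather than large data sets.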
Then having got the code I had to compile it. That was the big problem. The default parameters to the compiler had changed loads in the intervening time. The makefile didn't work. After a lot of effort I figured out that it needed a particular compiler flag (-std=c99 I think) to build.
Having done that I then needed the results analysis code -- this was in perl. Just not the version of perl currently stored on my machine.
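In hindsight, a small manifest saved next to the results would have spared me most of the guessing about compiler flags and perl versions. A rough Python sketch of the idea follows; the tool names and the flag in the example are assumptions for illustration, and `tool_version` and `build_manifest` are hypothetical helpers, not anything from the original project.

```python
import json
import platform
import shutil
import subprocess
import sys


def tool_version(command):
    """Return the first non-empty line of `<command> --version`, or None if unavailable."""
    if shutil.which(command) is None:
        return None
    try:
        result = subprocess.run(
            [command, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some tools print to stderr or start with a blank line, so strip first.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


def build_manifest(tools, compiler_flags):
    """Collect what a future reader would need to know to rebuild the code."""
    return {
        "platform": platform.platform(),
        "python": sys.version.split()[0],
        "compiler_flags": list(compiler_flags),
        "tools": {name: tool_version(name) for name in tools},
    }


if __name__ == "__main__":
    # Illustrative values only: list whatever the project really uses.
    manifest = build_manifest(["cc", "make", "perl"], ["-std=c99"])
    print(json.dumps(manifest, indent=2))
```

This only records the environment; it does not recreate it, so it is a complement to a VM image rather than a replacement.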
You *could* make this kind of thing replicable by having a VM image... and having something to run that VM image on that still exists seven years later. But at the time this would have been kind of a ridiculous proposition as the VM emulator would not have been powerful enough to run the code in reasonable time.
Nowadays I guess I could save a vagrant image and hope that in seven years time vagrant is still a usable thing.
Networking and systems papers are even harder as you're testing the performance of particular machines in combination with the hardware. Your paper won't be replicated unless the person also has access to that hardware. After seven years there's not a chance of this.
Sure... it's not ideal but it's a surprisingly hard problem to solve. Some colleagues are trying to solve it by insisting that papers come with a VM and scripts that do essentially "build paper". This only works if you don't need real hardware/networking performance.
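A "build paper" script can be very small. The sketch below is one possible shape in Python, not what my colleagues actually use: it runs each step in order and stops at the first failure. The commands in `EXAMPLE_STEPS` (`make`, `analyse.pl`, `paper.tex`) are hypothetical placeholders.

```python
import subprocess

# Hypothetical steps: compile the simulation, analyse the output, typeset the paper.
EXAMPLE_STEPS = [
    ("compile", ["make"]),
    ("analyse", ["perl", "analyse.pl", "results/"]),
    ("typeset", ["pdflatex", "paper.tex"]),
]


def build_paper(steps):
    """Run each (label, argv) step in order and stop at the first failure.

    Returns the labels of the steps that completed successfully.
    """
    completed = []
    for label, argv in steps:
        result = subprocess.run(argv, check=False)
        if result.returncode != 0:
            raise RuntimeError(
                f"step {label!r} failed with exit code {result.returncode}"
            )
        completed.append(label)
    return completed
```

Running `build_paper(EXAMPLE_STEPS)` inside the VM would then be the whole replication recipe, with the caveat already mentioned that none of this helps when the result depends on real hardware.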
Reply
But with the state of worries about quality of research I suspect that replication is going to be something that there's a bigger and bigger demand for.
Reply
Hmm... yes and no. My original degree is physics and to some extent I still have the mindset. A lot of famous studies were non-replicable.
Millikan's oil drop is a pretty famous experiment to physicists -- not sure about outside the field. It's notoriously tetchy. I'm a pretty bad experimenter (too impatient I think) and I hated doing it -- but it's a famous experiment about the charge on the electron. Apparently Millikan's original with modern stats analysis had so much uncertainty it essentially proved nothing. Completely unrepeatable junk result in the terms of those guys. But in fact it was the "right" experiment. It was repeatable in the better sense that it was the right thing to do and could be repeated and refined along the years.
Another classic was Eddington's 1919 eclipse measurements in Principe, which were the canonical first "confirmation" of the predictions of General Relativity (and also rather neatly resolved the awkwardness of Eddington being a conscientious objector -- he was "forced" to do fieldwork in Principe instead of serving in WWI). Again the errors were (by modern standards) larger than the measurement and it's not replicable without a time machine or another convenient eclipse.
Individual studies don't get replicated but the ideas that they propound either become confirmed by further experiment, refuted by further experiment or ignored completely (in which case it doesn't matter).
I really enjoyed this book
http://www.amazon.co.uk/The-Golem-Second-Edition-Classics/dp/1107604656
It's got some great case studies of times when various disciplines have split on the possibility or otherwise of a certain experiment/method. One of the most compelling was about the flatworm memory experiments. Which is a classic tale about how hard it is to repeat results, except I'm not sure what the moral really is. Eventually the answer will be known but sometimes it takes a long time to get there.
http://www.theverge.com/2015/3/18/8225321/memory-research-flatworm-cannibalism-james-mcconnell-michael-levin
Reply
This is the sort of thing that DVCS improves on, of course - if you'd started a similar project today, you might (I'm guessing) have naturally used git rather than svn, in which case any local copy lying around on any machine or backup you could find would have automatically come with a copy of the complete history, and the availability of the upstream server wouldn't be so critical.
Of course that wouldn't solve the rest of the problems, like the scripts not running right in up-to-date Perl and similar bit-rot. But it would be a start, at least.
Reply
Today, for example, hosting repos on GitHub makes sense and some of my papers refer to that as a code repo. In seven years will it be there? Bet now.
Reply
In maths, for example, it seems increasingly that everybody who is anybody posts their papers on the arXiv, so I suppose the right answer would be that the arXiv should provide a means of hosting a git repository alongside the PDF, and that any paper on there with a vital computational component (which in maths, I expect, would be less about replicability and more in the 4CT 'computer-assisted proof' sort of space) would take advantage of that. (Bet they don't, though.)
In a discipline where papers are still mostly in hard-copy journals, that might be (even) harder to arrange...
Reply
Of course journals themselves don't last forever and do fuck up. I discovered after about 5 years that the journal with my most cited paper had screwed up and not ever put my paper online (link error). Nobody seemed to have noticed this as it was available at my web site and IIRC on arXiv so continued to be cited at the journal where it wasn't available except in hard copy.
Reply
The scenario you want to avoid is that the paper is still out there claiming some result, and the critical supporting code isn't.
Reply
There are exceptions of course but how many? Yes, Perelman's paper on the Poincaré conjecture -- but he's Perelman. He could scrawl it on a loo wall, someone would put it online for him and it would live on.
So papers will (mostly) survive arXiv dying anyway if the journal they are in survives. :-)
Reply
Rationale: the point of a journal is not the physical publication and distribution of the paper, which the arXiv does better anyway; the real added value is the selection and peer review which winnows the great mass of proto-papers out there into ones that are judged by sensible people to be both correct and important.
So you upload your preprint to the arXiv, you submit to the journal by sending them a link, and if it passes peer review, then a link to that arXiv entry appears on the journal website.
Reply
TBPH, the likelihood of arXiv going down without warning and with no backup is negligible though.
(Goodness knows how many people analyse arXiv anyway as part of their research -- I bet a good chunk of the network science community is holding a local copy.)
Reply