steer October 12 2015, 12:02:02 UTC
Re: Replicability -- it's much harder than you think ( ... )

andrewducker October 12 2015, 12:46:48 UTC
Oh, I'm sure it's harder with some subjects than others. And replicating it when you're dealing with specific machine configurations clearly isn't going to go well.

But with all the current worries about the quality of research, I suspect that replication is going to be something that there's a bigger and bigger demand for.

steer October 12 2015, 13:32:57 UTC
> I suspect that replication is going to be something that there's a bigger and bigger demand for.

Hmm... yes and no. My original degree is physics and to some extent I still have the mindset. A lot of famous studies were non-replicable ( ... )

simont October 12 2015, 14:09:20 UTC
> Because I am still friends with the people who I worked with then, I still have access to the subversion repository and it has not been shut down. Had it been shut down, the code would be gone except for local copies on my machines.

This is the sort of thing that DVCS improves on, of course - if you'd started a similar project today, you might (I'm guessing) have naturally used git rather than svn, in which case any local copy lying around on any machine or backup you could find would have automatically come with a copy of the complete history, and the availability of the upstream server wouldn't be so critical.

Of course that wouldn't solve the rest of the problems, like the scripts not running right in up-to-date Perl and similar bit-rot. But it would be a start, at least.

steer October 12 2015, 14:16:09 UTC
Hmm... I think the difference is marginal, as seven years ago I wasn't recording which checkout version I was using in experiments -- actually, I rarely do this now, though I know I should (deadlines, deadlines). But yes, it's something. I think software replication is getting better. So often, though, you find that old links lead to dead services, whether privately or publicly hosted.

Today, for example, hosting repos on GitHub makes sense, and some of my papers refer to it as their code repository. Will it still be there in seven years? Place your bets now.

simont October 12 2015, 14:20:30 UTC
Yes, I was just thinking that really you want the code to be stored alongside the paper, because if the paper itself isn't available any more then you have worse problems than the unavailability of the code.

In maths, for example, it seems increasingly that everybody who is anybody posts their papers on the arXiv, so I suppose the right answer would be that the arXiv should provide a means of hosting a git repository alongside the PDF, and that any paper on there with a vital computational component (which in maths, I expect, would be less about replicability and more in the Four Colour Theorem 'computer-assisted proof' sort of space) would take advantage of that. (Bet they don't, though.)

In a discipline where papers are still mostly in hard-copy journals, that might be (even) harder to arrange...

steer October 12 2015, 14:25:59 UTC
I'm not sure that a git repository is likely to outlast arXiv though? Or do you mean arXiv should be a git host in itself?

Of course, journals themselves don't last forever, and they do fuck up. I discovered after about five years that the journal with my most-cited paper had screwed up and never put my paper online (a link error). Nobody seemed to have noticed, as it was available on my website and IIRC on arXiv, so it continued to be cited as the journal version even though that wasn't available except in hard copy.

simont October 12 2015, 14:28:16 UTC
Yes, I meant arXiv should be a git host itself. It needn't be a large and complicated sysadmin job - if you're only hosting a repository for RO access, you can just stick the repository directory somewhere it's accessible over straightforward HTTP and run 'git update-server-info' in it, and then it's basically no different from offering any other static file(s) for download.
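
A minimal sketch of that read-only setup, assuming Python 3.7+ and a *bare* repository (in a non-bare checkout the generated files land under `.git/` instead). The function names and the use of Python's built-in `http.server` are illustrative choices, not a description of anything arXiv offers; a real host would just put the directory behind an ordinary web server.

```python
import functools
import http.server
import subprocess
from pathlib import Path


def prepare_dumb_http_repo(bare_repo: str) -> Path:
    """Run `git update-server-info` inside a bare repository.

    This (re)writes the auxiliary files that git's "dumb" HTTP transport
    reads (info/refs and objects/info/packs), after which the repository
    directory can be published as ordinary static files.
    Returns the path of the generated info/refs file.
    """
    subprocess.run(["git", "update-server-info"], cwd=bare_repo, check=True)
    return Path(bare_repo) / "info" / "refs"


def serve_read_only(root: str, port: int = 8000) -> None:
    """Serve `root` (a directory holding bare repositories) over plain HTTP.

    SimpleHTTPRequestHandler only implements GET and HEAD, so clients can
    clone and fetch but cannot push. Fine for a demo, not for production.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=root
    )
    with http.server.ThreadingHTTPServer(("", port), handler) as httpd:
        httpd.serve_forever()
```

For example, `prepare_dumb_http_repo("paper-code.git")` followed by `serve_read_only(".")` should let `git clone http://localhost:8000/paper-code.git` fetch the full history over the dumb protocol (`paper-code.git` is a hypothetical name). The generated files go stale when the repository changes, so rerun `git update-server-info` after each update, or enable git's sample `post-update` hook, which does exactly that.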

steer October 12 2015, 14:40:06 UTC
Hmm... I'm not sure I'd care to bet on which will live longer, arXiv or GitHub.

simont October 12 2015, 15:14:19 UTC
But that's my point - if arXiv goes away, with all the actual papers on it, then nobody will be picking up a paper from it and saying 'help, I can't reproduce this result!' in the first place. The mathematical community is so dependent on arXiv that they'll need to salvage the data from it if the server itself vanishes, and if they can't, then they have bigger problems anyway.

The scenario you want to avoid is that the paper is still out there claiming some result, and the critical supporting code isn't.

steer October 12 2015, 15:30:56 UTC
Not very many successful academics publish on arXiv alone, simply because your funding rests on getting things into prestigious journals. So if you say "hey, look, I'm doing great stuff, I'm publishing on arXiv and it's brilliant", your HoD says "What the hell are you thinking? Get that into somewhere decent right now".

There are exceptions, of course, but how many? Yes, Perelman's proof of the Poincaré conjecture -- but he's Perelman. He could scrawl it on a loo wall, someone would put it online for him, and it would live on.

So papers will (mostly) survive arXiv dying anyway if the journal they are in survives. :-)

simont October 12 2015, 15:37:59 UTC
A recently emerging phenomenon, and part of the reason I say mathematicians are close to critically dependent on the arXiv, is the concept of an 'arXiv overlay journal', which consists of a set of links to papers on the arXiv.

Rationale: the point of a journal is not the physical publication and distribution of the paper, which the arXiv does better anyway; the real added value is the selection and peer review which winnows the great mass of proto-papers out there into ones that are judged by sensible people to be both correct and important.

So you upload your preprint to the arXiv, you submit to the journal by sending them a link, and if it passes peer review, then a link to that arXiv entry appears on the journal website.

steer October 12 2015, 15:45:18 UTC
Ah... I wasn't aware of that. Does a paper copy exist or not (not that a paper copy is that much more reliable, but it means there's a good chance a copyright library has one)? Good idea, though -- makes sense. Are any of these journals prestigious yet?

TBPH, though, the likelihood of arXiv going down without warning and with no backup is negligible.

(Goodness knows how many people analyse arXiv anyway as part of their research -- I bet a good chunk of the network science community is holding a local copy.)
