If a scientist needs statistics to prove a point, then the data are not good enough.

Feb 13, 2017 10:13

I tend to subscribe to the above truism (i.e. 'if x then y always happens' is the best kind of scientific demonstration). However, sometimes there is just not enough data. As a result, I have a statistical challenge (cross-posted to Facebook) that I hope my more statistically skilled friends might be able to answer:

I have been looking at breakpoints in DNA, and essentially found that all 6 breakpoints cluster in a 1000 bp zone of a sequence that is 3000 bp long. I am attempting to assess the odds of them all clustering by chance. I know I need to take into account that I would be surprised if they clustered in any 1000 bp region, not just the specific region I observed, so my current approximation is 1/3^5 (rather than 1/3^6): the first breakpoint can land in any third, and the remaining five must each fall in that same third. But I think this assumes that there are three discrete 1000 bp regions, whereas in reality I need to account for the 'sliding window': any stretch of 1000 bp would count, so the region is not simply defined by the placement of the first breakpoint.
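
One rough way to see how much the sliding window matters is to simulate it. The sketch below is only an illustration under assumptions that are not established above: that breakpoints are independent, that each is equally likely to fall anywhere along the 3000 bp sequence, and that positions can be treated as continuous rather than whole base pairs. The function names are made up for this example. It compares the three-bin estimate, a Monte Carlo estimate for any 1000 bp window, and the standard closed form for the range of n independent uniform points, n*w^(n-1) - (n-1)*w^n, where w is the window as a fraction of the sequence.

```python
import random


def discrete_bin_prob(n_points=6, n_bins=3):
    """Chance that all points land in the same one of n_bins fixed, equal bins."""
    return n_bins * (1 / n_bins) ** n_points


def sliding_window_prob_exact(n_points=6, seq_len=3000, window=1000):
    """Chance that all points fit inside SOME window of the given length.

    Assumes independent, continuous, uniform positions. Uses the standard
    result for the range of n uniform points: n*w**(n-1) - (n-1)*w**n.
    """
    w = window / seq_len
    return n_points * w ** (n_points - 1) - (n_points - 1) * w ** n_points


def sliding_window_prob_simulated(n_points=6, seq_len=3000, window=1000,
                                  trials=200_000, seed=0):
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        positions = [rng.uniform(0, seq_len) for _ in range(n_points)]
        # All points fit in one window exactly when their spread is small enough.
        if max(positions) - min(positions) <= window:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("three fixed bins:     ", discrete_bin_prob())
    print("sliding window exact: ", sliding_window_prob_exact())
    print("sliding window (sim): ", sliding_window_prob_simulated())
```

Under those assumptions the closed form gives 13/729, about 0.018, compared with 1/243, about 0.004, for three fixed bins, so the sliding window makes chance clustering roughly four times more likely. If breakpoints are not uniformly likely along the sequence, neither number applies directly.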

Any suggestions?