Comments | quercus: Dear LazyWeb statisticians

quercus

Dear LazyWeb statisticians

Nov 22, 2007 14:03

Anyone remember more of their stats than I do offhand?

I have a function foo() that should return values in the range 1..n, distributed approximately uniformly at random.

To test this, I intend to collect N returned values and produce a histogram of them. The histogram should contain exactly n different values, each of which should have at least min instances ( ... )

lazyweb, statistics

Comments 11

strangerover November 22 2007, 21:26:51 UTC

argh,
I remember trying something like this in BASIC on a BBC B about 22 years ago...

I also think something in cryptanalysis has a relevant overlap with such functions?

quercus November 23 2007, 00:02:40 UTC

The chi-squared test is what I need, but it's too long since I used it to remember exactly how the terminology works.
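For anyone else rusty on the terminology, here is a minimal sketch of a chi-squared goodness-of-fit check against a uniform expectation. Everything in it is an assumption for illustration: foo(n) is a hypothetical stand-in built on random.randint, n = 100 and N = 5000 are arbitrary, and the critical value uses the Wilson-Hilferty approximation rather than a proper table lookup.

```python
import random
from statistics import NormalDist


def chi_square_statistic(counts):
    """Pearson chi-squared statistic against a uniform expectation."""
    total = sum(counts)
    expected = total / len(counts)  # same expected count in every bin
    return sum((c - expected) ** 2 / expected for c in counts)


def chi_square_critical(dof, alpha=0.05):
    """Approximate upper critical value (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(1 - alpha)
    a = 2 / (9 * dof)
    return dof * (1 - a + z * a ** 0.5) ** 3


def foo(n):
    # Hypothetical stand-in for the function under test.
    return random.randint(1, n)


n, N = 100, 5000
counts = [0] * n
for _ in range(N):
    counts[foo(n) - 1] += 1

stat = chi_square_statistic(counts)
crit = chi_square_critical(n - 1)  # n bins give n - 1 degrees of freedom
print(f"chi-squared = {stat:.1f}, approximate 5% critical value = {crit:.1f}")
print("consistent with uniform" if stat < crit else "suspiciously uneven")
```

A statistic below the critical value means the histogram is consistent with a uniform distribution at the 5% level; a genuinely uniform foo() will still exceed it about one run in twenty.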

ingaborg November 23 2007, 23:56:00 UTC

I ought to know this, so I'll have a go, but I STRONGLY recommend that you get this independently checked!

I think you need a one-tailed normal distribution confidence test. Here's why.

Define "success" to be "a specific number is rolled" (say, 3, but it doesn't actually matter). You want to be 95% confident of getting at least min of them.

I *think* the total number of successes, S, is distributed binomially (N,1/n) (i.e. number of trials, probability of success in a single trial). The binomial distribution is a tidy one with mean and standard deviation both equal to number of trials * probability of success, which in this case is N/n. You can approximate this with a normal distribution with mean and standard deviation both equal to (N/n) - this is your critical fact.

So now you need to find N such that p(Z < (min - N/n)/(N/n)) <= 0.05, where Z is the normalized version of S. (I've normalized the distribution because tables only give the distribution for mean 0 and standard deviation 1 ( ... )

ingaborg November 24 2007, 00:21:09 UTC

Don't know that I got that right...back to first principles...

To get min successes you would expect on average to need to roll min * n times. Because on average you get 1 success every n rolls. That's the jolly old binomial distribution for you. So for example with 100 numbers, to get 5 successes on average you need 500 rolls. Yep, that sounds right.

That's a big enough N and small enough 1/n that the normal approximation is valid. Happy with that too, and still pretty sure that mean and sd of binomial is N/n.

One-tailed confidence test threshold is 1.645 * standard deviation away from the mean. That would be 1.645 * N/n. This is all good so far. It's the last bit I'm struggling with.

Doh! Looks it up on Wikipedia. The mean of the binomial distribution is N/n, but the standard deviation is sqrt((N/n) * (1 - 1/n)). Apologies! Told you to check it...everything should start to work from here on in.
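A quick way to check that correction without trusting anyone's memory is to sum over the exact binomial probabilities. This is only a sketch: the values N = 500 and n = 100 are borrowed from the example in this thread, and binomial_pmf is a helper name made up for illustration.

```python
from math import comb, sqrt


def binomial_pmf(k, trials, p):
    """Exact probability of k successes in the given number of trials."""
    return comb(trials, k) * p ** k * (1 - p) ** (trials - k)


N, n = 500, 100
p = 1 / n

# Mean and variance computed directly from the definition.
mean = sum(k * binomial_pmf(k, N, p) for k in range(N + 1))
var = sum((k - mean) ** 2 * binomial_pmf(k, N, p) for k in range(N + 1))

print("mean:", mean, "formula N/n:", N / n)
print("sd:  ", sqrt(var), "formula:", sqrt((N / n) * (1 - 1 / n)))
```

The mean does come out as N/n, while the standard deviation matches the square-root formula and not N/n.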

quercus November 26 2007, 16:24:14 UTC

Thanks for all this!

Back at work today, I'll see how I get on with it.

ingaborg November 26 2007, 19:27:16 UTC

ahaha. Sorry it's so rambling: it was late at night and I was pissed! I suggest you cut to the end bit which looked ok to me...

min = N/n - 1.645 * sqrt((N/n) * (1 - 1/n))
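The formula above can be sketched as a small function. The name min_count and the default z = 1.645 (the one-tailed 95% point quoted in this thread) are illustrative choices, and the result relies on the normal approximation, so treat it as a rough guide for one particular value and not a guarantee across all n values at once.

```python
from math import floor, sqrt

Z_95 = 1.645  # one-tailed 95% threshold of the standard normal


def min_count(N, n, z=Z_95):
    """Count that one specific value should reach with roughly 95% confidence."""
    mean = N / n
    sd = sqrt(mean * (1 - 1 / n))
    return mean - z * sd


value = min_count(500, 100)
print(value)         # a little over 1.3
print(floor(value))  # round down to a whole count
```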

ingaborg November 24 2007, 00:22:38 UTC

So, erm, maybe:

min = N/n - 1.645 * sqrt(N/n(1 - N/n))

Give that a go and see if it gives you anything sensible.

ingaborg November 24 2007, 00:29:23 UTC

Ick, soz, I mean min = N/n - 1.645 * sqrt((N/n) * (1 - 1/n))

obviously...

ingaborg November 24 2007, 00:26:12 UTC

Oh. Solving that for N gives you a quadratic equation (in sqrt(N)). Sorry. But it still might be right.
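The quadratic is not too painful if you substitute x = sqrt(N/n): the corrected formula becomes x^2 - c*x - min = 0 with c = 1.645 * sqrt(1 - 1/n), and only the positive root makes sense. A minimal sketch, assuming that corrected formula is the one to solve; required_samples is a made-up name.

```python
from math import ceil, sqrt


def required_samples(target_min, n, z=1.645):
    """Smallest N with N/n - z * sqrt((N/n) * (1 - 1/n)) >= target_min."""
    c = z * sqrt(1 - 1 / n)
    # Positive root of x^2 - c*x - target_min = 0, where x = sqrt(N/n).
    x = (c + sqrt(c * c + 4 * target_min)) / 2
    return ceil(n * x * x)


N = required_samples(5, 100)
print(N)  # a bit over 1000 samples for min = 5 and n = 100
```

So for a minimum of 5 hits per value with n = 100, this approach asks for roughly twice the 500 rolls you would need on average.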

ingaborg November 24 2007, 00:33:12 UTC

So my suggestion is to try some values out and get to an acceptable answer by an iterative method. For example I tried n = 100, N = 500, and min = 5, and found that there was a 95% confidence of getting at least 1 result for each value (round down the value of min calculated above). If you have some values in mind, you should be able to home in on something acceptable.

ingaborg November 26 2007, 19:29:25 UTC

Ignore the "min = 5", that's wrong! You calculate min from the other bit.
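The iterative approach suggested above can be sketched as a simple upward search on N. The function names are hypothetical, and the starting point N = n * target_min is an assumption based on the earlier remark that you need at least that many rolls on average.

```python
from math import sqrt


def min_count(N, n, z=1.645):
    """Corrected formula from this thread: N/n - z * sqrt((N/n) * (1 - 1/n))."""
    mean = N / n
    return mean - z * sqrt(mean * (1 - 1 / n))


def smallest_N(target_min, n, z=1.645):
    """Step N upwards until the calculated minimum reaches target_min."""
    N = n * target_min  # below this even the mean falls short of the target
    while min_count(N, n, z) < target_min:
        N += 1
    return N


print(smallest_N(1, 100))  # fewer than 500, consistent with the n = 100, N = 500 example
print(smallest_N(5, 100))  # what min = 5 would actually need
```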