They called at 2 o'clock sharp. They asked a few questions; I tried to make it very clear that I have quite a bit of lab science experience, while skirting the issue that technically my chemistry experience ends in high school. They asked me a few easy questions, and then hit me with a hard one (below). I gave an answer, but it was clear from their tone that this wasn't the correct one. Oh well. Then at the end, everyone sounded like they were leaving, so I hung up. Hopefully everyone was done :S They did say that they'd contact me within a week, however. Sounds like we're testing irrigation water, or something, which I would certainly like to do for a summer.
I clearly screwed up on one question, though; I wasn't quite sure what they were looking for. It's clearly an information problem, but what is allowable is really context-specific, and I was missing the context. Here it is:
You're receiving samples from some field testing. Each plot of land is supposed to be the source of ONE sample. You're receiving one vial per sample, i.e. one vial per plot of land. Each vial is labeled. There exist two vials that are labeled as being from the same plot. What do you do?
I figured that *both* of them are immediately suspect. One of them might be mislabeled. You could toss both datapoints, or you could wait until the analysis is done and then ignore one of them (i.e., add more labelling detail). You could try to re-get those two datapoints. Perhaps more importantly, you could look at the *order* of samples; if they come on a tray, and are clearly not only marked but sorted in terms of their origin, you might be able to infer where they probably came from. So if your sample sequence contains
(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,6,17,...,299,300) there's a good probability that the second #6 is actually #16, for two reasons: first, the label is just missing a 1; second, it is in the position where 16 would be. But this answer was not satisfactory, by the sounds of it.
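For what it's worth, the positional guess above can be sketched in a few lines of Python. This is only a rough illustration under assumptions that the interview question never stated: that the vials arrive sorted, one per plot, and that plots are numbered consecutively from 1. The function names `flag_duplicates` and `guess_by_position` are made up for this example.

```python
from collections import Counter


def flag_duplicates(labels):
    """Return the set of labels that appear more than once."""
    counts = Counter(labels)
    return {label for label, n in counts.items() if n > 1}


def guess_by_position(labels, start=1):
    """Suggest corrections for duplicated labels based on tray position.

    Assumption (not from the original question): samples are sorted and
    numbered consecutively from `start`, so position implies the label.
    Returns (index, label_as_written, label_implied_by_position) tuples.
    """
    dupes = flag_duplicates(labels)
    seen = set(labels)
    suggestions = []
    for index, label in enumerate(labels):
        expected = start + index
        # Only suggest a fix when the label is a duplicate, sits in the
        # wrong slot, and the label implied by that slot is missing.
        if label in dupes and label != expected and expected not in seen:
            suggestions.append((index, label, expected))
    return suggestions


# Shortened version of the sequence above: 1..15, then 6, then 17..20.
tray = list(range(1, 16)) + [6] + list(range(17, 21))
print(guess_by_position(tray))  # [(15, 6, 16)]
```

Even when this produces a suggestion, it is only a guess; both vials stay suspect until something independent (field notes, a re-sample) confirms which one is which.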
What was he asking? I don't think I misheard; they reiterated the question partway through, emphasizing the part where there is bad data, so I reemphasized the fact that you can't rely completely on bad data. I guess I could have gone on a rant about error analysis and Bayesian expectation... but still.
What do you think?
edit: sequence, not set
Suggestions so far from people:
- Fire the people who got the vials screwed up