The mathematics of the Million Second Quiz

jiggery_pokery

Sep 15, 2013 21:38

The Million Second Quiz is a quiz event taking place in the United States on the NBC TV network, online and in person at the moment. It consists of a series of quiz bouts between a champion and a series of challengers, taking place around the clock over the course of a million seconds, or about eleven and a half days.

The champion earns a nominal $10 per second while they remain the champion, whether the quiz bouts are in progress or not, until they are defeated by a challenger. Champions only convert their nominal prize into an actual payout if they are the reigning champion at the end of the million seconds or if they are one of the four most successful defeated champions along the way. There is also set to be an extra competition at the end of the million seconds to pay out an extra bonus to one of them.

As I type, I've seen the first four episodes of the show and I've enjoyed them. The action is at sufficient pace that the shows have vitality and excitement, and the continuation of the action from one show to the next really conveys the sense of an overarching, ambitious, continuous game. I have not found the narrative at all confusing, though I accept that not all gameplay elements were originally introduced to the viewer in the ideal order.

Full disclosure dictates that I reveal that I know staff working on the show and I believe they tend to be working on the questions. I've enjoyed the questions, particularly for their unusually up-to-date nature and reliance on prompt current events. A little of the question wording has been criticised and the criticism looks reasonable to me, but overall I consider the standard unusually high and interesting.

The quiz bouts during the live televised program follow what I consider to be a reasonably interesting structure. They last for either three hundred or four hundred seconds, as announced in advance, and are made up of a series of multiple-choice questions with four answers. The two contestants are asked to identify the correct answer from among the four each time within a five-second time limit. Contestants earn points for correct answers and the contestant with more points at the end of the quiz wins the bout.

Questions started in the first hundred seconds have a base value of one point, questions in the second hundred seconds have a base value of two points and so on. Both contestants independently answer the same question and earn the base value if their answer is correct.

However, as an alternative to answering the question, either contestant may press their "doubler" button at any time. This pauses the bout. The doubling contestant's opponent then has five seconds to answer the question. If the opponent answers correctly, they score double the base value; if they answer incorrectly, the doubling contestant scores double the base value.

That said, the doubling contestant's opponent can go on to "double back" and return the question to the original doubling contestant. The original doubling contestant then has no choice but to answer the question within a further five seconds. If they answer correctly, they score four times the base value; if they answer incorrectly, their opponent scores four times the base value.
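The scoring rules described so far can be summarised in a short Python sketch. The function name `resolve_question` and the three mode labels are purely illustrative inventions, and contestant A is assumed to be the one who presses the doubler.

```python
def resolve_question(t, mode, a_correct=False, b_correct=False):
    """Return (points_for_A, points_for_B) for one question of base value t.

    mode is one of (labels are hypothetical, chosen for this sketch):
      "plain"        - nobody doubles; both contestants answer independently
      "doubled"      - A doubles, so only B answers
      "doubled_back" - A doubles, B doubles back, so only A answers
    """
    if mode == "plain":
        return (t if a_correct else 0, t if b_correct else 0)
    if mode == "doubled":
        # B answers for double value; a miss hands those points to A.
        return (0, 2 * t) if b_correct else (2 * t, 0)
    if mode == "doubled_back":
        # A must answer for quadruple value; a miss hands those points to B.
        return (4 * t, 0) if a_correct else (0, 4 * t)
    raise ValueError(f"unknown mode: {mode!r}")


# A three-point question under each of the three modes.
print(resolve_question(3, "plain", a_correct=True, b_correct=True))
print(resolve_question(3, "doubled", b_correct=False))
print(resolve_question(3, "doubled_back", a_correct=True))
```

Note that a doubled or doubled-back question always moves points to exactly one contestant, which is what makes the decisions below interesting.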

So there are two interesting gameplay decisions alongside trying to answer the questions:
1) Should I double a question?
2) If my opponent doubles a question to me, should I double back?

I think these are worthy of a little Expected Value analysis. I haven't seen anyone perform this analysis yet, so I shall go ahead and do so. In summary, the mathematics confirms some intuitive predictions about what optimal strategy might seem to be and extends this by formalising the parameters used to make the decision.

We consider the general issue of whether to double or double back a single question, and we make the massively simplifying assumption that we consider it in isolation from the state of the game. In practice, the remaining time and the score of the game to date have considerable effects on optimal strategy under specific circumstances that this analysis will not cover.

The first televised bout in the third episode provided a good example. With fewer than ten seconds remaining, one contestant was down 14-23 and doubled the final, three-point, question. With six points at stake, the other contestant should just have answered: even an incorrect answer would only have conceded the six points, and they would still have won 20-23. However, they made an objectively poor decision by making the only play that risked them losing and doubled back; this redoubled the question value to 12, and the initial doubler answered correctly to jump from 14-23 to win 26-23.

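The arithmetic of that endgame can be laid out as a small Python sketch. It assumes, as the description suggests, that this was the last question of the bout, so no further points could be scored; the function name `endgame_outcomes` is an illustrative invention.

```python
def endgame_outcomes(leader=23, trailer=14, t=3):
    """Every possible final score (leader, trailer) once the trailer has doubled.

    Assumes this is the final question of the bout.
    """
    return {
        # The leader answers the doubled question, worth 2t.
        ("answer", "leader correct"): (leader + 2 * t, trailer),
        ("answer", "leader incorrect"): (leader, trailer + 2 * t),
        # The leader doubles back, so the trailer answers, worth 4t.
        ("double back", "trailer correct"): (leader, trailer + 4 * t),
        ("double back", "trailer incorrect"): (leader + 4 * t, trailer),
    }


outcomes = endgame_outcomes()
for (play, result), (lead_score, trail_score) in outcomes.items():
    verdict = "leader wins" if lead_score > trail_score else "leader loses"
    print(f"{play:12s} {result:18s} {lead_score}-{trail_score}  {verdict}")
```

Only one of the four branches loses the bout for the leader, and it is reachable only by doubling back.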
Let us define some variables, in the context of a single question:

t is the base value of that question. So far, permissible values of t have been 1, 2, 3 and 4. It is possible that the TV show might change the gameplay over time.
c is the probability that you correctly select the answer to that question, and thus lies between 0 and 1 inclusive.
o is the probability that your opponent correctly selects the answer to that question, and thus lies between 0 and 1 inclusive.
c* is your opponent's estimate of the probability that you correctly select the answer to that question, and thus lies between 0 and 1 inclusive.
o* is your estimate of the probability that your opponent correctly selects the answer to that question, and thus lies between 0 and 1 inclusive.

In each case, we care about the change in the gap between your score and your opponent's score as a result of that question. Ideally, we want this to be positive and as high as possible, reflecting a large increase in your score and no increase in your opponent's score. A negative change reflects no increase in your score and an increase in your opponent's score.

BASELINE

If nobody doubles, then there is a c probability that you will answer correctly and score t points. However, there is also an o probability that your opponent will answer correctly and score t points. Accordingly, you expect to score ct points and your opponent expects to score ot points, so the expected increase in the gap is t(c-o) as a baseline.
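As a quick sanity check, the baseline can be computed both from the formula and by summing over the four right/wrong combinations. This is only a sketch: the function names are made up for illustration, and the enumeration additionally assumes the two contestants' answers are independent events (the formula itself does not need that).

```python
from itertools import product


def baseline_gap(t, c, o):
    """Expected change in (your score - opponent's score) when nobody doubles."""
    return t * (c - o)


def baseline_gap_enumerated(t, c, o):
    """The same quantity, summed over the four right/wrong combinations.

    Assumes the two contestants' answers are independent.
    """
    total = 0.0
    for you_right, opp_right in product([True, False], repeat=2):
        prob = (c if you_right else 1 - c) * (o if opp_right else 1 - o)
        gap = (t if you_right else 0) - (t if opp_right else 0)
        total += prob * gap
    return total


# Example values (not from the show): a two-point question, c = 0.8, o = 0.5.
print(baseline_gap(2, 0.8, 0.5))
print(baseline_gap_enumerated(2, 0.8, 0.5))
```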

QUESTION TWO: If my opponent doubles a question to me, should I double back?

The analysis for question one will rely on the result for question two, so we will consider this first. Suppose your opponent has doubled a question to you, so you can either answer or double back. We will consider these two cases separately.

Case (a): you answer.

There is a c probability that you will answer correctly and score 2t points. However, there is a 1-c probability that you will answer incorrectly. In such a circumstance, you will score zero and your opponent will score 2t points.

This means that there is a c probability that the gap will increase by 2t and a 1-c probability that the gap will increase by -2t. Consequently the expected increase in the gap is 2tc - 2t(1-c) or 2t(2c-1) points.

Case (b): you double back.

There is an o* probability that your opponent will answer correctly and score 4t points. However, there is a 1-o* probability that your opponent will answer incorrectly. In such a circumstance, your opponent will score zero and you will score 4t points.

This means that there is an o* probability that the gap will increase by -4t and a 1-o* probability that the gap will increase by 4t. Consequently the expected increase in the gap is -4to* + 4t(1-o*) or 4t(1-2o*) points.

Accordingly, we can draw a probability tree of the outcome as follows:

With this in mind, we can decide whether to double back or not. We will choose to answer if the expected increase in the gap between scores is higher - and thus more favourable to us - by answering than by doubling back.

Thus we will answer if 2t(2c-1) > 4t(1-2o*)

Dividing by 2t throughout, we will answer if 2c-1 > 2(1-2o*)

Adding 1 to each side, we will answer if 2c > 3-4o*

Dividing each side by 2, we will answer if c > 3/2-2o*

Rephrasing this, we will answer if o* > 3/4-c/2
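The double-back decision can be sketched in Python, comparing the two expected values directly and checking that this agrees with the rearranged threshold o* > 3/4-c/2. The function names are illustrative, and exact fractions are used so that the comparison at the boundary is not disturbed by rounding.

```python
from fractions import Fraction


def ev_answer(t, c):
    """Expected change in the gap if you answer a question doubled to you."""
    return 2 * t * (2 * c - 1)


def ev_double_back(t, o_star):
    """Expected change in the gap if you double back, using your estimate o*."""
    return 4 * t * (1 - 2 * o_star)


def should_answer(t, c, o_star):
    """True if answering has the higher expected value than doubling back."""
    return ev_answer(t, c) > ev_double_back(t, o_star)


# Check the direct comparison against the rearranged rule on a grid of values.
grid = [Fraction(k, 20) for k in range(21)]
assert all(
    should_answer(t, c, o_star) == (o_star > Fraction(3, 4) - c / 2)
    for t in (1, 2, 3, 4)
    for c in grid
    for o_star in grid
)

# You are 90% sure, and the opponent looks to be guessing among four options.
print(should_answer(3, Fraction(9, 10), Fraction(1, 4)))  # False
# You are at 50%, and the opponent is estimated at 60%.
print(should_answer(3, Fraction(1, 2), Fraction(3, 5)))  # True
```

The base value t cancels out of the comparison, which is why the rearranged rule does not mention it.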

We can draw a graph of this like so:

This illustrates a conclusion that the decision on whether or not to double back depends twice as much on whether you think your opponent knows it as on whether you know it. Even if you are almost certain about the answer, you should choose to double it back if your opponent is guessing completely by chance.

We use this result to answer

QUESTION ONE: Should I double a question?

It is key to consider the previous result and your opponent's likely reaction to your double. We assume that the opponent will act rationally and then use the same logic to make a decision from their perspective. The opponent will estimate whether they think you are likely to know the answer or not, which is the definition of calculating a value for c* and we can use the logic from question two.

Accordingly, the opponent will answer if o > 3/2-2c*

We can draw a graph of this like so:

If the opponent expects to do better by answering themselves, that is if o > 3/2-2c*, they will choose to answer. This is the green situation.

If the opponent expects to do better by making you answer, that is if o < 3/2-2c*, they will choose to double back to you. This is the blue situation.

We need to analyse the two cases separately. We will analyse the blue situation first.

Case (a): your opponent doubles back.

There is a c probability that you will answer correctly and score 4t points. However, there is a 1-c probability that you will answer incorrectly. In such a circumstance, you will score zero and your opponent will score 4t points.

This means that there is a c probability that the gap will increase by 4t and a 1-c probability that the gap will increase by -4t. Consequently the expected increase in the gap is 4tc - 4t(1-c) or 4t(2c-1) points.

Accordingly, we can draw a probability tree of the outcome as follows:

With this in mind, we can decide whether to double or not. We will choose to double if the expected increase in the gap between scores is higher - and thus more favourable to us - by doubling than by not doubling. This calculation relies on the result derived above and the result from the baseline case.

Thus we will double if 4t(2c-1) > t(c-o)

Dividing by t throughout, we will double if 4(2c-1) > (c-o)

Expanding each side, we will double if 8c-4 > c-o

Adding 4-c to each side, we will double if 7c > 4-o

Dividing each side by 7, we will double if c > 4/7-o/7
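For the case where the opponent is assumed to double back, the comparison and the resulting threshold on c can be sketched as follows. The function names are illustrative only, and exact fractions are used to keep the thresholds readable.

```python
from fractions import Fraction


def ev_doubled_back_to_you(t, c):
    """Expected change in the gap if you double and the opponent doubles back."""
    return 4 * t * (2 * c - 1)


def ev_baseline(t, c, o):
    """Expected change in the gap if nobody doubles."""
    return t * (c - o)


def should_double_if_doubled_back(t, c, o):
    """True if doubling beats the baseline, assuming a double back follows."""
    return ev_doubled_back_to_you(t, c) > ev_baseline(t, c, o)


def threshold_c(o):
    """Smallest c (exclusive) at which doubling is worthwhile: 4/7 - o/7."""
    return Fraction(4, 7) - Fraction(o) / 7


# The threshold on c ranges from 4/7 (o = 0) down to 3/7 (o = 1).
for o in (0, Fraction(1, 2), 1):
    print(o, threshold_c(o), round(float(threshold_c(o)) * 100), "%")

# Cross-check the threshold against the direct comparison on a grid.
grid = [Fraction(k, 20) for k in range(21)]
assert all(
    should_double_if_doubled_back(2, c, o) == (c > threshold_c(o))
    for c in grid
    for o in grid
)
```

In percentage terms the threshold runs from roughly 57% when the opponent certainly does not know the answer down to roughly 43% when they certainly do.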

We can draw a graph of this like so:

This illustrates a conclusion to this case that, if you can make the assumption that the opponent will double back, the decision on whether or not to double depends seven times as much on whether you know it as on whether your opponent knows it. Even if you are almost certain your opponent knows the answer, you should choose to double it if you have a nearly 50:50 chance of being right.

We can use this result to analyse the green situation now.

Case (b): your opponent answers.

There is an o probability that your opponent will answer correctly and score 2t points. However, there is a 1-o probability that your opponent will answer incorrectly. In such a circumstance, your opponent will score zero and you will score 2t points.

This means that there is an o probability that the gap will increase by -2t and a 1-o probability that the gap will increase by 2t. Consequently the expected increase in the gap is -2to + 2t(1-o) or 2t(1-2o) points.

Accordingly, we can draw a probability tree of the outcome as follows:

With this in mind, we can decide whether to double or not. We will choose to double if the expected increase in the gap between scores is higher - and thus more favourable to us - by doubling than by not doubling. This calculation relies on the result derived above and the result from the baseline case.

Thus we will double if 2t(1-2o) > t(c-o)

Dividing by t throughout, we will double if 2(1-2o) > (c-o)

Expanding each side, we will double if 2-4o > c-o

Adding o to each side, we will double if 2-3o > c

Reversing the inequality, we will double if c < 2-3o
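The same kind of check works for the case where the opponent is assumed to answer. Again this is only a sketch with illustrative function names, using exact fractions so that the boundary c = 2-3o is handled exactly.

```python
from fractions import Fraction


def ev_opponent_answers(t, o):
    """Expected change in the gap if you double and the opponent answers."""
    return 2 * t * (1 - 2 * o)


def ev_baseline(t, c, o):
    """Expected change in the gap if nobody doubles."""
    return t * (c - o)


def should_double_if_answered(t, c, o):
    """True if doubling beats the baseline, assuming the opponent answers."""
    return ev_opponent_answers(t, o) > ev_baseline(t, c, o)


# Cross-check the direct comparison against the rearranged rule c < 2 - 3o.
grid = [Fraction(k, 20) for k in range(21)]
assert all(
    should_double_if_answered(t, c, o) == (c < 2 - 3 * o)
    for t in (1, 2, 3, 4)
    for c in grid
    for o in grid
)

# You know the answer for sure; the opponent is at 30%, then at 40%.
print(should_double_if_answered(1, 1, Fraction(3, 10)))  # True
print(should_double_if_answered(1, 1, Fraction(2, 5)))  # False
```

Because c cannot exceed 1, the rule always says to double when o is below 1/3, and because c cannot go below 0, it never says to double when o is above 2/3.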

We can draw a graph of this like so:

This illustrates a conclusion to this case that, if you can make the assumption that the opponent will answer, the decision on whether or not to double depends three times as much on whether your opponent knows it as on whether you know it. Even if you are almost certain you know the answer, you should choose to double it if your opponent has much less than a 50:50 chance of being right.

We pull these half conclusions together to solve the overall question.

So in conclusion, in order to decide whether to double or not, you must work out whether your opponent is likely to think you know it or not.

If you think your opponent is likely to think you don't know it, you should assume that they will double back and you should choose to double it if you have more than a 57% chance of being right, or less (as low as 43%) if you think your opponent knows it.

If you think your opponent is likely to think you do know it, you should assume that they will answer and you should choose to double it if your opponent has less than a 67% chance of being right, or less (as low as 33%) if you know it for sure.

If your opponent doubles a question to you, whether or not to double back depends twice as much on whether you think your opponent knows it as on whether you know it. Even if you are almost certain about the answer, you should choose to double it back if your opponent is guessing completely by chance.
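Pulling the pieces together, the whole doubling decision can be sketched as one Python function. This is a sketch under the same simplifying assumptions as the analysis: the state of the game is ignored, the opponent is assumed to respond rationally, and `c_star_guess` (your guess at the opponent's estimate c*) is a hypothetical input that in practice you could only judge roughly.

```python
def opponent_will_answer(o, c_star_guess):
    """Predict the opponent's response to your double: answer if o > 3/2 - 2c*."""
    return o > 1.5 - 2 * c_star_guess


def should_double(c, o, c_star_guess, t=1):
    """True if doubling has a higher expected change in the gap than not doubling."""
    baseline = t * (c - o)
    if opponent_will_answer(o, c_star_guess):
        # The opponent answers for 2t: expected gap change is 2t(1 - 2o).
        doubled = 2 * t * (1 - 2 * o)
    else:
        # The opponent doubles back and you answer for 4t: 4t(2c - 1).
        doubled = 4 * t * (2 * c - 1)
    return doubled > baseline


# You probably know it, the opponent probably does not, and they can tell.
print(should_double(c=0.9, o=0.3, c_star_guess=0.9))
# You are at 50:50, the opponent probably knows it but thinks you do not.
print(should_double(c=0.5, o=0.9, c_star_guess=0.2))
# Neither of you is likely to know it, and the opponent expects you to miss.
print(should_double(c=0.3, o=0.3, c_star_guess=0.2))
```

A general sense of which branch applies is all that could realistically be judged in a few seconds, so the function is better read as a summary of the conclusions than as something to compute at the podium.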

I would not expect these conclusions to be considered counter-intuitive or surprising at all, but the maths underpinning them interests me. Additionally, I do not think it realistic to be able to calculate exact probabilities within the timespan of a few seconds given by the show, though only a general sense is required, and I think that that is realistic. It is far more important to be able to answer the questions correctly, particularly the crucial ones, than anything else!

(With thanks to K. for improvements to a draft of this.)
