Today I’m going to talk about how your computer cup can runneth over. So let us look at integer overflow.
You’ll have seen this all over the place but probably didn’t know the reason it happened. If you download large files you sometimes see it, where the file claims to have a negative size. It can also happen in the opposite direction, where numbers get so negative they suddenly become positive again. This wrapping round is called overflow (or underflow when negative numbers become positive).
Essentially it comes from the computer having a hard-coded “largest number”. There are ways around this but many programmers assume it won’t affect them. They assume that the numbers they deal with won’t get so big to cause a problem. So how does it happen?
Counting on your fingers
What do small children do when they run out of fingers to count on? Obviously they count with fingers and toes. And when they run out of toes things get tricky. Most people, when they get to the stage they can count above twenty, don’t need fingers and toes any more.
Computers also have an equivalent of counting on fingers and toes. They can count a bit higher than 20 without running out of digits but the principle is the same. Most desktop computers can count as high as 232-1 before running out of space. (That number is 4294967295 in full, in case you were wondering.) Often the highest number is half of that, since some of that space is used to count the negative numbers too.
Think of the numbers on a number line. The number line has 232-1 marked increments on it. In the middle of the line is ‘zero’. At either end is ‘maximum’ and ‘minimum’. The far ends of the number line are wrapped round and joined together in one continuous loop. This means the maximum and minimum numbers are actually next to each other. It is possible for a number to silently pass over this line and change size drastically in the process.
This happens to the ‘machine numbers’, which is how a computer naturally counts. In special cases where extra large numbers are needed it’s possible to use bypass this restriction. So people who deal with time in seconds since the beginning of the universe won’t have to worry. But most people don’t bypass it and this can cause problems.
A practical demonstration
I thought, since this one is quite a simple problem, that I would show you it in action. This section was inspired by
this series of posts.
I’ll be using a simple calculator-style interpreter. The format is very simple to understand - any line that begins “>” is what I wrote, and the result follows on the next line. So let’s have a look at some addition to start things off:
> 2 + 3
5
To make sure the interpreter doesn’t implicitly fix the problem I’m trying to demonstrate I will add in an extra bit of code to say I want to deal only with machine-level numbers. This is like saying “only count on your fingers”.
> 2 + 3 :: Int
5
The answer we see on screen is identical, but this time the computer is definitely using the “restricted” integer numbers. We can find out the top and bottom ends of this limited scale by asking the interpreter:
> minBound :: Int
-2147483648
> maxBound :: Int
2147483647
So, going on what we learned above, if we try to subtract something from minBound we’ll ending up with a positive number, and the similar but opposite will happen when we add to maxBound.
> maxBound + 1 :: Int
-2147483648
Ta-da! Just like magic. We have shown that working with machine numbers can cause even simple addition to go awry.
Start worrying, the end is coming
No doubt you’ll remember the year 2000 date problem… well, there’s another one coming up in 2038. Yes, it’s true. And it’s all down to integer overflow.
A unix system thinks of time as being the number of seconds since the Epoch; the epoch is midnight, 1 January 1970. By the year 2038 the requisite number of seconds will have passed and “seconds since epoch” will start to look like “seconds before epoch”. So 2038 (ie, 1970+68) will also be 1902 (1970-68). Or possibly just 1970…
You can see the problems inherent in lack of foresight… those silly programmers in the seventies that didn’t think their work would still be in use nearly forty years later!