Problems with representing numbers on computers, part 2

Jan 14, 2006 23:19

This may not make sense until you read Problems with representing numbers on computers, part 1.


Now what if you want to store numbers with fractional parts (e.g. 123.52)?

First, I want to mention something about representing fractional numbers in base 10, and explain what scientific notation is.

If you have a number with a fractional part in base 10, e.g. 123.52, that means 1 hundred, 2 tens, 3 ones... and 5 tenths, and 2 hundredths. Note that a tenth is 1 / 10 (1 divided by 10), a hundredth is 1 / 100, etc.

Some numbers don't lend themselves well to being represented in base 10. For example, 1 / 3 (one third) is 0.3333333.. with an infinite number of 3's. The problem is that there's no good way to write 1/3 in terms of tenths, hundredths, etc. You have to add up 3 tenths, plus 3 hundredths, plus 3 thousandths, and so on infinitely. If you only have a finite number of places to use, you can't store 1/3. e.g. If you had ten places to use, you could have .3333333333 -- but that's not quite equal to 1/3. It's close, but not exact. We'll come back to this problem later, when I talk about representing fractions in binary.

Scientific notation is a method for representing numbers, widely used in scientific applications. I mention it because it's similar to floating point numbers. I'll start with an example. The number 38129.7712 would be written in scientific notation as 3.81297712 * 10^4 (or sometimes as 3.81297712E4). Essentially, I'm taking the original number and dividing it by 10^4 = 10000. Since I don't want to actually change the number, I have to write down that it's multiplied by 10^4 (note that the 4 is called an exponent). In general, the idea is to rewrite the number so that it has a single digit, then a decimal point, and then the rest of the digits. For example, 123.5 would be written as 1.235 * 10^2.

(One of the advantages in science is that you can easily compare two large numbers and get an idea if one is much larger than the other. For example, is 239283202323 larger or smaller than 2123400110203? If I write them in scientific notation, I get 2.39283202323 * 10^11 and 2.123400110203 * 10^12, so it's easy to see that the second is larger -- it's multiplied by 10 one more time than the first number is. In general, if you write the numbers in scientific notation, sometimes just checking the exponent is enough to see which is larger.)

So how does this apply to storing fractional numbers in a computer?

Remember how 123.52 means 1 hundred, 2 tens, 3 ones, 5 tenths, and 2 hundredths? For the part of a number that's larger or equal to 1, you write it in terms of ones, tens, hundreds, thousands, etc. For the fractional part that's smaller than 1, you write it in terms of tenths, hundredths, thousandths, etc. Similarly, in binary you write the part larger than or equal to 1 in terms of ones, twos, fours, eights, etc. For the fractional part, you write it in terms of halves (1 / 2), quarters (1 / 4), eighths (1 / 8), etc. For example, 1101.01 means 1 eight, 1 four, 0 twos, 1 one, 0 halves, and 1 quarter. 8 + 4 + 1 + 1/4 = 13.25.

Now, a floating point number does the same sort of thing as scientific notation does. It takes 1101.01, and rewrites it as 1.10101 * 2^3 (3 is the exponent here). (The reason it's * 2^3 and not * 10^3 is that each place is a multiple of two, not a multiple of ten; I went over this in part 1.)

Again, just like in base 10, there are some numbers that can't be represented well. For example, 0.4 can't be represented well as a sum of halves, quarters, eighths, etc. It requires an infinite sum of them. So a computer, which has a limited number of bits with which to store the number, can't store 0.4 exactly. It can only store something close to it. This is the cause of some odd behaviour you sometimes see with computer arithmetic. For example, a computer might calculate 81.66 * 20 = 1633.1999999999998, when the answer should be 81.66 * 20 = 1633.2. This is getting close to the problem of why 12345678987654320 and 12345678987654321 come out the same. But, you say, those two numbers are integers, not fractional numbers -- so why the problem? Well, JavaScript stores all numbers as double precision (64 bit) floating points, even if the original numbers are integers. (That way, calculations are easier; by using floating points, it always has the ability to store a fractional part if a calculation results in a number with a fractional part. It doesn't have to go back and forth between using integers and floating points.) This is not a problem in itself -- except that a 64 bit floating point can't store as large a range of numbers as a 64 bit integer. Read on for how floating points work.

A double precision floating point is 64 bits. It uses 1 bit as the sign bit, 12 bits to store the exponent (called the "mantissa"), and 52 bits to store the number (called the "significand"). Actually, it uses a trick to squeeze out a little bit more storage space: Any non-zero binary number must have at least one bit that is set to 1. (If all the bits were 0s, then the number would be zero.) Once we write it in scientific notation style, it will be written as 1.xxxxx... -- there will always be a 1 and then a decimal point and then the rest of the number. So rather than storing the 1 before the decimal point, we can just assume it's always there and not have to store it. That gives us one more bit that we can use to store the rest of the number. (Note that you can't do this in base 10, because the digit before the decimal place could be 1, 2, 3, 4, 5, 6, 7, 8, or 9. You can't make an assumption as to what it will be.)

Remember that 1101.01 (13.25) can be written as 1.10101 * 2^3? So it would be stored as:

  • Sign bit: 0
  • Exponent (mantissa): 3 (the exponent is also stored in binary, but in a somewhat unusual form, so I won't go into it here and will just write it in base 10)
  • Number (significand): .10101 (note that the 1 before the decimal point is not actually stored

(Now, I said that you can assume that 1 is always there so long as the number isn't zero. So, if you're always assuming the 1 is there, it necessitates a special way to represent zero, and in fact a zero in floating point is represented differently -- it's not as simple as having all bits set to 0. That's the price paid for squeezing out an extra bit of storage.)

Now we're at the point where I can explain the original problem.

Let's look at how 12345678987654321 is stored as a double precision floating point. In binary, 12345678987654321 is:
101011110111000101010001100010100100011111010010110001

Writing that in scientific notation style, we have:
1.01011110111000101010001100010100100011111010010110001 * 2^53

So the sign bit is 0 (signifying a positive number), and the mantissa is 53 (from 2^53). Now, since we assume there's a 1 before the decimal point, we only store
.01011110111000101010001100010100100011111010010110001
for the significand. The problem is, that's still 53 bits (count them if you like), and the double precision floating point only has space for 52 bits for the significand. So what does it do? It chops off the rightmost bit (called the "least significant" bit -- that's the bit representing the number of ones) and only stores:
.0101111011100010101000110001010010001111101001011000

Now, I said that 12345678987654321 is
101011110111000101010001100010100100011111010010110001
and similarly, 12345678987654320 is
101011110111000101010001100010100100011111010010110000

The only difference is the rightmost least significant bit (representing the number of ones). So, because there's not space to store the least significant bit, it's not possible to tell the difference between those two numbers once they've been stored in a double precision floating point. When the computer tries to reconstruct the number, it assumes the missing bit is a 0. (This is a reasonable assumption; it has to assume one way or the other, and it doesn't make much difference in the long run whether it assumes a 1 or a 0. Either way, you're losing some information.)

In the case that you're storing a long fractional number, e.g. 8723.23834129523172323915, if the least significant bit got cut off it might not matter so much. The least significant bit affects the smallest part of the number -- so it would change the fractional part by a small amount. But when you're storing an integer, the smallest part of the number is the ones place -- and a change there is usually quite noticeable.

(Note that if you tried to store something even larger than 12345678987654321, more than one bit might get cut off -- and so more places than just the ones place would be incorrect.)

The solution? Use a programming language (other than JavaScript) that lets you store integers as integers (which, as you recall, have 1 sign bit and 63 bits of storage -- more than enough for 12345678987654321), or use a programming language that uses floating points larger than double precision. (Some programming languages use "extended double precision", which uses 80 or more bits instead of the 64 bits double precision uses.)

If you need a calculator that can handle large numbers, try the arbitrary precision calculator bc (*nix version). There's also a Windows version. It doesn't have a graphical interface; you have to type commands to it. But it's very flexible -- read the documentation for more information.

To get started, try typing
111111111^2 (and press enter).
It should give you 12345678987654321.

In general, you'll want to type
scale=5
(or maybe a number larger than 5) before you start doing any other calculations in it. scale determines how many decimal places it uses in calculations; by default, scale is set to 0, so if you type 4/3 you'll get 1 instead of 1.25. If you type scale=1 and then 4/3, you'll get 1.3. If you type scale=2 and then 4/3, you'll get 1.25. I find five decimal places is enough for most of my calculations.

Well, that was an interesting two hour diversion from homework. Comments?
Previous post Next post
Up