furahi

My computer is a cheating lazy teenager >_

Mar 26, 2006 23:36

I keep working on the thesis...
I'm making a neural network now to classify colors in an image in a (kinda cheap) effort to better locate license plates.

Basically, the way a neural network works is that you tell the computer something like:

If you receive:      Then write:
  R    G    B        Red  Green  Blue  Yellow
255    0    0          1      0     0       0
  0  255    0          0      1     0       0
  0    0  255          0      0     1       0
255  255    0          0      0     0       1
128    0    0          1      0     0       0
  0  230    0          0      1     0       0
  0    0  200          0      0     1       0
255  255   50          0      0     0       1
100    0    0          1      0     0       0
  0  100    0          0      1     0       0
  0    0  100          0      0     1       0
215  200   40          0      0     0       1

Meaning you tell it "for this input, I want this output". In the case of colors, you say "255,0,0 means red", "0,255,0 means green" and so on, and you also show it some variations of the data, so that eventually you can give the computer an input it has never seen, like
230, 0, 0
And the computer infers
0.89, 0.1, 0, 0.05

Which you round to 1, 0, 0, 0, which means "Red" in the previous table.

There is also a case where maybe you tell the computer
0, 128, 255
and the computer infers
0.5, 0.4, 0.6, 0.7

Which is the computer's way of saying "I don't know" (since you can't safely round this to 1s and 0s).
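
Here's a rough sketch of that rounding step, in case it helps. The decode function and the 0.8 / 0.2 cutoffs are made up for illustration (they are not what my thesis code uses); the idea is just that exactly one output has to be clearly "on" and the rest clearly "off", otherwise the answer is "I don't know".

```python
def decode(outputs, labels=("Red", "Green", "Blue", "Yellow"), hi=0.8, lo=0.2):
    """Turn raw network outputs into a color name, or None for "I don't know".

    hi and lo are assumed cutoffs: an output >= hi counts as 1,
    an output <= lo counts as 0, anything in between is ambiguous.
    """
    winners = [i for i, v in enumerate(outputs) if v >= hi]
    others_low = all(v <= lo for i, v in enumerate(outputs) if i not in winners)
    if len(winners) == 1 and others_low:
        return labels[winners[0]]
    return None  # can't safely round to 1s and 0s


print(decode([0.89, 0.1, 0, 0.05]))  # Red
print(decode([0.5, 0.4, 0.6, 0.7]))  # None
```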

Anyway... so I'm training my neural network to do something like this, and the computer tells me the accumulated error is 20%. It's bad, but I say "OK, let me test this", and after much analysis I notice that for /any/ input the computer is spitting out
1 0 0 0
Which is weird... until I notice that the data set I'm using to train the computer is rather biased: about 90% of the inputs in it should produce that output (in terms of the previous example, it's as if I were showing the computer 90% reds and only 10% of the other colors).
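
A quick check like the one below would have caught this before training. It's only a sketch: the label list is invented to match the "90% red" example, and majority_baseline is a name I made up. It counts how often each answer appears and reports what score you'd get by always giving the most common one.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most common label and the fraction of samples it covers."""
    counts = Counter(labels)
    label, n = counts.most_common(1)[0]
    return label, n / len(labels)


# Invented training labels: 90 reds and 10 of everything else.
labels = ["red"] * 90 + ["green"] * 4 + ["blue"] * 3 + ["yellow"] * 3
print(majority_baseline(labels))  # ('red', 0.9)
```

If the network isn't doing clearly better than that number, it probably hasn't learned anything beyond "say red".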

So the computer is doing what any lazy teenager would do if they somehow found out that 90% of the answers on a multiple choice test are "C": it's saying C for everything, and that's working fine for it.
Furthermore, as I said, the outputs are not exactly 1 0 0 0, but maybe 0.9, 0.01, 0.2, 0.005. So when it "learns" another color, say yellow, it fixes the output for maybe 1 test case, but that also shifts the reds to something like 0.89, 0.05, 0.201, 0.09. Since there are so many reds, all those slightly-wrong reds add up to a bigger error in the end, and the computer decides it's wiser to have the yellows completely wrong than the reds slightly wrong.
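
To put rough numbers on "slightly wrong, many times over": the sketch below assumes the error is a summed squared error and that there are 90 reds in the training set. Both are assumptions for illustration, and the output values are just the made-up ones from the paragraph above.

```python
def squared_error(output, target):
    """Sum of squared differences between one output and its target."""
    return sum((o - t) ** 2 for o, t in zip(output, target))


RED = (1, 0, 0, 0)
before = (0.9, 0.01, 0.2, 0.005)   # red output before "learning" yellow
after = (0.89, 0.05, 0.201, 0.09)  # red output after the nudge
n_reds = 90                        # assumed number of red samples

extra_per_red = squared_error(after, RED) - squared_error(before, RED)
extra_total = n_reds * extra_per_red
print(round(extra_per_red, 4), round(extra_total, 2))  # 0.013 1.17
```

Each red only gets a tiny bit worse, but multiplied by all the reds it becomes a real cost, so a change that helps a yellow only sticks if it wins back more error than that.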

I hope this makes sense to at least someone out there =O

thesis