"There are three kinds of lies: lies, damned lies, and statistics."

Jan 16, 2009 15:56



Notation:
A' means the transpose of A.
I is the identity matrix.
"Vector" means column vector unless specified otherwise.
1 is the vector with all components equal to 1.
J is the matrix with all components equal to 1.
0 is the zero vector.
E denotes expectation.
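
For concreteness, here is the same notation in numpy (a throwaway sketch; numpy and the example size n = 3 are not part of these notes, just illustration):

import numpy as np

n = 3
A = np.arange(n * n).reshape(n, n)   # some n x n matrix, just for illustration

A_transpose = A.T          # A' - the transpose of A
I = np.eye(n)              # the identity matrix
ones = np.ones((n, 1))     # 1 - the vector with all components equal to 1
J = np.ones((n, n))        # J - the matrix with all components equal to 1
zero = np.zeros((n, 1))    # 0 - the zero vector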

In the context of regression, the equation of interest is:
Y = E(Y) + e
where Y is a vector of random variables (representing a sample - an n-dimensional vector represents a sample of n elements) and e is a vector representing the random error in the samples.

In linear regression, the main assumption is
E(Y) = XB
where X is a "design matrix" in which each covariate is represented by a column and each row represents a sample, and B is a vector of coefficients.
e.g., for the model E(Yi) = b0 + b1 Xi and three samples (Y1,X1), (Y2,X2), (Y3,X3) we would have:
Y=
[[Y1]
[Y2]
[Y3]]

X=
[[1 X1]
[1 X2]
[1 X3]]

B=
[[b0]
[b1]]

Indeed, as the example illustrates, the first column of the X matrix is traditionally all 1's (it corresponds to the intercept term b0).
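
In numpy terms (continuing the sketch above; the sample values x and y here are made up purely for illustration), the design matrix with its leading column of 1's can be built like this:

# three hypothetical samples (Y1,X1), (Y2,X2), (Y3,X3)
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.1, 3.9, 6.2])

# design matrix: first column all 1's (intercept), second column the covariate
X = np.column_stack([np.ones_like(x), x])
Y = y.reshape(-1, 1)   # column vector, to match the notation above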

Some interesting properties to note, for n samples and 2 columns in the design matrix:

Y'Y = the sum of squares of the Yi's.

X'X =
[[ n               (sum of Xi's)            ]
[ (sum of Xi's)    (sum of squares of Xi's) ]]

X'Y =
[[ sum of Yi's ]
[ sum of Yi*Xi's ]]
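
These identities are easy to check numerically with the hypothetical x, y, X, Y from the sketch above:

print(Y.T @ Y, (y ** 2).sum())          # Y'Y vs. the sum of squares of the Yi's
print(X.T @ X)                          # compare with [[n, sum(Xi)], [sum(Xi), sum(Xi^2)]]
print(len(x), x.sum(), (x ** 2).sum())
print(X.T @ Y)                          # compare with [sum(Yi), sum(Yi*Xi)]
print(y.sum(), (x * y).sum())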

The variance-covariance matrix of a random vector Y is defined by:
sigma-squared{Y} = E{[Y - E(Y)][Y - E(Y)]'}
so that the diagonal terms give the variances of the Yi's and the off-diagonal terms give the covariances.

Note that for i.i.d. error terms e = [ei], each with variance sigma-squared, sigma-squared{e} = sigma-squared * I.

Sometimes Y is transformed by some (fixed) matrix A, in which case for W = AY
sigma-squared{W} = A sigma-squared{Y} A'
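
A quick numerical sanity check of that rule (the particular A and sigma-squared{Y} below are arbitrary examples, not from the notes):

# an arbitrary fixed 2x2 transformation and an example variance-covariance matrix
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
Sigma_Y = np.array([[2.0, 0.5],
                    [0.5, 1.0]])

# the rule: sigma-squared{W} = A sigma-squared{Y} A'
Sigma_W = A @ Sigma_Y @ A.T

# Monte Carlo check: draw many Y's with covariance Sigma_Y, transform to W = AY,
# and compare the sample covariance of W against Sigma_W
rng = np.random.default_rng(0)
Y_draws = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma_Y, size=200_000)
W_draws = Y_draws @ A.T
print(Sigma_W)
print(np.cov(W_draws, rowvar=False))   # should be close to Sigma_W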

Ordinary Least Squares:
To do regression, we want to have our model "as close as possible" to the data. In order to do that, we need to first define what is meant by "close". It turns out that a good metric is the sum of the squares of the errors, that is, we want to find B such that e'e is minimal.
Recall from above that e = Y - XB.
Also recall that to find the minimum of a function we need to find its derivative and set it to zero. So, to minimize
Q = e'e = (Y - XB)'(Y - XB) = Y'Y - B'X'Y - Y'XB + B'X'XB
(note that B'X'Y and Y'XB are scalars and equal to each other, so Q = Y'Y - 2B'X'Y + B'X'XB)
we need to find
dQ/dB, which is just the vector of partial derivatives of Q w.r.t. the Bi's, and equals:
-2X'Y + 2X'XB
Setting this to zero gives the normal equations X'XB = X'Y, and solving for B (assuming X'X is invertible) gives
B = (X'X)^(-1) X'Y
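
In numpy this is one line (continuing the hypothetical X and Y from above; in practice, solving the normal equations or calling np.linalg.lstsq is preferred over forming the explicit inverse):

# solve the normal equations X'X B = X'Y (numerically safer than inverting X'X)
B = np.linalg.solve(X.T @ X, X.T @ Y)

# equivalently, least squares directly on X and Y
B_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(B.ravel())         # b0, b1
print(B_lstsq.ravel())   # same values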

Now that we have the coefficients, we can write the estimates
Y-hat = XB (our model, pretty much). Note that we can write
Y-hat = X(X'X)^(-1)X'Y = HY, where H is the "hat" matrix X(X'X)^(-1)X'.

That, in turn, can be used for other estimates and statistics, etc.
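
As a final sketch with the same hypothetical X and Y, the hat matrix and the fitted values:

# hat matrix: projects Y onto the column space of X
H = X @ np.linalg.inv(X.T @ X) @ X.T
Y_hat = H @ Y

print(Y_hat.ravel())
print(np.allclose(H @ H, H))   # H is idempotent
print(np.allclose(H, H.T))     # and symmetric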