L1 regularization

Nov 25, 2008 20:57

L2 regularization is seen as a way to avoid overfitting when doing regression, no more ( Read more... )

sparsity, machine_learning


Comments 8

Ooh. Tell me more. serapio November 26 2008, 09:33:21 UTC
What advantage does PCA or PLS offer over L1?

I've been thinking about this too, because Optimality Theory, the favorite model of a lot of linguists, is something like a crude approximation of a logistic regression, with assumptions that radically reduce the number of active variables. The effect of L1 regularization is similar to the effect of OT assumptions, but less radical.


Re: Ooh. Tell me more. gustavolacerda November 27 2008, 01:54:29 UTC
If your data forms a multivariate Gaussian, PCA minimizes reconstruction error (unsupervised case) and PLS minimizes prediction error (supervised case).

While L1 returns the subset of the original variables it considers to be nonzero (you don't specify how many, though you could tweak the regularization parameter until it returns the desired number), PCA/PLS return a pre-specified number of linear mixtures of the original variables.
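A minimal sketch of that contrast (Python with scikit-learn; the synthetic data and the alpha values are made up purely for illustration):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # 10 original variables
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=200)

# L1: you don't specify how many variables survive; you only control it
# indirectly through the regularization parameter (alpha here).
for alpha in (0.01, 0.1, 1.0):
    n_nonzero = np.sum(Lasso(alpha=alpha).fit(X, y).coef_ != 0)
    print(f"alpha={alpha}: {n_nonzero} nonzero coefficients")

# PCA/PLS: you pre-specify the number of components, and each component
# is a linear mixture of all 10 original variables, not a subset of them.
pca = PCA(n_components=2).fit(X)               # unsupervised: reconstruction
pls = PLSRegression(n_components=2).fit(X, y)  # supervised: prediction
print(pca.components_.shape)              # (2, 10)
print(pls.x_weights_.shape)               # (10, 2)
```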

<< Optimality Theory, the favorite model of a lot of linguists, is something like a crude approximation of a logistic regression, with assumptions that radically reduce the number of active variables. >>

Please tell me more!


Re: Ooh. Tell me more. serapio December 4 2008, 06:50:22 UTC
Besides completely throwing out variables that are near zero weight, regularization (and I guess the spatial transformations of PCA and PLS do this too) also reduces the number of relevant variables in any particular case. Spreading out the weight distribution makes it easier to approximate the result by just including the largest few terms in each case ( ... )
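One rough way to check that intuition (a Python/scikit-learn sketch; the synthetic data, the C value, and k=3 are my arbitrary choices, not anything from the thread): fit an L1-penalized logistic regression, then score each example using only its k largest-magnitude terms w_i * x_i and see how often that agrees with the full score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
true_w = np.zeros(20)
true_w[:3] = (2.0, -1.5, 1.0)             # only 3 of 20 variables matter
y = (X @ true_w + rng.logistic(size=300) > 0).astype(int)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
w = clf.coef_.ravel()
print("nonzero weights:", np.sum(w != 0))

# Per example, keep only the k terms w_i * x_i with the largest magnitude
# and check how often the truncated score predicts the same class as the
# full score does.
terms = X * w                             # per-term contributions, (300, 20)
k = 3
idx = np.argsort(np.abs(terms), axis=1)[:, -k:]
trunc = np.take_along_axis(terms, idx, axis=1).sum(axis=1)
full = terms.sum(axis=1)
b = clf.intercept_[0]
agreement = np.mean((trunc + b > 0) == (full + b > 0))
print(f"class agreement using only the top-{k} terms: {agreement:.1%}")
```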


Re: Ooh. Tell me more. gustavolacerda December 4 2008, 07:19:54 UTC
<< Besides completely throwing out variables that are near zero weight, regularization ... >>

First of all, this is *L1* regularization.
Secondly, no, not *near* zero weight. L1 methods throw out the subset of variables whose exclusion least hurts (in terms of prediction error).
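A small sketch of the distinction (Python with scikit-learn; synthetic data, and the alpha value is an arbitrary choice of mine): with two highly correlated predictors, OLS gives both a substantial weight, yet the lasso typically zeroes one of them out entirely. The dropped variable's weight was nowhere near zero; it is just the one whose exclusion hurts prediction least.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)        # x2 is nearly a copy of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.5 * rng.normal(size=n)    # both variables matter equally

ols = LinearRegression().fit(X, y)
print("OLS coefficients:  ", ols.coef_)   # both far from zero, roughly (1, 1)

# The lasso typically drives one of the two to exactly 0, because with
# near-duplicate predictors, dropping either one barely hurts prediction.
lasso = Lasso(alpha=0.1).fit(X, y)
print("Lasso coefficients:", lasso.coef_)
```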


