Ooh. Tell me more. (serapio, November 26 2008, 09:33:21 UTC)
What advantage does PCA or PLS offer over L1?
I've been thinking about this too, because Optimality Theory, the favorite model of a lot of linguists, is something like a crude approximation of a logistic regression, with assumptions that radically reduce the number of active variables. The effect of L1 regularization is similar to the effect of OT assumptions, but less radical.
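A rough sketch of what that L1 effect looks like in practice (assuming scikit-learn is available; the data and parameter values below are invented purely for illustration): with an L1 penalty, a logistic regression drives most coefficients to exactly zero, leaving only a small set of active variables.

# Sketch: L1-regularized logistic regression leaves only a few active variables.
# Assumes scikit-learn and numpy; the data here is synthetic, for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))           # 50 candidate variables
w_true = np.zeros(50)
w_true[:3] = [2.0, -1.5, 1.0]            # only 3 of them actually matter
y = (X @ w_true + rng.normal(scale=0.5, size=200)) > 0

# L1 penalty (lasso-style); smaller C means stronger regularization.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model.fit(X, y.astype(int))

active = np.flatnonzero(model.coef_[0])
print("active variables:", active)        # typically a handful, not all 50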
Re: Ooh. Tell me more. (gustavolacerda, November 27 2008, 01:54:29 UTC)
If your data forms a multivariate Gaussian, PCA minimizes reconstruction error (unsupervised case) and PLS minimizes prediction error (supervised case).
While L1 returns the subset of the original variables it considers to be nonzero (you don't specify how many, though you could tweak the regularization parameter until it returns the desired number), PCA/PLS return a pre-specified number of linear mixtures of the original variables.
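To make the contrast concrete (again a sketch assuming scikit-learn; the data, the alpha, and the component counts are illustrative, not from the thread): the Lasso keeps a data-dependent subset of the original columns, whose size you only control indirectly through the regularization parameter, while PCA and PLS hand back exactly as many linear mixtures as you ask for.

# Sketch contrasting L1 selection with PCA/PLS mixtures (scikit-learn assumed;
# the synthetic data and parameter choices are illustrative only).
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
y = X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.1, size=100)

# L1: a subset of the original variables; you tune alpha, not the subset size.
lasso = Lasso(alpha=0.1).fit(X, y)
print("variables kept by L1:", np.flatnonzero(lasso.coef_))

# PCA / PLS: you pre-specify the number of components (linear mixtures).
pca = PCA(n_components=2).fit(X)               # unsupervised: reconstruction error
pls = PLSRegression(n_components=2).fit(X, y)  # supervised: prediction error
print("PCA mixture weights shape:", pca.components_.shape)  # (2, 20)
print("PLS mixture weights shape:", pls.x_weights_.shape)   # (20, 2)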
<< Optimality Theory, the favorite model of a lot of linguists, is something like a crude approximation of a logistic regression, with assumptions that radically reduce the number of active variables. >>
Please tell me more!
Re: Ooh. Tell me more. (serapio, December 4 2008, 06:50:22 UTC)
Besides completely throwing out variables that are near zero weight, regularization (and I guess the spatial transformations of PCA and PLS do this too) also reduces the number of relevant variables in any particular case. Spreading out the weight distribution makes it easier to approximate the result by just including the largest few terms in each case
( ... )
Re: Ooh. Tell me more. (gustavolacerda, December 4 2008, 07:19:54 UTC)
<< Besides completely throwing out variables that are near zero weight, regularization ... >>
First of all, this is *L1* regularization. Secondly, no, not *near* zero weight. L1 methods throw out the subset of variables whose exclusion least hurts (in terms of prediction error).
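The exact-zero behavior is easy to check (a sketch assuming scikit-learn, with synthetic data): a ridge (L2) fit shrinks coefficients toward zero without reaching it, while the L1 fit sets a whole subset of them to exactly zero, i.e. it drops those variables outright.

# Sketch: L1 produces exact zeros, L2 only near-zeros (scikit-learn assumed;
# synthetic data for illustration).
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 30))
y = 3 * X[:, 0] - X[:, 1] + rng.normal(scale=0.2, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print("ridge coefficients exactly zero:", np.sum(ridge.coef_ == 0.0))  # usually 0
print("lasso coefficients exactly zero:", np.sum(lasso.coef_ == 0.0))  # usually most of the 30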