Is there any special case where ridge regression can shrink coefficients to zero?
Are there some special cases where ridge regression can also lead to coefficients that are exactly zero?
It is widely known that the lasso shrinks coefficients towards, or exactly to, zero, while ridge regression cannot shrink coefficients to zero.
machine-learning lasso ridge-regression
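The contrast behind the question is easy to see numerically. Under an orthonormal design both estimators have closed forms: ridge scales the OLS coefficients by $1/(1+\lambda)$, while the lasso soft-thresholds them at $\lambda/2$. A minimal sketch (the data and the value of $\lambda$ are made up for illustration, not taken from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
# Orthonormalize a random matrix so that X.T @ X = I exactly.
X, _ = np.linalg.qr(rng.normal(size=(n, p)))
beta_true = np.array([2.0, 1.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

beta_ols = X.T @ y  # OLS solution, since X.T @ X = I
lam = 1.0
# Ridge: proportional shrinkage -- nonzero inputs stay nonzero.
beta_ridge = beta_ols / (1.0 + lam)
# Lasso: soft-thresholding -- small coefficients snap to exactly 0.
beta_lasso = np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam / 2.0, 0.0)

print(beta_ridge)  # no exact zeros
print(beta_lasso)  # third coefficient is exactly 0.0
```

The third true coefficient is zero, so its OLS estimate is small noise; the lasso snaps it to exactly zero, while ridge merely rescales it.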
– whuber♦ (Aug 28 '18 at 18:53): Of course! If the least squares estimates are zero, then ridge regression will always produce zeros. What would be of interest is to find any other situation :-).
– Vala (Aug 28 '18 at 18:54): In which cases can an OLS coefficient be exactly zero?
– whuber♦ (Aug 28 '18 at 18:56): This will happen whenever the response variable is orthogonal to each of the explanatory variables.
– Vala (Aug 28 '18 at 18:57): Would the predictors also have to be orthogonal to each other, or would it be enough if just the correlation with the response is zero?
– jbowman (Aug 28 '18 at 20:14): See also stats.stackexchange.com/questions/74542/… for an explanation of why ridge cannot shrink the parameters to zero (unless they start there, as @whuber observes.)
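whuber's condition in the comments can be checked directly: if the response is orthogonal to every column of $X$, then $X^\top y = 0$ and hence $\hat\beta_{\text{OLS}} = (X^\top X)^{-1} X^\top y = 0$, regardless of how correlated the predictors are with each other, and ridge keeps that zero solution. A sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
# Two deliberately correlated predictors.
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])

# Make y orthogonal to both columns: take the residual of a random
# vector after projecting it onto the column space of X.
z = rng.normal(size=n)
y = z - X @ np.linalg.lstsq(X, z, rcond=None)[0]

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
lam = 0.5
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

print(beta_ols)    # numerically zero despite correlated predictors
print(beta_ridge)  # ridge keeps the zero solution
```

So orthogonality among the predictors is not required; $X^\top y = 0$ alone forces all coefficients to zero.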
asked Aug 28 '18 at 18:48 by Vala (edited Aug 28 '18 at 19:00)
1 Answer
Suppose, as in the case of least squares methods, you are trying to solve a statistical estimation problem for a (vector-valued) parameter $\beta$ by minimizing an objective function $Q(\beta)$ (such as the sum of squares of the residuals). Ridge regression "regularizes" the problem by adding a non-negative linear combination of the squares of the parameter, $P(\beta).$ $P$ is (obviously) differentiable with a unique global minimum at $\beta=0.$
The question asks: when is it possible for the global minimum of $Q+P$ to occur at $\beta=0$? Assume, as in least squares methods, that $Q$ is differentiable in a neighborhood of $0.$ Because $0$ is a global minimum for $Q+P$, it is a local minimum, implying all its partial derivatives are $0.$ The sum rule of differentiation implies
$$\frac{\partial}{\partial \beta_i}\left(Q(\beta) + P(\beta)\right) = \frac{\partial}{\partial \beta_i}Q(\beta) + \frac{\partial}{\partial \beta_i}P(\beta) = Q_i(\beta) + P_i(\beta)$$
is zero at $\beta=0.$ But since $P_i(0)=0$ for all $i,$ this implies $Q_i(0)=0$ for all $i,$ which makes $0$ at least a local minimum of the original objective function $Q.$ For any least squares technique, every local minimum is also a global minimum. This compels us to conclude that
Quadratic regularization of least squares procedures ("ridge regression") has $\beta=0$ as a solution if and only if $\beta=0$ is a solution of the original unregularized problem.
– Vala (Aug 28 '18 at 19:14): As pointed out by Martijn Weterings, it would also shrink coefficients to zero if $t=0$, or if $\lambda$ goes to infinity. Regarding the latter: could ridge shrink a coefficient to zero for a sufficiently large tuning parameter, or is it just a theoretical statement that as $\lambda$ tends to infinity the coefficient also converges to zero?
– whuber♦ (Aug 28 '18 at 19:20): Lambda going to infinity is the equivalent of minimizing $Q/\lambda + P.$ I hope it's easy to see that for sufficiently large $\lambda$ the solutions will have to be close to $\beta=0,$ guaranteeing convergence to $\beta=0$ in the limit.
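The limiting behavior whuber describes can be illustrated numerically: as $\lambda$ grows, the ridge solution $(X^\top X + \lambda I)^{-1} X^\top y$ shrinks toward the origin but stays nonzero for every finite $\lambda$. A sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 80, 2
X = rng.normal(size=(n, p))
y = X @ np.array([1.5, -2.0]) + 0.1 * rng.normal(size=n)

for lam in [1e0, 1e3, 1e6, 1e9]:
    beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    # The norm shrinks roughly like 1/lam but never hits zero exactly.
    print(f"lambda={lam:.0e}  beta={beta}  norm={np.linalg.norm(beta):.3e}")
```

Even at $\lambda = 10^9$ the coefficients are tiny but not exactly zero, consistent with the answer's conclusion.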
– Vala (Aug 28 '18 at 19:34): They will also be close to zero, but can't be exactly zero, right?
– whuber♦ (Aug 28 '18 at 19:36): Please re-read the conclusion of my answer: I can't think of any way to make it clearer.
– Vala (Aug 28 '18 at 19:39): Going by your final conclusion, that should be correct.
answered Aug 28 '18 at 19:08 by whuber♦