Can you have interaction terms for both “sides” of a dummy variable in a single regression?


I'm really not sure how to phrase my question properly, so I apologize if this has been answered elsewhere. Let's say I'm interested in using a regression to predict wage using sex and an interaction term (height), where the sex variable is 0 if a person is female.

wage = sex + sex * height + constant + error

My understanding is that the omitted category here is a female person. What if I also wanted to investigate the effect of weight on being female as it impacts wage? Could I have a "reverse" sex term that is 1 if the person is female? Would something like this be valid:

wage = sex + sex * height + reverse_sex * weight + constant + error

Would the omitted category still be a female person? Can I capture both interaction effects in one regression? Thanks in advance for the help!

asked Nov 8 at 15:51 by Mike
edited Nov 8 at 16:08 by Penguin_Knight

Note that there really aren't "omitted categories" with these dummy variables. Rather, the "constant" in your equation represents the value of "wage" when the values of predictor variables are 0, typically the reference value for a categorical variable. They might seem to be omitted because their names don't explicitly show up in displays of tables of regression coefficients, but they are there. The answer by @Penguin_Knight nicely shows how to proceed with the regression (including the important main effects for height, weight, etc) and significance testing.
– EdM
Nov 8 at 16:35
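To make the point about the constant concrete, here is a minimal sketch in R. The data are simulated, and the seed, sample size, and wage numbers are assumptions chosen only for illustration. With a single 0/1 dummy, the intercept is the mean wage of the group coded 0 and the dummy's coefficient is the gap between the two group means, so the reference group is not really "omitted".

```r
# Hypothetical simulated data: all numbers here are assumptions for illustration.
set.seed(42)
n <- 100
male <- rbinom(n, 1, 0.5)                       # 1 = male, 0 = female
wage <- 20000 + 1500 * male + rnorm(n, sd = 500)

fit <- lm(wage ~ male)

mean_female <- mean(wage[male == 0])
mean_male   <- mean(wage[male == 1])

# The intercept reproduces the mean of the group coded 0 (females).
intercept_is_female_mean <-
  isTRUE(all.equal(unname(coef(fit)["(Intercept)"]), mean_female))

# The dummy coefficient reproduces the difference in group means.
male_coef_is_gap <-
  isTRUE(all.equal(unname(coef(fit)["male"]), mean_male - mean_female))

print(c(intercept_is_female_mean = intercept_is_female_mean,
        male_coef_is_gap = male_coef_is_gap))
```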


interaction categorical-encoding


1 Answer

accepted

To simplify the wording let's just call the variables male and female.

The main question aside, this is not a typical test for interaction. By specifying:

$$wage = \beta_0 + \beta_1 male + \beta_2 male \times height + \epsilon$$

you are implicitly stating that height does not matter for females at all. Usually, a full interaction model also contains the main effects of the variables that make up the interaction:

$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \epsilon$$

That way, the males have:

$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \epsilon$$

And the females have:

$$wage = \beta_0 + \beta_2 height + \epsilon$$

In your version, the females only have the constant (intercept), which is likely a misspecification.
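Spelled out, substituting male = 1 and male = 0 into the full model gives a separate line for each group (this is only the algebra of the model with both main effects, not an extra assumption):

$$male = 1: \quad wage = (\beta_0 + \beta_1) + (\beta_2 + \beta_3) height + \epsilon$$

$$male = 0: \quad wage = \beta_0 + \beta_2 height + \epsilon$$

So $\beta_1$ is the difference in intercepts between males and females, and $\beta_3$ is the difference in height slopes.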

Back to the question about:

wage = sex + sex * height + reverse_sex * weight + constant + error

The actual interaction tests should then be:

$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 female + \beta_5 weight + \beta_6 female \times weight + \epsilon$$

A couple of points here. First, male and female are completely collinear, so one of them will be omitted:

$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 weight + \beta_5 female \times weight + \epsilon$$

For males, these terms remain:

$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 weight + \epsilon$$

For females, these terms remain:

$$wage = \beta_0 + \beta_2 height + \beta_4 weight + \beta_5 female \times weight + \epsilon$$

So it's technically fine: $\beta_5$ is still the extra "effect" of weight for females.

Second, this unnecessarily complicates everything, because your proposed model:

$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 weight + \beta_5 female \times weight + \epsilon$$

is essentially the same as:

$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 weight + \beta_5 male \times weight + \epsilon$$

$\beta_5$ will flip sign, but its magnitude is the same: it is the difference in weight slopes between males and females. (The coefficient on weight changes as well, because it now describes the slope of the other group.) If the males' slope is $a$ smaller than the females', then the females' slope is $a$ bigger than the males'. You'll also find that the t-statistic flips sign, but the p-value is the same. There is no need to split hairs here.
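To see why the two versions are the same model, a short derivation may help. It uses only the fact that female = 1 - male, and writes $\beta_4$ and $\beta_5$ for the coefficients of the version with the female product term:

$$\beta_4 weight + \beta_5 female \times weight = \beta_4 weight + \beta_5 (1 - male) \times weight = (\beta_4 + \beta_5) weight - \beta_5 male \times weight$$

So in the version with the male product term, the interaction coefficient is $-\beta_5$ and the coefficient on weight becomes $\beta_4 + \beta_5$; the fitted values are unchanged.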

Regarding the follow-up from the comments:

"Let's say I only wanted to investigate how weight affects wage on females, but not males. Would it be possible to incorporate this in one equation? Or would I need a separate regression for each sex?"

So, let's just actually show it:

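set.seed(81226)

# Simulate 100 people: a 0/1 male indicator, its complement, and weight.
male <- sample(c(1, 0), 100, replace = TRUE)
female <- 1 - male
weight <- rnorm(100, 150, 35)
# True model: the weight slope is -5 for females and -5 + 2.5 for males.
wage <- 25000 - 5 * weight + 1 * male + 2.5 * (male * weight) +
  rnorm(100, 0, 100)

# The same model fitted twice, once with each indicator as the dummy.
m01 <- lm(wage ~ male + weight + male*weight)
summary(m01)

m02 <- lm(wage ~ female + weight + female*weight)
summary(m02)

# Scatter plot (black = female, red = male) with the fitted line per group.
plot(weight, wage, pch = 16, col = (male + 1))
lines(weight[female == 1], fitted(m01)[female == 1])
lines(weight[male == 1], fitted(m01)[male == 1], col = "red")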

The first regression using male is:

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 24995.6097    55.7790 448.118  < 2e-16 ***
male           83.2834    73.5968   1.132    0.261
weight         -5.0967     0.3627 -14.053  < 2e-16 ***
male:weight     2.0805     0.4723   4.405 2.75e-05 ***

The second regression using female is:

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)   25078.8931    48.0124 522.342  < 2e-16 ***
female          -83.2834    73.5968  -1.132    0.261
weight           -3.0162     0.3026  -9.969  < 2e-16 ***
female:weight    -2.0805     0.4723  -4.405 2.75e-05 ***
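As a quick check that the two fits are one model written two ways, here is a self-contained sketch. It uses its own simulated data, so the seed, sample size, and coefficients are assumptions and the estimates will not match the tables above. It compares the fitted values of the two parameterizations and verifies the sign flip of the interaction.

```r
# Hypothetical simulated data: seed, n, and coefficients are assumptions.
set.seed(1)
n <- 200
male   <- rbinom(n, 1, 0.5)
female <- 1 - male
weight <- rnorm(n, mean = 150, sd = 35)
wage   <- 25000 - 5 * weight + 100 * male + 2.5 * male * weight +
  rnorm(n, sd = 100)

m_male   <- lm(wage ~ male * weight)     # females are the reference group
m_female <- lm(wage ~ female * weight)   # males are the reference group

# Identical fitted values: the models differ only in parameterization.
same_fit <- isTRUE(all.equal(fitted(m_male), fitted(m_female)))

b_m <- coef(m_male)
b_f <- coef(m_female)

# The interaction flips sign with the same magnitude.
flip_ok <- isTRUE(all.equal(unname(b_f["female:weight"]),
                            -unname(b_m["male:weight"])))

# The "weight" coefficient is always the slope of whichever group is coded 0.
slope_ok <- isTRUE(all.equal(unname(b_f["weight"]),
                             unname(b_m["weight"] + b_m["male:weight"])))

print(c(same_fit = same_fit, flip_ok = flip_ok, slope_ok = slope_ok))
```

Which indicator to use is then purely a matter of which group you want the main-effect coefficients to describe.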

Graphically, the relationship is:

[Figure: scatter plot of wage against weight with the fitted line for each sex; males in red, females in black.]

Red is males and black is females. In the first model, females get only the weight coefficient, -5.0967, which is the slope of the black line. The slope of the red line adds an adjustment of 2.0805, giving (-5.0967 + 2.0805) = -3.0162. The 2.0805 is then the "difference in slopes," a.k.a. the interaction. If the two lines were parallel, the effect of weight on wage would be the same for both sexes.

Now, the second model uses female. The slope for males is -3.0162, which is just (-5.0967 + 2.0805) from above. The females' slope has a further adjustment of -2.0805 (notice the sign flip), ending up at -5.0967.
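The slope arithmetic can be reproduced directly from the two coefficient tables. The snippet below only re-types the reported estimates (the object names are made up for this illustration) and adds them up per group.

```r
# Estimates copied from the two regression tables; object names are illustrative.
coef_m01 <- c(weight = -5.0967, `male:weight` = 2.0805)
coef_m02 <- c(weight = -3.0162, `female:weight` = -2.0805)

# Model 1: females are the reference group, males get the adjustment.
slope_female_m01 <- coef_m01[["weight"]]
slope_male_m01   <- coef_m01[["weight"]] + coef_m01[["male:weight"]]

# Model 2: males are the reference group, females get the adjustment.
slope_male_m02   <- coef_m02[["weight"]]
slope_female_m02 <- coef_m02[["weight"]] + coef_m02[["female:weight"]]

# Both models imply the same pair of slopes.
print(round(c(female_m01 = slope_female_m01, female_m02 = slope_female_m02,
              male_m01 = slope_male_m01, male_m02 = slope_male_m02), 4))
```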

I hope this helps clarify that your question about the "effect of weight on females" is the same as asking about the "absence of such an effect of weight on males." Your proposed question sounds sensible, but to people who understand regression it is closer to a needless gesture: if males get a benefit, females would of course suffer a penalty of the same magnitude relative to males.

answered Nov 8 at 16:23 by Penguin_Knight
edited Nov 8 at 17:44

Thank you for your unbelievably clear and concise response! I realize now that I left out one additional assumption. Sounds weird, but bear with me. Let's say I only wanted to investigate how weight affects wage on females, but not males. Would it be possible to incorporate this in one equation? Or would I need a separate regression for each sex?
– Mike
Nov 8 at 16:52

@Mike, see the edits in the answer.
– Penguin_Knight
Nov 8 at 17:22

Ah, I'm sorry your answer makes sense, but didn't quite address my concern. I should have been more clear with my follow-up question. I'm interested in a scenario where I'm assuming height matters for males, but not females, and weight matters for females, but not males. Does that make sense? I know it's an unusual scenario, but it's a simplification of another analysis I'm working on.
– Mike
Nov 8 at 18:02

@Mike, "I'm assuming height matters for males, but not females, and weight matters for females, but not males." In that case, do not call that product term an "interaction," because it actually is not one. Also be aware that i) your model constrains the association very severely, and does so before fitting, so double-check with exploratory data analysis; ii) under your model, "height matters for males and weight matters for females" may not actually be true; it only appears so because you modeled them that way.
– Penguin_Knight
Nov 8 at 18:43
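For the scenario in the last two comments, a minimal sketch of what such a constrained model could look like in R. Everything here is hypothetical: the data are simulated, and the variable names, seed, and coefficients are assumptions. Comparing against the full interaction model with anova() is one possible way to check the constraints; it is not something proposed in the thread.

```r
# Hypothetical simulated data matching the constrained scenario (all assumed).
set.seed(2)
n <- 300
male   <- rbinom(n, 1, 0.5)
female <- 1 - male
height <- rnorm(n, mean = 170, sd = 10)
weight <- rnorm(n, mean = 150, sd = 35)
wage   <- 25000 + 500 * male + 20 * male * height - 5 * female * weight +
  rnorm(n, sd = 100)

# Constrained model: height slope only for males, weight slope only for females.
# I() makes R treat each product as a single ordinary regressor.
restricted <- lm(wage ~ male + I(male * height) + I(female * weight))

# Full model: both sexes get their own height and weight slopes.
full <- lm(wage ~ male * height + male * weight)

# F test of the two zero-slope constraints (female height slope = 0 and
# male weight slope = 0); the restricted model is nested in the full one.
cmp <- anova(restricted, full)
print(cmp)
```

A small p-value in that comparison would suggest that at least one of the two constraints is not supported by the data, which is the concern raised in the comment above.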


