Can you have interaction terms for both “sides” of a dummy variable in a single regression?






















I'm really not sure how to phrase my question properly, so I apologize if this has been answered elsewhere. Let's say I'm interested in using a regression to predict wage using sex and an interaction between sex and height, where the sex variable is 0 if a person is female.



wage = sex + sex * height + constant + error



My understanding is that the omitted category here is a female person. What if I also wanted to investigate how weight affects wage for females? Could I have a "reverse" sex term that is 1 if the person is female? Would something like this be valid:



wage = sex + sex * height + reverse_sex * weight + constant + error



Would the omitted category still be a female person? Can I capture both interaction effects in one regression? Thanks in advance for the help!

































  • Note that there really aren't "omitted categories" with these dummy variables. Rather, the "constant" in your equation represents the value of "wage" when the values of predictor variables are 0, typically the reference value for a categorical variable. They might seem to be omitted because their names don't explicitly show up in displays of tables of regression coefficients, but they are there. The answer by @Penguin_Knight nicely shows how to proceed with the regression (including the important main effects for height, weight, etc) and significance testing.
    – EdM
    Nov 8 at 16:35























interaction categorical-encoding














edited Nov 8 at 16:08









Penguin_Knight











asked Nov 8 at 15:51









Mike






















1 Answer




















accepted










To simplify the wording let's just call the variables male and female.



The main question aside, this is not a typical test for interaction. By specifying:



$$wage = \beta_0 + \beta_1 male + \beta_2 male \times height + \epsilon$$



you are implicitly stating that height does not matter for females at all. Usually, a full interaction test should also contain the variables that compose the interaction:



$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \epsilon$$



That way, the males have:



$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \epsilon$$



And the females have:



$$wage = \beta_0 + \beta_2 height + \epsilon$$



In your version, females only have the constant (intercept), which is likely a misspecification.
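Setting the dummy to each of its two values in the full model makes the contrast explicit. This is only a worked restatement of the equations above, with nothing new assumed:

$$male = 1: \quad wage = (\beta_0 + \beta_1) + (\beta_2 + \beta_3)\, height + \epsilon$$

$$male = 0: \quad wage = \beta_0 + \beta_2\, height + \epsilon$$

Leaving out the height main effect amounts to fixing the female slope at zero before looking at the data, which is why the female line collapses to the intercept alone.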




Back to the question about:




wage = sex + sex * height + reverse_sex * weight + constant + error




The full interaction model should then be:



$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 female + \beta_5 weight + \beta_6 female \times weight + \epsilon$$



A couple of points here. First, male and female are completely collinear, so one of them will be omitted:



$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 weight + \beta_5 female \times weight + \epsilon$$
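As a rough illustration of the collinearity point, the sketch below fits the seven-term model on simulated data. The variable values and coefficients are made-up assumptions for illustration only, not the data used elsewhere in this answer. Because `female` is exactly `1 - male`, `lm()` cannot estimate it alongside the intercept and `male`, and reports `NA` for it.

```r
# Hypothetical simulated data; every number here is an illustrative assumption.
set.seed(1)
n <- 200
male <- rbinom(n, 1, 0.5)
female <- 1 - male                      # exact complement of male
height <- rnorm(n, 170, 10)
weight <- rnorm(n, 70, 12)
wage <- 20000 + 500 * male + 30 * height + 20 * male * height -
  10 * weight - 15 * female * weight + rnorm(n, 0, 200)

# Intercept, male and female together are linearly dependent
# (intercept = male + female), so the design matrix is rank deficient.
fit_both <- lm(wage ~ male + height + male:height +
                 female + weight + female:weight)

# The aliased term is reported as NA; the other six terms are estimated.
coef(fit_both)
```

Dropping `female` from the formula by hand gives the same six estimates without the `NA` row.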



For males, these terms remain:



$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 weight + \epsilon$$



For females, these terms remain:



$$wage = \beta_0 + \beta_2 height + \beta_4 weight + \beta_5 female \times weight + \epsilon$$



So, it's technically fine: $\beta_5$ is still the extra "effect" of weight for females.



Second, this is unnecessarily complicating everything because your proposed model:



$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 weight + \beta_5 female \times weight + \epsilon$$



is essentially the same as:



$$wage = \beta_0 + \beta_1 male + \beta_2 height + \beta_3 male \times height + \beta_4 weight + \beta_5 male \times weight + \epsilon$$



The $\beta_5$ will flip sign, but the magnitude is the same. It's basically the difference in slopes between males and females: if males' slope is $a$ smaller than females', then females' slope is $a$ bigger than males'. You'll also find that the t-statistic flips sign, but the p-values are the same. There is no need to split hairs here.
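The sign flip follows directly from $female = 1 - male$. Substituting into the weight terms of the female-coded model:

$$\beta_4\, weight + \beta_5\, (female \times weight) = \beta_4\, weight + \beta_5\, (1 - male)\, weight = (\beta_4 + \beta_5)\, weight - \beta_5\, (male \times weight)$$

So when the model is re-fitted with the male-coded product term, the weight main effect becomes $\beta_4 + \beta_5$ (the female slope) and the product term's coefficient becomes $-\beta_5$. The two fits describe the same pair of lines.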





Let's say I only wanted to investigate how weight affects wage on
females, but not males. Would it be possible to incorporate this in
one equation? Or would I need a separate regression for each sex?




So, let's just actually show it:



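set.seed(81226)

# Simulate 100 people: male is a 0/1 dummy and female is its complement
male <- sample(c(1, 0), 100, replace = TRUE)
female <- 1 - male
weight <- rnorm(100, 150, 35)
# True model: the weight slope is -5 for females and -5 + 2.5 = -2.5 for males
wage <- 25000 - 5 * weight + 1 * male + 2.5 * (male * weight) +
  rnorm(100, 0, 100)

# Fit the same model with each coding of the dummy
m01 <- lm(wage ~ male + weight + male * weight)
summary(m01)

m02 <- lm(wage ~ female + weight + female * weight)
summary(m02)

# Black points are females, red points are males; add each group's fitted line
plot(weight, wage, pch = 16, col = (male + 1))
lines(weight[female == 1], fitted(m01)[female == 1])
lines(weight[male == 1], fitted(m01)[male == 1], col = "red")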


The first regression using male is:



Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)  24995.6097    55.7790 448.118  < 2e-16 ***
male            83.2834    73.5968   1.132    0.261
weight          -5.0967     0.3627 -14.053  < 2e-16 ***
male:weight      2.0805     0.4723   4.405 2.75e-05 ***


The second regression using female is:



Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)   25078.8931    48.0124 522.342  < 2e-16 ***
female          -83.2834    73.5968  -1.132    0.261
weight           -3.0162     0.3026  -9.969  < 2e-16 ***
female:weight    -2.0805     0.4723  -4.405 2.75e-05 ***


Graphically, the relationship is:



[Figure: wage plotted against weight, with the fitted line for males in red and for females in black]



The red is males, and the black is females. In the first model, females only get the coefficient -5.0967, which is the slope of the black line. The slope of the red line has an adjustment of 2.0805, giving (-5.0967 + 2.0805) = -3.0162. The 2.0805 is then the "difference in slopes," a.k.a. the interaction. If both lines were parallel, the effect of weight on wage would be the same for both sexes.



Now, the second model uses female. The slope for males is -3.0162, which is just (-5.0967 + 2.0805) from above. The females' slope has a further adjustment of -2.0805 (notice the sign flip), ending up at (-3.0162 - 2.0805) = -5.0967.
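These relationships between the two codings can also be checked programmatically. The sketch below uses its own simulated data with an arbitrary seed (an assumption made for illustration, so the estimates will differ from the tables above) and compares the coefficients of the male-coded and female-coded fits.

```r
# Illustrative data; the seed and sample are assumptions, not the ones above.
set.seed(2)
n <- 100
male <- rbinom(n, 1, 0.5)
female <- 1 - male
weight <- rnorm(n, 150, 35)
wage <- 25000 - 5 * weight + 2.5 * male * weight + rnorm(n, 0, 100)

b_m <- coef(lm(wage ~ male * weight))    # females are the reference group
b_f <- coef(lm(wage ~ female * weight))  # males are the reference group

# The two interaction coefficients are mirror images of each other.
all.equal(unname(b_m["male:weight"]), -unname(b_f["female:weight"]))

# Male slope: baseline plus adjustment in model 1, baseline alone in model 2.
all.equal(unname(b_m["weight"] + b_m["male:weight"]), unname(b_f["weight"]))

# Female slope: baseline alone in model 1, baseline plus adjustment in model 2.
all.equal(unname(b_f["weight"] + b_f["female:weight"]), unname(b_m["weight"]))
```

Each comparison should come back `TRUE` up to floating-point tolerance, since both fits are reparametrizations of the same pair of lines.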



I hope this helps clarify that your question about the "effect of weight on female" is the same as the "absence of such effect of weight on male." Your proposed question sounds sensible, but to people who understand regression it is closer to a needless gesture: if males got a benefit, the females would of course suffer a penalty of the same magnitude.




























  • Thank you for your unbelievably clear and concise response! I realize now that I left out one additional assumption. Sounds weird, but bear with me. Let's say I only wanted to investigate how weight affects wage on females, but not males. Would it be possible to incorporate this in one equation? Or would I need a separate regression for each sex?
    – Mike
    Nov 8 at 16:52










  • @Mike, see the edits in the answer.
    – Penguin_Knight
    Nov 8 at 17:22










  • Ah, I'm sorry, your answer makes sense but didn't quite address my concern. I should have been clearer with my follow-up question. I'm interested in a scenario where I'm assuming height matters for males, but not females, and weight matters for females, but not males. Does that make sense? I know it's an unusual scenario, but it's a simplification of another analysis I'm working on.
    – Mike
    Nov 8 at 18:02










  • @Mike, "I'm assuming height matters for males, but not females, and weight matters for females, but not males." In that case, do not call that product term an "interaction," because it actually is not one. Also be aware that i) your model constrains the association very severely, and does so before fitting, so double-check with exploratory data analysis; ii) under your model, the conclusions that height matters for males and weight matters for females may not actually be true; they would appear only because you modeled them that way.
    – Penguin_Knight
    Nov 8 at 18:43
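The constrained model discussed in this last exchange can be written as a single regression. The sketch below is one possible way to set it up, on hypothetical simulated data that was deliberately generated to satisfy the assumption (height matters only for males, weight only for females). It also fits the unconstrained model so that the two zero-slope restrictions can be tested rather than taken for granted.

```r
# Hypothetical data built to satisfy the assumption; all numbers are made up.
set.seed(3)
n <- 200
male <- rbinom(n, 1, 0.5)
female <- 1 - male
height <- rnorm(n, 170, 10)
weight <- rnorm(n, 70, 12)
wage <- 20000 + 300 * male + 40 * male * height - 25 * female * weight +
  rnorm(n, 0, 200)

# Constrained model: no height slope for females, no weight slope for males.
fit_constrained <- lm(wage ~ male + male:height + female:weight)

# Unconstrained model: each sex gets its own height slope and weight slope.
fit_full <- lm(wage ~ male * height + male * weight)

# F-test of the two restrictions (female height slope = 0, male weight slope = 0).
anova(fit_constrained, fit_full)
```

On real data, a small p-value in the `anova()` comparison would suggest that the constraints are too severe, which is the exploratory check recommended in the comment above.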











Your Answer





StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
);
);
, "mathjax-editing");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "65"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f376020%2fcan-you-have-interaction-terms-for-both-sides-of-a-dummy-variable-in-a-single%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
11
down vote



accepted










To simplify the wording let's just call the variables male and female.



The main question aside, this is not a typical test for interaction. By specifying:



$$wage = beta_0 + beta_1 male + beta_2 male times height + epsilon$$



you are implicitly stating that height does not matter for female at all. Usually, a full interaction test should contain the variables that are used to compose the interaction:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + epsilon$$



That way, the males have:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + epsilon$$



And the females have:



$$wage = beta_0 + beta_2 height + epsilon$$



In your version, the female will only have the constant (intercept), which could likely be a wrong specification.




Back to the question about:




wage = sex+ sex* height + reverse_sex * weight + constant + error




The actual interaction tests should then be:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 female + beta_5 weight + beta_6 female times weight + epsilon$$



A couple points here. First, male and female are completely collinear so one of them will be omitted:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + beta_5 female times weight + epsilon$$



For males, these terms remain:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + epsilon$$



For females, these terms remain:



$$wage = beta_0 + beta_2 height + beta_4 weight + beta_5 female times weight + epsilon$$



So, it's technically fine, the $beta_5$ is still the extra "effect" of weight for female.



Second, this is unnecessarily complicating everything because your proposed model:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + beta_5 female times weight + epsilon$$



is essentially the same as:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + beta_5 maletimes weight + epsilon$$



The $beta_5$ will likely flip sign, but the magnitude is the same. It's basically the difference in slopes between males and females. If males' slope is $a$ smaller than females'; females' slope is $a$ bigger than males'. You'll also find the t-statistic will also flip sign, but p-values are the same. There is no need to split hair here.





Let's say I only wanted to investigate how weight affects wage on
females, but not males. Would it be possible to incorporate this in
one equation? Or would I need a separate regression for each sex?




So, let's just actually show it:



set.seed(81226)

male <- sample(c(1,0), 100, replace=T)
female <- 1 - male
weight <- rnorm(100, 150, 35)
wage <- 25000 - 5 * weight + 1 * male + 2.5 * (male * weight) +
rnorm(100, 0, 100)

m01 <- lm(wage ~ male + weight + male*weight)
summary(m01)

m02<- lm(wage ~ female + weight + female*weight)
summary(m02)

plot(weight, wage, pch=16, col=(male+1))
lines(weight[female==1], m01$fitted[female==1])
lines(weight[male==1], m01$
fitted[male==1], col="red")


The first regression using male is:



Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 24995.6097 55.7790 448.118 < 2e-16 ***
male 83.2834 73.5968 1.132 0.261
weight -5.0967 0.3627 -14.053 < 2e-16 ***
male:weight 2.0805 0.4723 4.405 2.75e-05 ***


The second regression using female is:



Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 25078.8931 48.0124 522.342 < 2e-16 ***
female -83.2834 73.5968 -1.132 0.261
weight -3.0162 0.3026 -9.969 < 2e-16 ***
female:weight -2.0805 0.4723 -4.405 2.75e-05 ***


Graphically, the relationship is:



enter image description here



The red is males, and the black is female. In the first model, female only got the coefficient -5.0967, that is the slope of the black line. The slope of the red line has an adjustment of 2.0805, which is (-5.0967 + 2.0805). The 2.0805 is then the "difference in slopes," aka, the interaction. If both lines are parallel, effect of weight on wage is the same for both sex.



Now, the second model uses female. The slope for males is -3.0162, which is actually just (-5.0967 + 2.0805) from above. The females' slope has a further adjustment of -2.0805 (notice the sign flip), ending up with -5.0967.



I hope this helps clarifying that your question "effect of weight on female" is the same as "absence of such effect of weight on male." Your proposed question sounds making sense, but to people who understand regression it is closer to a needless gesture: if males got a benefit, the females would of course suffer from the same magnitude of penalty.






share|cite|improve this answer






















  • Thank you for your unbelievably clear and concise response! I realize now that I left out one additional assumption. Sounds weird, but bear with me. Let's say I only wanted to investigate how weight affects wage on females, but not males. Would it be possible to incorporate this in one equation? Or would I need a separate regression for each sex?
    – Mike
    Nov 8 at 16:52










  • @Mike, see the edits in the answer.
    – Penguin_Knight
    Nov 8 at 17:22










  • Ah, I'm sorry your answer makes sense, but didn't quite address my concern. I should have been more clear with my follow-up question. I'm interested in a scenario where I'm assuming height matters for males, but not females, and weight matters for females, but not males. Does that make sense? I know it's an unusual scenario, but it's a simplification of another analysis I'm working on.
    – Mike
    Nov 8 at 18:02










  • @Mike, " I'm assuming height matters for males, but not females, and weight matters for females, but not males." In that case, do not call that product term "interaction" because it's actually not. And also be aware that i) your model constraints the association very severely and before fitting so, double check with exploratory data analysis; ii) according to your model, height matters to male and female matters to female may not actually be true, it's just because you model them that way.
    – Penguin_Knight
    Nov 8 at 18:43















up vote
11
down vote



accepted










To simplify the wording let's just call the variables male and female.



The main question aside, this is not a typical test for interaction. By specifying:



$$wage = beta_0 + beta_1 male + beta_2 male times height + epsilon$$



you are implicitly stating that height does not matter for female at all. Usually, a full interaction test should contain the variables that are used to compose the interaction:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + epsilon$$



That way, the males have:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + epsilon$$



And the females have:



$$wage = beta_0 + beta_2 height + epsilon$$



In your version, the female will only have the constant (intercept), which could likely be a wrong specification.




Back to the question about:




wage = sex+ sex* height + reverse_sex * weight + constant + error




The actual interaction tests should then be:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 female + beta_5 weight + beta_6 female times weight + epsilon$$



A couple points here. First, male and female are completely collinear so one of them will be omitted:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + beta_5 female times weight + epsilon$$



For males, these terms remain:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + epsilon$$



For females, these terms remain:



$$wage = beta_0 + beta_2 height + beta_4 weight + beta_5 female times weight + epsilon$$



So, it's technically fine, the $beta_5$ is still the extra "effect" of weight for female.



Second, this is unnecessarily complicating everything because your proposed model:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + beta_5 female times weight + epsilon$$



is essentially the same as:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + beta_5 maletimes weight + epsilon$$



The $beta_5$ will likely flip sign, but the magnitude is the same. It's basically the difference in slopes between males and females. If males' slope is $a$ smaller than females'; females' slope is $a$ bigger than males'. You'll also find the t-statistic will also flip sign, but p-values are the same. There is no need to split hair here.





Let's say I only wanted to investigate how weight affects wage on
females, but not males. Would it be possible to incorporate this in
one equation? Or would I need a separate regression for each sex?




So, let's just actually show it:



set.seed(81226)

male <- sample(c(1,0), 100, replace=T)
female <- 1 - male
weight <- rnorm(100, 150, 35)
wage <- 25000 - 5 * weight + 1 * male + 2.5 * (male * weight) +
rnorm(100, 0, 100)

m01 <- lm(wage ~ male + weight + male*weight)
summary(m01)

m02<- lm(wage ~ female + weight + female*weight)
summary(m02)

plot(weight, wage, pch=16, col=(male+1))
lines(weight[female==1], m01$fitted[female==1])
lines(weight[male==1], m01$
fitted[male==1], col="red")


The first regression using male is:



Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 24995.6097 55.7790 448.118 < 2e-16 ***
male 83.2834 73.5968 1.132 0.261
weight -5.0967 0.3627 -14.053 < 2e-16 ***
male:weight 2.0805 0.4723 4.405 2.75e-05 ***


The second regression using female is:



Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 25078.8931 48.0124 522.342 < 2e-16 ***
female -83.2834 73.5968 -1.132 0.261
weight -3.0162 0.3026 -9.969 < 2e-16 ***
female:weight -2.0805 0.4723 -4.405 2.75e-05 ***


Graphically, the relationship is:



enter image description here



The red is males, and the black is female. In the first model, female only got the coefficient -5.0967, that is the slope of the black line. The slope of the red line has an adjustment of 2.0805, which is (-5.0967 + 2.0805). The 2.0805 is then the "difference in slopes," aka, the interaction. If both lines are parallel, effect of weight on wage is the same for both sex.



Now, the second model uses female. The slope for males is -3.0162, which is actually just (-5.0967 + 2.0805) from above. The females' slope has a further adjustment of -2.0805 (notice the sign flip), ending up with -5.0967.



I hope this helps clarifying that your question "effect of weight on female" is the same as "absence of such effect of weight on male." Your proposed question sounds making sense, but to people who understand regression it is closer to a needless gesture: if males got a benefit, the females would of course suffer from the same magnitude of penalty.






share|cite|improve this answer






















  • Thank you for your unbelievably clear and concise response! I realize now that I left out one additional assumption. Sounds weird, but bear with me. Let's say I only wanted to investigate how weight affects wage on females, but not males. Would it be possible to incorporate this in one equation? Or would I need a separate regression for each sex?
    – Mike
    Nov 8 at 16:52










  • @Mike, see the edits in the answer.
    – Penguin_Knight
    Nov 8 at 17:22










  • Ah, I'm sorry your answer makes sense, but didn't quite address my concern. I should have been more clear with my follow-up question. I'm interested in a scenario where I'm assuming height matters for males, but not females, and weight matters for females, but not males. Does that make sense? I know it's an unusual scenario, but it's a simplification of another analysis I'm working on.
    – Mike
    Nov 8 at 18:02










  • @Mike, " I'm assuming height matters for males, but not females, and weight matters for females, but not males." In that case, do not call that product term "interaction" because it's actually not. And also be aware that i) your model constraints the association very severely and before fitting so, double check with exploratory data analysis; ii) according to your model, height matters to male and female matters to female may not actually be true, it's just because you model them that way.
    – Penguin_Knight
    Nov 8 at 18:43













up vote
11
down vote



accepted







up vote
11
down vote



accepted






To simplify the wording let's just call the variables male and female.



The main question aside, this is not a typical test for interaction. By specifying:



$$wage = beta_0 + beta_1 male + beta_2 male times height + epsilon$$



you are implicitly stating that height does not matter for female at all. Usually, a full interaction test should contain the variables that are used to compose the interaction:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + epsilon$$



That way, the males have:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + epsilon$$



And the females have:



$$wage = beta_0 + beta_2 height + epsilon$$



In your version, the female will only have the constant (intercept), which could likely be a wrong specification.




Back to the question about:




wage = sex+ sex* height + reverse_sex * weight + constant + error




The actual interaction tests should then be:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 female + beta_5 weight + beta_6 female times weight + epsilon$$



A couple points here. First, male and female are completely collinear so one of them will be omitted:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + beta_5 female times weight + epsilon$$



For males, these terms remain:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + epsilon$$



For females, these terms remain:



$$wage = beta_0 + beta_2 height + beta_4 weight + beta_5 female times weight + epsilon$$



So, it's technically fine, the $beta_5$ is still the extra "effect" of weight for female.



Second, this is unnecessarily complicating everything because your proposed model:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + beta_5 female times weight + epsilon$$



is essentially the same as:



$$wage = beta_0 + beta_1 male + beta_2 height + beta_3 male times height + beta_4 weight + beta_5 maletimes weight + epsilon$$



The $beta_5$ will likely flip sign, but the magnitude is the same. It's basically the difference in slopes between males and females. If males' slope is $a$ smaller than females'; females' slope is $a$ bigger than males'. You'll also find the t-statistic will also flip sign, but p-values are the same. There is no need to split hair here.





Let's say I only wanted to investigate how weight affects wage on
females, but not males. Would it be possible to incorporate this in
one equation? Or would I need a separate regression for each sex?




So, let's just actually show it:



set.seed(81226)

male <- sample(c(1,0), 100, replace=T)
female <- 1 - male
weight <- rnorm(100, 150, 35)
wage <- 25000 - 5 * weight + 1 * male + 2.5 * (male * weight) +
rnorm(100, 0, 100)

m01 <- lm(wage ~ male + weight + male*weight)
summary(m01)

m02<- lm(wage ~ female + weight + female*weight)
summary(m02)

plot(weight, wage, pch=16, col=(male+1))
lines(weight[female==1], m01$fitted[female==1])
lines(weight[male==1], m01$
fitted[male==1], col="red")


The first regression using male is:



Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 24995.6097    55.7790 448.118  < 2e-16 ***
male           83.2834    73.5968   1.132    0.261
weight         -5.0967     0.3627 -14.053  < 2e-16 ***
male:weight     2.0805     0.4723   4.405 2.75e-05 ***


The second regression using female is:



Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)   25078.8931    48.0124 522.342  < 2e-16 ***
female          -83.2834    73.5968  -1.132    0.261
weight           -3.0162     0.3026  -9.969  < 2e-16 ***
female:weight    -2.0805     0.4723  -4.405 2.75e-05 ***


Graphically, the relationship is:



[Figure: scatter plot of wage against weight, with the fitted line for males in red and for females in black]



Red is males and black is females. In the first model, females get only the coefficient -5.0967, which is the slope of the black line. The slope of the red line adds an adjustment of 2.0805, giving -5.0967 + 2.0805 = -3.0162. The 2.0805 is therefore the "difference in slopes," a.k.a. the interaction. If the two lines were parallel, the effect of weight on wage would be the same for both sexes.



Now, the second model uses female. The slope for males is -3.0162, which is just (-5.0967 + 2.0805) from above. The females' slope gets a further adjustment of -2.0805 (notice the sign flip), ending up at -5.0967.
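The equivalence of the two codings can also be checked numerically rather than by reading the tables. The following is a minimal sketch that re-creates the simulated data with the same seed; the object names `b1`, `b2`, `same_fit`, `flipped` and `male_slope_matches` are introduced here purely for illustration.

```r
# Re-create the simulated data (same seed and same order of random draws)
set.seed(81226)
male   <- sample(c(1, 0), 100, replace = TRUE)
female <- 1 - male
weight <- rnorm(100, 150, 35)
wage   <- 25000 - 5 * weight + 1 * male + 2.5 * (male * weight) +
  rnorm(100, 0, 100)

m01 <- lm(wage ~ male * weight)    # reference group: females
m02 <- lm(wage ~ female * weight)  # reference group: males

b1 <- unname(coef(m01))  # intercept, male, weight, male:weight
b2 <- unname(coef(m02))  # intercept, female, weight, female:weight

# Both codings produce the same fitted values
same_fit <- isTRUE(all.equal(fitted(m01), fitted(m02)))

# The interaction coefficients are equal in size and opposite in sign
flipped <- isTRUE(all.equal(b1[4], -b2[4]))

# The males' slope from m01 (weight + male:weight) is the weight slope in m02
male_slope_matches <- isTRUE(all.equal(b1[3] + b1[4], b2[3]))

print(c(same_fit = same_fit, flipped = flipped,
        male_slope_matches = male_slope_matches))
```

Because the two models are reparameterizations of each other, all three checks hold up to floating-point tolerance for any data set, not just this simulated one.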



I hope this helps clarify that your question about the "effect of weight on females" is the same as the "absence of such an effect of weight on males." The proposed question sounds sensible, but to people who are familiar with regression it is closer to a needless gesture: if males got a benefit, the females would of course suffer a penalty of the same magnitude.






edited Nov 8 at 17:44

























answered Nov 8 at 16:23









Penguin_Knight












  • Thank you for your unbelievably clear and concise response! I realize now that I left out one additional assumption. Sounds weird, but bear with me. Let's say I only wanted to investigate how weight affects wage on females, but not males. Would it be possible to incorporate this in one equation? Or would I need a separate regression for each sex?
    – Mike
    Nov 8 at 16:52










  • @Mike, see the edits in the answer.
    – Penguin_Knight
    Nov 8 at 17:22










  • Ah, I'm sorry your answer makes sense, but didn't quite address my concern. I should have been more clear with my follow-up question. I'm interested in a scenario where I'm assuming height matters for males, but not females, and weight matters for females, but not males. Does that make sense? I know it's an unusual scenario, but it's a simplification of another analysis I'm working on.
    – Mike
    Nov 8 at 18:02










  • @Mike, "I'm assuming height matters for males, but not females, and weight matters for females, but not males." In that case, do not call that product term an "interaction," because it actually is not one. Also be aware that i) your model constrains the association very severely, and does so before fitting, so double-check with exploratory data analysis; ii) under your model, "height matters for males and weight matters for females" may not actually be true; it appears that way only because you modeled it that way.
    – Penguin_Knight
    Nov 8 at 18:43
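One way to act on the "double-check" advice in the comment above is to fit the constrained specification and compare it against the model in which every slope may differ by sex. The sketch below is an illustration only: the data are simulated, the seed, sample size and coefficient values are arbitrary assumptions, and the names `restricted`, `full` and `cmp` are hypothetical.

```r
set.seed(1)  # arbitrary seed; all data below are simulated for illustration
n      <- 200
male   <- rbinom(n, 1, 0.5)
female <- 1 - male
height <- rnorm(n, 170, 10)
weight <- rnorm(n, 150, 35)

# Assumed truth: height matters only for males, weight only for females
wage <- 25000 + 500 * male + 20 * male * height - 5 * female * weight +
  rnorm(n, 0, 100)

# Constrained model: no height slope for females, no weight slope for males
restricted <- lm(wage ~ male + male:height + female:weight)

# Unconstrained model: height and weight slopes may both differ by sex
full <- lm(wage ~ male * (height + weight))

# F-test of the two zero-slope constraints imposed by the restricted model
cmp <- anova(restricted, full)
print(cmp)
```

The restricted model is nested in the full one, so a small p-value in this comparison would suggest that the zero-slope constraints are too strong for the data at hand.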



































 
