Calculating means from tall data or wide data in R [duplicate]
up vote
1
down vote
favorite
This question already has an answer here:
Calculate the mean by group
3 answers
I'm a beginner-intermediate R user that started learning R for laboratory research a few months ago. Thanks for your patience---especially if this ends up being a really stupid simple problem.
Problem
The tables as a reproducible example
The following code generates tables similar to my set, first as tall data, second as wide data.
library(tibble)
#> Warning: package 'tibble' was built under R version 3.4.4
library(tidyr)
#> Warning: package 'tidyr' was built under R version 3.4.4
tall <- tibble(X=c(3999.387, 3999.387, 3999.387,
3999.066, 3999.066, 3999.066,
3998.745, 3998.745, 3998.745,
3998.423, 3998.423, 3998.423,
3998.102, 3998.102, 3998.102),
Y=rnorm(15, mean=2, sd=1),
S=c("s1","s2","s3","s1","s2","s3","s1","s2","s3","s1","s2","s3","s1","s2","s3"))
head(tall)
#> # A tibble: 6 x 3
#> X Y S
#> <dbl> <dbl> <chr>
#> 1 3999. 3.07 s1
#> 2 3999. 1.81 s2
#> 3 3999. 4.02 s3
#> 4 3999. 1.21 s1
#> 5 3999. 0.771 s2
#> 6 3999. 2.39 s3
wide <- spread(tall,X,Y)
head(wide)
#> # A tibble: 3 x 6
#> S `3998.102` `3998.423` `3998.745` `3999.066` `3999.387`
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 s1 0.454 1.50 1.84 1.21 3.07
#> 2 s2 2.04 0.392 1.50 0.771 1.81
#> 3 s3 1.38 0.992 0.790 2.39 4.02
Created on 2018-11-08 by the reprex package (v0.2.1)
In the tall version, each unique value of X
gets repeated for however many unique values of S
there are. There are 5 unique X
and 3 unique S
. This is much more apparent in the wide data. In my real set I have 8010 unique X
and 312 unique S
. The tall data is nice because I can easily plot X
vs Y
and get one plotted line for each S
.
The Question
What if I want to average all of the Y
s at each unique value of X
? It would look like this:
> # A tibble: 5 x 2
> X Y
> <dbl> <dbl>
> 1 3998.102 2.29
> 2 3998.423 1.63
> 3 3999.745 1.36
> 4 3999.066 1.66
> 5 3999.387 1.33
In this case I used the wide table, calculated the mean of each X
column, and then manually constructed a new table.
Can I do this with map()
functions from purrr
? The documentation was confusing, probably because I have never used lapply()
functions before.
Thanks for reading. I have a feeling this is really simple for most experienced users.
r dplyr reshape lapply purrr
marked as duplicate by Gregor
StackExchange.ready(function()
if (StackExchange.options.isMobile) return;
$('.dupe-hammer-message-hover:not(.hover-bound)').each(function()
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');
$hover.hover(
function()
$hover.showInfoMessage('',
messageElement: $msg.clone().show(),
transient: false,
position: my: 'bottom left', at: 'top center', offsetTop: -7 ,
dismissable: false,
relativeToBody: true
);
,
function()
StackExchange.helpers.removeMessages();
);
);
);
Nov 8 at 18:26
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
add a comment |
up vote
1
down vote
favorite
This question already has an answer here:
Calculate the mean by group
3 answers
I'm a beginner-intermediate R user that started learning R for laboratory research a few months ago. Thanks for your patience---especially if this ends up being a really stupid simple problem.
Problem
The tables as a reproducible example
The following code generates tables similar to my set, first as tall data, second as wide data.
library(tibble)
#> Warning: package 'tibble' was built under R version 3.4.4
library(tidyr)
#> Warning: package 'tidyr' was built under R version 3.4.4
tall <- tibble(X=c(3999.387, 3999.387, 3999.387,
3999.066, 3999.066, 3999.066,
3998.745, 3998.745, 3998.745,
3998.423, 3998.423, 3998.423,
3998.102, 3998.102, 3998.102),
Y=rnorm(15, mean=2, sd=1),
S=c("s1","s2","s3","s1","s2","s3","s1","s2","s3","s1","s2","s3","s1","s2","s3"))
head(tall)
#> # A tibble: 6 x 3
#> X Y S
#> <dbl> <dbl> <chr>
#> 1 3999. 3.07 s1
#> 2 3999. 1.81 s2
#> 3 3999. 4.02 s3
#> 4 3999. 1.21 s1
#> 5 3999. 0.771 s2
#> 6 3999. 2.39 s3
wide <- spread(tall,X,Y)
head(wide)
#> # A tibble: 3 x 6
#> S `3998.102` `3998.423` `3998.745` `3999.066` `3999.387`
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 s1 0.454 1.50 1.84 1.21 3.07
#> 2 s2 2.04 0.392 1.50 0.771 1.81
#> 3 s3 1.38 0.992 0.790 2.39 4.02
Created on 2018-11-08 by the reprex package (v0.2.1)
In the tall version, each unique value of X
gets repeated for however many unique values of S
there are. There are 5 unique X
and 3 unique S
. This is much more apparent in the wide data. In my real set I have 8010 unique X
and 312 unique S
. The tall data is nice because I can easily plot X
vs Y
and get one plotted line for each S
.
The Question
What if I want to average all of the Y
s at each unique value of X
? It would look like this:
> # A tibble: 5 x 2
> X Y
> <dbl> <dbl>
> 1 3998.102 2.29
> 2 3998.423 1.63
> 3 3999.745 1.36
> 4 3999.066 1.66
> 5 3999.387 1.33
In this case I used the wide table, calculated the mean of each X
column, and then manually constructed a new table.
Can I do this with map()
functions from purrr
? The documentation was confusing, probably because I have never used lapply()
functions before.
Thanks for reading. I have a feeling this is really simple for most experienced users.
r dplyr reshape lapply purrr
marked as duplicate by Gregor
StackExchange.ready(function()
if (StackExchange.options.isMobile) return;
$('.dupe-hammer-message-hover:not(.hover-bound)').each(function()
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');
$hover.hover(
function()
$hover.showInfoMessage('',
messageElement: $msg.clone().show(),
transient: false,
position: my: 'bottom left', at: 'top center', offsetTop: -7 ,
dismissable: false,
relativeToBody: true
);
,
function()
StackExchange.helpers.removeMessages();
);
);
);
Nov 8 at 18:26
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
1
Ok, the question is well posed, but maybe way too long. See if this gets you what you want:library(dplyr); tall %>% group_by(X) %>% summarise(mean_y = mean(Y))
– RLave
Nov 8 at 16:26
If this is correct I'll add it as an answer and explain it in detail, I just wanted to make sure that's what you need.
– RLave
Nov 8 at 16:27
2
Try to remove everything in eccess in the question, it'll help others too.
– RLave
Nov 8 at 16:28
Thanks, I'll adapt it your solution to my large set and see if it works. It seems like it might. I've also shorted the post. thanks, it was definitely too long.
– M. L.
Nov 8 at 16:31
1
Just tried this on my large data set and it works! Thanks very much for your response!
– M. L.
Nov 8 at 18:09
add a comment |
up vote
1
down vote
favorite
up vote
1
down vote
favorite
This question already has an answer here:
Calculate the mean by group
3 answers
I'm a beginner-intermediate R user that started learning R for laboratory research a few months ago. Thanks for your patience---especially if this ends up being a really stupid simple problem.
Problem
The tables as a reproducible example
The following code generates tables similar to my set, first as tall data, second as wide data.
library(tibble)
#> Warning: package 'tibble' was built under R version 3.4.4
library(tidyr)
#> Warning: package 'tidyr' was built under R version 3.4.4
tall <- tibble(X=c(3999.387, 3999.387, 3999.387,
3999.066, 3999.066, 3999.066,
3998.745, 3998.745, 3998.745,
3998.423, 3998.423, 3998.423,
3998.102, 3998.102, 3998.102),
Y=rnorm(15, mean=2, sd=1),
S=c("s1","s2","s3","s1","s2","s3","s1","s2","s3","s1","s2","s3","s1","s2","s3"))
head(tall)
#> # A tibble: 6 x 3
#> X Y S
#> <dbl> <dbl> <chr>
#> 1 3999. 3.07 s1
#> 2 3999. 1.81 s2
#> 3 3999. 4.02 s3
#> 4 3999. 1.21 s1
#> 5 3999. 0.771 s2
#> 6 3999. 2.39 s3
wide <- spread(tall,X,Y)
head(wide)
#> # A tibble: 3 x 6
#> S `3998.102` `3998.423` `3998.745` `3999.066` `3999.387`
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 s1 0.454 1.50 1.84 1.21 3.07
#> 2 s2 2.04 0.392 1.50 0.771 1.81
#> 3 s3 1.38 0.992 0.790 2.39 4.02
Created on 2018-11-08 by the reprex package (v0.2.1)
In the tall version, each unique value of X
gets repeated for however many unique values of S
there are. There are 5 unique X
and 3 unique S
. This is much more apparent in the wide data. In my real set I have 8010 unique X
and 312 unique S
. The tall data is nice because I can easily plot X
vs Y
and get one plotted line for each S
.
The Question
What if I want to average all of the Y
s at each unique value of X
? It would look like this:
> # A tibble: 5 x 2
> X Y
> <dbl> <dbl>
> 1 3998.102 2.29
> 2 3998.423 1.63
> 3 3999.745 1.36
> 4 3999.066 1.66
> 5 3999.387 1.33
In this case I used the wide table, calculated the mean of each X
column, and then manually constructed a new table.
Can I do this with map()
functions from purrr
? The documentation was confusing, probably because I have never used lapply()
functions before.
Thanks for reading. I have a feeling this is really simple for most experienced users.
r dplyr reshape lapply purrr
This question already has an answer here:
Calculate the mean by group
3 answers
I'm a beginner-intermediate R user that started learning R for laboratory research a few months ago. Thanks for your patience---especially if this ends up being a really stupid simple problem.
Problem
The tables as a reproducible example
The following code generates tables similar to my set, first as tall data, second as wide data.
library(tibble)
#> Warning: package 'tibble' was built under R version 3.4.4
library(tidyr)
#> Warning: package 'tidyr' was built under R version 3.4.4
tall <- tibble(X=c(3999.387, 3999.387, 3999.387,
3999.066, 3999.066, 3999.066,
3998.745, 3998.745, 3998.745,
3998.423, 3998.423, 3998.423,
3998.102, 3998.102, 3998.102),
Y=rnorm(15, mean=2, sd=1),
S=c("s1","s2","s3","s1","s2","s3","s1","s2","s3","s1","s2","s3","s1","s2","s3"))
head(tall)
#> # A tibble: 6 x 3
#> X Y S
#> <dbl> <dbl> <chr>
#> 1 3999. 3.07 s1
#> 2 3999. 1.81 s2
#> 3 3999. 4.02 s3
#> 4 3999. 1.21 s1
#> 5 3999. 0.771 s2
#> 6 3999. 2.39 s3
wide <- spread(tall,X,Y)
head(wide)
#> # A tibble: 3 x 6
#> S `3998.102` `3998.423` `3998.745` `3999.066` `3999.387`
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 s1 0.454 1.50 1.84 1.21 3.07
#> 2 s2 2.04 0.392 1.50 0.771 1.81
#> 3 s3 1.38 0.992 0.790 2.39 4.02
Created on 2018-11-08 by the reprex package (v0.2.1)
In the tall version, each unique value of X
gets repeated for however many unique values of S
there are. There are 5 unique X
and 3 unique S
. This is much more apparent in the wide data. In my real set I have 8010 unique X
and 312 unique S
. The tall data is nice because I can easily plot X
vs Y
and get one plotted line for each S
.
The Question
What if I want to average all of the Y
s at each unique value of X
? It would look like this:
> # A tibble: 5 x 2
> X Y
> <dbl> <dbl>
> 1 3998.102 2.29
> 2 3998.423 1.63
> 3 3999.745 1.36
> 4 3999.066 1.66
> 5 3999.387 1.33
In this case I used the wide table, calculated the mean of each X
column, and then manually constructed a new table.
Can I do this with map()
functions from purrr
? The documentation was confusing, probably because I have never used lapply()
functions before.
Thanks for reading. I have a feeling this is really simple for most experienced users.
This question already has an answer here:
Calculate the mean by group
3 answers
r dplyr reshape lapply purrr
r dplyr reshape lapply purrr
edited Nov 8 at 18:05
asked Nov 8 at 16:14
M. L.
156
156
marked as duplicate by Gregor
StackExchange.ready(function()
if (StackExchange.options.isMobile) return;
$('.dupe-hammer-message-hover:not(.hover-bound)').each(function()
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');
$hover.hover(
function()
$hover.showInfoMessage('',
messageElement: $msg.clone().show(),
transient: false,
position: my: 'bottom left', at: 'top center', offsetTop: -7 ,
dismissable: false,
relativeToBody: true
);
,
function()
StackExchange.helpers.removeMessages();
);
);
);
Nov 8 at 18:26
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
marked as duplicate by Gregor
StackExchange.ready(function()
if (StackExchange.options.isMobile) return;
$('.dupe-hammer-message-hover:not(.hover-bound)').each(function()
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');
$hover.hover(
function()
$hover.showInfoMessage('',
messageElement: $msg.clone().show(),
transient: false,
position: my: 'bottom left', at: 'top center', offsetTop: -7 ,
dismissable: false,
relativeToBody: true
);
,
function()
StackExchange.helpers.removeMessages();
);
);
);
Nov 8 at 18:26
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
1
Ok, the question is well posed, but maybe way too long. See if this gets you what you want:library(dplyr); tall %>% group_by(X) %>% summarise(mean_y = mean(Y))
– RLave
Nov 8 at 16:26
If this is correct I'll add it as an answer and explain it in detail, I just wanted to make sure that's what you need.
– RLave
Nov 8 at 16:27
2
Try to remove everything in eccess in the question, it'll help others too.
– RLave
Nov 8 at 16:28
Thanks, I'll adapt it your solution to my large set and see if it works. It seems like it might. I've also shorted the post. thanks, it was definitely too long.
– M. L.
Nov 8 at 16:31
1
Just tried this on my large data set and it works! Thanks very much for your response!
– M. L.
Nov 8 at 18:09
add a comment |
1
Ok, the question is well posed, but maybe way too long. See if this gets you what you want:library(dplyr); tall %>% group_by(X) %>% summarise(mean_y = mean(Y))
– RLave
Nov 8 at 16:26
If this is correct I'll add it as an answer and explain it in detail, I just wanted to make sure that's what you need.
– RLave
Nov 8 at 16:27
2
Try to remove everything in eccess in the question, it'll help others too.
– RLave
Nov 8 at 16:28
Thanks, I'll adapt it your solution to my large set and see if it works. It seems like it might. I've also shorted the post. thanks, it was definitely too long.
– M. L.
Nov 8 at 16:31
1
Just tried this on my large data set and it works! Thanks very much for your response!
– M. L.
Nov 8 at 18:09
1
1
Ok, the question is well posed, but maybe way too long. See if this gets you what you want:
library(dplyr); tall %>% group_by(X) %>% summarise(mean_y = mean(Y))
– RLave
Nov 8 at 16:26
Ok, the question is well posed, but maybe way too long. See if this gets you what you want:
library(dplyr); tall %>% group_by(X) %>% summarise(mean_y = mean(Y))
– RLave
Nov 8 at 16:26
If this is correct I'll add it as an answer and explain it in detail, I just wanted to make sure that's what you need.
– RLave
Nov 8 at 16:27
If this is correct I'll add it as an answer and explain it in detail, I just wanted to make sure that's what you need.
– RLave
Nov 8 at 16:27
2
2
Try to remove everything in eccess in the question, it'll help others too.
– RLave
Nov 8 at 16:28
Try to remove everything in eccess in the question, it'll help others too.
– RLave
Nov 8 at 16:28
Thanks, I'll adapt it your solution to my large set and see if it works. It seems like it might. I've also shorted the post. thanks, it was definitely too long.
– M. L.
Nov 8 at 16:31
Thanks, I'll adapt it your solution to my large set and see if it works. It seems like it might. I've also shorted the post. thanks, it was definitely too long.
– M. L.
Nov 8 at 16:31
1
1
Just tried this on my large data set and it works! Thanks very much for your response!
– M. L.
Nov 8 at 18:09
Just tried this on my large data set and it works! Thanks very much for your response!
– M. L.
Nov 8 at 18:09
add a comment |
1 Answer
1
active
oldest
votes
up vote
1
down vote
accepted
What you're looking for is the dplyr
package, which is at the core of the tidyverse. I'll show you how to achieve what you're trying to do with it, but there are tons of tutorials for it on-line, and it's really quite straightforward once you understand how to use it.
require(dplyr)
group_by(tall,X) %>%
summarize(meanY=mean(Y))
First, you can tell dplyr to do everything you want as if your data is broken into seperate data.frames based on the grouping column, in this case X.
Also, note that with dplyr you can "pipe" commands using %>%
, which means the result of one command will be transferred to next one as its first argument, so you don't need to assign it every time or nest all your commands.
The second line creates a new table, where for every group (based on its X), it calculates the mean
of all the Y
s. The result is this:
# A tibble: 5 x 2
X meanY
<dbl> <dbl>
1 3998. 0.781
2 3998. 1.81
3 3999. 1.37
4 3999. 2.01
5 3999. 2.02
And that's it. You're done. It's really powerful, simple and easy to learn.
Another package you can use is data.table
, but I find that it's powerfulness and conciseness comes at the expense of being a lot harder to learn (for me, anyway). It may require more lines to do things with dplyr, but it's easier for me to puzzle through the steps I need to take to achieve anything.
Good luck!
Wow. This was way easier than how I was trying to do it. You and @RLave had the same solution. I have useddplyr
before forfilter()
andselect()
, but did not know aboutgroup_by() %>% summarise()
. Thank you so much for your response.
– M. L.
Nov 8 at 18:13
Great. Happy I could help. Look more atsummarise
, as well asmutate
(to create new variables through calculation), and the rest of dplyr's tools. They're very useful and quite simple to understand. Don't forget to upvote and and accept!
– iod
Nov 8 at 18:15
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
accepted
What you're looking for is the dplyr
package, which is at the core of the tidyverse. I'll show you how to achieve what you're trying to do with it, but there are tons of tutorials for it on-line, and it's really quite straightforward once you understand how to use it.
require(dplyr)
group_by(tall,X) %>%
summarize(meanY=mean(Y))
First, you can tell dplyr to do everything you want as if your data is broken into seperate data.frames based on the grouping column, in this case X.
Also, note that with dplyr you can "pipe" commands using %>%
, which means the result of one command will be transferred to next one as its first argument, so you don't need to assign it every time or nest all your commands.
The second line creates a new table, where for every group (based on its X), it calculates the mean
of all the Y
s. The result is this:
# A tibble: 5 x 2
X meanY
<dbl> <dbl>
1 3998. 0.781
2 3998. 1.81
3 3999. 1.37
4 3999. 2.01
5 3999. 2.02
And that's it. You're done. It's really powerful, simple and easy to learn.
Another package you can use is data.table
, but I find that it's powerfulness and conciseness comes at the expense of being a lot harder to learn (for me, anyway). It may require more lines to do things with dplyr, but it's easier for me to puzzle through the steps I need to take to achieve anything.
Good luck!
Wow. This was way easier than how I was trying to do it. You and @RLave had the same solution. I have useddplyr
before forfilter()
andselect()
, but did not know aboutgroup_by() %>% summarise()
. Thank you so much for your response.
– M. L.
Nov 8 at 18:13
Great. Happy I could help. Look more atsummarise
, as well asmutate
(to create new variables through calculation), and the rest of dplyr's tools. They're very useful and quite simple to understand. Don't forget to upvote and and accept!
– iod
Nov 8 at 18:15
add a comment |
up vote
1
down vote
accepted
What you're looking for is the dplyr
package, which is at the core of the tidyverse. I'll show you how to achieve what you're trying to do with it, but there are tons of tutorials for it on-line, and it's really quite straightforward once you understand how to use it.
require(dplyr)
group_by(tall,X) %>%
summarize(meanY=mean(Y))
First, you can tell dplyr to do everything you want as if your data is broken into seperate data.frames based on the grouping column, in this case X.
Also, note that with dplyr you can "pipe" commands using %>%
, which means the result of one command will be transferred to next one as its first argument, so you don't need to assign it every time or nest all your commands.
The second line creates a new table, where for every group (based on its X), it calculates the mean
of all the Y
s. The result is this:
# A tibble: 5 x 2
X meanY
<dbl> <dbl>
1 3998. 0.781
2 3998. 1.81
3 3999. 1.37
4 3999. 2.01
5 3999. 2.02
And that's it. You're done. It's really powerful, simple and easy to learn.
Another package you can use is data.table
, but I find that it's powerfulness and conciseness comes at the expense of being a lot harder to learn (for me, anyway). It may require more lines to do things with dplyr, but it's easier for me to puzzle through the steps I need to take to achieve anything.
Good luck!
Wow. This was way easier than how I was trying to do it. You and @RLave had the same solution. I have useddplyr
before forfilter()
andselect()
, but did not know aboutgroup_by() %>% summarise()
. Thank you so much for your response.
– M. L.
Nov 8 at 18:13
Great. Happy I could help. Look more atsummarise
, as well asmutate
(to create new variables through calculation), and the rest of dplyr's tools. They're very useful and quite simple to understand. Don't forget to upvote and and accept!
– iod
Nov 8 at 18:15
add a comment |
up vote
1
down vote
accepted
up vote
1
down vote
accepted
What you're looking for is the dplyr
package, which is at the core of the tidyverse. I'll show you how to achieve what you're trying to do with it, but there are tons of tutorials for it on-line, and it's really quite straightforward once you understand how to use it.
require(dplyr)
group_by(tall,X) %>%
summarize(meanY=mean(Y))
First, you can tell dplyr to do everything you want as if your data is broken into seperate data.frames based on the grouping column, in this case X.
Also, note that with dplyr you can "pipe" commands using %>%
, which means the result of one command will be transferred to next one as its first argument, so you don't need to assign it every time or nest all your commands.
The second line creates a new table, where for every group (based on its X), it calculates the mean
of all the Y
s. The result is this:
# A tibble: 5 x 2
X meanY
<dbl> <dbl>
1 3998. 0.781
2 3998. 1.81
3 3999. 1.37
4 3999. 2.01
5 3999. 2.02
And that's it. You're done. It's really powerful, simple and easy to learn.
Another package you can use is data.table
, but I find that it's powerfulness and conciseness comes at the expense of being a lot harder to learn (for me, anyway). It may require more lines to do things with dplyr, but it's easier for me to puzzle through the steps I need to take to achieve anything.
Good luck!
What you're looking for is the dplyr
package, which is at the core of the tidyverse. I'll show you how to achieve what you're trying to do with it, but there are tons of tutorials for it on-line, and it's really quite straightforward once you understand how to use it.
require(dplyr)
group_by(tall,X) %>%
summarize(meanY=mean(Y))
First, you can tell dplyr to do everything you want as if your data is broken into seperate data.frames based on the grouping column, in this case X.
Also, note that with dplyr you can "pipe" commands using %>%
, which means the result of one command will be transferred to next one as its first argument, so you don't need to assign it every time or nest all your commands.
The second line creates a new table, where for every group (based on its X), it calculates the mean
of all the Y
s. The result is this:
# A tibble: 5 x 2
X meanY
<dbl> <dbl>
1 3998. 0.781
2 3998. 1.81
3 3999. 1.37
4 3999. 2.01
5 3999. 2.02
And that's it. You're done. It's really powerful, simple and easy to learn.
Another package you can use is data.table
, but I find that it's powerfulness and conciseness comes at the expense of being a lot harder to learn (for me, anyway). It may require more lines to do things with dplyr, but it's easier for me to puzzle through the steps I need to take to achieve anything.
Good luck!
answered Nov 8 at 16:28
iod
2,4731619
2,4731619
Wow. This was way easier than how I was trying to do it. You and @RLave had the same solution. I have useddplyr
before forfilter()
andselect()
, but did not know aboutgroup_by() %>% summarise()
. Thank you so much for your response.
– M. L.
Nov 8 at 18:13
Great. Happy I could help. Look more atsummarise
, as well asmutate
(to create new variables through calculation), and the rest of dplyr's tools. They're very useful and quite simple to understand. Don't forget to upvote and and accept!
– iod
Nov 8 at 18:15
add a comment |
Wow. This was way easier than how I was trying to do it. You and @RLave had the same solution. I have useddplyr
before forfilter()
andselect()
, but did not know aboutgroup_by() %>% summarise()
. Thank you so much for your response.
– M. L.
Nov 8 at 18:13
Great. Happy I could help. Look more atsummarise
, as well asmutate
(to create new variables through calculation), and the rest of dplyr's tools. They're very useful and quite simple to understand. Don't forget to upvote and and accept!
– iod
Nov 8 at 18:15
Wow. This was way easier than how I was trying to do it. You and @RLave had the same solution. I have used
dplyr
before for filter()
and select()
, but did not know about group_by() %>% summarise()
. Thank you so much for your response.– M. L.
Nov 8 at 18:13
Wow. This was way easier than how I was trying to do it. You and @RLave had the same solution. I have used
dplyr
before for filter()
and select()
, but did not know about group_by() %>% summarise()
. Thank you so much for your response.– M. L.
Nov 8 at 18:13
Great. Happy I could help. Look more at
summarise
, as well as mutate
(to create new variables through calculation), and the rest of dplyr's tools. They're very useful and quite simple to understand. Don't forget to upvote and and accept!– iod
Nov 8 at 18:15
Great. Happy I could help. Look more at
summarise
, as well as mutate
(to create new variables through calculation), and the rest of dplyr's tools. They're very useful and quite simple to understand. Don't forget to upvote and and accept!– iod
Nov 8 at 18:15
add a comment |
1
Ok, the question is well posed, but maybe way too long. See if this gets you what you want:
library(dplyr); tall %>% group_by(X) %>% summarise(mean_y = mean(Y))
– RLave
Nov 8 at 16:26
If this is correct I'll add it as an answer and explain it in detail, I just wanted to make sure that's what you need.
– RLave
Nov 8 at 16:27
2
Try to remove everything in eccess in the question, it'll help others too.
– RLave
Nov 8 at 16:28
Thanks, I'll adapt it your solution to my large set and see if it works. It seems like it might. I've also shorted the post. thanks, it was definitely too long.
– M. L.
Nov 8 at 16:31
1
Just tried this on my large data set and it works! Thanks very much for your response!
– M. L.
Nov 8 at 18:09