Calculating means from tall data or wide data in R [duplicate]









up vote
1
down vote

favorite













This question already has an answer here:



  • Calculate the mean by group

    3 answers



I'm a beginner-intermediate R user that started learning R for laboratory research a few months ago. Thanks for your patience---especially if this ends up being a really stupid simple problem.



Problem



The tables as a reproducible example



The following code generates tables similar to my set, first as tall data, second as wide data.





library(tibble)
#> Warning: package 'tibble' was built under R version 3.4.4
library(tidyr)
#> Warning: package 'tidyr' was built under R version 3.4.4

tall <- tibble(X=c(3999.387, 3999.387, 3999.387,
3999.066, 3999.066, 3999.066,
3998.745, 3998.745, 3998.745,
3998.423, 3998.423, 3998.423,
3998.102, 3998.102, 3998.102),
Y=rnorm(15, mean=2, sd=1),
S=c("s1","s2","s3","s1","s2","s3","s1","s2","s3","s1","s2","s3","s1","s2","s3"))
head(tall)
#> # A tibble: 6 x 3
#> X Y S
#> <dbl> <dbl> <chr>
#> 1 3999. 3.07 s1
#> 2 3999. 1.81 s2
#> 3 3999. 4.02 s3
#> 4 3999. 1.21 s1
#> 5 3999. 0.771 s2
#> 6 3999. 2.39 s3

wide <- spread(tall,X,Y)
head(wide)
#> # A tibble: 3 x 6
#> S `3998.102` `3998.423` `3998.745` `3999.066` `3999.387`
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 s1 0.454 1.50 1.84 1.21 3.07
#> 2 s2 2.04 0.392 1.50 0.771 1.81
#> 3 s3 1.38 0.992 0.790 2.39 4.02


Created on 2018-11-08 by the reprex package (v0.2.1)



In the tall version, each unique value of X gets repeated for however many unique values of S there are. There are 5 unique X and 3 unique S. This is much more apparent in the wide data. In my real set I have 8010 unique X and 312 unique S. The tall data is nice because I can easily plot X vs Y and get one plotted line for each S.



The Question



What if I want to average all of the Ys at each unique value of X? It would look like this:



> # A tibble: 5 x 2
> X Y
> <dbl> <dbl>
> 1 3998.102 2.29
> 2 3998.423 1.63
> 3 3999.745 1.36
> 4 3999.066 1.66
> 5 3999.387 1.33


In this case I used the wide table, calculated the mean of each X column, and then manually constructed a new table.



Can I do this with map() functions from purrr? The documentation was confusing, probably because I have never used lapply() functions before.



Thanks for reading. I have a feeling this is really simple for most experienced users.










share|improve this question















marked as duplicate by Gregor r
Users with the  r badge can single-handedly close r questions as duplicates and reopen them as needed.

StackExchange.ready(function()
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function()
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function()
$hover.showInfoMessage('',
messageElement: $msg.clone().show(),
transient: false,
position: my: 'bottom left', at: 'top center', offsetTop: -7 ,
dismissable: false,
relativeToBody: true
);
,
function()
StackExchange.helpers.removeMessages();

);
);
);
Nov 8 at 18:26


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.










  • 1




    Ok, the question is well posed, but maybe way too long. See if this gets you what you want: library(dplyr); tall %>% group_by(X) %>% summarise(mean_y = mean(Y))
    – RLave
    Nov 8 at 16:26










  • If this is correct I'll add it as an answer and explain it in detail, I just wanted to make sure that's what you need.
    – RLave
    Nov 8 at 16:27






  • 2




    Try to remove everything in eccess in the question, it'll help others too.
    – RLave
    Nov 8 at 16:28










  • Thanks, I'll adapt it your solution to my large set and see if it works. It seems like it might. I've also shorted the post. thanks, it was definitely too long.
    – M. L.
    Nov 8 at 16:31






  • 1




    Just tried this on my large data set and it works! Thanks very much for your response!
    – M. L.
    Nov 8 at 18:09















up vote
1
down vote

favorite













This question already has an answer here:



  • Calculate the mean by group

    3 answers



I'm a beginner-intermediate R user that started learning R for laboratory research a few months ago. Thanks for your patience---especially if this ends up being a really stupid simple problem.



Problem



The tables as a reproducible example



The following code generates tables similar to my set, first as tall data, second as wide data.





library(tibble)
#> Warning: package 'tibble' was built under R version 3.4.4
library(tidyr)
#> Warning: package 'tidyr' was built under R version 3.4.4

tall <- tibble(X=c(3999.387, 3999.387, 3999.387,
3999.066, 3999.066, 3999.066,
3998.745, 3998.745, 3998.745,
3998.423, 3998.423, 3998.423,
3998.102, 3998.102, 3998.102),
Y=rnorm(15, mean=2, sd=1),
S=c("s1","s2","s3","s1","s2","s3","s1","s2","s3","s1","s2","s3","s1","s2","s3"))
head(tall)
#> # A tibble: 6 x 3
#> X Y S
#> <dbl> <dbl> <chr>
#> 1 3999. 3.07 s1
#> 2 3999. 1.81 s2
#> 3 3999. 4.02 s3
#> 4 3999. 1.21 s1
#> 5 3999. 0.771 s2
#> 6 3999. 2.39 s3

wide <- spread(tall,X,Y)
head(wide)
#> # A tibble: 3 x 6
#> S `3998.102` `3998.423` `3998.745` `3999.066` `3999.387`
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 s1 0.454 1.50 1.84 1.21 3.07
#> 2 s2 2.04 0.392 1.50 0.771 1.81
#> 3 s3 1.38 0.992 0.790 2.39 4.02


Created on 2018-11-08 by the reprex package (v0.2.1)



In the tall version, each unique value of X gets repeated for however many unique values of S there are. There are 5 unique X and 3 unique S. This is much more apparent in the wide data. In my real set I have 8010 unique X and 312 unique S. The tall data is nice because I can easily plot X vs Y and get one plotted line for each S.



The Question



What if I want to average all of the Ys at each unique value of X? It would look like this:



> # A tibble: 5 x 2
> X Y
> <dbl> <dbl>
> 1 3998.102 2.29
> 2 3998.423 1.63
> 3 3999.745 1.36
> 4 3999.066 1.66
> 5 3999.387 1.33


In this case I used the wide table, calculated the mean of each X column, and then manually constructed a new table.



Can I do this with map() functions from purrr? The documentation was confusing, probably because I have never used lapply() functions before.



Thanks for reading. I have a feeling this is really simple for most experienced users.










share|improve this question















marked as duplicate by Gregor r
Users with the  r badge can single-handedly close r questions as duplicates and reopen them as needed.

StackExchange.ready(function()
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function()
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function()
$hover.showInfoMessage('',
messageElement: $msg.clone().show(),
transient: false,
position: my: 'bottom left', at: 'top center', offsetTop: -7 ,
dismissable: false,
relativeToBody: true
);
,
function()
StackExchange.helpers.removeMessages();

);
);
);
Nov 8 at 18:26


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.










  • 1




    Ok, the question is well posed, but maybe way too long. See if this gets you what you want: library(dplyr); tall %>% group_by(X) %>% summarise(mean_y = mean(Y))
    – RLave
    Nov 8 at 16:26










  • If this is correct I'll add it as an answer and explain it in detail, I just wanted to make sure that's what you need.
    – RLave
    Nov 8 at 16:27






  • 2




    Try to remove everything in eccess in the question, it'll help others too.
    – RLave
    Nov 8 at 16:28










  • Thanks, I'll adapt it your solution to my large set and see if it works. It seems like it might. I've also shorted the post. thanks, it was definitely too long.
    – M. L.
    Nov 8 at 16:31






  • 1




    Just tried this on my large data set and it works! Thanks very much for your response!
    – M. L.
    Nov 8 at 18:09













up vote
1
down vote

favorite









up vote
1
down vote

favorite












This question already has an answer here:



  • Calculate the mean by group

    3 answers



I'm a beginner-intermediate R user that started learning R for laboratory research a few months ago. Thanks for your patience---especially if this ends up being a really stupid simple problem.



Problem



The tables as a reproducible example



The following code generates tables similar to my set, first as tall data, second as wide data.





library(tibble)
#> Warning: package 'tibble' was built under R version 3.4.4
library(tidyr)
#> Warning: package 'tidyr' was built under R version 3.4.4

tall <- tibble(X=c(3999.387, 3999.387, 3999.387,
3999.066, 3999.066, 3999.066,
3998.745, 3998.745, 3998.745,
3998.423, 3998.423, 3998.423,
3998.102, 3998.102, 3998.102),
Y=rnorm(15, mean=2, sd=1),
S=c("s1","s2","s3","s1","s2","s3","s1","s2","s3","s1","s2","s3","s1","s2","s3"))
head(tall)
#> # A tibble: 6 x 3
#> X Y S
#> <dbl> <dbl> <chr>
#> 1 3999. 3.07 s1
#> 2 3999. 1.81 s2
#> 3 3999. 4.02 s3
#> 4 3999. 1.21 s1
#> 5 3999. 0.771 s2
#> 6 3999. 2.39 s3

wide <- spread(tall,X,Y)
head(wide)
#> # A tibble: 3 x 6
#> S `3998.102` `3998.423` `3998.745` `3999.066` `3999.387`
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 s1 0.454 1.50 1.84 1.21 3.07
#> 2 s2 2.04 0.392 1.50 0.771 1.81
#> 3 s3 1.38 0.992 0.790 2.39 4.02


Created on 2018-11-08 by the reprex package (v0.2.1)



In the tall version, each unique value of X gets repeated for however many unique values of S there are. There are 5 unique X and 3 unique S. This is much more apparent in the wide data. In my real set I have 8010 unique X and 312 unique S. The tall data is nice because I can easily plot X vs Y and get one plotted line for each S.



The Question



What if I want to average all of the Ys at each unique value of X? It would look like this:



> # A tibble: 5 x 2
> X Y
> <dbl> <dbl>
> 1 3998.102 2.29
> 2 3998.423 1.63
> 3 3999.745 1.36
> 4 3999.066 1.66
> 5 3999.387 1.33


In this case I used the wide table, calculated the mean of each X column, and then manually constructed a new table.



Can I do this with map() functions from purrr? The documentation was confusing, probably because I have never used lapply() functions before.



Thanks for reading. I have a feeling this is really simple for most experienced users.










share|improve this question
















This question already has an answer here:



  • Calculate the mean by group

    3 answers



I'm a beginner-intermediate R user that started learning R for laboratory research a few months ago. Thanks for your patience---especially if this ends up being a really stupid simple problem.



Problem



The tables as a reproducible example



The following code generates tables similar to my set, first as tall data, second as wide data.





library(tibble)
#> Warning: package 'tibble' was built under R version 3.4.4
library(tidyr)
#> Warning: package 'tidyr' was built under R version 3.4.4

tall <- tibble(X=c(3999.387, 3999.387, 3999.387,
3999.066, 3999.066, 3999.066,
3998.745, 3998.745, 3998.745,
3998.423, 3998.423, 3998.423,
3998.102, 3998.102, 3998.102),
Y=rnorm(15, mean=2, sd=1),
S=c("s1","s2","s3","s1","s2","s3","s1","s2","s3","s1","s2","s3","s1","s2","s3"))
head(tall)
#> # A tibble: 6 x 3
#> X Y S
#> <dbl> <dbl> <chr>
#> 1 3999. 3.07 s1
#> 2 3999. 1.81 s2
#> 3 3999. 4.02 s3
#> 4 3999. 1.21 s1
#> 5 3999. 0.771 s2
#> 6 3999. 2.39 s3

wide <- spread(tall,X,Y)
head(wide)
#> # A tibble: 3 x 6
#> S `3998.102` `3998.423` `3998.745` `3999.066` `3999.387`
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 s1 0.454 1.50 1.84 1.21 3.07
#> 2 s2 2.04 0.392 1.50 0.771 1.81
#> 3 s3 1.38 0.992 0.790 2.39 4.02


Created on 2018-11-08 by the reprex package (v0.2.1)



In the tall version, each unique value of X gets repeated for however many unique values of S there are. There are 5 unique X and 3 unique S. This is much more apparent in the wide data. In my real set I have 8010 unique X and 312 unique S. The tall data is nice because I can easily plot X vs Y and get one plotted line for each S.



The Question



What if I want to average all of the Ys at each unique value of X? It would look like this:



> # A tibble: 5 x 2
> X Y
> <dbl> <dbl>
> 1 3998.102 2.29
> 2 3998.423 1.63
> 3 3999.745 1.36
> 4 3999.066 1.66
> 5 3999.387 1.33


In this case I used the wide table, calculated the mean of each X column, and then manually constructed a new table.



Can I do this with map() functions from purrr? The documentation was confusing, probably because I have never used lapply() functions before.



Thanks for reading. I have a feeling this is really simple for most experienced users.





This question already has an answer here:



  • Calculate the mean by group

    3 answers







r dplyr reshape lapply purrr






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 8 at 18:05

























asked Nov 8 at 16:14









M. L.

156




156




marked as duplicate by Gregor r
Users with the  r badge can single-handedly close r questions as duplicates and reopen them as needed.

StackExchange.ready(function()
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function()
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function()
$hover.showInfoMessage('',
messageElement: $msg.clone().show(),
transient: false,
position: my: 'bottom left', at: 'top center', offsetTop: -7 ,
dismissable: false,
relativeToBody: true
);
,
function()
StackExchange.helpers.removeMessages();

);
);
);
Nov 8 at 18:26


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.






marked as duplicate by Gregor r
Users with the  r badge can single-handedly close r questions as duplicates and reopen them as needed.

StackExchange.ready(function()
if (StackExchange.options.isMobile) return;

$('.dupe-hammer-message-hover:not(.hover-bound)').each(function()
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');

$hover.hover(
function()
$hover.showInfoMessage('',
messageElement: $msg.clone().show(),
transient: false,
position: my: 'bottom left', at: 'top center', offsetTop: -7 ,
dismissable: false,
relativeToBody: true
);
,
function()
StackExchange.helpers.removeMessages();

);
);
);
Nov 8 at 18:26


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.









  • 1




    Ok, the question is well posed, but maybe way too long. See if this gets you what you want: library(dplyr); tall %>% group_by(X) %>% summarise(mean_y = mean(Y))
    – RLave
    Nov 8 at 16:26










  • If this is correct I'll add it as an answer and explain it in detail, I just wanted to make sure that's what you need.
    – RLave
    Nov 8 at 16:27






  • 2




    Try to remove everything in eccess in the question, it'll help others too.
    – RLave
    Nov 8 at 16:28










  • Thanks, I'll adapt it your solution to my large set and see if it works. It seems like it might. I've also shorted the post. thanks, it was definitely too long.
    – M. L.
    Nov 8 at 16:31






  • 1




    Just tried this on my large data set and it works! Thanks very much for your response!
    – M. L.
    Nov 8 at 18:09













  • 1




    Ok, the question is well posed, but maybe way too long. See if this gets you what you want: library(dplyr); tall %>% group_by(X) %>% summarise(mean_y = mean(Y))
    – RLave
    Nov 8 at 16:26










  • If this is correct I'll add it as an answer and explain it in detail, I just wanted to make sure that's what you need.
    – RLave
    Nov 8 at 16:27






  • 2




    Try to remove everything in eccess in the question, it'll help others too.
    – RLave
    Nov 8 at 16:28










  • Thanks, I'll adapt it your solution to my large set and see if it works. It seems like it might. I've also shorted the post. thanks, it was definitely too long.
    – M. L.
    Nov 8 at 16:31






  • 1




    Just tried this on my large data set and it works! Thanks very much for your response!
    – M. L.
    Nov 8 at 18:09








1




1




Ok, the question is well posed, but maybe way too long. See if this gets you what you want: library(dplyr); tall %>% group_by(X) %>% summarise(mean_y = mean(Y))
– RLave
Nov 8 at 16:26




Ok, the question is well posed, but maybe way too long. See if this gets you what you want: library(dplyr); tall %>% group_by(X) %>% summarise(mean_y = mean(Y))
– RLave
Nov 8 at 16:26












If this is correct I'll add it as an answer and explain it in detail, I just wanted to make sure that's what you need.
– RLave
Nov 8 at 16:27




If this is correct I'll add it as an answer and explain it in detail, I just wanted to make sure that's what you need.
– RLave
Nov 8 at 16:27




2




2




Try to remove everything in eccess in the question, it'll help others too.
– RLave
Nov 8 at 16:28




Try to remove everything in eccess in the question, it'll help others too.
– RLave
Nov 8 at 16:28












Thanks, I'll adapt it your solution to my large set and see if it works. It seems like it might. I've also shorted the post. thanks, it was definitely too long.
– M. L.
Nov 8 at 16:31




Thanks, I'll adapt it your solution to my large set and see if it works. It seems like it might. I've also shorted the post. thanks, it was definitely too long.
– M. L.
Nov 8 at 16:31




1




1




Just tried this on my large data set and it works! Thanks very much for your response!
– M. L.
Nov 8 at 18:09





Just tried this on my large data set and it works! Thanks very much for your response!
– M. L.
Nov 8 at 18:09













1 Answer
1






active

oldest

votes

















up vote
1
down vote



accepted










What you're looking for is the dplyr package, which is at the core of the tidyverse. I'll show you how to achieve what you're trying to do with it, but there are tons of tutorials for it on-line, and it's really quite straightforward once you understand how to use it.



require(dplyr)
group_by(tall,X) %>%
summarize(meanY=mean(Y))


First, you can tell dplyr to do everything you want as if your data is broken into seperate data.frames based on the grouping column, in this case X.
Also, note that with dplyr you can "pipe" commands using %>%, which means the result of one command will be transferred to next one as its first argument, so you don't need to assign it every time or nest all your commands.



The second line creates a new table, where for every group (based on its X), it calculates the mean of all the Ys. The result is this:



# A tibble: 5 x 2
X meanY
<dbl> <dbl>
1 3998. 0.781
2 3998. 1.81
3 3999. 1.37
4 3999. 2.01
5 3999. 2.02


And that's it. You're done. It's really powerful, simple and easy to learn.
Another package you can use is data.table, but I find that it's powerfulness and conciseness comes at the expense of being a lot harder to learn (for me, anyway). It may require more lines to do things with dplyr, but it's easier for me to puzzle through the steps I need to take to achieve anything.



Good luck!






share|improve this answer




















  • Wow. This was way easier than how I was trying to do it. You and @RLave had the same solution. I have used dplyr before for filter() and select(), but did not know about group_by() %>% summarise(). Thank you so much for your response.
    – M. L.
    Nov 8 at 18:13











  • Great. Happy I could help. Look more at summarise, as well as mutate (to create new variables through calculation), and the rest of dplyr's tools. They're very useful and quite simple to understand. Don't forget to upvote and and accept!
    – iod
    Nov 8 at 18:15


















1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
1
down vote



accepted










What you're looking for is the dplyr package, which is at the core of the tidyverse. I'll show you how to achieve what you're trying to do with it, but there are tons of tutorials for it on-line, and it's really quite straightforward once you understand how to use it.



require(dplyr)
group_by(tall,X) %>%
summarize(meanY=mean(Y))


First, you can tell dplyr to do everything you want as if your data is broken into seperate data.frames based on the grouping column, in this case X.
Also, note that with dplyr you can "pipe" commands using %>%, which means the result of one command will be transferred to next one as its first argument, so you don't need to assign it every time or nest all your commands.



The second line creates a new table, where for every group (based on its X), it calculates the mean of all the Ys. The result is this:



# A tibble: 5 x 2
X meanY
<dbl> <dbl>
1 3998. 0.781
2 3998. 1.81
3 3999. 1.37
4 3999. 2.01
5 3999. 2.02


And that's it. You're done. It's really powerful, simple and easy to learn.
Another package you can use is data.table, but I find that it's powerfulness and conciseness comes at the expense of being a lot harder to learn (for me, anyway). It may require more lines to do things with dplyr, but it's easier for me to puzzle through the steps I need to take to achieve anything.



Good luck!






share|improve this answer




















  • Wow. This was way easier than how I was trying to do it. You and @RLave had the same solution. I have used dplyr before for filter() and select(), but did not know about group_by() %>% summarise(). Thank you so much for your response.
    – M. L.
    Nov 8 at 18:13











  • Great. Happy I could help. Look more at summarise, as well as mutate (to create new variables through calculation), and the rest of dplyr's tools. They're very useful and quite simple to understand. Don't forget to upvote and and accept!
    – iod
    Nov 8 at 18:15















up vote
1
down vote



accepted










What you're looking for is the dplyr package, which is at the core of the tidyverse. I'll show you how to achieve what you're trying to do with it, but there are tons of tutorials for it on-line, and it's really quite straightforward once you understand how to use it.



require(dplyr)
group_by(tall,X) %>%
summarize(meanY=mean(Y))


First, you can tell dplyr to do everything you want as if your data is broken into seperate data.frames based on the grouping column, in this case X.
Also, note that with dplyr you can "pipe" commands using %>%, which means the result of one command will be transferred to next one as its first argument, so you don't need to assign it every time or nest all your commands.



The second line creates a new table, where for every group (based on its X), it calculates the mean of all the Ys. The result is this:



# A tibble: 5 x 2
X meanY
<dbl> <dbl>
1 3998. 0.781
2 3998. 1.81
3 3999. 1.37
4 3999. 2.01
5 3999. 2.02


And that's it. You're done. It's really powerful, simple and easy to learn.
Another package you can use is data.table, but I find that it's powerfulness and conciseness comes at the expense of being a lot harder to learn (for me, anyway). It may require more lines to do things with dplyr, but it's easier for me to puzzle through the steps I need to take to achieve anything.



Good luck!






share|improve this answer




















  • Wow. This was way easier than how I was trying to do it. You and @RLave had the same solution. I have used dplyr before for filter() and select(), but did not know about group_by() %>% summarise(). Thank you so much for your response.
    – M. L.
    Nov 8 at 18:13











  • Great. Happy I could help. Look more at summarise, as well as mutate (to create new variables through calculation), and the rest of dplyr's tools. They're very useful and quite simple to understand. Don't forget to upvote and and accept!
    – iod
    Nov 8 at 18:15













up vote
1
down vote



accepted







up vote
1
down vote



accepted






What you're looking for is the dplyr package, which is at the core of the tidyverse. I'll show you how to achieve what you're trying to do with it, but there are tons of tutorials for it on-line, and it's really quite straightforward once you understand how to use it.



require(dplyr)
group_by(tall,X) %>%
summarize(meanY=mean(Y))


First, you can tell dplyr to do everything you want as if your data is broken into seperate data.frames based on the grouping column, in this case X.
Also, note that with dplyr you can "pipe" commands using %>%, which means the result of one command will be transferred to next one as its first argument, so you don't need to assign it every time or nest all your commands.



The second line creates a new table, where for every group (based on its X), it calculates the mean of all the Ys. The result is this:



# A tibble: 5 x 2
X meanY
<dbl> <dbl>
1 3998. 0.781
2 3998. 1.81
3 3999. 1.37
4 3999. 2.01
5 3999. 2.02


And that's it. You're done. It's really powerful, simple and easy to learn.
Another package you can use is data.table, but I find that it's powerfulness and conciseness comes at the expense of being a lot harder to learn (for me, anyway). It may require more lines to do things with dplyr, but it's easier for me to puzzle through the steps I need to take to achieve anything.



Good luck!






share|improve this answer












What you're looking for is the dplyr package, which is at the core of the tidyverse. I'll show you how to achieve what you're trying to do with it, but there are tons of tutorials for it on-line, and it's really quite straightforward once you understand how to use it.



require(dplyr)
group_by(tall,X) %>%
summarize(meanY=mean(Y))


First, you can tell dplyr to do everything you want as if your data is broken into seperate data.frames based on the grouping column, in this case X.
Also, note that with dplyr you can "pipe" commands using %>%, which means the result of one command will be transferred to next one as its first argument, so you don't need to assign it every time or nest all your commands.



The second line creates a new table, where for every group (based on its X), it calculates the mean of all the Ys. The result is this:



# A tibble: 5 x 2
X meanY
<dbl> <dbl>
1 3998. 0.781
2 3998. 1.81
3 3999. 1.37
4 3999. 2.01
5 3999. 2.02


And that's it. You're done. It's really powerful, simple and easy to learn.
Another package you can use is data.table, but I find that it's powerfulness and conciseness comes at the expense of being a lot harder to learn (for me, anyway). It may require more lines to do things with dplyr, but it's easier for me to puzzle through the steps I need to take to achieve anything.



Good luck!







share|improve this answer












share|improve this answer



share|improve this answer










answered Nov 8 at 16:28









iod

2,4731619




2,4731619











  • Wow. This was way easier than how I was trying to do it. You and @RLave had the same solution. I have used dplyr before for filter() and select(), but did not know about group_by() %>% summarise(). Thank you so much for your response.
    – M. L.
    Nov 8 at 18:13











  • Great. Happy I could help. Look more at summarise, as well as mutate (to create new variables through calculation), and the rest of dplyr's tools. They're very useful and quite simple to understand. Don't forget to upvote and and accept!
    – iod
    Nov 8 at 18:15

















  • Wow. This was way easier than how I was trying to do it. You and @RLave had the same solution. I have used dplyr before for filter() and select(), but did not know about group_by() %>% summarise(). Thank you so much for your response.
    – M. L.
    Nov 8 at 18:13











  • Great. Happy I could help. Look more at summarise, as well as mutate (to create new variables through calculation), and the rest of dplyr's tools. They're very useful and quite simple to understand. Don't forget to upvote and and accept!
    – iod
    Nov 8 at 18:15
















Wow. This was way easier than how I was trying to do it. You and @RLave had the same solution. I have used dplyr before for filter() and select(), but did not know about group_by() %>% summarise(). Thank you so much for your response.
– M. L.
Nov 8 at 18:13





Wow. This was way easier than how I was trying to do it. You and @RLave had the same solution. I have used dplyr before for filter() and select(), but did not know about group_by() %>% summarise(). Thank you so much for your response.
– M. L.
Nov 8 at 18:13













Great. Happy I could help. Look more at summarise, as well as mutate (to create new variables through calculation), and the rest of dplyr's tools. They're very useful and quite simple to understand. Don't forget to upvote and and accept!
– iod
Nov 8 at 18:15





Great. Happy I could help. Look more at summarise, as well as mutate (to create new variables through calculation), and the rest of dplyr's tools. They're very useful and quite simple to understand. Don't forget to upvote and and accept!
– iod
Nov 8 at 18:15




Popular posts from this blog

𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

ữḛḳṊẴ ẋ,Ẩṙ,ỹḛẪẠứụỿṞṦ,Ṉẍừ,ứ Ị,Ḵ,ṏ ṇỪḎḰṰọửḊ ṾḨḮữẑỶṑỗḮṣṉẃ Ữẩụ,ṓ,ḹẕḪḫỞṿḭ ỒṱṨẁṋṜ ḅẈ ṉ ứṀḱṑỒḵ,ḏ,ḊḖỹẊ Ẻḷổ,ṥ ẔḲẪụḣể Ṱ ḭỏựẶ Ồ Ṩ,ẂḿṡḾồ ỗṗṡịṞẤḵṽẃ ṸḒẄẘ,ủẞẵṦṟầṓế

⃀⃉⃄⃅⃍,⃂₼₡₰⃉₡₿₢⃉₣⃄₯⃊₮₼₹₱₦₷⃄₪₼₶₳₫⃍₽ ₫₪₦⃆₠₥⃁₸₴₷⃊₹⃅⃈₰⃁₫ ⃎⃍₩₣₷ ₻₮⃊⃀⃄⃉₯,⃏⃊,₦⃅₪,₼⃀₾₧₷₾ ₻ ₸₡ ₾,₭⃈₴⃋,€⃁,₩ ₺⃌⃍⃁₱⃋⃋₨⃊⃁⃃₼,⃎,₱⃍₲₶₡ ⃍⃅₶₨₭,⃉₭₾₡₻⃀ ₼₹⃅₹,₻₭ ⃌