Calculating means from tall data or wide data in R [duplicate]

This question already has an answer here:

Calculate the mean by group

3 answers

I'm a beginner-to-intermediate R user who started learning R for laboratory research a few months ago. Thanks for your patience, especially if this ends up being a really simple problem.

Problem

The tables as a reproducible example

The following code generates tables similar to my set, first as tall data, second as wide data.

library(tibble)
#> Warning: package 'tibble' was built under R version 3.4.4
library(tidyr)
#> Warning: package 'tidyr' was built under R version 3.4.4

tall <- tibble(X=c(3999.387, 3999.387, 3999.387,
 3999.066, 3999.066, 3999.066,
 3998.745, 3998.745, 3998.745,
 3998.423, 3998.423, 3998.423,
 3998.102, 3998.102, 3998.102), 
 Y=rnorm(15, mean=2, sd=1), 
 S=c("s1","s2","s3","s1","s2","s3","s1","s2","s3","s1","s2","s3","s1","s2","s3"))
head(tall)
#> # A tibble: 6 x 3
#> X Y S 
#> <dbl> <dbl> <chr>
#> 1 3999. 3.07 s1 
#> 2 3999. 1.81 s2 
#> 3 3999. 4.02 s3 
#> 4 3999. 1.21 s1 
#> 5 3999. 0.771 s2 
#> 6 3999. 2.39 s3

wide <- spread(tall,X,Y)
head(wide)
#> # A tibble: 3 x 6
#> S `3998.102` `3998.423` `3998.745` `3999.066` `3999.387`
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 s1 0.454 1.50 1.84 1.21 3.07
#> 2 s2 2.04 0.392 1.50 0.771 1.81
#> 3 s3 1.38 0.992 0.790 2.39 4.02

^{Created on 2018-11-08 by the reprex package (v0.2.1)}

In the tall version, each unique value of X gets repeated for however many unique values of S there are. There are 5 unique X and 3 unique S. This is much more apparent in the wide data. In my real set I have 8010 unique X and 312 unique S. The tall data is nice because I can easily plot X vs Y and get one plotted line for each S.

The Question

What if I want to average all of the Ys at each unique value of X? It would look like this:

> # A tibble: 5 x 2
> X Y
> <dbl> <dbl>
> 1 3998.102 2.29
> 2 3998.423 1.63
> 3 3998.745 1.36
> 4 3999.066 1.66
> 5 3999.387 1.33

In this case I used the wide table, calculated the mean of each X column, and then manually constructed a new table.

Can I do this with map() functions from purrr? The documentation was confusing, probably because I have never used lapply() functions before.

Thanks for reading. I have a feeling this is really simple for most experienced users.

edited Nov 8 at 18:05

asked Nov 8 at 16:14

M. L.

marked as duplicate by Gregor r
Users with the r badge can single-handedly close r questions as duplicates and reopen them as needed.

Nov 8 at 18:26

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

1

Ok, the question is well posed, but maybe way too long. See if this gets you what you want: library(dplyr); tall %>% group_by(X) %>% summarise(mean_y = mean(Y))
– RLave
Nov 8 at 16:26

If this is correct I'll add it as an answer and explain it in detail, I just wanted to make sure that's what you need.
– RLave
Nov 8 at 16:27

2

Try to remove anything in excess from the question; it'll help others too.
– RLave
Nov 8 at 16:28

Thanks, I'll adapt your solution to my large set and see if it works. It seems like it might. I've also shortened the post. Thanks, it was definitely too long.
– M. L.
Nov 8 at 16:31

1

Just tried this on my large data set and it works! Thanks very much for your response!
– M. L.
Nov 8 at 18:09

r dplyr reshape lapply purrr

1 Answer

accepted

What you're looking for is the dplyr package, which is at the core of the tidyverse. I'll show you how to do this with it; there are plenty of tutorials for it online, and it's quite straightforward once you understand how to use it.

library(dplyr)
group_by(tall, X) %>%
  summarize(meanY = mean(Y))

First, group_by() tells dplyr to treat your data as if it were split into separate data frames based on the grouping column, in this case X.
Also, note that with dplyr you can "pipe" commands using %>%, which passes the result of one command to the next one as its first argument, so you don't need to assign intermediate results or nest all your commands.
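
To make the piping point concrete, here is a minimal sketch comparing the nested and piped forms. The small `tall` table below is a hypothetical stand-in with fixed Y values (not the rnorm() data from the question), so the numbers are only illustrative.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for the question's `tall` table, with fixed Y values.
tall <- tibble(
  X = rep(c(3999.387, 3999.066, 3998.745), each = 3),
  Y = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  S = rep(c("s1", "s2", "s3"), times = 3)
)

# Nested form: has to be read from the inside out.
nested <- summarize(group_by(tall, X), meanY = mean(Y))

# Piped form: each result becomes the first argument of the next call.
piped <- tall %>%
  group_by(X) %>%
  summarize(meanY = mean(Y))

# Both forms should describe the same table.
all.equal(nested, piped)
```

Both versions do the same work; the piped one just reads top to bottom in the order the steps happen.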

The summarize() call then creates a new table with one row per group (that is, per value of X), containing the mean of all the Ys in that group. The result is this:

# A tibble: 5 x 2
 X meanY
 <dbl> <dbl>
1 3998. 0.781
2 3998. 1.81 
3 3999. 1.37 
4 3999. 2.01 
5 3999. 2.02
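
Since the question also asked about purrr, here is a rough sketch of the wide-table route using map_dbl(). It assumes the same column layout as the question's `wide` table (one S column plus one column per X value); the data are a hypothetical stand-in with fixed Y values, and the names `col_means` and `result` are mine.

```r
library(tibble)
library(tidyr)
library(purrr)

# Hypothetical stand-in for the question's `tall` table, with fixed Y values.
tall <- tibble(
  X = rep(c(3999.387, 3999.066, 3998.745), each = 3),
  Y = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  S = rep(c("s1", "s2", "s3"), times = 3)
)

# Same reshaping call as in the question: one column per X value.
wide <- spread(tall, X, Y)

# Drop the sample-label column, then take the mean of every remaining column.
# map_dbl() returns a named numeric vector; the names are the X values as text.
col_means <- map_dbl(wide[setdiff(names(wide), "S")], mean)

# Rebuild a two-column table from the names and the means.
result <- tibble(
  X = as.numeric(names(col_means)),
  meanY = unname(col_means)
)
result
```

This works, but it round-trips the X values through column names, which is why grouping the tall table directly is usually the simpler choice.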

And that's it. You're done. dplyr is powerful, simple, and easy to learn.
Another package you can use is data.table, but I find that its power and conciseness come at the expense of being a lot harder to learn (for me, anyway). It may take more lines to do things with dplyr, but it's easier for me to puzzle through the steps I need to take to achieve anything.
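
For comparison, a minimal sketch of what the data.table version might look like. Again the table is a hypothetical stand-in with fixed Y values, and `tall_dt` and `means_dt` are names chosen for illustration.

```r
library(data.table)

# Hypothetical stand-in for the question's `tall` table, as a data.table.
tall_dt <- data.table(
  X = rep(c(3999.387, 3999.066, 3998.745), each = 3),
  Y = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  S = rep(c("s1", "s2", "s3"), times = 3)
)

# General form is dt[i, j, by]: no row filter, compute mean(Y), one row per X.
means_dt <- tall_dt[, .(meanY = mean(Y)), by = X]

# `by` keeps groups in order of first appearance; sort by X to match dplyr's output.
setorder(means_dt, X)
means_dt
```

The whole grouping step fits on one line, which is the conciseness mentioned above, at the cost of a denser syntax.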

Good luck!

answered Nov 8 at 16:28

iod

Wow. This was way easier than how I was trying to do it. You and @RLave had the same solution. I have used dplyr before for filter() and select(), but did not know about group_by() %>% summarise(). Thank you so much for your response.
– M. L.
Nov 8 at 18:13

Great. Happy I could help. Look more at summarise, as well as mutate (to create new variables through calculation), and the rest of dplyr's tools. They're very useful and quite simple to understand. Don't forget to upvote and accept!
– iod
Nov 8 at 18:15
