Simulate many datasets in tidyr
I want to end up with a tidy data structure like the one below:
N | r | data | stat
---------------------------------
10 | 0.2 | <tibble> | 0.5
20 | 0.3 | <tibble> | 0.86
...
`data` is generated from the parameters in the first columns and `stat` is computed on `data`. If I have the first two columns, how do I add tibbles of datasets?
As a minimal example, here is a function to create two correlated columns:
# Two columns with unit variances and correlation r (r goes on the off-diagonal)
correlate_data = function(N, r)
  MASS::mvrnorm(N, mu=c(0, 4), Sigma=matrix(c(1, r, r, 1), ncol=2))
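For two unit-variance variables, the covariance matrix has 1 on the diagonal and `r` in both off-diagonal cells, so it must be symmetric. A small sketch to check this; `sigma_for` is a hypothetical helper name used only here, and `empirical = TRUE` is switched on purely for the check, since it forces the sample covariance to match `Sigma`:

```r
# Hypothetical helper: covariance matrix for two unit-variance variables
# with correlation r (r sits on the off-diagonal)
sigma_for <- function(r) matrix(c(1, r, r, 1), ncol = 2)

# empirical = TRUE makes the sample covariance equal Sigma,
# which is handy for a sanity check but not for real simulations
x <- MASS::mvrnorm(50, mu = c(0, 4), Sigma = sigma_for(0.3), empirical = TRUE)

isSymmetric(sigma_for(0.3))  # a valid covariance matrix is symmetric
cor(x[, 1], x[, 2])          # should equal 0.3 up to floating-point error
```

Without `empirical = TRUE` the sample correlation only scatters around `r`, which is what a simulation study would normally want.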
Running this for all combinations of `N` and `r`, I start by doing:
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Make parameter combinations
expand.grid(N=c(10,20,30), r=c(0, 0.1, 0.3)) %>%
  group_by(N, r) %>%
  expand(set=1:100) %>%  # create 100 of each combination
  # HERE! How to add a N x 2 tibble to each row?
  rowwise() %>%
  mutate(data=correlate_data(N, r)) %>%
  # Compute summary stats on each (for illustration only; not tested)
  mutate(
    stats = map(data, ~cor.test(.x[, 1], .x[, 2])),  # Correlation on each
    tidy_stats = map(stats, tidy))  # using broom package
I do have more parameters (N, r, distribution) and I will be computing more summaries. If alternative workflows are better, I welcome that as well.
Try `mutate(data=list(correlate_data( N, r)))` – Andrew Gustar Aug 29 at 22:30
or `mutate(data= map2(N,r,correlate_data))` and you don't need `rowwise` – Moody_Mudskipper Aug 29 at 22:33
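A minimal sketch of the `map2` suggestion, under a few assumptions: the generator is redefined here with a symmetric covariance matrix, `tidyr::crossing()` stands in for the `expand.grid()` / `group_by()` / `expand()` chain, the seed is arbitrary, and `stat` is simply the sample correlation. Each cell of `data` holds an N x 2 matrix rather than a tibble:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Generator from the question, with r on the off-diagonal of Sigma
correlate_data <- function(N, r) {
  MASS::mvrnorm(N, mu = c(0, 4), Sigma = matrix(c(1, r, r, 1), ncol = 2))
}

set.seed(1)  # arbitrary seed, only for reproducibility

# One row per (N, r, set) combination: 3 x 3 x 100 rows
sims <- crossing(N = c(10, 20, 30), r = c(0, 0.1, 0.3), set = 1:100) %>%
  mutate(
    data = map2(N, r, correlate_data),             # list-column of N x 2 matrices
    stat = map_dbl(data, ~ cor(.x[, 1], .x[, 2]))  # one number per dataset
  )
```

Because `map2()` returns a list, `mutate()` stores it as a list-column and no `rowwise()` is needed. With more than two parameters, `purrr::pmap()` over the parameter columns follows the same pattern.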
Where is `dist` defined? – akrun Aug 29 at 21:54