Halting drake plan makes it rebuild targets it already had built previously
I'm currently using drake to run a set of >1k simulations. I've estimated that it would take about two days to run the complete set, but I also expect my computer to crash at any point during that period because, well, it has.
Apparently stopping the plan discards any targets that were already built, so essentially this means I can't use drake for its intended purpose.
I suppose I could make a function that actually edits the R file where the plan is specified in order to make drake sequentially add targets to its cache, but that seems utterly hackish.
Any ideas on how to deal with this?
EDIT: The actual problem seems to come from using set.seed inside my data-generating functions. I was aware that drake already does this for the user in a way that ensures reproducibility, but I figured that if I just left my functions the way they were it wouldn't change anything, since drake would be ensuring that the random seed I chose always ends up being the same? Guess not, but since I removed that step things are caching fine, so the issue is solved.
r ropensci drake-r-package
edited Nov 13 '18 at 8:49 by landau
asked Nov 12 '18 at 18:59 by zipzapboing
Would you post a small reproducible example? I am having trouble picturing how your use of set.seed() confused drake into incorrectly invalidating targets.
– landau, Nov 13 '18 at 1:10
By any chance, did you call those data-generating functions outside the plan before make()?
– landau, Nov 13 '18 at 1:20
Yep, that was it! I was trying to be clever by mimicking a drake plan with a data.frame that populated seeds with a lapply call, but now I realize it was being called while sourcing the plan file. 🤦
– zipzapboing, Nov 13 '18 at 1:32
1 Answer
To bring onlookers up to speed, I will try to spell out the problem. @zipzapboing, please correct me if my description is off-target.
Let's say you have a script that generates a drake plan and executes it.
library(drake)
simulate_data <- function(seed) {
  set.seed(seed)
  rnorm(100)
}
seed_grid <- data.frame(
id = paste0("target_", 1:3),
seed = sample.int(1e6, 3)
)
print(seed_grid)
#> id seed
#> 1 target_1 581687
#> 2 target_2 700363
#> 3 target_3 914982
plan <- map_plan(seed_grid, simulate_data)
print(plan)
#> # A tibble: 3 x 2
#> target command
#> <chr> <chr>
#> 1 target_1 simulate_data(seed = 581687L)
#> 2 target_2 simulate_data(seed = 700363L)
#> 3 target_3 simulate_data(seed = 914982L)
make(plan)
#> target target_1
#> target target_2
#> target target_3
make(plan)
#> All targets are already up to date.
Created on 2018-11-12 by the reprex package (v0.2.1)
The second make() worked just fine, right? But if you were to run the same script in a different session, you would end up with a different plan. The randomly generated seed arguments to simulate_data() would be different, so all your targets would build from scratch.
library(drake)
simulate_data <- function(seed) {
  set.seed(seed)
  rnorm(100)
}
seed_grid <- data.frame(
id = paste0("target_", 1:3),
seed = sample.int(1e6, 3)
)
print(seed_grid)
#> id seed
#> 1 target_1 654304
#> 2 target_2 252208
#> 3 target_3 781158
plan <- map_plan(seed_grid, simulate_data)
print(plan)
#> # A tibble: 3 x 2
#> target command
#> <chr> <chr>
#> 1 target_1 simulate_data(seed = 654304L)
#> 2 target_2 simulate_data(seed = 252208L)
#> 3 target_3 simulate_data(seed = 781158L)
make(plan)
#> target target_1
#> target target_2
#> target target_3
Created on 2018-11-12 by the reprex package (v0.2.1)
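Why does everything rebuild? By default drake treats a target as outdated when its command changes, and here the command text itself embeds the randomly drawn seed. A minimal base R sketch of that effect, assuming a hypothetical helper `make_commands()` that only mimics how the seed grid becomes command strings (it does not call drake): two calls stand in for sourcing the script in two fresh sessions.

```r
# Hypothetical helper mimicking how the plan's commands are built:
# seeds are drawn with sample.int() every time the script is sourced.
make_commands <- function(n = 3) {
  seeds <- sample.int(1e6, n)
  sprintf("simulate_data(seed = %dL)", seeds)
}

# Two calls stand in for sourcing the script in two different sessions.
session_1 <- make_commands()
session_2 <- make_commands()
print(session_1)
print(session_2)

# Almost surely FALSE: the command text changed, so drake's command
# trigger would consider every target outdated.
identical(session_1, session_2)
```

The values printed will differ from run to run; the point is only that the two command vectors do not match.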
One solution is to be extra careful to hold onto the same plan. However, there is an even easier way: just let drake set the seeds for you. drake automatically gives each target its own reproducible random seed. These target-level seeds are deterministically generated from a root seed (the seed argument to make()) and the names of the targets.
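For the first option, one way to "hold onto the same plan" is to draw the seed grid only once and persist it to disk. A minimal sketch, assuming a hypothetical helper `load_or_create_seed_grid()` and an RDS file path of your choosing; the grid it returns would then be passed to `map_plan()` exactly as in the first script.

```r
# Hypothetical helper: draw the seed grid once, save it to disk, and reuse
# the saved copy in every later session so the plan's commands never change.
load_or_create_seed_grid <- function(path, n = 3) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  seed_grid <- data.frame(
    id = paste0("target_", seq_len(n)),
    seed = sample.int(1e6, n)
  )
  saveRDS(seed_grid, path)
  seed_grid
}

path <- tempfile(fileext = ".rds")  # use a fixed project path in practice
first <- load_or_create_seed_grid(path)
second <- load_or_create_seed_grid(path)  # a later "session" reads the file
identical(first, second)  # TRUE: same seeds, hence the same plan
```

With this design, deleting the file is the only way the seeds (and therefore the commands) change.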
library(digest)
library(drake)
library(magrittr) # defines %>%
simulate_data <- function() {
  mean(rnorm(100))
}
plan <- drake_plan(target = simulate_data()) %>%
expand_plan(values = 1:3)
print(plan)
#> # A tibble: 3 x 2
#> target command
#> <chr> <chr>
#> 1 target_1 simulate_data()
#> 2 target_2 simulate_data()
#> 3 target_3 simulate_data()
tmp <- rnorm(1)
digest(.Random.seed) # Fingerprint of the current seed.
#> [1] "0bbddc33a4afe7cd1c1742223764661c"
make(plan)
#> target target_1
#> target target_2
#> target target_3
make(plan)
#> All targets are already up to date.
# The targets have different seeds and different values.
readd(target_1)
#> [1] -0.05530201
readd(target_2)
#> [1] 0.03698055
readd(target_3)
#> [1] 0.05990671
clean() # Destroy the targets.
tmp <- rnorm(1) # Change the global seed.
digest(.Random.seed) # The seed changed.
#> [1] "5993aa5cff4b72a0e14fa58dc5c5e3bf"
make(plan)
#> target target_1
#> target target_2
#> target target_3
# The targets were regenerated with the same values (same seeds).
readd(target_1)
#> [1] -0.05530201
readd(target_2)
#> [1] 0.03698055
readd(target_3)
#> [1] 0.05990671
# You can recover a target's seed from its metadata.
seed <- diagnose(target_1)$seed
print(seed)
#> [1] 1875584181
# And you can use that seed to reproduce
# the target's value outside make().
set.seed(seed)
mean(rnorm(100))
#> [1] -0.05530201
Created on 2018-11-12 by the reprex package (v0.2.1)
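To make the idea of deterministic target-level seeds concrete, here is a minimal base R sketch of deriving a seed from a root seed plus a target name. The helpers `target_seed()` and `build_target()` are hypothetical, and the polynomial string hash is chosen only for illustration: it is not the scheme drake uses internally, so its seeds will not match what `diagnose()` reports.

```r
# Hypothetical: map (root seed, target name) to a reproducible integer seed
# with a simple polynomial string hash. This is NOT drake's internal scheme.
target_seed <- function(root_seed, target) {
  codes <- utf8ToInt(paste0(root_seed, ":", target))
  h <- 0
  for (code in codes) {
    # Intermediate values stay below 2^53, so doubles represent them exactly.
    h <- (h * 31 + code) %% 2147483647
  }
  as.integer(h)
}

# Build one "target" from its own seed. As a side effect, this sketch
# overwrites the global RNG state via set.seed().
build_target <- function(root_seed, target) {
  set.seed(target_seed(root_seed, target))
  mean(rnorm(100))
}

root <- 12345
targets <- paste0("target_", 1:3)
first <- sapply(targets, build_target, root_seed = root)
tmp <- rnorm(1)  # disturb the global RNG state, like the reprex does
second <- sapply(targets, build_target, root_seed = root)

identical(first, second)     # TRUE: same root seed and names, same values
length(unique(first)) == 3   # TRUE: each target gets its own seed
```

Because each seed depends only on the root seed and the target's name, the global RNG state at the time of building does not matter, which is the property the reprex demonstrates with clean() and make().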
I really should write more in the manual about how seeds work in drake and highlight the original pitfall raised in this thread. I doubt you are the only one who struggled with this issue.
edited Nov 13 '18 at 2:22
answered Nov 13 '18 at 2:13 by landau
Ref: github.com/ropenscilabs/drake-manual/issues/49 (added just now)
– landau, Nov 13 '18 at 2:19