Halting drake plan makes it rebuild targets it already had built previously

























I'm currently using drake to run a set of >1k simulations. I've estimated that it would take about two days to run the complete set, but I also expect my computer to crash at any point during that period because, well, it has.



Apparently, stopping the plan discards any targets that were already built, so essentially I can't use drake for its intended purpose.



I suppose I could write a function that edits the R file where the plan is specified, so that drake adds targets to its cache sequentially, but that seems utterly hackish.



Any ideas on how to deal with this?



EDIT: The actual problem seems to come from using set.seed inside my data-generating functions. I was aware that drake already sets seeds for the user in a way that ensures reproducibility, but I figured that leaving my functions the way they were wouldn't change anything, since drake would ensure that the random seed I chose always ends up being the same. Guess not, but since I removed that step things are caching fine, so the issue is solved.


































  • Would you post a small reproducible example? I am having trouble picturing how your use of set.seed() confused drake into incorrectly invalidating targets.

    – landau
    Nov 13 '18 at 1:10











  • By any chance, did you call those data-generating functions outside the plan before make()?

    – landau
    Nov 13 '18 at 1:20











  • Yep, that was it! I was trying to be clever by mimicking a drake plan with a data.frame that populated seeds with an lapply call, but now I realize it was being called while sourcing the plan file. 🤦

    – zipzapboing
    Nov 13 '18 at 1:32





















r ropensci drake-r-package














edited Nov 13 '18 at 8:49 by landau

asked Nov 12 '18 at 18:59 by zipzapboing
























1 Answer
































To bring onlookers up to speed, I will try to spell out the problem. @zipzapboing, please correct me if my description is off-target.



Let's say you have a script that generates a drake plan and executes it.





library(drake)

# Each simulation sets its own seed, passed in as an argument.
simulate_data <- function(seed) {
  set.seed(seed)
  rnorm(100)
}

# The seeds are drawn at random every time this script is sourced,
# so the commands in the plan differ from one session to the next.
seed_grid <- data.frame(
  id = paste0("target_", 1:3),
  seed = sample.int(1e6, 3)
)

print(seed_grid)
#>         id   seed
#> 1 target_1 581687
#> 2 target_2 700363
#> 3 target_3 914982

plan <- map_plan(seed_grid, simulate_data)

print(plan)
#> # A tibble: 3 x 2
#>   target   command
#>   <chr>    <chr>
#> 1 target_1 simulate_data(seed = 581687L)
#> 2 target_2 simulate_data(seed = 700363L)
#> 3 target_3 simulate_data(seed = 914982L)

make(plan)
#> target target_1
#> target target_2
#> target target_3
make(plan)
#> All targets are already up to date.


Created on 2018-11-12 by the reprex package (v0.2.1)



The second make() worked just fine, right? But if you were to run the same script in a different session, you would end up with a different plan. The randomly-generated seed arguments to simulate_data() would be different, so all your targets would build from scratch.





library(drake)

simulate_data <- function(seed) {
  set.seed(seed)
  rnorm(100)
}

# Same script, new session: sample.int() returns different seeds.
seed_grid <- data.frame(
  id = paste0("target_", 1:3),
  seed = sample.int(1e6, 3)
)

print(seed_grid)
#>         id   seed
#> 1 target_1 654304
#> 2 target_2 252208
#> 3 target_3 781158

plan <- map_plan(seed_grid, simulate_data)

print(plan)
#> # A tibble: 3 x 2
#>   target   command
#>   <chr>    <chr>
#> 1 target_1 simulate_data(seed = 654304L)
#> 2 target_2 simulate_data(seed = 252208L)
#> 3 target_3 simulate_data(seed = 781158L)

# The commands changed, so every target is rebuilt.
make(plan)
#> target target_1
#> target target_2
#> target target_3


Created on 2018-11-12 by the reprex package (v0.2.1)
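
If the seeds really do have to be passed in as arguments, one way to keep the plan stable between sessions is to draw them once and reuse the saved copy afterwards. The following is only a minimal sketch in base R under that assumption. The helper load_or_create_seed_grid() and the file path are hypothetical names made up for illustration, not part of drake.

```r
# Hypothetical helper (not part of drake): create the seed grid once,
# then reuse the saved copy on every later call.
load_or_create_seed_grid <- function(path, n_targets) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  grid <- data.frame(
    id = paste0("target_", seq_len(n_targets)),
    seed = sample.int(1e6, n_targets),
    stringsAsFactors = FALSE
  )
  saveRDS(grid, path)
  grid
}

# A temporary file keeps this sketch self-contained. A real project
# would use a stable path inside the project directory instead.
grid_file <- tempfile(fileext = ".rds")

first_run <- load_or_create_seed_grid(grid_file, 3)
tmp <- rnorm(1) # Advance the global RNG, much as a new session would.
second_run <- load_or_create_seed_grid(grid_file, 3)

# The second call reads the saved grid instead of drawing new seeds.
identical(first_run$seed, second_run$seed)
```

The resulting data frame could then be handed to map_plan() as in the scripts above, so the commands (and therefore the targets) stay the same from one session to the next.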



One solution is to be extra careful to hold onto the same plan. However, there is an even easier way: just let drake set the seeds for you. drake automatically gives each target its own reproducible random seed. These target-level seeds are generated deterministically from a root seed (the seed argument to make()) and the names of the targets.





library(digest)
library(drake)
library(magrittr) # defines %>%

# No seed argument: drake supplies a reproducible seed for each target.
simulate_data <- function() {
  mean(rnorm(100))
}

plan <- drake_plan(target = simulate_data()) %>%
  expand_plan(values = 1:3)

print(plan)
#> # A tibble: 3 x 2
#>   target   command
#>   <chr>    <chr>
#> 1 target_1 simulate_data()
#> 2 target_2 simulate_data()
#> 3 target_3 simulate_data()

tmp <- rnorm(1) # Make sure .Random.seed exists.
digest(.Random.seed) # Fingerprint of the current seed.
#> [1] "0bbddc33a4afe7cd1c1742223764661c"

make(plan)
#> target target_1
#> target target_2
#> target target_3
make(plan)
#> All targets are already up to date.

# The targets have different seeds and different values.
readd(target_1)
#> [1] -0.05530201
readd(target_2)
#> [1] 0.03698055
readd(target_3)
#> [1] 0.05990671

clean() # Destroy the targets.
tmp <- rnorm(1) # Change the global seed.
digest(.Random.seed) # The seed changed.
#> [1] "5993aa5cff4b72a0e14fa58dc5c5e3bf"

make(plan)
#> target target_1
#> target target_2
#> target target_3

# The targets were regenerated with the same values (same seeds).
readd(target_1)
#> [1] -0.05530201
readd(target_2)
#> [1] 0.03698055
readd(target_3)
#> [1] 0.05990671

# You can recover a target's seed from its metadata.
seed <- diagnose(target_1)$seed
print(seed)
#> [1] 1875584181

# And you can use that seed to reproduce
# the target's value outside make().
set.seed(seed)
mean(rnorm(100))
#> [1] -0.05530201


Created on 2018-11-12 by the reprex package (v0.2.1)
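
To make the idea of "a root seed plus the target names" more concrete, here is a toy illustration in base R. It is emphatically not drake's actual algorithm: derive_seed() and simulate_with_seed() are hypothetical helpers, and the mixing rule is an arbitrary choice made only to show that a seed computed from the name is the same in every session and does not depend on the global RNG state.

```r
# Hypothetical helper (NOT drake's implementation): mix a root seed with
# the character codes of a target name to get a per-target integer seed.
derive_seed <- function(root_seed, target_name) {
  codes <- utf8ToInt(target_name)
  # Weight each character by its position so similar names still differ.
  mixed <- (as.numeric(root_seed) + sum(codes * seq_along(codes))) %%
    .Machine$integer.max
  as.integer(mixed)
}

# Hypothetical stand-in for a target's command.
simulate_with_seed <- function(seed) {
  set.seed(seed)
  mean(rnorm(100))
}

targets <- paste0("target_", 1:3)
seeds <- vapply(targets, derive_seed, integer(1), root_seed = 0)

# Each target gets its own seed, computed only from the root seed and name.
print(seeds)

# Re-running a target with its derived seed reproduces the same value,
# no matter what happened to the global RNG in between.
value_before <- simulate_with_seed(seeds[["target_1"]])
tmp <- rnorm(1)
value_after <- simulate_with_seed(seeds[["target_1"]])
identical(value_before, value_after)
```

Because nothing here is drawn at random when the script is sourced, rebuilding the list of targets in a fresh session gives the same seeds, which is the property that keeps previously built targets valid.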



I really should write more in the manual about how seeds work in drake and highlight the original pitfall raised in this thread. I doubt you are the only one who struggled with this issue.


























  • Ref: github.com/ropenscilabs/drake-manual/issues/49 (added just now)

    – landau
    Nov 13 '18 at 2:19












































edited Nov 13 '18 at 2:22

answered Nov 13 '18 at 2:13 by landau


























