Find the maximum in a certain time frame in a non-continuous time series

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;

I have a dataframe with a time series that looks like this:

df<-structure(list(date = structure(c(-6905, -6891, -6853, -6588, 
-6588, -6586, -6523, -6515, -5856, -5753), class = "Date"), flow = c(2.22, 
2.56, 3.3, 1.38, 4, 1.4, 1.32, 1.26, 6, 35.69)), .Names = c("date", 
"flow"), row.names = c(NA, 10L), class = "data.frame")

I want to remove all the lines that are not the maximum within 2 days forward or backward of its date. So in the case above, lines 4 and 6 will be removed. I couldn't find similar answered questions.

I wrote this code that doesn't work and it is ugly, long and doesn't take care of the edges of the dataframe:

 idx <- c()
 for (j in 3:(length(df$date)-2))
 as.Date(df$date[j])-as.Date(df$date[j-2])<3)
 if (df$flow[j]!=max(df$flow[(j-2):(j+2)]))
 idx <- c(idx,j)
 
 else if (as.Date(df$date[j+1])-as.Date(df$date[j])<3

Notice that the dates in the dataframe are not consecutive.

edited Nov 14 '18 at 9:31

asked Nov 14 '18 at 8:37

asher

556

1

Are you sure that row 7 would be removed?

– RLave
Nov 14 '18 at 8:53

@RLave No. I was wrong. row 7 will not be removed. It is not the same year. Thanks for pointing it out. I also edited my answer: my code most definitely doesn't work.

– asher
Nov 14 '18 at 9:08

are you sure we're not suppose to be left only with date flow <date> <dbl> 1 1951-03-29 3.30 2 1951-12-19 4.00 3 1954-04-02 35.7

– DJV
Nov 14 '18 at 9:13

Also in your example you have a date twice, is it correct?

– RLave
Nov 14 '18 at 9:21

1

@RLave Yes. That is correct. It is a time-series of separated flood events which might happen on the same day.

– asher
Nov 14 '18 at 9:26

|
show 2 more comments

I have a dataframe with a time series that looks like this:

df<-structure(list(date = structure(c(-6905, -6891, -6853, -6588, 
-6588, -6586, -6523, -6515, -5856, -5753), class = "Date"), flow = c(2.22, 
2.56, 3.3, 1.38, 4, 1.4, 1.32, 1.26, 6, 35.69)), .Names = c("date", 
"flow"), row.names = c(NA, 10L), class = "data.frame")

I want to remove all the lines that are not the maximum within 2 days forward or backward of its date. So in the case above, lines 4 and 6 will be removed. I couldn't find similar answered questions.

I wrote this code that doesn't work and it is ugly, long and doesn't take care of the edges of the dataframe:

 idx <- c()
 for (j in 3:(length(df$date)-2))
 as.Date(df$date[j])-as.Date(df$date[j-2])<3)
 if (df$flow[j]!=max(df$flow[(j-2):(j+2)]))
 idx <- c(idx,j)
 
 else if (as.Date(df$date[j+1])-as.Date(df$date[j])<3

Notice that the dates in the dataframe are not consecutive.

edited Nov 14 '18 at 9:31

asked Nov 14 '18 at 8:37

asher

556

1

Are you sure that row 7 would be removed?

– RLave
Nov 14 '18 at 8:53

@RLave No. I was wrong. row 7 will not be removed. It is not the same year. Thanks for pointing it out. I also edited my answer: my code most definitely doesn't work.

– asher
Nov 14 '18 at 9:08

are you sure we're not suppose to be left only with date flow <date> <dbl> 1 1951-03-29 3.30 2 1951-12-19 4.00 3 1954-04-02 35.7

– DJV
Nov 14 '18 at 9:13

Also in your example you have a date twice, is it correct?

– RLave
Nov 14 '18 at 9:21

1

@RLave Yes. That is correct. It is a time-series of separated flood events which might happen on the same day.

– asher
Nov 14 '18 at 9:26

|
show 2 more comments

I have a dataframe with a time series that looks like this:

df<-structure(list(date = structure(c(-6905, -6891, -6853, -6588, 
-6588, -6586, -6523, -6515, -5856, -5753), class = "Date"), flow = c(2.22, 
2.56, 3.3, 1.38, 4, 1.4, 1.32, 1.26, 6, 35.69)), .Names = c("date", 
"flow"), row.names = c(NA, 10L), class = "data.frame")

I want to remove all the lines that are not the maximum within 2 days forward or backward of its date. So in the case above, lines 4 and 6 will be removed. I couldn't find similar answered questions.

I wrote this code that doesn't work and it is ugly, long and doesn't take care of the edges of the dataframe:

 idx <- c()
 for (j in 3:(length(df$date)-2))
 as.Date(df$date[j])-as.Date(df$date[j-2])<3)
 if (df$flow[j]!=max(df$flow[(j-2):(j+2)]))
 idx <- c(idx,j)
 
 else if (as.Date(df$date[j+1])-as.Date(df$date[j])<3

Notice that the dates in the dataframe are not consecutive.

edited Nov 14 '18 at 9:31

asked Nov 14 '18 at 8:37

asher

556

I have a dataframe with a time series that looks like this:

df<-structure(list(date = structure(c(-6905, -6891, -6853, -6588, 
-6588, -6586, -6523, -6515, -5856, -5753), class = "Date"), flow = c(2.22, 
2.56, 3.3, 1.38, 4, 1.4, 1.32, 1.26, 6, 35.69)), .Names = c("date", 
"flow"), row.names = c(NA, 10L), class = "data.frame")

I want to remove all the lines that are not the maximum within 2 days forward or backward of its date. So in the case above, lines 4 and 6 will be removed. I couldn't find similar answered questions.

I wrote this code that doesn't work and it is ugly, long and doesn't take care of the edges of the dataframe:

 idx <- c()
 for (j in 3:(length(df$date)-2))
 as.Date(df$date[j])-as.Date(df$date[j-2])<3)
 if (df$flow[j]!=max(df$flow[(j-2):(j+2)]))
 idx <- c(idx,j)
 
 else if (as.Date(df$date[j+1])-as.Date(df$date[j])<3

Notice that the dates in the dataframe are not consecutive.

r dataframe time-series max

edited Nov 14 '18 at 9:31

asked Nov 14 '18 at 8:37

asher

556

edited Nov 14 '18 at 9:31

asked Nov 14 '18 at 8:37

asher

556

edited Nov 14 '18 at 9:31

asked Nov 14 '18 at 8:37

asher

556

asked Nov 14 '18 at 8:37

asher

556

asked Nov 14 '18 at 8:37

asher

556

1

Are you sure that row 7 would be removed?

– RLave
Nov 14 '18 at 8:53

@RLave No. I was wrong. row 7 will not be removed. It is not the same year. Thanks for pointing it out. I also edited my answer: my code most definitely doesn't work.

– asher
Nov 14 '18 at 9:08

are you sure we're not suppose to be left only with date flow <date> <dbl> 1 1951-03-29 3.30 2 1951-12-19 4.00 3 1954-04-02 35.7

– DJV
Nov 14 '18 at 9:13

Also in your example you have a date twice, is it correct?

– RLave
Nov 14 '18 at 9:21

1

@RLave Yes. That is correct. It is a time-series of separated flood events which might happen on the same day.

– asher
Nov 14 '18 at 9:26

|
show 2 more comments

1

Are you sure that row 7 would be removed?

– RLave
Nov 14 '18 at 8:53

@RLave No. I was wrong. row 7 will not be removed. It is not the same year. Thanks for pointing it out. I also edited my answer: my code most definitely doesn't work.

– asher
Nov 14 '18 at 9:08

are you sure we're not suppose to be left only with date flow <date> <dbl> 1 1951-03-29 3.30 2 1951-12-19 4.00 3 1954-04-02 35.7

– DJV
Nov 14 '18 at 9:13

Also in your example you have a date twice, is it correct?

– RLave
Nov 14 '18 at 9:21

1

@RLave Yes. That is correct. It is a time-series of separated flood events which might happen on the same day.

– asher
Nov 14 '18 at 9:26

Are you sure that row 7 would be removed?

– RLave
Nov 14 '18 at 8:53

@RLave No. I was wrong. row 7 will not be removed. It is not the same year. Thanks for pointing it out. I also edited my answer: my code most definitely doesn't work.

– asher
Nov 14 '18 at 9:08

are you sure we're not suppose to be left only with date flow <date> <dbl> 1 1951-03-29 3.30 2 1951-12-19 4.00 3 1954-04-02 35.7

– DJV
Nov 14 '18 at 9:13

Also in your example you have a date twice, is it correct?

– RLave
Nov 14 '18 at 9:21

@RLave Yes. That is correct. It is a time-series of separated flood events which might happen on the same day.

– asher
Nov 14 '18 at 9:26

|
show 2 more comments

3 Answers
3

active

oldest

votes

Using the zoo library.

library(zoo)

# convert into a zoo time series
dtf.zoo <- zoo(dt$flow, order.by=dt$date)

# remove duplicate dates by keeping the maximum value
dtf.zoo <- aggregate(dtf.zoo, time(dtf.zoo), max)

# pad with NAs to make the time series regular
dtf.zoo <- merge(
 dtf.zoo, 
 zoo(, seq(min(index(dtf.zoo)), max(index(dtf.zoo)), "day"))
)

# find rows that are less than a value two days prior or hence
rem <- which(dtf.zoo < rollapply(dtf.zoo, 5, max, na.rm=TRUE, partial=TRUE))

# remove those rows
dtf.zoo2 <- dtf.zoo[-rem]

# remove NAs
dt2 <- data.frame(flow=na.omit(dtf.zoo2))
dt2
# flow
# 1951-02-05 2.22
# 1951-02-19 2.56
# 1951-03-29 3.30
# 1951-12-19 4.00
# 1952-02-22 1.32
# 1952-03-01 1.26
# 1953-12-20 6.00
# 1954-04-02 35.69

which(!(dt$flow %in% dt2$flow))
# 4 6

edited Nov 14 '18 at 11:24

answered Nov 14 '18 at 9:55

AkselA

4,83421326

hey, if I don't misunderstand, the OP wants to remove line 4 and 6, but you remove line 4 and 5.

– Darren Tsai
Nov 14 '18 at 10:56

@DarrenTsai: Indeed, thank you. I forgot to set partial=TRUE in rollapply(), which caused an index mismatch.

– AkselA
Nov 14 '18 at 11:28

@AkselA: I added this condition: if (length(rem>0))df.zoo <- df.zoo[-rem] to escape cases that don't have events that are less than 3 days apart

– asher
Nov 14 '18 at 13:26

@asher: That will work. Removing which() and replacing -rem with !rem should also work.

– AkselA
Nov 14 '18 at 13:41

@AkselA: makes sense. Thanks for the ongoing exchange it is very helpful.

– asher
Nov 15 '18 at 11:40

add a comment |

You can also use the tidyverse approch:

require(tidyverse)

df %>% 
 #Arrange by date
 arrange(date) %>%
 #Picking the max for each da
 group_by(date) %>% 
 top_n(1, flow) %>% 
 ungroup() %>%
 #Adding missing dates with NAs
 complete(date = seq.Date(min(date), max(date), by="day")) %>% 
 #Remove Two up/down
 mutate(
 remove = case_when(
 flow < rowMeans(data.frame(lag(flow, 1), 
 lag(flow, 2)), na.rm = TRUE) ~ "remove", 
 flow < rowMeans(data.frame(lead(flow, 1),
 lead(flow, 2)), na.rm = TRUE) ~ "remove", 
 TRUE ~ "keep")) %>% 
 na.omit() %>%
 filter(remove == "keep") %>% 
 select(-remove)


# A tibble: 8 x 2
 date flow
 <date> <dbl>
1 1951-02-05 2.22
2 1951-02-19 2.56
3 1951-03-29 3.30
4 1951-12-19 4.00
5 1952-02-22 1.32
6 1952-03-01 1.26
7 1953-12-20 6.00
8 1954-04-02 35.7

edited Nov 14 '18 at 10:41

answered Nov 14 '18 at 9:16

DJV

1,8681519

add a comment |

I use lapply() to check the range : [date - 2 days , date + 2 days] of each date.

rm.list <- lapply(df$date, function(x) 
 ind <- which(abs(df$date - x) <= 2)
 flow <- df$flow[ind]
 if(length(ind) > 1) which(flow < max(flow)) + min(ind) - 1
 else NULL
)

rm <- unique(unlist(rm.list)) # [1] 4 6
df[-rm, ]

# date flow
# 1 1951-02-05 2.22
# 2 1951-02-19 2.56
# 3 1951-03-29 3.30
# 5 1951-12-19 4.00
# 7 1952-02-22 1.32
# 8 1952-03-01 1.26
# 9 1953-12-20 6.00
# 10 1954-04-02 35.69

edited Nov 14 '18 at 10:49

answered Nov 14 '18 at 10:37

Darren Tsai

2,6572531

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53295976%2ffind-the-maximum-in-a-certain-time-frame-in-a-non-continuous-time-series%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

Using the zoo library.

library(zoo)

# convert into a zoo time series
dtf.zoo <- zoo(dt$flow, order.by=dt$date)

# remove duplicate dates by keeping the maximum value
dtf.zoo <- aggregate(dtf.zoo, time(dtf.zoo), max)

# pad with NAs to make the time series regular
dtf.zoo <- merge(
 dtf.zoo, 
 zoo(, seq(min(index(dtf.zoo)), max(index(dtf.zoo)), "day"))
)

# find rows that are less than a value two days prior or hence
rem <- which(dtf.zoo < rollapply(dtf.zoo, 5, max, na.rm=TRUE, partial=TRUE))

# remove those rows
dtf.zoo2 <- dtf.zoo[-rem]

# remove NAs
dt2 <- data.frame(flow=na.omit(dtf.zoo2))
dt2
# flow
# 1951-02-05 2.22
# 1951-02-19 2.56
# 1951-03-29 3.30
# 1951-12-19 4.00
# 1952-02-22 1.32
# 1952-03-01 1.26
# 1953-12-20 6.00
# 1954-04-02 35.69

which(!(dt$flow %in% dt2$flow))
# 4 6

edited Nov 14 '18 at 11:24

answered Nov 14 '18 at 9:55

AkselA

4,83421326

hey, if I don't misunderstand, the OP wants to remove line 4 and 6, but you remove line 4 and 5.

– Darren Tsai
Nov 14 '18 at 10:56

@DarrenTsai: Indeed, thank you. I forgot to set partial=TRUE in rollapply(), which caused an index mismatch.

– AkselA
Nov 14 '18 at 11:28

@AkselA: I added this condition: if (length(rem>0))df.zoo <- df.zoo[-rem] to escape cases that don't have events that are less than 3 days apart

– asher
Nov 14 '18 at 13:26

@asher: That will work. Removing which() and replacing -rem with !rem should also work.

– AkselA
Nov 14 '18 at 13:41

@AkselA: makes sense. Thanks for the ongoing exchange it is very helpful.

– asher
Nov 15 '18 at 11:40

add a comment |

Using the zoo library.

library(zoo)

# convert into a zoo time series
dtf.zoo <- zoo(dt$flow, order.by=dt$date)

# remove duplicate dates by keeping the maximum value
dtf.zoo <- aggregate(dtf.zoo, time(dtf.zoo), max)

# pad with NAs to make the time series regular
dtf.zoo <- merge(
 dtf.zoo, 
 zoo(, seq(min(index(dtf.zoo)), max(index(dtf.zoo)), "day"))
)

# find rows that are less than a value two days prior or hence
rem <- which(dtf.zoo < rollapply(dtf.zoo, 5, max, na.rm=TRUE, partial=TRUE))

# remove those rows
dtf.zoo2 <- dtf.zoo[-rem]

# remove NAs
dt2 <- data.frame(flow=na.omit(dtf.zoo2))
dt2
# flow
# 1951-02-05 2.22
# 1951-02-19 2.56
# 1951-03-29 3.30
# 1951-12-19 4.00
# 1952-02-22 1.32
# 1952-03-01 1.26
# 1953-12-20 6.00
# 1954-04-02 35.69

which(!(dt$flow %in% dt2$flow))
# 4 6

edited Nov 14 '18 at 11:24

answered Nov 14 '18 at 9:55

AkselA

4,83421326

hey, if I don't misunderstand, the OP wants to remove line 4 and 6, but you remove line 4 and 5.

– Darren Tsai
Nov 14 '18 at 10:56

@DarrenTsai: Indeed, thank you. I forgot to set partial=TRUE in rollapply(), which caused an index mismatch.

– AkselA
Nov 14 '18 at 11:28

@AkselA: I added this condition: if (length(rem>0))df.zoo <- df.zoo[-rem] to escape cases that don't have events that are less than 3 days apart

– asher
Nov 14 '18 at 13:26

@asher: That will work. Removing which() and replacing -rem with !rem should also work.

– AkselA
Nov 14 '18 at 13:41

@AkselA: makes sense. Thanks for the ongoing exchange it is very helpful.

– asher
Nov 15 '18 at 11:40

add a comment |

Using the zoo library.

library(zoo)

# convert into a zoo time series
dtf.zoo <- zoo(dt$flow, order.by=dt$date)

# remove duplicate dates by keeping the maximum value
dtf.zoo <- aggregate(dtf.zoo, time(dtf.zoo), max)

# pad with NAs to make the time series regular
dtf.zoo <- merge(
 dtf.zoo, 
 zoo(, seq(min(index(dtf.zoo)), max(index(dtf.zoo)), "day"))
)

# find rows that are less than a value two days prior or hence
rem <- which(dtf.zoo < rollapply(dtf.zoo, 5, max, na.rm=TRUE, partial=TRUE))

# remove those rows
dtf.zoo2 <- dtf.zoo[-rem]

# remove NAs
dt2 <- data.frame(flow=na.omit(dtf.zoo2))
dt2
# flow
# 1951-02-05 2.22
# 1951-02-19 2.56
# 1951-03-29 3.30
# 1951-12-19 4.00
# 1952-02-22 1.32
# 1952-03-01 1.26
# 1953-12-20 6.00
# 1954-04-02 35.69

which(!(dt$flow %in% dt2$flow))
# 4 6

edited Nov 14 '18 at 11:24

answered Nov 14 '18 at 9:55

AkselA

4,83421326

Using the zoo library.

library(zoo)

# convert into a zoo time series
dtf.zoo <- zoo(dt$flow, order.by=dt$date)

# remove duplicate dates by keeping the maximum value
dtf.zoo <- aggregate(dtf.zoo, time(dtf.zoo), max)

# pad with NAs to make the time series regular
dtf.zoo <- merge(
 dtf.zoo, 
 zoo(, seq(min(index(dtf.zoo)), max(index(dtf.zoo)), "day"))
)

# find rows that are less than a value two days prior or hence
rem <- which(dtf.zoo < rollapply(dtf.zoo, 5, max, na.rm=TRUE, partial=TRUE))

# remove those rows
dtf.zoo2 <- dtf.zoo[-rem]

# remove NAs
dt2 <- data.frame(flow=na.omit(dtf.zoo2))
dt2
# flow
# 1951-02-05 2.22
# 1951-02-19 2.56
# 1951-03-29 3.30
# 1951-12-19 4.00
# 1952-02-22 1.32
# 1952-03-01 1.26
# 1953-12-20 6.00
# 1954-04-02 35.69

which(!(dt$flow %in% dt2$flow))
# 4 6

edited Nov 14 '18 at 11:24

answered Nov 14 '18 at 9:55

AkselA

4,83421326

edited Nov 14 '18 at 11:24

answered Nov 14 '18 at 9:55

AkselA

4,83421326

answered Nov 14 '18 at 9:55

AkselA

4,83421326

answered Nov 14 '18 at 9:55

AkselA

4,83421326

hey, if I don't misunderstand, the OP wants to remove line 4 and 6, but you remove line 4 and 5.

– Darren Tsai
Nov 14 '18 at 10:56

@DarrenTsai: Indeed, thank you. I forgot to set partial=TRUE in rollapply(), which caused an index mismatch.

– AkselA
Nov 14 '18 at 11:28

@AkselA: I added this condition: if (length(rem>0))df.zoo <- df.zoo[-rem] to escape cases that don't have events that are less than 3 days apart

– asher
Nov 14 '18 at 13:26

@asher: That will work. Removing which() and replacing -rem with !rem should also work.

– AkselA
Nov 14 '18 at 13:41

@AkselA: makes sense. Thanks for the ongoing exchange it is very helpful.

– asher
Nov 15 '18 at 11:40

add a comment |

hey, if I don't misunderstand, the OP wants to remove line 4 and 6, but you remove line 4 and 5.

– Darren Tsai
Nov 14 '18 at 10:56

@DarrenTsai: Indeed, thank you. I forgot to set partial=TRUE in rollapply(), which caused an index mismatch.

– AkselA
Nov 14 '18 at 11:28

@AkselA: I added this condition: if (length(rem>0))df.zoo <- df.zoo[-rem] to escape cases that don't have events that are less than 3 days apart

– asher
Nov 14 '18 at 13:26

@asher: That will work. Removing which() and replacing -rem with !rem should also work.

– AkselA
Nov 14 '18 at 13:41

@AkselA: makes sense. Thanks for the ongoing exchange it is very helpful.

– asher
Nov 15 '18 at 11:40

hey, if I don't misunderstand, the OP wants to remove line 4 and 6, but you remove line 4 and 5.

– Darren Tsai
Nov 14 '18 at 10:56

@DarrenTsai: Indeed, thank you. I forgot to set partial=TRUE in rollapply(), which caused an index mismatch.

– AkselA
Nov 14 '18 at 11:28

@AkselA: I added this condition: if (length(rem>0))df.zoo <- df.zoo[-rem] to escape cases that don't have events that are less than 3 days apart

– asher
Nov 14 '18 at 13:26

@asher: That will work. Removing which() and replacing -rem with !rem should also work.

– AkselA
Nov 14 '18 at 13:41

@AkselA: makes sense. Thanks for the ongoing exchange it is very helpful.

– asher
Nov 15 '18 at 11:40

add a comment |

You can also use the tidyverse approch:

require(tidyverse)

df %>% 
 #Arrange by date
 arrange(date) %>%
 #Picking the max for each da
 group_by(date) %>% 
 top_n(1, flow) %>% 
 ungroup() %>%
 #Adding missing dates with NAs
 complete(date = seq.Date(min(date), max(date), by="day")) %>% 
 #Remove Two up/down
 mutate(
 remove = case_when(
 flow < rowMeans(data.frame(lag(flow, 1), 
 lag(flow, 2)), na.rm = TRUE) ~ "remove", 
 flow < rowMeans(data.frame(lead(flow, 1),
 lead(flow, 2)), na.rm = TRUE) ~ "remove", 
 TRUE ~ "keep")) %>% 
 na.omit() %>%
 filter(remove == "keep") %>% 
 select(-remove)


# A tibble: 8 x 2
 date flow
 <date> <dbl>
1 1951-02-05 2.22
2 1951-02-19 2.56
3 1951-03-29 3.30
4 1951-12-19 4.00
5 1952-02-22 1.32
6 1952-03-01 1.26
7 1953-12-20 6.00
8 1954-04-02 35.7

edited Nov 14 '18 at 10:41

answered Nov 14 '18 at 9:16

DJV

1,8681519

add a comment |

You can also use the tidyverse approch:

require(tidyverse)

df %>% 
 #Arrange by date
 arrange(date) %>%
 #Picking the max for each da
 group_by(date) %>% 
 top_n(1, flow) %>% 
 ungroup() %>%
 #Adding missing dates with NAs
 complete(date = seq.Date(min(date), max(date), by="day")) %>% 
 #Remove Two up/down
 mutate(
 remove = case_when(
 flow < rowMeans(data.frame(lag(flow, 1), 
 lag(flow, 2)), na.rm = TRUE) ~ "remove", 
 flow < rowMeans(data.frame(lead(flow, 1),
 lead(flow, 2)), na.rm = TRUE) ~ "remove", 
 TRUE ~ "keep")) %>% 
 na.omit() %>%
 filter(remove == "keep") %>% 
 select(-remove)


# A tibble: 8 x 2
 date flow
 <date> <dbl>
1 1951-02-05 2.22
2 1951-02-19 2.56
3 1951-03-29 3.30
4 1951-12-19 4.00
5 1952-02-22 1.32
6 1952-03-01 1.26
7 1953-12-20 6.00
8 1954-04-02 35.7

edited Nov 14 '18 at 10:41

answered Nov 14 '18 at 9:16

DJV

1,8681519

add a comment |

You can also use the tidyverse approch:

require(tidyverse)

df %>% 
 #Arrange by date
 arrange(date) %>%
 #Picking the max for each da
 group_by(date) %>% 
 top_n(1, flow) %>% 
 ungroup() %>%
 #Adding missing dates with NAs
 complete(date = seq.Date(min(date), max(date), by="day")) %>% 
 #Remove Two up/down
 mutate(
 remove = case_when(
 flow < rowMeans(data.frame(lag(flow, 1), 
 lag(flow, 2)), na.rm = TRUE) ~ "remove", 
 flow < rowMeans(data.frame(lead(flow, 1),
 lead(flow, 2)), na.rm = TRUE) ~ "remove", 
 TRUE ~ "keep")) %>% 
 na.omit() %>%
 filter(remove == "keep") %>% 
 select(-remove)


# A tibble: 8 x 2
 date flow
 <date> <dbl>
1 1951-02-05 2.22
2 1951-02-19 2.56
3 1951-03-29 3.30
4 1951-12-19 4.00
5 1952-02-22 1.32
6 1952-03-01 1.26
7 1953-12-20 6.00
8 1954-04-02 35.7

edited Nov 14 '18 at 10:41

answered Nov 14 '18 at 9:16

DJV

1,8681519

You can also use the tidyverse approch:

require(tidyverse)

df %>% 
 #Arrange by date
 arrange(date) %>%
 #Picking the max for each da
 group_by(date) %>% 
 top_n(1, flow) %>% 
 ungroup() %>%
 #Adding missing dates with NAs
 complete(date = seq.Date(min(date), max(date), by="day")) %>% 
 #Remove Two up/down
 mutate(
 remove = case_when(
 flow < rowMeans(data.frame(lag(flow, 1), 
 lag(flow, 2)), na.rm = TRUE) ~ "remove", 
 flow < rowMeans(data.frame(lead(flow, 1),
 lead(flow, 2)), na.rm = TRUE) ~ "remove", 
 TRUE ~ "keep")) %>% 
 na.omit() %>%
 filter(remove == "keep") %>% 
 select(-remove)


# A tibble: 8 x 2
 date flow
 <date> <dbl>
1 1951-02-05 2.22
2 1951-02-19 2.56
3 1951-03-29 3.30
4 1951-12-19 4.00
5 1952-02-22 1.32
6 1952-03-01 1.26
7 1953-12-20 6.00
8 1954-04-02 35.7

edited Nov 14 '18 at 10:41

answered Nov 14 '18 at 9:16

DJV

1,8681519

edited Nov 14 '18 at 10:41

answered Nov 14 '18 at 9:16

DJV

1,8681519

answered Nov 14 '18 at 9:16

DJV

1,8681519

answered Nov 14 '18 at 9:16

DJV

1,8681519

add a comment |

I use lapply() to check the range : [date - 2 days , date + 2 days] of each date.

rm.list <- lapply(df$date, function(x) 
 ind <- which(abs(df$date - x) <= 2)
 flow <- df$flow[ind]
 if(length(ind) > 1) which(flow < max(flow)) + min(ind) - 1
 else NULL
)

rm <- unique(unlist(rm.list)) # [1] 4 6
df[-rm, ]

# date flow
# 1 1951-02-05 2.22
# 2 1951-02-19 2.56
# 3 1951-03-29 3.30
# 5 1951-12-19 4.00
# 7 1952-02-22 1.32
# 8 1952-03-01 1.26
# 9 1953-12-20 6.00
# 10 1954-04-02 35.69

edited Nov 14 '18 at 10:49

answered Nov 14 '18 at 10:37

Darren Tsai

2,6572531

add a comment |

I use lapply() to check the range : [date - 2 days , date + 2 days] of each date.

rm.list <- lapply(df$date, function(x) 
 ind <- which(abs(df$date - x) <= 2)
 flow <- df$flow[ind]
 if(length(ind) > 1) which(flow < max(flow)) + min(ind) - 1
 else NULL
)

rm <- unique(unlist(rm.list)) # [1] 4 6
df[-rm, ]

# date flow
# 1 1951-02-05 2.22
# 2 1951-02-19 2.56
# 3 1951-03-29 3.30
# 5 1951-12-19 4.00
# 7 1952-02-22 1.32
# 8 1952-03-01 1.26
# 9 1953-12-20 6.00
# 10 1954-04-02 35.69

edited Nov 14 '18 at 10:49

answered Nov 14 '18 at 10:37

Darren Tsai

2,6572531

add a comment |

I use lapply() to check the range : [date - 2 days , date + 2 days] of each date.

rm.list <- lapply(df$date, function(x) 
 ind <- which(abs(df$date - x) <= 2)
 flow <- df$flow[ind]
 if(length(ind) > 1) which(flow < max(flow)) + min(ind) - 1
 else NULL
)

rm <- unique(unlist(rm.list)) # [1] 4 6
df[-rm, ]

# date flow
# 1 1951-02-05 2.22
# 2 1951-02-19 2.56
# 3 1951-03-29 3.30
# 5 1951-12-19 4.00
# 7 1952-02-22 1.32
# 8 1952-03-01 1.26
# 9 1953-12-20 6.00
# 10 1954-04-02 35.69

edited Nov 14 '18 at 10:49

answered Nov 14 '18 at 10:37

Darren Tsai

2,6572531

I use lapply() to check the range : [date - 2 days , date + 2 days] of each date.

rm.list <- lapply(df$date, function(x) 
 ind <- which(abs(df$date - x) <= 2)
 flow <- df$flow[ind]
 if(length(ind) > 1) which(flow < max(flow)) + min(ind) - 1
 else NULL
)

rm <- unique(unlist(rm.list)) # [1] 4 6
df[-rm, ]

# date flow
# 1 1951-02-05 2.22
# 2 1951-02-19 2.56
# 3 1951-03-29 3.30
# 5 1951-12-19 4.00
# 7 1952-02-22 1.32
# 8 1952-03-01 1.26
# 9 1953-12-20 6.00
# 10 1954-04-02 35.69

edited Nov 14 '18 at 10:49

answered Nov 14 '18 at 10:37

Darren Tsai

2,6572531

edited Nov 14 '18 at 10:49

answered Nov 14 '18 at 10:37

Darren Tsai

2,6572531

answered Nov 14 '18 at 10:37

Darren Tsai

2,6572531

answered Nov 14 '18 at 10:37

Darren Tsai

2,6572531

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Dfyjkt