How to group by date and find consecutive day count
How to group by date and find consecutive day count
So I have a table like
product date_purchased
apple 2018-08-01
apple 2018-08-02
apple 2018-08-03
apple 2018-08-10
apple 2018-08-11
banana 2018-08-14
I am trying to look for how many times the product was purchased on consecutive days. like
apple 2018-08-01 1
apple 2018-08-02 2
apple 2018-08-03 3
apple 2018-08-10 1
apple 2018-08-11 2
banana 2018-08-14 1
The first column in product, second column is the last date it was purchased and the third column is the days it was purchased consecutively.
[EDIT]: Changed the output format
I'm familiar with group by, but I am not sure how do I check for consecutive days and get a count.
– John Constantine
Aug 23 at 15:38
FYI it might be best to call your column
products as product conflicts with the product method– josh
Aug 23 at 16:01
products
product
product
2 Answers
2
Create a new key by using diff and cumsum , then we can groupby agg
diff
cumsum
groupby
agg
df.date_purchased=pd.to_datetime(df.date_purchased)
df['Newkey']=df.date_purchased.diff().dt.days.ne(1).cumsum()
df
Out[358]:
product date_purchased Newkey
0 apple 2018-08-01 1
1 apple 2018-08-02 1
2 apple 2018-08-03 1
3 apple 2018-08-10 2
4 apple 2018-08-11 2
5 banana 2018-08-14 3
df.groupby(['product','Newkey'])['date_purchased'].agg(['last','count'])
Out[359]:
last count
product Newkey
apple 1 2018-08-03 3
2 2018-08-11 2
banana 3 2018-08-14 1
Update
df.date_purchased=pd.to_datetime(df.date_purchased)
df['Newkey']=df.date_purchased.diff().dt.days.ne(1).cumsum()
df
Out[384]:
product date_purchased Newkey
0 apple 2018-08-01 1
1 apple 2018-08-02 1
2 apple 2018-08-03 1
3 apple 2018-08-10 2
4 apple 2018-08-11 2
5 banana 2018-08-14 3
df.groupby(['Newkey']).cumcount()+1
Out[385]:
0 1
1 2
2 3
3 1
4 2
5 1
dtype: int64
Awesome. Just be sure to sort your dataframe by
products and date_purchased otherwise the diff might not work.– josh
Aug 23 at 16:02
products
date_purchased
diff
Awesome!!. How can I edit this to show output days for each date.
– John Constantine
Aug 23 at 16:38
@JohnConstantine what you mean output days ?
– W-B
Aug 23 at 16:57
@Wen Sorry, I modified the output in the question.
– John Constantine
Aug 23 at 17:44
@JohnConstantine check the update
– W-B
Aug 23 at 17:45
Find when the dates change and create date_groups with the shift and cumsum functions. Then you can groupby by product and date_group with the multiple aggregation functionality provided by pandas.
date_groups
shift
cumsum
product
date_group
Finally formatting and renaming the columns to match expected output:
import datetime as dt
(df.assign(date_group=lambda x: (x.date_purchased != x.date_purchased.shift(1)
+ dt.timedelta(days=1)).cumsum()
)
.groupby(['product', 'date_group'])['date_purchased'].agg(['last', 'count'])
.reset_index(level=-1, drop=True)
.rename(columns='last': 'last_date_purchased',
'count': 'times_in_a_row')
)
last_date_purchased times_in_a_row
product
apple 2018-08-03 3
apple 2018-08-11 2
banana 2018-08-14 1
EDIT:
The desired output changes a bit the strategy to follow. The previous one was simpler and I apologize for the over use of lambda functions. I am sure some pipe can be used.
lambda
pipe
The code changes in the sense that now we do not count the elements in each group_date but a single key in associated to the each day. Also we have to simply groupby to use the leverage of the transform function.
group_date
key
groupby
transform
(df.assign(date_group=lambda x: (x.date_purchased != x.date_purchased.shift(1)
+ dt.timedelta(days=1)).cumsum(),
key=1,
times_in_a_row=lambda x: x.groupby(['product', 'date_group'])
.transform(lambda x: x.cumsum())
)
[['product', 'date_purchased', 'times_in_a_row']]
)
product date_purchased times_in_a_row
0 apple 2018-08-01 1
1 apple 2018-08-02 2
2 apple 2018-08-03 3
3 apple 2018-08-10 1
4 apple 2018-08-11 2
5 banana 2018-08-14 1
Awesome!!. How can I edit this to show output days for each date.
– John Constantine
Aug 23 at 16:38
I get an error, dt is not defined.
– John Constantine
Aug 23 at 16:45
I am very sorry I forgot to add the imports. Let me edit.
– gonzalo mr
Aug 23 at 16:50
@JohnConstantine what do you mean by output days by each date?
– gonzalo mr
Aug 23 at 16:53
I modified the output in question.
– John Constantine
Aug 23 at 16:58
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
What have you tried so far?
– w-m
Aug 23 at 15:36