How to group by date and find consecutive day count

How to group by date and find consecutive day count



So I have a table like


product date_purchased
apple 2018-08-01
apple 2018-08-02
apple 2018-08-03
apple 2018-08-10
apple 2018-08-11
banana 2018-08-14



I am trying to look for how many times the product was purchased on consecutive days. like


apple 2018-08-01 1
apple 2018-08-02 2
apple 2018-08-03 3
apple 2018-08-10 1
apple 2018-08-11 2
banana 2018-08-14 1



The first column in product, second column is the last date it was purchased and the third column is the days it was purchased consecutively.



[EDIT]: Changed the output format





What have you tried so far?
– w-m
Aug 23 at 15:36





I'm familiar with group by, but I am not sure how do I check for consecutive days and get a count.
– John Constantine
Aug 23 at 15:38





FYI it might be best to call your column products as product conflicts with the product method
– josh
Aug 23 at 16:01



products


product


product




2 Answers
2



Create a new key by using diff and cumsum , then we can groupby agg


diff


cumsum


groupby


agg


df.date_purchased=pd.to_datetime(df.date_purchased)
df['Newkey']=df.date_purchased.diff().dt.days.ne(1).cumsum()
df
Out[358]:
product date_purchased Newkey
0 apple 2018-08-01 1
1 apple 2018-08-02 1
2 apple 2018-08-03 1
3 apple 2018-08-10 2
4 apple 2018-08-11 2
5 banana 2018-08-14 3
df.groupby(['product','Newkey'])['date_purchased'].agg(['last','count'])
Out[359]:
last count
product Newkey
apple 1 2018-08-03 3
2 2018-08-11 2
banana 3 2018-08-14 1



Update


df.date_purchased=pd.to_datetime(df.date_purchased)
df['Newkey']=df.date_purchased.diff().dt.days.ne(1).cumsum()
df
Out[384]:
product date_purchased Newkey
0 apple 2018-08-01 1
1 apple 2018-08-02 1
2 apple 2018-08-03 1
3 apple 2018-08-10 2
4 apple 2018-08-11 2
5 banana 2018-08-14 3
df.groupby(['Newkey']).cumcount()+1
Out[385]:
0 1
1 2
2 3
3 1
4 2
5 1
dtype: int64





Awesome. Just be sure to sort your dataframe by products and date_purchased otherwise the diff might not work.
– josh
Aug 23 at 16:02



products


date_purchased


diff





Awesome!!. How can I edit this to show output days for each date.
– John Constantine
Aug 23 at 16:38





@JohnConstantine what you mean output days ?
– W-B
Aug 23 at 16:57





@Wen Sorry, I modified the output in the question.
– John Constantine
Aug 23 at 17:44





@JohnConstantine check the update
– W-B
Aug 23 at 17:45



Find when the dates change and create date_groups with the shift and cumsum functions. Then you can groupby by product and date_group with the multiple aggregation functionality provided by pandas.


date_groups


shift


cumsum


product


date_group



Finally formatting and renaming the columns to match expected output:


import datetime as dt

(df.assign(date_group=lambda x: (x.date_purchased != x.date_purchased.shift(1)
+ dt.timedelta(days=1)).cumsum()
)
.groupby(['product', 'date_group'])['date_purchased'].agg(['last', 'count'])
.reset_index(level=-1, drop=True)
.rename(columns='last': 'last_date_purchased',
'count': 'times_in_a_row')
)


last_date_purchased times_in_a_row
product
apple 2018-08-03 3
apple 2018-08-11 2
banana 2018-08-14 1



EDIT:



The desired output changes a bit the strategy to follow. The previous one was simpler and I apologize for the over use of lambda functions. I am sure some pipe can be used.


lambda


pipe



The code changes in the sense that now we do not count the elements in each group_date but a single key in associated to the each day. Also we have to simply groupby to use the leverage of the transform function.


group_date


key


groupby


transform


(df.assign(date_group=lambda x: (x.date_purchased != x.date_purchased.shift(1)
+ dt.timedelta(days=1)).cumsum(),
key=1,
times_in_a_row=lambda x: x.groupby(['product', 'date_group'])
.transform(lambda x: x.cumsum())
)
[['product', 'date_purchased', 'times_in_a_row']]
)

product date_purchased times_in_a_row
0 apple 2018-08-01 1
1 apple 2018-08-02 2
2 apple 2018-08-03 3
3 apple 2018-08-10 1
4 apple 2018-08-11 2
5 banana 2018-08-14 1





Awesome!!. How can I edit this to show output days for each date.
– John Constantine
Aug 23 at 16:38





I get an error, dt is not defined.
– John Constantine
Aug 23 at 16:45





I am very sorry I forgot to add the imports. Let me edit.
– gonzalo mr
Aug 23 at 16:50





@JohnConstantine what do you mean by output days by each date?
– gonzalo mr
Aug 23 at 16:53





I modified the output in question.
– John Constantine
Aug 23 at 16:58






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

Edmonton

Crossroads (UK TV series)