How to group by date and find consecutive day count
So I have a table like
product date_purchased
apple 2018-08-01
apple 2018-08-02
apple 2018-08-03
apple 2018-08-10
apple 2018-08-11
banana 2018-08-14
I am trying to find how many consecutive days each product was purchased, like this:
apple 2018-08-01 1
apple 2018-08-02 2
apple 2018-08-03 3
apple 2018-08-10 1
apple 2018-08-11 2
banana 2018-08-14 1
The first column is the product, the second column is the last date it was purchased, and the third column is the number of consecutive days it was purchased.
[EDIT]: Changed the output format
I'm familiar with group by, but I am not sure how to check for consecutive days and get a count.
– John Constantine
Aug 23 at 15:38
FYI, it might be best to call your column products, as product conflicts with the product method.
– josh
Aug 23 at 16:01
2 Answers
Create a new key by using diff and cumsum, then we can groupby and agg:
df.date_purchased=pd.to_datetime(df.date_purchased)
df['Newkey']=df.date_purchased.diff().dt.days.ne(1).cumsum()
df
Out[358]:
product date_purchased Newkey
0 apple 2018-08-01 1
1 apple 2018-08-02 1
2 apple 2018-08-03 1
3 apple 2018-08-10 2
4 apple 2018-08-11 2
5 banana 2018-08-14 3
df.groupby(['product','Newkey'])['date_purchased'].agg(['last','count'])
Out[359]:
last count
product Newkey
apple 1 2018-08-03 3
2 2018-08-11 2
banana 3 2018-08-14 1
Update
df.date_purchased=pd.to_datetime(df.date_purchased)
df['Newkey']=df.date_purchased.diff().dt.days.ne(1).cumsum()
df
Out[384]:
product date_purchased Newkey
0 apple 2018-08-01 1
1 apple 2018-08-02 1
2 apple 2018-08-03 1
3 apple 2018-08-10 2
4 apple 2018-08-11 2
5 banana 2018-08-14 3
df.groupby(['Newkey']).cumcount()+1
Out[385]:
0 1
1 2
2 3
3 1
4 2
5 1
dtype: int64
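The snippet above relies on the rows already being ordered by product and date, and on no two different products falling on consecutive days. A minimal sketch of a variant that does not depend on either, assuming the same column names as the question (the names gap_days, streak_id and times_in_a_row are only illustrative): sort first, then take the diff within each product.

```python
import pandas as pd

# Sample data from the question, deliberately shuffled to show that order matters.
df = pd.DataFrame({
    "product": ["banana", "apple", "apple", "apple", "apple", "apple"],
    "date_purchased": ["2018-08-14", "2018-08-02", "2018-08-01",
                       "2018-08-03", "2018-08-11", "2018-08-10"],
})
df["date_purchased"] = pd.to_datetime(df["date_purchased"])

# Sort so that each product's dates are in ascending order.
df = df.sort_values(["product", "date_purchased"]).reset_index(drop=True)

# Gap in days to the previous purchase of the same product
# (NaN for the first row of each product).
gap_days = df.groupby("product")["date_purchased"].diff().dt.days

# A new streak starts wherever the gap is not exactly one day.
df["streak_id"] = gap_days.ne(1).cumsum()

# Running position within each streak, starting at 1.
df["times_in_a_row"] = df.groupby("streak_id").cumcount() + 1

print(df[["product", "date_purchased", "times_in_a_row"]])
```

Because the first row of every product has a NaN gap, it always opens a new streak, so grouping by streak_id alone is enough here.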
Awesome. Just be sure to sort your dataframe by products and date_purchased, otherwise the diff might not work.
– josh
Aug 23 at 16:02
Awesome!!. How can I edit this to show output days for each date.
– John Constantine
Aug 23 at 16:38
@JohnConstantine what do you mean by output days?
– W-B
Aug 23 at 16:57
@Wen Sorry, I modified the output in the question.
– John Constantine
Aug 23 at 17:44
@JohnConstantine check the update
– W-B
Aug 23 at 17:45
Find where the dates change and create date_group with the shift and cumsum functions. Then you can groupby by product and date_group, using the multiple aggregation functionality provided by pandas.
Finally, format and rename the columns to match the expected output:
import datetime as dt

# date_group increases by one whenever a row does not follow the previous date by exactly one day
(df.assign(date_group=lambda x: (x.date_purchased != x.date_purchased.shift(1)
                                 + dt.timedelta(days=1)).cumsum())
   .groupby(['product', 'date_group'])['date_purchased'].agg(['last', 'count'])
   .reset_index(level=-1, drop=True)  # drop the helper date_group index level
   .rename(columns={'last': 'last_date_purchased',
                    'count': 'times_in_a_row'})
)
last_date_purchased times_in_a_row
product
apple 2018-08-03 3
apple 2018-08-11 2
banana 2018-08-14 1
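For anyone who wants to run this end to end, here is a self-contained sketch of the same shift and cumsum idea. It is a variant rather than the exact code above: it assumes pandas 0.25 or later so that named aggregation can replace the separate rename step, it uses pd.Timedelta instead of datetime.timedelta, and the names starts_new_group and summary are only illustrative. Like the original, it assumes the rows are already sorted by product and date.

```python
import pandas as pd

# Sample data from the question, already sorted by product and date.
df = pd.DataFrame({
    "product": ["apple"] * 5 + ["banana"],
    "date_purchased": pd.to_datetime([
        "2018-08-01", "2018-08-02", "2018-08-03",
        "2018-08-10", "2018-08-11", "2018-08-14",
    ]),
})

# True where a row does NOT follow the previous row's date by exactly one day.
# The first row compares against NaT, which also evaluates to True.
starts_new_group = (
    df["date_purchased"] != df["date_purchased"].shift(1) + pd.Timedelta(days=1)
)

summary = (
    df.assign(date_group=starts_new_group.cumsum())
      .groupby(["product", "date_group"])
      .agg(last_date_purchased=("date_purchased", "last"),
           times_in_a_row=("date_purchased", "count"))
      .reset_index(level="date_group", drop=True)  # keep only product in the index
)

print(summary)
```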
EDIT:
The desired output changes the strategy a bit. The previous one was simpler, and I apologize for the overuse of lambda functions. I am sure some pipe can be used.
The code changes in that we no longer count the elements in each date_group; instead, a single key is associated with each day. We also simply groupby so that we can leverage the transform function.
(df.assign(date_group=lambda x: (x.date_purchased != x.date_purchased.shift(1)
                                 + dt.timedelta(days=1)).cumsum(),
           key=1,
           # running total of key within each (product, date_group) streak
           times_in_a_row=lambda x: x.groupby(['product', 'date_group'])['key']
                                     .transform(lambda s: s.cumsum())
          )
 [['product', 'date_purchased', 'times_in_a_row']]
)
product date_purchased times_in_a_row
0 apple 2018-08-01 1
1 apple 2018-08-02 2
2 apple 2018-08-03 3
3 apple 2018-08-10 1
4 apple 2018-08-11 2
5 banana 2018-08-14 1
Awesome!!. How can I edit this to show output days for each date.
– John Constantine
Aug 23 at 16:38
I get an error, dt is not defined.
– John Constantine
Aug 23 at 16:45
I am very sorry I forgot to add the imports. Let me edit.
– gonzalo mr
Aug 23 at 16:50
@JohnConstantine what do you mean by output days by each date?
– gonzalo mr
Aug 23 at 16:53
I modified the output in question.
– John Constantine
Aug 23 at 16:58
What have you tried so far?
– w-m
Aug 23 at 15:36