Conditionally offseting values by group with Pandas
Conditionally offseting values by group with Pandas
I am looking for a more efficient and maintainable way to offset values conditionally by group. Easiest to show an example.
Value is always non-negative for Offset == False
and always negative for Offset == True
. What I'm looking to do is "collapse" positive Values (flooring at 0) against negative ones by Label.
Offset == False
Offset == True
Note Label
+ Offset
combined are always unique. Since Offset
is Boolean, you can only have a maximum of 2 rows per Label.
Label
Offset
Offset
Example 1
df = pd.DataFrame('Label': ['L1', 'L2', 'L3', 'L3'],
'Offset': [False, False, False, True],
'Value': [100, 100, 50, -100])
# input
# Label Offset Value
# 0 L1 False 100
# 1 L2 False 100
# 2 L3 False 50
# 3 L3 True -100
Desired output:
Label Offset Value
0 L1 False 100
1 L2 False 100
2 L3 False 0
3 L3 True -50
Example 2
df = pd.DataFrame('Label': ['L1', 'L2', 'L3', 'L3'],
'Offset': [False, False, False, True],
'Value': [100, 100, 100, -50])
# input
# Label Offset Value
# 0 L1 False 100
# 1 L2 False 100
# 2 L3 False 100
# 3 L3 True -50
Desired output:
Label Offset Value
0 L1 False 100
1 L2 False 100
2 L3 False 50
3 L3 True 0
Current inefficient solution
My current solution is a manual loop which is slow and difficult to maintain:
for label in df['Label'].unique():
mask = df['Label'] == label
if len(df.loc[mask]) == 2:
val_false = df.loc[~df['Offset'] & mask, 'Value'].iloc[0]
val_true = df.loc[df['Offset'] & mask, 'Value'].iloc[0]
if val_false > abs(val_true):
df.loc[~df['Offset'] & mask, 'Value'] += val_true
df.loc[df['Offset'] & mask, 'Value'] = 0
else:
df.loc[~df['Offset'] & mask, 'Value'] = 0
df.loc[df['Offset'] & mask, 'Value'] += val_false
I'm looking for a vectorised, or at least partially vectorised, solution to improve performance and get rid of this mess.
@Yuca, I'll add a note,
Label
+ Offset
combined are always unique. Since Offset
is Boolean, you can only have a maximum of 2 rows per Label.– jpp
Aug 24 at 12:09
Label
Offset
Offset
3 Answers
3
Maybe:
label_sums = df.Value.groupby(df.Label).transform(sum)
df["new_sum"] = label_sums.where(np.sign(label_sums) == np.sign(df.Value), 0)
which gives me
In [42]: df
Out[42]:
Label Offset Value new_sum
0 L1 False 100 100
1 L2 False 100 100
2 L3 False 50 0
3 L3 True -100 -50
4 L4 False 100 100
5 L5 False 100 100
6 L6 False 100 50
7 L6 True -50 0
Yes, thank you!
– jpp
Aug 24 at 12:36
I have a nugget of something. @jpp you can see my deleted post. I'll try to revisit later for my own sake (-:
– piRSquared
Aug 24 at 12:52
@piRSquared, Will do, thks.
– jpp
Aug 24 at 12:54
@jpp currently my deleted post is a complete rip off of this answer just in Numpy form.
– piRSquared
Aug 24 at 13:32
This is the best I've got: create an auxiliary column to find where to display the aggregate and then set the other members of the group to 0
df['aux'] = abs(df['Value'])
idx = abs(df.groupby(['Label'])['aux'].transform(max)) == abs(df['aux'])
df['aux2'] = False
df.loc[idx,'aux2'] = True
df = df.join(df.groupby('Label').Value.sum(), on='Label', rsuffix = 'jpp')
df.loc[df['aux2']==False, 'Valuejpp'] = 0
df = df.drop(['aux', 'aux2','Value'], axis = 1)
Result
Label Offset Valuejpp
0 L1 False 100
1 L2 False 100
2 L3 False 0
3 L3 True -50
the difference in the quality of the answers between 191k rep and 1k rep shows, but it works! :)
– Yuca
Aug 24 at 12:45
Data From DSM
df1=df.copy()
df.loc[df.Offset,'Value']=df.Value.abs()
s1=(df.groupby('Label').Value.diff().lt(0)).groupby(df['Label']).transform('any')
s2=df.groupby('Label').Value.transform('count')
s3=df1.groupby('Label').Value.transform('sum')
np.where(s2<=1,df1.Value,np.where(s1,s3*(-df1.Offset),s3*df1.Offset))
Out[757]: array([100, 100, 0, -50, 100, 100, 50, 0], dtype=int64)
You can do it with np.select , I am pretty sure you know how to do it man
– Wen
Aug 24 at 12:58
Yep, I think I had a brain freeze! But DSM's solution is also smart IMO.
– jpp
Aug 24 at 13:08
@jpp haha ,I had it sometimes too .
– Wen
Aug 24 at 13:09
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
if there where 3 L3 values (100, -25, -25) then it would translate into 50, 0, 0?
– Yuca
Aug 24 at 12:08