Python Pandas Bottom-Up Calculation Within Groups
Python Pandas Bottom-Up Calculation Within Groups
Assume I have the following dataset:
df = pd.DataFrame("A":range(1,10), "B":range(5,14), "Group":
[1,1,2,2,2,2,3,3,3],"C":[0,0,10,0,0,16,0,0,22], "last":[0,1,0,0,0,1,0,0,1],
"Want": [19.25,8,91.6,71.05,45.85,16,104.95,65.8,22])
For the last observation per group, I have the following code:
def calculate(df):
if (df.last == 1):
value = df.loc["A"] + df.loc["B"]
else:
for all other observation per group, the row value is calculated as follows:
value = (df.loc[i-1, "C"] + 3 * df.loc[i, "A"] + 1.65 * df.loc[i, "B"])
return value
In simple english this is what I'm trying to do. For the last observation in each group, column C
equals the sum of columns A and B
C
A and B
For all other observations, I would like to calculate the value of column C
from the "bottom up" using the "else statement" above (which is there for a proof of concept).
C
To further clarify, these are the formulas for calculating the Want
column for Group 2
using excel: F4="F5+(3*A4)+(1.65*B4)", F5="F6+(3*A5)+(1.65*B5)", F6="F7+(3*A6)+(1.65*B6)", F7="A7+B7".
There's some kind of "recursive" nature to it, which is why I thought of the "for loop"
Want
Group 2
F4="F5+(3*A4)+(1.65*B4)", F5="F6+(3*A5)+(1.65*B5)", F6="F7+(3*A6)+(1.65*B6)", F7="A7+B7".
I would really appreciate a solution where it's consistent with the first if statement.
That is
if statement.
value = something
rather than the function returning a data frame or something like that, so that I can call the function using the following
df["value"] = df.apply(calculate, axis=1)
Your help is appreciated. Thanks
@Karnage Please see the updated df with the desired values. I tried to edit the question but it doesn't allow me df = pd.DataFrame("A":range(1,10), "B":range(5,14), "Group": [1,1,2,2,2,2,3,3,3],"C":[0,0,10,0,0,16,0,0,22], "last":[0,1,0,0,0,1,0,0,1], "Want": [19.25,8,91.6,71.05,45.85,16,104.95,65.8,22])
– xyz
Sep 12 '18 at 11:46
3 Answers
3
Seems you need np.where
with shift
np.where
shift
np.where(df.last==1,df.A+df.B,df.A+1.65*df.B+df.C.shift())
Out[199]: array([ nan, 11.9 , 14.55, 27.2 , 19.85, 22.5 , 41.15, 27.8 , 30.45])
Thanks for helping me. I have updated the df to include the correct values. df = pd.DataFrame("A":range(1,10), "B":range(5,14), "Group":[1,1,2,2,2,2,3,3,3],"C":[0,0,10,0,0,16,0,0,22], "last":[0,1,0,0,0,1,0,0,1], "Want":[19.25,8,91.6,71.05,45.85,16,104.95,65.8,22])
– xyz
Sep 12 '18 at 11:53
You can use df.assign
and np.where
and df.rolling
to reach the goal.
df.assign
np.where
df.rolling
import pandas as pd
import numpy as np
df = pd.DataFrame("A":range(1,10), "B":range(5,14), "Group":
[1,1,2,2,2,2,3,3,3],"C":[0,0,10,0,0,16,0,0,22],
"last":[0,1,0,0,0,1,0,0,1],
"Want": [19.25,8,91.6,71.05,45.85,16,104.95,65.8,22]).sort_index(ascending = False)
df = df.assign(FakeC = df['A'] + df['B']) #you can comment out this line then replace FakeC with A+B for the following expressions
df = df.assign(value = np.where(df['last'] == 1, df['A'] + df['B'], df['FakeC'].rolling(2, min_periods=1).sum() - df['FakeC'] + 3 * df['A'] + 1.65 * df['B']))
df = df.assign(final = np.where(df['last'] == 1, df['value'], df['value'].rolling(2, min_periods=1).sum() - df['FakeC'].rolling(2, min_periods=1).sum() + df['FakeC']))
print(df)
Output:
A B C Group Want last FakeC value final
8 9 13 22 3 22.00 1 22 22.00 22.00
7 8 12 0 3 65.80 0 20 65.80 65.80
6 7 11 0 3 104.95 0 18 59.15 104.95
5 6 10 16 2 16.00 1 16 16.00 16.00
4 5 9 0 2 45.85 0 14 45.85 45.85
3 4 8 0 2 71.05 0 12 39.20 71.05
2 3 7 10 2 91.60 0 10 32.55 59.75
1 2 6 0 1 8.00 1 8 8.00 8.00
0 1 5 0 1 19.25 0 6 19.25 19.25
Thanks for your help. This is the updated df that includes the correct values df = pd.DataFrame("A":range(1,10), "B":range(5,14), "Group":[1,1,2,2,2,2,3,3,3],"C":[0,0,10,0,0,16,0,0,22], "last":[0,1,0,0,0,1,0,0,1], "Want":[19.25,8,91.6,71.05,45.85,16,104.95,65.8,22])
– xyz
Sep 12 '18 at 11:54
Thank you. I had considered your approach before. I think the only drawback is that you would have to "reverse" the "final" column in order to re-index the whole df. Thanks anyways
– xyz
Sep 12 '18 at 19:53
that is caused by rolling can't support negative window size. I think alternative solution is use
for in
loop to iter each df.iloc
to implement sum(i, i+1)– Sphinx
Sep 12 '18 at 20:04
for in
df.iloc
You are right. That's why I considered the for loop in the original statement. Thanks anyways for your help
– xyz
Sep 12 '18 at 20:13
Here is my attempt. I try to keep things as straight-forward as possible.
import pandas as pd
import numpy as np
df = pd.DataFrame(
"A":range(1,10),
"B":range(5,14),
"Group": [1,1,2,2,2,2,3,3,3],
"C":[0,0,10,0,0,16,0,0,22],
"want":[19.25,8,91.6,71.05,45.85,16,104.95,65.8,22],
"last":[0,1,0,0,0,1,0,0,1])
# Determine where the last-in-group value applies.
condition = df['Group'] != df['Group'].shift(-1)
# Calculate the alternative data.
alternate_formula = (3 * df['A'] + 1.65 * df['B'])
# Calculate C as either the 'last-in-group' function or the alternative function.
df['C'] = np.where(condition, df['A'] + df['B'], alternate_formula)
# Reverse the order of the dataframe, group by the 'Group' column, run a cumulative sum
# for each group and then resort the back to the original order.
df['C'] = df.sort_index(ascending=False).groupby('Group')['C'].cumsum().sort_index()
print(df)
I think it is usually preferable to use a vectorised approach than an iterative approach such as 'apply'.
Thank you for your thoughts on this. Please see the updated df that includes the correct values: df = pd.DataFrame("A":range(1,10), "B":range(5,14), "Group":[1,1,2,2,2,2,3,3,3],"C":[0,0,10,0,0,16,0,0,22], "last":[0,1,0,0,0,1,0,0,1], "Want":[19.25,8,91.6,71.05,45.85,16,104.95,65.8,22])
– xyz
Sep 12 '18 at 11:51
@xyz reformulated the response based on the target. This solution requires that the index is originally in ascending order.
– Karnage
Sep 13 '18 at 22:42
Thanks for contributing an answer to Stack Overflow!
But avoid …
To learn more, see our tips on writing great answers.
Required, but never shown
Required, but never shown
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
It would be very helpful to see a completed column of desired values.
– Karnage
Sep 12 '18 at 1:23