Python Pandas Bottom-Up Calculation Within Groups

Python Pandas Bottom-Up Calculation Within Groups



Assume I have the following dataset:


df = pd.DataFrame("A":range(1,10), "B":range(5,14), "Group":
[1,1,2,2,2,2,3,3,3],"C":[0,0,10,0,0,16,0,0,22], "last":[0,1,0,0,0,1,0,0,1],
"Want": [19.25,8,91.6,71.05,45.85,16,104.95,65.8,22])



For the last observation per group, I have the following code:


def calculate(df):
if (df.last == 1):
value = df.loc["A"] + df.loc["B"]
else:



for all other observation per group, the row value is calculated as follows:


value = (df.loc[i-1, "C"] + 3 * df.loc[i, "A"] + 1.65 * df.loc[i, "B"])
return value



In simple english this is what I'm trying to do. For the last observation in each group, column C equals the sum of columns A and B


C


A and B



For all other observations, I would like to calculate the value of column C from the "bottom up" using the "else statement" above (which is there for a proof of concept).


C



To further clarify, these are the formulas for calculating the Want column for Group 2 using excel: F4="F5+(3*A4)+(1.65*B4)", F5="F6+(3*A5)+(1.65*B5)", F6="F7+(3*A6)+(1.65*B6)", F7="A7+B7". There's some kind of "recursive" nature to it, which is why I thought of the "for loop"


Want


Group 2


F4="F5+(3*A4)+(1.65*B4)", F5="F6+(3*A5)+(1.65*B5)", F6="F7+(3*A6)+(1.65*B6)", F7="A7+B7".



I would really appreciate a solution where it's consistent with the first if statement. That is


if statement.


value = something



rather than the function returning a data frame or something like that, so that I can call the function using the following


df["value"] = df.apply(calculate, axis=1)



Your help is appreciated. Thanks






It would be very helpful to see a completed column of desired values.

– Karnage
Sep 12 '18 at 1:23






@Karnage Please see the updated df with the desired values. I tried to edit the question but it doesn't allow me df = pd.DataFrame("A":range(1,10), "B":range(5,14), "Group": [1,1,2,2,2,2,3,3,3],"C":[0,0,10,0,0,16,0,0,22], "last":[0,1,0,0,0,1,0,0,1], "Want": [19.25,8,91.6,71.05,45.85,16,104.95,65.8,22])

– xyz
Sep 12 '18 at 11:46





3 Answers
3



Seems you need np.where with shift


np.where


shift


np.where(df.last==1,df.A+df.B,df.A+1.65*df.B+df.C.shift())
Out[199]: array([ nan, 11.9 , 14.55, 27.2 , 19.85, 22.5 , 41.15, 27.8 , 30.45])






Thanks for helping me. I have updated the df to include the correct values. df = pd.DataFrame("A":range(1,10), "B":range(5,14), "Group":[1,1,2,2,2,2,3,3,3],"C":[0,0,10,0,0,16,0,0,22], "last":[0,1,0,0,0,1,0,0,1], "Want":[19.25,8,91.6,71.05,45.85,16,104.95,65.8,22])

– xyz
Sep 12 '18 at 11:53



You can use df.assign and np.where and df.rolling to reach the goal.


df.assign


np.where


df.rolling


import pandas as pd
import numpy as np
df = pd.DataFrame("A":range(1,10), "B":range(5,14), "Group":
[1,1,2,2,2,2,3,3,3],"C":[0,0,10,0,0,16,0,0,22],
"last":[0,1,0,0,0,1,0,0,1],
"Want": [19.25,8,91.6,71.05,45.85,16,104.95,65.8,22]).sort_index(ascending = False)
df = df.assign(FakeC = df['A'] + df['B']) #you can comment out this line then replace FakeC with A+B for the following expressions
df = df.assign(value = np.where(df['last'] == 1, df['A'] + df['B'], df['FakeC'].rolling(2, min_periods=1).sum() - df['FakeC'] + 3 * df['A'] + 1.65 * df['B']))
df = df.assign(final = np.where(df['last'] == 1, df['value'], df['value'].rolling(2, min_periods=1).sum() - df['FakeC'].rolling(2, min_periods=1).sum() + df['FakeC']))
print(df)



Output:


A B C Group Want last FakeC value final
8 9 13 22 3 22.00 1 22 22.00 22.00
7 8 12 0 3 65.80 0 20 65.80 65.80
6 7 11 0 3 104.95 0 18 59.15 104.95
5 6 10 16 2 16.00 1 16 16.00 16.00
4 5 9 0 2 45.85 0 14 45.85 45.85
3 4 8 0 2 71.05 0 12 39.20 71.05
2 3 7 10 2 91.60 0 10 32.55 59.75
1 2 6 0 1 8.00 1 8 8.00 8.00
0 1 5 0 1 19.25 0 6 19.25 19.25






Thanks for your help. This is the updated df that includes the correct values df = pd.DataFrame("A":range(1,10), "B":range(5,14), "Group":[1,1,2,2,2,2,3,3,3],"C":[0,0,10,0,0,16,0,0,22], "last":[0,1,0,0,0,1,0,0,1], "Want":[19.25,8,91.6,71.05,45.85,16,104.95,65.8,22])

– xyz
Sep 12 '18 at 11:54






Thank you. I had considered your approach before. I think the only drawback is that you would have to "reverse" the "final" column in order to re-index the whole df. Thanks anyways

– xyz
Sep 12 '18 at 19:53






that is caused by rolling can't support negative window size. I think alternative solution is use for in loop to iter each df.iloc to implement sum(i, i+1)

– Sphinx
Sep 12 '18 at 20:04


for in


df.iloc






You are right. That's why I considered the for loop in the original statement. Thanks anyways for your help

– xyz
Sep 12 '18 at 20:13



Here is my attempt. I try to keep things as straight-forward as possible.


import pandas as pd
import numpy as np

df = pd.DataFrame(
"A":range(1,10),
"B":range(5,14),
"Group": [1,1,2,2,2,2,3,3,3],
"C":[0,0,10,0,0,16,0,0,22],
"want":[19.25,8,91.6,71.05,45.85,16,104.95,65.8,22],
"last":[0,1,0,0,0,1,0,0,1])

# Determine where the last-in-group value applies.
condition = df['Group'] != df['Group'].shift(-1)

# Calculate the alternative data.
alternate_formula = (3 * df['A'] + 1.65 * df['B'])

# Calculate C as either the 'last-in-group' function or the alternative function.
df['C'] = np.where(condition, df['A'] + df['B'], alternate_formula)

# Reverse the order of the dataframe, group by the 'Group' column, run a cumulative sum
# for each group and then resort the back to the original order.
df['C'] = df.sort_index(ascending=False).groupby('Group')['C'].cumsum().sort_index()

print(df)



I think it is usually preferable to use a vectorised approach than an iterative approach such as 'apply'.






Thank you for your thoughts on this. Please see the updated df that includes the correct values: df = pd.DataFrame("A":range(1,10), "B":range(5,14), "Group":[1,1,2,2,2,2,3,3,3],"C":[0,0,10,0,0,16,0,0,22], "last":[0,1,0,0,0,1,0,0,1], "Want":[19.25,8,91.6,71.05,45.85,16,104.95,65.8,22])

– xyz
Sep 12 '18 at 11:51






@xyz reformulated the response based on the target. This solution requires that the index is originally in ascending order.

– Karnage
Sep 13 '18 at 22:42




Thanks for contributing an answer to Stack Overflow!



But avoid



To learn more, see our tips on writing great answers.



Required, but never shown



Required, but never shown




By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

Edmonton

Crossroads (UK TV series)