Python Pandas Bottom-Up Calculation Within Groups

Assume I have the following dataset:

df = pd.DataFrame("A":range(1,10), "B":range(5,14), "Group": [1,1,2,2,2,2,3,3,3],"C":[0,0,10,0,0,16,0,0,22], "last":[0,1,0,0,0,1,0,0,1], "Want": [19.25,8,91.6,71.05,45.85,16,104.95,65.8,22])

For the last observation per group, I have the following code:

def calculate(df): if (df.last == 1): value = df.loc["A"] + df.loc["B"] else:

for all other observation per group, the row value is calculated as follows:

value = (df.loc[i-1, "C"] + 3 * df.loc[i, "A"] + 1.65 * df.loc[i, "B"]) return value

In simple english this is what I'm trying to do. For the last observation in each group, column C equals the sum of columns A and B

C

A and B

For all other observations, I would like to calculate the value of column C from the "bottom up" using the "else statement" above (which is there for a proof of concept).

C

To further clarify, these are the formulas for calculating the Want column for Group 2 using excel: F4="F5+(3*A4)+(1.65*B4)", F5="F6+(3*A5)+(1.65*B5)", F6="F7+(3*A6)+(1.65*B6)", F7="A7+B7". There's some kind of "recursive" nature to it, which is why I thought of the "for loop"

Want

Group 2

F4="F5+(3*A4)+(1.65*B4)", F5="F6+(3*A5)+(1.65*B5)", F6="F7+(3*A6)+(1.65*B6)", F7="A7+B7".

I would really appreciate a solution where it's consistent with the first if statement. That is

if statement.

value = something

rather than the function returning a data frame or something like that, so that I can call the function using the following

df["value"] = df.apply(calculate, axis=1)

Your help is appreciated. Thanks

It would be very helpful to see a completed column of desired values.

– Karnage
Sep 12 '18 at 1:23

@Karnage Please see the updated df with the desired values. I tried to edit the question but it doesn't allow me df = pd.DataFrame("A":range(1,10), "B":range(5,14), "Group": [1,1,2,2,2,2,3,3,3],"C":[0,0,10,0,0,16,0,0,22], "last":[0,1,0,0,0,1,0,0,1], "Want": [19.25,8,91.6,71.05,45.85,16,104.95,65.8,22])

– xyz
Sep 12 '18 at 11:46

3 Answers
3

Seems you need np.where with shift

np.where

shift

np.where(df.last==1,df.A+df.B,df.A+1.65*df.B+df.C.shift()) Out[199]: array([ nan, 11.9 , 14.55, 27.2 , 19.85, 22.5 , 41.15, 27.8 , 30.45])

Thanks for helping me. I have updated the df to include the correct values. df = pd.DataFrame("A":range(1,10), "B":range(5,14), "Group":[1,1,2,2,2,2,3,3,3],"C":[0,0,10,0,0,16,0,0,22], "last":[0,1,0,0,0,1,0,0,1], "Want":[19.25,8,91.6,71.05,45.85,16,104.95,65.8,22])

– xyz
Sep 12 '18 at 11:53

You can use df.assign and np.where and df.rolling to reach the goal.

df.assign

np.where

df.rolling

import pandas as pd import numpy as np df = pd.DataFrame("A":range(1,10), "B":range(5,14), "Group": [1,1,2,2,2,2,3,3,3],"C":[0,0,10,0,0,16,0,0,22], "last":[0,1,0,0,0,1,0,0,1], "Want": [19.25,8,91.6,71.05,45.85,16,104.95,65.8,22]).sort_index(ascending = False) df = df.assign(FakeC = df['A'] + df['B']) #you can comment out this line then replace FakeC with A+B for the following expressions df = df.assign(value = np.where(df['last'] == 1, df['A'] + df['B'], df['FakeC'].rolling(2, min_periods=1).sum() - df['FakeC'] + 3 * df['A'] + 1.65 * df['B'])) df = df.assign(final = np.where(df['last'] == 1, df['value'], df['value'].rolling(2, min_periods=1).sum() - df['FakeC'].rolling(2, min_periods=1).sum() + df['FakeC'])) print(df)

Output:

A B C Group Want last FakeC value final 8 9 13 22 3 22.00 1 22 22.00 22.00 7 8 12 0 3 65.80 0 20 65.80 65.80 6 7 11 0 3 104.95 0 18 59.15 104.95 5 6 10 16 2 16.00 1 16 16.00 16.00 4 5 9 0 2 45.85 0 14 45.85 45.85 3 4 8 0 2 71.05 0 12 39.20 71.05 2 3 7 10 2 91.60 0 10 32.55 59.75 1 2 6 0 1 8.00 1 8 8.00 8.00 0 1 5 0 1 19.25 0 6 19.25 19.25

Thanks for your help. This is the updated df that includes the correct values df = pd.DataFrame("A":range(1,10), "B":range(5,14), "Group":[1,1,2,2,2,2,3,3,3],"C":[0,0,10,0,0,16,0,0,22], "last":[0,1,0,0,0,1,0,0,1], "Want":[19.25,8,91.6,71.05,45.85,16,104.95,65.8,22])

– xyz
Sep 12 '18 at 11:54

Thank you. I had considered your approach before. I think the only drawback is that you would have to "reverse" the "final" column in order to re-index the whole df. Thanks anyways

– xyz
Sep 12 '18 at 19:53

that is caused by rolling can't support negative window size. I think alternative solution is use for in loop to iter each df.iloc to implement sum(i, i+1)

– Sphinx
Sep 12 '18 at 20:04

for in

df.iloc

You are right. That's why I considered the for loop in the original statement. Thanks anyways for your help

– xyz
Sep 12 '18 at 20:13

Here is my attempt. I try to keep things as straight-forward as possible.

import pandas as pd import numpy as np df = pd.DataFrame( "A":range(1,10), "B":range(5,14), "Group": [1,1,2,2,2,2,3,3,3], "C":[0,0,10,0,0,16,0,0,22], "want":[19.25,8,91.6,71.05,45.85,16,104.95,65.8,22], "last":[0,1,0,0,0,1,0,0,1]) # Determine where the last-in-group value applies. condition = df['Group'] != df['Group'].shift(-1) # Calculate the alternative data. alternate_formula = (3 * df['A'] + 1.65 * df['B']) # Calculate C as either the 'last-in-group' function or the alternative function. df['C'] = np.where(condition, df['A'] + df['B'], alternate_formula) # Reverse the order of the dataframe, group by the 'Group' column, run a cumulative sum # for each group and then resort the back to the original order. df['C'] = df.sort_index(ascending=False).groupby('Group')['C'].cumsum().sort_index() print(df)

I think it is usually preferable to use a vectorised approach than an iterative approach such as 'apply'.

Thank you for your thoughts on this. Please see the updated df that includes the correct values: df = pd.DataFrame("A":range(1,10), "B":range(5,14), "Group":[1,1,2,2,2,2,3,3,3],"C":[0,0,10,0,0,16,0,0,22], "last":[0,1,0,0,0,1,0,0,1], "Want":[19.25,8,91.6,71.05,45.85,16,104.95,65.8,22])

– xyz
Sep 12 '18 at 11:51

@xyz reformulated the response based on the target. This solution requires that the index is originally in ascending order.

– Karnage
Sep 13 '18 at 22:42

Thanks for contributing an answer to Stack Overflow!

But avoid …

To learn more, see our tips on writing great answers.

Required, but never shown

By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

搜尋此網誌

Dfyjkt