Python: Column and Row operations without using loop
Python: Column and Row operations without using loop
I have a the below df1:
df1
Date Tickers Qty
01-01-2018 ABC 25
02-01-2018 BCD 25
02-01-2018 XYZ 31
05-01-2018 XYZ 25
and another df2 as below
df2
Date ABC BCD XYZ
01-01-2018 123 5 78
02-01-2018 125 7 79
03-01-2018 127 6 81
04-01-2018 126 7 82
05-01-2018 124 6 83
I want a resultant column in df1 which is the product of the correct column and row in df2 - getting the right ticker's rate on the given date and let the other dates have nan within df1
df1
df2
df1
Date df1['Product']
01-01-2018 3075
02-01-2018 175
02-01-2018 2449
03-01-2018 nan
04-01-2018 nan
05-01-2018 2075
This seems like standard python operation, but I just am unable to achieve this without writing a loop - which is taking a very long time to execute:
I merged the above 2 tables on Date and then ran the below loop
Date
for i in range(len(df1)):
try:
df1['Product'][i] = df1[df1['Ticker'][i]][i]
except ValueError:
df['Product'][i] = np.nan
Is there any better pythonic way of achieving this and not writing this loop pls?
2 Answers
2
Use:
df11 = df1.pivot('Date', 'Tickers','Qty')
df22 = df2.set_index('Date')
s = df22.mul(df11).bfill(axis=1).iloc[:, 0]
print (s)
Date
01-01-2018 3075.0
02-01-2018 175.0
03-01-2018 NaN
04-01-2018 NaN
05-01-2018 2075.0
Name: ABC, dtype: float64
Solution for add new column to df1:
df1
df11 = df1.pivot('Date', 'Tickers','Qty')
df22 = df2.set_index('Date')
df = df1.join(df22.mul(df11).stack().rename('new'), on=['Date','Tickers'], how='left')
print (df)
Date Tickers Qty new
0 01-01-2018 ABC 25 3075.0
1 02-01-2018 BCD 25 175.0
2 05-01-2018 XYZ 25 2075.0
EDIT:
If pairs Dates with Tickers are duplicated, solution above is not possible use.
Date
Tickers
print (df1)
Date Tickers Qty
0 01-01-2018 ABC 25
1 01-01-2018 ABC 20 <-added duplicated pairs 01-01-2018 and ABC
2 02-01-2018 XYZ 31
3 02-01-2018 BCD 25
4 05-01-2018 XYZ 25
df3 = df1[['Date']].copy()
#add new values to column
df3['new'] = df2.set_index('Date').lookup(df1['Date'], df1['Tickers']) * df1['Qty']
#add missing values to duplicated Dates
df3 = df2[['Date']].drop_duplicates().merge(df3, how='left')
print (df3)
Date new
0 01-01-2018 3075.0
1 01-01-2018 2460.0
2 02-01-2018 2449.0
3 02-01-2018 175.0
4 03-01-2018 NaN
5 04-01-2018 NaN
6 05-01-2018 2075.0
df11 = df1.pivot('Date', 'Tickers','Qty') gives ValueError: Index contains duplicate entries, cannot reshape– spiff
Sep 12 '18 at 6:57
df11 = df1.pivot('Date', 'Tickers','Qty')
ValueError: Index contains duplicate entries, cannot reshape
apologies for wasting your time so far by not providing the info that there can be multiple tickers on a single date..
– spiff
Sep 12 '18 at 6:58
could you help with the updated info pls?
– spiff
Sep 12 '18 at 7:38
@spiff - Yes, sure. Give me some time.
– jezrael
Sep 12 '18 at 7:38
ah ofcourse - thank you very much!
– spiff
Sep 12 '18 at 7:39
you need to set 'Date' as index and multiply,
df1=df1.set_index('Date')
df2=df2.set_index('Date')
df3=(df2['ABC']*df1['Qty']).reset_index()
print(df3)
Date 0
0 01-01-2018 3075.0
1 02-01-2018 3125.0
2 03-01-2018 NaN
3 04-01-2018 NaN
4 05-01-2018 3100.0
thanks vm - but i think this may not work if I have more than one tickers on the same date? Edit: maybe I should have explained that in my example itself, apologies for that
– spiff
Sep 12 '18 at 6:38
Thanks for contributing an answer to Stack Overflow!
But avoid …
To learn more, see our tips on writing great answers.
Required, but never shown
Required, but never shown
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
I would like to add that i could have more than one tickers on a given date and so using date as a pure index may not help?
– spiff
Sep 12 '18 at 6:53