Combine dataframes result with a DatetimeIndex index

I have a pandas dataframe with a random value for every minute.

    import pandas as pd
    import numpy as np

    # 20 random integers, one per minute ('T' is the minute alias; newer pandas spells it 'min')
    df = pd.DataFrame(data=np.random.randint(0,30,size=20), index=pd.date_range("20180101", periods=20, freq='T'))
    df

                          0
    2018-01-01 00:00:00  21
    2018-01-01 00:01:00  21
    2018-01-01 00:02:00  23
    2018-01-01 00:03:00  18
    2018-01-01 00:04:00   3
    2018-01-01 00:05:00  11
    2018-01-01 00:06:00   3
    2018-01-01 00:07:00   4
    2018-01-01 00:08:00   5
    2018-01-01 00:09:00  25
    2018-01-01 00:10:00  15
    2018-01-01 00:11:00  11
    2018-01-01 00:12:00  29
    2018-01-01 00:13:00  22
    2018-01-01 00:14:00   7
    2018-01-01 00:15:00  13
    2018-01-01 00:16:00  26
    2018-01-01 00:17:00   7
    2018-01-01 00:18:00  26
    2018-01-01 00:19:00  15

Now I need to create a new column in the dataframe df that reflects the mean() of a rolling window of 2 periods, computed at a coarser frequency (5 minutes).

    # sum each 5-minute bin, then average every 2 consecutive bins
    df2 = df.resample('5T').sum().rolling(2).mean()
    df2

                            0
    2018-01-01 00:00:00   NaN
    2018-01-01 00:05:00  67.0
    2018-01-01 00:10:00  66.0
    2018-01-01 00:15:00  85.5

Here comes the problem: I need to somehow "map" the values of the 5-minute frame back onto the 1-minute frame.
I should get something like:

                          0  new_column
    2018-01-01 00:00:00  21         NaN
    2018-01-01 00:01:00  21         NaN
    2018-01-01 00:02:00  23         NaN
    2018-01-01 00:03:00  18         NaN
    2018-01-01 00:04:00   3         NaN
    2018-01-01 00:05:00  11        67.0
    2018-01-01 00:06:00   3        67.0
    2018-01-01 00:07:00   4        67.0
    2018-01-01 00:08:00   5        67.0
    2018-01-01 00:09:00  25        67.0
    2018-01-01 00:10:00  15        66.0
    2018-01-01 00:11:00  11        66.0
    2018-01-01 00:12:00  29        66.0
    2018-01-01 00:13:00  22        66.0
    2018-01-01 00:14:00   7        66.0
    2018-01-01 00:15:00  13        85.5
    2018-01-01 00:16:00  26        85.5
    2018-01-01 00:17:00   7        85.5
    2018-01-01 00:18:00  26        85.5
    2018-01-01 00:19:00  15        85.5

I am using pandas 0.23.4

2 Answers

You can just use:

    df['new_column'] = df2[0].repeat(5).values

where 5 is your resampling factor (the number of 1-minute rows in each 5-minute bin).
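As a minimal sketch of why this lines up, the snippet below rebuilds the question's frame from the values printed there (instead of random ones) and uses the `'min'` / `'5min'` spellings of the frequency aliases. The `factor` variable and the length check are additions for illustration: `repeat` is purely positional, so it only matches the rows when every 5-minute bin holds exactly 5 rows.

```python
import pandas as pd

# Values copied from the question's printed output, so the result is reproducible.
values = [21, 21, 23, 18, 3, 11, 3, 4, 5, 25,
          15, 11, 29, 22, 7, 13, 26, 7, 26, 15]
df = pd.DataFrame(values, index=pd.date_range("20180101", periods=20, freq="min"))

# Sum each 5-minute bin, then average every 2 consecutive bins.
df2 = df.resample("5min").sum().rolling(2).mean()

factor = 5  # hypothetical name: number of 1-minute rows per 5-minute bin

# repeat() ignores the index, so guard against a partial last bin.
if len(df) != factor * len(df2):
    raise ValueError("row count is not an exact multiple of the resampling factor")

# Each 5-minute value is copied onto its 5 one-minute rows, in order.
df["new_column"] = df2[0].repeat(factor).values
print(df)
```

If the last bin were incomplete (say 18 rows instead of 20), the lengths would not match and the assignment would fail, which is what the guard makes explicit.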

Your code seems faster. I think it is better to just return a list of values instead of recreating the dataframe.

– Dail
Sep 9 '18 at 13:30

You can pd.concat both dataframes and forward-fill:

    df3 = pd.concat([df, df2], axis=1).ffill()

It works, but I do not understand why. df and df2 have different indexes, so how does pandas know how to line them up?

– Dail
Sep 9 '18 at 12:51

@Dail They share the same index labels. Some labels are simply missing from the resampled frame, and those rows are forward-filled from the previous values. For example, 2018-01-01 00:05:00 is present in both dataframes. concat aligns rows on the index.

– TheMaster
Sep 9 '18 at 13:05
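To make the alignment visible, here is a small sketch that reuses the values printed in the question and looks at the concatenated frame before and after the forward fill. Renaming the df2 column to `new_column` and the variable names `aligned` and `filled` are assumptions made for readability, not part of the answer.

```python
import pandas as pd

# Values copied from the question's printed output.
values = [21, 21, 23, 18, 3, 11, 3, 4, 5, 25,
          15, 11, 29, 22, 7, 13, 26, 7, 26, 15]
df = pd.DataFrame(values, index=pd.date_range("20180101", periods=20, freq="min"))
df2 = df.resample("5min").sum().rolling(2).mean()

# Every 5-minute label also exists in the 1-minute index.
print(df2.index.isin(df.index).all())

# Give the resampled column its own name so the two columns are easy to tell apart.
df2 = df2.rename(columns={0: "new_column"})

# concat(axis=1) matches rows by index label; labels missing from df2 become NaN.
aligned = pd.concat([df, df2], axis=1)
print(aligned.head(8))

# ffill copies the last seen value downward into those NaN rows.
filled = aligned.ffill()
print(filled.head(8))
```

The first five rows stay NaN after the fill because the 00:00 bin itself is NaN (the rolling window has no previous bin there), so there is nothing earlier to fill from.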

So, are you telling me that resampling does not remove indexes? Yes, the one you wrote exists in both. So basically it fills the missing indexes of df2?

– Dail
Sep 9 '18 at 13:09

Another question: can I assign the values to a new column directly?

– Dail
Sep 9 '18 at 13:09

@Dail "resampling does not remove indexes?" It does remove indexes. "the one you wrote exists on both. basically it fills missing indexes of df2?" Yes. concat matches rows on the same indexes (pandas.pydata.org/pandas-docs/stable/generated/…), and the forward fill supplies the other values. "can i assign the values to a new column directly?" The result df3 is a dataframe, so yes, you can assign values directly, just as with any other dataframe or series.

– TheMaster
Sep 9 '18 at 14:08
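One possible way to do the direct assignment without building df3 first is sketched below. It swaps in `Series.reindex` with `method="ffill"`, which is not what the answer itself uses, and again relies on the values printed in the question.

```python
import pandas as pd

# Values copied from the question's printed output.
values = [21, 21, 23, 18, 3, 11, 3, 4, 5, 25,
          15, 11, 29, 22, 7, 13, 26, 7, 26, 15]
df = pd.DataFrame(values, index=pd.date_range("20180101", periods=20, freq="min"))
df2 = df.resample("5min").sum().rolling(2).mean()

# Look up each 1-minute label in df2; labels that are missing there take the
# value of the closest earlier 5-minute label.
df["new_column"] = df2[0].reindex(df.index, method="ffill")
print(df)
```

Because this works on index labels instead of positions, it should also cope with a last bin that holds fewer than 5 rows.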
