Combine dataframes result with a DatetimeIndex index
I have a pandas dataframe with random values at every minute.
import pandas as pd
import numpy as np
df = pd.DataFrame(data=np.random.randint(0,30,size=20), index=pd.date_range("20180101", periods=20, freq='T'))
df
0
2018-01-01 00:00:00 21
2018-01-01 00:01:00 21
2018-01-01 00:02:00 23
2018-01-01 00:03:00 18
2018-01-01 00:04:00 3
2018-01-01 00:05:00 11
2018-01-01 00:06:00 3
2018-01-01 00:07:00 4
2018-01-01 00:08:00 5
2018-01-01 00:09:00 25
2018-01-01 00:10:00 15
2018-01-01 00:11:00 11
2018-01-01 00:12:00 29
2018-01-01 00:13:00 22
2018-01-01 00:14:00 7
2018-01-01 00:15:00 13
2018-01-01 00:16:00 26
2018-01-01 00:17:00 7
2018-01-01 00:18:00 26
2018-01-01 00:19:00 15
Now I need to create a new column in the dataframe df that "reflects" the mean() of a window of 2 periods, computed at a coarser frequency (5 minutes).
df2 = df.resample('5T').sum().rolling(2).mean()
df2
0
2018-01-01 00:00:00 NaN
2018-01-01 00:05:00 67.0
2018-01-01 00:10:00 66.0
2018-01-01 00:15:00 85.5
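To make the numbers above easier to follow, here is a minimal sketch that splits the one-liner into its two steps. It assumes the values shown in the sample output above (hard-coded in place of the random data) and uses the "min" / "5min" aliases, which are equivalent to 'T' / '5T'. The names values, sums and rolled are only illustrative.

```python
import pandas as pd

# Values copied from the sample output above, so the result is reproducible.
values = [21, 21, 23, 18, 3, 11, 3, 4, 5, 25,
          15, 11, 29, 22, 7, 13, 26, 7, 26, 15]
df = pd.DataFrame({0: values},
                  index=pd.date_range("2018-01-01", periods=20, freq="min"))

# Step 1: sum each 5-minute bin (labelled by the start of the bin).
sums = df[0].resample("5min").sum()

# Step 2: average each bin sum with the previous one; the first bin has
# no predecessor, so its rolling mean is NaN.
rolled = sums.rolling(2).mean()

print(sums)
print(rolled)
```

Each value in df2 is therefore the mean of two consecutive 5-minute sums, which is why the first row is NaN.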
Here comes the problem. I need to somehow "map" the values of the 5-minute frame back onto the 1-minute frame.
I should get something like:
0 new_column
2018-01-01 00:00:00 21 NaN
2018-01-01 00:01:00 21 NaN
2018-01-01 00:02:00 23 NaN
2018-01-01 00:03:00 18 NaN
2018-01-01 00:04:00 3 NaN
2018-01-01 00:05:00 11 67.0
2018-01-01 00:06:00 3 67.0
2018-01-01 00:07:00 4 67.0
2018-01-01 00:08:00 5 67.0
2018-01-01 00:09:00 25 67.0
2018-01-01 00:10:00 15 66.0
2018-01-01 00:11:00 11 66.0
2018-01-01 00:12:00 29 66.0
2018-01-01 00:13:00 22 66.0
2018-01-01 00:14:00 7 66.0
2018-01-01 00:15:00 13 85.5
2018-01-01 00:16:00 26 85.5
2018-01-01 00:17:00 7 85.5
2018-01-01 00:18:00 26 85.5
2018-01-01 00:19:00 15 85.5
I am using pandas 0.23.4
2 Answers
You can just use:
df['new_column'] = df2[0].repeat(5).values
with 5 being your resampling factor.
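One caveat with repeat: it matches values purely by position, so it only lines up when every 5-minute bin contains exactly 5 rows. A hedged sketch of the same idea with that assumption made explicit (the data is hard-coded from the question's sample output, and factor is just an illustrative name):

```python
import pandas as pd

# Values copied from the question's sample output.
values = [21, 21, 23, 18, 3, 11, 3, 4, 5, 25,
          15, 11, 29, 22, 7, 13, 26, 7, 26, 15]
df = pd.DataFrame({0: values},
                  index=pd.date_range("2018-01-01", periods=20, freq="min"))
df2 = df.resample("5min").sum().rolling(2).mean()

factor = 5  # number of 1-minute rows per 5-minute bin

# repeat() is positional: it is only correct if there are no gaps in df
# and its length is an exact multiple of the resampling factor.
assert len(df) == factor * len(df2)

df["new_column"] = df2[0].repeat(factor).values
print(df)
```

If the minute data has gaps or a partial last bin, the assertion fails and an index-based approach is the safer choice.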
You can pd.concat both dataframes and forward-fill:
df3=pd.concat([df,df2],axis=1).ffill()
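Since df and df2 both have a column labelled 0, the line above yields two columns with the same name. A possible variation, sketched here with the question's sample values hard-coded, renames the resampled column first and forward-fills only that column (new_column is the name the question asks for):

```python
import pandas as pd

# Values copied from the question's sample output.
values = [21, 21, 23, 18, 3, 11, 3, 4, 5, 25,
          15, 11, 29, 22, 7, 13, 26, 7, 26, 15]
df = pd.DataFrame({0: values},
                  index=pd.date_range("2018-01-01", periods=20, freq="min"))
df2 = df.resample("5min").sum().rolling(2).mean()

# Rename before concatenating so the result has distinct column labels.
df3 = pd.concat([df, df2.rename(columns={0: "new_column"})], axis=1)

# concat aligns rows on the index; the minutes that are absent from df2
# are NaN at this point, so forward-fill them from the bin's first row.
df3["new_column"] = df3["new_column"].ffill()
print(df3)
```

The first five rows stay NaN because there is no earlier value to fill from.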
It works, but I do not understand why. df and df2 have different indexes, so how does pandas know how to line them up?
– Dail
Sep 9 '18 at 12:51
@Dail They share index labels. Some labels are simply missing after the resample, and we forward-fill those from the previous values. For example, 2018-01-01 00:05:00 is present in both dataframes. concat works based on the index.
– TheMaster
Sep 9 '18 at 13:05
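The point about shared labels can be checked directly. A small sketch, assuming the question's sample values (shared and only_in_df are illustrative names):

```python
import pandas as pd

# Values copied from the question's sample output.
values = [21, 21, 23, 18, 3, 11, 3, 4, 5, 25,
          15, 11, 29, 22, 7, 13, 26, 7, 26, 15]
df = pd.DataFrame({0: values},
                  index=pd.date_range("2018-01-01", periods=20, freq="min"))
df2 = df.resample("5min").sum().rolling(2).mean()

# Labels present in both frames: the start of each 5-minute bin.
shared = df.index.intersection(df2.index)

# Labels that exist only in the 1-minute frame; after concat these rows
# hold NaN in the df2 column until they are forward-filled.
only_in_df = df.index.difference(df2.index)

print(shared)
print(len(only_in_df))
```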
So, are you telling me that resampling does not remove indexes? Yes, the one you wrote exists in both. So basically it fills the missing indexes of df2?
– Dail
Sep 9 '18 at 13:09
Another question: can I assign the values to a new column directly?
– Dail
Sep 9 '18 at 13:09
@Dail
"resampling does not remove indexes?" It removes indexes.
"one you wrote it exists on both. basically it fills missing indexes of df2?" Yes. concat with same indexes pandas.pydata.org/pandas-docs/stable/generated/… fill other values.
"can i assign the values to a new column directly?" The result df3 is a dataframe. So yes, you can assign values directly, just as with any other dataframe series.
– TheMaster
Sep 9 '18 at 14:08
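For the direct assignment, one option that skips the intermediate df3 is to reindex the resampled series onto the minute index with forward fill. This is a sketch of an alternative, not what the answer above uses, and it assumes the question's sample values:

```python
import pandas as pd

# Values copied from the question's sample output.
values = [21, 21, 23, 18, 3, 11, 3, 4, 5, 25,
          15, 11, 29, 22, 7, 13, 26, 7, 26, 15]
df = pd.DataFrame({0: values},
                  index=pd.date_range("2018-01-01", periods=20, freq="min"))
df2 = df.resample("5min").sum().rolling(2).mean()

# Expand the 5-minute series to the 1-minute index; each new label takes
# the value of the closest earlier 5-minute label.
df["new_column"] = df2[0].reindex(df.index, method="ffill")
print(df)
```

Because this aligns on the index rather than on position, it does not depend on every bin having the same number of rows.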
Your code seems faster. I think it is better to just return a list of values instead of recreating the dataframe.
– Dail
Sep 9 '18 at 13:30