Subtracting time within a column in pandas
Subtracting time within a column in pandas
I have been trying to subtract dates and time between columns that I read into python using pandas dataframe. I wrote the code as follows:
Time = df['t'] - df['t'].shift(1) + df['t']
This leads to error message. My input and intended output are stated below
Input data:
t =
9:47:00
9:48:00
9:49:00
9:50:00
9:51:00
9:52:00
9:53:00
9:54:00
I am hoping the code will produce the output data(cummulative hour that starts from zero.
0
0.016666667
0.033333333
0.05
0.066666667
0.083333333
0.1
I also tried to turn it into a string using datetime.strptime('t',"%H:%M:%S.%f") and I still get the error.
datetime.strptime('t',"%H:%M:%S.%f")
Any suggestions would be appreciated.
you seem to get a result of 2t - t[-1], which most likely not what you want in any case. I'd suggest dropping last
df['t']– EPo
Aug 30 at 22:04
df['t']
3 Answers
3
Similar to Chris' solution. You should be working with timedeltas from the outset for a simpler solution.
timedelta
v = pd.to_timedelta(df['t'])
((v - v.shift())).dt.total_seconds().cumsum().div(3600).fillna(0)
0 0.000000
1 0.016667
2 0.033333
3 0.050000
4 0.066667
5 0.083333
6 0.100000
7 0.116667
Name: t, dtype: float64
import pandas as pd
# assuming you're working with strings:
t = [
'9:47:00', '9:48:00', '9:49:00', '9:50:00',
'9:51:00', '9:52:00', '9:53:00', '9:54:00'
]
df = pd.DataFrame('time' : pd.to_datetime(t, infer_datetime_format=True))
print(df)
#> time
#> 0 2018-08-30 09:47:00
#> 1 2018-08-30 09:48:00
#> 2 2018-08-30 09:49:00
#> 3 2018-08-30 09:50:00
#> 4 2018-08-30 09:51:00
#> 5 2018-08-30 09:52:00
#> 6 2018-08-30 09:53:00
#> 7 2018-08-30 09:54:00
df['time_shift'] = df.shift(-1)
df['tdelt'] = df['time_shift'] - df['time']
x = df.tdelt.astype('timedelta64[s]') / 3600
x = x[1:-1].cumsum()
print(x)
#> 1 0.016667
#> 2 0.033333
#> 3 0.050000
#> 4 0.066667
#> 5 0.083333
#> 6 0.100000
#> Name: tdelt, dtype: float64
Created on 2018-08-30 by the reprexpy package
import reprexpy
print(reprexpy.SessionInfo())
#> Session info --------------------------------------------------------------------
#> Platform: Windows-7-6.1.7601-SP1 (64-bit)
#> Python: 3.6
#> Date: 2018-08-30
#> Packages ------------------------------------------------------------------------
#> pandas==0.23.4
#> reprexpy==0.1.1
I believe you can also use
pd.Series.diff instead of creating a helper series.– jpp
Aug 30 at 22:27
pd.Series.diff
Unable to use your suggested approach. here is the error message: File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/tools/datetimes.py", line 368, in _convert_listlike require_iso8601=require_iso8601 File "pandas/_libs/tslib.pyx", line 492, in pandas._libs.tslib.array_to_datetime File "pandas/_libs/tslib.pyx", line 744, in pandas._libs.tslib.array_to_datetime File "pandas/_libs/tslib.pyx", line 677, in pandas._libs.tslib.array_to_datetime TypeError: <class 'datetime.time'> is not convertible to datetime >>>
– KJ1
Sep 1 at 15:20
what version of pandas are you using? my solution used 0.23.4.
– Chris
Sep 4 at 14:10
I think this is an answer.
Or maybe in other way -
# lets have some data to process:
stamps = pd.date_range('2018-05-19 18:15:05', periods=4, freq='2H')
df = pd.DataFrame(stamps)
df.apply(lambda e : e + datetime.timedelta(seconds=1))
Shows:
0
0 2018-05-19 18:15:06
1 2018-05-19 20:15:06
2 2018-05-19 22:15:06
3 2018-05-20 00:15:06
And this:
df.apply(lambda e : e + datetime.timedelta(seconds=100))
0
0 2018-05-19 18:16:45
1 2018-05-19 20:16:45
2 2018-05-19 22:16:45
3 2018-05-20 00:16:45
Just to see difference. Tip here is to stick with dates. That way timedelta works. You can use combine to make Datetime from Date, and Time.
timedelta
combine
Datetime
Date
Time
Required, but never shown
Required, but never shown
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
Minimal, complete, verifiable example applies here: please supply a code segment that reproduces the problem, rather than one-line snippets. Your one-line code references only one column -- two of the terms are the same.
– Prune
Aug 30 at 21:56