Python regularise irregular time series with linear interpolation(Python用线性插值正则化不规则时间序列)
问题描述
我在 pandas 中有一个时间序列,如下所示:
I have a time series in pandas that looks like this:
Values
1992-08-27 07:46:48 28.0
1992-08-27 08:00:48 28.2
1992-08-27 08:33:48 28.4
1992-08-27 08:43:48 28.8
1992-08-27 08:48:48 29.0
1992-08-27 08:51:48 29.2
1992-08-27 08:53:48 29.6
1992-08-27 08:56:48 29.8
1992-08-27 09:03:48 30.0
我想将其重新采样为具有 15 分钟步长的常规时间序列,其中值是线性插值的.基本上我想得到:
I would like to resample it to a regular time series with 15 min times steps where the values are linearly interpolated. Basically I would like to get:
Values
1992-08-27 08:00:00 28.2
1992-08-27 08:15:00 28.3
1992-08-27 08:30:00 28.4
1992-08-27 08:45:00 28.8
1992-08-27 09:00:00 29.9
但是使用 Pandas 的重采样方法 (df.resample('15Min')) 我得到:
However using the resample method (df.resample('15Min')) from Pandas I get:
Values
1992-08-27 08:00:00 28.20
1992-08-27 08:15:00 NaN
1992-08-27 08:30:00 28.60
1992-08-27 08:45:00 29.40
1992-08-27 09:00:00 30.00
我尝试过使用不同的how"和fill_method"参数的重采样方法,但从未得到我想要的结果.我是不是用错了方法?
I have tried the resample method with different 'how' and 'fill_method' parameters but never got exactly the results I wanted. Am I using the wrong method?
我认为这是一个相当简单的查询,但我已经在网上搜索了一段时间并没有找到答案.
I figure this is a fairly simple query, but I have searched the web for a while and couldn't find an answer.
提前感谢我能得到的任何帮助.
Thanks in advance for any help I can get.
推荐答案
这需要一些工作,但试试这个.基本思想是找到最接近每个重采样点的两个时间戳并进行插值.np.searchsorted
用于查找最接近重采样点的日期.
It takes a bit of work, but try this out. Basic idea is find the closest two timestamps to each resample point and interpolate. np.searchsorted
is used to find dates closest to the resample point.
# empty frame with desired index
rs = pd.DataFrame(index=df.resample('15min').iloc[1:].index)
# array of indexes corresponding with closest timestamp after resample
idx_after = np.searchsorted(df.index.values, rs.index.values)
# values and timestamp before/after resample
rs['after'] = df.loc[df.index[idx_after], 'Values'].values
rs['before'] = df.loc[df.index[idx_after - 1], 'Values'].values
rs['after_time'] = df.index[idx_after]
rs['before_time'] = df.index[idx_after - 1]
#calculate new weighted value
rs['span'] = (rs['after_time'] - rs['before_time'])
rs['after_weight'] = (rs['after_time'] - rs.index) / rs['span']
# I got errors here unless I turn the index to a series
rs['before_weight'] = (pd.Series(data=rs.index, index=rs.index) - rs['before_time']) / rs['span']
rs['Values'] = rs.eval('before * before_weight + after * after_weight')
毕竟,希望是正确的答案:
After all that, hopefully the right answer:
In [161]: rs['Values']
Out[161]:
1992-08-27 08:00:00 28.011429
1992-08-27 08:15:00 28.313939
1992-08-27 08:30:00 28.223030
1992-08-27 08:45:00 28.952000
1992-08-27 09:00:00 29.908571
Freq: 15T, Name: Values, dtype: float64
这篇关于Python用线性插值正则化不规则时间序列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
本文标题为:Python用线性插值正则化不规则时间序列


基础教程推荐
- 使用 Google App Engine (Python) 将文件上传到 Google Cloud Storage 2022-01-01
- 症状类型错误:无法确定关系的真值 2022-01-01
- 如何在Python中绘制多元函数? 2022-01-01
- 使用Python匹配Stata加权xtil命令的确定方法? 2022-01-01
- 将 YAML 文件转换为 python dict 2022-01-01
- 如何在 Python 中检测文件是否为二进制(非文本)文 2022-01-01
- Python 的 List 是如何实现的? 2022-01-01
- 哪些 Python 包提供独立的事件系统? 2022-01-01
- 使 Python 脚本在 Windows 上运行而不指定“.py";延期 2022-01-01
- 合并具有多索引的两个数据帧 2022-01-01