Calculating distance and velocity between time-ordered coordinates
Problem description
I have a CSV containing locations (latitude, longitude) for a given user, denoted by the id field, at a given time (timestamp). I need to calculate the distance and the velocity between each point and the next point for each user. For example, for ID 1 I need to find the distance and velocity between point 1 and point 2, point 2 and point 3, point 3 and point 4, and so on. Since I am working with coordinates on the Earth, I understand the Haversine metric should be used for distance calculations; however, I am unsure how to iterate through my file given the time and user ordering in my problem. Using Python, how can I iterate through my file to sort the events by user and by time, and then calculate the distance and velocity between each pair?
Ideally, the output would be a second CSV looking something like: ID#, start_time, start_location, end_time, end_location, distance, velocity.
Sample data below:
ID,timestamp,latitude,longitude
3,6/9/2017 22:20,38.7953326,77.0088833
1,5/5/2017 13:10,38.8890106,77.0500613
2,2/10/2017 16:23,40.7482494,73.9841913
1,5/5/2017 12:35,38.9206015,77.2223287
3,6/10/2017 10:00,42.3662109,71.0209426
1,5/5/2017 20:00,38.8974155,77.0368333
2,2/10/2017 7:30,38.8514261,77.0422981
3,6/9/2017 10:20,38.9173461,77.2225527
2,2/10/2017 19:51,40.7828687,73.9675438
3,6/10/2017 6:42,38.9542676,77.4496951
1,5/5/2017 16:35,38.8728748,77.0077629
2,2/10/2017 10:00,40.7769311,73.8761546
Seems like you could use the magic of pandas.
Read the data
It's easy to create a pandas DataFrame from a CSV file using the read_csv() function:
import pandas as pd
df = pd.read_csv(filename)
Based on your sample data, this will create the following DataFrame:
    ID        timestamp   latitude  longitude
0    3   6/9/2017 22:20  38.795333  77.008883
1    1   5/5/2017 13:10  38.889011  77.050061
2    2  2/10/2017 16:23  40.748249  73.984191
3    1   5/5/2017 12:35  38.920602  77.222329
4    3  6/10/2017 10:00  42.366211  71.020943
5    1   5/5/2017 20:00  38.897416  77.036833
6    2   2/10/2017 7:30  38.851426  77.042298
7    3   6/9/2017 10:20  38.917346  77.222553
8    2  2/10/2017 19:51  40.782869  73.967544
9    3   6/10/2017 6:42  38.954268  77.449695
10   1   5/5/2017 16:35  38.872875  77.007763
11   2  2/10/2017 10:00  40.776931  73.876155
Convert the timestamp column
Pandas (and Python in general) has extensive libraries for date and time operations. But first, you will need to prepare your data by converting the timestamp column (a string) into a datetime object. I am assuming your dates are in "MM/DD/YYYY" order, followed by a 24-hour time (since you didn't specify).
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%m/%d/%Y %H:%M')
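As a quick check of that assumption, the sketch below parses two of the sample timestamps with the same format string and prints the result, so you can confirm they are read month-first. The names sample and parsed are only for illustration.

```python
import pandas as pd

# Two timestamps copied from the sample data (no zero padding).
sample = pd.Series(["6/9/2017 22:20", "2/10/2017 7:30"])

# Same format string as above: month/day/year, then 24-hour time.
parsed = pd.to_datetime(sample, format="%m/%d/%Y %H:%M")

print(parsed)
print(parsed.dt.month.tolist())  # months as interpreted by the format
```

If your file actually stores dates day-first, swapping %m and %d in the format string would be the corresponding change.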
Helper functions
You're going to have to define some functions to compute the distance and the velocity. The Haversine distance function is adapted from this answer.
from math import sin, cos, sqrt, atan2, radians

def getDistanceFromLatLonInKm(lat1, lon1, lat2, lon2):
    """Haversine (great-circle) distance in km between two points given in degrees."""
    R = 6371  # Radius of the earth in km
    dLat = radians(lat2 - lat1)
    dLon = radians(lon2 - lon1)
    rLat1 = radians(lat1)
    rLat2 = radians(lat2)
    a = sin(dLat/2) * sin(dLat/2) + cos(rLat1) * cos(rLat2) * sin(dLon/2) * sin(dLon/2)
    c = 2 * atan2(sqrt(a), sqrt(1-a))
    d = R * c  # Distance in km
    return d

def calc_velocity(dist_km, time_start, time_end):
    """Return velocity in km/s, or 0 if time_end is not after time_start (avoids dividing by 0)."""
    # total_seconds() covers the whole interval; .seconds would drop whole days
    return dist_km / (time_end - time_start).total_seconds() if time_end > time_start else 0
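The distance function above handles one pair of points at a time. As an optional alternative that is not part of the original answer, here is a sketch of the same Haversine formula written with NumPy so that it can accept whole columns at once. The name haversine_km and the radius_km parameter are assumptions made for this example.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km; inputs are degrees (scalars or arrays)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * radius_km * np.arcsin(np.sqrt(a))

# First two points of ID 1 from the sample data, in time order.
d = haversine_km(38.9206015, 77.2223287, 38.8890106, 77.0500613)
print(round(float(d), 2))
```

Because it uses NumPy functions throughout, the same call should also work when the four arguments are DataFrame columns, which avoids a row-by-row apply().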
Make some intermediate variables
We want to compute the Haversine function on each row, but we need some information from the first row for each group. Luckily, pandas makes this easy with sort_values(), groupby() and transform().
The following code makes 3 new columns, one each for the initial latitude, longitude, and time for each ID.
# First sort by ID and timestamp:
df = df.sort_values(by=['ID', 'timestamp'])
# Group the sorted dataframe by ID, and grab the initial value for lat, lon, and time.
df['lat0'] = df.groupby('ID')['latitude'].transform(lambda x: x.iat[0])
df['lon0'] = df.groupby('ID')['longitude'].transform(lambda x: x.iat[0])
df['t0'] = df.groupby('ID')['timestamp'].transform(lambda x: x.iat[0])
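The columns above tie every row to the user's first point. If you instead want each row paired with the point immediately before it, as the question describes, one possible variation is groupby() with shift(). The sketch below uses four rows of the sample data; the column names prev_lat, prev_lon, prev_time and leg_seconds are hypothetical.

```python
import pandas as pd

# Four rows of the sample data (two users), deliberately out of order.
df = pd.DataFrame({
    "ID": [1, 2, 1, 2],
    "timestamp": pd.to_datetime(
        ["5/5/2017 13:10", "2/10/2017 10:00", "5/5/2017 12:35", "2/10/2017 7:30"],
        format="%m/%d/%Y %H:%M",
    ),
    "latitude": [38.8890106, 40.7769311, 38.9206015, 38.8514261],
    "longitude": [77.0500613, 73.8761546, 77.2223287, 77.0422981],
})

df = df.sort_values(by=["ID", "timestamp"])

# shift(1) within each ID moves values down one row, so each row sees the
# previous point of the same user; a user's first row gets NaN / NaT.
df["prev_lat"] = df.groupby("ID")["latitude"].shift(1)
df["prev_lon"] = df.groupby("ID")["longitude"].shift(1)
df["prev_time"] = df.groupby("ID")["timestamp"].shift(1)

# Elapsed time of each leg in seconds (NaN on a user's first row).
df["leg_seconds"] = (df["timestamp"] - df["prev_time"]).dt.total_seconds()

print(df[["ID", "prev_time", "timestamp", "leg_seconds"]])
```

These columns could then be used in place of lat0, lon0 and t0 when applying the helper functions. Each user's first row has no previous point, so it would need to be dropped or handled separately.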
Apply the functions
# create a new column for distance
df['dist_km'] = df.apply(
    lambda row: getDistanceFromLatLonInKm(
        lat1=row['latitude'],
        lon1=row['longitude'],
        lat2=row['lat0'],
        lon2=row['lon0']
    ),
    axis=1
)
# create a new column for velocity
df['velocity_kmps'] = df.apply(
    lambda row: calc_velocity(
        dist_km=row['dist_km'],
        time_start=row['t0'],
        time_end=row['timestamp']
    ),
    axis=1
)
The Result
>>> print(df[['ID', 'timestamp', 'latitude', 'longitude', 'dist_km', 'velocity_kmps']])
    ID           timestamp   latitude  longitude     dist_km  velocity_kmps
3    1 2017-05-05 12:35:00  38.920602  77.222329    0.000000       0.000000
1    1 2017-05-05 13:10:00  38.889011  77.050061   15.314742       0.007293
10   1 2017-05-05 16:35:00  38.872875  77.007763   19.312148       0.001341
5    1 2017-05-05 20:00:00  38.897416  77.036833   16.255868       0.000609
6    2 2017-02-10 07:30:00  38.851426  77.042298    0.000000       0.000000
11   2 2017-02-10 10:00:00  40.776931  73.876155  344.880549       0.038320
2    2 2017-02-10 16:23:00  40.748249  73.984191  335.727502       0.010498
8    2 2017-02-10 19:51:00  40.782869  73.967544  339.206320       0.007629
7    3 2017-06-09 10:20:00  38.917346  77.222553    0.000000       0.000000
0    3 2017-06-09 22:20:00  38.795333  77.008883   22.942974       0.000531
9    3 2017-06-10 06:42:00  38.954268  77.449695   20.070609       0.000274
4    3 2017-06-10 10:00:00  42.366211  71.020943  648.450485       0.007611
From here, I will leave it to you to figure out how to grab the last entry for each ID.
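As a hint for that last step, one option is groupby() followed by tail(1), which keeps the final row of each group once the frame is sorted by time. The sketch below is a minimal illustration on a cut-down frame whose values are copied from the table above; the name last_rows is hypothetical.

```python
import pandas as pd

# A cut-down version of the result: two rows per user, values from the table above.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "timestamp": pd.to_datetime([
        "2017-05-05 12:35", "2017-05-05 13:10",
        "2017-02-10 07:30", "2017-02-10 10:00",
    ]),
    "dist_km": [0.0, 15.314742, 0.0, 344.880549],
})

# After sorting by ID and time, tail(1) keeps the latest row of each ID.
last_rows = df.sort_values(["ID", "timestamp"]).groupby("ID").tail(1)

print(last_rows)
```

From there, last_rows.to_csv() with a file path of your choice would write the result out as the second CSV mentioned in the question.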