Pandas - group by column and transform the data to numpy array(Pandas - 按列分组并将数据转换为 numpy 数组)
问题描述
Having the following data frame, group A have 4 samples, B 3 samples and C 1 sample:
group data_1 data_2
0 A 1 4
1 A 2 5
2 A 3 6
3 A 4 7
4 B 1 4
5 B 2 5
6 B 3 6
7 C 1 4
I would like to transform the data into numpy array, where each row is a group with all its samples and zero padding for groups that have fewer samples.
Resulting in an array like so:
[
[[1,4],[2,5],[3,6],[4,7]], # this is A group 4 samples
[[1,4],[2,5],[3,6],[0,0]], # this is B group 3 samples
[[1,4],[0,0],[0,0],[0,0]], # this is C group 1 sample
]
First is necessary add missing values - first solution with unstack and stack, counter Series is created by cumcount.
Second solution use reindex by MultiIndex.
Last use lambda function with groupby, convert to numpy array by values and last to lists:
g = df.groupby('group').cumcount()
L = (df.set_index(['group',g])
.unstack(fill_value=0)
.stack().groupby(level=0)
.apply(lambda x: x.values.tolist())
.tolist())
print (L)
[[[1, 4], [2, 5], [3, 6], [4, 7]],
[[1, 4], [2, 5], [3, 6], [0, 0]],
[[1, 4], [0, 0], [0, 0], [0, 0]]]
Another solution:
g = df.groupby('group').cumcount()
mux = pd.MultiIndex.from_product([df['group'].unique(), g.unique()])
L = (df.set_index(['group',g])
.reindex(mux, fill_value=0)
.groupby(level=0)['data_1','data_2']
.apply(lambda x: x.values.tolist())
.tolist()
)
这篇关于Pandas - 按列分组并将数据转换为 numpy 数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
本文标题为:Pandas - 按列分组并将数据转换为 numpy 数组
基础教程推荐
- Plotly:如何设置绘图图形的样式,使其不显示缺失日期的间隙? 2022-01-01
- PANDA VALUE_COUNTS包含GROUP BY之前的所有值 2022-01-01
- 在同一图形上绘制Bokeh的烛台和音量条 2022-01-01
- 无法导入 Pytorch [WinError 126] 找不到指定的模块 2022-01-01
- 修改列表中的数据帧不起作用 2022-01-01
- 使用大型矩阵时禁止 Pycharm 输出中的自动换行符 2022-01-01
- PermissionError: pip 从 8.1.1 升级到 8.1.2 2022-01-01
- 包装空间模型 2022-01-01
- 求两个直方图的卷积 2022-01-01
- 在Python中从Azure BLOB存储中读取文件 2022-01-01
