Problem description
I have the following pandas DataFrame:
import pandas as pd
df = pd.read_csv('filename.csv')
print(df)
dog A B C
0 dog1 0.787575 0.159330 0.053095
1 dog10 0.770698 0.169487 0.059815
2 dog11 0.792689 0.152043 0.055268
3 dog12 0.785066 0.160361 0.054573
4 dog13 0.795455 0.150464 0.054081
5 dog14 0.794873 0.150700 0.054426
.. ....
8 dog19 0.811585 0.140207 0.048208
9 dog2 0.797202 0.152033 0.050765
10 dog20 0.801607 0.145137 0.053256
11 dog21 0.792689 0.152043 0.055268
....
I create a new column by summing columns "A", "B", "C" as follows:
df['total_ABC'] = df[["A", "B", "C"]].sum(axis=1)
Now I would like to do this based on a condition, i.e. if "A" < 0.78 then create a new summed column df['smallA_sum'] = df[["A", "B", "C"]].sum(axis=1). Otherwise, the value should be zero.
How does one create conditional statements like this?
My idea was to use
df['smallA_sum'] = df1.apply(lambda row: (row['A']+row['B']+row['C']) if row['A'] < 0.78))
However, this doesn't work and I'm not able to specify the axis.
How do you create a column based on the values of other columns?
You could also do something like: for each row where df['dog'] == 'dog2', create a column dog2_sum, i.e.
df['dog2_sum'] = df1.apply(lambda row: (row['A']+row['B']+row['C']) if df['dog'] == 'dog2'))
But my approach is incorrect.
Recommended answer
The following should work. Here we select the rows of the df where the condition is met and sum them; assigning the result back leaves NaN in the rows where the condition isn't met, so we call fillna on the new column:
In [67]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df
Out[67]:
A B C
0 0.197334 0.707852 -0.443475
1 -1.063765 -0.914877 1.585882
2 0.899477 1.064308 1.426789
3 -0.556486 -0.150080 -0.149494
4 -0.035858 0.777523 -0.453747
In [73]:
df['total'] = df.loc[df['A'] > 0,['A','B']].sum(axis=1)
df['total'] = df['total'].fillna(0)
df
Out[73]:
A B C total
0 0.197334 0.707852 -0.443475 0.905186
1 -1.063765 -0.914877 1.585882 0.000000
2 0.899477 1.064308 1.426789 1.963785
3 -0.556486 -0.150080 -0.149494 0.000000
4 -0.035858 0.777523 -0.453747 0.000000
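Applied to the question's condition, the same select-then-fillna pattern might look like the sketch below. The three-row frame is a hand-typed subset of the rows printed in the question (an assumption standing in for 'filename.csv'), and the column name smallA_sum is the one the question asks for.

```python
import pandas as pd

# Hand-typed subset of the rows shown in the question (assumption: the
# real data would come from 'filename.csv').
df = pd.DataFrame({
    "dog": ["dog1", "dog10", "dog2"],
    "A": [0.787575, 0.770698, 0.797202],
    "B": [0.159330, 0.169487, 0.152033],
    "C": [0.053095, 0.059815, 0.050765],
})

# Sum A, B and C only for the rows where A < 0.78. On assignment, rows
# that were not selected have no matching index label and receive NaN.
df["smallA_sum"] = df.loc[df["A"] < 0.78, ["A", "B", "C"]].sum(axis=1)

# Replace those NaN values with 0 by reassigning the filled column.
df["smallA_sum"] = df["smallA_sum"].fillna(0)

print(df)
```

Only the dog10 row has A below 0.78 in this subset, so it is the only row that ends up with a non-zero smallA_sum.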
Another approach is to call where on the sum result; it takes a second argument giving the value to use where the condition isn't met:
In [75]:
df['total'] = df[['A','B']].sum(axis=1).where(df['A'] > 0, 0)
df
Out[75]:
A B C total
0 0.197334 0.707852 -0.443475 0.905186
1 -1.063765 -0.914877 1.585882 0.000000
2 0.899477 1.064308 1.426789 1.963785
3 -0.556486 -0.150080 -0.149494 0.000000
4 -0.035858 0.777523 -0.453747 0.000000
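As a rough sketch of how the where approach could cover both cases from the question (A < 0.78 and dog == 'dog2'), consider the following. The three-row frame is again a hand-typed subset of the question's data, and the extra columns smallA_sum_np and smallA_sum_apply are hypothetical names used only for comparison. The numpy.where and apply variants are alternatives not mentioned in the answer; the apply variant shows what the question's attempt was likely missing, namely an else branch and axis=1.

```python
import numpy as np
import pandas as pd

# Hand-typed subset of the rows shown in the question (assumption).
df = pd.DataFrame({
    "dog": ["dog1", "dog10", "dog2"],
    "A": [0.787575, 0.770698, 0.797202],
    "B": [0.159330, 0.169487, 0.152033],
    "C": [0.053095, 0.059815, 0.050765],
})

# Row-wise sum of the three value columns, computed once and reused.
abc_sum = df[["A", "B", "C"]].sum(axis=1)

# Series.where keeps a value where the condition is True, otherwise uses 0.
df["smallA_sum"] = abc_sum.where(df["A"] < 0.78, 0)
df["dog2_sum"] = abc_sum.where(df["dog"] == "dog2", 0)

# Alternative (not from the answer): numpy.where picks element-wise
# between abc_sum and 0 based on the same boolean condition.
df["smallA_sum_np"] = np.where(df["A"] < 0.78, abc_sum, 0)

# Alternative (not from the answer): a row-wise apply. It needs an
# explicit else branch and axis=1, and is usually slower on large frames.
df["smallA_sum_apply"] = df.apply(
    lambda row: row["A"] + row["B"] + row["C"] if row["A"] < 0.78 else 0,
    axis=1,
)

print(df)
```

The vectorized where form avoids calling a Python function per row, which is why it is generally preferred over apply for this kind of conditional column.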