Mode imputation by groups in pandas (handling group modes that are NaN)
Problem description
I have a categorical column "WALLSMATERIAL_MODE" that contains NaN values, and I want to impute those NaNs with the mode of the groups ['NAME_EDUCATION_TYPE', 'AGE_GROUP']:
    NAME_EDUCATION_TYPE             AGE_GROUP   WALLSMATERIAL_MODE
20  Secondary / secondary special   45-60       Stone, brick
21  Secondary / secondary special   21-45       NaN
22  Secondary / secondary special   21-45       Panel
23  Secondary / secondary special   60-70       Mixed
24  Secondary / secondary special   21-45       Panel
25  Secondary / secondary special   45-60       Stone, brick
26  Secondary / secondary special   45-60       Wooden
27  Secondary / secondary special   21-45       NaN
28  Higher education                21-45       NaN
29  Higher education                21-45       Panel
Reproducible code
import numpy as np
import pandas as pd

df = pd.DataFrame({'NAME_EDUCATION_TYPE': {20: 'Secondary / secondary special',
  21: 'Secondary / secondary special',
  22: 'Secondary / secondary special',
  23: 'Secondary / secondary special',
  24: 'Secondary / secondary special',
  25: 'Secondary / secondary special',
  26: 'Secondary / secondary special',
  27: 'Secondary / secondary special',
  28: 'Higher education',
  29: 'Higher education'},
 'AGE_GROUP': {20: '45-60',
  21: '21-45',
  22: '21-45',
  23: '60-70',
  24: '21-45',
  25: '45-60',
  26: '45-60',
  27: '21-45',
  28: '21-45',
  29: '21-45'},
 'WALLSMATERIAL_MODE': {20: 'Stone, brick',
  21: np.nan,
  22: 'Panel',
  23: 'Mixed',
  24: 'Panel',
  25: 'Stone, brick',
  26: 'Wooden',
  27: np.nan,
  28: np.nan,
  29: 'Panel'}})
I tried to adapt the following function from this post, which works for median imputation and handles group medians that are NaN.
Input:
def mode(s):
    if pd.isnull(s.mode()):
        return df['WALLSMATERIAL_MODE'].mode()
    return s.mode()
        
df['WALLSMATERIAL_MODE'] = df['WALLSMATERIAL_MODE'].groupby([df['NAME_EDUCATION_TYPE'], df['AGE_GROUP']], dropna=False).apply(lambda x: x.fillna(mode(x)))
Output: the call to pd.isnull raises the following error
The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
I don't understand this: I tried applying pd.isnull to all the group modes, and it did not raise this error. See the group modes below.
Input:
df['WALLSMATERIAL_MODE'].groupby([df['NAME_EDUCATION_TYPE'], df['AGE_GROUP']]).agg(pd.Series.mode).to_dict()
Output:
{('Higher education', '60-70'): nan,
 ('Higher education', '45-60'): nan,
 ('Higher education', '21-45'): 'Panel',
 ('Higher education', '0-21'): nan,
 ('Secondary / secondary special', '60-70'): 'Mixed',
 ('Secondary / secondary special', '45-60'): 'Stone, brick',
 ('Secondary / secondary special', '21-45'): 'Panel',
 ('Secondary / secondary special', '0-21'): nan}
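I would be grateful if someone could point out where the error is, or whether there is an efficient way to impute this column by groups!
Recommended answer
The code below seems to do the job using try/except. I would rather avoid try/except, but I have not found a cleaner way.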
def mode_cats(s):
    # app_train_dash is the answerer's full DataFrame (presumably what the question calls df)
    try:
        if pd.isnull(s.mode()).any():  # check if the mode of the subgroup is NaN or contains NaN
                                       # (mode() may indeed return several modes)
            m = app_train_dash['WALLSMATERIAL_MODE'].mode().iloc[0]  # mode of the whole column
        else:
            m = s.mode().iloc[0]  # mode of the subgroup
        return m
    except IndexError:  # mode() returns an empty Series if the subgroup consists of a single NaN value,
                        # which makes s.mode().iloc[0] raise an IndexError
        return app_train_dash['WALLSMATERIAL_MODE'].mode().iloc[0]
As @Ben.T pointed out, I had to use .iloc[0] together with .mode(), because .mode() returns a Series rather than a single value.
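The need for .iloc[0] can be seen on a small Series. The sketch below uses made-up values (not the original dataset) to show that `Series.mode()` returns a Series even when there is a single mode, so `pd.isnull(s.mode())` is itself a boolean Series and cannot be used directly in an `if`:

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the original dataset
s = pd.Series(["Panel", np.nan, "Panel", "Wooden"])

modes = s.mode()                 # a Series, even with a single mode
print(type(modes).__name__)      # Series
print(modes.iloc[0])             # Panel

# pd.isnull on a Series gives an element-wise boolean Series ...
mask = pd.isnull(modes)

# ... and a Series has no single truth value, so `if mask:` raises ValueError
try:
    if mask:
        pass
    ambiguous = False
except ValueError:
    ambiguous = True

# With .iloc[0] the check operates on a scalar and is unambiguous
first_is_null = pd.isnull(modes.iloc[0])
```

This is the same ValueError ("The truth value of a Series is ambiguous") reported in the question.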
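However, when .mode().iloc[0] is given an empty array as input, I get IndexError: single positional indexer is out-of-bounds.
Tracing the error:
- mode() is called on a one-row subgroup whose value is NaN. For this single-NaN subgroup, .mode() returns an empty array
- pd.isnull is called on that empty array and returns an empty array, so the NaN check does not trigger
- calling .iloc[0] on the empty array raises the IndexError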
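The steps above can be reproduced on an all-NaN subgroup, and they suggest one way to avoid try/except: test whether the mode Series is empty before indexing it. The following is only a sketch under assumptions, not the original answer's code. `group_mode_or_fallback` and `fallback` are hypothetical names, the data is a cut-down stand-in for the question's `df`, and ties between several modes are resolved by taking the first one.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for the question's data (assumption);
# the last row forms a group whose only value is NaN
df = pd.DataFrame({
    "NAME_EDUCATION_TYPE": ["Secondary", "Secondary", "Secondary",
                            "Higher", "Higher", "Higher"],
    "AGE_GROUP": ["21-45", "21-45", "21-45", "21-45", "21-45", "45-60"],
    "WALLSMATERIAL_MODE": ["Panel", np.nan, "Panel", "Wooden", np.nan, np.nan],
})

# The trace on a single-NaN subgroup: mode() drops NaN, so nothing is left
only_nan = pd.Series([np.nan], dtype=object)
empty_modes = only_nan.mode()
print(empty_modes.empty)              # True
print(pd.isnull(empty_modes).empty)   # True
# empty_modes.iloc[0] would raise IndexError at this point

# Column-wide mode, used when a group has no non-NaN value
fallback = df["WALLSMATERIAL_MODE"].mode().iloc[0]

def group_mode_or_fallback(s):
    """Return the first mode of s, or the column-wide mode if s is all NaN."""
    modes = s.mode()
    return fallback if modes.empty else modes.iloc[0]

# transform broadcasts each group's mode back onto the rows of that group
group_modes = (
    df.groupby(["NAME_EDUCATION_TYPE", "AGE_GROUP"])["WALLSMATERIAL_MODE"]
      .transform(group_mode_or_fallback)
)
df["WALLSMATERIAL_MODE"] = df["WALLSMATERIAL_MODE"].fillna(group_modes)
print(df["WALLSMATERIAL_MODE"].tolist())
```

Computing `fallback` once outside the function avoids recomputing the column mode for every group, and `transform` keeps the result aligned with the original index, so `fillna` only touches the rows that were missing.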
 