Deep copy of Pandas dataframes and dictionaries

2023-08-31, Python development questions

Problem description

I'm creating a small Pandas dataframe:

import pandas as pd
df = pd.DataFrame(data={'colA': [["a", "b", "c"]]})

I take a deepcopy of that df. I'm not using the Pandas method but general Python, right?

import copy
df_copy = copy.deepcopy(df)

A df_copy.head() gives the following:

        colA
0  [a, b, c]

Then I put these values into a dictionary:

mydict = df_copy.to_dict()

That dictionary looks like this:

{'colA': {0: ['a', 'b', 'c']}}

Finally, I remove one item of the list:

mydict['colA'][0].remove("b")

I'm surprised that the values in df_copy are updated. I'm very confused that the values in the original dataframe are updated too! Both dataframes look like this now:

     colA
0  [a, c]
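For reference, the steps above can be condensed into one script. This is only a minimal sketch: the identity checks at the end are added for illustration and are not part of the original post. They assume a pandas version in which an object column stores references to the same Python lists, as described in the answer below.

```python
import copy

import pandas as pd

df = pd.DataFrame(data={'colA': [["a", "b", "c"]]})
df_copy = copy.deepcopy(df)
mydict = df_copy.to_dict()

# Remove one item from the list held by the dictionary.
mydict['colA'][0].remove("b")

original_list = df['colA'].iloc[0]
copied_list = df_copy['colA'].iloc[0]

# If these print True, all three containers hold the very same list object,
# which is why the removal shows up everywhere.
print(original_list is copied_list)
print(copied_list is mydict['colA'][0])
print(original_list)
```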

I understand Pandas doesn't really do deepcopy, but this wasn't a Pandas method. My questions are:

1) How can I build a dictionary from a dataframe that doesn't update the dataframe?

2) How can I take a copy of a dataframe which would be completely independent?

Thanks for your help!

Cheers, Nicolas

Solution

Disclaimer


Note that putting mutable objects inside a DataFrame can be an antipattern, so make sure that you really need it and that you understand what you are doing.

Why isn't your copy independent


When copy.deepcopy is applied to an object, it looks for a __deepcopy__ method on that object and, if one exists, calls it instead of doing the generic recursive copy. The hook exists so that objects can avoid copying more than they need to. In the case of a DataFrame instance in version 0.20.0 and above, __deepcopy__ doesn't work recursively.
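The lookup described above can be illustrated without pandas. The sketch below uses a hypothetical Box class (not part of pandas or the original answer) whose __deepcopy__ is deliberately shallow, which mimics the behaviour described for DataFrame:

```python
import copy


class Box:
    """Hypothetical container used only to illustrate the lookup."""

    def __init__(self, items):
        self.items = items

    def __deepcopy__(self, memo):
        # Deliberately shallow: a new Box with a new outer list,
        # but the inner objects are reused, not copied.
        return Box(list(self.items))


inner = ["a", "b", "c"]
box = Box([inner])
box_copy = copy.deepcopy(box)  # dispatches to Box.__deepcopy__

print(box_copy is box)                    # is the outer object shared?
print(box_copy.items[0] is box.items[0])  # is the inner list shared?
```

Because copy.deepcopy trusts whatever __deepcopy__ returns, the depth of the copy is decided by the class, not by the caller.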

Similarly, if you use DataFrame.copy(deep=True), the deep copy will copy the data, but will not do so recursively: Python objects stored in the cells are not copied, only the references to them.
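A quick way to check this on your own pandas version is to compare object identity before and after the copy. This is a minimal sketch using the dataframe from the question:

```python
import pandas as pd

df = pd.DataFrame(data={'colA': [["a", "b", "c"]]})
df_copy = df.copy(deep=True)

# Is the DataFrame object itself new?
print(df_copy is df)
# Is the list stored in the cell still the same Python object?
print(df_copy['colA'].iloc[0] is df['colA'].iloc[0])
```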

How to solve the problem


To take a truly deep copy of a DataFrame containing a list (or other Python objects), so that it is independent of the original, you can use one of the methods below.

df_copy = pd.DataFrame(columns = df.columns, data = copy.deepcopy(df.values))
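Note that the one-liner above does not pass the index along, so a non-default index would be replaced by a fresh 0..n-1 range. A possible variant that keeps it is sketched below; deepcopy_frame is a hypothetical helper name, not a pandas function.

```python
import copy

import pandas as pd


def deepcopy_frame(frame):
    """Hypothetical helper: copy a DataFrame and the objects in its cells."""
    return pd.DataFrame(
        # copy.deepcopy on an object array also copies each element
        data=copy.deepcopy(frame.values),
        index=frame.index.copy(),
        columns=frame.columns.copy(),
    )


df = pd.DataFrame(data={'colA': [["a", "b", "c"]]}, index=["row1"])
df_copy = deepcopy_frame(df)

# Mutate the list in the copy only.
df_copy['colA'].iloc[0].remove("b")
print(df['colA'].iloc[0])
print(df_copy['colA'].iloc[0])
```

One caveat, stated as an assumption rather than a guarantee: for frames with mixed column types, .values returns a single object array, so the rebuilt columns may come back with object dtype instead of their original dtypes.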

For a dictionary, you can use the same trick:

mydict = pd.DataFrame(columns = df.columns, data = copy.deepcopy(df_copy.values)).to_dict()
mydict['colA'][0].remove("b")
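An alternative that is not in the original answer is to deep-copy the dictionary itself, since a plain dict of lists is copied recursively by copy.deepcopy. This is a minimal sketch, assuming the cell contents are ordinary copyable Python objects:

```python
import copy

import pandas as pd

df = pd.DataFrame(data={'colA': [["a", "b", "c"]]})

# to_dict() hands out the same list objects; deepcopy then duplicates them.
mydict = copy.deepcopy(df.to_dict())
mydict['colA'][0].remove("b")

print(mydict)
print(df['colA'].iloc[0])  # the dataframe's list should be untouched
```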

There's also a standard hacky way of deep-copying Python objects:

import pickle
df_copy = pickle.loads(pickle.dumps(df))  
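The pickle round trip can be checked in the same way as the other approaches. This sketch assumes every object stored in the dataframe is picklable, which holds for plain lists of strings:

```python
import pickle

import pandas as pd

df = pd.DataFrame(data={'colA': [["a", "b", "c"]]})

# Serialising and deserialising rebuilds every nested object from scratch.
df_copy = pickle.loads(pickle.dumps(df))

df_copy['colA'].iloc[0].remove("b")
print(df['colA'].iloc[0])
print(df_copy['colA'].iloc[0])
```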

Feel free to ask for any clarifications, if needed.
