Problem description
I am trying to compare two CSV files and save the difference to a third CSV file, using Python 2.7.
import csv
f1 = open ("olddata/file1.csv")
oldFile1 = csv.reader(f1)
oldList1 = []
for row in oldFile1:
    oldList1.append(row)
f2 = open ("newdata/file2.csv")
oldFile2 = csv.reader(f2)
oldList2 = []
for row in oldFile2:
    oldList2.append(row)
f1.close()
f2.close()
set1 = tuple(oldList1)
set2 = tuple(oldList2)
print oldList2.difference(oldList1)
I get this error message:
Traceback (most recent call last):
  File "compare.py", line 21, in <module>
    print oldList2.difference(oldList1)
AttributeError: 'list' object has no attribute 'difference'
I am new to Python, and to coding in general, and this code is not finished yet: I still have to store the differences in a variable and write them to a new CSV file. I have been trying to solve this all day and I simply can't. Your help would be greatly appreciated.
Recommended answer
What do you mean by "difference"? The answer to that leads to two distinct possibilities.
If two rows count as the same only when all of their columns are the same, the following code gives you the rows of the first file that do not appear in the second:
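import csv

# Read every row of the first (old) file into a list.
f1 = open("olddata/file1.csv")
oldFile1 = csv.reader(f1)
oldList1 = []
for row in oldFile1:
    oldList1.append(row)

# Read every row of the second (new) file into a list.
f2 = open("newdata/file2.csv")
oldFile2 = csv.reader(f2)
oldList2 = []
for row in oldFile2:
    oldList2.append(row)

f1.close()
f2.close()

# Rows of the first file that have no identical row in the second file.
print [row for row in oldList1 if row not in oldList2]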
However, if two rows count as the same whenever a certain key field (i.e. column) matches, the following code gives you the answer:
import csv

# Read every row of the first (old) file into a list.
f1 = open("olddata/file1.csv")
oldFile1 = csv.reader(f1)
oldList1 = []
for row in oldFile1:
    oldList1.append(row)

# Read every row of the second (new) file into a list.
f2 = open("newdata/file2.csv")
oldFile2 = csv.reader(f2)
oldList2 = []
for row in oldFile2:
    oldList2.append(row)

f1.close()
f2.close()

keyfield = 0  # Change this to choose the column number used as the key
oldList2keys = [row[keyfield] for row in oldList2]
# Rows of the first file whose key does not occur in the second file.
print [row for row in oldList1 if row[keyfield] not in oldList2keys]
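To make the two interpretations concrete, here is a small sketch that runs both comparisons on in-memory rows instead of files. The sample rows and the names `old_rows`, `new_rows`, `whole_row_diff`, `new_keys`, and `key_diff` are made up for illustration; they are not from the question.

```python
# Hypothetical sample data standing in for the contents of the two CSV files.
old_rows = [["1", "alice"], ["2", "bob"], ["3", "carol"]]
new_rows = [["1", "alice"], ["3", "caroline"]]

# Whole-row comparison: a row counts as missing if any column differs.
whole_row_diff = [row for row in old_rows if row not in new_rows]

# Key-field comparison: only column `keyfield` decides whether rows match.
keyfield = 0
new_keys = [row[keyfield] for row in new_rows]
key_diff = [row for row in old_rows if row[keyfield] not in new_keys]

print(whole_row_diff)  # [['2', 'bob'], ['3', 'carol']]
print(key_diff)        # [['2', 'bob']]
```

Row 3 shows the difference between the two readings: its second column changed, so the whole-row comparison reports it, while the key-field comparison treats it as still present.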
Note: The code above may run slowly on very large files, because every "not in" check scans a whole list. To speed it up through hashing, convert each row to a tuple (lists are not hashable, so they cannot be set members) and build sets from oldList1 and oldList2:
set1 = set(tuple(row) for row in oldList1)
set2 = set(tuple(row) for row in oldList2)
After this, you can use set1.difference(set2) to get the rows that are in the first file but not in the second.
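The question also asks for the difference to be written to a third CSV file. A minimal sketch of the whole task, under some assumptions, might look like this. The helper names `open_csv`, `read_rows`, `rows_only_in`, and `write_diff` are hypothetical, as is the output file name in the usage note. The sketch uses a set only for membership tests, so the output keeps the rows in file order rather than in arbitrary set order.

```python
import csv
import sys


def open_csv(path, mode):
    # Python 2's csv module expects binary mode; Python 3 expects text mode
    # with newline="" so the csv module controls line endings itself.
    if sys.version_info[0] < 3:
        return open(path, mode + "b")
    return open(path, mode, newline="")


def read_rows(path):
    # Each row becomes a tuple so that it is hashable and can go into a set.
    with open_csv(path, "r") as f:
        return [tuple(row) for row in csv.reader(f)]


def rows_only_in(rows, other_rows):
    # Rows that appear in `rows` but not in `other_rows`, in original order.
    other = set(other_rows)
    return [row for row in rows if row not in other]


def write_diff(old_path, new_path, out_path):
    # Writes the rows of new_path that are missing from old_path to out_path
    # and also returns them, so the caller can keep them in a variable.
    diff = rows_only_in(read_rows(new_path), read_rows(old_path))
    with open_csv(out_path, "w") as f:
        csv.writer(f).writerows(diff)
    return diff
```

With the paths from the question, a call could be `write_diff("olddata/file1.csv", "newdata/file2.csv", "diff.csv")`, where `diff.csv` is an assumed output name. Swapping the first two arguments gives the opposite direction (rows in the old file that are missing from the new one).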