Question
I am using the Google Cloud Storage Client Library.
I am trying to open and process a CSV file (already uploaded to a bucket) using code like:
import csv
import cloudstorage as gcs

filename = '/<my_bucket>/data.csv'
with gcs.open(filename, 'r') as gcs_file:
    csv_reader = csv.reader(gcs_file, delimiter=',', quotechar='"')
I get the error "argument 1 must be an iterator" for the first argument to csv.reader (i.e. gcs_file). Apparently gcs_file doesn't support the iterator's next() method.
Any ideas on how to proceed? Do I need to wrap the gcs_file and create an iterator on it, or is there an easier way?
Answer
I think it's better to have your own wrapper/iterator designed for csv.reader. If gcs_file were to support the iterator protocol, it would not be clear what next() should return to accommodate every consumer.
According to the csv.reader documentation, it
Return a reader object which will iterate over lines in the given csvfile. csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called — file objects and list objects are both suitable. If csvfile is a file object, it must be opened with the 'b' flag on platforms where that makes a difference.
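As the doc notes, a plain list of strings already qualifies; a quick Python 3 illustration (in Python 3 the protocol method is __next__ and csv.reader expects text rather than bytes):

```python
import csv

# A list of strings satisfies the iterator protocol, so csv.reader accepts it.
rows = list(csv.reader(['a,"b,1",c', 'd,e,f'], delimiter=',', quotechar='"'))
print(rows)  # [['a', 'b,1', 'c'], ['d', 'e', 'f']]
```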
It expects a chunk of raw bytes from the underlying file, not necessarily a line. You can have a wrapper like this (not tested):
class CsvIterator(object):
    def __init__(self, gcs_file, chunk_size):
        self.gcs_file = gcs_file
        self.chunk_size = chunk_size

    def __iter__(self):
        return self

    def next(self):
        result = self.gcs_file.read(size=self.chunk_size)
        if not result:
            raise StopIteration()
        return result
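For reference, a runnable Python 3 adaptation of the same wrapper (assumptions: __next__ replaces Python 2's next(), read() takes a positional size, and io.StringIO stands in for the gcs_file object):

```python
import io

class CsvIterator:
    """Python 3 sketch of the chunked wrapper above."""
    def __init__(self, gcs_file, chunk_size):
        self.gcs_file = gcs_file
        self.chunk_size = chunk_size

    def __iter__(self):
        return self

    def __next__(self):
        # Read the next chunk; an empty result means end of file.
        result = self.gcs_file.read(self.chunk_size)
        if not result:
            raise StopIteration
        return result

# A chunk size of 4 happens to align with line boundaries in this sample.
chunks = list(CsvIterator(io.StringIO('a,b\nc,d\n'), 4))
print(chunks)  # ['a,b\n', 'c,d\n']
```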
The key is to read a chunk at a time, so that with a large file you don't blow up memory or hit a timeout from urlfetch.
Or even simpler, use the iter() built-in:
csv.reader(iter(gcs_file.readline, ''))
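A runnable Python 3 sketch of the same idea, with io.StringIO standing in for the real gcs_file (the two-argument form of iter() calls readline repeatedly until it returns the sentinel ''):

```python
import csv
import io

gcs_file = io.StringIO('a,b\nc,d\n')  # stand-in for the real GCS file object
rows = list(csv.reader(iter(gcs_file.readline, '')))
print(rows)  # [['a', 'b'], ['c', 'd']]
```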