Question
Why does the code below work only with multiprocessing.dummy, but not with plain multiprocessing?
import urllib.request
#from multiprocessing.dummy import Pool  # this works
from multiprocessing import Pool

urls = ['http://www.python.org', 'http://www.yahoo.com',
        'http://www.scala.org', 'http://www.google.com']

if __name__ == '__main__':
    with Pool(5) as p:
        results = p.map(urllib.request.urlopen, urls)
Error:
Traceback (most recent call last):
  File "urlthreads.py", line 31, in <module>
    results = p.map(urllib.request.urlopen, urls)
  File "C:\Users\patri\Anaconda3\lib\multiprocessing\pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\patri\Anaconda3\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[<http.client.HTTPResponse object at 0x0000016AEF204198>]'. Reason: 'TypeError("cannot serialize '_io.BufferedReader' object")'
What's missing so that it works without "dummy"?
Answer
The http.client.HTTPResponse object you get back from urlopen() has an _io.BufferedReader object attached, and this object cannot be pickled.
pickle.dumps(urllib.request.urlopen('http://www.python.org').fp)

Traceback (most recent call last):
  ...
    pickle.dumps(urllib.request.urlopen('http://www.python.org').fp)
TypeError: cannot serialize '_io.BufferedReader' object
multiprocessing.Pool needs to pickle (serialize) the results to send them back to the parent process, and that is what fails here. Since dummy uses threads instead of processes, no pickling takes place, because threads in the same process naturally share their memory.
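The difference can be demonstrated offline with a thread-based pool: an unpicklable object travels between threads without issue because it is handed back by reference, never serialized. A minimal sketch, where a plain io.BufferedReader stands in for the one attached to HTTPResponse:

```python
import io
import pickle
from multiprocessing.dummy import Pool  # thread-based, same API as multiprocessing.Pool

def make_reader(_):
    # Deliberately return an unpicklable object; this BufferedReader
    # stands in for the one attached to http.client.HTTPResponse.
    return io.BufferedReader(io.BytesIO(b"data"))

with Pool(4) as p:
    readers = p.map(make_reader, range(4))

# Threads pass results back by reference, so this just works.
contents = [r.read() for r in readers]
print(contents)  # [b'data', b'data', b'data', b'data']

# A process-based Pool would have to pickle the same object, and that fails:
try:
    pickle.dumps(io.BufferedReader(io.BytesIO(b"data")))
except TypeError as exc:
    print(exc)
```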
A general solution to this TypeError is:

- read out the buffer and save its content (if needed)
- remove the reference to '_io.BufferedReader' from the object you are trying to pickle
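These two steps can be sketched offline with a hypothetical container class holding an open buffered reader (a local temp file stands in for the network response):

```python
import os
import pickle
import tempfile

class Holder:
    """Hypothetical container keeping an open buffered reader,
    mimicking how HTTPResponse keeps its underlying file object."""
    def __init__(self, path):
        self.fp = open(path, 'rb')  # an _io.BufferedReader

    def detach_buffer(self):
        # Step 1: read out the buffer and keep its content.
        self.data = self.fp.read()
        # Step 2: drop the reference to the unpicklable reader.
        self.fp.close()
        self.fp = None

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    path = tmp.name

h = Holder(path)
try:
    pickle.dumps(h)  # fails while the reader is attached
except TypeError as exc:
    print(exc)

h.detach_buffer()
restored = pickle.loads(pickle.dumps(h))  # now succeeds
print(restored.data)  # b'hello'
os.unlink(path)  # clean up the temp file
```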
In your case, calling .read() on the http.client.HTTPResponse will empty and remove the buffer, so a function for converting the response into something picklable could simply do this:
def read_buffer(response):
    response.text = response.read()
    return response
Example:
r = urllib.request.urlopen('http://www.python.org')
r = read_buffer(r)
pickle.dumps(r)
# Out: b'\x80\x03chttp.client\nHTTPResponse...'
Before you consider this approach, make sure you really want to use multiprocessing instead of multithreading. For I/O-bound tasks like the one you have here, multithreading is sufficient, since most of the time is spent waiting for the responses anyway (no CPU time needed). Multiprocessing and the IPC involved also introduce substantial overhead.
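A thread-based sketch of that recommendation using concurrent.futures (file:// URLs over local temp files stand in for the HTTP URLs so the snippet runs without network access; urlopen handles both schemes):

```python
import os
import pathlib
import tempfile
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Create a few local files to fetch via file:// URLs.
paths = []
for i in range(3):
    with tempfile.NamedTemporaryFile('w', suffix='.html', delete=False) as f:
        f.write(f"page {i}")
        paths.append(f.name)

urls = [pathlib.Path(p).as_uri() for p in paths]

def fetch(url):
    # Read inside the worker and return plain bytes; the same pattern
    # would also fix a process-based Pool, since bytes pickle fine.
    with urllib.request.urlopen(url) as response:
        return response.read()

with ThreadPoolExecutor(max_workers=3) as ex:
    bodies = list(ex.map(fetch, urls))  # order is preserved

print(bodies)  # [b'page 0', b'page 1', b'page 2']

for p in paths:
    os.unlink(p)  # clean up the temp files
```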