带有工作进程的 python 池

python Pool with worker Processes(带有工作进程的 python 池)
本文介绍了带有工作进程的 python 池的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着跟版网的小编来一起学习吧!

问题描述

我正在尝试使用进程对象在 python 中使用工作池.每个工人(一个进程)进行一些初始化(花费大量时间),传递一系列作业(理想情况下使用 map()),并返回一些东西.除此之外,不需要任何沟通.但是,我似乎无法弄清楚如何使用 map() 来使用我的工人的 compute() 函数.

I am trying to use a worker Pool in python using Process objects. Each worker (a Process) does some initialization (takes a non-trivial amount of time), gets passed a series of jobs (ideally using map()), and returns something. No communication is necessary beyond that. However, I can't seem to figure out how to use map() to use my worker's compute() function.

from multiprocessing import Pool, Process

class Worker(Process):
    def __init__(self):
        print 'Worker started'
        # do some initialization here
        super(Worker, self).__init__()

    def compute(self, data):
        print 'Computing things!'
        return data * data

if __name__ == '__main__':
    # This works fine
    worker = Worker()
    print worker.compute(3)

    # workers get initialized fine
    pool = Pool(processes = 4,
                initializer = Worker)
    data = range(10)
    # How to use my worker pool?
    result = pool.map(compute, data)

是作业队列代替,还是我可以使用 map()?

Is a job queue the way to go instead, or can I use map()?

推荐答案

我建议你为此使用队列.

I would suggest that you use a Queue for this.

class Worker(Process):
    def __init__(self, queue):
        super(Worker, self).__init__()
        self.queue = queue

    def run(self):
        print('Worker started')
        # do some initialization here

        print('Computing things!')
        for data in iter(self.queue.get, None):
            # Use data

现在您可以开始一堆这些,所有这些都从一个队列中获取工作

Now you can start a pile of these, all getting work from a single queue

request_queue = Queue()
for i in range(4):
    Worker(request_queue).start()
for data in the_real_source:
    request_queue.put(data)
# Sentinel objects to allow clean shutdown: 1 per worker.
for i in range(4):
    request_queue.put(None) 

这样的事情应该可以让您将昂贵的启动成本分摊给多个工人.

That kind of thing should allow you to amortize the expensive startup cost across multiple workers.

这篇关于带有工作进程的 python 池的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!

本站部分内容来源互联网,如果有图片或者内容侵犯了您的权益,请联系我们,我们会在确认后第一时间进行删除!

相关文档推荐

groupby multiple coords along a single dimension in xarray(在xarray中按单个维度的多个坐标分组)
Group by and Sum in Pandas without losing columns(Pandas中的GROUP BY AND SUM不丢失列)
Is there a way of group by month in Pandas starting at specific day number?( pandas 有从特定日期开始的按月分组的方式吗?)
Group by + New Column + Grab value former row based on conditionals(GROUP BY+新列+基于条件的前一行抓取值)
Groupby and interpolate in Pandas(PANDA中的Groupby算法和插值算法)
Pandas - Group Rows based on a column and replace NaN with non-null values(PANAS-基于列对行进行分组,并将NaN替换为非空值)