Python multiprocessing with a single function

2023-03-13 | Python development questions


Question

I have a simulation that is currently running, but the ETA is about 40 hours, so I'm trying to speed it up with multiprocessing.

It iterates over 3 values of one variable (L) and over 99 values of a second variable (a). For each pair of values it runs a complex simulation and returns 9 different standard deviations. So, even though I haven't coded it that way yet, it is essentially a function that takes two values as input (L, a) and returns 9 values.

Here is the essence of the code I have:

STD_1 = []
STD_2 = []
# etc.

for L in range(0,6,2):
    for a in range(1,100):
        ### simulation code ###
        STD_1.append(value_1)
        STD_2.append(value_2)
        # etc.

Here is what I can modify it to:

master_list = []

def simulate(a,L):
    ### simulation code ###
    return (a, L, STD_1, STD_2)  # etc.

for L in range(0,6,2):
    for a in range(1,100): 
        master_list.append(simulate(a,L))

Since each of the simulations is independent, this seems like an ideal place to use some sort of multithreading or multiprocessing.

How exactly would I go about coding this?

Also, will everything be returned to the master list in order, or could it end up out of order if multiple processes are working?

EDIT 2: This is my code, but it doesn't run correctly. It asks if I want to kill the program right after I run it.

import multiprocessing

data = []

for L in range(0,6,2):
    for a in range(1,100):
        data.append((L,a))

print (data)

def simulation(arg):
    # unpack the tuple
    a = arg[1]
    L = arg[0]
    STD_1 = a**2
    STD_2 = a**3
    STD_3 = a**4
    # simulation code #
    return((STD_1,STD_2,STD_3))

print("1")

p = multiprocessing.Pool()

print ("2")

results = p.map(simulation, data)

EDIT 3: Also, what are the limitations of multiprocessing? I've heard that it doesn't work on OS X. Is this correct?

Recommended answer

1. Wrap the data for each iteration up into a tuple.
2. Make a list data of those tuples.
3. Write a function f that processes one tuple and returns one result.
4. Create a p = multiprocessing.Pool() object.
5. Call results = p.map(f, data).

This will run as many instances of f, each in a separate process, as your machine has cores.
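The steps above can be sketched for the (L, a) grid from the question as follows. This is only a minimal sketch: build_grid, simulate and run_all are hypothetical names, and the powers of a are placeholder arithmetic borrowed from the question's EDIT 2 rather than the real simulation. Pool.map returns its results in the same order as the input list, so each result lines up with the entry of data that produced it, even though the work is spread over several processes.

```python
import multiprocessing


def build_grid():
    # Wrap each iteration's inputs in a tuple and collect them in a list.
    return [(L, a) for L in range(0, 6, 2) for a in range(1, 100)]


def simulate(arg):
    # Process one (L, a) tuple and return one result.
    L, a = arg
    # Placeholder arithmetic standing in for the real simulation code.
    std_1 = a ** 2
    std_2 = a ** 3
    std_3 = a ** 4
    return (L, a, std_1, std_2, std_3)


def run_all(data):
    # Pool() sizes itself from the CPU count reported by the OS.
    with multiprocessing.Pool() as pool:
        # map() blocks until every item is done and keeps input order.
        return pool.map(simulate, data)


if __name__ == "__main__":
    data = build_grid()
    results = run_all(data)
    # Each result sits at the same index as the input that produced it.
    print(len(results), results[0], results[-1])
```

Returning L and a alongside the standard deviations is optional, since the order is preserved, but it makes each result self-describing when it is saved or filtered later.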

Edit 1: Example:

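from multiprocessing import Pool

data = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]

def f(t):
    # Unpack one work item and return its name with the sum of its values.
    name, a, b, c = t
    return (name, a + b + c)

if __name__ == '__main__':
    # The guard keeps worker processes from re-running this block on import.
    with Pool() as p:
        results = p.map(f, data)
    print(results)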
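If unpacking the tuple inside the function is inconvenient, Pool.starmap (available since Python 3.3) unpacks each tuple into positional arguments instead. The following is a sketch on the same example data, not part of the original answer; total is a hypothetical name.

```python
from multiprocessing import Pool

data = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]


def total(name, a, b, c):
    # Receives the fields of one tuple as separate positional arguments.
    return (name, a + b + c)


if __name__ == '__main__':
    with Pool() as p:
        # starmap calls total(*item) for every item in data.
        results = p.starmap(total, data)
    print(results)
```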

Multiprocessing should work fine on UNIX-like platforms such as OS X. Only platforms that lack os.fork (mainly MS Windows) need special attention, but even there it still works. See the multiprocessing documentation.
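One possible cause of the behaviour described in EDIT 2 is that the pool is created at module top level. This is an assumption, since the question does not say which platform was used or how the script was launched. Where workers are started by importing the main module (Windows, and macOS by default since Python 3.8), each worker re-runs the top-level code, so creating the pool has to sit behind an if __name__ == "__main__": guard. Below is a sketch of the EDIT 2 script restructured that way; main is a hypothetical name and the powers of a remain placeholders.

```python
import multiprocessing


def simulation(arg):
    # Unpack the (L, a) tuple, as in the question's EDIT 2.
    L, a = arg
    # Placeholder results standing in for the real simulation code.
    return (a ** 2, a ** 3, a ** 4)


def main():
    data = [(L, a) for L in range(0, 6, 2) for a in range(1, 100)]
    # Report which start method this platform uses ("fork", "spawn", ...).
    print(multiprocessing.get_start_method())
    with multiprocessing.Pool() as p:
        results = p.map(simulation, data)
    print(results[:3])


if __name__ == "__main__":
    # Workers that import this module skip main(), so no nested pools start.
    main()
```

Keeping the worker function at module level matters as well: the pool pickles a reference to it, and the workers have to be able to import it by name.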
