I have a simulation that is currently running, but the ETA is about 40 hours, so I'm trying to speed it up with multiprocessing.
It iterates over 3 values of one variable (L) and over 99 values of a second variable (a). For each pair of values, it runs a complex simulation and returns 9 different standard deviations. Thus (even though I haven't coded it that way yet) it is essentially a function that takes two values as inputs (L, a) and returns 9 values.
Here is the essence of the code I have:
STD_1 = []
STD_2 = []
# etc.
for L in range(0,6,2):
    for a in range(1,100):
        ### simulation code ###
        STD_1.append(value_1)
        STD_2.append(value_2)
        # etc.
Here is what I can modify it to:
master_list = []

def simulate(a, L):
    ### simulation code ###
    return (a, L, STD_1, STD_2)  # etc.

for L in range(0,6,2):
    for a in range(1,100):
        master_list.append(simulate(a,L))
Since each of the simulations is independent, it seems like an ideal place to implement some sort of multi-threading or multiprocessing.
How exactly would I go about coding this?
Also, will everything be returned to the master list in order, or could it possibly be out of order if multiple processes are working?
EDIT 2: This is my code -- but it doesn't run correctly. It asks if I want to kill the program right after I run it.
import multiprocessing

data = []

for L in range(0,6,2):
    for a in range(1,100):
        data.append((L,a))

print (data)

def simulation(arg):
    # unpack the tuple
    a = arg[1]
    L = arg[0]
    STD_1 = a**2
    STD_2 = a**3
    STD_3 = a**4
    # simulation code #
    return((STD_1,STD_2,STD_3))

print("1")

p = multiprocessing.Pool()

print ("2")

results = p.map(simulation, data)
EDIT 3: Also, what are the limitations of multiprocessing? I've heard that it doesn't work on OS X. Is this correct?
- Wrap the inputs for each iteration in a tuple.
- Make a list data of those tuples.
- Write a function f to process one tuple and return one result.
- Create a p = multiprocessing.Pool() object.
- Call results = p.map(f, data).

This will run as many instances of f as your machine has cores, each in a separate process.
Edit 1: Example:
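from multiprocessing import Pool

data = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]

def f(t):
    # Unpack one tuple and return one result for it.
    name, a, b, c = t
    return (name, a + b + c)

if __name__ == '__main__':
    # The guard matters when workers start by re-importing this module (e.g. on Windows).
    p = Pool()
    results = p.map(f, data)
    print(results)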
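Applied to the (L, a) grid from the question, the same steps might look like the sketch below. The arithmetic inside simulate is a made-up placeholder, not the real simulation, and build_grid is a hypothetical helper name. The sketch uses Pool.starmap (available since Python 3.3) so that simulate can keep its two-argument signature. On the ordering question: both map and starmap return the results in the same order as the input list, regardless of which worker finishes first.

```python
import itertools
from multiprocessing import Pool


def simulate(a, L):
    """Placeholder for the real simulation (hypothetical arithmetic only)."""
    std_1 = a ** 2 + L
    std_2 = a ** 3 + L
    # Returning the inputs alongside the outputs makes each result self-describing.
    return (a, L, std_1, std_2)


def build_grid():
    """All (a, L) pairs, in the same order as the nested loops in the question."""
    return [(a, L) for L, a in itertools.product(range(0, 6, 2), range(1, 100))]


if __name__ == "__main__":
    grid = build_grid()
    with Pool() as pool:
        # starmap unpacks each (a, L) tuple into a call simulate(a, L);
        # the returned list is in the same order as grid.
        master_list = pool.starmap(simulate, grid)
    print(len(master_list))
    print(master_list[0])
```

If the order of the results does not matter, Pool.imap_unordered yields each result as soon as it is ready, which is why returning a and L with every result can be convenient.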
Multiprocessing should work fine on UNIX-like platforms such as OS X. Only platforms that lack os.fork (mainly MS Windows) need special attention, but even there it still works. See the multiprocessing documentation.
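The "special attention" is mostly a matter of guarding the code that creates the pool. Without os.fork, each worker process starts by importing the main module again, so any top-level code that creates a Pool gets re-run in every worker. That is a likely cause of the behaviour described in EDIT 2 of the question, although this is an assumption, since the platform is not stated. The same applies wherever the "spawn" start method is used, which is also the default on macOS since Python 3.8. Below is a minimal sketch of the EDIT 2 code with the guard added; main is a hypothetical wrapper name.

```python
import multiprocessing


def simulation(arg):
    """Stand-in for the real simulation; takes one (L, a) tuple."""
    L, a = arg
    return (a ** 2, a ** 3, a ** 4)


def main():
    # Build the list of input tuples, then hand it to a pool of workers.
    data = [(L, a) for L in range(0, 6, 2) for a in range(1, 100)]
    with multiprocessing.Pool() as pool:
        results = pool.map(simulation, data)
    return results


if __name__ == "__main__":
    # Only the parent process gets here; worker processes that re-import
    # this module skip the block and so do not create pools of their own.
    results = main()
    print(results[:3])
```

Keeping simulation at module level matters too: the workers have to be able to import it by name.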