I'm trying to learn how to use Python's multiprocessing package, but I don't understand the difference between map_async and imap. I noticed that both map_async and imap are executed asynchronously. So when should I use one over the other? And how should I retrieve the result returned by map_async?
Should I use something like this?
    def test():
        result = pool.map_async()  # arguments omitted in the question
        pool.close()
        pool.join()
        return result.get()

    result = test()
    for i in result:
        print(i)
There are two key differences between imap/imap_unordered and map/map_async:
map consumes your iterable by converting it to a list (assuming it isn't a list already), breaking it into chunks, and sending those chunks to the worker processes in the Pool. Breaking the iterable into chunks performs better than passing the items between processes one at a time, particularly if the iterable is large. However, turning the iterable into a list in order to chunk it can have a very high memory cost, since the entire list has to be kept in memory.
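The eager conversion described above can be observed directly. The following is a minimal sketch, assuming CPython's current Pool implementation; the names square and map_drains_input are made up for illustration. It hands a one-shot iterator to map_async and then checks whether anything is left in it.

```python
import multiprocessing


def square(x):
    return x * x


def map_drains_input(pool, n):
    """Return (results, leftover) after handing a one-shot iterator to map_async."""
    source = iter(range(n))                    # an iterator has no __len__
    async_result = pool.map_async(square, source)
    # If map_async already turned the iterator into a list, nothing is left.
    leftover = next(source, None)
    return async_result.get(), leftover


if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        results, leftover = map_drains_input(pool, 5)
    print(results, leftover)                   # [0, 1, 4, 9, 16] None
```

The pool is passed in as a parameter so the same helper can be tried with multiprocessing.pool.ThreadPool, which exposes the same methods.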
imap doesn't turn the iterable you give it into a list, nor does it break it into chunks (by default). It iterates over the iterable one element at a time and sends each element to a worker process. This means you don't take the memory hit of converting the whole iterable to a list, but it also means performance is slower for large iterables, because of the lack of chunking. This can be mitigated by passing a chunksize argument larger than the default of 1, however.
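As a rough illustration of the chunksize mitigation, here is a sketch that feeds a generator to imap and consumes the results one by one. The helper names (numbers, sum_of_squares) and the chunk size of 50 are arbitrary choices for this example, not recommendations.

```python
import multiprocessing


def square(x):
    return x * x


def numbers(n):
    """A generator, so the input never has to exist as one big list up front."""
    for i in range(n):
        yield i


def sum_of_squares(pool, n, chunksize):
    """Consume imap results one at a time, keeping only a running total."""
    total = 0
    # chunksize > 1 sends the items to the workers in batches instead of singly.
    for value in pool.imap(square, numbers(n), chunksize=chunksize):
        total += value
    return total


if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        print(sum_of_squares(pool, 1000, chunksize=50))
```

A suitable chunksize depends on how expensive each call is relative to the cost of passing items between processes, so it is worth measuring rather than guessing.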
The other major difference between imap/imap_unordered and map/map_async is that with imap/imap_unordered, you can start receiving results from workers as soon as they're ready, rather than having to wait for all of them to be finished. With map_async, an AsyncResult is returned right away, but you can't actually retrieve results from that object until all of them have been processed, at which point it returns the same list that map does (map is actually implemented internally as map_async(...).get()). There's no way to get partial results; you either have the entire result, or nothing.
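To make the retrieval step concrete, here is a small sketch of collecting the result of map_async through the AsyncResult it returns. The function name collect_with_map_async and the 30-second default timeout are illustrative assumptions.

```python
import multiprocessing


def square(x):
    return x * x


def collect_with_map_async(pool, values, timeout=30):
    """Submit everything at once, then block until the complete list is ready."""
    async_result = pool.map_async(square, values)   # returns immediately
    # Other work could run here while the workers are busy.
    # get() blocks until every result is in, re-raises an exception from a
    # worker, or raises multiprocessing.TimeoutError once the timeout expires.
    return async_result.get(timeout=timeout)


if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        print(collect_with_map_async(pool, [1, 2, 3]))   # [1, 4, 9]
```

Calling pool.close() and pool.join() before get(), as in the question, also works, but get() already waits for the results on its own.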
imap and imap_unordered both return iterables right away. With imap, the results are yielded from the iterable as soon as they're ready, while still preserving the ordering of the input iterable. With imap_unordered, results are yielded as soon as they're ready, regardless of the order of the input iterable. So, say you have this:
    import multiprocessing
    import time

    def func(x):
        time.sleep(x)
        return x + 2

    if __name__ == "__main__":
        p = multiprocessing.Pool()
        start = time.time()
        for x in p.imap(func, [1,5,3]):
            print("{} (Time elapsed: {}s)".format(x, int(time.time() - start)))
This will output:

    3 (Time elapsed: 1s)
    7 (Time elapsed: 5s)
    5 (Time elapsed: 5s)
If you use p.imap_unordered instead of p.imap, you'll see:

    3 (Time elapsed: 1s)
    5 (Time elapsed: 3s)
    7 (Time elapsed: 5s)
If you use p.map or p.map_async().get(), you'll see:

    3 (Time elapsed: 5s)
    7 (Time elapsed: 5s)
    5 (Time elapsed: 5s)
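The ordering behaviour can also be checked without reading timestamps. The sketch below is a shortened variant of the example above: the helper names nap and arrival_order are hypothetical, the sleeps are scaled down to fractions of a second, and it assumes the pool has at least three workers so that all tasks start at roughly the same time.

```python
import multiprocessing
import time


def nap(ticks):
    """Sleep for ticks * 0.2 seconds, then return ticks unchanged."""
    time.sleep(ticks * 0.2)
    return ticks


def arrival_order(pool, ticks, ordered):
    """Return the results in the order the pool's iterator yields them."""
    if ordered:
        iterator = pool.imap(nap, ticks)
    else:
        iterator = pool.imap_unordered(nap, ticks)
    return list(iterator)


if __name__ == "__main__":
    # Three workers, so no task has to queue behind another one.
    with multiprocessing.Pool(3) as pool:
        print(arrival_order(pool, [1, 5, 3], ordered=True))
        print(arrival_order(pool, [1, 5, 3], ordered=False))
```

With imap the list always matches the input order. With imap_unordered it should normally come back sorted by sleep time, although that depends on timing and on how many workers are free.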
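So, the primary reasons to use imap/imap_unordered over map_async are that your iterable is large enough that converting it to a list would use too much memory, or that you want to start processing results before all of them have been completed.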