我注意到 Python 的内置 sum 函数在对 1 000 000 个整数列表求和时比 for 循环快大约 3 倍:
I noticed that Python's built-in sum function is roughly 3x faster than a for loop when summing a list of 1 000 000 integers:
import timeit
def sum1():
s = 0
for i in range(1000000):
s += i
return s
def sum2():
return sum(range(1000000))
print 'For Loop Sum:', timeit.timeit(sum1, number=10)
print 'Built-in Sum:', timeit.timeit(sum2, number=10)
# Prints:
# For Loop Sum: 0.751425027847
# Built-in Sum: 0.266746997833
这是为什么呢?sum是如何实现的?
Why is that? How is sum implemented?
速度差异实际上大于 3 倍,但是您通过首先创建一个包含 100 万个整数的巨大内存列表来减慢任一版本的速度.将其与计时赛分开:
The speed difference is actually greater than 3 times, but you slow down either version by first creating a huge in-memory list of 1 million integers. Separate that out of the time trials:
>>> import timeit
>>> def sum1(lst):
... s = 0
... for i in lst:
... s += i
... return s
...
>>> def sum2(lst):
... return sum(lst)
...
>>> values = range(1000000)
>>> timeit.timeit('f(lst)', 'from __main__ import sum1 as f, values as lst', number=100)
3.457869052886963
>>> timeit.timeit('f(lst)', 'from __main__ import sum2 as f, values as lst', number=100)
0.6696369647979736
现在速度差已经上升到5倍以上了.
The speed difference has risen to over 5 times now.
for 循环作为解释的 Python 字节码执行.sum() 完全在 C 代码中循环.解释字节码和 C 代码之间的速度差异很大.
A for loop is executed as interpreted Python bytecode. sum() loops entirely in C code. The speed difference between interpreted bytecode and C code is large.
此外,如果 C 代码可以将总和保留在 C 类型中,则确保不会创建新的 Python 对象;这适用于 int 和 float 结果.
In addition, the C code makes sure not to create new Python objects if it can keep the sum in C types instead; this works for int and float results.
反汇编的 Python 版本是这样做的:
The Python version, disassembled, does this:
>>> import dis
>>> def sum1():
... s = 0
... for i in range(1000000):
... s += i
... return s
...
>>> dis.dis(sum1)
2 0 LOAD_CONST 1 (0)
3 STORE_FAST 0 (s)
3 6 SETUP_LOOP 30 (to 39)
9 LOAD_GLOBAL 0 (range)
12 LOAD_CONST 2 (1000000)
15 CALL_FUNCTION 1
18 GET_ITER
>> 19 FOR_ITER 16 (to 38)
22 STORE_FAST 1 (i)
4 25 LOAD_FAST 0 (s)
28 LOAD_FAST 1 (i)
31 INPLACE_ADD
32 STORE_FAST 0 (s)
35 JUMP_ABSOLUTE 19
>> 38 POP_BLOCK
5 >> 39 LOAD_FAST 0 (s)
42 RETURN_VALUE
除了解释器循环比 C 慢之外,INPLACE_ADD 将创建一个新的整数对象(超过 255,CPython 将小的 int 对象缓存为单例).
Apart from the interpreter loop being slower than C, the INPLACE_ADD will create a new integer object (past 255, CPython caches small int objects as singletons).
您可以在Python mercurial 代码存储库,但它在评论中明确指出:
You can see the C implementation in the Python mercurial code repository, but it explicitly states in the comments:
/* Fast addition by keeping temporary sums in C instead of new Python objects.
Assumes all inputs are the same type. If the assumption fails, default
to the more general routine.
*/
这篇关于Python 内置 sum 函数 vs. for 循环性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持html5模板网!
列表字典中的总和值Sum values in a dict of lists(列表字典中的总和值)
如何在 Python 中对文本文件中的数字求和How to sum numbers from a text file in Python(如何在 Python 中对文本文件中的数字求和)
什么是类似于 sum() 的减法函数,用于减去列表中What is a subtraction function that is similar to sum() for subtracting items in list?(什么是类似于 sum() 的减法函数,用于减去列表中的
Python 等价于 sum() 使用 xor()Python equivalent of sum() using xor()(Python 等价于 sum() 使用 xor())
python中的求和矩阵列sum matrix columns in python(python中的求和矩阵列)
N个列表元素的总和pythonsum of N lists element-wise python(N个列表元素的总和python)