我最近在姐妹网站上回答了一个
为什么手动 for
循环如此之快?它几乎比使用 sum
快两倍.而且由于内置的 sum
应该比手动对列表求和快大约五倍(根据
sum
相当快,但 sum
并不是导致速度变慢的原因.三个主要因素导致放缓:
sum
使用其整数快速路径.与列表推导相比,生成器有两个主要优势:它们占用的内存少得多,并且如果不需要所有元素,它们可以提前终止.它们不是旨在在需要所有元素的情况下提供时间优势.每个元素暂停和恢复一次生成器非常昂贵.
如果我们用列表推导替换 genexp:
在 [66]: def f1(x):.....:返回总和(c in '02468' for c in str(x)).....:在 [67] 中:定义 f2(x):.....:返回总和([c in '02468' for c in str(x)]).....:在 [68] 中:x = int('1234567890'*50)在 [69] 中:%timeit f1(x)10000 次循环,5 次中的最佳:每个循环 52.2 µs在 [70] 中:%timeit f2(x)10000 次循环,5 次中的最佳:每个循环 40.5 µs
我们看到了立竿见影的加速,但代价是在列表上浪费了大量内存.
<小时>如果您查看您的 genexp 版本:
def count_even_digits_spyr03_sum(n):return sum(c in "02468" for c in str(n))
你会发现它没有if
.它只是将布尔值扔到 sum
中.相反,您的循环:
def count_even_digits_spyr03_for(n):计数 = 0对于 str(n) 中的 c:如果 c 在02468"中:计数 += 1返回计数
仅当数字为偶数时才添加任何内容.
如果我们将之前定义的 f2
更改为也包含 if
,我们会看到另一个加速:
在 [71]: def f3(x):.....: return sum([True for c in str(x) if c in '02468']).....:在 [72] 中:%timeit f3(x)10000 次循环,最好的 5 次:每个循环 34.9 µs
f1
,与您的原始代码相同,耗时 52.2 µs,而 f2
,仅更改列表理解,耗时 40.5 µs.
在 f3
中使用 True
而不是 1
可能看起来很尴尬.这是因为将其更改为 1
会激活最终加速.sum
有一个 快速路径 用于整数,但快速路径仅对类型为 int
的对象激活.bool
不算数.这是检查项目是否为 int
类型的行:
if (PyLong_CheckExact(item)) {
一旦我们做出最后的改变,将 True
改为 1
:
在 [73]: def f4(x):.....:返回总和([1 for c in str(x) if c in '02468']).....:在 [74] 中:%timeit f4(x)10000 次循环,5 次中的最佳:每个循环 33.3 µs
我们看到了最后一个小幅加速.
<小时>那么,毕竟,我们是否击败了显式循环?
在 [75]: def explicit_loop(x):.....:计数 = 0....:对于 str(x) 中的 c:.....:如果'02468'中的c:....:计数 += 1.....:返回计数.....:在 [76] 中:%timeit 显式循环(x)10000 次循环,5 次中的最佳:每个循环 32.7 µs
不.我们大致收支平衡,但我们没有打败它.剩下的大问题是列表.构建它很昂贵,并且 sum
必须通过列表迭代器来检索元素,这有其自身的成本(尽管我认为这部分非常便宜).不幸的是,只要我们通过 test-digits-and-call-sum
方法,我们就没有任何好的方法来摆脱列表.显式循环获胜.
我们还能走得更远吗?好吧,到目前为止,我们一直在尝试使 sum
更接近显式循环,但是如果我们被这个愚蠢的列表所困扰,我们可能会偏离显式循环并调用 len
而不是 sum
:
def f5(x):return len([1 for c in str(x) if c in '02468'])
单独测试数字也不是我们可以尝试打破循环的唯一方法.进一步偏离显式循环,我们还可以尝试 str.count
.str.count
直接在 C 中迭代字符串的缓冲区,避免了很多包装对象和间接.我们需要调用它 5 次,对字符串进行 5 次传递,但仍然有回报:
def f6(x):s = str(x)return sum(s.count(c) for c in '02468')
不幸的是,这就是我用来计时的网站因为使用太多资源而让我陷入tarpit"的时候,所以我不得不切换网站.以下时序无法与上述时序直接比较:
>>>导入时间>>>定义 f(x):... return sum([1 for c in str(x) if c in '02468'])...>>>定义 g(x):... return len([1 for c in str(x) if c in '02468'])...>>>定义 h(x):... s = str(x)... return sum(s.count(c) for c in '02468')...>>>x = int('1234567890'*50)>>>timeit.timeit(lambda: f(x), number=10000)0.331528635986615>>>timeit.timeit(lambda: g(x), number=10000)0.30292080697836354>>>timeit.timeit(lambda: h(x), number=10000)0.15950968803372234>>>def 显式循环(x):...计数 = 0... 对于 str(x) 中的 c:... 如果 c 在 '02468' 中:...计数 += 1...返回计数...>>>timeit.timeit(lambda:explicit_loop(x), number=10000)0.3305045129964128
I recently answered a question on a sister site which asked for a function that counts all even digits of a number. One of the other answers contained two functions (which turned out to be the fastest, so far):
def count_even_digits_spyr03_for(n):
count = 0
for c in str(n):
if c in "02468":
count += 1
return count
def count_even_digits_spyr03_sum(n):
return sum(c in "02468" for c in str(n))
In addition I looked at using a list comprehension and list.count
:
def count_even_digits_spyr03_list(n):
return [c in "02468" for c in str(n)].count(True)
The first two functions are essentially the same, except that the first one uses an explicit counting loop, while the second one uses the built-in sum
. I would have expected the second one to be faster (based on e.g. this answer), and it is what I would have recommended turning the former into if asked for a review. But, it turns out it is the other way around. Testing it with some random numbers with increasing number of digits (so the chance that any single digit is even is about 50%) I get the following timings:
Why is the manual for
loop so much faster? It's almost a factor two faster than using sum
. And since the built-in sum
should be about five times faster than manually summing a list (as per the linked answer), it means that it is actually ten times faster! Is the saving from only having to add one to the counter for half the values, because the other half gets discarded, enough to explain this difference?
Using an if
as a filter like so:
def count_even_digits_spyr03_sum2(n):
return sum(1 for c in str(n) if c in "02468")
Improves the timing only to the same level as the list comprehension.
When extending the timings to larger numbers, and normalizing to the for
loop timing, they asymptotically converge for very large numbers (>10k digits), probably due to the time str(n)
takes:
sum
is quite fast, but sum
isn't the cause of the slowdown. Three primary factors contribute to the slowdown:
sum
from using its integer fast path.Generators offer two primary advantages over list comprehensions: they take a lot less memory, and they can terminate early if not all elements are needed. They are not designed to offer a time advantage in the case where all elements are needed. Suspending and resuming a generator once per element is pretty expensive.
If we replace the genexp with a list comprehension:
In [66]: def f1(x):
....: return sum(c in '02468' for c in str(x))
....:
In [67]: def f2(x):
....: return sum([c in '02468' for c in str(x)])
....:
In [68]: x = int('1234567890'*50)
In [69]: %timeit f1(x)
10000 loops, best of 5: 52.2 µs per loop
In [70]: %timeit f2(x)
10000 loops, best of 5: 40.5 µs per loop
we see an immediate speedup, at the cost of wasting a bunch of memory on a list.
If you look at your genexp version:
def count_even_digits_spyr03_sum(n):
return sum(c in "02468" for c in str(n))
you'll see it has no if
. It just throws booleans into sum
. In constrast, your loop:
def count_even_digits_spyr03_for(n):
count = 0
for c in str(n):
if c in "02468":
count += 1
return count
only adds anything if the digit is even.
If we change the f2
defined earlier to also incorporate an if
, we see another speedup:
In [71]: def f3(x):
....: return sum([True for c in str(x) if c in '02468'])
....:
In [72]: %timeit f3(x)
10000 loops, best of 5: 34.9 µs per loop
f1
, identical to your original code, took 52.2 µs, and f2
, with just the list comprehension change, took 40.5 µs.
It probably looked pretty awkward using True
instead of 1
in f3
. That's because changing it to 1
activates one final speedup. sum
has a fast path for integers, but the fast path only activates for objects whose type is exactly int
. bool
doesn't count. This is the line that checks that items are of type int
:
if (PyLong_CheckExact(item)) {
Once we make the final change, changing True
to 1
:
In [73]: def f4(x):
....: return sum([1 for c in str(x) if c in '02468'])
....:
In [74]: %timeit f4(x)
10000 loops, best of 5: 33.3 µs per loop
we see one last small speedup.
So after all that, do we beat the explicit loop?
In [75]: def explicit_loop(x):
....: count = 0
....: for c in str(x):
....: if c in '02468':
....: count += 1
....: return count
....:
In [76]: %timeit explicit_loop(x)
10000 loops, best of 5: 32.7 µs per loop
Nope. We've roughly broken even, but we're not beating it. The big remaining problem is the list. Building it is expensive, and sum
has to go through the list iterator to retrieve elements, which has its own cost (though I think that part is pretty cheap). Unfortunately, as long as we're going through the test-digits-and-call-sum
approach, we don't have any good way to get rid of the list. The explicit loop wins.
Can we go further anyway? Well, we've been trying to bring the sum
closer to the explicit loop so far, but if we're stuck with this dumb list, we could diverge from the explicit loop and just call len
instead of sum
:
def f5(x):
return len([1 for c in str(x) if c in '02468'])
Testing digits individually isn't the only way we can try to beat the loop, too. Diverging even further from the explicit loop, we can also try str.count
. str.count
iterates over a string's buffer directly in C, avoiding a lot of wrapper objects and indirection. We need to call it 5 times, making 5 passes over the string, but it still pays off:
def f6(x):
s = str(x)
return sum(s.count(c) for c in '02468')
Unfortunately, this is the point when the site I was using for timing stuck me in the "tarpit" for using too many resources, so I had to switch sites. The following timings are not directly comparable with the timings above:
>>> import timeit
>>> def f(x):
... return sum([1 for c in str(x) if c in '02468'])
...
>>> def g(x):
... return len([1 for c in str(x) if c in '02468'])
...
>>> def h(x):
... s = str(x)
... return sum(s.count(c) for c in '02468')
...
>>> x = int('1234567890'*50)
>>> timeit.timeit(lambda: f(x), number=10000)
0.331528635986615
>>> timeit.timeit(lambda: g(x), number=10000)
0.30292080697836354
>>> timeit.timeit(lambda: h(x), number=10000)
0.15950968803372234
>>> def explicit_loop(x):
... count = 0
... for c in str(x):
... if c in '02468':
... count += 1
... return count
...
>>> timeit.timeit(lambda: explicit_loop(x), number=10000)
0.3305045129964128
这篇关于为什么`for`循环计算真值的速度如此之快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持html5模板网!