I have a multi-line string defined like this:
foo = """
this is
a multi-line string.
"""
I use this string as test input for a parser I am writing. The parser function receives a file object as input and iterates over it. It also calls the next() method directly to skip lines, so I really need an iterator as input, not an iterable.
I need an iterator that iterates over the individual lines of that string, just as a file object would over the lines of a text file. I could of course do it like this:
lineiterator = iter(foo.splitlines())
Is there a more direct way of doing this? In this scenario the string has to be traversed once for the splitting, and then again by the parser. It doesn't matter in my test case, since the string is very short there; I am just asking out of curiosity. Python has so many useful and efficient built-ins for such stuff, but I could find nothing that suits this need.
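To make the requirement concrete, here is a minimal sketch of the kind of parser described above, written for Python 3. The function name `parse` and the header-plus-records layout are hypothetical, invented only for illustration; the one detail taken from the description is the direct next() call used to skip a line.

```python
def parse(lines):
    """Hypothetical parser: skip one header line, then collect the rest."""
    next(lines)  # skip the header; raises TypeError on a plain list
    return [line.strip() for line in lines]

foo = """header
this is
a multi-line string.
"""

# A list is iterable but not an iterator, so next() rejects it.
try:
    parse(foo.splitlines())
except TypeError:
    list_works = False
else:
    list_works = True

# Wrapping the list in iter() gives the parser what it needs.
records = parse(iter(foo.splitlines()))
print(list_works, records)
```

This is why a bare foo.splitlines() is not enough here, while iter(foo.splitlines()) is.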
Here are three possibilities:
foo = """
this is
a multi-line string.
"""

# f1: split first, then hand out an iterator over the resulting list
def f1(foo=foo): return iter(foo.splitlines())

# f2: build each line character by character
def f2(foo=foo):
    retval = ''
    for char in foo:
        retval += char if not char == '\n' else ''
        if char == '\n':
            yield retval
            retval = ''
    if retval:
        yield retval

# f3: locate each newline with find() and yield the slice before it
def f3(foo=foo):
    prevnl = -1
    while True:
        nextnl = foo.find('\n', prevnl + 1)
        if nextnl < 0: break
        yield foo[prevnl + 1:nextnl]
        prevnl = nextnl

if __name__ == '__main__':
    for f in f1, f2, f3:
        print(list(f()))
Running this as the main script confirms the three functions are equivalent. With timeit (and a * 100 for foo to get substantial strings for more precise measurement):
$ python -mtimeit -s'import asp' 'list(asp.f3())'
1000 loops, best of 3: 370 usec per loop
$ python -mtimeit -s'import asp' 'list(asp.f2())'
1000 loops, best of 3: 1.36 msec per loop
$ python -mtimeit -s'import asp' 'list(asp.f1())'
10000 loops, best of 3: 61.5 usec per loop
Note we need the list() call to ensure the iterators are traversed, not just built.
IOW, the naive implementation is so much faster it isn't even funny: 6 times faster than my attempt with find calls, which in turn is 4 times faster than a lower-level approach.
Lessons to retain: measurement is always a good thing (but must be accurate); string methods like splitlines are implemented in very fast ways; putting strings together by programming at a very low level (esp. by loops of += of very small pieces) can be quite slow.
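The same kind of measurement can also be run from inside Python with the timeit module rather than from the shell. The following is only a sketch under assumptions: it targets Python 3, redefines f1 and f3 locally so it is self-contained, and the `number` and `repeat` values are arbitrary choices. Absolute timings will differ from the figures quoted above.

```python
import timeit

# Same test string as above, repeated to get a more substantial input.
foo = """
this is
a multi-line string.
""" * 100

def f1(foo=foo):
    return iter(foo.splitlines())

def f3(foo=foo):
    prevnl = -1
    while True:
        nextnl = foo.find('\n', prevnl + 1)
        if nextnl < 0:
            break
        yield foo[prevnl + 1:nextnl]
        prevnl = nextnl

# list() forces full traversal, as in the command-line runs.
timings = {
    f.__name__: min(timeit.repeat(lambda: list(f()), number=200, repeat=3))
    for f in (f1, f3)
}
for name, seconds in sorted(timings.items()):
    print(name, seconds)
```

Taking the minimum over several repeats mirrors the "best of 3" figure that the command-line tool reports.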
Edit: added @Jacob's proposal, slightly modified to give the same results as the others (trailing blanks on a line are kept), i.e.:
from cStringIO import StringIO

def f4(foo=foo):
    stri = StringIO(foo)
    while True:
        nl = stri.readline()
        if nl != '':
            yield nl.strip('\n')
        else:
            raise StopIteration
Measuring gives:
$ python -mtimeit -s'import asp' 'list(asp.f4())'
1000 loops, best of 3: 406 usec per loop
not quite as good as the .find based approach -- still, worth keeping in mind because it might be less prone to small off-by-one bugs (any loop where you see occurrences of +1 and -1, like my f3 above, should automatically trigger off-by-one suspicions -- and so should many loops which lack such tweaks and should have them -- though I believe my code is also right, since I was able to check its output against the other functions).
But the split-based approach still rules.
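The off-by-one suspicion can be turned into a quick cross-check against splitlines on a few edge-case inputs. This is a sketch for Python 3; f3 is repeated here with the string as a required argument, `f3_tail` is a hypothetical variant that is not part of the original proposals, and the sample strings are arbitrary. Only '\n' is used as a separator in the samples, since splitlines also recognizes other line boundaries that find('\n') does not.

```python
def f3(foo):
    # Same logic as f3 above, taking the string as a required argument.
    prevnl = -1
    while True:
        nextnl = foo.find('\n', prevnl + 1)
        if nextnl < 0:
            break
        yield foo[prevnl + 1:nextnl]
        prevnl = nextnl

def f3_tail(foo):
    # Hypothetical variant: also yield any text after the last newline.
    prevnl = -1
    while True:
        nextnl = foo.find('\n', prevnl + 1)
        if nextnl < 0:
            break
        yield foo[prevnl + 1:nextnl]
        prevnl = nextnl
    if prevnl + 1 < len(foo):
        yield foo[prevnl + 1:]

samples = ["", "a\n", "a\nb\n", "\na\n\nb\n", "a\nb"]
for s in samples:
    print(repr(s),
          list(f3(s)) == s.splitlines(),
          list(f3_tail(s)) == s.splitlines())
```

The one sample where f3 and splitlines disagree is the string that does not end in a newline: f3 stops at the last '\n' and never yields the unterminated final line. The test string used throughout ends in a newline, so the equivalence observed earlier still holds for it.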
An aside: possibly better style for f4 would be:
from cStringIO import StringIO

def f4(foo=foo):
    stri = StringIO(foo)
    while True:
        nl = stri.readline()
        if nl == '': break
        yield nl.strip('\n')
at least, it's a bit less verbose. The need to strip trailing \ns unfortunately prohibits the clearer and faster replacement of the while loop with return iter(stri) (the iter part whereof is redundant in modern versions of Python, I believe since 2.3 or 2.4, but it's also innocuous). Maybe worth trying, also:
    return itertools.imap(lambda s: s.strip('\n'), stri)
or variations thereof -- but I'm stopping here since it's pretty much a theoretical exercise with respect to the split-based one, which is the simplest and fastest.
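The code above targets Python 2. On Python 3, cStringIO and itertools.imap no longer exist, so the imap idea might be sketched as follows. This is an adaptation under that assumption, not something measured in this answer: io.StringIO stands in for cStringIO.StringIO, and the built-in map (lazy in Python 3) stands in for itertools.imap.

```python
import io

foo = """
this is
a multi-line string.
"""

def f4(foo=foo):
    # io.StringIO is the Python 3 counterpart of cStringIO.StringIO;
    # iterating over it yields lines with their trailing '\n' kept.
    stri = io.StringIO(foo)
    # map() is lazy in Python 3, so it plays the role of itertools.imap.
    return map(lambda s: s.strip('\n'), stri)

lines = f4()
first = next(lines)   # a map object is an iterator, so next() works
rest = list(lines)
print([first] + rest == foo.splitlines())
```

Because the result is a true iterator, it can be handed to code that calls next() directly, which was the original requirement.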