I'm reading in a binary file (a jpg in this case) and need to find some values in it. Specifically, I'm trying to pick out the image's dimensions by looking for the binary structure as detailed here.
I need to find FFC0 in the binary data, skip ahead some number of bytes, and then read 4 bytes (this should give me the image dimensions).
What's a good way of searching for the value in the binary data? Is there an equivalent of 'find', or something like re?
You could actually load the file into a string and search that string for the byte sequence 0xffc0 using the str.find() method. It works for any byte sequence.
The code to do this depends on a couple of things. If you open the file in binary mode and you're using Python 3 (both of which are probably best practice for this scenario), you'll need to search for a byte string (as opposed to a character string), which means you have to prefix the string with b.
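# Read the whole file as bytes, then search for the two-byte marker.
with open(filename, 'rb') as f:
    s = f.read()
s.find(b'\xff\xc0')  # index of the first match, or -1 if not found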
If you open the file in text mode in Python 3, you'd have to search for a character string:
with open(filename, 'r') as f:
    s = f.read()
s.find('\xff\xc0')
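though there's no particular reason to do this. It doesn't get you any advantage over the previous way, and if you're on a platform that treats binary files and text files differently (e.g. Windows), there is a chance this will cause problems. In Python 3, text mode also decodes the bytes with a text encoding, so arbitrary binary data such as a jpg can fail with a UnicodeDecodeError.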
Python 2 doesn't make the distinction between byte strings and character strings, so if you're using that version, it doesn't matter whether you include or exclude the b in b'\xff\xc0'. And if your platform treats binary files and text files identically (e.g. Mac or Linux), it doesn't matter whether you use 'r' or 'rb' as the file mode either. But I'd still recommend using something like the first code sample above just for forward compatibility: in case you ever do switch to Python 3, it's one less thing to fix.
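To tie this back to the question, here is a minimal sketch of the "find FFC0, skip ahead, read 4 bytes" step in Python 3. It assumes the usual baseline layout after the marker (a 2-byte segment length, a 1-byte sample precision, then a 2-byte height and a 2-byte width, all big-endian). The function name find_sof0_dimensions and the sample bytes are made up for illustration. A plain search like this can be fooled by an FFC0 inside an embedded thumbnail, and progressive jpgs use FFC2 instead, so treat it as a starting point rather than a robust parser.

```python
import struct

def find_sof0_dimensions(data):
    """Return (width, height) read after the first FFC0 marker, or None."""
    marker = b'\xff\xc0'
    pos = data.find(marker)
    if pos == -1:
        return None
    # Skip the marker itself, the 2-byte segment length and the
    # 1-byte sample precision (assumed baseline layout).
    start = pos + len(marker) + 3
    fields = data[start:start + 4]
    if len(fields) < 4:
        return None  # data ends before the dimensions
    # Height comes first, then width, both big-endian unsigned 16-bit.
    height, width = struct.unpack('>HH', fields)
    return width, height

# Hand-built bytes, not a real image: start-of-image marker, FFC0,
# length, precision, height 0x01e0 (480), width 0x0280 (640).
sample = b'\xff\xd8' + b'\xff\xc0' + b'\x00\x11' + b'\x08' + b'\x01\xe0\x02\x80' + b'\x03'
print(find_sof0_dimensions(sample))
```

For a real file, you would pass in the bytes read with open(filename, 'rb') as in the first code sample.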